What is Glutton

Glutton is a RAG (Retrieval-Augmented Generation) backend system for document ingestion and semantic search. It processes documents (PDFs, DOCX, text files), converts them into semantic embeddings, stores them in a vector database, and allows you to query them using natural language.

Why I Built It

I wanted to understand how context retrieval could be made more efficient. After researching RAG, I decided to build a simple pipeline to get a better understanding of how things work under the hood. This project also gave me a chance to experiment with Spec Driven Design using GitHub's spec-kit.

The Architecture

While searching for a solution that would allow for loose scaffolding and easily swappable pieces, I came across hexagonal architecture (ports and adapters pattern). It created the perfect mental model for me to understand how to design a system around core business logic.

The codebase is structured into three layers:

Domain Layer: Pure business logic, text chunking, configuration management
Application Layer: Services that orchestrate the workflow (IngestService, QueryService)
Adapters: Swappable implementations for document loading (Docling), vector storage (Weaviate), and audit logging

This means I can swap out Weaviate for Pinecone, or replace the embedding model, without touching any of the core application logic.

Tech Stack

Python 3.11+ with strict typing (mypy)
Docling for document processing
Sentence Transformers (BGE-small-en-v1.5) for embeddings
Weaviate as the vector database
Pydantic for configuration management
pytest with embedded Weaviate for testing

What I Learned

Building this gave me a clearer picture of how document chunking strategies, embedding dimensions, and vector similarity search all fit together. There's still a lot of depth I haven't explored, like fine-tuning embedding models or multi-modal RAG pipelines, but the goal was to understand the basics and that's what I got out of it.