RAG Implementation Without OpenAI

🚀 Built a Retrieval-Augmented Generation (RAG) System with Hugging Face Transformers (no OpenAI, only free open-source tools)!

I recently implemented a RAG pipeline using only open-source tools, without relying on OpenAI APIs. The project combines retrieval and generation to deliver grounded, context-aware answers from company documents.
🔎 What is RAG?
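RAG connects a retriever (to fetch relevant knowledge base chunks) with a generator (to produce natural language responses). Because the generator is prompted with retrieved text instead of answering from memory alone, answers are more likely to be accurate and traceable to real data.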
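To make the retrieve-then-generate flow concrete, here is a toy sketch that involves no models at all. Word overlap stands in for embedding similarity, and the sample sentences and the helper names (`tokenize`, `retrieve`, `build_prompt`) are made up for illustration; they are not part of the project code.

```python
# Toy illustration of the retrieve-then-generate flow (no models involved).
# Word overlap is a stand-in for embedding similarity; all data is made up.

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query, chunks, limit=1):
    """Return the `limit` chunks that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda c: len(query_words & tokenize(c)),
        reverse=True,
    )
    return ranked[:limit]

def build_prompt(query, context_chunks):
    """Combine retrieved context and the question into one generator prompt."""
    context = "\n---\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion:\n{query}"

chunks = [
    "Employees accrue vacation days every month.",
    "Expense reports are due by the fifth of each month.",
]
query = "When are expense reports due?"
top = retrieve(query, chunks)
prompt = build_prompt(query, top)
print(prompt)
```

In a real pipeline the prompt would then be passed to a language model; the point here is only that the generator sees the retrieved chunk, not the whole knowledge base.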
🧩 How My System Works
SentenceTransformer (all-MiniLM-L6-v2) → embeddings for documents and queries.
ChromaDB → vector database for fast similarity search.
Document Chunking → splits knowledge.txt into overlapping word chunks (80 words each, with a 20-word overlap) for context continuity.
Retrieval → queries are embedded and matched against stored chunks.
FLAN-T5 (google/flan-t5-base) → Hugging Face generator that produces answers from the retrieved context.
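As a rough picture of what "matched against stored chunks" means, the sketch below ranks a few hand-made vectors by cosine similarity to a query vector. Cosine similarity is one common choice; the metric a vector database actually uses depends on how the collection is configured. The three-dimensional vectors and the `top_k` helper are illustrative assumptions, not real embeddings or a ChromaDB API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Made-up 3-dimensional "embeddings" (real ones have hundreds of dimensions).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query_vec = [1.0, 0.2, 0.0]
nearest = top_k(query_vec, doc_vecs)
print(nearest)
```

Document 1 ranks first because its direction is closest to the query's, even though document 0 has the larger first component.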
💻 Source Code (Core Implementation)
python
import os
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

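# Initialize clients
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
chroma_client = chromadb.Client()  # in-memory client; data is not persisted
# FLAN-T5 is an encoder-decoder (seq2seq) model, so it needs the
# "text2text-generation" task rather than "text-generation".
generator = pipeline("text2text-generation", model="google/flan-t5-base")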

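# Chunking function
def load_and_chunk_document(filepath, chunk_size_words=80, overlap_words=20):
    """Split a text file into overlapping chunks of whitespace-separated words."""
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"Could not find the file: {filepath}")
    # Guard against a zero or negative step, which would loop forever.
    if not 0 <= overlap_words < chunk_size_words:
        raise ValueError("overlap_words must be >= 0 and smaller than chunk_size_words")
    with open(filepath, "r", encoding="utf-8") as f:
        text = f.read()
    words = text.split()
    chunks, i = [], 0
    while i < len(words):
        chunk_words = words[i : i + chunk_size_words]
        chunks.append(" ".join(chunk_words))
        # Advance by less than a full chunk so neighbouring chunks overlap.
        i += (chunk_size_words - overlap_words)
    return chunks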

# Ingestion
file_path = os.path.join(os.path.dirname(__file__), "knowledge.txt")
document_chunks = load_and_chunk_document(file_path)
collection = chroma_client.create_collection(name="omnicorp_policies")
for idx, chunk in enumerate(document_chunks):
    vector = embedding_model.encode(chunk).tolist()
    collection.add(ids=[f"chunk_{idx}"], embeddings=[vector], documents=[chunk])

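# Retrieval
def retrieve_relevant_context(query, limit=2):
    """Embed the query and return the `limit` most similar stored chunks."""
    query_vector = embedding_model.encode(query).tolist()
    results = collection.query(query_embeddings=[query_vector], n_results=limit)
    # One query was sent, so take the first (and only) list of documents.
    return results["documents"][0]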

# Generation
def generate_rag_answer(query):
    """Retrieve context for the query and ask the generator to answer from it."""
    context_chunks = retrieve_relevant_context(query)
    context_text = "\n---\n".join(context_chunks)
    prompt = f"""
You are a helpful company assistant.
Use ONLY the provided context below to answer the user's question.
If you do not know the answer, say:
"I cannot find that information in the company guidelines."

Context:
{context_text}

User Question:
{query}
"""
    return generator(prompt, max_new_tokens=256)[0]["generated_text"]
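
To see how the overlap in the chunking step plays out, the sketch below applies the same sliding-window arithmetic to a synthetic list of 170 numbered words instead of a file. The `chunk_words` helper and the placeholder words are assumptions for illustration only; the defaults match the 80-word chunks and 20-word overlap used above.

```python
def chunk_words(words, chunk_size_words=80, overlap_words=20):
    """Sliding window over a word list; each step advances by size - overlap."""
    step = chunk_size_words - overlap_words
    return [words[i : i + chunk_size_words] for i in range(0, len(words), step)]

# Synthetic "document" of 170 placeholder words: w0, w1, ..., w169.
words = [f"w{n}" for n in range(170)]
chunks = chunk_words(words)

# First word of each chunk shows where every window starts.
starts = [c[0] for c in chunks]
# The last 20 words of one chunk are the first 20 words of the next.
shared = chunks[0][-20:] == chunks[1][:20]

print(starts, [len(c) for c in chunks], shared)
```

Each window starts 60 words after the previous one (80 minus 20), so a sentence that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.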
