Coding Example of Retrieval Augmented Generation (RAG)

In this lesson, we will walk through a coding example of RAG: we will upload a document and then query and summarize it using RAG. The code is implemented on Google Colab.

Complete RAG System with PDF upload

Let us see the code to implement on Google Colab:

1. Setup & Installation

First, install required packages:
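
A minimal install cell along these lines should work in Colab (the exact versions are left unpinned here):

```python
# Install the packages used in this tutorial (run in a Colab cell)
!pip install -q pypdf PyMuPDF sentence-transformers faiss-cpu transformers
```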

We have used the following packages above:
  • pypdf/PyMuPDF: PDF text extraction
  • sentence-transformers: For creating embeddings
  • faiss-cpu: Efficient similarity search (CPU version)
  • transformers: For question answering

2. Uploading Your PDF

The code snippet below is for Google Colab, allowing users to upload files from their local machine. Using files.upload(), it opens an upload interface and stores the uploaded files in a dictionary-like object called uploaded.

The filename is extracted with next(iter(uploaded)), which retrieves the first key (filename) from the dictionary.

Finally, it prints a confirmation message displaying the filename being processed, enabling further actions on the uploaded file. It’s particularly handy for working with files like PDFs or datasets within the Colab environment.

Here is the code snippet:
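
A sketch consistent with the description above (the variable name pdf_filename is carried through the rest of the tutorial):

```python
from google.colab import files

# Open the Colab upload widget; the result is a dict-like object
# mapping each uploaded filename to its contents
uploaded = files.upload()

# Retrieve the first key (the filename) from the dictionary
pdf_filename = next(iter(uploaded))
print(f"Processing file: {pdf_filename}")
```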

Here, we uploaded our PDF:

[Screenshot: Upload PDF with RAG]

3. PDF Text Extraction

The code below extracts text from a PDF file using the PyMuPDF library (fitz). The function extract_text_from_pdf() takes the PDF’s file path as input, opens the file, and iterates through its pages, collecting text from each page with the get_text() method. All the extracted text is then combined into a single string, with pages separated by spaces.

The function is called with the uploaded PDF file (pdf_filename), and the length of the extracted text in characters is printed to indicate successful extraction. This script is useful for text analysis or data retrieval from PDF documents.

Here is the code snippet:
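
A sketch consistent with the description above:

```python
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """Open the PDF and collect the text of every page into one string."""
    doc = fitz.open(pdf_path)
    pages = [page.get_text() for page in doc]
    doc.close()
    # Separate pages with a space so words at page boundaries do not merge
    return " ".join(pages)

pdf_text = extract_text_from_pdf(pdf_filename)
print(f"Extracted {len(pdf_text)} characters from the PDF")
```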

4. Text Chunking

This code splits a long text into smaller, manageable chunks using the chunk_text() function, which includes an overlap of words between chunks for continuity. The function takes the input text and splits it into words, iterating through them in steps of chunk_size - overlap.

Each chunk contains a specified number of words (chunk_size), ensuring some overlap between consecutive chunks. These chunks are stored in a list and returned. When applied to pdf_text, the code creates text chunks and prints the total number of chunks generated, useful for tasks like text analysis or language processing.
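
Here is a sketch of the chunking function described above; the default chunk_size of 200 words and overlap of 50 are illustrative values, not prescribed by the tutorial:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word chunks for retrieval."""
    words = text.split()
    chunks = []
    # Step forward by chunk_size - overlap so consecutive chunks share `overlap` words
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

document_chunks = chunk_text(pdf_text)
print(f"Created {len(document_chunks)} chunks")
```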

5. Embeddings & FAISS Index

The code below uses the SentenceTransformer model to generate vector embeddings for text chunks (document_chunks), enabling semantic understanding and efficient search. The embedding model, all-MiniLM-L6-v2, creates numerical representations of each chunk, which are then stored in a FAISS index. FAISS is an efficient library for similarity search, and here it uses inner product similarity (IndexFlatIP) for comparisons. The code calculates the embedding dimensions (dimension) and adds the embeddings to the index, creating a searchable structure for tasks like document retrieval or semantic similarity analysis.

Here is the code snippet:
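
A sketch consistent with the description above; normalizing the embeddings (an addition here, not stated in the original) makes the inner-product comparison behave like cosine similarity:

```python
import faiss
from sentence_transformers import SentenceTransformer

# Compact embedding model that runs comfortably on CPU
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode every chunk into a dense vector; normalizing makes the
# inner-product comparison equivalent to cosine similarity
chunk_embeddings = embedding_model.encode(
    document_chunks, show_progress_bar=True, normalize_embeddings=True
)

# Build a FAISS index that uses inner-product similarity
dimension = chunk_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(chunk_embeddings)

print(f"Indexed {index.ntotal} chunks of dimension {dimension}")
```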

6. Question Answering Setup

This code uses the transformers library to create a question-answering pipeline with a lightweight, CPU-friendly model called distilbert-base-cased-distilled-squad. The pipeline is designed to take a question and context as input and provide the best answer based on the context. By specifying the model and tokenizer, the code ensures efficient processing on devices without GPUs. This setup is ideal for extracting precise answers from given text or documents using natural language understanding.

Here is the code snippet:
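
A sketch consistent with the description above (the qa_model variable is just a convenience name used here):

```python
from transformers import pipeline

# Lightweight extractive QA model that works well without a GPU
qa_model = "distilbert-base-cased-distilled-squad"
qa_pipeline = pipeline("question-answering", model=qa_model, tokenizer=qa_model)
```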

7. Complete RAG Pipeline

This code defines a rag_pipeline() function that combines retrieval and answer generation for a given query. First, it encodes the query into an embedding using the embedding_model and searches the FAISS index to find k relevant text chunks (document_chunks) based on semantic similarity. These retrieved chunks are joined to form a context for answering the query. Then, the context and query are passed to the qa_pipeline, which generates an answer using a question-answering model.

The function also displays the retrieved context snippets and the generated answer, providing clear insights into the process. This setup is valuable for tasks like document-based question answering or search engines.

Here is the code snippet:
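
A sketch consistent with the description above, assuming the index, document_chunks, embedding_model, and qa_pipeline objects created in the previous steps:

```python
def rag_pipeline(query, k=3):
    """Retrieve the k most relevant chunks for the query and answer from them."""
    # Embed the query the same way the document chunks were embedded
    query_embedding = embedding_model.encode([query], normalize_embeddings=True)

    # Search the FAISS index for the k most similar chunks
    scores, indices = index.search(query_embedding, k)
    retrieved_chunks = [document_chunks[i] for i in indices[0]]

    # Join the retrieved chunks into a single context for the QA model
    context = " ".join(retrieved_chunks)

    print("Retrieved context snippets:")
    for chunk in retrieved_chunks:
        print(f"- {chunk[:150]}...")

    # Generate an extractive answer from the retrieved context
    result = qa_pipeline(question=query, context=context)
    print(f"\nAnswer: {result['answer']}")
    return result["answer"]
```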

8. Testing the System

This part of the code demonstrates how to use the RAG system to query your PDF document. The first example asks about the main topics in the document by calling rag_pipeline(query), which retrieves relevant chunks (default: 3) and generates an answer.

The second example requests a summary of key points and increases the retrieved chunks to 5 (k=5) to provide broader context for better summarization. Both queries show the retrieved context snippets followed by a generated answer, making it easy to understand where the response comes from. This is the interactive part where you test the system with your own questions.

Here is the code snippet:
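
A sketch of the two test queries described above (the exact query wording is illustrative):

```python
# Example 1: ask about the main topics (uses the default of 3 retrieved chunks)
rag_pipeline("What are the main topics covered in this document?")

# Example 2: ask for a summary with broader context (5 retrieved chunks)
rag_pipeline("Summarize the key points of this document.", k=5)
```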

Output

[Screenshot: retrieved context snippets and the generated answers for the test queries]

Key features of this implementation

  1. CPU-Compatible: Uses models that work well without GPU
  2. Efficient Retrieval: FAISS for fast similarity search
  3. Modular Design: Easy to replace components
  4. Memory Efficient: Processes documents in chunks
  5. Interactive: Clear output formatting

Troubleshooting Tips

If you encounter issues:

  1. Restart the runtime if you change GPU/CPU settings
  2. Reduce chunk size if memory errors occur
  3. Try alternative models if answer quality is unsatisfactory:
    • 'multi-qa-MiniLM-L6-cos-v1' for embeddings
    • 'bert-large-uncased-whole-word-masking-finetuned-squad' for QA (larger and slower than DistilBERT, but more accurate)

This implementation provides a complete, working RAG system that you can use with any PDF document in Google Colab without GPU requirements.


If you liked the tutorial, spread the word and share the link to this tutorial and our website Studyopedia with others.

