LangChain for RAG – Final Coding Example

For our example, we have implemented a local Retrieval-Augmented Generation (RAG) system for PDF documents. This is a comprehensive implementation that uses several key libraries to create a question-answering system based on the content of uploaded PDFs.

In the previous lesson, the code snippets were shown as individual stages of the LangChain process. We will now combine them into the complete code.

Here is what our code does:

  1. Setup and Dependencies: The code installs necessary packages including LangChain, PyPDF, sentence-transformers, and FAISS.
  2. Document Processing Pipeline:
    • PDF uploading via Google Colab’s interface
    • Document loading and parsing using PyPDFLoader
    • Text splitting with RecursiveCharacterTextSplitter to create manageable chunks
  3. Vector Database Creation:
    • Uses sentence-transformers (specifically “all-MiniLM-L6-v2”) to generate embeddings
    • Creates a FAISS vector store from the document chunks
  4. Local LLM Integration:
    • Sets up a Hugging Face model (the default is “google/flan-t5-base”). This model is publicly available on Hugging Face and can be loaded with the transformers library without an access token.
    • Creates a text generation pipeline for answering questions
  5. Question-Answering System:
    • Establishes a RetrievalQA chain that connects the vector store and LLM
    • When given a question, it retrieves relevant document chunks and generates an answer

The implementation is organized as a class (LocalRAGSystem) with methods for each step of the process, making it modular and easy to understand.

Step 1: Install the necessary packages

This code installs and imports necessary packages for a natural language processing (NLP) project.

  • It uses !pip install to install packages from the Python Package Index (PyPI).
  • The installed packages include langchain, sentence-transformers, and faiss-cpu.
  • The code then imports various modules and classes from these packages.
  • The imported modules enable functionalities such as text splitting, document loading, embeddings, and language modeling.

Here is the code snippet:
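
Since the original snippet appears as a screenshot, here is a minimal sketch of the install-and-import cell, assuming the classic langchain import paths (recent LangChain releases move several of these classes into langchain_community):

```python
# Install the required packages (Colab/Jupyter syntax)
!pip install langchain pypdf sentence-transformers faiss-cpu transformers

# Imports used throughout the example
import logging

from google.colab import files
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
```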

Step 2: Set up logging

The given code snippet is used to configure the logging module in Python. Here’s a breakdown of what each line does:

The line logging.basicConfig(level=logging.INFO) sets the basic configuration for the logging module.

  • logging.basicConfig(): This function is used to configure the logging module.
  • level=logging.INFO: This sets the logging level to INFO. This means that all log messages with a level of INFO or higher (WARNING, ERROR, CRITICAL) will be processed.

The line logger = logging.getLogger(__name__) creates a logger instance:

  • logging.getLogger(): This function returns a logger instance.
  • __name__: This is a built-in Python variable that holds the name of the current module. Passing __name__ to getLogger() creates a logger instance with a name that corresponds to the current module.

Here is the code snippet:
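
A sketch of the logging setup described above:

```python
# Process log records at INFO level and above (WARNING, ERROR, CRITICAL)
logging.basicConfig(level=logging.INFO)

# Create a logger named after the current module
logger = logging.getLogger(__name__)
```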

Step 3: Create a class representing a local RAG (Retrieval-Augmented Generation) system

Within this class, we will add all the functions needed to build the RAG system:
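
A sketch of the class skeleton; the methods from the following steps are added to its body:

```python
class LocalRAGSystem:
    """A local Retrieval-Augmented Generation (RAG) system for PDF documents."""
    # The methods shown in Steps 4 to 13 are added to this class body.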

Step 4: Initialize the LocalRAGSystem object

The following are the attributes:

  • documents (list): A list to store the loaded documents.
  • vector_store: A vector store to store the document embeddings.
  • embeddings: An embeddings model to generate embeddings for the documents.
  • llm: A local language model to generate answers.
  • qa_chain: A QA chain to retrieve answers from the vector store.

Here is the code snippet:
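
A sketch of the constructor, assuming the attribute names listed above (plus document_chunks, which Step 7 refers to):

```python
    # Inside the LocalRAGSystem class
    def __init__(self):
        self.documents = []        # loaded PDF pages
        self.document_chunks = []  # chunks produced by the splitter (Step 7)
        self.vector_store = None   # FAISS vector store (Step 9)
        self.embeddings = None     # embeddings model (Step 8)
        self.llm = None            # local language model (Step 10)
        self.qa_chain = None       # RetrievalQA chain (Step 11)
```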

Step 5: Upload PDFs from the local machine to Colab

This code defines a method upload_pdfs to upload PDF files from a local machine to Google Colab.

  • It uses the files.upload() function from Google Colab.
  • The uploaded file names are logged and returned.
  • Any exceptions during upload are caught, logged, and re-raised.
  • The method is part of a class and uses a logger for logging messages.
  • It returns a list of the uploaded PDF file paths.

Here is the code snippet:
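
A sketch of upload_pdfs, using Colab's files.upload() as described:

```python
    # Inside the LocalRAGSystem class
    def upload_pdfs(self):
        """Upload PDF files from the local machine to Colab."""
        try:
            uploaded = files.upload()  # opens Colab's file-upload widget
            pdf_paths = list(uploaded.keys())
            logger.info(f"Uploaded files: {pdf_paths}")
            return pdf_paths
        except Exception as e:
            logger.error(f"Error uploading PDFs: {e}")
            raise
```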

Step 6: Load and parse the PDF documents

This code defines a method load_documents to load and parse PDF documents from given file paths.

  • It iterates through each PDF file path, attempts to load the document using PyPDFLoader, and appends the loaded pages to the self.documents list.
  • Any exceptions during loading are caught, logged, and the method continues with the next file.
  • The number of loaded pages for each file and the total number of pages are logged.
  • The method updates the self.documents list with the loaded document pages.

The following is the code snippet:
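
A sketch of load_documents, loading each PDF page by page with PyPDFLoader:

```python
    # Inside the LocalRAGSystem class
    def load_documents(self, pdf_paths):
        """Load and parse the uploaded PDFs page by page."""
        for path in pdf_paths:
            try:
                pages = PyPDFLoader(path).load()
                self.documents.extend(pages)
                logger.info(f"Loaded {len(pages)} pages from {path}")
            except Exception as e:
                logger.error(f"Error loading {path}: {e}")
                continue
        logger.info(f"Total pages loaded: {len(self.documents)}")
```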

Step 7: Split documents into chunks

This code splits loaded documents into chunks using a RecursiveCharacterTextSplitter.

  • It takes optional parameters chunk_size (default 1000) and chunk_overlap (default 200).
  • The splitter divides the documents into chunks, which are stored in self.document_chunks.
  • The number of chunks created is logged.
  • Any exceptions during splitting are caught, logged, and re-raised.

Here is the code snippet:
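
A sketch of this step; the method name split_documents is an assumption, while the defaults match the text above:

```python
    # Inside the LocalRAGSystem class
    def split_documents(self, chunk_size=1000, chunk_overlap=200):
        """Split the loaded pages into overlapping chunks."""
        try:
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=chunk_size,
                chunk_overlap=chunk_overlap,
            )
            self.document_chunks = splitter.split_documents(self.documents)
            logger.info(f"Created {len(self.document_chunks)} chunks")
        except Exception as e:
            logger.error(f"Error splitting documents: {e}")
            raise
```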

Step 8: Set up the embedding model

This code sets up an embedding model using Hugging Face’s Transformers.

  • It takes an optional model_name parameter (defaulting to a specific MiniLM model).
  • The chosen model is used to create a HuggingFaceEmbeddings instance.
  • The setup process is logged, including the model name used.
  • Any exceptions during setup are caught, logged, and re-raised.

Here is the code snippet:
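
A sketch of the embedding setup; setup_embeddings is an assumed method name:

```python
    # Inside the LocalRAGSystem class
    def setup_embeddings(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        """Set up the sentence-transformers embedding model."""
        try:
            self.embeddings = HuggingFaceEmbeddings(model_name=model_name)
            logger.info(f"Embeddings model ready: {model_name}")
        except Exception as e:
            logger.error(f"Error setting up embeddings: {e}")
            raise
```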

Step 9: Create a vector store from the document chunks

This code creates a vector store from document chunks using the FAISS library.

  • It uses the previously set up embeddings model to generate vectors from the document chunks.
  • The FAISS.from_documents method creates the vector store.
  • The creation process is logged with a success message.
  • Any exceptions during creation are caught, logged, and re-raised.

Here is the code snippet:
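
A sketch of the vector-store creation; create_vector_store is an assumed method name:

```python
    # Inside the LocalRAGSystem class
    def create_vector_store(self):
        """Build a FAISS vector store from the document chunks."""
        try:
            self.vector_store = FAISS.from_documents(self.document_chunks, self.embeddings)
            logger.info("Vector store created successfully")
        except Exception as e:
            logger.error(f"Error creating vector store: {e}")
            raise
```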

Step 10: Set up a local LLM using Hugging Face

This code sets up a local Large Language Model (LLM) using Hugging Face’s Transformers.

  • It loads a pre-trained model and tokenizer using the specified model_id.
  • A text-to-text generation pipeline is created with the loaded model and tokenizer.
  • The pipeline is wrapped in a HuggingFacePipeline instance and stored in self.llm.
  • The setup process is logged, and any exceptions are caught, logged, and re-raised.

Here is the code snippet:
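
A sketch of the local LLM setup; setup_local_llm and the generation parameters (max_length, temperature=0.7) are assumptions, though the temperature value matches the warning discussed in Note 1 below:

```python
    # Inside the LocalRAGSystem class
    def setup_local_llm(self, model_id="google/flan-t5-base"):
        """Load a local seq2seq model and wrap it as a LangChain LLM."""
        try:
            tokenizer = AutoTokenizer.from_pretrained(model_id)
            model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
            pipe = pipeline(
                "text2text-generation",
                model=model,
                tokenizer=tokenizer,
                max_length=512,
                temperature=0.7,
            )
            self.llm = HuggingFacePipeline(pipeline=pipe)
            logger.info(f"Local LLM ready: {model_id}")
        except Exception as e:
            logger.error(f"Error setting up LLM: {e}")
            raise
```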

Step 11: Create a QA chain using the vector store and LLM

This code sets up a Question Answering (QA) chain using the vector store and Large Language Model (LLM).

  • It creates a RetrievalQA instance with the specified LLM and vector store as retriever.
  • The retriever is configured to return the top k results (defaulting to 3).
  • The QA chain setup process is logged with the value of k.
  • Any exceptions during setup are caught, logged, and re-raised.

Here is the code snippet:
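
A sketch of the QA-chain setup using RetrievalQA.from_chain_type; the "stuff" chain type is an assumption:

```python
    # Inside the LocalRAGSystem class
    def setup_qa_chain(self, k=3):
        """Connect the vector store and the LLM in a RetrievalQA chain."""
        try:
            retriever = self.vector_store.as_retriever(search_kwargs={"k": k})
            self.qa_chain = RetrievalQA.from_chain_type(
                llm=self.llm,
                chain_type="stuff",
                retriever=retriever,
            )
            logger.info(f"QA chain ready (k={k})")
        except Exception as e:
            logger.error(f"Error setting up QA chain: {e}")
            raise
```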

Step 12: Answer a question using the RAG system

This code defines a method to answer a question using the Retrieval-Augmented Generation (RAG) system.

  • It takes a question as input and uses the qa_chain to generate an answer.
  • The question and answer are logged for tracking purposes.
  • The answer is returned by the method.
  • Any exceptions during the answering process are caught, logged, and re-raised.

Here is the code snippet:
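
A sketch of the answering method; answer_question is an assumed name:

```python
    # Inside the LocalRAGSystem class
    def answer_question(self, question):
        """Answer a question using the RAG pipeline."""
        try:
            logger.info(f"Question: {question}")
            answer = self.qa_chain.run(question)
            logger.info(f"Answer: {answer}")
            return answer
        except Exception as e:
            logger.error(f"Error answering question: {e}")
            raise
```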

Step 13: Run the complete setup process

This code defines a method run_setup to execute the complete setup process for the Retrieval-Augmented Generation (RAG) system.

  • It calls various methods in sequence to upload PDFs, load documents, split documents, set up embeddings, create a vector store, set up a local LLM, and set up a QA chain.
  • The method takes optional parameters to customize the setup process.
  • The completion of the setup process is logged.
  • Any exceptions during setup are caught, logged, and re-raised.

Here is the code snippet:
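
A sketch of run_setup, wiring the previous methods together; the parameter names beyond chunk_size and chunk_overlap are assumptions:

```python
    # Inside the LocalRAGSystem class
    def run_setup(self, chunk_size=1000, chunk_overlap=200,
                  embedding_model="sentence-transformers/all-MiniLM-L6-v2",
                  llm_model="google/flan-t5-base", k=3):
        """Run the complete setup process end to end."""
        try:
            pdf_paths = self.upload_pdfs()
            self.load_documents(pdf_paths)
            self.split_documents(chunk_size, chunk_overlap)
            self.setup_embeddings(embedding_model)
            self.create_vector_store()
            self.setup_local_llm(llm_model)
            self.setup_qa_chain(k)
            logger.info("Setup complete")
        except Exception as e:
            logger.error(f"Error during setup: {e}")
            raise
```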

Step 14: Example Usage

This code creates an instance of the LocalRAGSystem class and runs its setup process.

  • The setup process configures the system with specific parameters, such as chunk size and language model.
  • After setup, the code asks two questions to the system and prints the answers.
  • The questions demonstrate the system’s ability to understand the main topic and summarize key points from the documents.
  • The answers are generated using the Retrieval-Augmented Generation (RAG) system.

Here is the code snippet:
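
A sketch of the example usage; the exact question strings are illustrative:

```python
# Create the system and run the full setup (the upload widget opens here)
rag = LocalRAGSystem()
rag.run_setup(chunk_size=1000, chunk_overlap=200, llm_model="google/flan-t5-base")

# Ask questions against the uploaded PDFs and print the answers
print(rag.answer_question("What is the main topic of the documents?"))
print(rag.answer_question("Summarize the key points of the documents."))
```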

Let us run the above code on Google Colab. After running it, the following upload button will be visible:

[Screenshot: LangChain with RAG - upload files]

Upload the PDFs you want to use. We uploaded two PDFs from our local system:

[Screenshot: LangChain with RAG - PDF files uploaded]

Here, it is showing us the answer to the two prompts we mentioned in the code:

[Screenshot: LangChain with RAG - output with prompt responses]

Note 1: It is showing us the following warning:

do_sample is set to False. However, temperature is set to 0.7 — this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.

Here is what we can do:

  • Use do_sample=True if you want the model to generate diverse and creative responses. This is useful for tasks like creative writing or open-ended question answering.
  • Remove temperature (and keep do_sample=False) if you want deterministic and reproducible responses. This is useful for tasks where consistency is important, such as factual question answering. Both options are sketched below.
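
A sketch of both options for the pipeline call from Step 10 (the parameter values are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Option 1: enable sampling so temperature actually takes effect (more varied output)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer,
                max_length=512, do_sample=True, temperature=0.7)

# Option 2: keep deterministic greedy decoding and drop temperature entirely
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer,
                max_length=512)
```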

Note 2: We did not create an access token above, since we are only accessing public models in this basic example.

As noted above, the secret HF_TOKEN does not exist in our Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as a secret in your Google Colab, and restart your session.

Authentication is recommended but still optional to access public models or datasets.

Let us understand this more.

When a Hugging Face token is required

Hugging Face access tokens are only required in specific cases:

  • When accessing private or gated models
  • When using the Inference API for remote model hosting
  • When downloading models with usage restrictions
  • When using some enterprise features

For this local RAG implementation, all processing happens on your machine after downloading the models once, with no need to authenticate with Hugging Face’s servers for continued use. You don’t need a Hugging Face access token in the above implementation because it’s using publicly available models and resources that don’t require authentication.

When the code loads models with AutoTokenizer.from_pretrained() and AutoModelForSeq2SeqLM.from_pretrained(), it’s accessing:

  1. The “google/flan-t5-base” model for text generation
  2. The “sentence-transformers/all-MiniLM-L6-v2” model for embeddings

Both of these are open-source models that Hugging Face hosts for public use without authentication. They’re downloaded automatically when you run the code.

Key points about the above code

1. Error Handling: Error handling is added for common issues, such as:

  • Failed PDF uploads.
  • Incorrect model names or paths.
  • Issues with document splitting or embedding generation.

2. Dynamic Chunk Size:

  • Allows the user to specify chunk_size and chunk_overlap dynamically when calling run_setup. This makes the system more flexible for different types of documents.

3. Customizable Search Parameters:

  • Allows the user to customize the search_kwargs (e.g., the k value) in the setup_qa_chain method. This can improve the relevance of retrieved documents.

If you liked the tutorial, spread the word and share the link and our website Studyopedia with others.


Read More:

LangChain with RAG - Process