Introduction
In the age of artificial intelligence, chatbots are becoming essential tools for automating tasks and providing quick access to information. One such innovative solution is the Local PDF RAG Chatbot, which uses Retrieval-Augmented Generation (RAG) to answer questions based on the content of PDFs.
In this step-by-step tutorial, you will learn how to build a Local PDF RAG Chatbot using Streamlit, Ollama, FAISS, and PyMuPDF. By the end of this guide, you’ll have a fully functional chatbot capable of answering questions by extracting and analyzing text from PDF files, all running locally on your system.
What You'll Learn in This Guide
- How to set up a Python virtual environment and install the required libraries.
- How to run Ollama locally and pull the llama3.2 and nomic-embed-text models.
- How to extract text from PDFs, split it into overlapping chunks, embed it, and index it with FAISS.
- How to cache the chunks and index so a document isn't reprocessed on every run.
- How to build a Streamlit chat interface that streams answers from the LLM.
Prerequisites
Before you begin, make sure you have the following:
- Python 3.x installed on your system.
- Basic knowledge of Python and Streamlit.
- A running instance of Ollama (for generating embeddings).
Project Structure: Organizing Your Files
Here's how your project should be structured:
rag-chatbot/
│
├── app.py            # Main Streamlit app
├── rag_utils.py      # Handles PDF text splitting, embedding, and search
├── cache_utils.py    # Manages FAISS index and chunk caching
├── docs/             # Stores uploaded PDF files
└── .cache_xxx.pkl    # Cached embeddings per uploaded file
Setting Up Your Development Environment
Create and Activate a Python Virtual Environment
To begin building your Local PDF RAG Chatbot, first create and activate a virtual environment. This isolates the project's dependencies:
# Create a virtual environment
python3 -m venv rag-chatbot-env
# Activate the virtual environment
source rag-chatbot-env/bin/activate
When activated, you'll see the environment name appear in your terminal prompt.
Install the Required Python Libraries
Once the environment is ready, install the necessary libraries:
pip install pymupdf faiss-cpu streamlit requests numpy
- PyMuPDF: For extracting text from PDFs.
- FAISS: For efficient similarity search and indexing of embeddings.
- Streamlit: To create the interactive app for your Local PDF RAG Chatbot.
- Requests and Numpy: For handling API requests and numerical operations.
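If you want to confirm that everything installed correctly, a quick import check does the job. This is a small optional sketch (the filename check_install.py is only a suggestion):

# check_install.py - verify that the required libraries import correctly
import faiss
import fitz  # PyMuPDF is imported under the name "fitz"
import numpy
import requests
import streamlit

print("All required libraries imported successfully.")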
Install and Run Ollama
Ollama lets you run LLMs and embedding models on your own machine.
Check out our detailed guide to learn how to set up and install Ollama locally.
Pull the required models:
ollama pull llama3.2:latest
ollama pull nomic-embed-text:latest
Make sure the Ollama service is running:
ollama serve
By default, it runs on http://localhost:11434.
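Before moving on, you can confirm that the Ollama API is reachable and that the embedding model responds. Here is a minimal optional check, assuming the default port and the nomic-embed-text model pulled above:

# check_ollama.py - quick sanity check against the local Ollama API
import requests

response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
    timeout=60,
)
response.raise_for_status()
embedding = response.json()["embedding"]
print(f"Ollama is reachable; embedding dimension: {len(embedding)}")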
How the Local PDF RAG Chatbot Works
Here's a simplified breakdown of how your chatbot operates:
- PDF Upload → You upload a document using the UI.
- Text Splitting → The PDF is split into smaller overlapping chunks (see the short sketch after this list).
- Embedding → Each chunk is embedded using nomic-embed-text via Ollama.
- Indexing → All chunk embeddings are stored in FAISS for similarity search.
- Querying → When you ask a question, the top matching chunks are retrieved.
- Response → These chunks plus your question are sent to an LLM (like llama3.2) to get a smart, contextual answer.
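To make the overlap in the Text Splitting step concrete, here is a tiny sketch of how the sliding window advances; the 1,000-word document length is just an example:

# Illustration of the overlapping word windows used for text splitting:
# with chunk_size=300 and chunk_overlap=50, the window advances by 250 words.
chunk_size, chunk_overlap = 300, 50
total_words = 1000  # hypothetical document length

start = 0
while start < total_words:
    end = min(start + chunk_size, total_words)
    print(f"chunk covers words {start}-{end - 1}")
    start += chunk_size - chunk_overlap
# Output: words 0-299, 250-549, 500-799, 750-999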
Step-by-Step Guide to Building Your Local PDF RAG Chatbot
1. Create the Utility Files
Let's start by creating the utility files that will manage PDF processing, embedding, and search functionality.
rag_utils.py
This file handles the logic for extracting text from PDFs, splitting it, embedding it with Ollama, and searching through it using FAISS.
# rag_utils.py
import fitz  # PyMuPDF
import requests
import numpy as np
import faiss


def load_pdf_chunks(filepath, chunk_size=300, chunk_overlap=50):
    """Extracts all text from a PDF and splits it into overlapping chunks."""
    try:
        doc = fitz.open(filepath)
        full_text = "\n".join(page.get_text() for page in doc)
        return split_text(full_text, chunk_size, chunk_overlap)
    except Exception as e:
        raise RuntimeError(f"Error reading or splitting PDF: {e}")


def split_text(text, chunk_size=300, chunk_overlap=50):
    """Splits the input text into overlapping chunks of words."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = " ".join(words[start:end])
        if chunk.strip():
            chunks.append(chunk)
        start += chunk_size - chunk_overlap  # Move the window forward
    return chunks


def embed_with_ollama(chunks):
    """Embeds each chunk with the nomic-embed-text model served by Ollama."""
    try:
        embeddings = []
        for i, chunk in enumerate(chunks):
            if not chunk.strip() or len(chunk) > 1000:
                continue  # Skip empty or oversized chunks
            print(f"Embedding chunk {i + 1}/{len(chunks)}")
            response = requests.post(
                "http://localhost:11434/api/embeddings",
                json={"model": "nomic-embed-text", "prompt": chunk}
            )
            if response.status_code != 200:
                print("Embedding error:", response.text)
                response.raise_for_status()
            data = response.json()
            embeddings.append(data["embedding"])
        # FAISS expects float32 vectors
        return np.array(embeddings, dtype="float32")
    except Exception as e:
        raise RuntimeError(f"Embedding failed: {e}")


def build_faiss_index(embeddings):
    """Builds a flat L2 FAISS index from the embedding matrix."""
    try:
        dimension = embeddings.shape[1]
        index = faiss.IndexFlatL2(dimension)
        index.add(embeddings)
        return index
    except Exception as e:
        raise RuntimeError(f"FAISS index creation failed: {e}")


def get_top_chunks(query, chunks, index, k=4):
    """Embeds the query and returns the k most similar chunks."""
    try:
        query_embedding = embed_with_ollama([query])[0]
        _, I = index.search(np.array([query_embedding], dtype="float32"), k)
        return [chunks[i] for i in I[0]]
    except Exception as e:
        raise RuntimeError(f"Search failed: {e}")
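As a quick sanity check of these helpers (optional, and assuming Ollama is running locally as set up earlier), you can exercise the embedding and search functions on a few short sample strings without any PDF:

# try_rag_utils.py - small standalone example of the retrieval helpers
from rag_utils import embed_with_ollama, build_faiss_index, get_top_chunks

chunks = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "PyMuPDF extracts text from PDF documents.",
    "Streamlit turns Python scripts into interactive web apps.",
]

embeddings = embed_with_ollama(chunks)
index = build_faiss_index(embeddings)
top = get_top_chunks("Which library handles similarity search?", chunks, index, k=1)
print(top[0])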
cache_utils.py
This file is responsible for saving and loading the FAISS index and PDF chunks to avoid reprocessing every time the app is run.
# cache_utils.py
import os
import pickle

import faiss


def get_cache_path(filepath):
    base = os.path.basename(filepath)
    return f".cache_{base}.pkl"


def save_index(chunks, index, filepath):
    try:
        cache_path = get_cache_path(filepath)
        # Serialize the FAISS index to a plain byte array first; raw index
        # objects are SWIG wrappers and may not pickle directly.
        serialized_index = faiss.serialize_index(index)
        with open(cache_path, "wb") as f:
            pickle.dump((chunks, serialized_index), f)
    except Exception as e:
        raise RuntimeError(f"Error saving cache: {e}")


def load_index(filepath):
    try:
        cache_path = get_cache_path(filepath)
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                chunks, serialized_index = pickle.load(f)
            return chunks, faiss.deserialize_index(serialized_index)
        return None, None
    except Exception as e:
        raise RuntimeError(f"Error loading cache: {e}")
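Before wiring up the UI, you can smoke-test both utility files from the command line. The sketch below assumes a hypothetical sample PDF at docs/sample.pdf and a running Ollama instance:

# smoke_test.py - optional end-to-end check of the utilities without Streamlit
from rag_utils import load_pdf_chunks, embed_with_ollama, build_faiss_index, get_top_chunks
from cache_utils import save_index, load_index

filepath = "docs/sample.pdf"  # hypothetical sample document

chunks, index = load_index(filepath)
if chunks is None or index is None:
    chunks = load_pdf_chunks(filepath)
    embeddings = embed_with_ollama(chunks)
    index = build_faiss_index(embeddings)
    save_index(chunks, index, filepath)

print(f"{len(chunks)} chunks indexed")
print(get_top_chunks("What is this document about?", chunks, index, k=2))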
2. Create the Main Streamlit App
Now, let's create the main application that allows users to interact with the Local PDF RAG Chatbot.
# app.py
import streamlit as st
import tempfile
import requests
import json

from rag_utils import load_pdf_chunks, embed_with_ollama, build_faiss_index, get_top_chunks
from cache_utils import save_index, load_index

st.set_page_config(page_title="RAG Chatbot", layout="centered")
st.title("Chat with your PDF (Ollama + FAISS)")

# Initialize session state
if "history" not in st.session_state:
    st.session_state.history = []
if "rag_chunks" not in st.session_state:
    st.session_state.rag_chunks = None
if "index" not in st.session_state:
    st.session_state.index = None

# Upload and process PDF
uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])

if uploaded_file is not None:
    try:
        # Write the upload to a temporary .pdf file so PyMuPDF can open it
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(uploaded_file.read())
            filepath = tmp.name

        chunks, index = load_index(filepath)
        if chunks is None or index is None:
            with st.spinner("Processing PDF..."):
                chunks = load_pdf_chunks(filepath)
                embeddings = embed_with_ollama(chunks)
                index = build_faiss_index(embeddings)
                save_index(chunks, index, filepath)

        st.session_state.rag_chunks = chunks
        st.session_state.index = index
        st.success("PDF is ready for chatting!")
    except Exception as e:
        st.error(f"Error while processing the PDF: {e}")

# Display full chat history
for msg in st.session_state.history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Chat input
if st.session_state.rag_chunks and st.session_state.index:
    query = st.chat_input("Ask a question about the PDF")
    if query:
        st.session_state.history.append({"role": "user", "content": query})
        with st.chat_message("user"):
            st.markdown(query)

        try:
            top_chunks = get_top_chunks(query, st.session_state.rag_chunks, st.session_state.index)
            system_prompt = (
                "You are an assistant answering questions based on the following PDF content:\n\n"
                + "\n\n".join(top_chunks)
            )

            full_response = ""
            with st.chat_message("assistant"):
                chat_box = st.empty()
                response = requests.post(
                    "http://localhost:11434/api/chat",
                    json={
                        "model": "llama3.2",
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": query}
                        ],
                        "stream": True
                    },
                    stream=True
                )

                # Ollama streams one JSON object per line; append tokens as they arrive
                for line in response.iter_lines():
                    if line:
                        try:
                            data = json.loads(line.decode("utf-8"))
                            token = data.get("message", {}).get("content", "")
                            full_response += token
                            chat_box.markdown(full_response + "▌")
                        except Exception as stream_error:
                            st.error(f"Stream error: {stream_error}")

                chat_box.markdown(full_response)

            st.session_state.history.append({"role": "assistant", "content": full_response})
        except Exception as e:
            st.error(f"Error generating response: {e}")
Running Your Local PDF RAG Chatbot
Here's how to run everything together:
Start the Ollama model in one terminal:
ollama run llama3.2
In another terminal, run your Streamlit app:
streamlit run app.py
Visit http://localhost:8501 in your browser, upload a PDF, and start chatting!
Conclusion
You've successfully built a Local PDF RAG Chatbot using Streamlit, Ollama, FAISS, and PyMuPDF. This RAG chatbot can process uploaded PDFs, extract relevant content, and provide intelligent responses to user queries, all in a fully local environment.
This tutorial has provided a detailed, step-by-step guide to creating a Local PDF RAG Chatbot, which can be a powerful tool for various use cases, such as document analysis, automated support systems, and content search.
For more insightful tutorials, visit our Tech Blogs and explore the latest in Laravel, AI, and Vue.js development!