
Build an Amazing Local PDF RAG Chatbot with Streamlit and Ollama

Introduction

In the age of artificial intelligence, chatbots are becoming essential tools for automating tasks and providing quick access to information. One such innovative solution is the Local PDF RAG Chatbot, which uses Retrieval-Augmented Generation (RAG) to answer questions based on the content of PDFs.

In this step-by-step tutorial, you will learn how to build a Local PDF RAG Chatbot using Streamlit, Ollama, FAISS, and PyMuPDF. By the end of this guide, you’ll have a fully functional chatbot capable of answering questions by extracting and analyzing text from PDF files, all running locally on your system.

What You’ll Learn in This Guide

  • How to create a Local PDF RAG Chatbot with Streamlit for interactive chatting.
  • How to extract text from PDF files with PyMuPDF.
  • How to generate embeddings with Ollama and index them with FAISS for fast search and retrieval.
  • How to set up a local environment to process and interact with PDFs through a chatbot.

Prerequisites

Before you begin, make sure you have the following:

  • Python 3.x installed on your system.
  • Basic knowledge of Python and Streamlit.
  • A running instance of Ollama (for generating embeddings).

Project Structure: Organizing Your Files

Here’s how your project should be structured:


πŸ“ rag-chatbot/
β”‚
β”œβ”€β”€ app.py                # Main Streamlit app
β”œβ”€β”€ rag_utils.py          # Handles PDF text splitting, embedding, and search
β”œβ”€β”€ cache_utils.py        # Manages FAISS index and chunk caching
β”œβ”€β”€ docs/                 # Stores uploaded PDF files
└── .cache_xxx.pkl        # Cached embeddings per uploaded file
    

Setting Up Your Development Environment

Create and Activate a Python Virtual Environment

To begin building your Local PDF RAG Chatbot, first create and activate a virtual environment. This keeps the project’s dependencies isolated:

# Create a virtual environment
python3 -m venv rag-chatbot-env
# Activate the virtual environment
source rag-chatbot-env/bin/activate

When activated, you’ll see the environment name appear in your terminal prompt.
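
If you’re on Windows instead, the activation command uses the Scripts folder:

# Activate the virtual environment (Windows)
rag-chatbot-env\Scripts\activate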

Install the Required Python Libraries

Once the environment is ready, install the necessary libraries:

pip install pymupdf faiss-cpu streamlit requests numpy
  • PyMuPDF: For extracting text from PDFs.
  • FAISS: For efficient similarity search and indexing of embeddings.
  • Streamlit: To create the interactive app for your Local PDF RAG Chatbot.
  • Requests and Numpy: For handling API requests and numerical operations.
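
To confirm the installation worked, you can optionally run a quick one-line import check; it should print a message and exit without errors:

python -c "import fitz, faiss, streamlit, requests, numpy; print('All libraries available')"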

Install and Run Ollama

Ollama lets you run LLMs and embedding models on your own machine.

Check out our detailed guide to learn how to set up and install Ollama locally.

Pull the required models:

ollama pull llama3.2:latest
ollama pull nomic-embed-text:latest

Make sure the Ollama service is running:

ollama serve

By default, it runs on http://localhost:11434.
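
Before wiring up the app, you can optionally verify that the embedding model responds. This small throwaway script (not part of the project files) calls the same /api/embeddings endpoint the chatbot will use:

# check_ollama.py - optional sanity check for the local embeddings API
import requests

response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Hello, RAG!"}
)
response.raise_for_status()

embedding = response.json()["embedding"]
print(f"Got an embedding with {len(embedding)} dimensions")  # typically 768 for nomic-embed-text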

How the Local PDF RAG Chatbot Works

Here’s a simplified breakdown of how your chatbot operates (a minimal end-to-end sketch follows the list):

  • PDF Upload – You upload a document through the UI.
  • Text Splitting – The extracted text is split into smaller overlapping chunks.
  • Embedding – Each chunk is embedded using nomic-embed-text via Ollama.
  • Indexing – The chunk embeddings are stored in a FAISS index for similarity search.
  • Querying – When you ask a question, the top matching chunks are retrieved.
  • Response – Those chunks plus your question are sent to an LLM (llama3.2) to produce a contextual answer.
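
Putting these steps together, here is a minimal end-to-end sketch of the pipeline without Streamlit. It relies on the rag_utils.py functions you’ll write in the next section and a hypothetical sample.pdf, so treat it as a reference for the flow rather than something to run right away:

# pipeline_demo.py - end-to-end sketch (assumes rag_utils.py and a sample.pdf)
import requests

from rag_utils import load_pdf_chunks, embed_with_ollama, build_faiss_index, get_top_chunks

chunks = load_pdf_chunks("sample.pdf")                # upload + text splitting
index = build_faiss_index(embed_with_ollama(chunks))  # embedding + indexing

question = "What is this document about?"
top_chunks = get_top_chunks(question, chunks, index)  # querying

# Response: send the retrieved chunks plus the question to llama3.2
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [
            {"role": "system", "content": "Answer using this PDF content:\n\n" + "\n\n".join(top_chunks)},
            {"role": "user", "content": question},
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])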

Step-by-Step Guide to Building Your Local PDF RAG Chatbot

1. Create the Utility Files

Let’s start by creating the utility files that will manage PDF processing, embedding, and search functionality.

rag_utils.py

This file handles the logic for extracting text from PDFs, splitting it, embedding it with Ollama, and searching through it using FAISS.

# rag_utils.py

import fitz  # PyMuPDF
import requests
import numpy as np
import faiss

def load_pdf_chunks(filepath, chunk_size=300, chunk_overlap=50):
    try:
        # Extract all text from the PDF, then split it into overlapping chunks
        with fitz.open(filepath) as doc:
            full_text = "\n".join(page.get_text() for page in doc)
        return split_text(full_text, chunk_size, chunk_overlap)
    except Exception as e:
        raise RuntimeError(f"Error reading or splitting PDF: {e}")

def split_text(text, chunk_size=300, chunk_overlap=50):
    """
    Splits the input text into overlapping chunks.
    """
    words = text.split()
    chunks = []
    start = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = " ".join(words[start:end])
        if chunk.strip():
            chunks.append(chunk)
        start += chunk_size - chunk_overlap  # Move window forward

    return chunks

def embed_with_ollama(chunks):
    try:
        embeddings = []

        # Embed every chunk so that embedding i always corresponds to chunks[i];
        # skipping chunks here would misalign the FAISS index with the chunk list
        for i, chunk in enumerate(chunks):
            print(f"Embedding chunk {i+1}/{len(chunks)}")

            response = requests.post(
                "http://localhost:11434/api/embeddings",
                json={"model": "nomic-embed-text", "prompt": chunk}
            )

            if response.status_code != 200:
                print("Embedding error:", response.text)
                response.raise_for_status()

            data = response.json()
            embeddings.append(data["embedding"])

        # FAISS expects float32 vectors
        return np.array(embeddings, dtype="float32")

    except Exception as e:
        raise RuntimeError(f"Embedding failed: {e}")


def build_faiss_index(embeddings):
    try:
        dimension = embeddings.shape[1]
        index = faiss.IndexFlatL2(dimension)  # exact L2 (Euclidean) search
        index.add(embeddings)
        return index
    except Exception as e:
        raise RuntimeError(f"FAISS index creation failed: {e}")

def get_top_chunks(query, chunks, index, k=4):
    try:
        # Embed the query and return the k most similar chunks
        query_embedding = embed_with_ollama([query])[0]
        _, I = index.search(np.array([query_embedding]), k)
        return [chunks[i] for i in I[0]]
    except Exception as e:
        raise RuntimeError(f"Search failed: {e}")
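
To see how the overlapping windows behave, here is a quick illustration of split_text with a tiny chunk size (the numbers are chosen only for this example; the defaults are 300 and 50):

from rag_utils import split_text

words = " ".join(str(i) for i in range(1, 13))  # "1 2 3 ... 12"
for chunk in split_text(words, chunk_size=5, chunk_overlap=2):
    print(chunk)

# Output:
# 1 2 3 4 5
# 4 5 6 7 8
# 7 8 9 10 11
# 10 11 12

Each window starts chunk_size - chunk_overlap (here 3) words after the previous one, so the last two words of every chunk reappear at the start of the next. This overlap helps sentences that straddle a chunk boundary stay retrievable.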

    
cache_utils.py

This file is responsible for saving and loading the FAISS index and PDF chunks to avoid reprocessing every time the app is run.

# cache_utils.py

import os
import pickle

import faiss

def get_cache_path(filepath):
    base = os.path.basename(filepath)
    return f".cache_{base}.pkl"

def save_index(chunks, index, filepath):
    try:
        cache_path = get_cache_path(filepath)
        with open(cache_path, "wb") as f:
            # FAISS index objects can't be pickled directly,
            # so serialize the index to a byte array first
            pickle.dump((chunks, faiss.serialize_index(index)), f)
    except Exception as e:
        raise RuntimeError(f"Error saving cache: {e}")

def load_index(filepath):
    try:
        cache_path = get_cache_path(filepath)
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                chunks, index_bytes = pickle.load(f)
                return chunks, faiss.deserialize_index(index_bytes)
        return None, None
    except Exception as e:
        raise RuntimeError(f"Error loading cache: {e}")
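
The usage pattern is the same one app.py follows below: try to load the cache first, and only rebuild the index on a miss. A small sketch, assuming a sample.pdf next to the project files:

from rag_utils import load_pdf_chunks, embed_with_ollama, build_faiss_index
from cache_utils import save_index, load_index

chunks, index = load_index("sample.pdf")   # returns (None, None) on a cold start
if chunks is None or index is None:
    chunks = load_pdf_chunks("sample.pdf")
    index = build_faiss_index(embed_with_ollama(chunks))
    save_index(chunks, index, "sample.pdf")  # writes .cache_sample.pdf.pkl

print(f"{len(chunks)} chunks ready for retrieval")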

    

2. Create the Main Streamlit App

Now, let’s create the main application that allows users to interact with the Local PDF RAG Chatbot.

# app.py
import streamlit as st
import tempfile
import requests
import json

from rag_utils import load_pdf_chunks, embed_with_ollama, build_faiss_index, get_top_chunks
from cache_utils import save_index, load_index

st.set_page_config(page_title="📚 RAG Chatbot", layout="centered")
st.title("📚 Chat with your PDF (Ollama + FAISS)")

# Initialize session state
if "history" not in st.session_state:
    st.session_state.history = []

if "rag_chunks" not in st.session_state:
    st.session_state.rag_chunks = None
if "index" not in st.session_state:
    st.session_state.index = None

# Upload and process PDF
uploaded_file = st.file_uploader("📄 Upload a PDF", type=["pdf"])

if uploaded_file is not None:
    try:
        # Write the upload to a temp file so PyMuPDF can open it from disk
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(uploaded_file.read())
            filepath = tmp.name

        # Key the cache on the uploaded file's original name (the temp file
        # gets a random name on every rerun, which would defeat the cache)
        chunks, index = load_index(uploaded_file.name)

        if chunks is None or index is None:
            with st.spinner("🔍 Processing PDF..."):
                chunks = load_pdf_chunks(filepath)
                embeddings = embed_with_ollama(chunks)
                index = build_faiss_index(embeddings)
                save_index(chunks, index, uploaded_file.name)

        st.session_state.rag_chunks = chunks
        st.session_state.index = index
        st.success("✅ PDF is ready for chatting!")

    except Exception as e:
        st.error(f"Error while processing the PDF: {e}")

# Display full chat history
for msg in st.session_state.history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Chat input
if st.session_state.rag_chunks and st.session_state.index:
    query = st.chat_input("💬 Ask a question about the PDF")
    if query:
        st.session_state.history.append({"role": "user", "content": query})
        with st.chat_message("user"):
            st.markdown(query)

        try:
            top_chunks = get_top_chunks(query, st.session_state.rag_chunks, st.session_state.index)
            system_prompt = (
                "You are an assistant answering questions based on the following PDF content:\n\n"
                + "\n\n".join(top_chunks)
            )

            full_response = ""
            with st.chat_message("assistant"):
                chat_box = st.empty()

                response = requests.post(
                    "http://localhost:11434/api/chat",
                    json={
                        "model": "llama3.2",
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": query}
                        ],
                        "stream": True
                    },
                    stream=True
                )

                for line in response.iter_lines():
                    if line:
                        try:
                            data = json.loads(line.decode("utf-8"))
                            token = data.get("message", {}).get("content", "")
                            full_response += token
                            chat_box.markdown(full_response + "▌")
                        except Exception as stream_error:
                            st.error(f"Stream error: {stream_error}")
                chat_box.markdown(full_response)

            st.session_state.history.append({"role": "assistant", "content": full_response})

        except Exception as e:
            st.error(f"Error generating response: {e}")

    

Running Your Local PDF RAG Chatbot

Here’s how to run everything together:

Make sure the Ollama service is running (the ollama serve command from the setup step). You can optionally warm up the chat model first:

ollama run llama3.2

Run your Streamlit app:

streamlit run app.py

Visit http://localhost:8501 in your browser, upload a PDF, and start chatting!

Conclusion

You’ve successfully built a Local PDF RAG Chatbot using Streamlit, Ollama, FAISS, and PyMuPDF. The chatbot can process uploaded PDFs, extract the relevant content, and provide intelligent answers to user queries, all in a fully local environment.

This tutorial has provided a detailed, step-by-step guide to creating a Local PDF RAG Chatbot, which can be a powerful tool for various use cases, such as document analysis, automated support systems, and content search.

For more insightful tutorials, visit our Tech Blogs and explore the latest in Laravel, AI, and Vue.js development!
