Build AI PDF Chatbot Using Python FastAPI | RAG Chatbot with Local LLM (No OpenAI API)

AI Prompt for Building the PDF AI Chatbot System

In this tutorial, we build a fully self-hosted AI chatbot that answers questions from uploaded PDF documents. The entire backend can be generated with an AI coding assistant: copy the prompt below and paste it into any AI coding tool to produce the complete project.


You are a senior AI engineer and Python backend architect.

Your task is to build a fully custom AI chatbot system using Python and FastAPI that can answer questions from uploaded PDF documents.

Important Requirements

* Do NOT use any paid APIs (OpenAI, Claude, Gemini, etc.)
* The system must be completely self-hosted and open-source
* Use local embedding models and vector databases
* The chatbot must retrieve answers from uploaded PDFs

Technology Stack

* Backend Framework: FastAPI
* Language: Python
* Vector Database: FAISS or ChromaDB
* Embedding Model: sentence-transformers (local)
* LLM: Local LLM (Llama / Mistral / GPT4All / Ollama supported models)
* PDF Processing: PyPDF / pdfplumber
* Text Chunking: RecursiveCharacterTextSplitter
* Architecture: RAG (Retrieval-Augmented Generation)

System Architecture

1. User uploads multiple PDF files

2. PDFs are parsed and text is extracted

3. Extracted text is split into smaller chunks

4. Each chunk is converted into embeddings

5. Embeddings are stored in a vector database

6. When a user asks a question:

   * Convert the question into an embedding
   * Retrieve similar chunks from the vector database
   * Send retrieved context to the local LLM
   * LLM generates the final answer
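The query-time steps above can be sketched in plain Python. The `embed` function below is a toy character-frequency stand-in for a real sentence-transformers model, and the chunk texts are invented examples:

```python
import math

def embed(text):
    # Toy embedding: character-frequency vector over a-z.
    # A real system would call a sentence-transformers model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, top_k=2):
    # Rank stored chunks by similarity to the question embedding.
    q_vec = embed(question)
    scored = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return scored[:top_k]

chunks = [
    "Invoices must be paid within 30 days.",
    "The warranty covers manufacturing defects only.",
    "Refunds are issued to the original payment method.",
]
print(retrieve("How long do I have to pay an invoice?", chunks, top_k=1))
```

In the real pipeline, FAISS or ChromaDB replaces the brute-force `sorted` scan, and the top chunks are passed to the LLM as context rather than returned directly.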

Project Structure

project-root/
    app/
        main.py
        api/
            upload.py
            chat.py
        services/
            pdf_processor.py
            embedding_service.py
            vector_store.py
            rag_pipeline.py
        models/
            schemas.py
        utils/
            text_splitter.py
    uploads/
    vector_db/
    requirements.txt
    README.md

Features Required

1. PDF Upload API

Endpoint:
POST /upload-pdf

Requirements:

* Accept multiple PDF files
* Save uploaded files in uploads folder
* Extract text from each PDF
* Split extracted text into chunks
* Generate embeddings
* Store embeddings in FAISS or ChromaDB vector database
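The chunking step above can be sketched as a minimal recursive splitter, a simplified stand-in for LangChain's RecursiveCharacterTextSplitter (sizes and separators are illustrative defaults):

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ")):
    # Recursively split on the coarsest separator that still yields
    # pieces no longer than chunk_size; fall back to a hard cut.
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            parts = text.split(sep)
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.extend(split_text(current, chunk_size, separators[i:]))
                    current = part
            if current:
                chunks.extend(split_text(current, chunk_size, separators[i:]))
            return chunks
    # No separator fits: hard-cut into fixed-size windows.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Production code would also add chunk overlap so that sentences spanning a boundary remain retrievable from either side.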


2. Chat API

Endpoint:
POST /chat

Input

{
  "question": "What is this document about?"
}

Process

* Convert the question into an embedding
* Search vector database
* Retrieve top 5 similar chunks
* Send retrieved context and question to local LLM
* LLM generates final answer

Output

{
  "answer": "..."
}
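Before the local LLM is called, the retrieved chunks and the question are usually assembled into a single prompt. A minimal sketch; the template wording is an assumption, not a fixed format:

```python
def build_prompt(question, chunks):
    # Join retrieved chunks into a numbered context block and instruct
    # the model to answer only from that context.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Grounding the model in retrieved context this way is what keeps answers tied to the uploaded PDFs rather than the model's general knowledge.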


3. Support Unlimited PDFs

* System should allow uploading unlimited PDF documents
* All documents should be indexed inside the vector database
* Chat queries must search across all indexed documents


4. Performance Optimizations

* Async FastAPI endpoints
* Background tasks for PDF indexing
* Embedding caching
* Efficient chunking strategy
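The embedding-caching point can be sketched with `functools.lru_cache`; the `encode` function below is a stub standing in for a sentence-transformers model, with a call counter added only to make the caching visible:

```python
from functools import lru_cache

CALLS = {"count": 0}

def encode(text):
    # Stub for model.encode(text); a real implementation would run
    # a sentence-transformers model here.
    CALLS["count"] += 1
    return [float(len(text))]

@lru_cache(maxsize=10_000)
def cached_embedding(text):
    # Repeated questions (and re-indexed chunks) skip the model call.
    return tuple(encode(text))
```

The result is returned as a tuple because `lru_cache` requires hashable values if the embeddings are later used as dictionary keys; lists also work as return values but tuples are safer.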


5. Local AI Model Support

Use one of the following local LLM systems:

* Ollama
* GPT4All
* Llama.cpp
* HuggingFace Transformers
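As one concrete option, Ollama exposes a local HTTP endpoint at `/api/generate`. The sketch below builds its JSON payload with the standard library only; the `send` parameter is an assumption added so the function can be exercised without a running server:

```python
import json
from urllib import request as urlrequest

def ollama_generate(prompt, model="mistral", send=None,
                    url="http://localhost:11434/api/generate"):
    # Build the JSON payload for Ollama's /api/generate endpoint and
    # return the generated text. `send` can be injected for testing;
    # by default it POSTs to a local Ollama server.
    payload = {"model": model, "prompt": prompt, "stream": False}
    if send is None:
        def send(body):
            req = urlrequest.Request(
                url, data=body, headers={"Content-Type": "application/json"}
            )
            with urlrequest.urlopen(req) as resp:
                return resp.read()
    raw = send(json.dumps(payload).encode())
    # With "stream": false, Ollama returns one JSON object whose
    # "response" field holds the full answer.
    return json.loads(raw)["response"]
```

Swapping in GPT4All, llama.cpp, or a HuggingFace pipeline only changes the body of this function; the RAG pipeline around it stays the same.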


6. Logging and Error Handling

* Implement proper logging
* Handle API errors properly
* Return meaningful error responses


7. Automatic API Documentation

* FastAPI should automatically generate Swagger documentation
* API docs should be available at:

/docs


8. Code Quality Requirements

* Write production-ready clean code
* Add comments explaining important logic
* Avoid placeholders
* Provide fully working code for every file


Also Include

* requirements.txt
* README.md
* server run instructions
* example curl requests
* Docker support
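A plausible requirements.txt for this stack might look like the sketch below. Package names are real; the exact set depends on which vector store and PDF library the generated project settles on, and versions are left unpinned:

```text
fastapi
uvicorn[standard]
python-multipart
sentence-transformers
faiss-cpu
chromadb
pypdf
pdfplumber
langchain-text-splitters
```

`python-multipart` is required by FastAPI for file-upload endpoints; only one of `faiss-cpu` / `chromadb` and one of `pypdf` / `pdfplumber` is strictly needed.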

Server Run Command

uvicorn app.main:app --reload