Detailed RAG Pipeline Architecture

Focus: Ingestion -> Vector Retrieval -> Context Fusion -> LLM Synthesis. Key document sources: PDF, Confluence, GitHub Markdown.

Use this as a block diagram of the system when explaining architecture.


Prompt

Generate a detailed RAG (Retrieval-Augmented Generation) pipeline architecture. The flow should start with an Ingestion Pipeline where unstructured documents (PDFs/Wikis) are processed via a Text Chunker and passed to an Embedding Model to generate vectors, stored in a Vector Database (e.g., Pinecone). The Retrieval Pipeline should show a User Query being vectorized, matching against the Vector DB for top-k relevant chunks, and then being fused into a Context Window sent to an LLM (e.g., GPT-4) for final answer synthesis. Include an Orchestration layer (e.g., LangChain) managing this workflow.
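The flow described in the prompt can be grounded with a minimal, dependency-free sketch. `toy_embed` and `InMemoryVectorDB` below are illustrative stand-ins (a real deployment would call an embedding API and Pinecone); only the shape of the upsert and top-k query mirrors the described architecture.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for a real embedding model: hashes word tokens into a
    fixed-size, L2-normalized vector so this sketch needs no dependencies."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class InMemoryVectorDB:
    """Minimal stand-in for a vector database such as Pinecone:
    supports upsert and top-k cosine-similarity search."""
    def __init__(self):
        self.records = {}  # chunk_id -> (vector, metadata)

    def upsert(self, chunk_id: str, vector: list[float], metadata: dict):
        self.records[chunk_id] = (vector, metadata)

    def query(self, vector: list[float], top_k: int = 3):
        scored = [(cosine(vector, v), cid, meta)
                  for cid, (v, meta) in self.records.items()]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:top_k]

# Demo: ingest two toy chunks, then retrieve for a query.
db = InMemoryVectorDB()
db.upsert("doc1#0", toy_embed("vector databases store dense embeddings"), {"source": "wiki"})
db.upsert("doc2#0", toy_embed("bananas are a yellow fruit"), {"source": "wiki"})
hits = db.query(toy_embed("vector embeddings storage"), top_k=2)
```

The query sharing tokens with the first document scores highest, which is the same top-k behavior the retrieval pipeline relies on.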

Overview

Detailed RAG Pipeline Architecture (Ingestion -> Vector Retrieval -> Context Fusion -> LLM Synthesis) has 4 layers: Data Sources Layer, Ingestion Pipeline (Index Build & Refresh), Retrieval & Generation Pipeline (Online Serving), Supporting Services (Governance, Observability, Security).

Layer details

  • Data Sources Layer: Modules include Unstructured Document Sources, Document Connectors.
  • Ingestion Pipeline (Index Build & Refresh): Modules include Document Parser & Cleaner, Text Chunker, Embedding Service, Vector Index Writer.
  • Retrieval & Generation Pipeline (Online Serving): Modules include Query API / Chat Gateway, RAG Orchestration Layer, Query Embedding Model, Vector Retrieval (Top-K).
  • Supporting Services (Governance, Observability, Security): Modules include Metadata Store & ACL, Caching & Rate Control, Monitoring & Evaluation.
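The layer-to-module mapping above can be transcribed into a simple registry, useful as a checklist when wiring the system. The dict below is purely illustrative, not an API of any library.

```python
# Layer -> modules, transcribed from the layer details above.
RAG_LAYERS: dict[str, list[str]] = {
    "Data Sources Layer": [
        "Unstructured Document Sources", "Document Connectors",
    ],
    "Ingestion Pipeline (Index Build & Refresh)": [
        "Document Parser & Cleaner", "Text Chunker",
        "Embedding Service", "Vector Index Writer",
    ],
    "Retrieval & Generation Pipeline (Online Serving)": [
        "Query API / Chat Gateway", "RAG Orchestration Layer",
        "Query Embedding Model", "Vector Retrieval (Top-K)",
    ],
    "Supporting Services (Governance, Observability, Security)": [
        "Metadata Store & ACL", "Caching & Rate Control",
        "Monitoring & Evaluation",
    ],
}
```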

Module responsibilities

  • Data Sources Layer / Unstructured Document Sources: Provide raw knowledge content; Serve as the system-of-record for documents
  • Data Sources Layer / Document Connectors: Fetch documents and metadata; Detect updates and deletions; Normalize into ingestion payloads
  • Ingestion Pipeline (Index Build & Refresh) / Document Parser & Cleaner: Convert heterogeneous formats to clean text; Preserve structure (headings, sections, tables); Attach metadata (source, author, timestamp)
  • Ingestion Pipeline (Index Build & Refresh) / Text Chunker: Split documents into retrieval-friendly chunks; Balance recall vs precision using overlap; Emit stable chunk identifiers for updates
  • Ingestion Pipeline (Index Build & Refresh) / Embedding Service: Generate dense vectors for chunks; Ensure consistent embedding model/version usage; Handle rate limits and failures
  • Ingestion Pipeline (Index Build & Refresh) / Vector Index Writer: Persist embeddings into vector store; Maintain metadata for filtering and citations; Support incremental updates and deletions
  • Retrieval & Generation Pipeline (Online Serving) / Query API / Chat Gateway: Receive user queries and context; Enforce access control and quotas; Route requests to RAG orchestrator
  • Retrieval & Generation Pipeline (Online Serving) / RAG Orchestration Layer: Manage end-to-end RAG workflow; Apply routing, filtering, and policies; Assemble prompts and control context budget
  • Retrieval & Generation Pipeline (Online Serving) / Query Embedding Model: Produce query embedding compatible with chunk vectors; Ensure stable retrieval behavior across versions; Optimize latency via batching/caching
  • Retrieval & Generation Pipeline (Online Serving) / Vector Retrieval (Top-K): Retrieve the most relevant chunks for the query; Enforce security filters before returning chunks; Provide candidates for reranking and fusion
  • Supporting Services (Governance, Observability, Security) / Metadata Store & ACL: Enforce authorization during retrieval; Provide traceable provenance for citations; Support document lifecycle (delete/expire)
  • Supporting Services (Governance, Observability, Security) / Caching & Rate Control: Reduce latency and cost; Protect backend services under load; Stabilize performance during bursts
  • Supporting Services (Governance, Observability, Security) / Monitoring & Evaluation: Measure RAG effectiveness and drift; Detect failures (empty retrieval, hallucinations); Support iterative tuning of chunking and prompts
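Two of the Text Chunker responsibilities above (overlap for recall/precision balance, stable chunk identifiers for updates) can be sketched together. This is a minimal word-window chunker; sizes are in words here, whereas a production chunker would count model tokens, and the `chunk_document` name and ID scheme are assumptions for illustration.

```python
import hashlib

def chunk_document(doc_id: str, text: str, chunk_size: int = 40, overlap: int = 10):
    """Split text into overlapping word windows with stable, content-derived IDs.
    Requires chunk_size > overlap so the window always advances."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + chunk_size])
        # Content-hash ID: re-ingesting unchanged text yields the same ID,
        # so the Vector Index Writer can upsert idempotently and detect
        # stale chunks to delete after an update.
        chunk_id = f"{doc_id}#{hashlib.sha1(body.encode()).hexdigest()[:12]}"
        chunks.append({"chunk_id": chunk_id, "text": body,
                       "metadata": {"doc_id": doc_id, "start_word": start}})
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means the tail of each chunk reappears at the head of the next, so a fact straddling a boundary is still retrievable from at least one chunk.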

Key flows

  • Ingestion pipeline: connectors pull PDFs/Wikis, a parser cleans and normalizes content, the Text Chunker splits into token-aware overlapping chunks, the Embedding Model generates vectors, and the Vector Index Writer upserts vectors + metadata into Pinecone for incremental refresh and deletion handling.
  • Retrieval pipeline: a user query enters the Query API, the orchestrator (LangChain) normalizes it and generates a query embedding, then performs top-k similarity search in Pinecone with metadata/ACL filters; optional reranking and MMR improve precision and diversity.
  • Generation pipeline: the orchestrator fuses the selected chunks into a context window within the token budget, injects it into a prompt, and calls the LLM (GPT-4) to synthesize a grounded final answer while preserving citations back to chunk_id/doc sources.
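The context-fusion step in the generation flow can be sketched as a greedy fill of the token budget over retrieval-ranked chunks. `fuse_context` and `build_prompt` are illustrative names; token counts are approximated by whitespace words, whereas a real orchestrator would use the model's tokenizer, and chunk IDs are kept inline so the LLM can cite them.

```python
def fuse_context(ranked_chunks, token_budget: int = 3000) -> str:
    """Greedily select ranked chunks until the budget is spent.
    ranked_chunks: iterable of (score, chunk_id, metadata, text),
    ordered best-first as returned by retrieval + reranking."""
    selected, used = [], 0
    for score, chunk_id, meta, text in ranked_chunks:
        cost = len(text.split())  # crude proxy for token count
        if used + cost > token_budget:
            continue  # skip chunks that do not fit; a smaller one may still
        selected.append(f"[{chunk_id}] {text}")
        used += cost
    return "\n\n".join(selected)

def build_prompt(question: str, context: str) -> str:
    # Grounded-answer prompt: restrict the model to the fused context
    # and ask for bracketed chunk-ID citations.
    return (
        "Answer using only the context below. Cite chunk IDs in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Skipping (rather than stopping at) an oversized chunk lets a smaller, lower-ranked chunk still use the remaining budget, which is one simple way to keep the context window full without truncating mid-chunk.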