# LightRAG ADC System Architecture

This document describes the full architecture of the LightRAG-based Adverse Drug Condition (ADC) system for processing and querying Slovak pharmaceutical leaflets.

The system consists of three main components, two running locally and one remote:

- **Embedding Server** (port 8010): wraps a sentence-transformers model for vector generation
- **LightRAG Server** (port 9621): core RAG engine managing the knowledge graph and vector DB
- **OpenWebUI LLM** (remote): hosts the Qwen3.5-122B model used for entity extraction and answer generation

Both local servers are launched via `start_servers.py`. Source data is 6929 Slovak pharmaceutical leaflets stored in `cleaned_general_info_additional.json`.

---

## Flow 1: Ingestion (Loading Leaflets)

```mermaid
flowchart TD
    A([👤 User runs load_leaflets.py]) --> B
    B[("📄 cleaned_general_info_additional.json\n6929 Slovak leaflets")]
    B --> C{Filter:\nclinical leaflets only\ninteractions +\ncontraindications}
    C -->|Filtered leaflets| D["🔁 For each leaflet\n(loop)"]
    D --> E["POST http://localhost:9621\n/documents/text\n\nBody: { text, metadata }"]

    subgraph LightRAG_Server ["⚙️ LightRAG Server — port 9621"]
        E --> F["Text chunker\n600 tokens per chunk"]
        F --> G["🔁 For each chunk\n(loop)"]
        G --> H["POST https://ui.tukekemt.xyz\n/api/v1/chat/completions\n\nModel: model2 (Qwen3.5-122B)\nTask: extract entities & relations"]
        H --> I["Extracted:\n• Entities (drugs, conditions, etc.)\n• Relations between entities"]
        I --> J["🔁 For each entity / chunk\n(loop)"]
        J --> K["POST http://localhost:8010\n/embeddings\n\nBody: { input: text }"]
    end

    subgraph Embedding_Server ["🧠 Embedding Server — port 8010"]
        K --> L["paraphrase-multilingual\n-MiniLM-L12-v2\n(sentence-transformers)"]
        L --> M["Float vector\n(384 dimensions)"]
    end

    subgraph OpenWebUI ["☁️ OpenWebUI — ui.tukekemt.xyz"]
        H
    end

    M --> N

    subgraph RAG_Storage ["💾 rag_storage/"]
        N["graph_chunk_entity_relation.graphml\n— knowledge graph (NetworkX)"]
        O["vdb_entities.json\n— entity vectors (NanoVectorDB)"]
        P["vdb_relationships.json\n— relation vectors (NanoVectorDB)"]
        Q["kv_store_*.json\n— chunk text cache & metadata"]
    end

    I --> N
    I --> P
    M --> O
    F --> Q
```

---

## Flow 2: Query (Answering Questions)

```mermaid
flowchart TD
    A([👤 User sends query]) --> B
    B["POST http://localhost:9621/query\n\nBody:\n{ query: string,\n mode: hybrid | local | global | naive }"]

    subgraph LightRAG_Server ["⚙️ LightRAG Server — port 9621"]
        B --> C["Parse query\n& select retrieval mode"]
        C --> D["POST http://localhost:8010\n/embeddings\n\nEmbed the query text"]

        subgraph Retrieval ["🔍 Retrieval (parallel)"]
            E["Vector search\nNanoVectorDB\n(vdb_entities.json,\nvdb_relationships.json)"]
            F["Graph traversal\nNetworkX\n(graph_chunk_entity_relation.graphml)"]
        end

        D --> Retrieval
        Retrieval --> G["Merge & rank\nrelevant entities,\nrelations & text chunks"]
        G --> H["Build context prompt\nfrom top-K results\n+ retrieved chunk texts\n(kv_store_*.json)"]
        H --> I["POST https://ui.tukekemt.xyz\n/api/v1/chat/completions\n\nModel: model2 (Qwen3.5-122B)\nTask: generate answer\nfrom context"]
    end

    subgraph Embedding_Server ["🧠 Embedding Server — port 8010"]
        D2["paraphrase-multilingual\n-MiniLM-L12-v2"]
        D --> D2
        D2 --> E
    end

    subgraph OpenWebUI ["☁️ OpenWebUI — ui.tukekemt.xyz"]
        I
    end

    subgraph RAG_Storage ["💾 rag_storage/"]
        VDB["vdb_entities.json\nvdb_relationships.json"]
        GRAPH["graph_chunk_entity_relation.graphml"]
        KV["kv_store_*.json"]
    end

    E --- VDB
    F --- GRAPH
    H --- KV
    I --> J["Generated answer\n+ source references"]
    J --> K([👤 User receives response])
```

---

## Component Summary

| Component | Type | Address | Key Endpoints |
|---|---|---|---|
| `embedding_server.py` | FastAPI (local) | `http://localhost:8010` | `GET /health`, `POST /embeddings`, `POST /v1/embeddings` |
| LightRAG Server | FastAPI (local) | `http://localhost:9621` | `GET /health`, `POST /documents/text`, `POST /documents/scan`, `GET /documents/pipeline_status`, `POST /query` |
| OpenWebUI (model2) | Remote LLM API | `https://ui.tukekemt.xyz` | `POST /api/v1/chat/completions` |
| `rag_storage/` | File system | Local disk | `.graphml`, `.json` files |
| `cleaned_general_info_additional.json` | Source data | Local disk | 6929 Slovak pharmaceutical leaflets |
| `start_servers.py` | Launcher script | — | Starts embedding server + LightRAG server |
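
### Example: building ingestion and query requests

The two flows above can be sketched from the client side as follows. This is a minimal illustration rather than the actual `load_leaflets.py`: the leaflet field names (`name`, `interactions`, `contraindications`), the choice to keep a leaflet when either clinical field is present, and all helper function names are assumptions made for the example. Only the server address, the endpoint paths, the `{ text, metadata }` and `{ query, mode }` body shapes, and the four query modes come from the diagrams and table in this document.

```python
import json
import urllib.request

# LightRAG server address as listed in the component summary.
LIGHTRAG_URL = "http://localhost:9621"
# Retrieval modes named in the query flow.
QUERY_MODES = {"hybrid", "local", "global", "naive"}


def is_clinical(leaflet: dict) -> bool:
    """Keep leaflets that carry interaction or contraindication text.

    Assumption: the field names are hypothetical, and "either field present"
    is one possible reading of the filter step; the real schema may differ.
    """
    return bool(leaflet.get("interactions") or leaflet.get("contraindications"))


def build_insert_request(leaflet: dict) -> tuple[str, dict]:
    """Return the URL and JSON body for POST /documents/text."""
    # Join whichever clinical sections exist into one text document.
    text = "\n\n".join(
        leaflet[key]
        for key in ("interactions", "contraindications")
        if leaflet.get(key)
    )
    # The metadata content is an assumption; the diagram only names the field.
    body = {"text": text, "metadata": {"name": leaflet.get("name", "")}}
    return f"{LIGHTRAG_URL}/documents/text", body


def build_query_request(question: str, mode: str = "hybrid") -> tuple[str, dict]:
    """Return the URL and JSON body for POST /query."""
    if mode not in QUERY_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return f"{LIGHTRAG_URL}/query", {"query": question, "mode": mode}


def post_json(url: str, body: dict, timeout: float = 60.0) -> dict:
    """Send a JSON POST and decode the JSON reply (requires running servers)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With both local servers started, a caller could load the source JSON, keep the leaflets for which `is_clinical` returns true, and pass each result of `build_insert_request` to `post_json`; a question would then go through `post_json(*build_query_request("...", mode="hybrid"))`. Separating request construction from the network call keeps the payload logic checkable without any server running.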