
Kira - AI-Powered RAG Customer Service Chatbot

Nov 14, 2024


Overview

Kira is a self-hosted AI customer service chatbot built for companies that need complete data ownership and privacy.

Using Retrieval-Augmented Generation (RAG), Kira grounds its responses in the company's own documentation, reducing hallucinations and keeping answers accurate and verifiable.

Status: Coming soon

Key Features

RAG Architecture

  • Qdrant vector database for semantic search
  • Gemini embeddings for document vectorization
  • Source attribution for every response
  • Reduces AI hallucinations by grounding answers in retrieved documents
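
At its core, semantic search ranks stored document embeddings by similarity to the query embedding. Here is a minimal in-memory sketch of that ranking step; Qdrant performs the equivalent search server-side, and the vectors and filenames below are made-up illustrations, not Kira's data:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// doc pairs a source identifier with its embedding, mirroring a stored point.
type doc struct {
	Source string
	Vec    []float64
}

// topK ranks docs by similarity to the query vector and returns the k best,
// which is conceptually what a vector database does on every search.
func topK(query []float64, docs []doc, k int) []doc {
	sort.Slice(docs, func(i, j int) bool {
		return cosine(query, docs[i].Vec) > cosine(query, docs[j].Vec)
	})
	if k > len(docs) {
		k = len(docs)
	}
	return docs[:k]
}

func main() {
	docs := []doc{
		{"refunds.md", []float64{0.9, 0.1, 0.0}},
		{"shipping.md", []float64{0.1, 0.9, 0.0}},
	}
	best := topK([]float64{0.8, 0.2, 0.0}, docs, 1)
	fmt.Println(best[0].Source) // the semantically closest document wins
}
```

Real embeddings have hundreds of dimensions, and Qdrant adds indexing (HNSW) so this ranking doesn't require a linear scan.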

Knowledge Management

  • Admin dashboard for knowledge base management
  • Support for TXT and Markdown file ingestion
  • Automatic document processing and embedding
  • Version control for documentation updates

Real-time Communication

  • Server-Sent Events (SSE) for streaming responses
  • Conversation history tracking
  • Multi-language support
  • Context-aware conversations
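
SSE is a plain-text wire format: each frame is an `event:`/`data:` pair terminated by a blank line, and the server flushes one frame per generated token so the UI renders the answer incrementally. A minimal sketch of the framing (with Gin this is usually done via `c.SSEvent`; the `token` event name is an assumption):

```go
package main

import "fmt"

// sseFrame renders one Server-Sent Events frame for a single-line payload.
// Per the SSE format, a blank line terminates the frame; multi-line payloads
// would need a "data: " prefix on every line.
func sseFrame(event, data string) string {
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, data)
}

func main() {
	// Each generated token is written and flushed as its own frame.
	for _, tok := range []string{"Hello", " there"} {
		fmt.Print(sseFrame("token", tok))
	}
}
```

Unlike WebSockets, SSE rides over ordinary HTTP, which keeps the self-hosted deployment simple (no special proxy configuration beyond disabling response buffering).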

Privacy-First Design

  • Fully self-hosted solution
  • Complete data ownership
  • No external API dependencies for data storage
  • On-premise deployment option

Production Infrastructure

  • Deployed on k3s cluster
  • Full CI/CD with Drone
  • Complete observability stack:
    • OpenTelemetry for tracing
    • Grafana for monitoring
    • Loki for logs

Tech Stack

  • Backend: Golang (Gin framework)
  • Frontend: React with TanStack Query
  • Database: PostgreSQL
  • Vector Store: Qdrant
  • AI: Gemini AI for embeddings and generation
  • Infrastructure: k3s, Drone CI
  • Observability: OpenTelemetry, Grafana, Loki

Why This Project?

Many companies face challenges with AI chatbots:

  • Privacy concerns: Sensitive data sent to third-party APIs
  • Hallucinations: AI making up incorrect information
  • Vendor lock-in: Dependence on external services
  • Cost: Expensive per-query pricing models

Kira solves these by:

  • Self-hosting: Complete control over data and infrastructure
  • RAG: Grounding responses in actual documentation
  • Open-source stack: No vendor lock-in
  • Transparency: Source attribution for every answer

This is a proof-of-concept demonstrating how companies can deploy privacy-first, hallucination-resistant AI chatbots without sacrificing functionality.

Technical Highlights

RAG Pipeline

  1. Documents are ingested and split into chunks
  2. Chunks are embedded using Gemini embeddings
  3. Embeddings are stored in the Qdrant vector database
  4. Each user query is embedded and semantically searched against the index
  5. The most relevant chunks are retrieved and supplied as context
  6. The LLM generates a response grounded in the retrieved data
  7. Sources are cited for verification

Scalability

  • Horizontal scaling with k3s
  • Vector search optimization
  • Caching for frequently asked questions
  • Async processing for document ingestion

Need a privacy-first AI chatbot solution? Let's discuss your requirements!

P.S. This project will be open-sourced soon. Follow me on GitHub for updates!