Semantic Search MCP

A powerful Model Context Protocol (MCP) server for semantic code search using vector embeddings and LLM-powered reranking.


What This Does

This MCP server provides semantic search capabilities for your codebase. Instead of matching keywords, it understands the intent behind your queries and finds the most relevant code, even when that code uses different terminology.

This is the search/query component. For indexing your codebase, see the companion project: codebase-index-cli (Node.js real-time indexer with file watching and git commit tracking).


How These Projects Work Together

┌─────────────────────────────────────────────────────────────────┐
│                    Your Workflow                                │
└─────────────────────────────────────────────────────────────────┘

1. INDEX (codebase-index-cli - Node.js)
   ├─ Watches your codebase for changes
   ├─ Parses code with tree-sitter (29+ languages)
   ├─ Creates embeddings with your chosen model
   ├─ Stores vectors in Qdrant or SQLite
   ├─ Tracks git commits with LLM analysis
   └─ Maintains .codebase/state.json with collection info

2. SEARCH (semantic-search - Python MCP Server)
   ├─ Reads collection info from .codebase/state.json
   ├─ Queries the indexed vectors (Qdrant or SQLite)
   ├─ Uses same embedding model for consistency
   ├─ Returns semantically relevant code
   └─ Optionally reranks with LLM analysis

3. CONSUME (Your AI Coding Assistant)
   ├─ Claude Code / Claude Desktop
   ├─ Cline (VS Code)
   ├─ Windsurf
   └─ Any MCP-compatible client

In short:

  • codebase-index-cli = Indexer (creates the searchable vectors)
  • semantic-search = MCP Server (provides search tools to AI assistants)

MCP in action

[Screenshot: an agent using the semantic_search tool as part of its normal workflow, with todo-list tracking and a "Thinking..." state visible]


Core Features

  • Semantic Search - Natural language queries to find code by intent
  • LLM-Powered Reranking - AI-assisted relevance scoring and filtering
  • Git Commit History Search - Search through analyzed commit history
  • Multi-Project Search - Query other codebases/workspaces
  • SQLite & Qdrant Support - Flexible vector storage backends

Critical Configuration Requirements

1. Embedder Model Consistency

CRITICAL

The embedder model used for search MUST MATCH the model used for indexing.

# Your indexer used this:
EMBEDDER_MODEL_ID=text-embedding-3-small
EMBEDDER_DIMENSION=1536

# Your search server MUST use the EXACT same:
MCP_CODEBASE_EMBEDDER_MODEL_ID=text-embedding-3-small
MCP_CODEBASE_EMBEDDER_DIMENSION=1536

Why this matters:

  • Different models produce incompatible vector spaces
  • Mismatched dimensions will cause search failures
  • Using text-embedding-3-large when indexed with -small = broken search
  • OpenAI models vs other providers = different vector spaces

Compatibility Matrix:

Indexer Model                    Search Model                    Compatible?
text-embedding-3-small           text-embedding-3-small          ✅ YES
text-embedding-3-small           text-embedding-3-large          ❌ NO
text-embedding-3-small (1536d)   text-embedding-3-small (512d)   ❌ NO
nomic-embed-text-v1.5            nomic-embed-text-v1.5           ✅ YES
Any OpenAI model                 Any HuggingFace model           ❌ NO
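
A quick pre-flight check can catch a mismatch before the first search. This is a minimal sketch, assuming you run it from the indexed workspace; it compares the dimension the indexer recorded in .codebase/state.json (documented below) against the server's configured dimension:

# Minimal pre-flight check (sketch): compare the vector dimension the
# indexer recorded against the one this server is configured with.
import json
import os
from pathlib import Path

state = json.loads(Path(".codebase/state.json").read_text())
indexed_dim = state["qdrantStats"]["vectorDimension"]
configured_dim = int(os.environ["MCP_CODEBASE_EMBEDDER_DIMENSION"])

if indexed_dim != configured_dim:
    raise SystemExit(
        f"Dimension mismatch: index holds {indexed_dim}d vectors but the "
        f"server is configured for {configured_dim}d. Searches will fail."
    )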

2. LLM for Refined Results

When using refined_answer=True, the system uses an LLM (Judge) to:

  • Analyze relevance of each code fragment
  • Filter out noise and boilerplate
  • Identify missing imports/references
  • Generate a structured brief summary

Configuration:

# Reranking: Choose between Voyage AI or LLM Judge

# Option 1: Voyage AI Native Reranker
# Fast, specialized reranking model (~200ms)
# Get FREE API key at: https://www.voyageai.com/ (generous free tier)
MCP_CODEBASE_NATIVE_RERANK=true
MCP_CODEBASE_VOYAGE_API_KEY=pa-your-voyage-api-key
MCP_CODEBASE_VOYAGE_RERANK_MODEL=rerank-2.5

# Option 2: LLM Judge
# Flexible, can provide detailed explanations (~2-5s)
# Only used when NATIVE_RERANK=false
MCP_CODEBASE_JUDGE_PROVIDER=openai-compatible
MCP_CODEBASE_JUDGE_API_KEY=your-api-key
MCP_CODEBASE_JUDGE_BASE_URL=https://your-llm-endpoint.com/v1
MCP_CODEBASE_JUDGE_MODEL_ID=gpt-4o-mini
MCP_CODEBASE_JUDGE_MAX_TOKENS=32000
MCP_CODEBASE_JUDGE_TEMPERATURE=0.0

Reranking Options

Voyage AI Native Rerank:

  • Fast (~200ms response time)
  • Specialized reranking models
  • Cost-effective for high-volume usage
  • Free tier available at voyageai.com
  • +42% improvement in relevance scores vs pure vector search
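
For orientation, here is a hedged sketch of a direct call to Voyage's rerank endpoint (the server handles this internally when NATIVE_RERANK=true; field names follow Voyage's public API and may differ by version):

# Sketch: rerank candidate fragments with Voyage AI's /v1/rerank endpoint.
import os
import httpx

def voyage_rerank(query: str, documents: list[str], top_k: int = 10) -> list[dict]:
    resp = httpx.post(
        "https://api.voyageai.com/v1/rerank",
        headers={"Authorization": f"Bearer {os.environ['MCP_CODEBASE_VOYAGE_API_KEY']}"},
        json={
            "query": query,
            "documents": documents,
            "model": "rerank-2.5",
            "top_k": top_k,
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Each entry carries the original document index and a relevance score.
    return resp.json()["data"]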

LLM Judge:

  • Flexible, can provide explanations and reasoning
  • Uses general-purpose LLM
  • Response time: ~2-5s
  • Good for complex analysis with detailed context
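
The judge path boils down to a chat-completion call against the configured OpenAI-compatible endpoint. A minimal sketch (the prompt wording here is illustrative, not the server's actual prompt):

# Sketch: score code fragments for relevance with an OpenAI-compatible LLM.
import os
import httpx

def judge_fragments(query: str, fragments: list[str]) -> str:
    resp = httpx.post(
        f"{os.environ['MCP_CODEBASE_JUDGE_BASE_URL']}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MCP_CODEBASE_JUDGE_API_KEY']}"},
        json={
            "model": os.environ.get("MCP_CODEBASE_JUDGE_MODEL_ID", "gpt-4o-mini"),
            "temperature": 0.0,
            "messages": [
                {"role": "system",
                 "content": "Rank these code fragments by relevance to the "
                            "query, flag boilerplate, and note missing imports."},
                {"role": "user",
                 "content": query + "\n\n" + "\n---\n".join(fragments)},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]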

When to use reranking:

  • Complex architectural questions
  • Finding patterns across multiple files
  • Understanding relationships between components
  • Production deployments requiring high accuracy

When NOT to use reranking:

  • Quick searches where raw speed is priority
  • Simple queries with obvious answers
  • When you want raw vector similarity scores

3. Vector Storage Backends

Qdrant (Primary)

MCP_CODEBASE_QDRANT_URL=http://localhost:6333
MCP_CODEBASE_QDRANT_API_KEY=optional-api-key

  • Scalable for large codebases
  • Supports filtering and complex queries
  • Required for commit history search
  • Collection names: auto-generated from workspace path hash

SQLite (Alternative)

# No config needed - auto-detected at:
# <workspace>/.codebase/vectors.db

  • Good for single-user local development
  • Lower memory footprint
  • Embedded in workspace directory
  • Limitations: No commit history support
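
The auto-detection rule is simple enough to show in a few lines (a sketch of the documented behavior, not the server's actual code):

# Sketch: pick SQLite when the indexer left a vectors.db in the workspace,
# otherwise assume Qdrant.
from pathlib import Path

def pick_backend(workspace: str) -> str:
    db = Path(workspace) / ".codebase" / "vectors.db"
    return "sqlite" if db.exists() else "qdrant"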

4. Two Server Modes

Mode 1: server_qdrant.py (Qdrant-based)

Tools:

  • semantic_search - Basic semantic search
  • visit_other_project - Search other workspaces
  • search_commit_history - Query git history

Best for:

  • Multi-project workflows
  • Teams using a shared Qdrant instance
  • Git commit analysis
  • Production deployments

Mode 2: server_sqlite.py (SQLite-based)

Tools:

  • semantic_search - Basic semantic search
  • visit_other_project - Search other workspaces (SQLite or Qdrant)

Best for:

  • Single-developer local workflows
  • Offline development
  • Lower resource usage
  • Quick prototyping


Installation

Prerequisites

Before using this MCP server

You MUST index your codebase first.

Step 1: Install and Run the Indexer

Use codebase-index-cli to index your codebase:

# Install indexer globally
npm install -g codebase-index-cli

# Navigate to your project
cd /path/to/your/project

# Index with SQLite (local, portable)
codesql

# OR index with Qdrant (scalable, remote)
codebase

The indexer will:

  • Create .codebase/ directory in your project
  • Generate state.json with collection info
  • Parse and embed your code
  • Watch for changes in real-time
  • Track git commits (optional)

Step 2: Install This MCP Server

Requirements for the search server:

  • Python 3.10+
  • Embedder API access (OpenAI, OpenRouter, local, etc.)
  • LLM API access (for refined results - optional)
  • Qdrant server (if using Qdrant mode) OR SQLite database (auto-created by indexer)

Setup:

# Clone repository
git clone https://github.com/dudufcb1/semantic-search
cd semantic-search

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install fastmcp qdrant-client httpx pydantic pydantic-settings python-dotenv

# Configure environment
cp .env.example .env
# Edit .env with your settings

Environment Configuration

Minimal configuration:

# Embedder (REQUIRED)
MCP_CODEBASE_EMBEDDER_PROVIDER=openai-compatible
MCP_CODEBASE_EMBEDDER_API_KEY=your-key
MCP_CODEBASE_EMBEDDER_BASE_URL=https://api.example.com/v1
MCP_CODEBASE_EMBEDDER_MODEL_ID=text-embedding-3-small
MCP_CODEBASE_EMBEDDER_DIMENSION=1536  # Optional, model default if omitted

# Qdrant (REQUIRED for server_qdrant.py)
MCP_CODEBASE_QDRANT_URL=http://localhost:6333
MCP_CODEBASE_QDRANT_API_KEY=optional-key

# Judge/LLM (REQUIRED for refined_answer=True)
MCP_CODEBASE_JUDGE_PROVIDER=openai-compatible
MCP_CODEBASE_JUDGE_API_KEY=your-key
MCP_CODEBASE_JUDGE_BASE_URL=https://api.example.com/v1
MCP_CODEBASE_JUDGE_MODEL_ID=gpt-4o-mini
MCP_CODEBASE_JUDGE_MAX_TOKENS=32000
MCP_CODEBASE_JUDGE_TEMPERATURE=0.0
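
Since the dependency list includes pydantic-settings, the MCP_CODEBASE_ prefix maps naturally onto a typed settings class. A sketch of how such a mapping could look (field names are illustrative, not necessarily the server's):

# Sketch: typed settings for the MCP_CODEBASE_* variables via pydantic-settings.
from pydantic_settings import BaseSettings, SettingsConfigDict

class CodebaseSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="MCP_CODEBASE_", env_file=".env")

    embedder_provider: str = "openai-compatible"
    embedder_api_key: str
    embedder_base_url: str
    embedder_model_id: str = "text-embedding-3-small"
    embedder_dimension: int | None = None  # None = use the model default
    qdrant_url: str = "http://localhost:6333"
    qdrant_api_key: str | None = None

settings = CodebaseSettings()  # reads the process environment, then .env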

MCP Client Configuration

Claude Desktop (claude_desktop_config.json)

{
  "mcpServers": {
    "semantic-search": {
      "command": "/absolute/path/to/semantic_search/venv/bin/python",
      "args": ["/absolute/path/to/semantic_search/src/server_qdrant.py"],
      "env": {
        "MCP_CODEBASE_EMBEDDER_PROVIDER": "openai-compatible",
        "MCP_CODEBASE_EMBEDDER_API_KEY": "your-key",
        "MCP_CODEBASE_EMBEDDER_BASE_URL": "https://api.example.com/v1",
        "MCP_CODEBASE_EMBEDDER_MODEL_ID": "text-embedding-3-small",
        "MCP_CODEBASE_QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}

Cline/Windsurf (.mcp.toml)

[mcp_servers.semantic-search]
type = "stdio"
command = "/absolute/path/to/semantic_search/venv/bin/python"
args = ["/absolute/path/to/semantic_search/src/server_qdrant.py"]
timeout = 3600

[mcp_servers.semantic-search.env]
MCP_CODEBASE_EMBEDDER_PROVIDER = "openai-compatible"
MCP_CODEBASE_EMBEDDER_API_KEY = "your-key"
MCP_CODEBASE_EMBEDDER_BASE_URL = "https://api.example.com/v1"
MCP_CODEBASE_EMBEDDER_MODEL_ID = "text-embedding-3-small"
MCP_CODEBASE_QDRANT_URL = "http://localhost:6333"

Tools Reference

semantic_search

Basic semantic code search.

Parameters:

  • query (string, required) - Natural language query
  • qdrant_collection (string, required) - Collection name from .codebase/state.json
  • max_results (int, optional) - Max results to return (default: 20)
  • refined_answer (bool, optional) - Use LLM analysis (default: false)

Example:

{
  "query": "authentication middleware implementation",
  "qdrant_collection": "codebase-7a1480dc62504bc490",
  "max_results": 15,
  "refined_answer": true
}

Returns:

  • Without refined_answer: Ranked code fragments with similarity scores
  • With refined_answer: AI-analyzed brief + ranked relevant files + noise detection

search_commit_history

Search through git commit history that has been indexed and analyzed by LLM.

Parameters:

  • query (string, required) - What to search for in commit history
  • qdrant_collection (string, required) - Collection name
  • max_results (int, optional) - Max commits to return (default: 10)

Example:

{
  "query": "when was SQLite storage implemented",
  "qdrant_collection": "codebase-7a1480dc62504bc490",
  "max_results": 5
}

Requirements:

  • Git tracking must be enabled during indexing
  • Commits must have been analyzed by LLM
  • Qdrant backend required (not available with SQLite)

visit_other_project

Search in a different workspace/codebase.

Parameters:

  • query (string, required) - Search query
  • workspace_path (string, optional) - Absolute path to workspace
  • qdrant_collection (string, optional) - Explicit collection name
  • storage_type (string, optional) - "sqlite" or "qdrant" (default: "qdrant")
  • refined_answer (bool, optional) - Use LLM analysis (default: false)
  • max_results (int, optional) - Max results (default: 20)

Resolution logic:

  1. If qdrant_collection specified → use Qdrant with that collection
  2. If storage_type="sqlite" + workspace_path → try SQLite at <workspace>/.codebase/vectors.db
  3. If SQLite not found → fallback to Qdrant (calculate collection from workspace_path)
  4. If storage_type="qdrant" → calculate collection from workspace_path
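
The same order, sketched as code (helper names are hypothetical; the collection-from-path digest is only illustrative, the real scheme belongs to the indexer):

# Sketch of the resolution order above; names here are hypothetical.
import hashlib
from pathlib import Path

def collection_from_path(workspace_path: str) -> str:
    # The indexer derives collection names from a workspace-path hash;
    # this digest only illustrates the idea.
    return "codebase-" + hashlib.sha256(workspace_path.encode()).hexdigest()[:18]

def resolve_target(workspace_path=None, qdrant_collection=None, storage_type="qdrant"):
    if qdrant_collection:                                    # rule 1
        return ("qdrant", qdrant_collection)
    if storage_type == "sqlite" and workspace_path:          # rule 2
        db = Path(workspace_path) / ".codebase" / "vectors.db"
        if db.exists():
            return ("sqlite", str(db))
    return ("qdrant", collection_from_path(workspace_path))  # rules 3-4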

Example:

{
  "query": "payment processing flow",
  "workspace_path": "/home/user/other-project",
  "storage_type": "sqlite",
  "refined_answer": true
}

Big Game Changer

The semantic_parallel_search tool is the most powerful search tool this server offers.

Why this matters:

Previously, every semantic search relied on a single query or on LLM-generated variations. That made it hard to control coverage and slowed things down when the model had to invent extra prompts.

What it does:

  • Directed multi-query: You supply up to five additional query variations that run in parallel. No extra LLM round-trip needed.
  • Smart deduplication: Merges results by file + line range, removing noise and duplicates.
  • Optional rerank: If settings.native_rerank is enabled, Voyage still reranks just like before.
  • Optional brief: The LLM is only used when you request refined_answer=True; otherwise you get raw merged results straight from Qdrant.

Why it's a game changer:

  • Tighter coverage: Hit different angles (webhooks, validation, edge cases) in a single call.
  • Faster execution: asyncio.gather builds embeddings and runs Qdrant searches concurrently without waiting for LLM creativity.
  • Traceable output: Each file in the response lists the exact queries that surfaced it (the "Consultas que lo devolvieron" field, i.e. "queries that returned it").

How to use it:

result = semantic_parallel_search(
    query="How do we handle payment processing?",
    qdrant_collection="codebase-1d85d0a83c1348b3be",
    queries=[
        "payment gateway integration stripe",
        "transaction validation error handling",
        "payment confirmation webhooks"
    ],
    max_results=20,
    refined_answer=False
)

Note

If queries is None or empty, the tool just reuses the base query. The response annotates each file with the list of queries that returned it, so you can see which phrasing worked.
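
Under those assumptions, the fan-out and merge could look roughly like this (embed() and search_collection() are placeholder stubs, not the server's functions):

# Rough sketch of the directed multi-query fan-out: embed all queries and
# run the searches concurrently, then merge hits by (file, line range).
import asyncio

async def embed(query: str) -> list[float]:
    return [0.0] * 1536  # placeholder for the real embedder call

async def search_collection(vector: list[float]) -> list[dict]:
    return []  # placeholder for the real Qdrant search

async def parallel_search(base_query: str, extra_queries: list[str] | None = None):
    queries = [base_query] + (extra_queries or [])
    vectors = await asyncio.gather(*(embed(q) for q in queries))
    hit_lists = await asyncio.gather(*(search_collection(v) for v in vectors))

    merged: dict[tuple, dict] = {}
    for query, hits in zip(queries, hit_lists):
        for hit in hits:
            key = (hit["file"], hit["start_line"], hit["end_line"])
            entry = merged.setdefault(key, {**hit, "queries": []})
            entry["queries"].append(query)  # which phrasing surfaced this file
    return list(merged.values())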


Understanding .codebase/state.json

The indexer (codebase-index-cli) creates a .codebase/ directory in your project with this structure:

your-project/
└── .codebase/
    ├── state.json       # Collection info, indexing status, stats
    ├── cache.json       # File hashes for change detection
    └── vectors.db       # SQLite database (if using `codesql` command)

state.json Format

This file contains critical information that the MCP server reads to find your indexed vectors:

{
  "workspacePath": "/absolute/path/to/your/project",
  "qdrantCollection": "codebase-1d85d0a83c1348b3be",
  "createdAt": "2025-10-17T10:13:48.454Z",
  "updatedAt": "2025-10-17T10:39:00.715Z",
  "indexingStatus": {
    "state": "watching"
  },
  "lastActivity": {
    "timestamp": "2025-10-17T10:39:00.712Z",
    "action": "indexed",
    "filePath": "README.md",
    "details": {
      "blockCount": 48
    }
  },
  "qdrantStats": {
    "totalVectors": 396,
    "uniqueFiles": 22,
    "vectorDimension": 1536,
    "lastUpdated": "2025-10-17T10:30:03.891Z"
  }
}

Key Fields:

  • qdrantCollection - The collection name to use when calling semantic_search()
  • workspacePath - Absolute path to the indexed project
  • vectorDimension - The dimension of the embedding model used (MUST match your MCP server config)
  • indexingStatus.state - Current state: "watching", "indexing", "idle", or "error"
  • qdrantStats.totalVectors - Number of indexed code chunks
  • qdrantStats.uniqueFiles - Number of files in the index

How the MCP Server Uses state.json

When you call semantic_search(), the server:

  1. Reads <workspace>/.codebase/state.json
  2. Extracts qdrantCollection (e.g., "codebase-1d85d0a83c1348b3be")
  3. Connects to Qdrant/SQLite using that collection
  4. Performs the semantic search
  5. Returns ranked results
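
Condensed into code, the flow looks roughly like this (a sketch using qdrant-client; embed_query() is a placeholder for the configured embedder, not a server function):

# Sketch of the five steps above, end to end.
import json
from pathlib import Path
from qdrant_client import QdrantClient

def embed_query(query: str) -> list[float]:
    return [0.0] * 1536  # placeholder for the configured embedder

def search_workspace(workspace: str, query: str, limit: int = 20):
    state = json.loads((Path(workspace) / ".codebase" / "state.json").read_text())
    collection = state["qdrantCollection"]               # steps 1-2
    client = QdrantClient(url="http://localhost:6333")   # step 3
    return client.search(                                # steps 4-5
        collection_name=collection,
        query_vector=embed_query(query),
        limit=limit,
    )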

Info

You don't need to manually create or edit this file - the indexer manages it automatically.


Common Pitfalls

0. Indexer Not Running

Problem: No .codebase/state.json file found or "collection doesn't exist" errors

Cause: You haven't indexed your codebase yet

Solution:

# Install and run the indexer first
npm install -g codebase-index-cli

# Navigate to your project
cd /path/to/your/project

# Run the indexer (choose one)
codesql        # For SQLite storage
codebase       # For Qdrant storage

Warning

The indexer MUST be running or have completed indexing before you can use this MCP server. See codebase-index-cli for details.

1. Model Mismatch

Problem: Search returns irrelevant results or errors

Cause: Embedder model doesn't match indexing model

Solution:

# Check your indexer config
cat .codebase/config.json

# Match these settings exactly:
MCP_CODEBASE_EMBEDDER_MODEL_ID=<same-as-indexer>
MCP_CODEBASE_EMBEDDER_DIMENSION=<same-as-indexer>

2. Missing LLM Config

Problem: refined_answer=True fails with authentication errors

Cause: Judge/LLM credentials not configured

Solution:

# Configure LLM for refined results
MCP_CODEBASE_JUDGE_API_KEY=your-llm-api-key
MCP_CODEBASE_JUDGE_BASE_URL=https://your-llm-endpoint/v1

3. Empty Search Results

Problem: No results returned for valid queries

Causes:

  • Workspace not indexed yet
  • Wrong collection name
  • Qdrant server not running
  • Score threshold too high

Solutions:

# Verify collection exists
curl http://localhost:6333/collections

# Check state file for correct collection name
cat <workspace>/.codebase/state.json

# Lower score threshold
MCP_CODEBASE_SEARCH_MIN_SCORE=0.1  # Default: 0.4

4. Commit History Not Working

Problem: search_commit_history returns no results

Causes:

  • Git tracking not enabled during indexing
  • Using SQLite backend (not supported)
  • No commits analyzed yet

Solution:

  • Ensure Qdrant backend is used
  • Enable git tracking in indexer
  • Wait for commits to be analyzed and indexed

Performance Considerations

Search Speed

  • Basic search: ~100-500ms (depends on collection size)
  • With refined_answer: +2-10s (LLM processing overhead)
  • Commit history: ~200-800ms (filtered search)

Token Usage (with refined_answer=True)

  • Input tokens: ~2000-8000 per search (depends on result count)
  • Output tokens: ~500-3000 (depends on complexity)
  • Cost estimate: $0.01-0.05 per refined search (varies by LLM provider)

Memory Usage

  • Qdrant mode: ~50-200MB (depends on client connections)
  • SQLite mode: ~20-100MB (embedded database)
  • Peak during search: +100-300MB (vector operations)


License

MIT

Contributing

This project is designed to be feature-complete for its intended use case. For bugs, documentation improvements, or critical features, see the Contributing guidelines in the repository.