CodeGraphContext: 6-Month Evolution & Feature Roadmap
CodeGraphContext (CGC) is a polyglot code intelligence tool that maps static codebases into queryable graph databases, exposing this context to AI assistants and developers. This document provides a complete inventory of CGC's existing features, details its current limitations, and outlines a comprehensive 6-month evolution path split into 50 distinct milestones.
1. Inventory of Current CGC Capabilities
Below is the inventory of features currently implemented, shipped, and operational within the CodeGraphContext codebase:
1.1 Ingestion & Parsing
- Polyglot Tree-Sitter Parsers: Parser classes for 20 target languages:
- Python (with Jupyter Notebook/
.ipynbcells support vianbformat/nbconvert) - JavaScript, TypeScript, TSX
- Go, Rust, C, C++
- Java, Kotlin, Scala
- Ruby, C#, PHP, Swift
- Dart, Perl, Haskell, Elixir, Lua
- SCIP Parsing Pipeline: Optional protobuf-based precise indexer leveraging external
scip-*language tools for deep AST symbol extraction. - Dependency Resolvers: Package path resolution logic for 9 languages to link external dependencies into the graph representation.
- Incremental Watcher: Multi-threaded
watchdogsystem for automatic file modification, creation, and deletion detection with debounced graph writes. .cgcignoreFilter: Custom pattern matcher adhering to.gitignore-style rules to prevent noise (vendor folders, binaries) from entering the graph database.
1.2 Graph Persistence & Schemas
- Database Adapters: Consistent connection wrappers for 5 backend drivers:
- FalkorDB Lite: Local embedded UNIX DB via Redislite.
- FalkorDB Remote: Remote FalkorDB client.
- KuzuDB: Local embedded relational-graph DB (default for Windows).
- Neo4j: Server-side graph database (AuraDB/Docker compatible).
- Nornic DB: Neo4j-compatible embedded database driver.
- Graph Schema: Schema contracts enforcing 17 node labels (e.g.,
Repository,File,Function,Class,Variable,Interface,Enum,Parameter) and 7 relationship types (CONTAINS,CALLS,IMPORTS,INHERITS,IMPLEMENTS,HAS_PARAMETER,INCLUDES).
1.3 Query & Analysis (CodeFinder)
- Fuzzy Search: In-memory Levenshtein distance fallback search for classes and functions across all DB backends.
- Ast & Structural Queries: Out-of-the-box Cypher queries for:
- Transitive call chains and callers/callees.
- Class inheritance and C# interface implementations.
- Complexity analysis (Cyclomatic complexity calculations).
- Dead code analysis (unreferenced files and function declarations).
- Variable scope tracking and usage analysis.
- Named Contexts: Local configuration profiles allowing switching between global contexts, per-repository contexts, or shared workspace contexts.
1.4 Interfaces & Client Integration
- MCP Server: JSON-RPC over
stdioimplementing 20 tools for tools/list and tools/call, allowing cursor/claude to interact with the database. - CLI Commands: Over 55 interactive and command-line scripts for configuration, index wizardry, health checks (
cgc doctor), and query execution. - Viz Server & Website:
- FastAPI server serving static visual assets locally.
- React SPA with force-directed graphs (2D, 3D, 3D City visual structures, and Mermaid flowchart SVG exports).
- In-browser parsing worker utilizing
web-tree-sitterWASM files to parse local uploads or cloned GitHub repositories without Python dependencies. - Bundles & Registry:
.cgcarchive format for exporting/importing graph snapshots.- GitHub-backed registry search and on-demand trigger mechanism via GitHub Actions dispatch.
1.5 VS Code Extension
- Early-stage vsix extension (
extensions/vscode): - Setup wizard commands and activity bar viewer stubs.
- Config management matching core CLI options.
- Interactive menus and control panel webview.
2. Current Known Bugs & Technical Limitations
| Code | Type | Limitation / Bug | Severity | Impact |
|---|---|---|---|---|
| L1 | Arch | Single-process MCP Server | Medium | The standard stdio JSON-RPC transport limits the server to one IDE wrapper instance at a time; no concurrent shared connections. |
| L2 | Arch | Sync-over-Async Handlers | Low | Handlers run in threads (asyncio.to_thread). True non-blocking asynchronous drivers for Neo4j/Kuzu are not utilized. |
| L3 | Arch | In-Memory Job Manager | Medium | Background indexing job states are lost on server restart, leading to broken job polling. |
| L4 | Arch | Monolithic cli/main.py |
Medium | CLI commands are structured in a single 2386-line file, increasing maintenance overhead and making testing difficult. |
| L5 | Arch | Monolithic CodeGraphViewer.tsx |
High | Renders layout, handles Cytoscape/Force-graph state, and processes files in a single 1579-line file. |
| L6 | DB | FalkorDB UNIX Restriction | Medium | FalkorDB Lite is blocked on Windows due to redislite binaries, causing silent fallbacks. |
| L7 | DB | KuzuDB Cypher Dialect Discrepancies | High | Specific Cypher queries (e.g. UNWIND, aggregations) behave differently between Kuzu and Neo4j, resulting in query failures. |
| L8 | Parse | Syntactic Boundary | Medium | Tree-sitter has no type solver; dynamic imports or duplicate class names across folders can result in false connections in the call graph. |
| L9 | Parse | Stubbed Advanced Toolkits | High | All 16 language *Toolkit classes in query_tool_languages/ raise NotImplementedError when advanced queries are invoked. |
| L10 | Test | Flaky Integration Tests | Medium | test_cgcignore_patterns.py requires a fully installed workspace and a live DB, leading to CI failures. |
| L11 | Test | Ruby Mixins and C++ Duplicate Tests | Low | Stubbed test fixtures like test_mixins.py refer to invalid fixtures, and C++ tests are duplicated. |
3. The 6-Month Evolutionary Roadmap (50 Milestones)
Here is the week-by-week and month-by-month execution plan to address the constraints, scale the architecture, and implement the planned integrations (Ollama, cloud LLMs, browser extensions, benchmarking, and VS Code upgrades).
Month 1: Architectural Refactoring, Testing Isolation & Benchmarking (Milestones 1–9)
Focus: De-monolithing the CLI and frontend, isolating tests, and implementing a real indexing performance bench.
Milestone 1: Deconstruct cli/main.py Monolith
- Difficulty: Medium
- Knowledge Needed: Typer CLI, Python Package structures.
- Deliverable: Split
cli/main.pyinto separate sub-command modules undercodegraphcontext/cli/commands/(e.g.,index.py,find.py,analyze.py,bundle.py). - Behavioral Improvement: Developer maintenance increases; CLI startup overhead drops because only required commands are imported.
Milestone 2: Refactor CodeGraphViewer.tsx
- Difficulty: Hard
- Knowledge Needed: React, TypeScript, state synchronization.
- Deliverable: Split the React viewer into subcomponents (
GraphCanvas,CodeViewerSidebar,SearchAndFilter,VisualSettingsPanel). - Behavioral Improvement: Frontend codebase becomes modular, making it easier to fix rendering bugs and add custom layout managers.
Milestone 3: Database Query Interface Protocol (R4)
- Difficulty: Hard
- Knowledge Needed: Cypher dialects (Kuzu vs Neo4j vs FalkorDB), Abstract Base Classes.
- Deliverable: Extract database queries from
CodeFinderinto a dedicated translation layer (GraphQueryInterface), with subclassed providers for KuzuDB and Neo4j. - Behavioral Improvement: Eliminates Cypher dialect differences; KuzuDB queries no longer crash on unsupported Cypher syntax.
Milestone 4: Clean and Isolate Test Suite (L11, L10)
- Difficulty: Medium
- Knowledge Needed: pytest, mocks, CI environment configurations.
- Deliverable: Remove the dead
test_mixins.pyRuby fixture; deduplicate C++ tests; isolatetest_cgcignore_patterns.pyby mocking the database connection. - Behavioral Improvement: CI run succeeds on every commit without needing local DB servers or pre-installed environment binaries.
Milestone 5: Standardized Error Schema and Handler Layer (R10)
- Difficulty: Easy
- Knowledge Needed: MCP protocol, error-handling conventions.
- Deliverable: Establish structured error codes and messages for the MCP response payloads (e.g.,
INDEX_NOT_FOUND,DB_CONNECTION_LOST). - Behavioral Improvement: AI assistants understand why a tool call failed and can recover gracefully (e.g., prompting the user to run an indexer).
Milestone 6: Bundle Schema Versioning & Validation (R12, R13)
- Difficulty: Medium
- Knowledge Needed: ZIP archiving, JSON schema validation, version parsing.
- Deliverable: Add a version header inside the
.cgcbundlemetadata.jsonand create thecgc bundle validate <path>CLI command. - Behavioral Improvement: Prevents older versions of CGC from loading newer, incompatible database structures, alerting the user with clear instructions.
Milestone 7: Build Real-World Ingestion Benchmarking Suite (R9)
- Difficulty: Medium
- Knowledge Needed: Benchmarking methodologies, performance telemetry.
- Deliverable: Create
scripts/run_benchmarks.pyusing a standard corpus of target repositories (e.g., a 100k LOC Python/Go codebase). Track parsing throughput (LOC/sec) and database insertion latencies. - Behavioral Improvement: Provides quantitative metrics on indexing speed, preventing regressions during parser upgrades.
Milestone 8: Establish Query Latency Profiling
- Difficulty: Easy
- Knowledge Needed: Python timing utilities, Cypher EXPLAIN.
- Deliverable: Include Cypher query execution time metrics in debug logs and
cgcCLI output. - Behavioral Improvement: Developers can identify slow queries and optimize database constraints/indexes accordingly.
Milestone 9: Persistent Job Manager (L3, R6)
- Difficulty: Medium
- Knowledge Needed: SQLite, async job states.
- Deliverable: Replace the in-memory dict in
JobManagerwith a lightweight, embedded SQLite table (jobs.db) under.codegraphcontext/. - Behavioral Improvement: Long-running index jobs resume or report correct failed/completed states if the IDE or MCP server restarts.
Month 2: Core Database Optimization & Advanced Language Toolkits (Milestones 10–18)
Focus: True asynchronous driver interfaces, query optimizations, and implementing the stubbed programming language query toolkits.
Milestone 10: Implement Core Python *Toolkit Queries (L9)
- Difficulty: Medium
- Knowledge Needed: Python Tree-sitter AST, import hooks.
- Deliverable: Implement
PythonToolkitqueries for advanced tasks (e.g., identifying decorators, resolving dynamic import boundaries). - Behavioral Improvement: The AI assistant can execute target queries tailored specifically to Pythonic patterns instead of generic text searches.
Milestone 11: Implement JS/TS and TSX *Toolkit Queries
- Difficulty: Medium
- Knowledge Needed: JS/TS AST structures.
- Deliverable: Fill in the JS/TS and TSX toolkit query stubs to handle class overrides, export patterns, and React hook dependencies.
- Behavioral Improvement: Yields accurate query results for JS/TS codebases.
Milestone 12: Implement Go and Rust *Toolkit Queries
- Difficulty: Medium
- Knowledge Needed: Go and Rust syntax structures (traits, interfaces, structs, impl blocks).
- Deliverable: Implement toolkit stubs for Go (struct composition) and Rust (trait implementations, lifetimes).
- Behavioral Improvement: Allows the AI to query traits and interface compositions accurately.
Milestone 13: Implement Java and C# *Toolkit Queries
- Difficulty: Medium
- Knowledge Needed: JVM and .NET syntax patterns.
- Deliverable: Implement toolkit stubs for Java and C# to support generic parameter constraints, annotations, and properties.
- Behavioral Improvement: Provides accurate class inheritance hierarchies and interface implementations.
Milestone 14: Implement C and C++ *Toolkit Queries
- Difficulty: Hard
- Knowledge Needed: C/C++ AST, preprocessor patterns.
- Deliverable: Implement toolkits to track macro expansions and header inclusion graphs.
- Behavioral Improvement: AI can trace complex C++ macro chains and header-source relationships.
Milestone 15: Non-Blocking Asynchronous Database Drivers (L2)
- Difficulty: Hard
- Knowledge Needed: Python
asyncio, asynchronous DB drivers (neo4j.AsyncDriver,kuzuasync routines). - Deliverable: Refactor the database connection layer to use async calls, eliminating thread pools (
asyncio.to_thread) for database operations. - Behavioral Improvement: Enhances server throughput and reduces thread overhead under heavy concurrent MCP tool calls.
Milestone 16: DB Connection Pooling (L11)
- Difficulty: Medium
- Knowledge Needed: Database connection pooling.
- Deliverable: Implement connection pooling for Neo4j and KuzuDB adapters.
- Behavioral Improvement: Eliminates connection handshake overhead for consecutive tool calls, reducing query latency.
Milestone 17: Query Result Streaming (L8)
- Difficulty: Medium
- Knowledge Needed: Python generators, streaming JSON serialization.
- Deliverable: Implement a generator-based streaming query pipeline for large Cypher query results.
- Behavioral Improvement: Eliminates out-of-memory crashes when querying large graphs.
Milestone 18: KuzuDB Dialect Compatibility Layer
- Difficulty: Medium
- Knowledge Needed: KuzuDB Cypher constraints.
- Deliverable: Implement a query rewriter that converts standard Neo4j Cypher functions into KuzuDB-compatible Cypher.
- Behavioral Improvement: Standardizes Cypher features across all backends.
Month 3: Deep AST Parsing & Semantic Ingestion Upgrades (Milestones 19–27)
Focus: Enhancing parsers, supporting non-code assets, improving incremental ingestion, and adding type inference patterns.
Milestone 19: C++ Header Parser Disambiguation (L16)
- Difficulty: Easy
- Knowledge Needed: Tree-sitter C vs C++ ASTs.
- Deliverable: Check for pure C markers in
.hfiles to select either the C or C++ parser. - Behavioral Improvement: Reduces parse errors for pure C libraries.
Milestone 20: HTML and CSS Tree-Sitter Parsers (L17)
- Difficulty: Medium
- Knowledge Needed: HTML/CSS syntax trees.
- Deliverable: Add parsers for HTML tags and CSS class declarations.
- Behavioral Improvement: Bridges the gap between frontend templates and backend logic by connecting component classes to styles.
Milestone 21: SQL, Shell & YAML Parsers (L17)
- Difficulty: Medium
- Knowledge Needed: SQL dialects, Bash syntax, Tree-sitter.
- Deliverable: Extract database queries from source code and link them to parsed SQL schemas.
- Behavioral Improvement: Extends the dependency graph to cover database interactions and configuration files.
Milestone 22: Incremental Ingestion Concurrency Tuning
- Difficulty: Medium
- Knowledge Needed: Multi-processing, file lock queues.
- Deliverable: Implement worker pools using Python's
multiprocessingfor parsing, while serializing writes to the database. - Behavioral Improvement: Speeds up initial parsing on multi-core machines.
Milestone 23: Type Inference & Symbol Reference Resolution
- Difficulty: Hard
- Knowledge Needed: AST scope analysis, basic type inference.
- Deliverable: Implement a cross-file reference resolver to link function call parameters to class instantiations.
- Behavioral Improvement: Improves the accuracy of the call graph by reducing ambiguous function name links.
Milestone 24: Incremental SCIP Ingestion (L15)
- Difficulty: Hard
- Knowledge Needed: SCIP protocol specifications, git diffs.
- Deliverable: Implement incremental SCIP indexing based on git diffs.
- Behavioral Improvement: Reduces indexing times for large projects when using SCIP.
Milestone 25: Automated SCIP Installer Script (L14)
- Difficulty: Easy
- Knowledge Needed: Shell scripting, platform binaries.
- Deliverable: Create
cgc index setup-scipto download and install language-specific SCIP binaries. - Behavioral Improvement: Reduces setup friction for SCIP indexing.
Milestone 26: AST Cognitive Complexity Calculations
- Difficulty: Medium
- Knowledge Needed: Static analysis metrics.
- Deliverable: Implement cognitive complexity parsing alongside cyclomatic complexity.
- Behavioral Improvement: AI can identify hard-to-maintain code blocks, not just branch-heavy ones.
Milestone 27: Workspace Index Size Estimation Utility
- Difficulty: Easy
- Knowledge Needed: CLI user interface design.
- Deliverable: Create an indexing pre-flight check command showing estimated node count and DB disk usage.
- Behavioral Improvement: Helps users budget disk space before indexing large codebases.
Month 4: VS Code Extension Upgrades (Milestones 28–35)
Focus: Turning the VS Code extension into a fully featured visual and analytical assistant.
Milestone 28: Interactive Webview Control Dashboard
- Difficulty: Medium
- Knowledge Needed: VS Code Extension API, React build integration.
- Deliverable: Embed the local React dashboard within a VS Code webview panel.
- Behavioral Improvement: Users can view the codebase graph directly inside the IDE.
Milestone 29: CodeLens Complexity & Dependency Markers
- Difficulty: Medium
- Knowledge Needed: VS Code CodeLens API, CGC CLI queries.
- Deliverable: Overlay cyclomatic complexity and class hierarchies above code declarations.
- Behavioral Improvement: Developers see code metrics contextually while writing code.
Milestone 30: VS Code Inline Cypher Console
- Difficulty: Medium
- Knowledge Needed: VS Code Webview panels, Cypher execution.
- Deliverable: Implement an inline Cypher query editor with syntax highlighting and table previews.
- Behavioral Improvement: Power users can query the graph without leaving the IDE.
Milestone 31: Automatic Watcher Lifecycle Integration (L18)
- Difficulty: Easy
- Knowledge Needed: VS Code Workspace Event listeners.
- Deliverable: Automatically start the file watcher thread when a workspace with
.codegraphcontext/is opened. - Behavioral Improvement: Code modifications are indexed in the background without manual CLI intervention.
Milestone 32: Diagnostics Provider for Dead Code
- Difficulty: Medium
- Knowledge Needed: VS Code DiagnosticCollection API.
- Deliverable: Expose dead code detections as warnings in the VS Code "Problems" tab.
- Behavioral Improvement: Warns developers about unused parameters and dead functions in real-time.
Milestone 33: Context-Aware Navigation (Go to Definition)
- Difficulty: Hard
- Knowledge Needed: VS Code DefinitionProvider.
- Deliverable: Implement a definition provider powered by the CGC database graph.
- Behavioral Improvement: Accelerates navigation in dynamic languages where standard VS Code definitions fail.
Milestone 34: Graph-Guided Refactoring Previews
- Difficulty: Hard
- Knowledge Needed: VS Code WorkspaceEdit API.
- Deliverable: Show a refactoring preview panel listing files that will be impacted by renaming a symbol.
- Behavioral Improvement: Reduces regression risks during large refactors.
Milestone 35: One-Click Bundle Export UI
- Difficulty: Easy
- Knowledge Needed: VS Code extension commands.
- Deliverable: Add a button to export
.cgcbundles directly from the sidebar. - Behavioral Improvement: Simplifies sharing indexed codebase contexts with team members.
Month 5: ChatGPT Web & External LLM Integration (Milestones 36–43)
Focus: Supporting remote connections, writing browser extensions, and improving the website.
Milestone 36: WebSocket & SSE MCP Transport Protocol (L1)
- Difficulty: Hard
- Knowledge Needed: WebSockets, Server-Sent Events, JSON-RPC.
- Deliverable: Add WebSocket and SSE servers to the MCP server process (
cgc mcp start --transport ws). - Behavioral Improvement: Multiple clients and IDEs can connect to a single, shared CGC database concurrently.
Milestone 37: Web LLM Browser Extension (Chrome & Firefox)
- Difficulty: Hard
- Knowledge Needed: Web Extensions API, Content Scripts, IPC.
- Deliverable: Build a browser extension that securely connects ChatGPT, Claude, and Gemini web interfaces to the local CGC MCP daemon.
- Behavioral Improvement: Web-based LLMs can run code queries against local codebases securely.
Milestone 38: Web Extension Workspace Matcher
- Difficulty: Medium
- Knowledge Needed: Chrome Tab APIs, Local storage.
- Deliverable: Detect the GitHub URL or active tab project name and select the matching local database context automatically.
- Behavioral Improvement: Standardizes LLM responses without manual context switching.
Milestone 39: In-Browser Worker Parsing Optimizations
- Difficulty: Hard
- Knowledge Needed: Web Workers, WASM memory structures, Tree-sitter WASM.
- Deliverable: Optimize
parser.worker.tswith streaming uploads and file chunking. - Behavioral Improvement: Allows the browser explorer to parse large repositories without browser tab freezes.
Milestone 40: Multi-Engine Web Visualizer Upgrades
- Difficulty: Medium
- Knowledge Needed: React-force-graph, WebGL rendering.
- Deliverable: Update
CodeGraphViewer.tsxto support WebGL for rendering large graphs. - Behavioral Improvement: Renders repositories exceeding 10,000 files smoothly.
Milestone 41: Browser-Based Cypher Builder
- Difficulty: Medium
- Knowledge Needed: React, visual query builders.
- Deliverable: Add a drag-and-drop visual Cypher query builder to the website's explore tab.
- Behavioral Improvement: Simplifies querying the graph for users unfamiliar with Cypher syntax.
Milestone 42: Web-Based Bundle Comparison Panel
- Difficulty: Medium
- Knowledge Needed: React diff libraries.
- Deliverable: Build a visual dashboard to compare two
.cgcbundles and highlight structural changes. - Behavioral Improvement: Simplifies tracking structural changes across commits.
Milestone 43: Secure Origin Policy Configuration
- Difficulty: Easy
- Knowledge Needed: Web security, CORS headers.
- Deliverable: Add strict origin validation filters to CLI configs for external connections.
- Behavioral Improvement: Protects local database ports from unauthorized web requests.
Month 6: LLM API & Local Ollama Integrations (Milestones 44–50)
Focus: Adding AI-guided summarization, local vector embeddings, and creating documentation tutorials.
Milestone 44: LLM API Key Configuration CLI
- Difficulty: Easy
- Knowledge Needed: CLI inputs, config file management.
- Deliverable: Create the
cgc config set-keycommand to securely store OpenAI, Anthropic, and Gemini API keys. - Behavioral Improvement: Provides a unified interface for cloud LLM integrations.
Milestone 45: Local Ollama Model Integration
- Difficulty: Medium
- Knowledge Needed: Ollama HTTP API, local LLM configurations.
- Deliverable: Add an Ollama adapter supporting models like
qwen2.5-coderorllama3. - Behavioral Improvement: Enables offline code analysis and semantic summarization.
Milestone 46: AI-Guided Semantic Summarizer
- Difficulty: Hard
- Knowledge Needed: LLM prompts, batch processing.
- Deliverable: Build an ingestion pipeline stage that uses LLMs to generate summaries of functions and classes, saving them as properties in the graph.
- Behavioral Improvement: Allows AI assistants to search the graph using natural language concepts.
Milestone 47: Graph RAG Vector Embedding Ingestion
- Difficulty: Hard
- Knowledge Needed: Vector embeddings, Kuzu/Neo4j vector indices.
- Deliverable: Generate embeddings of code summaries and store them in the graph database.
- Behavioral Improvement: Combines keyword search with structural graph queries for more accurate results.
Milestone 48: High-Level Architecture Blogs
- Difficulty: Easy
- Knowledge Needed: Technical writing, blogging structure.
- Deliverable: Publish a blog series detailing CGC's design (e.g., Tree-sitter parsers, database adapters, and MCP servers).
- Behavioral Improvement: Enhances community engagement and adoption.
Milestone 50: Interactive Walkthrough and Demos
- Difficulty: Easy
- Knowledge Needed: Video editing, documentation design.
- Deliverable: Produce video tutorials demonstrating VS Code integrations, browser extensions, and CLI commands.
- Behavioral Improvement: Lowers the barrier to entry for new users.
Milestone 50: Production-Ready Release (v1.0.0)
- Difficulty: Medium
- Knowledge Needed: PyPI workflows, release lifecycle management.
- Deliverable: Stabilize the API, verify all tests, and publish v1.0.0 to PyPI.
- Behavioral Improvement: Delivers a production-ready code intelligence tool.
4. Roadmap Implementation Summary
This roadmap prioritizes foundational stability and codebase cleanup in the first month before introducing advanced semantic and AI integrations.
Month 1 Month 2 Month 3 Month 4 Month 5 Month 6
+----------------+ +----------------+ +----------------+ +----------------+ +----------------+ +----------------+
| Refactoring | -----> | DB Optimization| -----> | Semantic Parse | -----> | VS Code Engine | -----> | ChatGPT Web | -----> | Ollama & RAG |
| & Benchmarks | | & Language Stubs| | & Incremental | | & Integrations | | Integrations | | Releases v1.0 |
+----------------+ +----------------+ +----------------+ +----------------+ +----------------+ +----------------+