An automated PR reviewer that understands your codebase, not just the diff — and posts severity-rated inline comments directly to GitHub.
How it works
- Parsing — tree-sitter parses the repository into ASTs, chunked by function and class boundaries, preserving structural context that plain text splitting loses.
- Indexing — chunks are embedded and stored in ChromaDB. A BM25 keyword index is built alongside the vector store so that retrieval can combine semantic and lexical matches.
- Retrieval — for each changed file in the PR, the system retrieves the top-k semantically and lexically similar chunks from the rest of the codebase.
- Review — the retrieved context and the diff are passed through LangChain prompt templates to the OpenAI and Gemini APIs. The model produces severity-rated inline comments anchored to specific line ranges.
- Posting — comments are posted via the GitHub API as review threads. Across the 10+ PRs reviewed so far, this has made review feedback more consistent.
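
The retrieval and posting steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the README does not say how vector and BM25 results are merged, so reciprocal rank fusion (RRF) is swapped in here as one common choice. The helper names `fuseRankings` and `buildReviewPayload`, the shape of the `findings` objects, and the chunk IDs are all hypothetical. The payload field names (`event`, `comments`, `path`, `line`, `start_line`, `side`) follow GitHub's "create a review" REST endpoint as I understand it; check the official docs before relying on them.

```javascript
// Hypothetical helper: merge two best-first lists of chunk IDs using
// reciprocal rank fusion. k = 60 is a conventional constant, assumed here.
function fuseRankings(vectorHits, keywordHits, topK = 5, k = 60) {
  const scores = new Map();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((chunkId, rank) => {
      // rank is 0-based, so the best hit in a list contributes 1 / (k + 1).
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topK)
    .map(([chunkId]) => chunkId);
}

// Hypothetical helper: turn model findings into a review payload.
// Each finding is assumed to look like
// { path, startLine, endLine, severity, message }.
function buildReviewPayload(findings) {
  return {
    event: "COMMENT",
    comments: findings.map((f) => ({
      path: f.path,
      side: "RIGHT", // comment on the new version of the file
      // Only multi-line comments carry a start line.
      ...(f.startLine < f.endLine
        ? { start_line: f.startLine, start_side: "RIGHT" }
        : {}),
      line: f.endLine,
      body: `**[${f.severity.toUpperCase()}]** ${f.message}`,
    })),
  };
}

// Usage with made-up chunk IDs: chunks found by both retrievers rise to the top.
const ranked = fuseRankings(
  ["auth.js#login", "db.js#query", "util.js#hash"],
  ["util.js#hash", "auth.js#login", "routes.js#handler"],
  3
);
console.log(ranked);
```

RRF uses only rank positions, so the embedding similarities and BM25 scores never need to be put on a common scale. That is the main reason it is a convenient default for hybrid retrieval.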
Why AST-based chunking
Arbitrary text splits break function boundaries and lose scope context. AST chunking keeps each chunk semantically whole — the model sees complete function signatures, not fragments.
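
A small sketch of the difference, assuming the parser has already produced node spans. To stay self-contained it does not call tree-sitter: the `spans` array is a hand-built stand-in for what a parser would return, using the same kind of `type`, `startIndex` and `endIndex` fields that tree-sitter nodes expose. The function names `splitFixed` and `chunkBySpans` are hypothetical.

```javascript
const source = [
  "function add(a, b) {",
  "  return a + b;",
  "}",
  "",
  "function sub(a, b) {",
  "  return a - b;",
  "}",
].join("\n");

// Naive approach: cut every `size` characters, ignoring code structure.
function splitFixed(text, size) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Structure-aware approach: one chunk per function or class span.
// The node type names are assumptions modelled on the JavaScript grammar.
function chunkBySpans(
  text,
  spans,
  wanted = new Set(["function_declaration", "class_declaration"])
) {
  return spans
    .filter((span) => wanted.has(span.type))
    .map((span) => ({
      type: span.type,
      text: text.slice(span.startIndex, span.endIndex),
    }));
}

// Stand-in for parser output, computed from the source string.
const subStart = source.indexOf("function sub");
const spans = [
  { type: "function_declaration", startIndex: 0, endIndex: source.indexOf("}") + 1 },
  { type: "function_declaration", startIndex: subStart, endIndex: source.length },
];

const fixedChunks = splitFixed(source, 30);
const astChunks = chunkBySpans(source, spans);

console.log(JSON.stringify(fixedChunks[0])); // ends partway through add()
console.log(astChunks.map((chunk) => chunk.text)); // one complete function per chunk
```

With a 30-character window the first fixed-size chunk stops inside the body of `add`, while each span-based chunk holds a full signature and body.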
Stack
Node.js · ChromaDB · tree-sitter · LangChain · OpenAI API · Gemini API · GitHub API