awhvish

An automated PR reviewer that draws on your whole codebase, not just the diff, and posts severity-rated inline comments directly to GitHub.

How it works

Parsing: tree-sitter parses the repository into ASTs, which are chunked along function and class boundaries. This preserves structural context that plain text splitting loses.
Indexing: chunks are embedded and stored in ChromaDB. A BM25 keyword index is built alongside the embeddings for hybrid retrieval.
Retrieval: for each changed file in the PR, the system retrieves the top-k chunks from the rest of the codebase that are most similar to it, both semantically (embeddings) and lexically (BM25).
Review: the retrieved context and the diff are passed through LangChain prompt templates to the OpenAI and Gemini APIs. The model produces severity-rated inline comments anchored to specific line ranges.
Posting: comments are posted via the GitHub API as review threads. Across the 10+ PRs reviewed so far, this has made review feedback more consistent.
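
The steps above do not say how the semantic and BM25 results are merged into a single top-k list. One common option is reciprocal rank fusion (RRF). The sketch below assumes that approach; `fuseRankings`, the `Ranking` type, and the chunk IDs are hypothetical names used only for illustration, not the project's actual code.

```typescript
// Hypothetical shape: chunk IDs from one retriever, best match first.
type Ranking = string[];

/**
 * Reciprocal rank fusion (assumed merge strategy, not confirmed by this README).
 * Each chunk scores the sum of 1 / (k + rank) over the rankings it appears in,
 * with 1-based ranks. k = 60 is a commonly used default.
 */
function fuseRankings(rankings: Ranking[], topK: number, k = 60): string[] {
  // Prototype-free object so chunk IDs never collide with built-in keys.
  const scores: Record<string, number> = Object.create(null);

  for (const ranking of rankings) {
    ranking.forEach((chunkId, index) => {
      scores[chunkId] = (scores[chunkId] ?? 0) + 1 / (k + index + 1);
    });
  }

  return Object.keys(scores)
    // Highest fused score first; ties broken by ID for a stable order.
    .sort((a, b) => scores[b] - scores[a] || a.localeCompare(b))
    .slice(0, topK);
}

// Illustrative results for one changed file: vector search and BM25.
const semantic: Ranking = ["auth.ts#login", "session.ts#refresh", "db.ts#getUser"];
const lexical: Ranking = ["db.ts#getUser", "auth.ts#login", "util.ts#hash"];

const fused = fuseRankings([semantic, lexical], 3);
console.log(fused);
```

Chunks that rank well in both lists rise to the top, while a chunk found by only one retriever can still make the cut. Rank-based fusion also avoids having to normalize embedding distances against BM25 scores, which live on different scales.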

Why AST-based chunking

Arbitrary text splits cut across function boundaries and lose scope context. AST chunking keeps each chunk semantically whole, so the model sees complete functions, signature included, rather than fragments.
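
To make the difference concrete, here is a minimal sketch comparing the two strategies. It assumes node objects that carry the `type`, `startIndex`, and `endIndex` fields tree-sitter exposes on syntax nodes. The `NodeLike` interface, both function names, and the choice of node types to chunk on are illustrative assumptions, and the node offsets are computed by hand here instead of coming from a real parse.

```typescript
// Minimal subset of the fields a tree-sitter syntax node provides.
interface NodeLike {
  type: string;
  startIndex: number;
  endIndex: number;
}

// Assumed set of node types to chunk on (names from the JavaScript grammar).
const CHUNK_TYPES = ["function_declaration", "class_declaration"];

// AST-style chunking: one chunk per function or class node.
function chunkByNodes(source: string, topLevel: NodeLike[]): string[] {
  return topLevel
    .filter((node) => CHUNK_TYPES.indexOf(node.type) !== -1)
    .map((node) => source.slice(node.startIndex, node.endIndex));
}

// Naive chunking: fixed-size character windows, blind to syntax.
function chunkBySize(source: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < source.length; i += size) {
    chunks.push(source.slice(i, i + size));
  }
  return chunks;
}

const source = [
  "function add(a, b) {",
  "  return a + b;",
  "}",
  "",
  "function mul(a, b) {",
  "  return a * b;",
  "}",
].join("\n");

// Hand-built stand-ins for the parser's top-level nodes.
const nodes: NodeLike[] = [
  { type: "function_declaration", startIndex: 0, endIndex: source.indexOf("}") + 1 },
  { type: "function_declaration", startIndex: source.indexOf("function mul"), endIndex: source.length },
];

const astChunks = chunkByNodes(source, nodes);
const fixedChunks = chunkBySize(source, 30);

console.log(astChunks);   // each entry is one complete function
console.log(fixedChunks); // the first window stops partway through add's body
```

With the fixed 30-character window, the first chunk ends in the middle of the `return` statement, so an embedding of that chunk describes a fragment. The node-based chunks each hold one whole function.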

Stack

Node.js · ChromaDB · tree-sitter · LangChain · OpenAI API · Gemini API · GitHub API