PR Reviewer

A GitHub bot that automatically reviews pull requests and posts inline code suggestions with severity ratings. It is built on a RAG pipeline: tree-sitter parses the codebase into ASTs, ChromaDB provides vector search over the resulting chunks, and the OpenAI and Gemini APIs generate context-aware feedback.

An automated PR reviewer that understands your codebase, not just the diff — and posts severity-rated inline comments directly to GitHub.

How it works

  1. Parsing: tree-sitter parses the repository into ASTs, which are then chunked at function and class boundaries. This preserves structural context that plain text splitting loses.
  2. Indexing: chunks are embedded and stored in ChromaDB. A BM25 keyword index is built over the same chunks so that retrieval can be hybrid (semantic plus lexical).
  3. Retrieval: for each changed file in the PR, the system retrieves the top-k chunks from the rest of the codebase that are most similar semantically (embeddings) and lexically (BM25).
  4. Review: the retrieved context and the diff are passed through LangChain prompt templates to the OpenAI and Gemini APIs. The model produces severity-rated inline comments anchored to specific line ranges.
  5. Posting: comments are posted via the GitHub API as review threads. Used on 10+ PRs so far, this has made review feedback more consistent.
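
Two of the steps above can be sketched without any external dependencies. This README does not say how the semantic and BM25 results are merged, so the example below assumes reciprocal rank fusion (RRF), a common way to combine ranked lists. The names `fuseRankings`, `Finding`, `Severity` and `toReviewPayload`, and the three-level severity scale, are hypothetical. The payload fields follow GitHub's "create a review for a pull request" REST endpoint as I understand it. Treat this as an illustration, not the project's actual code.

```typescript
// Reciprocal rank fusion: score(id) = sum over rankings of 1 / (k + rank),
// with rank starting at 1. k = 60 is a commonly used default, not a project setting.
function fuseRankings(rankings: string[][], topK: number, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topK)
    .map(([id]) => id);
}

type Severity = "info" | "warning" | "critical"; // hypothetical severity scale

// One model finding, anchored to a line range in the PR's new version of a file.
interface Finding {
  path: string;
  startLine: number;
  endLine: number;
  severity: Severity;
  message: string;
}

// One entry of the `comments` array in a GitHub review request body.
interface ReviewComment {
  path: string;
  line: number; // last line of the commented range
  side: "RIGHT"; // comment on the new version of the file
  start_line?: number; // set only for multi-line comments
  start_side?: "RIGHT";
  body: string;
}

function toReviewComment(f: Finding): ReviewComment {
  const comment: ReviewComment = {
    path: f.path,
    line: f.endLine,
    side: "RIGHT",
    body: `**[${f.severity.toUpperCase()}]** ${f.message}`,
  };
  if (f.startLine < f.endLine) {
    comment.start_line = f.startLine;
    comment.start_side = "RIGHT";
  }
  return comment;
}

// Request body for POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews.
function toReviewPayload(findings: Finding[]) {
  return {
    event: "COMMENT" as const,
    body: `Automated review: ${findings.length} comment(s).`,
    comments: findings.map(toReviewComment),
  };
}

// Usage: chunk ids ranked by vector search and by BM25, best first.
const fused = fuseRankings([["a", "b", "c"], ["b", "d", "a"]], 3);
console.log(fused);
```

RRF only needs rank positions, so it avoids having to put cosine similarities and BM25 scores on a common scale. A weighted sum of normalized scores would be an equally reasonable choice.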

Why AST-based chunking

Arbitrary text splits cut across function boundaries and lose scope context. AST chunking keeps each chunk semantically whole, so the model sees complete function signatures and bodies rather than fragments.
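
To make the difference concrete, here is a minimal sketch that contrasts the two strategies. It does not call tree-sitter. The `AstNode` interface is an assumption that mirrors a few fields a tree-sitter syntax node exposes (`type`, `startIndex`, `endIndex`, `children`), and the tree is built by hand. The node type names come from the tree-sitter JavaScript grammar. `chunkByAst` and `chunkByLength` are hypothetical names.

```typescript
// Minimal node shape, modelled on a subset of tree-sitter's syntax node fields.
interface AstNode {
  type: string;
  startIndex: number; // offset of the node's first character
  endIndex: number; // offset just past the node's last character
  children: AstNode[];
}

interface Chunk {
  kind: string;
  text: string;
}

// Node types treated as chunk boundaries (other grammars use other names).
const CHUNK_TYPES = new Set(["function_declaration", "class_declaration"]);

// Walk the tree and emit one chunk per outermost function or class.
function chunkByAst(root: AstNode, source: string): Chunk[] {
  const chunks: Chunk[] = [];
  const visit = (node: AstNode): void => {
    if (CHUNK_TYPES.has(node.type)) {
      chunks.push({ kind: node.type, text: source.slice(node.startIndex, node.endIndex) });
      return; // do not descend: nested definitions stay inside their parent chunk
    }
    node.children.forEach((child) => visit(child));
  };
  visit(root);
  return chunks;
}

// Naive alternative for comparison: fixed-size character windows.
function chunkByLength(source: string, size: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < source.length; i += size) {
    out.push(source.slice(i, i + size));
  }
  return out;
}

// Hand-built tree standing in for a real parse of `source`.
const fnText = "function add(a, b) { return a + b; }";
const classText = "class Stack { push(x) {} }";
const source = `${fnText}\n${classText}\n`;
const classStart = source.indexOf(classText);
const tree: AstNode = {
  type: "program",
  startIndex: 0,
  endIndex: source.length,
  children: [
    { type: "function_declaration", startIndex: 0, endIndex: fnText.length, children: [] },
    { type: "class_declaration", startIndex: classStart, endIndex: classStart + classText.length, children: [] },
  ],
};

console.log(chunkByAst(tree, source)); // one chunk per definition
console.log(chunkByLength(source, 20)); // windows that cut through both definitions
```

In a real pipeline the tree would come from a tree-sitter parser rather than being written by hand. Large functions or classes would probably need a further size limit, which this sketch leaves out.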

Stack

Node.js · ChromaDB · tree-sitter · LangChain · OpenAI API · Gemini API · GitHub API