TreeSitter Chunker Documentation
TreeSitter Chunker is a Python library for semantically chunking source code using Tree-sitter parsers. It splits code into meaningful units such as functions, classes, and methods, which suits code analysis, embedding generation, and documentation generation.
Key Features
Semantic Understanding: Extracts functions, classes, and methods based on the AST
High Performance: Efficient parser caching and pooling (11.9x speedup)
Language Support: Python, JavaScript, Rust, C, C++ with plugin architecture
Multiple Export Formats: JSON, JSONL, Parquet, GraphML, Neo4j, SQLite
Thread Safe: Designed for concurrent processing
Zero Config: Works out of the box with sensible defaults
Universal Language Support: Auto-download 100+ languages
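The thread-safety claim above suggests fanning files out across a thread pool. The following is a minimal sketch, assuming the chunking call can be invoked safely from several threads at once. `chunk_many` and its `chunk_fn` parameter are hypothetical names introduced for illustration and are not part of the library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def chunk_many(
    paths: Iterable[str],
    chunk_fn: Callable[[str], list],
    max_workers: int = 4,
) -> Dict[str, list]:
    """Run chunk_fn on every path in a thread pool and map path -> chunks.

    Hypothetical helper for illustration; not a library function.
    """
    path_list: List[str] = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        results = list(pool.map(chunk_fn, path_list))
    return dict(zip(path_list, results))
```

With the library installed, `chunk_fn` could be something like `lambda p: chunk_file(p, language="python")`. Passing the function in as a parameter keeps the sketch independent of any particular chunker.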
Quick Example
from chunker import chunk_file

# Chunk a Python file
chunks = chunk_file("example.py", language="python")

# Process results
for chunk in chunks:
    print(f"{chunk.node_type} at lines {chunk.start_line}-{chunk.end_line}")
    # chr(10) is the newline character, so this prints the chunk's first line
    print(f"  {chunk.content.split(chr(10))[0]}...")
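Once chunks are in hand, a common next step is to write them out for downstream tools. The sketch below serializes chunks to JSON Lines with the standard library only. It is not the library's own exporter, and it assumes nothing beyond the four attributes used in the example above (`node_type`, `start_line`, `end_line`, `content`). `ExampleChunk` and `chunks_to_jsonl` are hypothetical names, and the sample data is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ExampleChunk:
    """Stand-in with the four attributes the quick example reads (hypothetical)."""
    node_type: str
    start_line: int
    end_line: int
    content: str


def chunks_to_jsonl(chunks: Iterable) -> str:
    """Serialize chunks to JSON Lines: one JSON object per chunk, one per line."""
    lines = []
    for chunk in chunks:
        record = {
            "node_type": chunk.node_type,
            "start_line": chunk.start_line,
            "end_line": chunk.end_line,
            "content": chunk.content,
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Made-up sample data standing in for the result of chunk_file(...)
sample = [
    ExampleChunk("function_definition", 1, 2, "def f():\n    return 1"),
    ExampleChunk("class_definition", 4, 5, "class C:\n    pass"),
]
print(chunks_to_jsonl(sample))
```

Because the helper only reads attributes, the objects returned by `chunk_file` could presumably be passed in directly, provided they expose the same four fields.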
Installation
# Using pip
pip install treesitter-chunker
# Using uv (recommended for development)
uv pip install treesitter-chunker
# From source
git clone https://github.com/Consiliency/treesitter-chunker
cd treesitter-chunker
pip install -e ".[dev]"