Chunking Strategies for RAG: Splitting Your Documents
Part of the RAG series.
Chunking is how you split your documents into the pieces that get embedded and retrieved in RAG. Chunk size, boundaries, and overlap strongly affect retrieval quality: too small and you lose context; too large and you mix topics and waste context window. This post walks through common chunking strategies and practical choices.
Why Chunking Matters
RAG retrieves chunks, not full documents. Each chunk is embedded and stored; at query time you fetch the top-k chunks and send them to the LLM. So chunk boundaries define the unit of retrieval. If a chunk cuts a sentence or concept in half, retrieval and reading both suffer. If chunks are huge, you use context window on irrelevant parts and blur the signal. Good chunking keeps each piece self-contained and sized for both your embedder and your context window.
Fixed-Size Chunking
The simplest approach: split text into segments of N characters or tokens, optionally with a small overlap between consecutive chunks so that phrases that span a boundary still appear in at least one chunk. Overlap of 50–200 characters (or ~10–20% of chunk size) is common.
```python
def chunk_fixed(text, size=512, overlap=50):
    # Fixed-size chunking with overlap. Requires overlap < size,
    # otherwise start never advances and the loop runs forever.
    start = 0
    while start < len(text):
        end = start + size
        yield text[start:end]
        start = end - overlap
```

Pros: simple, predictable. Cons: can split mid-sentence or mid-concept; no awareness of structure.
Sentence- and Paragraph-Based Chunking
Split on sentence or paragraph boundaries so each chunk is a coherent unit. You still enforce a max size: e.g. "up to N sentences" or "up to N paragraphs," and when you hit the limit you start a new chunk. This avoids cutting sentences in half and often improves readability for the model.
- Sentence-based. Split on sentence boundaries (e.g. with a sentence tokenizer or regex). Group sentences until you reach a target token or character count, then start a new chunk.
- Paragraph-based. Split on double newlines or similar. Good for docs that are already well structured.
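A minimal sketch of the sentence-based variant, using a naive regex splitter (a real sentence tokenizer, e.g. from nltk or spaCy, handles abbreviations and edge cases better). Sentences are grouped until a character budget is reached, then a new chunk begins:

```python
import re

def chunk_sentences(text, max_chars=800):
    # Naive split on ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, length = [], [], 0
    for sent in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and length + len(sent) > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sent)
        length += len(sent) + 1  # +1 for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks
```

No sentence is ever cut in half; a single sentence longer than `max_chars` still becomes its own (oversized) chunk, which you may want to handle separately.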
Semantic Chunking
Use a model or heuristic to find "natural" boundaries: topic shifts, section headers, or embedding-based similarity. The idea is to keep highly related text together and split when meaning changes. Some libraries use embeddings to decide where to split (e.g. when similarity between adjacent sentences drops). More accurate but more compute and complexity.
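The embedding-based approach can be sketched as follows. This is a toy illustration, not a production implementation: `toy_embed` is a stand-in bag-of-words "embedding" so the example runs without a model; in practice you would swap in a real embedding model and tune the similarity threshold to your corpus.

```python
import re
from collections import Counter
from math import sqrt

def toy_embed(sentence):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk_semantic(text, threshold=0.2, embed=toy_embed):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        # Split where similarity between adjacent sentences drops:
        # a likely topic boundary.
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```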
Practical Choices
- Chunk size. Often 256–1024 tokens (or ~200–800 words) per chunk. Small enough to keep focus, large enough to hold a full thought. Match to your embedder's max length and how much context you send to the LLM.
- Overlap. 10–20% overlap is a common starting point. Reduces boundary effects; slightly increases storage and indexing cost.
- Metadata. Attach doc id, section, title, etc. to each chunk so you can filter or cite by source.
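The metadata point can be made concrete with a small helper. The field names here are illustrative, not a fixed schema; use whatever your vector store expects:

```python
def make_chunk_records(doc_id, title, chunks):
    # Attach provenance metadata to each chunk so retrieved results
    # can be filtered by source and cited back to the document.
    return [
        {
            "id": f"{doc_id}-{i}",
            "doc_id": doc_id,
            "title": title,
            "position": i,
            "text": text,
        }
        for i, text in enumerate(chunks)
    ]
```

Each record is then embedded and upserted into the vector store; at query time the `doc_id`, `title`, and `position` fields support filtering, deduplication, and citation.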
There is no single best strategy for every corpus. Start with sentence- or paragraph-aware chunking and a sensible size; tune overlap and size with retrieval and end-to-end quality. For how those chunks are retrieved, see Retrieval Methods in RAG and Evaluating RAG Systems.
Get in touch
Questions about RAG or AI knowledge systems? Tell us about your project.