ractogateway.rag.chunkers.sentence_chunker

Sentence-aware chunker that uses NLTK's sent_tokenize (imported lazily, so NLTK is only needed when this chunker is actually used).

Install with: pip install ractogateway[rag-nlp]
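The lazy import mentioned above means NLTK is loaded at the point of use rather than at module import time. This typically lets the rest of the package import cleanly without the `rag-nlp` extra. The following is a minimal sketch of that pattern, not ractogateway's actual code. `require_module` and `load_sent_tokenize` are hypothetical names, and the error text is an assumption.

```python
import importlib
from types import ModuleType


def require_module(name: str, extra: str = "rag-nlp") -> ModuleType:
    """Import *name* at call time, pointing users at the install extra if missing.

    Hypothetical helper illustrating the lazy-import pattern; not part of
    ractogateway's public API.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught too.
        raise ImportError(
            f"{name!r} is required for this feature. "
            f"Install with: pip install ractogateway[{extra}]"
        ) from exc


def load_sent_tokenize():
    # NLTK is imported here, when the tokenizer is first requested,
    # rather than at the top of the module.
    tokenize_module = require_module("nltk.tokenize")
    return tokenize_module.sent_tokenize
```

Python caches imported modules in `sys.modules`, so repeated calls after the first one are cheap.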

class ractogateway.rag.chunkers.sentence_chunker.SentenceChunker(sentences_per_chunk=5, overlap_sentences=1, language='english')

Bases: BaseChunker

Split text into groups of sentences using NLTK.

Parameters:
  • sentences_per_chunk (int) – Number of sentences per chunk.

  • overlap_sentences (int) – Number of sentences from the end of each chunk to repeat at the start of the next chunk.

  • language (str) – Language for the NLTK sentence tokenizer (default: "english").
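Together, `sentences_per_chunk` and `overlap_sentences` describe a sliding window over the sentence list that advances by `sentences_per_chunk - overlap_sentences` sentences per chunk. The sketch below illustrates those semantics on already-split sentences. `group_sentences` is a hypothetical function. The validation rule and the shorter final window are assumptions, not confirmed behaviour of SentenceChunker.

```python
def group_sentences(
    sentences: list[str],
    sentences_per_chunk: int = 5,
    overlap_sentences: int = 1,
) -> list[list[str]]:
    """Group sentences into windows that overlap by ``overlap_sentences``.

    Sketch of the parameter semantics only; not SentenceChunker's code.
    """
    # Assumed rule: an overlap >= the window size would never advance.
    # This check also rejects sentences_per_chunk < 1.
    if not 0 <= overlap_sentences < sentences_per_chunk:
        raise ValueError("overlap_sentences must be in [0, sentences_per_chunk)")

    step = sentences_per_chunk - overlap_sentences
    groups: list[list[str]] = []
    for start in range(0, len(sentences), step):
        groups.append(sentences[start:start + sentences_per_chunk])
        if start + sentences_per_chunk >= len(sentences):
            break  # this window already reaches the final sentence
    return groups


# Example: 7 sentences, windows of 3, overlap 1, so the window advances by 2.
example = group_sentences([f"S{i}." for i in range(1, 8)], 3, 1)
# example == [['S1.', 'S2.', 'S3.'], ['S3.', 'S4.', 'S5.'], ['S5.', 'S6.', 'S7.']]
```

When the sentence count does not line up with the step, this sketch lets the last window be shorter than `sentences_per_chunk` instead of padding it or dropping it.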

chunk(document)

Split document into chunks.

Parameters:

document (Document) – The fully loaded document to split.

Return type:

list[Chunk]

Returns:

list[Chunk] – Ordered list of chunks. Consecutive chunks share overlap_sentences sentences; with overlap_sentences=0 they do not overlap.
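To make the ordering and overlap of the returned list easy to inspect, the sketch below turns raw text into indexed records. `chunk_text` is a hypothetical function, not the library's `chunk()` method. `ChunkSketch` and its fields are illustrative stand-ins for `Chunk`, whose real attributes are not documented here. A crude regex splitter replaces NLTK so the example runs without downloading tokenizer data.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkSketch:
    """Hypothetical stand-in for ractogateway's Chunk; field names are assumed."""
    index: int           # position in the returned list
    text: str            # the chunk's sentences joined with spaces
    first_sentence: int  # index of the chunk's first sentence in the document


def naive_sent_split(text: str) -> list[str]:
    # Crude splitter on ., ! or ? followed by whitespace; used only so this
    # sketch runs without NLTK.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(
    text: str,
    sentences_per_chunk: int = 5,
    overlap_sentences: int = 1,
    split: Callable[[str], list[str]] = naive_sent_split,
) -> list[ChunkSketch]:
    if not 0 <= overlap_sentences < sentences_per_chunk:
        raise ValueError("overlap_sentences must be in [0, sentences_per_chunk)")
    sentences = split(text)
    step = sentences_per_chunk - overlap_sentences
    chunks: list[ChunkSketch] = []
    start = 0
    while start < len(sentences):
        window = sentences[start:start + sentences_per_chunk]
        chunks.append(ChunkSketch(len(chunks), " ".join(window), start))
        if start + sentences_per_chunk >= len(sentences):
            break  # the last window already covers the final sentence
        start += step
    return chunks
```

For closer parity with the documented tokenizer, you could pass `split=lambda t: nltk.sent_tokenize(t, language="english")`. This requires NLTK and its sentence tokenizer data to be installed.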