ractogateway.rag.processors.lemmatizer

Lemmatization processor built on NLTK's WordNetLemmatizer. NLTK is imported lazily, so it is only needed when this processor is actually used.

Install with: pip install ractogateway[rag-nlp]
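A lazy import of an optional dependency is commonly written along the following lines. This is a minimal sketch of the general pattern, not the package's actual code: `load_optional` is a hypothetical helper name, and the error message wording is an assumption.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper).

    If the module is missing, re-raise ImportError with an install hint
    that names the relevant extra.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this processor. "
            f"Install with: pip install ractogateway[{extra}]"
        ) from exc
```

Calling such a helper inside the processor, rather than at module import time, keeps the rest of the package importable when the extra is not installed.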

Note: Lemmatization changes the surface form of text and can degrade embedding quality for neural models (which were trained on unmodified text). Use this processor only when building keyword-index pipelines or when explicitly required for your retrieval strategy.

class ractogateway.rag.processors.lemmatizer.Lemmatizer(use_pos_tagging=True)[source]

Bases: BaseProcessor

Reduce words to their base (lemma) form using NLTK WordNet.

Parameters:

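use_pos_tagging (bool) – If True (the default), tag each token's part of speech and use it to guide lemmatization. NLTK's WordNetLemmatizer treats a word as a noun when no part of speech is given, so verbs such as "running" are left unchanged without a tag but reduce to "run" with one. Tagging is slightly slower.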
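POS-aware lemmatization with WordNet usually needs a step that converts the tagger's Penn Treebank tags into WordNet's four part-of-speech codes. The sketch below shows one common way to do that. It is an illustration under assumptions, not this class's confirmed implementation: `penn_to_wordnet` is a hypothetical name, and the codes are written as plain strings that match NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV` constants.

```python
# Coarse mapping from the first letter of a Penn Treebank tag to a
# WordNet part-of-speech code ("a" adjective, "v" verb, "n" noun, "r" adverb).
_PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag: str) -> str:
    """Return the WordNet POS code for a Penn Treebank tag (hypothetical helper).

    Tags with no WordNet counterpart (determiners, prepositions, empty
    strings, ...) fall back to "n", the lemmatizer's own default.
    """
    return _PENN_PREFIX_TO_WORDNET.get(tag[:1], "n")


print(penn_to_wordnet("VBG"))  # tag for a gerund such as "running"
print(penn_to_wordnet("JJ"))   # tag for an adjective
print(penn_to_wordnet("DT"))   # determiner, no WordNet counterpart
```

With NLTK installed, the returned code would be passed as the `pos` argument of `WordNetLemmatizer.lemmatize`.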

process(text)[source]

Lemmatize text and return the transformed string.

Parameters:

text (str) – Input text (chunk content or raw document content).

Return type:

str

Returns:

str – Processed text. Must be a non-empty string when input is non-empty.
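The return contract above (a string, and non-empty whenever the input is non-empty) can be checked with a small wrapper when testing a pipeline. This is a sketch only: `check_process_contract` is a hypothetical helper that is not part of the package, and it accepts any callable with the same shape as `process`.

```python
from typing import Callable


def check_process_contract(process: Callable[[str], str], text: str) -> str:
    """Run a processor and verify the documented return contract.

    Hypothetical test helper: raises if the result is not a string, or if
    a non-empty input produced an empty result.
    """
    result = process(text)
    if not isinstance(result, str):
        raise TypeError(f"processor returned {type(result).__name__}, expected str")
    if text and not result:
        raise ValueError("processor returned an empty string for non-empty input")
    return result


# Stand-in processor so the sketch runs without NLTK installed.
print(check_process_contract(str.lower, "Cats Running"))
```

With the rag-nlp extra installed, the same check could be applied to `Lemmatizer(use_pos_tagging=True).process`.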