ractogateway.rag.readers.html_reader

HTML reader that uses the standard-library html.parser module, so no extra dependencies are needed.

class ractogateway.rag.readers.html_reader.HtmlReader

Bases: BaseReader

Extract visible text from HTML files using the stdlib HTML parser.

No external dependencies required.
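The following is a minimal sketch of visible-text extraction with the stdlib parser. It is an illustration of the general technique, not HtmlReader's actual implementation. It assumes that "visible" means every text node outside <script> and <style> elements, joined one chunk per line. The names _VisibleTextParser and extract_visible_text are hypothetical.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    # Assumption: these are the only elements treated as invisible.
    _SKIP = frozenset({"script", "style"})

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; inside handle_data.
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # html.parser lower-cases tag names before calling this hook.
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if self._skip_depth == 0:
            text = data.strip()
            if text:  # drop whitespace-only nodes between tags
                self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def extract_visible_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Counting nesting depth instead of using a boolean keeps the skip state correct even if a malformed document nests skipped elements.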

Accepts a file path (str / Path), raw bytes, or any binary file-like object with a .read() method.
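The three accepted input forms can be normalised to raw bytes before parsing. The sketch below shows one way to do that. The helper names read_source_bytes and decode_html are hypothetical, and the UTF-8 default with errors="replace" is an assumption rather than documented HtmlReader behaviour.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

# A str is treated as a file path, matching the description above; bytes are raw content.
Source = Union[str, Path, bytes, bytearray, BinaryIO]


def read_source_bytes(source: Source) -> bytes:
    """Hypothetical helper: turn a path, raw bytes, or binary file-like object into bytes."""
    if isinstance(source, (str, Path)):
        return Path(source).read_bytes()
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if hasattr(source, "read"):
        data = source.read()
        # A text-mode stream returns str, which this sketch rejects.
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("file-like object must be opened in binary mode")
        return bytes(data)
    raise TypeError(f"unsupported source type: {type(source).__name__}")


def decode_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Assumption: decode as UTF-8, replacing undecodable bytes instead of failing."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Usage: raw bytes and an in-memory binary stream yield the same content.
    raw = b"<p>caf\xc3\xa9</p>"
    assert read_source_bytes(raw) == read_source_bytes(io.BytesIO(raw))
    print(decode_html(read_source_bytes(io.BytesIO(raw))))
```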

property supported_extensions: frozenset[str]

Lower-case file extensions, including the leading dot, that this reader handles (for example, {".pdf"} for a PDF reader).
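Because the property returns lower-case, dot-prefixed extensions, a caller can match a file's suffix against it after lower-casing. The sketch below shows one way to choose a reader by extension. FakeHtmlReader, FakePdfReader, and pick_reader are hypothetical stand-ins, and the {".html", ".htm"} set is an assumption rather than HtmlReader's confirmed value.

```python
from pathlib import Path
from typing import Iterable, Optional, Protocol, Union


class ExtensionAware(Protocol):
    """Anything exposing a supported_extensions property."""

    @property
    def supported_extensions(self) -> frozenset[str]: ...


class FakeHtmlReader:
    """Hypothetical stand-in; the extension set is an assumption."""

    @property
    def supported_extensions(self) -> frozenset[str]:
        return frozenset({".html", ".htm"})


class FakePdfReader:
    """Hypothetical stand-in for a PDF reader."""

    @property
    def supported_extensions(self) -> frozenset[str]:
        return frozenset({".pdf"})


def pick_reader(
    readers: Iterable[ExtensionAware], path: Union[str, Path]
) -> Optional[ExtensionAware]:
    """Return the first reader whose extensions include the file's suffix, else None."""
    # Lower-case the suffix to match the lower-case contract of supported_extensions.
    suffix = Path(path).suffix.lower()
    for reader in readers:
        if suffix in reader.supported_extensions:
            return reader
    return None
```

Lower-casing on the caller's side means "Report.HTML" and "report.html" resolve to the same reader without each reader having to list every casing.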