ractogateway.pipelines.sql_analyst._schema
Helpers for schema fetching, connection building, code extraction, and pandas execution.
- ractogateway.pipelines.sql_analyst._schema.clear_schema_cache()[source]
Evict all entries from the in-process schema cache.
Useful in tests or when the database schema changes at runtime.
- Return type:
- class ractogateway.pipelines.sql_analyst._schema.SchemaFetcher[source]
Bases:
objectIntrospects a SQLAlchemy engine and returns a rich, LLM-ready schema string.
The richer the schema, the better the SQL the LLM produces. Three optional enrichments go beyond basic column names and types:
Indexes (
include_indexes=True, default) — tells the LLM which columns are indexed so it can write efficient WHERE conditions.Row counts (
include_row_counts=True) — tells the LLM the scale of each table (e.g. ~1,200 rows vs ~50,000,000 rows) so it knows when to be careful with joins and can make smarter filtering choices.Sample values (
include_sample_values=True) — for low-cardinality string columns the fetcher appends a few representative distinct values so the LLM knows the exact strings to use in WHERE clauses (e.g.status IN ('active', 'inactive')rather than guessing).
Example output (all enrichments on):
Table: orders (~12,540 rows) id INTEGER NOT NULL status VARCHAR -- e.g. 'pending', 'shipped', 'delivered', 'cancelled' customer_id INTEGER NOT NULL total NUMERIC NOT NULL created_at TIMESTAMP NOT NULL PRIMARY KEY (id) FOREIGN KEY (customer_id) REFERENCES customers(id) INDEX (status) INDEX (created_at) Table: customers (~3,200 rows) id INTEGER NOT NULL name VARCHAR NOT NULL region VARCHAR -- e.g. 'North', 'South', 'East', 'West' PRIMARY KEY (id) INDEX UNIQUE (email)
- static fetch(engine, *, include_indexes=True, include_row_counts=False, include_sample_values=False, sample_value_limit=8, sample_cardinality_threshold=25)[source]
Return an enriched schema string by inspecting engine.
- Parameters:
engine (
Any) – A SQLAlchemyEngineinstance.include_indexes (
bool) – AppendINDEXlines for each non-PK index. Zero extra queries — data comes from the SQLAlchemy inspector. Default:True.include_row_counts (
bool) – Append approximate row counts to table headers viaSELECT COUNT(*). One extra query per table. Default:False.include_sample_values (
bool) – For string/categorical columns with ≤ sample_cardinality_threshold distinct values, append sample values as an inline comment. One extra query per eligible column. Default:False.sample_value_limit (
int) – Maximum number of sample values to show per column. Default:8.sample_cardinality_threshold (
int) – Only show sample values when the column has at most this many distinct values (avoids showing 10k-row free-text columns). Default:25.
- Return type: