ractogateway.pipelines.sql_analyst._schema

Helpers for schema fetching, connection building, code extraction, and pandas execution.

ractogateway.pipelines.sql_analyst._schema.clear_schema_cache()[source]

Evict all entries from the in-process schema cache.

Useful in tests or when the database schema changes at runtime.

Return type:: None

class ractogateway.pipelines.sql_analyst._schema.SchemaFetcher[source]

Bases: object

Introspects a SQLAlchemy engine and returns a rich, LLM-ready schema string.

The richer the schema, the better the SQL the LLM produces. Three optional enrichments go beyond basic column names and types:

Indexes (include_indexes=True, default) — tells the LLM which columns are indexed so it can write efficient WHERE conditions.
Row counts (include_row_counts=True) — tells the LLM the scale of each table (e.g. ~1,200 rows vs ~50,000,000 rows) so it knows when to be careful with joins and can make smarter filtering choices.
Sample values (include_sample_values=True) — for low-cardinality string columns the fetcher appends a few representative distinct values so the LLM knows the exact strings to use in WHERE clauses (e.g. status IN ('active', 'inactive') rather than guessing).

Example output (all enrichments on):

Table: orders  (~12,540 rows)
  id INTEGER NOT NULL
  status VARCHAR  -- e.g. 'pending', 'shipped', 'delivered', 'cancelled'
  customer_id INTEGER NOT NULL
  total NUMERIC NOT NULL
  created_at TIMESTAMP NOT NULL
  PRIMARY KEY (id)
  FOREIGN KEY (customer_id) REFERENCES customers(id)
  INDEX (status)
  INDEX (created_at)

Table: customers  (~3,200 rows)
  id INTEGER NOT NULL
  name VARCHAR NOT NULL
  region VARCHAR  -- e.g. 'North', 'South', 'East', 'West'
  PRIMARY KEY (id)
  INDEX UNIQUE (email)

static fetch(engine, *, include_indexes=True, include_row_counts=False, include_sample_values=False, sample_value_limit=8, sample_cardinality_threshold=25)[source]

Return an enriched schema string by inspecting engine.

Parameters:

engine (Any) – A SQLAlchemy Engine instance.
include_indexes (bool) – Append INDEX lines for each non-PK index. Zero extra queries — data comes from the SQLAlchemy inspector. Default: True.
include_row_counts (bool) – Append approximate row counts to table headers via SELECT COUNT(*). One extra query per table. Default: False.
include_sample_values (bool) – For string/categorical columns with ≤ sample_cardinality_threshold distinct values, append sample values as an inline comment. One extra query per eligible column. Default: False.
sample_value_limit (int) – Maximum number of sample values to show per column. Default: 8.
sample_cardinality_threshold (int) – Only show sample values when the column has at most this many distinct values (avoids showing 10k-row free-text columns). Default: 25.

Return type:

str