SQL Analyst Pipeline

SQLAnalystPipeline turns a plain-English question into a DB-backed answer:

  1. Generate read-only SQL from schema + question

  2. Execute SQL and build a dataframe

  3. Optionally generate analysis code (pandas or polars)

  4. Optionally generate a markdown answer

  5. Optionally build a Plotly chart

Use AsyncSQLAnalystPipeline when your app stack is async.

Best Use Cases

  • Natural language analytics over PostgreSQL/MySQL-compatible schemas

  • Internal data assistant for support, finance, growth, and operations teams

  • Fast prototyping of BI-style Q&A endpoints

  • Safe query generation with table/column restrictions and masking

Minimal Example

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import SQLAnalystPipeline

pipeline = SQLAnalystPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    safe_mode=True,
)

result = pipeline.run(
    user_query="Top 5 products by quantity sold in the last 30 days",
    connection_string="postgresql://user:pass@localhost:5432/shop",
)

if result.error:
    print("Pipeline error:", result.error)
else:
    print(result.sql_query)
    print(result.answer)

Async Example

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import AsyncSQLAnalystPipeline

pipeline = AsyncSQLAnalystPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    pandas_kit=gpt.Chat(model="gpt-4o-mini"),
)

result = await pipeline.run(
    user_query="Monthly revenue trend for 2025",
    connection_string="postgresql://user:pass@localhost:5432/analytics",
)

Per-Step Model Control

Use different models per step for quality/cost balance:

pipeline = SQLAnalystPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),      # fallback
    sql_kit=gpt.Chat(model="gpt-4o"),       # highest quality SQL
    pandas_kit=gpt.Chat(model="gpt-4o-mini"),
    answer_kit=gpt.Chat(model="gpt-4o"),
)

Security and Data Governance

SQLAnalystPipeline includes controls designed for production access patterns:

  • force_read_only=True blocks non-SELECT SQL

  • allowed_tables=[...] hides all other tables from the LLM

  • blocked_columns=[...] removes sensitive columns from schema context

  • mask_columns=[...] masks values in returned rows and answer context

  • max_rows=... auto-injects LIMIT when needed

pipeline = SQLAnalystPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    allowed_tables=["orders", "customers", "products"],
    blocked_columns=["ssn", "credit_card_number"],
    mask_columns=["email", "phone"],
    max_rows=5000,
)

Charting

Use chart="auto" (default) or pass an explicit ChartSpec:

from ractogateway.pipelines import ChartSpec

result = pipeline.run(
    user_query="Revenue by category",
    connection_string="postgresql://user:pass@localhost:5432/shop",
    chart=ChartSpec(
        chart_type="bar",
        x="category",
        y="revenue",
        title="Revenue by Category",
    ),
)

if result.plotly_figure is not None:
    result.plotly_figure.show()

Analysis Engine: Pandas or Polars

pipeline = SQLAnalystPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    analysis_engine="polars",  # "pandas" (default) or "polars"
)

Install extras:

pip install "ractogateway[pipelines-sql-polars]"

Result Object

run()/arun() return SQLAnalystResult with:

  • sql_query, columns, raw_rows, row_count

  • pandas_code, pandas_result

  • answer

  • chart_spec, plotly_figure

  • usage token counters

  • error (populated in safe_mode=True)

Export helpers:

result.to_csv("query_output.csv")
result.to_json("query_output.json")
result.to_excel("query_output.xlsx")