SQL Adapter

Extract, validate, and transpile SQL queries from LLM responses into execution-ready formats.

SQL Adapter ensures that LLM-generated SQL statements are clean, syntactically valid, stripped of comments, and transpiled to the correct target SQL dialect. It processes free-form text, handles markdown code fences, merges contiguous statements, and handles dialect-specific differences.

All extraction and comment-stripping behaviors are always active.

Stage

Stage	Behavior
POST only	Processes the LLM response: extracts the first valid SQL statement(s), strips inline/block comments, transpiles to the configured target dialect, and formats the output compactly

What It Does

The module applies the following steps to every LLM response, in order:

Extraction:
- Searches for markdown code fences (e.g. ```sql or ```) containing valid SQL statements.
- If no fences are present, scans the raw response text for plain SQL queries.
- Parses the statements using an intelligent auto-detecting parser that checks the configured target dialect and common SQL dialects.
- If multiple distinct SQL blocks are found, it raises a validation exception to prevent ambiguity.
Comment Stripping: Recursively strips all inline comments (-- comment) and block comments (/* comment */) from the AST.
Transpilation: Transpiles and formats the SQL statements compactly for the selected Target SQL Dialect.
Validation: Appends a trailing semicolon to the query and returns the normalized string along with execution statistics (such as the detected source dialect and statement count).

Example

Raw LLM response:

Here is the SQL query to get the top 10 highest-paid employees:

```sql
-- Get highest earners
SELECT TOP 10 *
FROM employees
ORDER BY salary DESC; /* Sort by salary descending */
```

After normalization (Target Dialect: PostgreSQL):

SELECT * FROM employees ORDER BY salary DESC NULLS LAST LIMIT 10;

Configuration

Target SQL Dialect

The database SQL dialect to transpile the query to. If not specified, it defaults to PostgreSQL.

The module supports the following 31 target SQL dialects:

Athena (athena)
BigQuery (bigquery)
ClickHouse (clickhouse)
Databricks (databricks)
Doris (doris)
Dremio (dremio)
Drill (drill)
Druid (druid)
DuckDB (duckdb)
Dune (dune)
Exasol (exasol)
Fabric (fabric)
Hive (hive)
Materialize (materialize)
MySQL (mysql)
Oracle (oracle)
PostgreSQL (postgres, default)
Presto (presto)
PRQL (prql)
Redshift (redshift)
RisingWave (risingwave)
Snowflake (snowflake)
Solr (solr)
Spark SQL (spark)
Spark 2 (spark2)
SQLite (sqlite)
StarRocks (starrocks)
Tableau (tableau)
Teradata (teradata)
Trino (trino)
T-SQL / SQL Server (tsql)

Failure Behavior

If the LLM response contains no SQL queries, contains multiple distinct SQL blocks, or contains invalid SQL that cannot be parsed by any supported dialect, the module raises a pipeline exception immediately.

This ensures that downstream execution systems never run malformed or invalid queries against your databases.

Observability & Fallbacks

The module logs warning-level entries if it has to fall back and guess a source dialect (e.g., if the LLM generated SQL Server syntax but target dialect was PostgreSQL). The detected source dialect is returned in the execution statistics as source_dialect for auditing and prompt tuning.

Example Use Cases

AI-driven BI & Analytics: Safely extract user-facing queries generated by LLMs and run them on target analytics databases.
Dialect Portability: Transpile generic or dialect-specific SQL (e.g., T-SQL, MySQL) generated by models into database-specific syntax (e.g. Postgres, ClickHouse).
Security & Comment Stripping: Automatically remove inline and block comments from LLM-generated code before executing it.
Strict Single-Query Enforcement: Prevent SQL injection or multi-query execution risks by validating that exactly one SQL query block is returned.