Building an LLM-Powered SQL Migration Tool
Database migrations are painful. Schema conversion is mostly mechanical: type mappings, constraint syntax. Stored procedures are different. They're business logic written in vendor-specific procedural dialects: Oracle's PL/SQL, SQL Server's T-SQL, PostgreSQL's PL/pgSQL. Each has its own control flow, error handling, and weird edge cases.
I spent a few months building a tool that uses Claude to convert procedural SQL between databases. This is what I learned about where the complexity actually is.
The Problem
A typical Oracle function you might need to migrate:
CREATE FUNCTION get_employee_bonus(
  emp_id NUMBER,
  fiscal_year NUMBER
) RETURN NUMBER AS
  v_salary NUMBER;
  v_performance VARCHAR2(20);
  v_bonus NUMBER := 0;
BEGIN
  SELECT salary, performance_rating
  INTO v_salary, v_performance
  FROM employees
  WHERE employee_id = emp_id;
  IF v_performance = 'EXCEEDS' THEN
    v_bonus := v_salary * 0.15;
  ELSIF v_performance = 'MEETS' THEN
    v_bonus := v_salary * 0.10;
  ELSE
    v_bonus := v_salary * 0.05;
  END IF;
  RETURN NVL(v_bonus, 0);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    RETURN 0;
END;
Oracle-specific types (NUMBER, VARCHAR2). PL/SQL control flow. Exception handling. The NVL function. All of it needs to change.
The Easy Part
The LLM call itself is trivial:
resp, err := client.Messages.New(ctx, anthropic.MessageNewParams{
    Model:     "claude-sonnet-4-20250514",
    MaxTokens: 8192,
    Messages: []anthropic.MessageParam{
        anthropic.NewUserMessage(anthropic.NewTextBlock(
            "Convert this Oracle function to PostgreSQL: " + sql,
        )),
    },
})
Ten lines of code. It works maybe 40% of the time. When it fails, it fails by using features that don't exist, or by subtly changing behavior in ways you wouldn't notice until production.
The LLM can do the conversion. The hard part is everything else.
The Hard Parts
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Splitter │────▶│ Classifier │────▶│ Context │
│ │ │ │ │ Builder │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
┌──────────────────────────┘
▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ LLM │────▶│ Validator │────▶│ Output │
│ Adapter │ │ │ │ │
└─────┬─────┘ └─────┬─────┘ └───────────┘
│ │
│ ┌────────────┘
│ │ validation failed
│ ▼
┌─────┴────────┐
│ Retry Loop │
│ + Learning │
└──────────────┘
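In code, the whole pipeline reduces to a couple of types and one loop. Here's a minimal sketch in Go; the names (Unit, Classification, convertUnit) are illustrative, not the tool's actual API:

import (
    "context"
    "fmt"
)

// Unit is one convertible object (function, procedure, trigger) cut from a source file.
type Unit struct {
    Name   string
    Source string
}

// Classification decides which prompt template and few-shot examples a unit gets.
type Classification struct {
    Purity   string
    Features []string
    Strategy string
}

// Converter wraps the LLM call; Validator checks the output against the target database.
type Converter interface {
    Convert(ctx context.Context, u Unit, c Classification, feedback []string) (string, error)
}

type Validator interface {
    // Validate returns a list of problems; an empty list means the output passed.
    Validate(ctx context.Context, converted string) []string
}

// convertUnit runs the generate -> validate -> retry loop for a single unit.
func convertUnit(ctx context.Context, u Unit, c Classification, gen Converter, val Validator, maxAttempts int) (string, error) {
    var feedback []string
    for attempt := 0; attempt < maxAttempts; attempt++ {
        out, err := gen.Convert(ctx, u, c, feedback)
        if err != nil {
            return "", err
        }
        problems := val.Validate(ctx, out)
        if len(problems) == 0 {
            return out, nil
        }
        feedback = problems // validation errors feed the next attempt's prompt
    }
    return "", fmt.Errorf("unit %s: no valid conversion after %d attempts", u.Name, maxAttempts)
}

The rest of this section is about what goes into each of those stages.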
1. Getting the context right
A generic prompt gets generic results. To get reliable conversions, you need to understand what you're converting before you convert it.
Splitting: Most files have multiple functions. Split into units so each gets its own context. Parse dialect-specific boundaries: Oracle's END;, T-SQL's GO, PostgreSQL's $$ delimiters.
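The T-SQL case is the simplest to show: a batch ends at a line containing only GO. A sketch (the helper name is mine); Oracle is messier, because END; also closes nested blocks, so that splitter has to track BEGIN/END depth instead of scanning lines:

import "strings"

// splitTSQLBatches cuts a T-SQL script into units on batch separators:
// a line containing only "GO" (case-insensitive) ends the current batch.
func splitTSQLBatches(script string) []string {
    var units []string
    var current []string
    flush := func() {
        if batch := strings.TrimSpace(strings.Join(current, "\n")); batch != "" {
            units = append(units, batch)
        }
        current = current[:0]
    }
    for _, line := range strings.Split(script, "\n") {
        if strings.EqualFold(strings.TrimSpace(line), "GO") {
            flush()
            continue
        }
        current = append(current, line)
    }
    flush()
    return units
}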
Classification: Different SQL patterns need different conversion strategies. Two axes:
- Purity: Side effects? Pure functions can sometimes become SQL expressions. Functions with INSERT/UPDATE/DELETE need careful handling.
- Features: Cursors need to become CTEs. Dynamic SQL often can't be converted. Exception handling translates differently per dialect.
For the Oracle example, classification returns:
{
  "purity": "read_only",
  "features": ["has_exception_handling", "has_control_flow"],
  "strategy": "plpgsql_function"
}
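Classification itself doesn't need the LLM; it's mostly pattern matching on the source. A rough sketch using the Classification type from above (the patterns are abbreviated, not the tool's full rule set):

import (
    "regexp"
    "strings"
)

var dmlPattern = regexp.MustCompile(`\b(INSERT|UPDATE|DELETE|MERGE)\b`)

// classify inspects a unit's source and picks a conversion strategy.
func classify(source string) Classification {
    upper := strings.ToUpper(source)
    c := Classification{Purity: "read_only", Strategy: "plpgsql_function"}

    if dmlPattern.MatchString(upper) {
        c.Purity = "has_side_effects"
    }
    if strings.Contains(upper, "EXCEPTION") {
        c.Features = append(c.Features, "has_exception_handling")
    }
    if strings.Contains(upper, "IF ") || strings.Contains(upper, "LOOP") {
        c.Features = append(c.Features, "has_control_flow")
    }
    if strings.Contains(upper, "CURSOR") {
        c.Features = append(c.Features, "has_cursors")
    }
    if strings.Contains(upper, "EXECUTE IMMEDIATE") {
        c.Features = append(c.Features, "has_dynamic_sql")
    }

    // Pure, branch-free functions can sometimes become plain SQL expressions.
    if c.Purity == "read_only" && len(c.Features) == 0 {
        c.Strategy = "sql_function"
    }
    return c
}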
Prompt construction: Once you know what you're dealing with, build a prompt with:
- Type mappings baked in (NUMBER→DECIMAL, NVL→COALESCE)
- Feature-specific instructions (how to handle cursors, exceptions, etc.)
- 2-3 examples from a registry that match the detected features
Few-shot beats zero-shot for anything syntax-sensitive. System prompt gets cached so you're not paying to re-process the same mappings for every function in a file.
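Roughly what the assembly looks like, again with the earlier Unit and Classification types. The mapping table, instruction text, and example registry here are placeholders for the real ones:

import (
    "fmt"
    "sort"
    "strings"
)

type Example struct{ Input, Output string }

var typeMappings = map[string]string{
    "NUMBER":   "DECIMAL",
    "VARCHAR2": "TEXT",
    "NVL":      "COALESCE",
}

var featureInstructions = map[string]string{
    "has_exception_handling": "Translate EXCEPTION blocks into PL/pgSQL exception handling.",
    "has_cursors":            "Rewrite cursor loops as set-based queries or CTEs.",
}

// buildPrompts returns the reusable system prompt and the per-unit user prompt.
// Keeping the static parts in the system prompt is what makes caching pay off.
func buildPrompts(u Unit, c Classification, examples []Example) (system, user string) {
    var b strings.Builder
    b.WriteString("You convert Oracle PL/SQL to PostgreSQL.\n\nApply these mappings:\n")

    // Deterministic ordering keeps the system prompt byte-identical across calls,
    // which is what the cache keys on.
    keys := make([]string, 0, len(typeMappings))
    for k := range typeMappings {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    for _, k := range keys {
        fmt.Fprintf(&b, "- %s -> %s\n", k, typeMappings[k])
    }

    // Feature-specific instructions, selected by the classifier.
    for _, f := range c.Features {
        if instr, ok := featureInstructions[f]; ok {
            b.WriteString(instr + "\n")
        }
    }

    // Few-shot examples pulled from a registry keyed on the detected features.
    for _, ex := range examples {
        fmt.Fprintf(&b, "\nExample input:\n%s\n\nExample output:\n%s\n", ex.Input, ex.Output)
    }

    return b.String(), "Convert this function:\n" + u.Source
}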
2. Catching bad output
Most LLM tools generate plausible-looking output and ship it. Fine for a demo. Not fine for production.
Static checks: Regex for known anti-patterns before hitting the database. Did it use BIGINT instead of INT8? Did it use SERIAL (causes hotspots)? Did it try EXECUTE IMMEDIATE (doesn't exist)?
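The checks are just a table of patterns and messages; a minimal sketch covering the three examples above:

import "regexp"

type antiPattern struct {
    re      *regexp.Regexp
    message string
}

var antiPatterns = []antiPattern{
    {regexp.MustCompile(`(?i)\bBIGINT\b`), "use INT8 instead of BIGINT"},
    {regexp.MustCompile(`(?i)\bSERIAL\b`), "SERIAL causes hotspots"},
    {regexp.MustCompile(`(?i)EXECUTE\s+IMMEDIATE`), "EXECUTE IMMEDIATE does not exist in PL/pgSQL"},
}

// staticCheck returns the anti-patterns found in converted SQL.
// It's cheap, so it runs before anything touches a database.
func staticCheck(converted string) []string {
    var problems []string
    for _, p := range antiPatterns {
        if p.re.MatchString(converted) {
            problems = append(problems, p.message)
        }
    }
    return problems
}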
Live validation: Spin up a temp schema, try to create the function, see if it compiles. If the database accepts it, at least it's syntactically valid.
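A sketch of that compile check, assuming a database/sql connection to the target (the schema name and setup are illustrative):

import (
    "context"
    "database/sql"
)

// compileCheck creates the converted function inside a scratch schema and rolls
// everything back. It only answers one question: does the target accept this?
func compileCheck(ctx context.Context, db *sql.DB, converted string) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // throw away the schema and the function either way

    if _, err := tx.ExecContext(ctx, "CREATE SCHEMA _migration_check"); err != nil {
        return err
    }
    if _, err := tx.ExecContext(ctx, "SET search_path TO _migration_check"); err != nil {
        return err
    }
    // If CREATE FUNCTION succeeds, the output is at least syntactically valid.
    _, err = tx.ExecContext(ctx, converted)
    return err
}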
Semantic validation (does it compute the same thing?) is harder to automate. I generate test cases for that separately.
3. Recovering from failures
When validation fails, you have information. Use it.
First retry, include the error:
⚠️ PREVIOUS ATTEMPT FAILED
Errors from previous conversion:
- Line 12: PL/pgSQL feature not supported
  Suggestion: Use LANGUAGE SQL instead of plpgsql
Second retry, escalate to explicit constraints:
🚨 CRITICAL: PREVIOUS ATTEMPTS FAILED
You MUST:
- DO NOT use LANGUAGE plpgsql
- DO NOT use DECLARE blocks
- USE LANGUAGE SQL only
All caps and explicit prohibitions. Not subtle, but it works. Recovers about 90% of first-attempt failures.
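A sketch of how that feedback block gets built, with the hard constraints derived from the error patterns (the function and parameter names are mine):

import "strings"

// feedbackPrompt renders validation errors into the block prepended to the next attempt.
// The first retry reports errors; later retries switch to explicit prohibitions.
func feedbackPrompt(attempt int, errs, constraints []string) string {
    var b strings.Builder
    if attempt == 1 {
        b.WriteString("⚠️ PREVIOUS ATTEMPT FAILED\nErrors from previous conversion:\n")
        for _, e := range errs {
            b.WriteString("- " + e + "\n")
        }
    } else {
        b.WriteString("🚨 CRITICAL: PREVIOUS ATTEMPTS FAILED\nYou MUST:\n")
        for _, c := range constraints {
            b.WriteString("- " + c + "\n")
        }
    }
    return b.String()
}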
4. Learning across a batch
Functions in the same file tend to use the same patterns. If function 3 failed because cursors don't work, function 4 probably has cursors too.
Track what works and what doesn't across units in a file, and include that in subsequent prompts:
"Previous conversions: read_only functions worked best with sql_function strategy (3 successes, 1 failure)."
By the tenth function, first-attempt success rate goes from ~70% to ~85%. The model learns from its own mistakes. Not through fine-tuning, just through prompt context.
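The tracking behind that is just counters keyed by purity and strategy, rendered into one line of the next prompt. A sketch, reusing the Classification type (the names are illustrative):

import (
    "fmt"
    "sort"
    "strings"
)

// batchStats accumulates conversion outcomes across the units in one file.
type batchStats struct {
    success map[string]int // key: purity + "/" + strategy
    failure map[string]int
}

func (s *batchStats) record(c Classification, ok bool) {
    if s.success == nil {
        s.success, s.failure = map[string]int{}, map[string]int{}
    }
    key := c.Purity + "/" + c.Strategy
    if ok {
        s.success[key]++
    } else {
        s.failure[key]++
    }
}

// hint renders the accumulated history into one line for the next unit's prompt.
func (s *batchStats) hint() string {
    seen := map[string]bool{}
    for k := range s.success {
        seen[k] = true
    }
    for k := range s.failure {
        seen[k] = true
    }
    var parts []string
    for k := range seen {
        parts = append(parts, fmt.Sprintf("%s: %d successes, %d failures", k, s.success[k], s.failure[k]))
    }
    if len(parts) == 0 {
        return ""
    }
    sort.Strings(parts) // stable ordering keeps prompts deterministic
    return "Previous conversions: " + strings.Join(parts, "; ")
}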
Numbers
- Naive prompting: ~40% first-attempt success
- With classification + context: ~70%
- With retry: ~90%
- With cross-unit learning: first-attempt hits ~85% by function 10
- Prompt caching: 80-90% cost reduction for batch jobs
These patterns generalize. Linters, transpilers, refactoring tools: anything that transforms code benefits from the same loop. Classify, contextualize, generate, validate, learn.