Building an LLM-Powered SQL Migration Tool

2025-10-15

Database migrations are painful. Schema conversion is mostly mechanical: type mappings, constraint syntax. Stored procedures are different. They're business logic in proprietary dialects: Oracle's PL/SQL, SQL Server's T-SQL, PostgreSQL's PL/pgSQL. Each has its own control flow, error handling, and weird edge cases.

I spent a few months building a tool that uses Claude to convert procedural SQL between databases. This is what I learned about where the complexity actually is.

The Problem

A typical Oracle function you might need to migrate:

CREATE FUNCTION get_employee_bonus(
    emp_id NUMBER,
    fiscal_year NUMBER
) RETURN NUMBER AS
    v_salary NUMBER;
    v_performance VARCHAR2(20);
    v_bonus NUMBER := 0;
BEGIN
    SELECT salary, performance_rating
    INTO v_salary, v_performance
    FROM employees
    WHERE employee_id = emp_id;

    IF v_performance = 'EXCEEDS' THEN
        v_bonus := v_salary * 0.15;
    ELSIF v_performance = 'MEETS' THEN
        v_bonus := v_salary * 0.10;
    ELSE
        v_bonus := v_salary * 0.05;
    END IF;

    RETURN NVL(v_bonus, 0);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        RETURN 0;
END;

Oracle-specific types (NUMBER, VARCHAR2). PL/SQL control flow. Exception handling. The NVL function. All of it needs to change.

The Easy Part

The LLM call itself is trivial:

resp, err := client.Messages.New(ctx, anthropic.MessageNewParams{
    Model:     "claude-sonnet-4-20250514",
    MaxTokens: 8192,
    Messages: []anthropic.MessageParam{
        anthropic.NewUserMessage(anthropic.NewTextBlock("Convert this Oracle function to PostgreSQL: " + sql)),
    },
})

Ten lines of code. It works maybe 40% of the time. When it fails, it fails by using features that don't exist in the target dialect, or by subtly changing behavior in ways you wouldn't notice until production.

The LLM can do the conversion. The hard part is everything else.

The Hard Parts

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Splitter  │────▶│  Classifier │────▶│   Context   │
│             │     │             │     │   Builder   │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                    ┌──────────────────────────┘
                    ▼
              ┌───────────┐     ┌───────────┐     ┌───────────┐
              │    LLM    │────▶│ Validator │────▶│  Output   │
              │  Adapter  │     │           │     │           │
              └─────┬─────┘     └─────┬─────┘     └───────────┘
                    │                 │
                    │    ┌────────────┘
                    │    │ validation failed
                    │    ▼
              ┌─────┴────────┐
              │ Retry Loop   │
              │ + Learning   │
              └──────────────┘
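
The rest of this post walks through those boxes. As a rough Go sketch of the orchestration, every name here (splitUnits, classify, convertWithRetries, BatchTracker) is a hypothetical helper fleshed out in the sections below, not a real library:

import (
    "context"
    "database/sql"
)

type Result struct {
    Unit string // original source of one unit
    SQL  string // converted output (empty on failure)
    Err  error
}

func convertFile(ctx context.Context, db *sql.DB, dialect, src string) []Result {
    units := splitUnits(dialect, src) // Splitter
    tracker := NewBatchTracker()      // learning across the batch
    var results []Result
    for _, unit := range units {
        class := classify(unit) // Classifier
        out, err := convertWithRetries(ctx, db, unit, class, tracker.Hints())
        tracker.Record(class, err == nil)
        results = append(results, Result{Unit: unit, SQL: out, Err: err})
    }
    return results
}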

1. Getting the context right

A generic prompt gets generic results. To get reliable conversions, you need to understand what you're converting before you convert it.

Splitting: Most files have multiple functions. Split into units so each gets its own context. Parse dialect-specific boundaries: Oracle's END;, T-SQL's GO, PostgreSQL's $$ delimiters.
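
A minimal splitter sketch, assuming a regex per dialect is enough to find unit boundaries. Real PL/SQL bodies contain nested END; lines (loops, IF blocks), so production code needs block-depth tracking; this only shows the shape:

import (
    "regexp"
    "strings"
)

// End-of-unit markers per dialect. Illustrative patterns only.
var unitEnd = map[string]*regexp.Regexp{
    "oracle":   regexp.MustCompile(`(?mi)^\s*END;\s*$`),
    "tsql":     regexp.MustCompile(`(?mi)^\s*GO\s*$`),
    "postgres": regexp.MustCompile(`(?i)\$\$\s*(LANGUAGE\s+\w+\s*)?;`),
}

func splitUnits(dialect, src string) []string {
    re, ok := unitEnd[dialect]
    if !ok {
        return []string{src}
    }
    var units []string
    start := 0
    for _, loc := range re.FindAllStringIndex(src, -1) {
        units = append(units, strings.TrimSpace(src[start:loc[1]]))
        start = loc[1]
    }
    if rest := strings.TrimSpace(src[start:]); rest != "" {
        units = append(units, rest)
    }
    return units
}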

Classification: Different SQL patterns need different conversion strategies. Classify each unit along two axes: purity (does it only read data, or does it mutate state?) and which dialect features it uses (control flow, exception handling, cursors). Together they pick the strategy.

For the Oracle example, classification returns:

{
  "purity": "read_only",
  "features": ["has_exception_handling", "has_control_flow"],
  "strategy": "plpgsql_function"
}
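
That JSON maps onto a small struct; the values noted in the comments are the ones that appear in this post, anything beyond them is a placeholder:

// Mirrors the classification JSON above.
type Classification struct {
    Purity   string   `json:"purity"`   // e.g. "read_only"
    Features []string `json:"features"` // e.g. "has_exception_handling", "has_control_flow"
    Strategy string   `json:"strategy"` // e.g. "plpgsql_function", "sql_function"
}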

Prompt construction: Once you know what you're dealing with, build a prompt around it: dialect type mappings and few-shot examples matched to the classification go in the system prompt; the function itself, plus any hints from earlier units in the batch, goes in the user message.

Few-shot beats zero-shot for anything syntax-sensitive. System prompt gets cached so you're not paying to re-process the same mappings for every function in a file.
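
A sketch of the assembly; typeMappings and fewShotExamples are stand-ins for static material loaded once per dialect pair:

import (
    "fmt"
    "strings"
)

// Stand-ins for static prompt material.
var typeMappings = "NUMBER -> NUMERIC; VARCHAR2 -> TEXT; NVL(a, b) -> COALESCE(a, b); ..."

var fewShotExamples = map[string]string{
    "plpgsql_function": "... worked Oracle -> PL/pgSQL example pairs ...",
    "sql_function":     "... worked Oracle -> LANGUAGE SQL example pairs ...",
}

// Static material goes in the system prompt so provider-side prompt caching
// can reuse it across every unit in the file; only the user message changes.
func buildPrompt(unit string, class Classification, hints string) (system, user string) {
    system = strings.Join([]string{
        "You convert Oracle PL/SQL functions to PostgreSQL.",
        typeMappings,
        fewShotExamples[class.Strategy],
    }, "\n\n")

    var b strings.Builder
    fmt.Fprintf(&b, "Classification: %s, %s.\n\n", class.Purity, class.Strategy)
    if hints != "" {
        b.WriteString(hints + "\n\n")
    }
    b.WriteString("Convert this function:\n\n" + unit)
    return system, b.String()
}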

2. Catching bad output

Most LLM tools generate plausible-looking output and ship it. Fine for a demo. Not fine for production.

Static checks: Regex for known anti-patterns before hitting the database. Did it use BIGINT instead of INT8? Did it use SERIAL (causes hotspots)? Did it try EXECUTE IMMEDIATE (doesn't exist in PL/pgSQL)?
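
A sketch of that static pass; each entry pairs a regex with the message that later feeds the retry prompt:

import "regexp"

// Known anti-patterns, checked before touching a database.
var antiPatterns = []struct {
    re  *regexp.Regexp
    msg string
}{
    {regexp.MustCompile(`(?i)\bEXECUTE\s+IMMEDIATE\b`), "EXECUTE IMMEDIATE is Oracle-only; use EXECUTE in PL/pgSQL"},
    {regexp.MustCompile(`(?i)\bNVL\s*\(`), "NVL is Oracle-only; use COALESCE"},
    {regexp.MustCompile(`(?i)\bSERIAL\b`), "avoid SERIAL columns"},
}

func staticCheck(converted string) []string {
    var errs []string
    for _, p := range antiPatterns {
        if p.re.MatchString(converted) {
            errs = append(errs, p.msg)
        }
    }
    return errs
}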

Live validation: Spin up a temp schema, try to create the function, see if it compiles. If the database accepts it, at least it's syntactically valid.
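
A sketch of that check using database/sql: create the converted function in a throwaway schema, let PostgreSQL be the judge, drop the schema right after.

import (
    "context"
    "database/sql"
    "fmt"
    "time"
)

// A nil error means PostgreSQL accepted the function, i.e. it at least compiles.
func validateLive(ctx context.Context, db *sql.DB, fnSQL string) error {
    conn, err := db.Conn(ctx) // pin one connection so search_path sticks
    if err != nil {
        return err
    }
    defer conn.Close()

    schema := fmt.Sprintf("mig_check_%d", time.Now().UnixNano())
    if _, err := conn.ExecContext(ctx, "CREATE SCHEMA "+schema); err != nil {
        return err
    }
    defer conn.ExecContext(ctx, "DROP SCHEMA "+schema+" CASCADE")

    if _, err := conn.ExecContext(ctx, "SET search_path TO "+schema); err != nil {
        return err
    }
    _, err = conn.ExecContext(ctx, fnSQL)
    return err
}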

Semantic validation (does it compute the same thing?) is harder to automate. I generate test cases for that separately.

3. Recovering from failures

When validation fails, you have information. Use it.

First retry, include the error:

⚠️ PREVIOUS ATTEMPT FAILED
Errors from previous conversion:
- Line 12: PL/pgSQL feature not supported
  Suggestion: Use LANGUAGE SQL instead of plpgsql

Second retry, escalate to explicit constraints:

🚨 CRITICAL: PREVIOUS ATTEMPTS FAILED
You MUST:
- DO NOT use LANGUAGE plpgsql
- DO NOT use DECLARE blocks
- USE LANGUAGE SQL only

All caps and explicit prohibitions. Not subtle, but it works. Recovers about 90% of first-attempt failures.
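
Wired together, the retry loop is a couple dozen lines. retryPreamble and callLLM are hypothetical helpers; the latter wraps the Messages call from earlier:

import (
    "context"
    "database/sql"
    "fmt"
    "strings"
)

func convertWithRetries(ctx context.Context, db *sql.DB, unit string, class Classification, hints string) (string, error) {
    var prevErrs []string
    for attempt := 1; attempt <= 3; attempt++ {
        system, user := buildPrompt(unit, class, hints)
        if len(prevErrs) > 0 {
            // attempt 2 prepends the errors, attempt 3 escalates to hard constraints
            user = retryPreamble(attempt, prevErrs) + "\n\n" + user
        }
        out, err := callLLM(ctx, system, user)
        if err != nil {
            return "", err
        }
        errs := staticCheck(out)
        if len(errs) == 0 {
            if liveErr := validateLive(ctx, db, out); liveErr != nil {
                errs = append(errs, liveErr.Error())
            }
        }
        if len(errs) == 0 {
            return out, nil
        }
        prevErrs = errs
    }
    return "", fmt.Errorf("failed after 3 attempts: %s", strings.Join(prevErrs, "; "))
}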

4. Learning across a batch

Functions in the same file tend to use the same patterns. If function 3 failed because cursors don't work, function 4 probably has cursors too.

Track what works and what doesn't across units in a file, include that in subsequent prompts:

"Previous conversions: read_only functions worked best with sql_function strategy (3 successes, 1 failure)."

By the tenth function, first-attempt success rate goes from ~70% to ~85%. The model learns from its own mistakes. Not through fine-tuning, just through prompt context.
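
A minimal sketch of a tracker that produces hints like the one above:

import (
    "fmt"
    "strings"
)

type outcome struct{ ok, fail int }

// BatchTracker counts results per (purity, strategy) pair and renders them
// as a hint block for subsequent prompts in the same file.
type BatchTracker struct {
    byKey map[string]*outcome
}

func NewBatchTracker() *BatchTracker {
    return &BatchTracker{byKey: map[string]*outcome{}}
}

func (t *BatchTracker) Record(class Classification, succeeded bool) {
    key := class.Purity + " / " + class.Strategy
    o := t.byKey[key]
    if o == nil {
        o = &outcome{}
        t.byKey[key] = o
    }
    if succeeded {
        o.ok++
    } else {
        o.fail++
    }
}

func (t *BatchTracker) Hints() string {
    if len(t.byKey) == 0 {
        return ""
    }
    lines := []string{"Previous conversions in this file:"}
    for key, o := range t.byKey {
        lines = append(lines, fmt.Sprintf("- %s: %d successes, %d failures", key, o.ok, o.fail))
    }
    return strings.Join(lines, "\n")
}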

Takeaways

These patterns generalize. Linters, transpilers, refactoring tools: anything that transforms code benefits from the same loop. Classify, contextualize, generate, validate, learn.