
Why I Built Delta: A Programming Language for Machine Learning

January 11, 2026

Or, what I learned while trying to get a new language to understand what my training loop really means.


The Problem I Kept Running Into

My ML research usually involves models with several objectives at once: classification, auxiliary feature learning, regularization, a KL-divergence term for a VAE. Most research models juggle goals like these.

And every single time, I'd write something like this in PyTorch:

import torch.nn.functional as F

def compute_loss(model, batch, mode='train'):
    # Primary objective: classification.
    pred = model.classify(batch.images)
    loss = F.cross_entropy(pred, batch.labels)

    if mode == 'train':
        # Training-only terms: auxiliary feature prediction, L2 regularization, KL.
        aux_pred = model.aux_head(batch.images)
        loss += 0.3 * F.mse_loss(aux_pred, batch.attributes)
        loss += 0.01 * sum(p.norm() for p in model.parameters())
        z = model.encode(batch.images)
        loss += kl_divergence(z, normal_dist)  # kl_divergence, normal_dist defined elsewhere

    return loss

Debugging this code often took days. I kept circling back to question earlier assumptions, and a clear pattern emerged.

The bugs were always of the same kind: forgetting to add an auxiliary loss term, accidentally leaving dropout on during inference, changing a loss weight in my notes but not in the code, or not noticing that gradients weren't flowing to some of the parameters at all.

None of these bugs come from the math or the model's architecture. They live in the translation from my objectives into code.

That persistent gap between defining objectives and implementing them is what sent me looking for something new. I started to wonder whether a language could close it.

My thinking changed after I worked on a side project with SQL. For example, in SQL you might write:

SELECT * FROM users WHERE age > 18 AND active = true;

You tell the database what you want, and it figures out how to find the right rows, parallelize the query, and use indexes – all while understanding your intent.

In contrast, PyTorch training loops are imperative: you have to specify every step. PyTorch does exactly what you tell it, without any context. It doesn’t know which losses are for training, which parameters need gradients, or how constraints are connected.

PyTorch just runs the math operations you give it on GPUs. That’s it.

Seeing these challenges, I wondered: what if we could write machine learning objectives as declarative statements, like in SQL? That idea led to Delta. With Delta, you state your model's goals – like predicting labels or preferring smaller weights – and let the compiler handle the details.

So I Built Delta

Here's what that same model looks like in Delta:

let model = MyModel();
obs images: Tensor[Float];
obs labels: Tensor[Int];
obs attributes: Tensor[Float];

fn classify(x: Tensor[Float]) -> Tensor[Float] {
    model.classify(x)
}

let pred = classify(images);
require argmax(pred) == labels;

train {
    let aux_pred = model.aux_head(images);
    require aux_pred == attributes weight 0.3;
    prefer norm(model.parameters()) < 1.0 weight 0.01;

    let z = model.encode(images);
    require kl_divergence(z, normal_dist) < 0.1;
}

Delta's real value is that the compiler connects your research intent directly to the code. Because it understands the structure and goals of your model, it can enforce, validate, and optimize based on what you mean, not just on the computations you happen to write. That is what separates it from frameworks that only see execution.

But the compiler can do even more.

When I write require argmax(pred) == labels, the compiler sees this as a classification constraint. It selects a differentiable loss function (e.g., cross-entropy, hinge loss). It checks the gradient flow from the constraint to the parameters and ensures the constraint is used in the loss computation.

When I write train { ... }, the compiler makes two separate code paths. During training, all constraints are active. For inference, the whole block is removed from the generated code. It’s not just hidden with an if statement – it’s completely gone from the compiled graph.
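To make that concrete, here is roughly what the two specialized paths would look like if you wrote them out by hand in PyTorch. This is an illustrative sketch, not literal compiler output, and I've dropped the KL term for brevity.

import torch.nn.functional as F

# Sketch: the training-mode graph holds the classification loss plus the train-block terms.
def training_step(model, images, labels, attributes):
    pred = model.classify(images)
    loss = F.cross_entropy(pred, labels)
    loss = loss + 0.3 * F.mse_loss(model.aux_head(images), attributes)
    loss = loss + 0.01 * sum(p.norm() for p in model.parameters())
    return loss

# Sketch: the inference-mode graph never contained the train block at all.
def inference_step(model, images):
    return model.classify(images)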

Marking values as obs or param tells the compiler which tensors are observed data and which are trainable parameters, so it can track gradients and warn when something it expects is never provided.

The payoff is more than nicer syntax. Because the compiler knows what your constraints mean, it can enforce your research intent, check the constraints, and validate your objectives at compile time.

To make this work, Delta uses a multi-stage compiler pipeline that turns high-level, constraint-based code into optimized PyTorch graphs. Here’s how the pipeline works:

You write Delta code, and the compiler parses it into an AST, checks types, assigns roles, and figures out effects. It decides which tensors are parameters, which are data, which operations are differentiable, and whether code runs during training or inference.

The typed AST is then converted to SIR (Semantic Intermediate Representation), where each operation gets a role: whether it’s differentiable, has gradients, runs in training or inference, and whether it connects to the loss.

Next, the optimizer runs. The constraint compiler turns require and prefer into real loss terms. The mode specializer picks training or inference graphs. The relaxation step changes hard control flow into soft, differentiable operations. The gradient analyzer checks that all parameters can be reached from the loss.

Finally, everything is turned into PyTorch FX graphs that can actually run.
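As a mental model, you can picture the whole pipeline as a chain of passes. The pass names below are hypothetical, meant only to show the shape of the flow, not Delta's actual internal API.

# Hypothetical pass names; a sketch of the compilation flow, not Delta's real internals.
def compile_delta(source, mode):
    ast = parse(source)                  # source text -> AST
    typed_ast = typecheck(ast)           # types, roles (param/obs/const), effects
    sir = lower_to_sir(typed_ast)        # semantic IR with per-operation metadata
    sir = compile_constraints(sir)       # require / prefer -> loss terms
    sir = specialize_mode(sir, mode)     # keep only train- or inference-mode operations
    sir = relax_control_flow(sir)        # hard branches -> soft, differentiable blends
    check_gradient_flow(sir)             # is every parameter reachable from the loss?
    return emit_fx_graph(sir)            # executable PyTorch FX graph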

Why SIR Matters

SIR is where everything happens. For example:

param w = randn(10, 5);
const bias = 0.1;
obs x: Tensor[Float];

let y = w @ x + bias;

train {
    let dropout_mask = rand() > 0.5;
    y = y * dropout_mask;
}

Each SIR operation has metadata. The compiler keeps track of parameters, constants, observed data, whether things are differentiable, and the execution mode. It also makes sure dropout only runs during training.

With this setup, the compiler builds optimized graphs, checks if parameters can be reached from the loss, and reports any non-differentiable operations or problems with the training objective.
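If you haven't worked with an IR before, it helps to picture what "metadata on each operation" might look like. The fields below are an illustrative sketch, not Delta's actual schema.

from dataclasses import dataclass, field

# Illustrative sketch of a SIR operation; field names are hypothetical.
@dataclass
class SIROp:
    name: str                      # e.g. "matmul", "dropout_mask"
    role: str                      # "param" | "obs" | "const" | "intermediate"
    differentiable: bool           # can gradients flow through this operation?
    mode: str                      # "train" | "inference" | "both"
    contributes_to_loss: bool      # is this operation on a path to the loss?
    inputs: list = field(default_factory=list)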

Constraint Compilation

Here's where things get really interesting. In PyTorch, when you want your predictions to match your labels, you manually write:

loss = F.cross_entropy(predictions, labels)

You have to know that cross-entropy is the right surrogate for classification, remember to add it to the total loss, and make sure everything on the path to the parameters is differentiable.

In Delta, you write:

require argmax(pred) == labels;

The compiler recognizes this as a classification objective. It knows that argmax isn't differentiable, so it substitutes a differentiable surrogate such as cross-entropy and generates the corresponding loss term automatically.
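Conceptually, the lowering amounts to something like the following. This is a sketch of the idea in PyTorch terms, not the compiler's actual rewrite rule.

import torch.nn.functional as F

# Sketch: "require argmax(pred) == labels" can't be optimized directly, because
# argmax has zero gradient almost everywhere. Cross-entropy on the logits is the
# standard differentiable surrogate for that constraint.
def lower_classification_constraint(pred, labels):
    return F.cross_entropy(pred, labels)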

For soft constraints, you can write:

prefer norm(weights) < 1.0 weight 0.01;

The compiler reads this as a regularization preference and builds a penalty term: zero while the norm stays under 1.0, growing with the size of the violation, and scaled by the 0.01 weight. It keeps hard and soft constraints separate, picks the ones active in each mode, and checks that everything is differentiable and connected to the loss.
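In PyTorch terms, the generated penalty behaves like a weighted hinge on the violation. Again, a sketch of the idea rather than Delta's literal output:

import torch

# Sketch: "prefer norm(weights) < 1.0 weight 0.01" as a soft penalty.
# Zero while the constraint holds, grows linearly with the violation, scaled by the weight.
def soft_norm_penalty(weights, bound=1.0, weight=0.01):
    violation = torch.relu(weights.norm() - bound)   # max(0, norm - bound)
    return weight * violation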

Differentiable Control Flow

Differentiable programming makes control flow tricky. Regular if statements aren’t differentiable. In PyTorch, you either have to avoid them or write your own soft versions.

Delta handles this automatically. You can write:

if x > 0 temperature 0.1 {
    activate(x)
} else {
    deactivate(x)
}

The temperature parameter tells the compiler to make this a soft, differentiable conditional. It compiles to:

soft_condition = sigmoid((x - 0) / 0.1)
result = soft_condition * activate(x) + (1 - soft_condition) * deactivate(x)

Lower temperature pushes the gate toward a hard condition; higher temperature makes the transition softer. The compiler applies this relaxation automatically.
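A self-contained version of that blend, with the temperature exposed, might look like this sketch:

import torch

# Sketch of the soft conditional: sigmoid((x - threshold) / temperature) gates
# between the two branches. As temperature -> 0 the gate approaches a hard if/else;
# larger temperatures give a smoother, easier-to-optimize transition.
def soft_if(x, threshold, temperature, then_val, else_val):
    gate = torch.sigmoid((x - threshold) / temperature)
    return gate * then_val + (1 - gate) * else_val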

When to Use Delta

Use Delta when you need to define, validate, and iterate on complex learning objectives or custom differentiation logic. For standard supervised models, deployment, or anything that depends on the Python ecosystem and mature tooling, use PyTorch: Delta is still early in development, and its API and tools aren't nearly as mature.

Delta fits best in research with complex, shifting losses, fast iteration, and a high cost for constraint bugs. In those settings, compiler validation of gradient flow, automatic specialization of code by mode, and constraint errors caught at compile time can make adopting a new language worthwhile.

What I Learned

Building Delta taught me a lot about the gap between writing ML code and expressing ML intent. PyTorch is great for writing code – it gives you full control, powerful abstractions, and strong debugging. But it doesn’t know what you want your tensor operations to achieve.

There’s real value in languages that understand the meaning of what you’re doing. SQL understands relational queries. Regular expressions understand pattern matching. Type systems understand program structure. Delta is my attempt to build a language that understands machine learning objectives.

The main idea is that constraints and training objectives have a structure that a compiler can understand. When you write "require X == Y", it’s not just syntax – it’s a statement about what you want your model to learn. A compiler that gets this can validate, optimize, and help you avoid whole categories of bugs.

It’s not clear if Delta will become widely used, since it’s experimental software built for a specific research workflow. Still, the ideas of treating constraints as first-class elements and using compiler techniques to ensure correctness are valuable beyond this project.

At the very least, building Delta has made me a better ML engineer. I now think more carefully about what I'm optimizing for, not just how I implement it. I write training objectives in a more declarative way and see the benefits of languages that understand what I want to achieve, not just the instructions I write.

Try It Out

If this sounds interesting and you’re working on research with complex training objectives, give Delta a try. The code is on GitHub, and you can find documentation at deltalanguage.org.

Keep in mind that Delta is early-stage software. The API might change, and there could be compiler bugs. Still, if your workflow involves frequent changes to complex multi-objective losses, Delta might help you spend less time debugging.

If Delta doesn’t fit your needs or if PyTorch works better for you, that’s totally fine. Different tools work for different problems. Delta was built for my workflow, and other workflows might need other solutions.

It’s worth exploring languages that can understand user intent, not just instructions. These languages can catch bugs at compile time and let you specify learning objectives declaratively, leaving the implementation details to the compiler.

That’s the vision, even if the long-term outcome is still uncertain.