Program Synthesis
Introduction
The gist of program synthesis is generating programs symbolically instead of with a neural network. It is a completely different approach to program generation and comes with different trade-offs.
There is no hallucination, there are no syntax errors, and the result is guaranteed to be type-correct if you want. You can give it a library to use. You can give it all kinds of hints or constraints. These could be assertions like assert f(2) = 4, or assert f(x) = IsPrime(x) for all inputs x. You can say “only use 100 steps to calculate the result” or “only use 100 bytes of memory.”
You can think of a program synthesizer as a tool that takes a program with todos and fills them in automatically. Let’s say we have this program:
fn sum(xs: Vec<i32>) -> i32 {
todo!()
}
The synthesizer will never produce a program that isn’t correctly typed like this:
fn sum(xs: Vec<i32>) -> i32 {
//Vec doesn't have a `sum` method. It should've been `xs.iter().sum()`.
xs.sum()
}
or something syntactically invalid like this:
fn sum(xs: Vec<i32>) -> i32 {
xs.iter(
}
It’s possible to run into these kinds of problems with LLMs since they’re like a function that takes a string and returns a string. Technically, there is nothing stopping them from producing an invalid program. (Now that I think about it, the idea of attention could work for an incomplete AST. I don’t know how you would represent that as a vector though.) In natural language, this isn’t much of a problem because we can understand text even if it has incorrect grammar, missing words, or misspellings. Program synthesis, on the other hand, works on structured data, an AST, so it’s impossible for it to produce invalid syntax.
In its simplest form, it just enumerates all possible programs from simple to complex until it finds the “correct” one. I’m putting quotes around the word “correct” because its definition changes depending on how you specify it. Take the previous incomplete sum function for example. The following program is technically correct:
fn sum(xs: Vec<i32>) -> i32 {
0
}
It is syntactically correct and type-checks, but it isn’t the intended implementation. There are two ways we can help the synthesizer understand our intention.
The first is to encode a specification in the type system so that, when searching through programs, the synthesizer can eliminate candidates that don’t type-check. I don’t know how I could specify the sum function, so let’s try a simpler one, a function that takes two numbers and returns the bigger one:
function Max(a: int, b: int): int
ensures Max(a, b) >= a && Max(a, b) >= b
ensures Max(a, b) == a || Max(a, b) == b
{
TODO()
}
ensures Max(a, b) >= a && Max(a, b) >= b tells us that the return value must be greater than or equal to both a and b. ensures Max(a, b) == a || Max(a, b) == b specifies that the result must be either a or b. If we didn’t have the first condition, just returning a would be correct; and if we didn’t have the second, returning the larger value plus one would be correct (a + b doesn’t quite work here, since it’s smaller than both inputs when they’re negative). As you can imagine, these specifications can get very long and intricate for harder problems.
The other way we can help the synthesizer is by providing examples like unit tests, so the synthesizer can check them against the program it found:
assert Max(1, 2) == 2
assert Max(-1, 0) == 0
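To make the checking idea concrete, here is a hypothetical sketch (the names `passes`, `holdsEnsures`, and the sampled inputs are my own, not from any real synthesizer) of how a candidate Max could be tested against both kinds of specification: the ensures-style properties over a handful of sampled inputs, and the concrete examples above.

```javascript
// Property-style spec: the two ensures clauses, checked on one input pair.
const holdsEnsures = (max, a, b) => {
  const r = max(a, b);
  return r >= a && r >= b && (r === a || r === b);
};

// Example-style spec: concrete input/output pairs, like unit tests.
const examples = [
  [[1, 2], 2],
  [[-1, 0], 0],
];

function passes(candidate) {
  // A few sampled input pairs for the property-style spec.
  const samples = [[0, 0], [1, 2], [5, 3], [-1, 0], [-4, -7]];
  return (
    samples.every(([a, b]) => holdsEnsures(candidate, a, b)) &&
    examples.every(([[a, b], expected]) => candidate(a, b) === expected)
  );
}

passes((a, b) => (a >= b ? a : b)); // true: the intended Max
passes((a, b) => a);                // false: Max(1, 2) = 1 isn't >= 2
passes((a, b) => a + b);            // false: Max(1, 2) = 3 is neither 1 nor 2
```

A real verifier like Dafny proves the properties for all inputs rather than sampling a few, but the filtering role is the same.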
I like the example-based approach. If the synthesizer produces something unexpected, it’s easy to refine: just add a few more examples or change the existing ones. It’s easier than writing specifications, and it also feels more interactive. This video is a fantastic demo of this kind of interactive programming session. In it, the presenter tries to synthesize a program to extract users’ first and last names from emails.
Of course, there’s no rule that we have to use one approach or the other. Even if a candidate program already type-checks against the specification, you can still run it to see whether the assertions hold. It’ll be more computationally expensive, though. Victor Taelin’s SupGen uses both approaches. There’s also Smyth, which has interactive examples on its website you can play with. I think they’re planning on integrating it into the Hazel programming language. I highly recommend checking it out as well.
The really amazing thing is how these systems learn to generalize from just a few examples – kind of like how our brains do it. I honestly believe this is a crucial piece of the puzzle for overcoming the massive data and compute requirements we often see in AI. But before we can get to the “correct” programs, we need a way to generate them in the first place. So, let’s talk about program enumeration.
Enumerating Programs
Let’s simplify the problem as much as possible—but not too much, as we want to capture the essential complexity. The following language seems like a perfect candidate:
expr =
| 0
| S(expr)
| expr * expr
It has natural numbers represented in Peano arithmetic: S(0) represents 1, S(S(0)) represents 2, and so on. We also have multiplication expressions because otherwise we could just enumerate 0, S(0), S(S(0)), ... and that’s not interesting at all. With multiplication, however, simply enumerating like that won’t work because we’d never generate expressions like 0 * S(0). In a sense, we have to be fair to each expression type, sometimes generating one type and sometimes another.
So we have a search problem where:
- We have to make a decision at each step, e.g. after S(0) * 0, what’s the next expression?
- There is a hierarchy between things we search for. Every S(expr * expr) is a S(expr), and that in turn is an expr.
A tree structure is ideal for visualizing this problem. Now, you might be wondering, how should we connect the nodes? And what should be the root? Well, recall our initial definition of program synthesis: the synthesizer’s job is to fill in those todo sections in your code. Think about todo – it’s like a placeholder that works for any type – number, string, you name it – and the type-checker is perfectly happy with it. It’s a universal stand-in for expressions.
So, can you see the connection between todo and our grammar’s expr? When we write S(expr), it’s like saying, “you can put any expression right here,” from something simple like S(0) to something more involved like S(S(0) * S(S(0))). This is just like how a todo!() in your code can be replaced with anything from a simple 42 to a more complex x or xs.sum(). That’s why we’re going to add a todo expression to our language. To keep things short and sweet, and because Smyth does it too, we’ll use ? as our shorthand for todo.
Now it’s really easy to structure the tree:
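(In case the tree figure doesn’t come through in this text version, here is a rough sketch of its first two levels, following the expansion rule described next.)

```
?
├── 0
├── S(?)
│   ├── S(0)
│   ├── S(S(?))
│   └── S((? * ?))
└── (? * ?)
    ├── (0 * 0)
    ├── (0 * S(?))
    ├── (0 * (? * ?))
    └── ... one child per pair of sub-expansions
```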

You can see that the tree looks exactly like our grammar definition. Everywhere we see a ?, we replace it with 0, S(?), and ? * ?. It also looks like we’re pattern matching against the program and becoming more specific at each level:
match x {
0 => ..
S(y) => match y {
0 => ..
S(z) => ..
Mul(z, k) => ..
}
Mul(y, z) => match (y, z) {
(0, 0) => ..
(0, S(k)) => ..
..
}
}
As you can see, the depth of this tree is infinite, so we can’t really use DFS. BFS is perfect for this use case because it explores the tree level by level. Since the programs grow from short to long, if there is a simpler correct program, we’ll find it first.
Implementing It in Code
Let’s first just scaffold the BFS algorithm:
function enumerate(program) {
const queue = [program];
while (queue.length !== 0) {
const expr = queue.shift();
console.log(showExpression(expr));
//TODO: find the children and add them to the queue
}
}
There’s nothing new here; we start our queue with the root and print the nodes in the queue until it’s empty. The crux of the algorithm is finding the children of a node. Let’s define an expand function that, given a node, replaces each hole with possible expression types as we discussed earlier.
// Define helpers to construct the expressions so that it's not crowded.
const hole = () => ({ type: "hole" });
const O = () => ({ type: "zero" });
const S = (prev) => ({ type: "succ", prev });
const mul = (left, right) => ({
type: "mul",
left,
right,
});
function expand(expr) {
switch (expr.type) {
case "zero":
return [];
case "hole":
// [0, S(?), ? * ?]
return [O(), S(hole()), mul(hole(), hole())];
}
}
Alright, those first two cases – zero and hole – were pretty straightforward. But now we get to the other two: succ and mul. Things start to get a little trickier here because these expressions can balloon in complexity. We could have something like S(S(S(S(?)))) or (? * 0) * (0 * S(?)). Now, one way to handle this would be to just walk through the expression, the AST, and replace any ? we find. But that sounds quite hard because we need to copy the tree for every replacement. Since we’re already dealing with a tree structure, a more natural approach is to use recursion. We can simply call our expand function on each part of the expression. This way, we’ll eventually hit those base cases we already coded up. Let’s see how this works with S(S(?)) as an example:
expand(S(S(?)))
|
expand(S(?))
|
expand(?) == [0, S(?), ? * ?]
Now we do the reverse: we take these basic expressions and construct our original expression. In this case, we wrap each of those expressions with S:
[0, S(?), ? * ?]
|
[S(0), S(S(?)), S(? * ?)]
|
[S(S(0)), S(S(S(?))), S(S(? * ?))]
Translating this into code:
function expand(expr) {
switch (expr.type) {
case "zero":
return [];
case "hole":
// [0, S(?), ? * ?]
return [O(), S(hole()), mul(hole(), hole())];
+ case "succ": {
+ const result = [];
+ for(const prev of expand(expr.prev)) {
+ result.push(S(prev))
+ }
+ return result;
+ }
}
}
This is just a map operation so we could do it more succinctly like this:
function expand(expr) {
switch (expr.type) {
case "zero":
return [];
case "hole":
// [0, S(?), ? * ?]
return [O(), S(hole()), mul(hole(), hole())];
+ case "succ": {
+ return expand(expr.prev).map(S);
+ }
}
}
The ? * ? expression is handled the same way, but since we have two sub-expressions, we use nested for loops:
function expand(expr) {
switch (expr.type) {
case "zero":
return [];
case "hole":
// [0, S(?), ? * ?]
return [O(), S(hole()), mul(hole(), hole())];
case "succ": {
return expand(expr.prev).map(S);
}
+ case "mul": {
+ const lefts = expand(expr.left);
+ const rights = expand(expr.right);
+ // A side with no holes expands to [], which would make the whole
+ // product empty and dead-end nodes like (0 * S(?)) even though the
+ // other side still has holes. Keep a complete side as-is instead.
+ if (lefts.length === 0 && rights.length === 0) {
+ return []; // both sides are complete, nothing left to fill in
+ }
+ const result = [];
+ for (const left of lefts.length === 0 ? [expr.left] : lefts) {
+ for (const right of rights.length === 0 ? [expr.right] : rights) {
+ result.push(mul(left, right));
+ }
+ }
+ return result;
+ }
}
}
Now we can use this to find the children in our enumerate function:
function enumerate(program) {
const queue = [program];
while (queue.length !== 0) {
const expr = queue.shift();
console.log(showExpression(expr));
for (const child of expand(expr)) {
queue.push(child);
}
}
}
and lastly the showExpression function:
function showExpression(expr: Expression): string {
switch (expr.type) {
case "hole":
return "?";
case "zero":
return "0";
case "succ":
return `S(${showExpression(expr.prev)})`;
case "mul":
return `(${showExpression(expr.left)} * ${showExpression(expr.right)})`;
}
}
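As a quick sanity check of the printer, here is a standalone snippet that repeats the constructor and showExpression definitions from above (in plain JS, with the type annotations dropped) so it runs on its own:

```javascript
// Constructors and printer, as defined earlier in the post.
const hole = () => ({ type: "hole" });
const O = () => ({ type: "zero" });
const S = (prev) => ({ type: "succ", prev });
const mul = (left, right) => ({ type: "mul", left, right });

function showExpression(expr) {
  switch (expr.type) {
    case "hole":
      return "?";
    case "zero":
      return "0";
    case "succ":
      return `S(${showExpression(expr.prev)})`;
    case "mul":
      return `(${showExpression(expr.left)} * ${showExpression(expr.right)})`;
  }
}

console.log(showExpression(S(mul(O(), hole())))); // prints: S((0 * ?))
```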
If you run the enumerate function, it’ll output the following:
?
0
S(?)
(? * ?)
S(0)
S(S(?))
S((? * ?))
(0 * 0)
(0 * S(?))
(0 * (? * ?))
... and so on
If you look at the tree again, you’ll notice that only the complete nodes are actual programs. To print only those, we can track that information in the queue:
function enumerate(program: Expression) {
- const queue = [program];
+ const queue = [["incomplete", program]];
while (queue.length !== 0) {
- const expr = queue.shift()!;
- console.log(showExpression(expr));
+ const [type, expr] = queue.shift()!;
+ if (type === "complete") {
+ console.log(showExpression(expr));
+ }
- for (const child of expand(expr)) {
- queue.push(child);
- }
+ for (const [i, child] of expand(expr).entries()) {
+ // Mark it as complete if it's the first child: the first child is
+ // always the one where every hole became 0.
+ const type = i === 0 ? "complete" : "incomplete";
+ queue.push([type, child]);
+ }
}
}
Now when we run it, we only get complete programs:
0
S(0)
(0 * 0)
S(S(0))
S((0 * 0))
(0 * S(0))
(0 * (0 * 0))
(S(0) * 0)
(S(0) * S(0))
(S(0) * (0 * 0))
((0 * 0) * 0)
((0 * 0) * S(0))
((0 * 0) * (0 * 0))
S(S(S(0)))
...and so on
Although it’s a nice and short solution, the queue’s memory consumption grows exponentially with depth. I honestly don’t know a way to get around that. Another solution would be to store the generated programs in an array and build new programs from their combinations. For example:
0
S(0), 0 * 0
S(S(0)), S(0 * 0), 0 * S(0), 0 * (0 * 0), S(0) * 0, ...
...and so on
We don’t generate incomplete programs, but the growth is still exponential. However, with type information and examples, we’ll be able to prune most of the branches because the number of incorrect implementations is much larger than the number of correct ones.
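To make that alternative concrete, here is a hypothetical sketch (the names synthesize, bySize, and evaluate are my own). It builds complete programs bottom-up by size and checks each one against a target value, which stands in for example-based pruning:

```javascript
const O = () => ({ type: "zero" });
const S = (prev) => ({ type: "succ", prev });
const mul = (left, right) => ({ type: "mul", left, right });

// Peano semantics for complete programs: S adds one, mul multiplies.
function evaluate(expr) {
  switch (expr.type) {
    case "zero":
      return 0;
    case "succ":
      return 1 + evaluate(expr.prev);
    case "mul":
      return evaluate(expr.left) * evaluate(expr.right);
  }
}

// Return the first (smallest) program whose value equals target,
// or null if none exists within maxSize constructors.
function synthesize(target, maxSize = 12) {
  // bySize[n] holds every complete program with exactly n constructors.
  const bySize = [[], [O()]]; // size 0: nothing, size 1: just 0
  for (let n = 1; n <= maxSize; n++) {
    for (const candidate of bySize[n]) {
      if (evaluate(candidate) === target) return candidate;
    }
    // Build size n + 1: wrap size-n programs in S, and multiply
    // pairs whose sizes sum to n (plus 1 for the mul node itself).
    const next = bySize[n].map(S);
    for (let i = 1; i < n; i++) {
      for (const left of bySize[i]) {
        for (const right of bySize[n - i]) {
          next.push(mul(left, right)); // size i + (n - i) + 1 = n + 1
        }
      }
    }
    bySize.push(next);
  }
  return null;
}
```

Here the “examples” are just a target value, but the same loop could run assertions like the Max ones earlier; either way, checking each candidate as it is built is what lets a synthesizer discard branches early.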
Conclusion
Of course, this is just the beginning. The language we’ve built here is extremely simplified, not ready to tackle anything truly practical, and it doesn’t find any programs; it just enumerates them. But that’s about as far as my current knowledge goes, so I think this is a good place to wrap things up for now. Hopefully, this has given you a clearer picture of the fundamental ideas behind program enumeration and synthesis! I’ll write another blog post after I deepen my understanding.