Conditional assignment in database queries

For the following to make the maximum amount of sense, you should first read Part 1

Where we left things off

Even though this blog has no analytics on it, I can with almost absolute certainty say that 84.5% of readers will ignore the first line of this post and will not read Part 1.

I know because I wouldn’t either. It’s ok, we all do it, nothing to be ashamed of.

So let me bring you up to speed.

Value assignment in Glowdust is kind of different from most languages. Specifically, assignment is pattern matching (like Rust, for example) but it carries variables from the enclosing context. This changes a whole bunch of semantics.

Let’s see an example:

In Rust, variable a will get a new value in the if let block

let a = 1;
if let (a, 3) = (2, 3) {
	println!(a);	// prints 2
}

That’s regardless of the value a has, or if it is even declared, outside the if let scope. Inside, it is shadowed.

Glowdust doesn’t shadow variables. Instead, if the variable is defined, it treats it like a value. This makes possible some patterns that are particularly useful for a database query language.

If we first create some data:

create function is_pair(int, int) -> bool

is_pair(1,2) = true;
is_pair(2,3) = true;
is_pair(4,5) = true;

we can then query it with a match:

match is_pair(a, b), is_pair(b, c), return a, b, c

<<
[1,2,3]

What happens here is, for each value mapping in is_pair, the query engine assigns the arguments first to a and b, then to b and c, but both assignments are conditional. This means b in the inner loop already has a value and the assignment to c will happen only if the inner is_pair value has a matching value for b’s position.

The end result is a join on b, which is exactly what the pattern is supposed to search for.

This is like the Rust example above not printing because a was already set to something other than 2.

With this behavior, you can do some interesting stuff:

match is_pair(a, a+1), is_pair(a+3, b), return b

<<
[5]

Yup, I just assigned a variable and conditionally matched on an expression with that variable.

Call me biased, but that’s pretty cool.

Let’s make it make sense now

With this post I want to explore

how conditional assignment works,
why it creates the semantics it does and
how Glowdust computes a query

all at the same time.

I don’t think I’ll succeed in all of these, but I sure can try.

You can download Glowdust and follow along

Let’s step through a non conditional example, using is_pair from above:

match is_pair(a, b), return a

<<
[1]
[2]
[4]

Query execution in Glowdust starts by creating a FunctionCursor for the function is_pair that returns the (domain, codomain) value pairs.

From the create statement, we know that domain is (int, int) and codomain is bool.

(A FunctionCursor iterates over domain/codomain value pairs for a function, and is opened for every function that has unbound variables in its arguments).

For every such value, in a loop until the cursor returns () (the empty tuple):

The domain value is conditionally assigned to a and b
The codomain value is dropped (the query does not map it to anything)
a is produced

At the beginning of each loop, variables a and b are unbound and have a value assigned that will be used for the rest of the loop. Obviously b is not used anywhere, but a is used in the return atom.

More generally, a variable is assigned a value at the place where it appears the first time as a standalone expression, then that value is used in the rest of the query.

So now you can follow through a slightly more complex query:

match is_pair(a, a+1), return a

<<
[1]
[2]
[4]

Here exactly the same steps are followed, but instead of b there is a+1. Since a is already assigned, Glowdust will compute the expression and compare it to the value produced by the FunctionCursor. It will always match for our small dataset, so the same results are produced.

We could even flip it around:

match is_pair(a-1, a), return a

<<
[2]
[3]
[5]

This still works even though a first appears in an expression. Glowdust is smart enough to first scan for standalone variables, bind these to values, and then use them everywhere else.

Joins

Now we know enough to understand how joins work. It’s exactly the same idea:

match is_pair(a, b), is_pair(b, c), return a, b, c

<<
[1,2,3]

This will be executed as a nested loop, because both is_pair atoms have unbound variables (a and b for the first one, c for the second).

(Note: If there are no unbound variables in a function atom, then it is compiled directly as a function call, that just returns a single value, not a cursor)

For each value produced by the first FunctionCursor, variables a and b will be bound. Then the inner cursor will run through the full set of values, but it will only bind c if the corresponding b is equal to whatever was bound from the outer cursor.

Finally, for each iteration of the inner cursor that successfully matches b, the full triplet of a, b, and c is produced and returned.

Once the inner cursor is exhausted, the outer cursor is advanced, a and b are rebound, and the whole process repeats.

Bringing it all together

Now we can do the final example:

match is_pair(a, a+1), is_pair(a+3, b), return b

At this point it should be obvious what’s happening. The outer cursor binds a, computes and compares a+1 and if it is equal to the second parameter of the domain value, it continues in the inner cursor.

There each pair produced is compared with the value of a+3 and, if equals, b is assigned and eventually returned.

For our dataset, this produces just a single value:

<<
[5]

Semantics?

This post started as an attempt to explain in detail what conditional assignment is and justify why assignment in Glowdust is a boolean expression (and not () like in Rust).

As you probably understood by the complete lack of arguments about the boolean thing, I am not so sure about it.

The last example we saw:

match is_pair(a, a+1), is_pair(a+3, b), return b

can be rewritten with filters like so:

match is_pair(a, b), is_pair(c, d), {b == a + 1}, {c == a + 3}, return d

and it will be exactly equivalent.

Uglier, non declarative and with less opportunities for optimization.

But equivalent.

In this form, no conditional assignment is necessary. It’s straight assignment and then conditional checks.

Of course, “equivalent” doesn’t mean “the same”.

With conditional assignment, execution short circuits. It will not assign more than necessary, so strictly speaking the last, rewritten example above is slower than the original.

The boolean value I think is an implementation detail. It comes from the VM bytecode executing for conditional assignment.

You see, for each variable that appears in a match, the compiler must generate bytecode to either assign it a value or compare it with something:

match is_pair(a, a)

The two a’s here are quite different. The first is assignment and always succeeds, he second is comparison and can fail.

But the bytecode that follows cannot care about what preceeded. It just needs to know if it should continue executing or fetch the next value from the current cursor.

The solution is that both assignment and comparison will leave a bool on the stack. It’s always true for assignment and whatever was the result for comparisons.

So maybe there is no reason to have assignment semantically be a boolean expression.

I’ll continue exploring.

As always, let me know your thoughts on Mastodon

And, if you find this interesting enough, you may want to donate towards my costs as an independent developer.

Conditional assignment in database queries

For the following to make the maximum amount of sense, you should first read Part 1 #

If you did, you can skip to the new stuff #

Where we left things off #

Let’s make it make sense now #

Joins #

Bringing it all together #

Semantics? #