Conditional assignment in database queries
For the following to make the maximum amount of sense, you should first read Part 1
If you did, you can skip to the new stuff
Where we left things off
Even though this blog has no analytics on it, I can with almost absolute certainty say that 84.5% of readers will ignore the first line of this post and will not read Part 1.
I know because I wouldn’t either. It’s ok, we all do it, nothing to be ashamed of.
So let me bring you up to speed.
Value assignment in Glowdust is kind of different from most languages. Specifically, assignment is pattern matching (like Rust, for example) but it carries variables from the enclosing context. This changes a whole bunch of semantics.
Let’s see an example:
In Rust, variable a
will get a new value in the if let
block
let a = 1;
if let (a, 3) = (2, 3) {
println!(a); // prints 2
}
That’s regardless of the value a
has, or if it is even declared, outside the if let
scope. Inside, it
is shadowed.
Glowdust doesn’t shadow variables. Instead, if the variable is defined, it treats it like a value. This makes possible some patterns that are particularly useful for a database query language.
If we first create some data:
create function is_pair(int, int) -> bool
is_pair(1,2) = true;
is_pair(2,3) = true;
is_pair(4,5) = true;
we can then query it with a match
:
match is_pair(a, b), is_pair(b, c), return a, b, c
<<
[1,2,3]
What happens here is, for each value mapping in is_pair
, the query engine assigns the arguments
first to a
and b
, then to b
and c
, but both assignments are conditional.
This means b
in the inner loop already has a value and the assignment to c
will
happen only if the inner is_pair
value has a matching value for b
’s position.
The end result is a join on b
, which is exactly what the pattern is supposed to search for.
This is like the Rust example above not printing because a
was already set to something other
than 2
.
With this behavior, you can do some interesting stuff:
match is_pair(a, a+1), is_pair(a+3, b), return b
<<
[5]
Yup, I just assigned a variable and conditionally matched on an expression with that variable.
Call me biased, but that’s pretty cool.
Let’s make it make sense now
With this post I want to explore
- how conditional assignment works,
- why it creates the semantics it does and
- how Glowdust computes a query
all at the same time.
I don’t think I’ll succeed in all of these, but I sure can try.
You can download Glowdust and follow along
Let’s step through a non conditional example, using is_pair
from above:
match is_pair(a, b), return a
<<
[1]
[2]
[4]
Query execution in Glowdust starts by creating a FunctionCursor for the function is_pair
that returns the (domain, codomain) value pairs.
From the create
statement, we know that domain is (int, int)
and codomain is bool
.
(A FunctionCursor iterates over domain/codomain value pairs for a function, and is opened for every function that has unbound variables in its arguments).
For every such value, in a loop until the cursor returns ()
(the empty tuple):
- The domain value is conditionally assigned to
a
andb
- The codomain value is dropped (the query does not map it to anything)
a
is produced
At the beginning of each loop, variables a
and b
are unbound and have a value
assigned that will be used for the rest of the loop. Obviously b
is not used
anywhere, but a
is used in the return
atom.
More generally, a variable is assigned a value at the place where it appears the first time as a standalone expression, then that value is used in the rest of the query.
So now you can follow through a slightly more complex query:
match is_pair(a, a+1), return a
<<
[1]
[2]
[4]
Here exactly the same steps are followed, but instead of b
there is
a+1
. Since a
is already assigned, Glowdust will compute the expression
and compare it to the value produced by the FunctionCursor. It will
always match for our small dataset, so the same results are produced.
We could even flip it around:
match is_pair(a-1, a), return a
<<
[2]
[3]
[5]
This still works even though a
first appears in an expression. Glowdust
is smart enough to first scan for standalone variables, bind these
to values, and then use them everywhere else.
Joins
Now we know enough to understand how joins work. It’s exactly the same idea:
match is_pair(a, b), is_pair(b, c), return a, b, c
<<
[1,2,3]
This will be executed as a nested loop, because both is_pair
atoms have
unbound variables (a
and b
for the first one, c
for the second).
(Note: If there are no unbound variables in a function atom, then it is compiled directly as a function call, that just returns a single value, not a cursor)
For each value produced by the first FunctionCursor, variables a
and b
will be bound. Then the inner cursor will run through the full set of
values, but it will only bind c
if the corresponding b
is
equal to whatever was bound from the outer cursor.
Finally, for each iteration of the inner cursor that successfully matches
b
, the full triplet of a
, b
, and c
is produced and returned.
Once the inner cursor is exhausted, the outer cursor is advanced, a
and
b
are rebound, and the whole process repeats.
Bringing it all together
Now we can do the final example:
match is_pair(a, a+1), is_pair(a+3, b), return b
At this point it should be obvious what’s happening. The outer cursor
binds a
, computes and compares a+1
and if it is equal to the second
parameter of the domain value, it continues in the inner cursor.
There each pair produced is compared with the value of a+3
and, if
equals, b
is assigned and eventually returned.
For our dataset, this produces just a single value:
<<
[5]
Semantics?
This post started as an attempt to explain in detail what
conditional assignment is and justify why assignment in Glowdust
is a boolean
expression (and not ()
like in Rust).
As you probably understood by the complete lack of arguments
about the boolean
thing, I am not so sure about it.
The last example we saw:
match is_pair(a, a+1), is_pair(a+3, b), return b
can be rewritten with filters like so:
match is_pair(a, b), is_pair(c, d), {b == a + 1}, {c == a + 3}, return d
and it will be exactly equivalent.
Uglier, non declarative and with less opportunities for optimization.
But equivalent.
In this form, no conditional assignment is necessary. It’s straight assignment and then conditional checks.
Of course, “equivalent” doesn’t mean “the same”.
With conditional assignment, execution short circuits. It will not assign more than necessary, so strictly speaking the last, rewritten example above is slower than the original.
The boolean
value I think is an implementation detail. It comes from the
VM bytecode executing for conditional assignment.
You see, for each variable that appears in a match
, the compiler must
generate bytecode to either assign it a value or compare it with something:
match is_pair(a, a)
The two a
’s here are quite different. The first is assignment and always
succeeds, he second is comparison and can fail.
But the bytecode that follows cannot care about what preceeded. It just needs to know if it should continue executing or fetch the next value from the current cursor.
The solution is that both assignment and comparison will leave a bool
on the stack. It’s always true
for assignment and
whatever was the result for comparisons.
So maybe there is no reason to have assignment semantically be a boolean expression.
I’ll continue exploring.
As always, let me know your thoughts on Mastodon
And, if you find this interesting enough, you may want to donate towards my costs as an independent developer.