Mutable object trees in Rust, using memory arenas
I have to be honest with you
I’m not exactly comfortable with Rust yet. I muddle through, but things like Rc
and lifetimes are a bit over my head. So, I do what every reasonable person does when faced with the prospect of learning something hard.
I avoid it.
Do you know the saying that necessity is the mother of invention? While not as widely known, it’s equally true that invention’s deadbeat stepdad is laziness.
So anyway, let’s invent a way out of using Rc
to create a recursive data structure in Rust.
Pssst, hey kid. Wanna make a tree in Rust?
If you search for trees, linked lists or any recursive data structure in Rust, you will find a series of posts saying that “Rust’s borrow checker is great, but it makes it hard to create recursive data structures”.
And then they all go ahead and show how to implement them using Rc<RefCell>
.
Which is great and all, and I want to thank all the authors that put these posts together. The idea is quite simple, and it is very succinctly described in this post from the rust-lang users forum
Alice gives 2 solutions there. One is the classic Rc<RefCell>
approach, and the other is a vector/integer combo. And the thing about the second one is that it doesn’t have Rc
and it’s also not that widely discussed. So let’s do that.
But first, I’ll need a motivating example.
Cursor
s in Glowdust
Glowdust is the new DBMS I am building and for which I am currently creating a query engine. To generate query results, I need to create iterators over the base data and then combine them in various ways with joins, projections, function calls and so on. But, because database Iterators need to be composable and rewind-able, they tend to be called Cursor
s instead.
Ok, we call them Cursors because DBMS people need to feel special. But it’s mostly the composable thing.
The interface I have for Cursor
s right now looks like this
pub trait Cursor {
fn next(&mut self) -> ResultRow;
}
I have different types of Cursor
s, of which I’ll mention three so we can build up an example.1
The base Cursor
is the FunctionCursor
, which creates new rows from data stored on disk:
pub struct FunctionCursor {
source: DiskData
}
impl Cursor for FunctionCursor {
fn next(&mut self) -> ResultRow {
// get row from self.source
// return it
}
}
A JoinCursor
combines two Cursor
s and performs a join on their results:
pub struct JoinCursor<'a> {
lhs: &'a mut FunctionCursor,
rhs: &'a mut FunctionCursor,
}
// none of this is real code, it's just for illustration
impl<'a> Cursor for JoinCursor<'a> {
fn next(&mut self) -> ResultRow {
for left_row in self.lhs {
for right_row in self.rhs {
if join(left_row, right_row) {
return concatenate_rows(left_row, right_row)
}
// else check next combination
}
}
}
}
And finally, a ProjectionCursor
takes the output of a JoinCursor
and projects some of its variables in every row
pub struct ProjectionCursor<'a> {
source: &'a mut JoinCursor<'a>
projection: Projection
}
impl<'a> Cursor for ProjectionCursor<'a> {
fn next(&mut self) -> ResultRow {
let result = self.source.next();
return self.projection.apply(result);
}
}
These cursors can be combined in arbitrary ways, but they need to obey the invariant that there is a unique path connecting any two. Since each Cursor
is unidirectional, this creates a directed tree, and an example would look like this:
The arrows indicate the direction of the dependency between the cursors, which is the reverse of the flow of rows. So in this case, rows are produced on the left and flow towards the right to the two projection cursors, where they are consumed.
The problem
I want to draw your attention to the mut
references to self
in Cursor::next()
and the struct
members. This means each Cursor
needs to have mutable access to the ones it depends on, and those to the ones they depend on and so on until the base FunctionCursor
s. Calling next()
on the end cursors, the ProjectionCursors
in the example above, means we need mutable access on the path from the ProjectionCursor
to the leaves - the FunctionCursors
in this case.
Since this is a tree, there is exactly one path between any two cursors, which means as long as we don’t call next()
on two different “end” Cursor
s concurrently, there is never a case where a &mut
is being used from two places at the same time.
Now, I don’t know if it is obvious to you, but I am willing to admit that it took me a while to understand that the code I showed above can never build the tree in the picture. The problem is that the JoinCursor
cannot be shared between the two ProjectionCursor
s. The references they hold to the JoinCursor
will have to be &mut
and you can’t have two &mut
s to the same object at the same time. That’s ownership for you.
Here, I will let rustc
demonstrate this in no uncertain terms. This is the attempt at building the tree I showed above:
fn main() {
let mut base_1 = FunctionCursor::new();
let mut base_2 = FunctionCursor::new();
let mut join = JoinCursor {
lhs: &mut base_1,
rhs: &mut base_2
};
let mut projection_1 = ProjectionCursor{
source: &mut join
};
// This is never going to work
let mut projection_2 = ProjectionCursor{
source: &mut join
};
}
And here is rustc
educating me on the error of my ways:
error[E0499]: cannot borrow `join` as mutable more than once at a time
--> src/cur.rs:44:53
|
43 | let mut projection_1 = ProjectionCursor{source: &mut join};
| --------- first mutable borrow occurs here
44 | let mut projection_2 = ProjectionCursor{source: &mut join};
| ^^^^^^^^^
| |
| second mutable borrow occurs here
| first borrow later used here
But, from the tree invariant, we know that it will never be the case at runtime that two different &mut
s will be used for the same object at the same time. The problem is that rustc
has no way to prove this.
One way to deal with this, just like Alice said, is to avoid references and use instead an array (the arena) and integer indexes into it, so we can “borrow” without the compiler complaining.
Which brings us full circle to the title of the post - we have our mutable cursors, our tree and now the arena2.
About time too, if you ask me.
Let’s build this
The core idea here is that instead of using pointers, that point to locations in memory, we use integers that point to a specific element in an array. So first, we’ll need an array, right?
An array of what, though?
Let’s see what we want out of it.
Each element will be a Cursor
, and each one should be possible to share with multiple other Cursor
s (like the Join cursor is shared in the image above). Instead of references, we will hand out the index of the Cursor
in the arena, so we don’t need Rc
. But, as described, these “references” still need to be mutable3.
This, dear reader, can only mean one thing: RefCell
.4
RefCell
will let us get non-mut references to the arena elements, but still be able to get mut access to the included Cursor
. With this in mind, here’s what the arena looks like5:
struct CursorArena {
arena: Vec<RefCell<CursorValue>>,
next_unused: usize,
}
impl CursorArena {
fn new() -> Self {
CursorArena {
arena: Vec::default(),
next_unused: 0,
}
}
fn add(&mut self, cursor: CursorValue) -> usize {
self.arena.push(RefCell::new(cursor));
let result = self.next_unused;
self.next_unused += 1;
result
}
fn get(&self, id: usize) -> RefMut<'_, CursorValue> {
match self.arena.get(id) {
Some(ref value) => {
return value.borrow_mut();
}
_ => {
panic!()
}
}
}
}
About CursorValue
CursorValue
is an enum that let me put all cursor types in the same vector. Each Cursor is a different struct, with different fields and implementation. To fit them all in the same vector, I define an enum to use as a tagged union, like so:
enum CursorValue {
FunctionCursorValue(FunctionCursor),
JoinCursorValue(JoinCursor),
ProjectionCursorValue(ProjectionCursor),
}
With this, I can put all my cursors in the same arena. I am not a fan of this pattern, but apparently it’s not uncommon and it also happens to work, so I won’t complain much.
Rewriting the Cursor, get rid of references
Now that we have the arena, we can replace references with integers. FunctionCursor won’t change, it never referenced any other Cursors, but the other two have reference fields that can become usize
.
pub struct JoinCursor {
lhs: usize,
rhs: usize
}
// none of this is real code, it's just for illustration
impl Cursor for JoinCursor {
fn next(&mut self) -> ResultRow {
let mut ref_mut_left = arena.get(self.lhs);
let FunctionCursorValue(ref mut lhs) = ref_mut_left.deref_mut() else {
panic!();
};
let mut ref_mut_right = arena.get(self.rhs);
let FunctionCursorValue(ref mut rhs) = ref_mut_right.deref_mut() else {
panic!();
};
for left_row in lhs {
for right_row in rhs {
if join(left_row, right_row) {
return concatenate_rows(left_row, right_row)
}
// else check next combination
}
}
}
}
pub struct ProjectionCursor {
source: usize
projection: ProjectionSpec
}
impl Cursor for FunctionCursor {
fn next(&mut self) -> ResultRow {
let mut ref_mut_source = arena.get(self.source);
let JoinCursorValue(ref mut source) = ref_mut_source.deref_mut() else {
panic!();
};
let result = source.next();
return projection.apply(result);
}
}
As a nice bonus, we also got rid of the lifetimes for the references. And now we can now construct our tree:
let mut arena = CursorArena::new();
let function_1 = arena.add(FunctionCursorValue(FunctionCursor::new()));
let function_2 = arena.add(FunctionCursorValue(FunctionCursor::new()));
let mut join = JoinCursor::new();
join.lhs = function_1;
join.rhs = function_2;
let join_id = arena.add(JoinCursorValue(join));
let mut project_1 = ProjectionCursor::new()
let mut project_2 = ProjectionCursor::new()
project_1.source = join_id;
project_2.source = join_id; // <- This is the big deal right here.
arena.add(ProjectionCursorValue(project_1));
arena.add(ProjectionCursorValue(project_2));
That third line from the end, that’s what this post is all about, which may look kind of underwhelming, unless you’ve really, really needed to have such a tree in your program. Then it’s a big deal. Here’s what we built, in picture form:
You know what?
After going through all this, I think I understand Rc
better. Not so sure about RefCell
, as you can see it gets a bit fuzzy with borrow_mut()
and stuff, but it works. I will keep using the integer way, it makes more sense to me, and I like that it doesn’t get cluttered with multiple nested parametric types.
Thank you for reading. If you have any comments on this, see what I’m working on, or keep track of Glowdust’s development, you can find me on Mastodon at @chrisg@fosstodon.org .
-
The examples here don’t make use of the
Cursor
trait. In real code, you’d use trait objects or some other method of combining cursors more freely. This was outside the scope of this post, so I hardwired the types. ↩︎ -
Arena is a fancy word for array that is used for memory allocation. When you read arena, just think array - that’s what I do and it hasn’t failed me yet. They even start with the same letter. ↩︎
-
Since the
Cursor
s are owned by the arena, a mutable reference to any singleCursor
would also mean a mutable reference to theVec
, which would create exactly the same problem, just one collection higher. ↩︎ -
Ok, it may mean more things than that. As I said, I’m still learning :) ↩︎
-
You can opt for
Vec<Option<RefCell<CursorValue>>>
if you want to have empty locations in the arena. I skip them here to keep things cleaner. ↩︎