Mutable object trees in Rust, using memory arenas

I have to be honest with you

I’m not exactly comfortable with Rust yet. I muddle through, but things like Rc and lifetimes are a bit over my head. So, I do what every reasonable person does when faced with the prospect of learning something hard.

I avoid it.

Do you know the saying that necessity is the mother of invention? While not as widely known, it’s equally true that invention’s deadbeat stepdad is laziness.

So anyway, let’s invent a way out of using Rc to create a recursive data structure in Rust.

Pssst, hey kid. Wanna make a tree in Rust?

If you search for trees, linked lists or any recursive data structure in Rust, you will find a series of posts saying that “Rust’s borrow checker is great, but it makes it hard to create recursive data structures”.

And then they all go ahead and show how to implement them using Rc<RefCell>.

Which is great and all, and I want to thank all the authors that put these posts together. The idea is quite simple, and it is very succinctly described in this post from the rust-lang users forum

Alice gives 2 solutions there. One is the classic Rc<RefCell> approach, and the other is a vector/integer combo. And the thing about the second one is that it doesn’t have Rc and it’s also not that widely discussed. So let’s do that.

But first, I’ll need a motivating example.

`Cursor`s in Glowdust

Glowdust is the new DBMS I am building and for which I am currently creating a query engine. To generate query results, I need to create iterators over the base data and then combine them in various ways with joins, projections, function calls and so on. But, because database Iterators need to be composable and rewind-able, they tend to be called Cursors instead.

Ok, we call them Cursors because DBMS people need to feel special. But it’s mostly the composable thing.

The interface I have for Cursors right now looks like this

pub trait Cursor {
    fn next(&mut self) -> ResultRow;
}

I have different types of Cursors, of which I’ll mention three so we can build up an example.¹

The base Cursor is the FunctionCursor, which creates new rows from data stored on disk:

pub struct FunctionCursor {
    source: DiskData
}

impl Cursor for FunctionCursor {
    fn next(&mut self) -> ResultRow {
        // get row from self.source
        // return it
    }
}

A JoinCursor combines two Cursors and performs a join on their results:

pub struct JoinCursor<'a> {
    lhs: &'a mut FunctionCursor,
    rhs: &'a mut FunctionCursor,
}

// none of this is real code, it's just for illustration
impl<'a> Cursor for JoinCursor<'a> {
    fn next(&mut self) -> ResultRow {
        for left_row in self.lhs {
            for right_row in self.rhs {
                if join(left_row, right_row) {
                    return concatenate_rows(left_row, right_row)
                }
                // else check next combination
            }
        }
    }
}

And finally, a ProjectionCursor takes the output of a JoinCursor and projects some of its variables in every row

pub struct ProjectionCursor<'a> {
    source: &'a mut JoinCursor<'a>
    projection: Projection
}


impl<'a> Cursor for ProjectionCursor<'a> {
    fn next(&mut self) -> ResultRow {
        let result = self.source.next();
        return self.projection.apply(result);
    }
}

These cursors can be combined in arbitrary ways, but they need to obey the invariant that there is a unique path connecting any two. Since each Cursor is unidirectional, this creates a directed tree, and an example would look like this:

A sample directed tree of 5 Cursors, pointer-style

The arrows indicate the direction of the dependency between the cursors, which is the reverse of the flow of rows. So in this case, rows are produced on the left and flow towards the right to the two projection cursors, where they are consumed.

The problem

I want to draw your attention to the mut references to self in Cursor::next() and the struct members. This means each Cursor needs to have mutable access to the ones it depends on, and those to the ones they depend on and so on until the base FunctionCursors. Calling next() on the end cursors, the ProjectionCursors in the example above, means we need mutable access on the path from the ProjectionCursor to the leaves - the FunctionCursors in this case.

Since this is a tree, there is exactly one path between any two cursors, which means as long as we don’t call next() on two different “end” Cursors concurrently, there is never a case where a &mut is being used from two places at the same time.

Now, I don’t know if it is obvious to you, but I am willing to admit that it took me a while to understand that the code I showed above can never build the tree in the picture. The problem is that the JoinCursor cannot be shared between the two ProjectionCursors. The references they hold to the JoinCursor will have to be &mut and you can’t have two &muts to the same object at the same time. That’s ownership for you.

Here, I will let rustc demonstrate this in no uncertain terms. This is the attempt at building the tree I showed above:

fn main() {
    let mut base_1 = FunctionCursor::new();
    let mut base_2 = FunctionCursor::new();

    let mut join = JoinCursor {
        lhs: &mut base_1,
        rhs: &mut base_2
    };

    let mut projection_1 = ProjectionCursor{
        source: &mut join
    };

    // This is never going to work
    let mut projection_2 = ProjectionCursor{
        source: &mut join
    };

}

And here is rustc educating me on the error of my ways:

error[E0499]: cannot borrow `join` as mutable more than once at a time
  --> src/cur.rs:44:53
   |
43 |     let mut projection_1 = ProjectionCursor{source: &mut join};
   |                                                     --------- first mutable borrow occurs here
44 |     let mut projection_2 = ProjectionCursor{source: &mut join};
   |                                                     ^^^^^^^^^
   |                                                     |
   |                                                     second mutable borrow occurs here
   |                                                     first borrow later used here

But, from the tree invariant, we know that it will never be the case at runtime that two different &muts will be used for the same object at the same time. The problem is that rustc has no way to prove this.

One way to deal with this, just like Alice said, is to avoid references and use instead an array (the arena) and integer indexes into it, so we can “borrow” without the compiler complaining.

Which brings us full circle to the title of the post - we have our mutable cursors, our tree and now the arena².

About time too, if you ask me.

Let’s build this

The core idea here is that instead of using pointers, that point to locations in memory, we use integers that point to a specific element in an array. So first, we’ll need an array, right?

An array of what, though?

Let’s see what we want out of it.

Each element will be a Cursor, and each one should be possible to share with multiple other Cursors (like the Join cursor is shared in the image above). Instead of references, we will hand out the index of the Cursor in the arena, so we don’t need Rc. But, as described, these “references” still need to be mutable³.

This, dear reader, can only mean one thing: RefCell.⁴

RefCell will let us get non-mut references to the arena elements, but still be able to get mut access to the included Cursor. With this in mind, here’s what the arena looks like⁵:

struct CursorArena {
    arena: Vec<RefCell<CursorValue>>,
    next_unused: usize,
}

impl CursorArena {
    fn new() -> Self {
        CursorArena {
            arena: Vec::default(),
            next_unused: 0,
        }
    }

    fn add(&mut self, cursor: CursorValue) -> usize {
        self.arena.push(RefCell::new(cursor));
        let result = self.next_unused;
        self.next_unused += 1;
        result
    }

    fn get(&self, id: usize) -> RefMut<'_, CursorValue> {
        match self.arena.get(id) {
            Some(ref value) => {
                return value.borrow_mut();
            }
            _ => {
                panic!()
            }
        }
    }
}

About `CursorValue`

CursorValue is an enum that let me put all cursor types in the same vector. Each Cursor is a different struct, with different fields and implementation. To fit them all in the same vector, I define an enum to use as a tagged union, like so:

enum CursorValue {
    FunctionCursorValue(FunctionCursor),
    JoinCursorValue(JoinCursor),
    ProjectionCursorValue(ProjectionCursor),
}

With this, I can put all my cursors in the same arena. I am not a fan of this pattern, but apparently it’s not uncommon and it also happens to work, so I won’t complain much.

Rewriting the Cursor, get rid of references

Now that we have the arena, we can replace references with integers. FunctionCursor won’t change, it never referenced any other Cursors, but the other two have reference fields that can become usize.

pub struct JoinCursor {
    lhs: usize,
    rhs: usize
}

// none of this is real code, it's just for illustration
impl Cursor for JoinCursor {
    fn next(&mut self) -> ResultRow {

        let mut ref_mut_left = arena.get(self.lhs);
        let FunctionCursorValue(ref mut lhs) = ref_mut_left.deref_mut() else {
            panic!();
        };
        let mut ref_mut_right = arena.get(self.rhs);
        let FunctionCursorValue(ref mut rhs) = ref_mut_right.deref_mut() else {
            panic!();
        };

        for left_row in lhs {
            for right_row in rhs {
                if join(left_row, right_row) {
                    return concatenate_rows(left_row, right_row)
                }
                // else check next combination
            }
        }
    }
}

pub struct ProjectionCursor {
    source: usize
    projection: ProjectionSpec
}

impl Cursor for FunctionCursor {
    fn next(&mut self) -> ResultRow {

        let mut ref_mut_source = arena.get(self.source);
        let JoinCursorValue(ref mut source) = ref_mut_source.deref_mut() else {
            panic!();
        };

        let result = source.next();
        return projection.apply(result);
    }
}

As a nice bonus, we also got rid of the lifetimes for the references. And now we can now construct our tree:

let mut arena = CursorArena::new();

let function_1 = arena.add(FunctionCursorValue(FunctionCursor::new()));
let function_2 = arena.add(FunctionCursorValue(FunctionCursor::new()));

let mut join = JoinCursor::new();
join.lhs = function_1;
join.rhs = function_2;

let join_id = arena.add(JoinCursorValue(join));

let mut project_1 = ProjectionCursor::new()
let mut project_2 = ProjectionCursor::new()

project_1.source = join_id; 
project_2.source = join_id; // <- This is the big deal right here.

arena.add(ProjectionCursorValue(project_1));
arena.add(ProjectionCursorValue(project_2));

That third line from the end, that’s what this post is all about, which may look kind of underwhelming, unless you’ve really, really needed to have such a tree in your program. Then it’s a big deal. Here’s what we built, in picture form:

The same tree of 5 Cursors, vector style

You know what?

After going through all this, I think I understand Rc better. Not so sure about RefCell, as you can see it gets a bit fuzzy with borrow_mut() and stuff, but it works. I will keep using the integer way, it makes more sense to me, and I like that it doesn’t get cluttered with multiple nested parametric types.

Thank you for reading. If you have any comments on this, see what I’m working on, or keep track of Glowdust’s development, you can find me on Mastodon at @chrisg@fosstodon.org .

The examples here don’t make use of the Cursor trait. In real code, you’d use trait objects or some other method of combining cursors more freely. This was outside the scope of this post, so I hardwired the types. ↩︎
Arena is a fancy word for array that is used for memory allocation. When you read arena, just think array - that’s what I do and it hasn’t failed me yet. They even start with the same letter. ↩︎
Since the Cursors are owned by the arena, a mutable reference to any single Cursor would also mean a mutable reference to the Vec, which would create exactly the same problem, just one collection higher. ↩︎
Ok, it may mean more things than that. As I said, I’m still learning :) ↩︎
You can opt for Vec<Option<RefCell<CursorValue>>> if you want to have empty locations in the arena. I skip them here to keep things cleaner. ↩︎

Mutable object trees in Rust, using memory arenas

I have to be honest with you #

Pssst, hey kid. Wanna make a tree in Rust? #

Cursors in Glowdust #

The problem #

Let’s build this #

About CursorValue #

Rewriting the Cursor, get rid of references #

You know what? #