Welcome to a cs140e blog post series, where I write down my notes as I work
through Stanford's experimental course on operating systems in Rust. Today, we're going
to walk through creating a dynamic memory allocator in Rust so that the kernel can use data structures
that live in the heap like the Vec
and HashSet
types. Think malloc
and free
: Rust-style.
Here are the signatures for malloc
and free
in C:
void * malloc(size_t size); void free(void * pointer);
And the related signatures of alloc
and dealloc
in Rust (GlobalAlloc):
unsafe fn alloc(&self, _layout: Layout) -> *mut u8; unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout);
Later, we'll dive deeper into these type signatures.
Before writing a heap allocator, "What is the heap?".
The Book has a very good description of the characteristics of the heap and the stack
in the ownership chapter. In summary, we can say the heap is where we put objects that are
potentially variable and large in size and the stack consists of those objects whose size is typically
smaller and known at runtime. Since many collection types like HashMap
and Vec
can
change size during runtime, only by having a something to allocate them dynamically can
we unlock their powers.
The details will vary from system to system, but on the Raspberry Pi the heap will live
directly after the kernel's binary in RAM. We assume that it's an area of demand-zero1
memory that begins after the uninitialized .bss
2 area and moves upwards (towards higher
addresses).
Memory Alignment
/// Align `addr` downwards to the nearest multiple of `align`. /// /// The returned usize is always <= `addr.` /// /// # Panics /// /// Panics if `align` is not a power of 2. pub fn align_down(addr: usize, align: usize) -> usize { assert!(align.is_power_of_two()); (addr / align) * align } /// Align `addr` upwards to the nearest multiple of `align`. /// /// The returned `usize` is always >= `addr.` /// /// # Panics /// /// Panics if `align` is not a power of 2. pub fn align_up(addr: usize, align: usize) -> usize { align_down(addr + align - 1, align) }
The Layout
type describes the particular layout of a block of memory. This type has two
getter methods layout.size()
and layout.align()
that return the requested size of our
block and the alignment of that memory block's address.
GlobalAlloc
is an unsafe trait (like Send
and Sync
) for a variety of reasons:
alloc
can cause undefined behavior if the caller does not ensure that thelayout
argument has non-zero size.- That allocated block of memory may or may not be initialized.
dealloc
will cause undefined behavior the caller does not ensure that theptr
denotes a block of memory returned byalloc
(allocated via this allocator) and that the memorylayout
is the same as that used to allocate the block of memory we're freeing.
In writing this allocator one make sure that the addresses returned are properly aligned.
A difference from C we see here is that Rust split the responsiblities of dynamic memory
allocation such that the caller of alloc
and dealloc
must specify the the alignment.
alloc
puts the burden of responsiblity on the allocator to return an address in memory
that is properly aligned while dealloc
puts the onus on the caller to keep track of the
memory layout that was previous used to allocate the address being free'd. Rust has more
restrictions on memory alignment than C does. The benefits of this are that all common
code are in the same location avoiding the same code being written for every allocator.
This goes well with Rust's power to abstract away details from the user and additionally
allows the Rust compiler to perform some langauge-level optimizations without involving
the allocator.
Back to memory alignment: we need to write two utility functions to aide us, align_up
and align_down
which will find the closest memory address upwards or downwards to
the nearest multiple of some power of two. The power of two thing is just a convention,
but a good one at that because it's how computers are built! Multiplication, division,
addition, and subtraction, are all much easier to do (i.e: faster) than attempting to do
so with non-binary powers.
In a few simple lines, including an assertion that the align
argument is a power of two.
Thread Safety
There is already extensive literature on creating thread-safe memory allocators. The reason
is because most allocators are global to the scope of a program and dealloc
and alloc
are certainly not reentrant functions, as they operate on a slab of shared data. What happens
if two separate threads (which we don't support yet) calls alloc
at the sime time? Could
they receive the same address? If so, how large would the allocated region be? Rust, as a
language, takes these questions of concurrency very seriously. It is difficult to write
programs in Rust that isn't thread-safe. For our part the instructor has simply wrapped the
allocator in a Mutex
ensuring thread-safety by mutual-exclusion.
#[path = "bump.rs"] mod imp; /// Thread-safe (locking) wrapper around a particular memory allocator. #[derive(Debug)] pub struct Allocator(Mutex<Option<imp::Allocator>>);
Note here that the imp
module is virtual and not actually backed by a file of the same
name on the file-system but instead given a path to the allocator of our choosing. In this
lab we're going to write two different kinds of allocators so having an ability to easily
switch between implementations is really convienent. Our first allocator is called a "bump
allocator" and our second is a "bin allocator". We'll go into more detail as to how these
two memory allocators work further along in the post.
Mapping Memory
Here is the implementation of our generic memory allocator:
impl Allocator { /// Returns an uninitialized `Allocator`. /// /// The allocator must be initialized by calling `initialize()` before the /// first memory allocation. Failure to do will result in panics. pub const fn uninitialized() -> Self { Allocator(Mutex::new(None)) } /// Initializes the memory allocator. /// /// # Panics /// /// Panics if the system's memory map could not be retrieved. pub fn initialize(&self) { let (start, end) = memory_map().expect("failed to find memory map"); *self.0.lock() = Some(imp::Allocator::new(start, end)); } }
The initialize
method constructs an instance of the internal imp::Allocator
structure
for later allocations and deallocations. It in turn calls memory_map
to find out the start and
end point of a region the region memory that has been mapped by the Raspberry Pi's bootloader
on start. We can implement this using the Atags
implementation of the previous post!
Since the MEM ATAG gives us the amount of memory available in RAM, we can just take the
start address in the tag (likely 0x0000000) and then add the size to that number to get
the end address before returning them both as a tuple (start, end)
.
// Holds the first address after the kernel's binary. extern "C" { static _end: u8; } use pi::atags::Atags; /// Returns the (start address, end address) of the available memory on this /// system if it can be determined. If it cannot, `None` is returned. /// /// This function is expected to return `Some` under all normal cirumstances. fn memory_map() -> Option<(usize, usize)> { let binary_end = unsafe { (&_end as *const u8) as u32 }; for atag in Atags::get() { if let Some(mem) = atag.mem() { return Some( (binary_end as usize, (binary_end + mem.size) as usize) ); } } None }
It's imporant to note that the MEM ATAG reports the amount of total system memory in RAM,
however, and not the amount of actually free memory. Our instructor has helpfully defined
for us binary_end
that holds the first address of memory after the kernel
Bump Allocator
Our first allocator will be the dumbest of all allocators. Whose behavior is specfified as:
- Initialized with the pointer at the beginning of our heap space
- When we request to
alloc
n-bytes of memory, thecurrent
pointer is bumped forward n-bytes (plus some alignment) and the old value of the pointer is returned. - When we
dealloc
an address, nothing actually happens.
Here's a diagram from the assignment's page that depicts what happens to the current
pointer
after a 1k
byte allocation and a subsequent 512
byte allocation. Though alignment concerns
are abset in the diagram.
We're tasked with implementing the new()
, alloc()
, and dealloc()
methods of the
bump::Allocator
using our align_up
and align_down
utility functions to ensure
proper alignment of the return address.
The instructor wrote a good number of tests to run our solution against, though this is one of those times where the Rust core API has changed considerably enough over the past year that we have to make some changes to get them running. I'll document this process here for others taking the class years after it was first created.
The first bug we run into is related to a missing module raw_vec
that supposedly contains RawVec
:
error[E0432]: unresolved import `core::alloc::raw_vec` --> src/allocator/tests.rs:63:22 | 63 | use core::alloc::raw_vec::RawVec; | ^^^^^^^ could not find `raw_vec` in `alloc`
Looking for RawVec
in the standard documentation yielded no results though checking out
the official Rust repo we see a RawVec
implementation in liballoc with the #![doc(hidden)]
attribute, which is why it doesn't appear in the doucmentation.
Using the Allocator
Rust 1.28 introduced a stable #[global_allocator]
attribute that allows Rust programs to either
set the allocator used to the system allocator as well as create new allocators which implment
the GlobalAlloc
trait. This is very nice for us because this means that just by properly
implementing alloc
and dealloc
within our custom allocator, we'll be able to manage
memory without having to know exactly how much memory our kernel will need at runtime.
A region (or page, though we haven't implemented paging yet) of memory that's mapped to an anonymous file. It's demand-zero because no data is actually transferred and s.t. their sizes are initially zero.
[Sections] 00 0x00000000 0 0x00000000 0 ----- 01 0x000000c0 20860 0x00080000 20860 --r-x .text 02 0x00005240 6066 0x00085180 6066 --r-- .rodata 03 0x00006a00 2848 0x00086940 2848 --rw- .data 04 0x00000000 0 0x00087460 0 --rw- .bss <-- demand-zero 05 0x00007520 5976 0x00000000 5976 ----- .symtab 06 0x00008c78 6233 0x00000000 6233 ----- .strtab 07 0x0000a4d1 52 0x00000000 52 ----- .shstrtab 08 0x000000c0 29792 0x00080000 29792 m-rwx LOAD0 09 0x00000000 0 0x00000000 0 m-rw- GNU_STACK <-- demand-zero 10 0x00000000 64 0x00080000 64 m-rw- ehdr
As per wikipedia: "The name .bss is used by many compilers and linkers for the portion of the executable file containing statically-allocated variables that are not explicitly initialized to any value."