A Fast, Growable Array With Stable Pointers in C

August 5, 2025・6 minute read

My last article about generic data structures in C was written to set the stage for today’s topic: A data structure that can be used in place of dynamic arrays, has stable pointers, and works well with arena allocators. It’s been independently discovered by different programmers over the years and so goes by different names. A 2001 paper called it a “levelwise-allocated pile” (bleh). Others call it an “exponential array”. In Zig’s standard library it’s “SegmentedList”. I use the name “segment array”.

You can download my single-header C implementation here: segment_array.h. The data structure is not limited to C, though; you could write it in other languages like Zig, Rust, or C++.

The core concept is straightforward: items are stored in multiple contiguous segments, and each segment is double the length of its predecessor. New segments are allocated only when needed, and their pointers are kept in a fixed-size array. Here’s how that looks:

[Diagram: a fixed-size “segments” array whose first three slots point to segment 0, segment 1, and segment 2; the remaining slots are NULL]

Unlike standard arrays, pointers to a segment array’s items are always valid because items are never moved. Leaving items in place also means it never leaves “holes” of abandoned memory in arena allocators. The layout also allows us to access any index in constant time.

The Implementation

You could implement a Segment Array just from the description above, but there are a few details worth discussing further. Here’s the above diagram as a C struct:

typedef struct {
    u32 count;
    int used_segments;
    u8 *segments[26];
} SegmentArrayInternal;

You might be surprised that the segments array has such a low fixed size. Most computers today use only 48 bits of the 64 bits in a pointer. That means 49 segments would hold more items than a pointer could even point to (aka more than 256 TiB of memory), so anything above 48 is out. I like to be able to use a u32 to index the array¹ when I don’t need more than 4 billion items. That reduces the segments to 32. Finally, I shave off 6 more segments because the smallest segment sizes (1, 2, 4, 8, 16, 32) aren’t worth their overhead, so the 64-item segment becomes the first segment. 26 segments can hold 4,294,967,232 items (just under UINT32_MAX).
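
If you want to double-check that arithmetic, a throwaway snippet (not part of the library) that sums 26 doubling segments starting at 64 looks like this:

#include <stdint.h>
#include <stdio.h>

// Throwaway check of the capacity math above: 26 segments whose sizes
// double starting at 64 hold just under UINT32_MAX items.
int main(void) {
    uint64_t total = 0;
    uint64_t segment_size = 64;   // first segment, after skipping sizes 1..32
    for (int i = 0; i < 26; i++) {
        total += segment_size;
        segment_size *= 2;        // each segment doubles its predecessor
    }
    printf("%llu\n", (unsigned long long)total);   // prints 4294967232
    return 0;
}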

The segments array being a fixed size is actually important because it means it’s right next to the rest of the struct in memory and likely won’t get a cache miss when we index into it.

You may have noticed I keep mentioning segment sizes that are a power of two. It’s not strictly necessary, but it makes the math nice and allows us to use fast bit shift operations for calculating indices:

#define SMALL_SEGMENTS_TO_SKIP 6

#define log2i(X) ((u32) (8*sizeof(unsigned long long) \
    - __builtin_clzll((X)) - 1))

u32 capacity_for_segment_count(int segment_count) {
    return ((1 << SMALL_SEGMENTS_TO_SKIP) << segment_count) 
        - (1 << SMALL_SEGMENTS_TO_SKIP);
}

void *_sa_get(SegmentArrayInternal *sa, u32 index, size_t item_size) {
    int segment = log2i((index >> SMALL_SEGMENTS_TO_SKIP) + 1);
    u32 slot = index - capacity_for_segment_count(segment);
    return sa->segments[segment] + item_size*slot;
}

log2i (base 2 logarithm for integers) is implemented with the compiler intrinsic __builtin_clzll (count leading zeros of unsigned long long). It’s just an efficient way to calculate which segment a given index falls within.
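
To make that concrete, here’s a worked example using the definitions above (the numbers are my own illustration, not from the header):

// Which segment does index 500 land in?
u32 index = 500;
int segment = log2i((index >> SMALL_SEGMENTS_TO_SKIP) + 1);  // (500 >> 6) + 1 = 8, log2i(8) = 3
u32 slot    = index - capacity_for_segment_count(segment);   // 500 - 448 = 52
// Index 500 is item 52 of segment 3 (which holds 512 items).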

Clang optimizes _sa_get to just 10 x86-64 instructions (with -O3):

_sa_get:
    mov     eax, esi
    shr     eax, 6
    inc     eax
    bsr     ecx, eax
    mov     eax, -64
    shl     eax, cl
    add     eax, esi
    add     eax, 64
    imul    rax, rdx
    add     rax, qword ptr [rdi + 8*rcx + 8]
    ret

In other words, memory will be the bottleneck, not the instructions to calculate where an index is. If you’re iterating over the items in order, you don’t need to call sa_get at all; you can loop over the segments and their items directly. I have a macro to make that easy in segment_array.h.
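
If you’re curious what that loop looks like without the macro, here’s a rough sketch (Entity and do_something are placeholders for your own type and code; the real sa_for macro in segment_array.h does the equivalent for you):

// Sketch of in-order iteration without calling _sa_get for every index.
void for_each_entity(SegmentArrayInternal *sa, void (*do_something)(Entity *)) {
    u32 remaining = sa->count;
    for (int s = 0; s < sa->used_segments && remaining > 0; s++) {
        u32 len = (u32)(1 << SMALL_SEGMENTS_TO_SKIP) << s;   // slots in this segment
        if (len > remaining) len = remaining;                // last segment may be partly full
        Entity *items = (Entity *)sa->segments[s];
        for (u32 i = 0; i < len; i++) {
            do_something(&items[i]);
        }
        remaining -= len;
    }
}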

Here’s the code to allocate a new item in the array:

u32 slots_in_segment(int segment_index) {
    return (1 << SMALL_SEGMENTS_TO_SKIP) << segment_index;
}

void *_sa_alloc(SegmentArrayInternal *sa, size_t item_size) {
    if (sa->count >= capacity_for_segment_count(sa->used_segments)) {
        size_t segment_size = item_size * slots_in_segment(sa->used_segments);
        sa->segments[sa->used_segments++] = malloc(segment_size);
    }

    sa->count++;
    return _sa_get(sa, sa->count-1, item_size);
}
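
When the segments come from malloc rather than an arena, they eventually need to be released. The header’s sa_free handles that; the equivalent logic is roughly this sketch:

// Sketch of cleanup when segments were allocated with malloc.
// With an arena allocator you would skip this and just reset the arena.
void _sa_free(SegmentArrayInternal *sa) {
    for (int s = 0; s < sa->used_segments; s++) {
        free(sa->segments[s]);
        sa->segments[s] = NULL;
    }
    sa->used_segments = 0;
    sa->count = 0;
}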

I use the techniques from my last article to allow the Segment Array to store any type while also being type checked. This macro associates type information with the generic struct:

#define SegmentArray(type) \
    union { \
        SegmentArrayInternal internal; \
        type *payload; \
    }

And these macros use the payload member to pass the item sizes to the generic functions:

#define sa_get(sa, index) \
    ((typeof((sa)->payload))_sa_get(&(sa)->internal, \
                                    index,  \
                                    sizeof(*(sa)->payload)))
    
#define sa_alloc(sa) \
    (typeof((sa)->payload))_sa_alloc(&(sa)->internal, \
                                     sizeof(*(sa)->payload))

One final implementation note: a small change to the segment array makes its capacity always a power of two. All you have to do is make the first two segments the same size. It makes the code less nice, but can be useful if you don’t want it to waste ~50% of its memory when used as the backing array for a power-of-two-sized hash table².
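
Here’s my own sketch of that change (not code from the library): with segments 0 and 1 both holding 64 items, k used segments hold exactly 64·2^(k-1) items, always a power of two, and the index math becomes:

// Sketch only: index math when the first two segments are the same size.
// OR-ing in 1 folds the two equally-sized leading segments together.
int segment_for_index(u32 index) {
    return log2i((index >> (SMALL_SEGMENTS_TO_SKIP - 1)) | 1);
}

u32 slot_for_index(u32 index, int segment) {
    u32 items_before = segment ? ((u32)(1 << SMALL_SEGMENTS_TO_SKIP) << (segment - 1)) : 0;
    return index - items_before;
}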

In Use

Putting it all together, we get this nice API:

#define SEGMENT_ARRAY_IMPL
#include "segment_array.h"
#include <stdio.h>

int main(void) {
    typedef struct { int a, b; } Entity;

    SegmentArray(Entity) entities = {0};
    sa_add(&entities, (Entity){ 1,0 });
    sa_add(&entities, (Entity){ 2,0 });

    // getting one item
    printf("entities[0].a = %i\n", sa_get(&entities, 0)->a);

    // iterating all items
    sa_for(&entities) {
         printf("entities[%i].a = %i\n", it_index, it.a);
    }

    // only needed when not using an arena
    sa_free(&entities);

    return 0;
}

Comparison

When deciding if a Segment Array is the right fit for a given problem, it’s worth comparing to these other data structures:

  • Fixed Size Array - an array you don’t grow, either by calculating exactly how many items you’ll need ahead of time, or by calculating an upper limit.
  • Dynamic Array - standard array that grows by some multiple of its size (usually 1.5), and moves its items.
  • Chunked Linked List - a linked list that stores one or more items per link.
  • Hybrid Approach - when creating an unknown number of items all at once (such as when parsing a file), you create the items in a chunked linked list. Once you have all items, you copy them into a fixed-size array and free the linked list.
  • Virtual Memory Array - Use mmap (macOS, Linux) or VirtualAlloc (Windows) to reserve a huge amount of virtual address space and commit to physical memory as the array grows (see the sketch after this list). This can’t be used on platforms without virtual memory like WebAssembly and embedded systems.
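
For reference, the reserve/commit calls on macOS and Linux look roughly like this sketch (not part of segment_array.h; on Windows you’d use VirtualAlloc with MEM_RESERVE and then MEM_COMMIT):

#include <sys/mman.h>
#include <stddef.h>

// Sketch of the virtual memory array idea: reserve a huge range with no
// access rights, then commit pages as the array grows.
static void *vm_reserve(size_t max_bytes) {
    void *base = mmap(NULL, max_bytes, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

static int vm_commit(void *addr, size_t bytes) {
    // Make the first `bytes` of the reservation usable; the OS backs them
    // with physical pages on first touch.
    return mprotect(addr, bytes, PROT_READ | PROT_WRITE);
}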

They each have different trade-offs:


                      | Growable               | Arena Friendly | Random Access       | Contiguous
Fixed Size Array      | ❌                     | ✅             | ✅                  | ✅
Dynamic Array         | ✅                     | ❌             | ✅                  | ✅
Chunked Linked List   | ✅                     | ✅             | ❌                  | ❌
Hybrid Approach       | ✅ (at creation)       | ✅             | ✅ (after creation) | ✅ (after creation)
Virtual Memory Array  | ✅ (up to reservation) | not really     | ✅                  | ✅
Segment Array         | ✅                     | ✅             | ✅                  | ❌

Average memory efficiency:

  • Fixed Size Array: 100%
  • Dynamic Array: 83% with a 1.5x growth factor, 75% with a 2x growth factor
  • Chunked Linked List: nearly 100%. Depends on chunk size and item count.
  • Hybrid approach: 100%
  • Virtual Memory Array: 100% (give or take one page’s worth of memory)
  • Segment Array: 75%

Personally, I most often opt for the fixed-size array or the hybrid approach. I’ve found the segment array useful in situations where you’re generating an unknown number of items over time and using an arena allocator, like in the build profiler I’m working on.

Conclusion

So that’s it! A growable, fast, stable, arena-friendly data structure that can be implemented in less than 120 lines. My single-header library implementation is here. Let me know if you use it or find issues!


  1. Useful for handles ↩︎

  2. Thanks to Andrew Reece for the power-of-two-size technique ↩︎
