Bard's Gallery

glibc malloc implementation

Bergwolf TECH

I’ve been asked several time in interviews how libc implements malloc function. So here I give the a simple summary.

First, I’d like to copy some simple malloc implementation found from Internet.

simple explaintion of malloc:
struct mem_control_block {
 int is_available;
 int size;
};
1. If our allocator has not been initialized, initialize it.
2. Add sizeof(struct mem_control_block) to the size requested.
3. start at managed_memory_start.
4. Are we at last_valid address?
5. If we are:
A. We didn't find any existing space that was large enough
-- ask the operating system for more and return that.
6. Otherwise:
A. Is the current space available (check is_available from
the mem_control_block)?
B. If it is:
i) Is it large enough (check "size" from the mem_control_block)?
ii) If so:
a. Mark it as unavailable
b. Move past mem_control_block and return the pointer
iii) Otherwise:
a. Move forward "size" bytes
b. Go back go step 4
C. Otherwise:
i) Move forward "size" bytes
ii) Go back to step 4

However, the real-world in libc in much more complicated and thus much more efficient.
Since 2.3 release GNU C library (glibc) uses a modified ptmalloc2, based on “Doug Lea’s Malloc” (dlmalloc), which is a lock free implementation.

Quoting glibc/malloc/malloc.c:

  The main properties of the algorithms are:
  * For large (>= 512 bytes) requests, it is a pure best-fit allocator,
    with ties normally decided via FIFO (i.e. least recently used).
  * For small (<= 64 bytes by default) requests, it is a caching
    allocator, that maintains pools of quickly recycled chunks.
  * In between, and for combinations of large and small requests, it does
    the best it can trying to meet both goals at once.
  * For very large requests (>= 128KB by default), it relies on system
    memory mapping facilities, if supported

The differences between memory allocated by sbrk() and mmap() are:

  1. A released memory map does not create any “hole” that would need to be managed.
  2. mmaped regions must be page-aligned.
  3. invoking mmap and mfree is much slower than carving out an existing chunk of memory, because mmap will force kernel to zero out the memory it returns, which eats a lot of cache and memory bandwidth.

The basic memory structure in glibc malloc is:

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free.  */
  struct malloc_chunk* bk_nextsize;
};

There are two core elements in dlmalloc: Boundary tags and bins.

Basically, boundary tags are how memory chunks look like, and bins are how memory chunks are managed. An allocated chunk looks like this (boundary tags):

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Bins: chunk in a list is known to be preceeded and followed by either inuse chunks or the ends of memory.

Fastbins: A single-linked LIFO array of lists holding recently freed small chunks. Chunks in fastbins keep their inuse bit set. malloc_consolidate releases all chunks in fastbins and consolidates them with other free chunks.

the “unsorted” bin: chunks being placed on it in free (and malloc_consolidate), and taken off (to be either used or placed in bins) in malloc.

Binmap: a bitvector recording whether bins are definitely empty so they can be skipped over during during traversal.

Bergwolf
Everyday citizen, A gear