There is a nuance in the behaviour of the umem port compared with umem on Solaris.

On Linux, umem will not return memory to the OS until umem fails to allocate a chunk. On failure, umem_reap() is called automatically to return memory to the OS. If your code is going to run for a long time on Linux and mixes calls to different memory allocators (e.g. malloc()) and umem, your code will need to call umem_reap() periodically.

This is not necessary on Solaris, because there malloc() is replaced with umem calls, so umem_reap() ends up being called automatically.
based on usr/src/uts/common/os/kmem.c r1.64 from 2001/12/18
The slab allocator, as described in the following two papers:

	Jeff Bonwick,
	The Slab Allocator: An Object-Caching Kernel Memory Allocator.
	Proceedings of the Summer 1994 Usenix Conference.
	Available as /shared/sac/PSARC/1994/028/materials/kmem.pdf.

	Jeff Bonwick and Jonathan Adams,
	Magazines and vmem: Extending the Slab Allocator to Many CPUs and
	Arbitrary Resources.
	Proceedings of the 2001 Usenix Conference.
	Available as /shared/sac/PSARC/2000/550/materials/vmem.pdf.
1. Overview
-----------
umem is very close to kmem in implementation. There are four major
areas of divergence:

* Initialization

* CPU handling

* umem_update()

* KM_SLEEP v.s. UMEM_NOFAIL
2. Initialization
-----------------
kmem is initialized early on in boot, and knows that no one will call
into it before it is ready. umem does not have these luxuries. Instead,
initialization is divided into two phases:
* library initialization, and
* first use
umem's full initialization happens at the time of the first allocation
request (via malloc() and friends, umem_alloc(), or umem_zalloc()),
or the first call to umem_cache_create().
umem_free(), and umem_cache_alloc() do not require special handling,
since the only way to get valid arguments for them is to successfully
call a function from the first group.
2.1. Library Initialization: umem_startup()
-------------------------------------------
umem_startup() is libumem.so's .init section. It calls pthread_atfork()
to install the handlers necessary for umem's Fork1-Safety. Because of
race condition issues, all other pre-umem_init() initialization is done
statically (i.e. by the dynamic linker).

For standalone use, umem_startup() returns everything to its initial
state.
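The pthread_atfork() mechanism can be sketched as follows. This is a simplified model, not umem's actual handlers: umem locks every cache mutex plus the update lock, while the sketch uses a single illustrative mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative stand-in for a per-cache lock; umem's real handlers
 * lock every cache mutex (and the update lock) around fork1(). */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare runs in the parent before fork; parent/child run after. */
static void fork_prepare(void) { pthread_mutex_lock(&cache_lock); }
static void fork_parent(void)  { pthread_mutex_unlock(&cache_lock); }
static void fork_child(void)   { pthread_mutex_unlock(&cache_lock); }

/* Install the handlers once, e.g. from a library .init routine. */
static int install_fork_handlers(void) {
    return pthread_atfork(fork_prepare, fork_parent, fork_child);
}
```

Holding every allocator lock across the fork guarantees the child's copy of the heap metadata is consistent, since no other thread can be mid-operation at the moment of the fork.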
2.2. First use: umem_init()
------------------------------
The first time any memory allocation function is used, we have to
create the backing caches and vmem arenas which are needed for it.
umem_init() is the central point for that task. When it completes,
umem_ready is either UMEM_READY (all set) or UMEM_READY_INIT_FAILED (unable
to initialize, probably due to lack of memory).

There are four different paths from which umem_init() is called:

* from umem_alloc() or umem_zalloc(), with 0 < size < UMEM_MAXBUF,

* from umem_alloc() or umem_zalloc(), with size > UMEM_MAXBUF,

* from umem_cache_create(), and

* from memalign(), with align > UMEM_ALIGN.

The last three just check if umem is initialized, and call umem_init()
if it is not. For performance reasons, the first case is more complicated.
2.2.1. umem_alloc()/umem_zalloc(), with 0 < size < UMEM_MAXBUF
-----------------------------------------------------------------

Before initialization, umem_alloc_table maps every size class to
&umem_null_cache. There is special case code which causes any allocation on
&umem_null_cache to fail by returning NULL, regardless of the
flags argument.

When umem_alloc() or umem_zalloc() sees such a failure, it calls
umem_alloc_retry(). umem_alloc_retry() sees that the allocation
failed against &umem_null_cache, and calls umem_init().

If initialization is successful, umem_alloc_retry() returns 1, which
causes the allocation to be retried with
the (now valid) cache pointer from umem_alloc_table.
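The fail-then-retry initialization trick can be modeled in a few lines. This is a simplified sketch, not umem's code: the cache type, table, and function names are illustrative, and the "null cache" is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model: a sentinel "null cache" whose allocations always
 * fail, and a retry path that initializes and swaps in the real cache. */
typedef struct cache {
    int is_null;            /* sentinel: allocations always fail */
    size_t bufsize;
} cache_t;

static cache_t null_cache = { 1, 0 };
static cache_t real_cache = { 0, 64 };

/* The alloc table initially points every size class at the null cache. */
static cache_t *alloc_table[1] = { &null_cache };

static void *cache_alloc(cache_t *cp) {
    if (cp->is_null)
        return NULL;        /* forces the slow path */
    return malloc(cp->bufsize);
}

static int initialized = 0;

/* Slow path: initialize, repoint the table, and report "retry". */
static int alloc_retry(void) {
    if (!initialized) {
        initialized = 1;
        alloc_table[0] = &real_cache;
        return 1;           /* caller should retry */
    }
    return 0;               /* genuine allocation failure */
}

static void *model_alloc(void) {
    void *buf;
    while ((buf = cache_alloc(alloc_table[0])) == NULL) {
        if (!alloc_retry())
            return NULL;
    }
    return buf;
}
```

The point of the design is that the fast path pays no "am I initialized?" check at all: the sentinel cache converts the uninitialized case into an ordinary allocation failure.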
2.2.2. Dealing with race conditions
-----------------------------------
There are a couple race conditions resulting from the initialization
code that we have to guard against:
* In umem_cache_create(), there is a special UMC_INTERNAL cflag
that is passed for caches created during initialization. It
is illegal for a user to try to create a UMC_INTERNAL cache.
This allows initialization to proceed, but any other
umem_cache_create() calls first wait for initialization to complete.

* umem_cache_alloc() uses the cache's own cp->cache_cpu_mask to
mask the cpu number. This prevents a race between grabbing a
cache pointer out of umem_alloc_table and growing the cpu array.
3. CPU handling
---------------
kmem uses the CPU's sequence number to determine which "cpu cache" to
use for an allocation. Currently, there is no way to get the sequence
number in userspace.

umem keeps track of cpu information in umem_cpus, an array of
umem_max_ncpus umem_cpu_t structures. CURCPU() is a "hint" as to the
current CPU, which we then mask with either umem_cpu_mask or
cp->cache_cpu_mask to find the actual "cpu" id. The mechanics of this
are all in the CPU(mask) macro.

Currently, umem uses _lwp_self() as its hint.
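The hint-and-mask scheme can be sketched as follows. This is a simplified model: MODEL_NCPUS and the function names are illustrative, and pthread_self() stands in for _lwp_self() so the example is portable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Simplified model of umem's CPU handling: a cheap per-thread "hint"
 * masked down to an index into a power-of-two-sized array of "cpu"
 * structures.  pthread_self() stands in for _lwp_self() here. */
#define MODEL_NCPUS 4                      /* must be a power of two */
static const unsigned model_cpu_mask = MODEL_NCPUS - 1;

static unsigned curcpu_hint(void) {
    return (unsigned)(uintptr_t)pthread_self();
}

/* Analogue of the CPU(mask) macro: hint masked to a valid index. */
static unsigned model_cpu(unsigned mask) {
    return curcpu_hint() & mask;
}
```

Because the array size is a power of two, masking always yields a valid index, even when the hint is stale or exceeds the cpu count; at worst two threads share a "cpu cache" and contend on its lock.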
4. The update thread
--------------------
kmem uses a task queue, kmem_taskq, to do periodic maintenance on
every kmem cache.
vmem has a periodic timeout for hash table resizing.
The kmem_taskq also provides a separate context for kmem_cache_reap() calls
to be done in, avoiding issues with the context of kmem_reap() callers.

Instead, umem has the concept of "updates", which are asynchronous requests
for work attached to single caches. All caches with pending work are
linked onto a global update list, protected by the umem_update_lock
mutex; the umem_update_cv condition variable is used
for notification between threads.
4.1. Cache states with regards to updates
-----------------------------------------
A given cache is in one of three states:

Inactive	cache_uflags is zero, cache_u{next,prev} are NULL

Work Requested	cache_uflags is non-zero (but UMU_ACTIVE is not set),
		cache_u{next,prev} link the cache onto the global
		update list

Active		cache_uflags has UMU_ACTIVE set, cache_u{next,prev}
		are NULL, and a worker thread is actively processing the
		cache.
An update can be added to any cache in any state -- if the cache is
Inactive, it transitions to being Work Requested. If the cache is
Active, the worker will notice the new update and act on it before
transitioning the cache to the Inactive state.
If a cache is in the Active state, UMU_NOTIFY can be set, which asks
the worker to broadcast the umem_update_cv when it has finished the
work on the cache.
4.2. Update interface
---------------------
umem_add_update() adds an update to a particular cache.
umem_updateall() adds an update to all caches.
umem_remove_updates() returns a cache to the Inactive state.
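The state transitions behind this interface can be modeled with a small doubly linked list. This is a sketch, not umem's implementation: the struct, flag values, and function names are illustrative, and the real code does all of this under umem_update_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the three update states: Inactive (no flags, not
 * linked), Work Requested (flags set, on the global list), and Active
 * (UMU_ACTIVE set while a worker processes the cache). */
#define UMU_ACTIVE  0x1
#define UMU_REAP    0x2          /* example work flag */

typedef struct mcache {
    int uflags;
    struct mcache *unext, *uprev;
} mcache_t;

/* Self-linked list head standing in for the global update list. */
static mcache_t update_list = { 0, &update_list, &update_list };

/* umem_add_update() analogue: set flags; link if newly requested. */
static void add_update(mcache_t *cp, int flags) {
    int was_inactive = (cp->uflags == 0);
    cp->uflags |= flags;
    if (was_inactive) {          /* Inactive -> Work Requested */
        cp->unext = update_list.unext;
        cp->uprev = &update_list;
        update_list.unext->uprev = cp;
        update_list.unext = cp;
    }
}

/* umem_remove_updates() analogue: return the cache to Inactive. */
static void remove_updates(mcache_t *cp) {
    if (cp->uflags != 0 && !(cp->uflags & UMU_ACTIVE)) {
        cp->uprev->unext = cp->unext;
        cp->unext->uprev = cp->uprev;
    }
    cp->uflags = 0;
    cp->unext = cp->uprev = NULL;
}
```

Note how an Active cache is not on the list (the worker unlinked it), which is why the Active case in remove_updates() only clears the flags.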
4.3. Reaping
------------
When umem_reap() is called (at the time of heap growth), it schedules
UMU_REAP updates on every cache. It then checks to see if the update
thread is running. If it is not running, and the
program currently has multiple threads, umem_reap() attempts to create
a new update thread.

If the process is not multithreaded, or the creation fails, umem_reap()
calls umem_st_update() to do an inline update.
4.4. The update thread
----------------------
The update thread spends most of its time waiting, with a timeout, on the
umem_update_cv. It wakes up for one of two reasons:

* The timedwait times out, in which case it needs to run a global
update, or

* the umem_update_cv is broadcast, in which case
it needs to check if there are any caches in the Work Requested
state.

After any work from the global update completes, the update timer is
reset to the current time. This makes the
updates self-throttling.
Reaps are similarly self-throttling. After a UMU_REAP update has
been scheduled on all caches, umem_reap() sets a flag and wakes up the
update thread. The update thread notices the flag, and resets the
reap state.
4.5. Inline updates
-------------------
If the update thread is not running, umem_st_update() is used instead. It
immediately does a global update (as above), then processes
any work generated by the global update. Afterwards, it resets the reap
state.

While the inline update is running, umem_st_update_thr holds the thread
id of the thread performing the update.
4.6. Updates and fork1()
------------------------
umem has fork1() pre- and post-handlers which lock up (and release) every
mutex in every cache, as well as the update lock. Since
fork1() only copies over a single lwp, other threads (including the update
thread) could have been actively using a cache in the parent. This
can lead to inconsistencies in the child process.
Because we locked all of the mutexes, the only possible inconsistencies are:
* a caller of umem_depot_alloc() could leak a magazine, and all the
buffers contained in it.
* a cache could be in the Active update state. In the child, there
would be no thread actually working on it.
* a umem_hash_rescale() could leak the new hash table.
* a umem_magazine_resize() could be in progress.
The memory leaks we can't do anything about. umem_release_child() resets
the update state, moves any caches in the Active state to the Work Requested
state. This might cause some updates to be re-run, but UMU_REAP and the
hash-rescale updates are effectively idempotent, and the worst that can
happen from umem_magazine_resize() is resizing the magazine twice in close
succession.
Much of the cleanup in umem_release_child() is skipped if the update was
being performed by the forking thread itself, so that a
fork1() from a cache callback does not break. Needless to say, any such
application is tremendously broken.
5. KM_SLEEP v.s. UMEM_NOFAIL
----------------------------
Allocations against kmem and vmem have two basic modes: SLEEP and
NOSLEEP. A sleeping allocation will go to sleep (waiting for
more memory) instead of failing (returning NULL).
SLEEP allocations presume an extremely multithreaded model, with
a lot of allocation and deallocation activity. umem cannot presume
that its clients have any particular type of behavior. Instead,
it provides two types of allocations:
* UMEM_DEFAULT, equivalent to KM_NOSLEEP (i.e. return NULL on
failure)

* UMEM_NOFAIL, which, on failure, calls an optional callback
(registered with umem_nofail_callback())
The callback is invoked with no locks held, and can do an arbitrary
amount of work. It then has a choice between:
* Returning UMEM_CALLBACK_RETRY, which will cause the allocation
to be restarted.

* Returning UMEM_CALLBACK_EXIT(status), which will cause exit(2)
to be invoked with status. If multiple threads attempt to do
this simultaneously, only one will call exit(2).

* Doing some kind of non-local exit (thr_exit(3thr), longjmp(3C),
etc.)
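The retry protocol can be sketched as a wrapper around a fallible allocator. This is a simplified model, not libumem's code: the CB_* constants mirror the UMEM_CALLBACK_* idea but are local, the toy allocator fails until a "reserve" is released, and the wrapper returns NULL where the real library would exit(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model of the UMEM_NOFAIL callback protocol. */
#define CB_RETRY 1
#define CB_EXIT  0

static int reserve_available = 1;
static int allow_alloc = 0;

/* Toy allocator: fails until the reserve has been released. */
static void *toy_alloc(size_t size) {
    return allow_alloc ? malloc(size) : NULL;
}

/* Nofail callback: invoked with no locks held; it can free memory and
 * ask for a retry, or give up. */
static int nofail_callback(void) {
    if (reserve_available) {
        reserve_available = 0;
        allow_alloc = 1;         /* "free" the reserve */
        return CB_RETRY;
    }
    return CB_EXIT;
}

/* UMEM_NOFAIL-style wrapper: retry as long as the callback asks. */
static void *alloc_nofail(size_t size) {
    void *buf;
    while ((buf = toy_alloc(size)) == NULL) {
        if (nofail_callback() != CB_RETRY)
            return NULL;         /* real libumem would exit(2) here */
    }
    return buf;
}
```

Invoking the callback with no locks held is what makes the third option (a non-local exit out of the callback) survivable, as the next paragraph explains.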
To have these callbacks without risk of state corruption (in the case of
a non-local exit), we have to ensure that the callbacks get invoked
close to the original allocation, with no inconsistent state or held
locks. The following steps are taken:
* All constructor callbacks (which can themselves do allocations)
are passed UMEM_DEFAULT as their required allocation argument. This
way, the constructor will fail, allowing the highest-level allocation
to invoke the nofail callback.

If a constructor callback _does_ do a UMEM_NOFAIL allocation, and
the nofail callback does a non-local exit, we will leak the
partially-constructed buffer.