linux futex (fast user-space locking)
§ RTFM : manpage for futex(2)
NAME
futex - fast user-space locking
LIBRARY
Standard C library (libc, -lc)
SYNOPSIS
#include <linux/futex.h> /* Definition of FUTEX_* constants */
#include <sys/syscall.h> /* Definition of SYS_* constants */
#include <unistd.h>
long syscall(SYS_futex,
uint32_t *uaddr, // points to futex word
int futex_op,
uint32_t val,
const struct timespec *timeout, /* or: uint32_t val2 */
uint32_t *uaddr2,
uint32_t val3);
Note: glibc provides no wrapper for futex(), necessitating the use of syscall
futex
is always 32bit value (refered to by*uaddr
) and must be 4-byte aligned.- futex syscall takes process userspace virt addr, but it’s internally handled by the kernel via physical address (thus allowing sharing among different processes
FUTEX_OP FLAGS
FUTEX_PRIVATE_FLAG (can be used for all)
the futex is process-private and is not shared with another process. For
convenience some futex_op flags can have _PRIVATE suffix e.g.
FUTEX_WAIT_PRIVATE == FUTEX_WAIT_PRIVATE | FUTEX_PRIVATE_FLAG
FUTEX_CLOCK_REALTIME (check manual for applied futex_ops)
either the timeout is measure against the CLOCK_REALTIME clock or
(default / unset) CLOCK_MONOTONIC clock
FUTEX_WAIT and FUTEX_WAKE
PARAS
uaddr : points to the futex (for all futex_op)
FUTEX_WAIT
val : expected value of the futex (as poitned by uaddr)
timeout : (relative) timeout for the sleep; if NULL the call blocks
indefinitely until waken
uaddr2/val3 : ignored
FUTEX_WAKE
val : max number of waiters to wake up (INT_MAX to wake all)
timeout : ignored
uaddr2/val3 : ignored
if expected val doesn’t match actual value (via a atomic load in the kernel) the call will return immediately with EAGAIN (was EDOUBLELOCK in earlier versions)
FUTEX_WAKE_OP
PARAS - WAKE_OP
timeout : used as val2, (max waiters to wake for futex2)
uaddr2 : points to the second futex
val3 : compond value as folows
+---+---+-----------+-----------+
|op |cmp| oparg | cmparg |
+---+---+-----------+-----------+
4 4 12 12 <== # of bits
is equivalent to executing the following code atomically and totally ordered wrt. other futex operations on any of the the two supplied futex words
- saves the old futex2 value (pointed by uaddr2) and performs an operation to modity the value of futex2 (via a Read-Modify-Write atomic memory op).
- wakes up max val waiter on futex1 (*uaddr) like FUTEX_WAKE
- dependent on the results of a test of the old value of futex2, wakes up max val2 waiters
FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET (TODO)
§ userspace lock implementations with futex
share futex between processes via manual mmap()
.
Code from here
iaddr = mmap(NULL, /* no addr specs for the allocation */
sizeof(int) * 2, /* mapping for 2 futexes */
PROT_READ | PROT_WRITE, /* allow read write access */
MAP_ANONYMOUS | MAP_SHARED, /* anonymous mapping + shared */
-1, 0 /* fd = -1 for anon mapping, no offset */
)
futex1 = &iaddr[0];
futex2 = &iaddr[1];
pid = fork(); /* child inherits the shared anon mapping */
if (pid == -1) {/* fork error */}
if (cpid == 0) {/* child */ } else {/* parent */}
CODE EXAMPLES
impl MUTEX via FUTEX
Ulricht 1 lists in their paper a few mutex implementations via futex, which seem intuitive but turn out to be incorrect. Here are some recaps, refer to the paper for details
Mutex, Take 1 from 1
class mutex {
private:
int val;
public:
mutex () : val (0) { }
void lock () {
int c;
while ((c =atomic_inc (val)) != 0)
futex_wait (&val, c + 1);
}
void unlock () {
val = 0;
futex_wake (&val, 1);
}
};
pitfalls:
- livelock : two
lock()
calls repeatedly cause each other to fail. Since the val doesn’t match the old value before the sleep,futex_wait
call returns immediately with EAGAINAC: atomic_inc FW: futex_wait AC FW AC FW thread 1: ---XX------XX-[EAGAIN]-XX-----------------XX-[EAGAIN]- thread 2: ------XX------------------XX-[EAGAIN]--XX------------- AC FW AC
- int wraparound on futex val (since waiters unconditionally increment it)
unlock()
unnecessarily callsfutex_wake()
even when uncontested
Mutex, Take 3 from 1 (same as take 2 but optimized by replacing some
cmpxchg
with less expensive xchg
)
|
|
NOTE:
- the unlock shortpath is taken, if the lock was acquired at the first attempt (line 9)
- the
UNLOCKED
state doesn’t not imply whether there is a waiter. A new thread could cut in @ line 24, sees theUNLOCKED
state, takes the lock and setsval
toNO_WAITER
. The new thread would take the short path for unlock. But the (presumably interrupted)futex_wake()
@ line 25 would get called when it resumes, waking a waiting thread. Thus no starvation. - there is no waiter counter, therefore when a waiter is waken, given the lock and finishes the CS, it cannot know whether there are still other waiters.
- also, no need to wake more than one thread for a mutex.
§ TODOs
Priority-inheritance futex (TODO)
-
Futexes Are Tricky, Ulrich Drepper, https://dept-info.labri.fr/~denis/Enseignement/2008-IR/Articles/01-futex.pdf ↩︎ ↩︎ ↩︎