mikejsavage.co.uk • About • Archive • RSS • Thanks for blocking ads! Blocking ads owns: AdGuard for Safari / uBlock Origin for everything else
In a bounded lock-free multi-producer queue, pushing an element takes two steps. First you need to acquire a node for writing to avoid races with other producers, then you need to flag the node as fully written once you're done with it. Then to pop from the queue you check if the head node has been fully written, then try to acquire it if you're multi-consumer too.
(Look here or here for the details.)
One issue to note is that dequeues can fail, even though the queue is non-empty, in the sense that items have been successfully enqueued and not yet dequeued. Specifically this can happen:
It makes sense to guard such a queue with a semaphore. An obvious way to do that is to make the semaphore count how many elements are in the queue. You can't do that with a queue like this! If you write something like:
// producer
if( q.push( x ) )
sem_post( &sem );
// consumer
while( true ) {
sem_wait( &sem );
T y;
if( q.pop( &y ) )
// do things with y
}
but that's incorrect because you decrement sem
even when you haven't
dequeued anything, and the item pushed by P2 gets lost. So instead you
might try:
// producer
if( q.push( x ) )
sem_post( &sem );
// consumer
while( true ) {
T y;
if( q.pop( &y ) ) {
// do things with y
sem_wait( &sem );
}
}
but then you're spinning while you wait for P1 to finish.
We ran into a particularly nasty instance of the first case of this at
work. We have a queue which accepts various commands, and one of them is
a "flush" command where the sender goes to sleep and the consumer wakes
them up again (with an Event
) once the flush is done. So something
like this happened:
queueSem
.queueSem
, and waits on
flushEvent
.queueSem
, and
exits.queueSem
again.flushEvent
, C is waiting for P2 to
signal queueSem
, and we're toast.The fix is to make queueSem
count the (negative) number of consumer
threads that are asleep. The change is very simple:
// consumer
while( true ) {
T y;
if( q.pop( &y ) )
// do things with y
else
sem_wait( &sem );
}
Which avoids the deadlock thusly:
queueSem
.queueSem
, and waits on
flushEvent
.queueSem
, and
exits.flushEvent
, then waits on
queueSem
again.I'm not totally satisfied with that solution because I feel like I've misunderstood something more fundamental, something that will stop me running into similar problems in the future. Please email me if you know!
All in all writing an MPSC lock-free queue has been an enormous waste of
time courtesy of shit like this. We need to enqueue things from a signal
handler, which means locks and malloc are gone. Even so, since we're
single-consumer and we only push in signal handlers so I believe we
could have used a mutex to avoid races between producers and kept the
consumer lock-free. I wasn't sure if you could get signalled while still
inside a signal handler. FYI the answer is no, unless you set
SA_NODEFER
. I also don't know if pthread_mutex_lock
etc are
signal-safe, and of course if you try to Google it you just get pages of
trash about "you can deadlock if the thread holding the lock gets
signalled!!!!!". Presumably they are but I didn't want to risk it.