> Computers > Programming
Various Topics Home

Programming



Programming Computers win32 condvar futex - NOT!


Default win32 condvar futex - NOT!

I don't think you can get there from here. The problem with
fast pathing a condvar is that condvars are ociated with
a lock, so the closer you get to a situation where a fast
path would make a difference, the more likely you will get
contention on the lock. So Alexander was right about interlocked
vs locked condvar implementations not making a difference, but
not the way he thinks. (Actually interlocked condvars run
way faster than locked condvar implementations for the no
contention case. It just doesn't matter).

The same can be said about semaphore futexes since only a
small fraction of semaphore applications do not require an
additional lock.

Joe Seigh
Default Re: win32 condvar futex - NOT!

> problem with
> fast pathing a condvar is that condvars are ociated with
> a lock, so the closer you get to a situation where a fast
> path would make a difference, the more likely you will get
> contention on the lock.


The only way I can think of to fast-path a condvar is to have it use a
fast-pathed mutex, i.e CRITICAL_SECTION?

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.




Default Re: win32 condvar futex - NOT!



SenderX wrote:
>
> > problem with
> > fast pathing a condvar is that condvars are ociated with
> > a lock, so the closer you get to a situation where a fast
> > path would make a difference, the more likely you will get
> > contention on the lock.

>
> The only way I can think of to fast-path a condvar is to have it use a
> fast-pathed mutex, i.e CRITICAL_SECTION?
>


No, actually the way to fast path it is to not use locks at all. It's
call an eventcount. I lied when I said I was working on a condvar.
An eventcount is an event object that doesn't have the sharing problems
win32 Events have.

The condvar is emulated with a trival subroutine, i.e.

DWORD WaitForEventCountUnlocked2(EventCount_t * ev, HANDLE hMutex, DWORD dwTimeout) {
DWORD rc;
int z;

z = ResetEventCount(ev); // local reset
ReleaseMutex(hMutex);
rc = WaitForEventCount(ev, z, dwTimeout); // wait
WaitForSingleObject(hMutex, INFINITE);

return rc;
}

There's a version called WaitForEventCountUnlocked() which uses
WaitForMultipleObjects() for wait morphing (doesn't buy you that
much) but it's a lot more complicated since it has WaitForEventCount()
logic modified somewhat.

EventCount is optimized for signaling. The code for signaling
with no waiters (fast path) is a membar, load, and branch not
taken (plus call linkage).

This would give you an alternative to semaphores to use with
lock-free collections. You couldn't use condvars since they
require locks. A futex semaphore uses interlocked instructions
on the fast path. EventCount doesn't, so it's even faster.

Joe Seigh
Default Re: win32 condvar futex - NOT!


A side note. Making win32 Events eventcounts internally would fix the
problem of PulseEvent getting lost w.r.t. APC calls. It would be nice
if Microsoft could make EventCount an external object if someone could
figure out how to make HANDLE cope with "local" reset. Then I wouldn't
have to use refcounted Events for the futex implementation.

Is Microsoft interested in cleaning up their threading api?

Joe Seigh
Default Re: win32 condvar futex - NOT!

> This would give you an alternative to semaphores to use with
> lock-free collections. You couldn't use condvars since they
> require locks. A futex semaphore uses interlocked instructions
> on the fast path. EventCount doesn't, so it's even faster.



So an EventCount behaves just like a futex semaphore, but has a more
efficient signaling. Is that about it?

Sounds great!

When your API is finished, will it be for sale?

=P


P.S.

Could you elaborate on implementing wait-morphing in win32? You would have
to have control over the waiter queues themselves right?

If you don't have control over the waiter lists how could you do the
following:

Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How
would you morph them into blocking on, lets say a semaphore HANDLE, without
waking any of them?

I am probably way off in left field here, try not to flame me too bad...



--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.




Default Re: win32 condvar futex - NOT!


"SenderX" <> wrote in message
news:5Hlgb.694227$

> Could you elaborate on implementing wait-morphing in win32? You would have
> to have control over the waiter queues themselves right?


> If you don't have control over the waiter lists how could you do the
> following:


> Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How
> would you morph them into blocking on, lets say a semaphore HANDLE,

without
> waking any of them?


You don't let them all block in the same mutex. Although they may all
think they're all blocking on the same thing, internally the API blocks each
thread on its own mutex (or event, semaphore, whatever makes sense
internally and the API wakes each thread when whatever conditions are
necessary for the thread to make forward progress are satisfied.

This has some advantages and some disadvantage. One advantage is that it
makes wait morphing trivial, just change the logical conditions under which
you wake the thread from 'blocked on condition' to 'blocked on mutex'.
Another advantage is that you get to pick which thread to wake, so you can
implement things like FIFO wake semantics if you want. A disadvantage is
that you have to pick which thread to wake, and you may not have information
the kernel might have (such as what CPU is available and which a thread last
ran on) that might make your choice of threads to wake inefficient.

DS


Default Re: win32 condvar futex - NOT!

> You don't let them all block in the same mutex. Although they may all
> think they're all blocking on the same thing, internally the API blocks

each
> thread on its own mutex (or event, semaphore, whatever makes sense
> internally and the API wakes each thread when whatever conditions are
> necessary for the thread to make forward progress are satisfied.
>
> This has some advantages and some disadvantage. One advantage is that

it
> makes wait morphing trivial, just change the logical conditions under

which
> you wake the thread from 'blocked on condition' to 'blocked on mutex'.
> Another advantage is that you get to pick which thread to wake, so you can
> implement things like FIFO wake semantics if you want. A disadvantage is
> that you have to pick which thread to wake, and you may not have

information
> the kernel might have (such as what CPU is available and which a thread

last
> ran on) that might make your choice of threads to wake inefficient.


That clears up a lot of questions I had...

Thank DS!

=)


Default Re: win32 condvar futex - NOT!



SenderX wrote:
>
> > This would give you an alternative to semaphores to use with
> > lock-free collections. You couldn't use condvars since they
> > require locks. A futex semaphore uses interlocked instructions
> > on the fast path. EventCount doesn't, so it's even faster.

>
> So an EventCount behaves just like a futex semaphore, but has a more
> efficient signaling. Is that about it?


For the fast path part, yes.

>
> Could you elaborate on implementing wait-morphing in win32? You would have
> to have control over the waiter queues themselves right?
>
> If you don't have control over the waiter lists how could you do the
> following:
>
> Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How
> would you morph them into blocking on, lets say a semaphore HANDLE, without
> waking any of them?
>


Well, on the slow path you need a win32 synchronization object to do the
waiting with. So you have

WaitForSingleObject(hEvent, dwTimeout);
WaitForSingleObject(hMutex, INFINITE);

which can be "morphed" into

waitarray[0] = hEvent;
waitarray[1] = hMutex;
WaitForMultipleObjects(2, waitarray, TRUE, dwTimeout);

with some conditional logic for handling timeouts, etc...
and you have to reacquire that mutex no matter what.

Eventcounts only do broadcasts, if that's what you are thinking about.
I don't think signaling a single thread buys you that much except dubious
logic. If there are legitimate cases, semaphores in conjunction with
an eventcount can handle them without causing a performance hit to
the 99.99% of cases that don't need signal.

Joe Seigh
Default Re: win32 condvar futex - NOT!



> which can be "morphed" into
>
> waitarray[0] = hEvent;
> waitarray[1] = hMutex;
> WaitForMultipleObjects(2, waitarray, TRUE, dwTimeout);
>


I should add that I don't reset the Event object while it's
in use, so I can use this trick. Some of the other condvar
implementations reset their Event objects differently and
can't use this technique. The Event object has to stay
signaled until the Mutex is also signaled.

Joe Seigh
Default Re: win32 condvar futex - NOT!

> Eventcounts only do broadcasts, if that's what you are thinking about.
> I don't think signaling a single thread buys you that much except dubious
> logic.


Does an EventCount use multi-event handles, or a single event and mutex for
the waiters?


Does this sound OK for wait-morphing?:

One semaphore per-thread. You would use waitformulti... on the condvars
mutex and a threads unique semaphore handle.

The per-thread semaphore would be added to a condvars waitset, and a signal
or broadcast to a condition would post to the semaphores in the wait-set.

Waiting threads will acquire the condvar mutex and their semaphore signals
in a single atomic op, waitformultu...

Each per-thread semaphore would last for an entire threads life-span, so
thread-local-storage would be used...

Sounds like it should work...


> I don't think signaling a single thread buys you that much except dubious
> logic.


Sometimes I find it useful to be able to send certain conditions to specific
threads...



--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.





Thread Tools
Display Modes





Powered by vBulletin®
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.3.0