|
|||||
|
fast pathing a condvar is that condvars are ociated with a lock, so the closer you get to a situation where a fast path would make a difference, the more likely you will get contention on the lock. So Alexander was right about interlocked vs locked condvar implementations not making a difference, but not the way he thinks. (Actually interlocked condvars run way faster than locked condvar implementations for the no contention case. It just doesn't matter). The same can be said about semaphore futexes since only a small fraction of semaphore applications do not require an additional lock. Joe Seigh |
|
> fast pathing a condvar is that condvars are ociated with > a lock, so the closer you get to a situation where a fast > path would make a difference, the more likely you will get > contention on the lock. The only way I can think of to fast-path a condvar is to have it use a fast-pathed mutex, i.e CRITICAL_SECTION? -- The designer of the experimental, SMP and HyperThread friendly, AppCore library. |
|
SenderX wrote: > > > problem with > > fast pathing a condvar is that condvars are ociated with > > a lock, so the closer you get to a situation where a fast > > path would make a difference, the more likely you will get > > contention on the lock. > > The only way I can think of to fast-path a condvar is to have it use a > fast-pathed mutex, i.e CRITICAL_SECTION? > No, actually the way to fast path it is to not use locks at all. It's call an eventcount. I lied when I said I was working on a condvar. ![]() An eventcount is an event object that doesn't have the sharing problems win32 Events have. The condvar is emulated with a trival subroutine, i.e. DWORD WaitForEventCountUnlocked2(EventCount_t * ev, HANDLE hMutex, DWORD dwTimeout) { DWORD rc; int z; z = ResetEventCount(ev); // local reset ReleaseMutex(hMutex); rc = WaitForEventCount(ev, z, dwTimeout); // wait WaitForSingleObject(hMutex, INFINITE); return rc; } There's a version called WaitForEventCountUnlocked() which uses WaitForMultipleObjects() for wait morphing (doesn't buy you that much) but it's a lot more complicated since it has WaitForEventCount() logic modified somewhat. EventCount is optimized for signaling. The code for signaling with no waiters (fast path) is a membar, load, and branch not taken (plus call linkage). This would give you an alternative to semaphores to use with lock-free collections. You couldn't use condvars since they require locks. A futex semaphore uses interlocked instructions on the fast path. EventCount doesn't, so it's even faster. Joe Seigh |
|
A side note. Making win32 Events eventcounts internally would fix the problem of PulseEvent getting lost w.r.t. APC calls. It would be nice if Microsoft could make EventCount an external object if someone could figure out how to make HANDLE cope with "local" reset. Then I wouldn't have to use refcounted Events for the futex implementation. Is Microsoft interested in cleaning up their threading api? Joe Seigh |
|
> lock-free collections. You couldn't use condvars since they > require locks. A futex semaphore uses interlocked instructions > on the fast path. EventCount doesn't, so it's even faster. So an EventCount behaves just like a futex semaphore, but has a more efficient signaling. Is that about it? Sounds great! When your API is finished, will it be for sale? =P P.S. Could you elaborate on implementing wait-morphing in win32? You would have to have control over the waiter queues themselves right? If you don't have control over the waiter lists how could you do the following: Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How would you morph them into blocking on, lets say a semaphore HANDLE, without waking any of them? I am probably way off in left field here, try not to flame me too bad... ![]() -- The designer of the experimental, SMP and HyperThread friendly, AppCore library. |
|
"SenderX" <> wrote in message news:5Hlgb.694227$ > Could you elaborate on implementing wait-morphing in win32? You would have > to have control over the waiter queues themselves right? > If you don't have control over the waiter lists how could you do the > following: > Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How > would you morph them into blocking on, lets say a semaphore HANDLE, without > waking any of them? You don't let them all block in the same mutex. Although they may all think they're all blocking on the same thing, internally the API blocks each thread on its own mutex (or event, semaphore, whatever makes sense internally and the API wakes each thread when whatever conditions are necessary for the thread to make forward progress are satisfied. This has some advantages and some disadvantage. One advantage is that it makes wait morphing trivial, just change the logical conditions under which you wake the thread from 'blocked on condition' to 'blocked on mutex'. Another advantage is that you get to pick which thread to wake, so you can implement things like FIFO wake semantics if you want. A disadvantage is that you have to pick which thread to wake, and you may not have information the kernel might have (such as what CPU is available and which a thread last ran on) that might make your choice of threads to wake inefficient. DS |
|
> You don't let them all block in the same mutex. Although they may all
> think they're all blocking on the same thing, internally the API blocks each > thread on its own mutex (or event, semaphore, whatever makes sense > internally and the API wakes each thread when whatever conditions are > necessary for the thread to make forward progress are satisfied. > > This has some advantages and some disadvantage. One advantage is that it > makes wait morphing trivial, just change the logical conditions under which > you wake the thread from 'blocked on condition' to 'blocked on mutex'. > Another advantage is that you get to pick which thread to wake, so you can > implement things like FIFO wake semantics if you want. A disadvantage is > that you have to pick which thread to wake, and you may not have information > the kernel might have (such as what CPU is available and which a thread last > ran on) that might make your choice of threads to wake inefficient. That clears up a lot of questions I had... Thank DS! =) |
|
SenderX wrote: > > > This would give you an alternative to semaphores to use with > > lock-free collections. You couldn't use condvars since they > > require locks. A futex semaphore uses interlocked instructions > > on the fast path. EventCount doesn't, so it's even faster. > > So an EventCount behaves just like a futex semaphore, but has a more > efficient signaling. Is that about it? For the fast path part, yes. > > Could you elaborate on implementing wait-morphing in win32? You would have > to have control over the waiter queues themselves right? > > If you don't have control over the waiter lists how could you do the > following: > > Five threads blocked on the same mutex HANDLE in the WaitForXXX API. How > would you morph them into blocking on, lets say a semaphore HANDLE, without > waking any of them? > Well, on the slow path you need a win32 synchronization object to do the waiting with. So you have WaitForSingleObject(hEvent, dwTimeout); WaitForSingleObject(hMutex, INFINITE); which can be "morphed" into waitarray[0] = hEvent; waitarray[1] = hMutex; WaitForMultipleObjects(2, waitarray, TRUE, dwTimeout); with some conditional logic for handling timeouts, etc... and you have to reacquire that mutex no matter what. Eventcounts only do broadcasts, if that's what you are thinking about. I don't think signaling a single thread buys you that much except dubious logic. If there are legitimate cases, semaphores in conjunction with an eventcount can handle them without causing a performance hit to the 99.99% of cases that don't need signal. Joe Seigh |
|
> which can be "morphed" into > > waitarray[0] = hEvent; > waitarray[1] = hMutex; > WaitForMultipleObjects(2, waitarray, TRUE, dwTimeout); > I should add that I don't reset the Event object while it's in use, so I can use this trick. Some of the other condvar implementations reset their Event objects differently and can't use this technique. The Event object has to stay signaled until the Mutex is also signaled. Joe Seigh |
|
> Eventcounts only do broadcasts, if that's what you are thinking about.
> I don't think signaling a single thread buys you that much except dubious > logic. Does an EventCount use multi-event handles, or a single event and mutex for the waiters? Does this sound OK for wait-morphing?: One semaphore per-thread. You would use waitformulti... on the condvars mutex and a threads unique semaphore handle. The per-thread semaphore would be added to a condvars waitset, and a signal or broadcast to a condition would post to the semaphores in the wait-set. Waiting threads will acquire the condvar mutex and their semaphore signals in a single atomic op, waitformultu... Each per-thread semaphore would last for an entire threads life-span, so thread-local-storage would be used... Sounds like it should work... > I don't think signaling a single thread buys you that much except dubious > logic. Sometimes I find it useful to be able to send certain conditions to specific threads... ![]() -- The designer of the experimental, SMP and HyperThread friendly, AppCore library. |