#11 |
|
|
> Also, another question: Are there any other "quirks" of 2^32-base
> floating point I should know about than just this one? For example,
> is it more difficult to program mathematical functions other than
> arithmetic (like sin, cos, tan, exp, log, etc.)? Like if we use a
> Taylor series, we need to test the convergence. Does the
> fluctuation in the number do something to make that harder?

Yes, a big wobble definitely complicates all of those. Consider the greatly increased error when you subtract two terms that are just to either side of a "wobble point". You'll get something like 31 bits of error at the right end, vs. a maximum of one for binary. You definitely need to consider that when writing the math functions, that error if incurred at the "large" end of the series can easily be larger than the smallest terms.

However, in software you have the option of throwing an extra word of precision in, and while that may require an additional term or two to converge, that result rounded will be within the usual ULP that we hope for on transcendentals.

Having a bit of extra room makes most of the transcendentals* easier to compute anyway, even with binary floats. This is especially true of some of the really hard functions to get right like pow(). It just gets harder when some of your intermediate results are actually shorter than the result you want to calculate, which is what happens when the wobble goes the wrong way.

*Doing many of the transcendentals well is much harder than many people realize, where "well" is defined as being correct to within an ULP over the entire input range, and still having decent performance. If you don't mind curdling a few bits at the low end (or sometimes not so few), life is much easier.
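To make the "wobble" concrete, here is a minimal sketch, not taken from anyone's actual library. It assumes a mantissa stored as a list of 32-bit words, most significant first, and treats it as normalized as soon as the leading word is non-zero. The name `effective_bits` is made up for the illustration.

```python
WORD_BITS = 32  # one mantissa "digit" in base 2**32

def effective_bits(mantissa_words):
    """Significant bits held by a normalized base-2**32 mantissa.

    mantissa_words is most-significant word first. "Normalized" here only
    means the leading word is non-zero, so it may carry as few as 1 bit.
    """
    top = mantissa_words[0]
    if top == 0:
        raise ValueError("mantissa is not normalized")
    return top.bit_length() + WORD_BITS * (len(mantissa_words) - 1)

# Two three-word mantissas on either side of a wobble point.
just_below = [0xFFFFFFFF, 0x00000000, 0x00000000]  # leading digit is full
just_above = [0x00000001, 0x00000000, 0x00000000]  # leading digit is 1

print(effective_bits(just_below))  # 96
print(effective_bits(just_above))  # 65
```

The 31-bit gap between the two results is the wobble. A base-2 format normalizes to the bit, so its usable precision does not swing like this.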
|
|
#12 |
|
|
> On Jun 13, 12:15 pm, Pascal Bourguignon <p...@informatimago.com>
> wrote:
>> Shifting the data can be done for free on the pipelined processors we
>> have nowadays, if the operation is done in parallel to the others in
>> the loop. For multiplication, it wouldn't be needed. Therefore I
>> don't see that using base 2 should be slower than base 2^32.
>
> How do you do it in parallel, anyway? You mean, like mix it in with
> the addition loop (ie. do both operations in a single loop)?

You don't have to do anything special to have it executed in parallel; it's the processor and the compiler that do it.

http://en.wikipedia.org/wiki/Instruction_pipeline

Unless you write the loop in assembler yourself, in which case you may have to care about the order of the instructions to help the processor better fill its pipeline.

--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held together by a "gluing" force about which little is currently known and whose adhesive power can therefore not be permanently guaranteed.
|
|
#13 |
|
|
<robertwess...@yahoo.com> wrote:
> On Jun 13, 9:41 pm, mike3 <mike4...@yahoo.com> wrote:
....
> Having a bit of extra room makes most of the transcendentals* easier
> to compute anyway, even with binary floats. This is especially true
> of some of the really hard functions to get right like pow(). It just
> gets harder when some of your intermediate results are actually
> shorter than the result you want to calculate, which is what happens
> when the wobble goes the wrong way.
....

So it is easier then to do them with binary.
|
|
#14 |
|
|
On Jun 14, 12:49 am, Pascal Bourguignon <p...@informatimago.com>
wrote:
> mike3 <mike4...@yahoo.com> writes:
....
> You don't have anything special to have it executed in parallel, it's
> the processor and compiler who do it.
>
> http://en.wikipedia.org/wiki/Instruction_pipeline
>
> Unless you write the loop in assembler yourself, in which case you may
> have to care about the order the instructions to help the processor
> better fill its pipeline.

Well, I've decided to give the base-2 thing a try. This is what I've got so far:

1. Do part of the shift implicitly with pointer gymnastics, i.e. if the shift amount is bigger than the word size, then do the word-shifting part with this.
2. Bit-shift the result of step (1).
3. Do the addition.

Now, there is a significant "gap" between these, during which other things are performed to prepare for the addition, and to handle the varying degrees of overlap between the parameters. It would be possible to move the shift up against the addition part, but would this pipeline it even though the two loops are in separate functions? Would that function calling interfere with the pipelining?
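The three steps above can be folded into one pass over the words, which is the "single loop" arrangement discussed earlier in the thread. This is only a sketch under assumptions: mantissas are equal-length lists of 32-bit words stored least significant first, bits shifted off the low end are truncated rather than rounded, and the name `add_shifted` is hypothetical. It says nothing about how a real compiler would schedule the equivalent C loop.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def add_shifted(a, b, shift):
    """Return (sum_words, carry_out) for a + (b >> shift).

    a and b are equal-length lists of 32-bit words, least significant first.
    Bits of b shifted off the low end are simply truncated (no rounding).
    """
    n = len(a)
    word_shift, bit_shift = divmod(shift, WORD_BITS)

    def b_word(i):
        # Step 1: the whole-word part of the shift is only an index offset
        # (the "pointer gymnastics"); words past the top read as zero.
        j = i + word_shift
        return b[j] if j < n else 0

    result = []
    carry = 0
    for i in range(n):
        # Step 2: bit shift, pulling the low bits of the next higher word in.
        # When bit_shift is 0 the masked left shift contributes nothing.
        lo = b_word(i) >> bit_shift
        hi = (b_word(i + 1) << (WORD_BITS - bit_shift)) & MASK
        shifted = lo | hi
        # Step 3: add with carry, in the same pass over the words.
        total = a[i] + shifted + carry
        result.append(total & MASK)
        carry = total >> WORD_BITS
    return result, carry

# 2**32 + (2**32 >> 32) == 2**32 + 1
print(add_shifted([0, 1], [0, 1], 32))  # ([1, 1], 0)
```

In C the `bit_shift == 0` case would need separate handling, because shifting a 32-bit value by 32 is undefined there. Python integers do not have that restriction, which is why the sketch gets away with a mask.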
|
|
#15 |
|
|
On Jun 14, 12:20 am, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Jun 13, 9:41 pm, mike3 <mike4...@yahoo.com> wrote:
....
> *Doing many of the transcendentals well is much harder than many
> people realize, where "well" is defined as being correct to within an
> ULP over the entire input range, and still having decent performance.
> If you don't mind curdling a few bits at the low end (or sometimes not
> so few), life is much easier.

And this applies regardless of whether or not the floating point is base-2 exponent or base-4294967296, right?
|
|
#16 |
|
|
On Jun 14, 4:12 am, mike3 <mike4...@yahoo.com> wrote:
> On Jun 14, 12:20 am, "robertwess...@yahoo.com"
> > *Doing many of the transcendentals well is much harder than many
> > people realize, where "well" is defined as being correct to within an
> > ULP over the entire input range, and still having decent performance.
> > If you don't mind curdling a few bits at the low end (or sometimes not
> > so few), life is much easier.
>
> And this applies regardless of whether or not the floating point
> is base-2 exponent or base-4294967296 right?

Correct. It really helps to have a few extra bits of precision for intermediate results. Say at least as many as the exponent size plus one or two would cover most situations, although pretty much any extra helps.

Using pow() (IOW x**y) as an example, the obvious (and appealing*) transformation to exp(y*ln(x)) is a terrible idea since the log operation essentially moves all the exponent bits of X into the mantissa, discarding an equivalent amount of the original precision. Of course if you have enough extra mantissa bits available to avoid that loss (for example, if you're computing a single precision pow() with double precision intermediates), you're OK.

The "problem" with large bases is that you get extra bits of precision discarded as the magnitude of the number changes, so you have more compensation to do since your mantissa on intermediates is often shorter than the nominal specification (yet you may have to produce a full length result).

*since ln() and exp() have relatively simple and decently performing expansions available.
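The pow() point can be tried out with a small experiment. The sketch below is an illustration under stated assumptions, not how any real math library works: single precision is simulated by rounding Python doubles through `struct`, the double-precision `x ** y` stands in for the exact answer, and the helper names (`f32`, `ulp_f32`, `pow_naive_f32`, `pow_wide_f32`) are invented for the example.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ulp_f32(v):
    """Spacing of single-precision values around a normal, positive v."""
    _, e = math.frexp(v)       # v = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 24)     # single precision has a 24-bit mantissa

def pow_naive_f32(x, y):
    # Every intermediate is rounded to single precision, so the bits spent
    # on the integer part of y*ln(x) are lost from its fraction.
    ln_x = f32(math.log(x))
    product = f32(y * ln_x)
    return f32(math.exp(product))

def pow_wide_f32(x, y):
    # Intermediates stay in double precision; only the result is rounded.
    return f32(math.exp(y * math.log(x)))

for x, y in [(10.0, 30.0), (2.5, 90.0), (1.5, 200.0)]:
    reference = x ** y  # double-precision pow(), used as the yardstick
    for name, fn in [("naive", pow_naive_f32), ("wide", pow_wide_f32)]:
        err = abs(fn(x, y) - reference) / ulp_f32(reference)
        print(f"pow({x}, {y}) {name:5s}: off by about {err:.1f} single-precision ULPs")
```

The wide version should land within about half an ULP. The naive version can be off by tens of ULPs when y*ln(x) is large, although the exact figure depends on how the individual roundings happen to fall for a given input.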
|
|
#17 |
|
|
On Jun 14, 3:43 pm, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Jun 14, 4:12 am, mike3 <mike4...@yahoo.com> wrote:
....
> The "problem" with large bases is that you get extra bits of precision
> discarded as the magnitude of the number changes, so you have more
> compensation to do since your mantissa on intermediates is often
> shorter than the nominal specification (yet you may have to produce a
> full length result).
....

So, overall, in a toss-off between base 2 and base 4294967296, which is best?
|
|
#18 |
|
|
On Jun 14, 9:12 pm, mike3 <mike4...@yahoo.com> wrote:
> On Jun 14, 3:43 pm, "robertwess...@yahoo.com"
> <robertwess...@yahoo.com> wrote:
....
> So, overall, in a toss-off between base 2 and base 4294967296, which
> is best?

As always, it depends on your application. If you're dealing with more than about three words of precision in software, it's almost certainly worth it from a performance perspective to make the base the word size and take the hit of adding an extra word to deal with the wobble. Size considerations on shorter numbers may make you go the other direction.
|
|
#19 |
|
|
On Jun 14, 9:51 pm, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Jun 14, 9:12 pm, mike3 <mike4...@yahoo.com> wrote:
....
> > So, overall, in a toss-off between base 2 and base 4294967296, which
> > is best?
>
> As always, it depends on your application. If you're dealing with
> more than about three words of precision in software, it's almost
> certainly worth it from a performance perspective to make the base the
> word size and take the hit of adding an extra word to deal with the
> wobble. Size consideration on shorter number may make you go the
> other direction.

I need to handle both: a minimum of 64 bits and a maximum of 8192 bits, all with a 32-bit word size (so that's 2 words all the way up to 256 words). Speed is the primary concern here.
|
|
#20 |
|
|
mike3 <mike4ty4@yahoo.com> wrote:
>
> So, overall, in a toss-off between base 2 and base 4294967296, which
> is best?

So, overall, in a toss-off between a Mack Truck and a Volkswagen Beetle, which is best?

-Larry Jones

Nobody knows how to pamper like a Mom. -- Calvin