> Computers > Programming
Various Topics Home | Disclaimer | Report Adult Posts

Various Topics on Programming



Programming - "Another question about big floating point" in Computers


Old 06-12-2007   #1
..k..
 
Default Another question about big floating point

Hi.

I've been programming that bignum floating point thing I've talked
about here, and am wondering about this. See, I use the exponent as a
power of 4294967296 (ie. 2^32) instead of 2 so as to eliminate costly
bit shifting. But then we get to the following scenario:

Add (in hex-represented base 4294967296):

FC341204.56700000 x 100000000^0
+ FE901421.30000000 x 100000000^0
= 00000001.FAC42626 x 100000000^1 (rounded)

And bam! We just lose a whole 32-bit dword "digit" from the number's
precision. Does this imply what I think it does -- namely that the
size of the integer part does not contribute reliably to the amount of
preicision a floating point number actually has, only the amount of
digits to the right of the radix point (so the numbers above have
effectively only 32 bits of precision)? I'm just curious -- it doesn't
really add much cost to the addition/subtraction/etc. to use that big
integer digit, unlike what bit shifts would cost (they cost as much as
an entire addition of all the digits of the mantissas together
(yikes!)), anyway, and GMP does this and nobody seems to have
complained.

 
Old 06-12-2007   #2
..sc.. ..urguign..
 
Default Re: Another question about big floating point

mike3 <mike4ty4@yahoo.com> writes:

> Hi.
>
> I've been programming that bignum floating point thing I've talked
> about here, and am wondering about this. See, I use the exponent as a
> power of 4294967296 (ie. 2^32) instead of 2 so as to eliminate costly
> bit shifting. But then we get to the following scenario:
>
> Add (in hex-represented base 4294967296):
>
> FC341204.56700000 x 100000000^0
> + FE901421.30000000 x 100000000^0
> = 00000001.FAC42626 x 100000000^1 (rounded)
>
> And bam! We just lose a whole 32-bit dword "digit" from the number's
> precision. Does this imply what I think it does -- namely that the
> size of the integer part does not contribute reliably to the amount of
> preicision a floating point number actually has, only the amount of
> digits to the right of the radix point (so the numbers above have
> effectively only 32 bits of precision)? I'm just curious -- it doesn't
> really add much cost to the addition/subtraction/etc. to use that big
> integer digit, unlike what bit shifts would cost (they cost as much as
> an entire addition of all the digits of the mantissas together
> (yikes!)), anyway, and GMP does this and nobody seems to have
> complained.


Indeed. It's so true that IEEE floating point representations don't
even bother storing the digits before the point, not even 1.

00000001.FAC42626 x 100000000^42 is stored as (+,FAC42626,42)

Of course, it is easier to have always only 1 for digits before the
point when you use 2 as base for the exponent.

--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
 
Old 06-12-2007   #3
..bertwesse.. ..hoo.c..
 
Default Re: Another question about big floating point

On Jun 12, 3:02 am, mike3 <mike4...@yahoo.com> wrote:
> Hi.
>
> I've been programming that bignum floating point thing I've talked
> about here, and am wondering about this. See, I use the exponent as a
> power of 4294967296 (ie. 2^32) instead of 2 so as to eliminate costly
> bit shifting. But then we get to the following scenario:
>
> Add (in hex-represented base 4294967296):
>
> FC341204.56700000 x 100000000^0
> + FE901421.30000000 x 100000000^0
> = 00000001.FAC42626 x 100000000^1 (rounded)
>
> And bam! We just lose a whole 32-bit dword "digit" from the number's
> precision. Does this imply what I think it does -- namely that the
> size of the integer part does not contribute reliably to the amount of
> preicision a floating point number actually has, only the amount of
> digits to the right of the radix point (so the numbers above have
> effectively only 32 bits of precision)? I'm just curious -- it doesn't
> really add much cost to the addition/subtraction/etc. to use that big
> integer digit, unlike what bit shifts would cost (they cost as much as
> an entire addition of all the digits of the mantissas together
> (yikes!)), anyway, and GMP does this and nobody seems to have
> complained.



Quite correct. That "wobbling precision" is one of the major
annoyances of the hex (base 16) float historically used by S/
360..zSeries mainframes, and for exactly the same reason - depending
on the value of the most significant digit, it contains between one
and four bits of precision, so a 32 bit float wobbles between 21 and
24 bits of precision in the mantissa (in your case it's 1-32 bits of
wobble). You need to take special care when writing functions because
the errors are ac***ulated somewhat irregularly. The bigger problem
in some cases is that it makes floats effectively too small for many
purposes, so you're forced to use doubles.

Probably your best bet is to add an extra word and treat the number as
having the minimum number of bits. Although you'll still get some
problems when you're expecting functions to converge, so some careful
analysis is in order, and you may need an extra term on some
functions.

 
Old 06-12-2007   #4
..k..
 
Default Re: Another question about big floating point

On Jun 12, 3:15 am, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Jun 12, 3:02 am, mike3 <mike4...@yahoo.com> wrote:
>
>
>
>
>
> > Hi.

>
> > I've been programming that bignum floating point thing I've talked
> > about here, and am wondering about this. See, I use the exponent as a
> > power of 4294967296 (ie. 2^32) instead of 2 so as to eliminate costly
> > bit shifting. But then we get to the following scenario:

>
> > Add (in hex-represented base 4294967296):

>
> > FC341204.56700000 x 100000000^0
> > + FE901421.30000000 x 100000000^0
> > = 00000001.FAC42626 x 100000000^1 (rounded)

>
> > And bam! We just lose a whole 32-bit dword "digit" from the number's
> > precision. Does this imply what I think it does -- namely that the
> > size of the integer part does not contribute reliably to the amount of
> > preicision a floating point number actually has, only the amount of
> > digits to the right of the radix point (so the numbers above have
> > effectively only 32 bits of precision)? I'm just curious -- it doesn't
> > really add much cost to the addition/subtraction/etc. to use that big
> > integer digit, unlike what bit shifts would cost (they cost as much as
> > an entire addition of all the digits of the mantissas together
> > (yikes!)), anyway, and GMP does this and nobody seems to have
> > complained.

>
> Quite correct. That "wobbling precision" is one of the major
> annoyances of the hex (base 16) float historically used by S/
> 360..zSeries mainframes, and for exactly the same reason - depending
> on the value of the most significant digit, it contains between one
> and four bits of precision, so a 32 bit float wobbles between 21 and
> 24 bits of precision in the mantissa (in your case it's 1-32 bits of
> wobble). You need to take special care when writing functions because
> the errors are ac***ulated somewhat irregularly. The bigger problem
> in some cases is that it makes floats effectively too small for many
> purposes, so you're forced to use doubles.
>
> Probably your best bet is to add an extra word and treat the number as
> having the minimum number of bits. Although you'll still get some
> problems when you're expecting functions to converge, so some careful
> analysis is in order, and you may need an extra term on some
> functions.


One could reduce the granularity by using bytes instead of 32-bit
words,
but then the loop overhead increases dramatically, whereas adding on
one extra word to carry the integer part and ***uming the precision to
be only that of the size of the fraction does not do that. Especially
for
addition and subtraction, since they grow at O(n) the one word added
one does not really do much. Multiplication might be punished a bit
though since it grows at O(n^2) (although for enormous multiplies
you could use a Fourier transform algorithm that grows much slower at
O(n log(n))), but at sufficiently large sizes the tiny added cost gets
lost anyway in the large number of digits (dwords). For example, the
difference between 8000^2 and 8001^2 is 16,001, or around one part
in 4000. Tiny.


 
Old 06-12-2007   #5
..k..
 
Default Re: Another question about big floating point

On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> mike3 <mike4...@yahoo.com> writes:
> > Hi.

>
> > I've been programming that bignum floating point thing I've talked
> > about here, and am wondering about this. See, I use the exponent as a
> > power of 4294967296 (ie. 2^32) instead of 2 so as to eliminate costly
> > bit shifting. But then we get to the following scenario:

>
> > Add (in hex-represented base 4294967296):

>
> > FC341204.56700000 x 100000000^0
> > + FE901421.30000000 x 100000000^0
> > = 00000001.FAC42626 x 100000000^1 (rounded)

>
> > And bam! We just lose a whole 32-bit dword "digit" from the number's
> > precision. Does this imply what I think it does -- namely that the
> > size of the integer part does not contribute reliably to the amount of
> > preicision a floating point number actually has, only the amount of
> > digits to the right of the radix point (so the numbers above have
> > effectively only 32 bits of precision)? I'm just curious -- it doesn't
> > really add much cost to the addition/subtraction/etc. to use that big
> > integer digit, unlike what bit shifts would cost (they cost as much as
> > an entire addition of all the digits of the mantissas together
> > (yikes!)), anyway, and GMP does this and nobody seems to have
> > complained.

>
> Indeed. It's so true that IEEE floating point representations don't
> even bother storing the digits before the point, not even 1.
>
> 00000001.FAC42626 x 100000000^42 is stored as (+,FAC42626,42)
>
> Of course, it is easier to have always only 1 for digits before the
> point when you use 2 as base for the exponent.
>


If you wnat to make it implicit, that is. However using 2 for a bignum
implementation is costly, since we need to do bit shifting. A bit
shift
operation takes about the same time as an addition of the mantissa
digits, so summing with a base-2 exponent actually takes two times
longer than to sum with a base-4294967296 one (one expensive
shift plus the addition itself). But since addition grows at only O(n)
the added digit to use base 4294967296 is a relatively small cost
for sufficiently large n, so any advantage the more expensive
shift method may have had gets lost. Even with multiplication.

I wrote a quick program that simulates performance with a 128-bit
floating point number, and it is indeed faster to do the work with
32 bits of integer and 128 bit of fraction, exponent base of
4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
and base of 2, since no time-comsuming shifts are necessary.

> --
> __Pascal Bourguignon__ http://www.informatimago.com/
>
> NOTE: The most fundamental particles in this product are held
> together by a "gluing" force about which little is currently known
> and whose adhesive power can therefore not be permanently
> guaranteed.- Hide quoted text -
>
> - Show quoted text -



 
Old 06-13-2007   #6
..sc.. ..urguign..
 
Default Re: Another question about big floating point

mike3 <mike4ty4@yahoo.com> writes:

> On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
>> mike3 <mike4...@yahoo.com> writes:
>> > Add (in hex-represented base 4294967296):

>>
>> > FC341204.56700000 x 100000000^0
>> > + FE901421.30000000 x 100000000^0
>> > = 00000001.FAC42626 x 100000000^1 (rounded)

>>
>> > And bam! We just lose a whole 32-bit dword "digit" from the number's
>> > precision. Does this imply what I think it does -- namely that the

>> [...]

> If you wnat to make it implicit, that is. However using 2 for a bignum
> implementation is costly, since we need to do bit shifting. A bit
> shift
> operation takes about the same time as an addition of the mantissa
> digits, so summing with a base-2 exponent actually takes two times
> longer than to sum with a base-4294967296 one (one expensive
> shift plus the addition itself). But since addition grows at only O(n)
> the added digit to use base 4294967296 is a relatively small cost
> for sufficiently large n, so any advantage the more expensive
> shift method may have had gets lost. Even with multiplication.
>
> I wrote a quick program that simulates performance with a 128-bit
> floating point number, and it is indeed faster to do the work with
> 32 bits of integer and 128 bit of fraction, exponent base of
> 4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
> and base of 2, since no time-comsuming shifts are necessary.


Well if we are considering big numbers you can easily add one word of
precision to "damp" the wobbling.

That is, if you need N words of precision, have N words after the
point, and add one word for the integer part.

With a base-2 exponent, for normalized non null numbers we have the
invariant that the integer part x is 0<x<2^1 (hence x=1 always and can
be omited in the stored representation).

With a base-2^32 exponent, for normalized non null numbers you will
have the invariant that the integer part x is 0<x<2^32 (and therefore
you need one word to store it).

FC341204.5670000000000000 x 100000000^0
+ FE901421.3000000000000000 x 100000000^0
= 1.FAC4262586700000 x 100000000^1 (rounded)

--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
 
Old 06-13-2007   #7
..k..
 
Default Re: Another question about big floating point

On Jun 13, 1:21 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> mike3 <mike4...@yahoo.com> writes:
> > On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> >> mike3 <mike4...@yahoo.com> writes:
> >> > Add (in hex-represented base 4294967296):

>
> >> > FC341204.56700000 x 100000000^0
> >> > + FE901421.30000000 x 100000000^0
> >> > = 00000001.FAC42626 x 100000000^1 (rounded)

>
> >> > And bam! We just lose a whole 32-bit dword "digit" from the number's
> >> > precision. Does this imply what I think it does -- namely that the
> >> [...]

> > If you wnat to make it implicit, that is. However using 2 for a bignum
> > implementation is costly, since we need to do bit shifting. A bit
> > shift
> > operation takes about the same time as an addition of the mantissa
> > digits, so summing with a base-2 exponent actually takes two times
> > longer than to sum with a base-4294967296 one (one expensive
> > shift plus the addition itself). But since addition grows at only O(n)
> > the added digit to use base 4294967296 is a relatively small cost
> > for sufficiently large n, so any advantage the more expensive
> > shift method may have had gets lost. Even with multiplication.

>
> > I wrote a quick program that simulates performance with a 128-bit
> > floating point number, and it is indeed faster to do the work with
> > 32 bits of integer and 128 bit of fraction, exponent base of
> > 4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
> > and base of 2, since no time-comsuming shifts are necessary.

>
> Well if we are considering big numbers you can easily add one word of
> precision to "damp" the wobbling.
>
> That is, if you need N words of precision, have N words after the
> point, and add one word for the integer part.
>


That's what I was thinking of. However this tends to make the
multiplication a bit slower, although for sufficiently large numbers
the difference becomes immaterial. The additions and subtractions
are made a bit slower with base 2 though since we have to carry
out a bit shift by a value that is not always a multiple of the word
size. However the multiplication grows faster so if what one is
doing with the bignum package requires a lot of multiplication and
less than that amount's worth of addition and subtraction the
penalty on those is drowned out.

> With a base-2 exponent, for normalized non null numbers we have the
> invariant that the integer part x is 0<x<2^1 (hence x=1 always and can
> be omited in the stored representation).
>
> With a base-2^32 exponent, for normalized non null numbers you will
> have the invariant that the integer part x is 0<x<2^32 (and therefore
> you need one word to store it).
>
> FC341204.5670000000000000 x 100000000^0
> + FE901421.3000000000000000 x 100000000^0
> = 1.FAC4262586700000 x 100000000^1 (rounded)
>


This all raises the question of which is preferable -- base
2 or base 2^32? Any advice on that?

> --
> __Pascal Bourguignon__ http://www.informatimago.com/
>
> NOTE: The most fundamental particles in this product are held
> together by a "gluing" force about which little is currently known
> and whose adhesive power can therefore not be permanently
> guaranteed.



 
Old 06-13-2007   #8
..sc.. ..urguign..
 
Default Re: Another question about big floating point

mike3 <mike4ty4@yahoo.com> writes:

> On Jun 13, 1:21 am, Pascal Bourguignon <p...@informatimago.com> wrote:
>> mike3 <mike4...@yahoo.com> writes:
>> > On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
>> >> mike3 <mike4...@yahoo.com> writes:
>> >> > Add (in hex-represented base 4294967296):

>>
>> >> > FC341204.56700000 x 100000000^0
>> >> > + FE901421.30000000 x 100000000^0
>> >> > = 00000001.FAC42626 x 100000000^1 (rounded)

>>
>> >> > And bam! We just lose a whole 32-bit dword "digit" from the number's
>> >> > precision. Does this imply what I think it does -- namely that the
>> >> [...]
>> > If you wnat to make it implicit, that is. However using 2 for a bignum
>> > implementation is costly, since we need to do bit shifting. A bit
>> > shift
>> > operation takes about the same time as an addition of the mantissa
>> > digits, so summing with a base-2 exponent actually takes two times
>> > longer than to sum with a base-4294967296 one (one expensive
>> > shift plus the addition itself). But since addition grows at only O(n)
>> > the added digit to use base 4294967296 is a relatively small cost
>> > for sufficiently large n, so any advantage the more expensive
>> > shift method may have had gets lost. Even with multiplication.

>>
>> > I wrote a quick program that simulates performance with a 128-bit
>> > floating point number, and it is indeed faster to do the work with
>> > 32 bits of integer and 128 bit of fraction, exponent base of
>> > 4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
>> > and base of 2, since no time-comsuming shifts are necessary.

>>
>> Well if we are considering big numbers you can easily add one word of
>> precision to "damp" the wobbling.
>>
>> That is, if you need N words of precision, have N words after the
>> point, and add one word for the integer part.
>>

>
> That's what I was thinking of. However this tends to make the
> multiplication a bit slower, although for sufficiently large numbers
> the difference becomes immaterial. The additions and subtractions
> are made a bit slower with base 2 though since we have to carry
> out a bit shift by a value that is not always a multiple of the word
> size. However the multiplication grows faster so if what one is
> doing with the bignum package requires a lot of multiplication and
> less than that amount's worth of addition and subtraction the
> penalty on those is drowned out.
>
>> With a base-2 exponent, for normalized non null numbers we have the
>> invariant that the integer part x is 0<x<2^1 (hence x=1 always and can
>> be omited in the stored representation).
>>
>> With a base-2^32 exponent, for normalized non null numbers you will
>> have the invariant that the integer part x is 0<x<2^32 (and therefore
>> you need one word to store it).
>>
>> FC341204.5670000000000000 x 100000000^0
>> + FE901421.3000000000000000 x 100000000^0
>> = 1.FAC4262586700000 x 100000000^1 (rounded)
>>

>
> This all raises the question of which is preferable -- base
> 2 or base 2^32? Any advice on that?


Shifting the data can be done for free on the pipelined processors we
have nowadays, if the operation is done in parallel to the others in
the loop. For multiplication, it wouldn't be needed. Therefore I
don't see that using base 2 should be slower than base 2^32.

On the other hand, adding a word to each number should not matter much
either. Indeed, for multiplication, it adds a O(n), but
multiplication is already O(nē) so it should not matter much. If you
have a bad C compiler, it might be easier to get fast results with
base 2^32 than base 2


--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
 
Old 06-14-2007   #9
..k..
 
Default Re: Another question about big floating point

On Jun 13, 12:15 pm, Pascal Bourguignon <p...@informatimago.com>
wrote:
> mike3 <mike4...@yahoo.com> writes:
> > On Jun 13, 1:21 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> >> mike3 <mike4...@yahoo.com> writes:
> >> > On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> >> >> mike3 <mike4...@yahoo.com> writes:
> >> >> > Add (in hex-represented base 4294967296):

>
> >> >> > FC341204.56700000 x 100000000^0
> >> >> > + FE901421.30000000 x 100000000^0
> >> >> > = 00000001.FAC42626 x 100000000^1 (rounded)

>
> >> >> > And bam! We just lose a whole 32-bit dword "digit" from the number's
> >> >> > precision. Does this imply what I think it does -- namely that the
> >> >> [...]
> >> > If you wnat to make it implicit, that is. However using 2 for a bignum
> >> > implementation is costly, since we need to do bit shifting. A bit
> >> > shift
> >> > operation takes about the same time as an addition of the mantissa
> >> > digits, so summing with a base-2 exponent actually takes two times
> >> > longer than to sum with a base-4294967296 one (one expensive
> >> > shift plus the addition itself). But since addition grows at only O(n)
> >> > the added digit to use base 4294967296 is a relatively small cost
> >> > for sufficiently large n, so any advantage the more expensive
> >> > shift method may have had gets lost. Even with multiplication.

>
> >> > I wrote a quick program that simulates performance with a 128-bit
> >> > floating point number, and it is indeed faster to do the work with
> >> > 32 bits of integer and 128 bit of fraction, exponent base of
> >> > 4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
> >> > and base of 2, since no time-comsuming shifts are necessary.

>
> >> Well if we are considering big numbers you can easily add one word of
> >> precision to "damp" the wobbling.

>
> >> That is, if you need N words of precision, have N words after the
> >> point, and add one word for the integer part.

>
> > That's what I was thinking of. However this tends to make the
> > multiplication a bit slower, although for sufficiently large numbers
> > the difference becomes immaterial. The additions and subtractions
> > are made a bit slower with base 2 though since we have to carry
> > out a bit shift by a value that is not always a multiple of the word
> > size. However the multiplication grows faster so if what one is
> > doing with the bignum package requires a lot of multiplication and
> > less than that amount's worth of addition and subtraction the
> > penalty on those is drowned out.

>
> >> With a base-2 exponent, for normalized non null numbers we have the
> >> invariant that the integer part x is 0<x<2^1 (hence x=1 always and can
> >> be omited in the stored representation).

>
> >> With a base-2^32 exponent, for normalized non null numbers you will
> >> have the invariant that the integer part x is 0<x<2^32 (and therefore
> >> you need one word to store it).

>
> >> FC341204.5670000000000000 x 100000000^0
> >> + FE901421.3000000000000000 x 100000000^0
> >> = 1.FAC4262586700000 x 100000000^1 (rounded)

>
> > This all raises the question of which is preferable -- base
> > 2 or base 2^32? Any advice on that?

>
> Shifting the data can be done for free on the pipelined processors we
> have nowadays, if the operation is done in parallel to the others in
> the loop. For multiplication, it wouldn't be needed. Therefore I
> don't see that using base 2 should be slower than base 2^32.
>


How do you do it in parallel, anyway? You mean, like mix it in with
the addition loop (ie. do both operations in a single loop)?

> On the other hand, adding a word to each number should not matter much
> either. Indeed, for multiplication, it adds a O(n), but
> multiplication is already O(nē) so it should not matter much. If you
> have a bad C compiler, it might be easier to get fast results with
> base 2^32 than base 2
>
> --
> __Pascal Bourguignon__ http://www.informatimago.com/
>
> NOTE: The most fundamental particles in this product are held
> together by a "gluing" force about which little is currently known
> and whose adhesive power can therefore not be permanently
> guaranteed.- Hide quoted text -
>
> - Show quoted text -



 
Old 06-14-2007   #10
..k..
 
Default Re: Another question about big floating point

On Jun 13, 12:15 pm, Pascal Bourguignon <p...@informatimago.com>
wrote:
> mike3 <mike4...@yahoo.com> writes:
> > On Jun 13, 1:21 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> >> mike3 <mike4...@yahoo.com> writes:
> >> > On Jun 12, 3:03 am, Pascal Bourguignon <p...@informatimago.com> wrote:
> >> >> mike3 <mike4...@yahoo.com> writes:
> >> >> > Add (in hex-represented base 4294967296):

>
> >> >> > FC341204.56700000 x 100000000^0
> >> >> > + FE901421.30000000 x 100000000^0
> >> >> > = 00000001.FAC42626 x 100000000^1 (rounded)

>
> >> >> > And bam! We just lose a whole 32-bit dword "digit" from the number's
> >> >> > precision. Does this imply what I think it does -- namely that the
> >> >> [...]
> >> > If you wnat to make it implicit, that is. However using 2 for a bignum
> >> > implementation is costly, since we need to do bit shifting. A bit
> >> > shift
> >> > operation takes about the same time as an addition of the mantissa
> >> > digits, so summing with a base-2 exponent actually takes two times
> >> > longer than to sum with a base-4294967296 one (one expensive
> >> > shift plus the addition itself). But since addition grows at only O(n)
> >> > the added digit to use base 4294967296 is a relatively small cost
> >> > for sufficiently large n, so any advantage the more expensive
> >> > shift method may have had gets lost. Even with multiplication.

>
> >> > I wrote a quick program that simulates performance with a 128-bit
> >> > floating point number, and it is indeed faster to do the work with
> >> > 32 bits of integer and 128 bit of fraction, exponent base of
> >> > 4294967296 than to do it with 1 bit of integer, 128 bits of fraction,
> >> > and base of 2, since no time-comsuming shifts are necessary.

>
> >> Well if we are considering big numbers you can easily add one word of
> >> precision to "damp" the wobbling.

>
> >> That is, if you need N words of precision, have N words after the
> >> point, and add one word for the integer part.

>
> > That's what I was thinking of. However this tends to make the
> > multiplication a bit slower, although for sufficiently large numbers
> > the difference becomes immaterial. The additions and subtractions
> > are made a bit slower with base 2 though since we have to carry
> > out a bit shift by a value that is not always a multiple of the word
> > size. However the multiplication grows faster so if what one is
> > doing with the bignum package requires a lot of multiplication and
> > less than that amount's worth of addition and subtraction the
> > penalty on those is drowned out.

>
> >> With a base-2 exponent, for normalized non null numbers we have the
> >> invariant that the integer part x is 0<x<2^1 (hence x=1 always and can
> >> be omited in the stored representation).

>
> >> With a base-2^32 exponent, for normalized non null numbers you will
> >> have the invariant that the integer part x is 0<x<2^32 (and therefore
> >> you need one word to store it).

>
> >> FC341204.5670000000000000 x 100000000^0
> >> + FE901421.3000000000000000 x 100000000^0
> >> = 1.FAC4262586700000 x 100000000^1 (rounded)

>
> > This all raises the question of which is preferable -- base
> > 2 or base 2^32? Any advice on that?

>
> Shifting the data can be done for free on the pipelined processors we
> have nowadays, if the operation is done in parallel to the others in
> the loop. For multiplication, it wouldn't be needed. Therefore I
> don't see that using base 2 should be slower than base 2^32.
>
> On the other hand, adding a word to each number should not matter much
> either. Indeed, for multiplication, it adds a O(n), but
> multiplication is already O(nē) so it should not matter much. If you
> have a bad C compiler, it might be easier to get fast results with
> base 2^32 than base 2
>


Also, another question: Are there any other "quirks" of 2^32-base
floating point I should know about than just this one? For example,
is it more difficult to program mathematical functions other than
arithmetic (like sin, cos, tan, exp, log, etc.)? Like if we use a
Taylor series, we need to test the convergence. Does the
fluctuation in the number do something to make that harder?

> --
> __Pascal Bourguignon__ http://www.informatimago.com/
>
> NOTE: The most fundamental particles in this product are held
> together by a "gluing" force about which little is currently known
> and whose adhesive power can therefore not be permanently
> guaranteed.



 

Thread Tools
Display Modes





Powered by vBulletin®
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.3.0