boinc/doc/fortran_numerics.txt

[ Notes from Eric McIntosh at CERN on how to
  eliminate numerical discrepancies between platforms. ]

First I found a problem with data input on Windows using
an "old" Compaq Visual Fortran compiler. Approximately
1000 out of 16 million magnet errors were one bit too big
on the Windows system. This problem is apparently fixed with
"more modern" compilers, and my colleague Florent de Dinechin
from Lyon says we could also have avoided it by specifying a
larger number of decimal digits.
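
As an illustration of the remark about decimal digits, here is a
minimal C sketch (not part of the SixTrack code). The function name
roundtrips is made up for this example. It assumes the C library
converts between decimal text and binary with correct rounding, in
which case 17 significant digits are enough to identify any IEEE
double uniquely. It does not show that the old compiler's input
routine would have behaved correctly with more digits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write x as decimal text with the given number of significant
   digits, read the text back, and report whether the bit pattern
   of the result is identical to that of x. */
int roundtrips(double x, int digits)
{
    char buf[64];
    double y;

    /* "%.*e" prints one digit before the point and (digits - 1) after. */
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    y = strtod(buf, NULL);
    return memcmp(&x, &y, sizeof x) == 0;
}
```

With a correctly rounding library, roundtrips(1.0/3.0, 17) is true,
while roundtrips(1.0/3.0, 8) is false because eight digits do not
carry enough information.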

However, I found that the Lahey Fortran compilers
produce identical results on Linux and Windows.
The company claims it strives for this but does
not guarantee it. I use compatible releases
of their compiler, e.g. 5.7 on Windows and 6.1 on Linux,
but am now in production with 7.1.1 on Windows and 6.2 on Linux.
The data input problem was thus resolved.

It is very important to note that, with the appropriate compiler
switch settings, the Lahey compiler disables extended precision
on Intel boxes and can generate compatible code for any Pentium:
it does NOT use extended 80-bit precision, SSE, or Multiply/ADD
in one instruction. I also make a statically linked executable,
and of course I compile at the same optimisation level on both
platforms to avoid differences due to different optimisation.
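
For code written in C rather than Fortran, a rough analogue of
checking the extended-precision setting is the standard C99 macro
FLT_EVAL_METHOD from <float.h>. The sketch below is only an
illustration; the function name doubles_stay_double is made up,
and the macro says nothing about SSE or fused multiply-add usage.

```c
#include <float.h>

/* Returns 1 when the compiler states that double expressions are
   evaluated in double precision (FLT_EVAL_METHOD 0 or 1), and 0
   when they may be evaluated in a wider format (2) or when the
   method is indeterminable (-1) or not reported at all. */
int doubles_stay_double(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1;
#else
    return 0;
#endif
}
```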

Given all this I was delighted, until I started finding
small numerical differences in a small percentage of runs.
These were relatively easy to spot, as even a difference of
1 in the least significant bit of the mantissa of an IEEE
floating-point number will be magnified as the SixTrack
particles pass through ~10,000 computational steps in
each of up to one million turns.
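
The magnification of a one-bit difference can be reproduced with a
toy example. The sketch below does NOT use SixTrack; it iterates the
logistic map x -> 4x(1-x), chosen here only because it is a simple
chaotic recurrence, from two starting values that differ in the
last bit. The function name max_gap_after and the starting value 0.9
are arbitrary choices for this illustration.

```c
#include <float.h>

/* Iterate x -> 4x(1-x) from two starting values that are one unit
   in the last place apart, and return the largest absolute
   difference between the two trajectories seen in the given
   number of steps. */
double max_gap_after(int steps)
{
    double a = 0.9;
    double b = 0.9 + DBL_EPSILON / 2;  /* next double above 0.9 */
    double gap, max_gap = 0.0;
    int i;

    for (i = 0; i < steps; i++) {
        a = 4.0 * a * (1.0 - a);
        b = 4.0 * b * (1.0 - b);
        gap = a > b ? a - b : b - a;
        if (gap > max_gap)
            max_gap = gap;
    }
    return max_gap;
}
```

After one step the gap is still a few units in the last place; within
a couple of hundred steps the two trajectories are expected to bear
no resemblance to each other.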

To cut a long story short, I finally found that the culprits
were the exp and log functions. Certain arguments to these
functions produce a result which differs by 1 least significant bit
between an IA-32 and an Athlon AMD64. A web search uncovered
crlibm, a library of elementary functions developed at the
Ecole Normale Supérieure in Lyon (just a couple of hours'
drive from Geneva!). I downloaded and tested this library,
developed a Fortran interface, and converted it for
Windows as well. (It had been developed using C on Linux.)
The library provides the sin, cos, sinh, cosh, tan, atan, log, log10
and exp functions that I use. It offers rounding to nearest, or
rounding up or down. It is also optimised in the sense that it
computes a sufficient but minimum number of binary digits to produce
a correctly rounded result.
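
When hunting for results that differ by one least significant bit,
it helps to compare the bit patterns rather than printed values.
The following C sketch is one way to do this; it is not the
procedure actually used for SixTrack, the names ordered_bits and
ulp_distance are made up for the example, and NaN arguments are
not handled.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as an integer that increases monotonically
   with the value, so that adjacent doubles map to adjacent
   integers and +0.0 and -0.0 both map to 0. */
int64_t ordered_bits(double x)
{
    int64_t i;

    memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b: 0 means the
   values are identical, 1 means they differ in the last bit. */
uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);

    /* Subtract as unsigned so that a large span cannot overflow. */
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

Calling ulp_distance on the exp or log results from two platforms
(written out in binary or hexadecimal form) would flag the
arguments for which the results disagree.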

I also implemented some missing elementary functions in terms of
the others they provide, namely acos_rn, asin_rn and atan2_rn in
terms of atan_rn, where _rn implies round to nearest.
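
For reference, the textbook identities that express these functions
through the arctangent are given below. Whether the Fortran
interface uses exactly these forms is not stated here, so treat
them as an assumption. Note also that a result composed from
several correctly rounded operations is not itself guaranteed to
be correctly rounded.

```latex
\begin{align*}
\arcsin x &= \arctan\frac{x}{\sqrt{1 - x^2}}, \quad |x| < 1,
  \qquad \arcsin(\pm 1) = \pm\frac{\pi}{2} \\
\arccos x &= \frac{\pi}{2} - \arcsin x \\
\operatorname{atan2}(y, x) &=
  \begin{cases}
    \arctan(y/x)       & x > 0 \\
    \arctan(y/x) + \pi & x < 0,\ y \ge 0 \\
    \arctan(y/x) - \pi & x < 0,\ y < 0 \\
    +\pi/2             & x = 0,\ y > 0 \\
    -\pi/2             & x = 0,\ y < 0
  \end{cases}
\end{align*}
```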

This library GUARANTEES to deliver the correctly rounded double
precision result on virtually any computer, and certainly on the
IEEE-conforming IA-32 and AMD64 machines I am using. The results are
also proven theoretically to be correct. This is a tremendous piece
of work and to me represents an enormous step forward in the history
of computing, the greatest advance since the invention of IEEE
arithmetic itself. (I have not yet verified it on the Intel IA-64
due to the pressure of work, but I will do so as soon as possible,
and Lyon have certainly tested it.)

My colleague Florent de Dinechin of ENS Lyon, whom we invited to CERN
afterwards to lecture on floating-point arithmetic, points you to
http://lipforge.ens-lyon.fr/projects/crlibm/
where their work is described.

We shall make a joint presentation (I hope) at the
19th International Symposium on Distributed Computing
(DISC 2005), Krakow, Poland, September 25-29, 2005,
and also at CHEP 06 in Mumbai.
[ svn path=/trunk/boinc/; revision=5503; 2005-02-23 06:19:20 +00:00 ]