# Wednesday, 26 April 2006

Did you know...

...that there are (more than) 21 algorithms to compute the factorial of a number?
I discovered this fact today, while implementing the Fisher's exact test for a 2x2 contingency table.
(For those interested: Fisher's exact test for a 2x2 contingency table, like the more famous Chi-square test, is used for the analysis of categorical frequency data: its aim is to determine whether the two categorical variables are associated.
We use it to extract the most significant genes from a gene set experiment: the two categories are "clue present/clue absent", while the two data sets are the control set and the experiment set)
Fisher exact test requires the computation of 9 factorials for each iteration, and one of them is on the total. Typically, applets and code you can found on the internet can handle data up to a total of about 100-200 before giving up due to overflow problems. Even code that uses libraries for arithmetic with arbitrary precision can handle numbers up to 6000-12000 before becoming unsuable.

Here, we are facing two problems:
  • numbers (integers and doubles) provided by the machine (physichal CPU or VM) have a finite precision and extension
  • the simple and dumb algorithm (the one that performs all the multiplications) is very SLOW
The solution is to adopt a good library that handles multiplications of very large numbers in an efficient way (using Karatsuba multiplication or a FFT-based multiplication), and forget about the naive algorithm. At this page you can find implementations and benchmarks for the 21 different algorithms I mentioned. The most efficients use prime numbers (or, better, primorials, i.e. the multiplication of the n first primes), but even the simple and elegant Split Recursive performs a lot better than the naive algorithm!

# Monday, 06 March 2006

Did you know...

...that there are (more than) 21 algorithms to compute the factorial of a number?
I discovered this fact today, while implementing the Fisher's exact test for a 2x2 contingency table.
(For those interested: Fisher's exact test for a 2x2 contingency table, like the more famous Chi-square test, is used for the analysis of categorical frequency data: its aim is to determine whether the two categorical variables are associated.
We use it to extract the most significant genes from a gene set experiment: the two categories are "clue present/clue absent", while the two data sets are the control set and the experiment set)
Fisher exact test requires the computation of 9 factorials, and one of them is on the total. Typically, applets and code you can found on the internet can handle data up to a total of about 100-200 before giving up due to overflow problems. Even code that uses libraries for arithmetic with arbitrary precision can handle numbers up to 6000-12000 before becoming unsuable.

Here, we are facing two problems:
  • numbers (integers and doubles) provided by the machine (physichal CPU or VM) have a finite precision and extension
  • the simple and dumb algorithm (the one that performs all the multiplications) is very SLOW
The solution is to adopt a good library that handles multiplications of very large numbers in an efficient way (using Karatsuba multiplication or a FFT-based multiplication), and forget about the naive algorithm. At this page you can find implementations and benchmarks for the 21 different algorithms I mentioned. The most efficients use prime numbers (or, better, primorials, i.e. the multiplication of the n first primes), but even the simple and elegant Split Recursive performs a lot better than the naive algorithm!