/*
$Id: README,v 1.2 2008/06/04 13:00:14 superlink Exp $

This is experimental code for computing a single bucket of the sum-product
algorithm on NVIDIA GPUs using CUDA, and on SMPs using OpenMP.
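
As a rough illustration of what computing a single bucket means, the sketch
below multiplies two input functions and sums the product over one shared
variable. It is not the code shipped here: the name bucket_sum_product, the
two-function setup, and the row-major table layout are assumptions made for
the example. The summed variable x is assumed to appear in both inputs.

```c
#include <stddef.h>

// Hypothetical helper, not part of this package.
// Computes h(a, b) = sum over x of f(a, x) * g(b, x).
// f is an na-by-nx table, g is nb-by-nx, h is na-by-nb, all row-major.
void bucket_sum_product(const float *f, const float *g, float *h,
                        size_t na, size_t nb, size_t nx)
{
    // Output entries are independent, so the outer loop can be shared
    // among threads. Without OpenMP the pragma is simply ignored.
    #pragma omp parallel for
    for (size_t a = 0; a < na; ++a) {
        for (size_t b = 0; b < nb; ++b) {
            float sum = 0.0f;
            // Multiply the inputs and sum out the shared variable x.
            for (size_t x = 0; x < nx; ++x)
                sum += f[a * nx + x] * g[b * nx + x];
            h[a * nb + b] = sum;
        }
    }
}
```

For example, with f = {1, 2, 3, 4} (na = 2, nx = 2) and g = {5, 6} (nb = 1),
the call bucket_sum_product(f, g, h, 2, 1, 2) fills h with {17, 39}.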

This code has several limitations:
1. It is assumed that the variables to sum over are present in all the functions
to be multiplied. 
2. Due to the lack of double precision on GPUs, the code cannot be used as is
for inference in Bayesian networks.
3. The computations are performed in linear scale. Log-scale computations are
available in separate functions, but are not provided here.
4. Static loop unrolling for a fixed number of input functions is not provided
here.
5. Data structure preparation is not fully optimized, and is far from optimal
for small input sizes.
6. The code works well with CUDA 1.0. The NVCC shipped with CUDA 1.1, for some
reason, generates code that uses 17 registers instead of 16, and this breaks
the code.

If you are going to use this code, please cite:
Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens.
Efficient Computation of Sum-products on GPUs Through Software-Managed Cache.
In Proceedings of the 22nd ACM International Conference on Supercomputing.

Questions and comments are welcome:
Mark Silberstein, marks@cs.technion.ac.il

http://www.technion.ac.il/~marks
 */

