Introduction to Benchmarking
In this post, we will be discussing how to benchmark. This will be an experiment on how we can benchmark large amounts of samples (in the billions) using CPU time to measure performance. Also, this experiment is based around 3 approaches, Naive Algorithm, Lookup-based Algorithm, and Fixed-point Algorithm. I will be taking a sample and scaling the sample by multiplying a sample, randomly generated, with a volume factor and storing it into an array. In the next post titled “Benchmarking Results“, there will be a table with the data I have gathered from experimenting with the three algorithms using sample sizes in billions.
Predictions for Each Algorithm
The Naive Algorithm approaches our problem by taking our sample, and multiplying it with the volume factor. This means for each sample, the program will take an int, convert it to a floating point value when doing the multiplication, then converting it back to an int. This algorithm will have to perform these tasks on every sample. I think this will be the slowest algorithm out of the three.
The Lookup-based Algorithm creates a table by multiplying the sample with the volume factor once. Once the table is created, the algorithm will search for the result stored in the table based on the data received. I think this will be the fastest because once the table is created, a search requires no additional operations.
The Fixed-point Algorithm uses a fixed-point mathematical approach rather than a floating-point approach. This means that we are shifting the high bit to the right and removing the low bit to keep the decimal fixed at one point, which is setting the volume factor as an int, rather then a floating point and then doing the multiplication operation on each sample data. This removes the need for converting an int to floating point value then back to an int. I think this would be an alternative to the Naive Algorithm because it removes the conversion of an int to floating point which reduces the the truncate operation.
Challenges – Obtaining the right amount of Samples
To get a good benchmark, the sample sizes cannot be 1 millisecond. A few challenges occurred when trying to select a large enough data set to use as a benchmark, one of which was the storage size of the data type. In the program, we originally had a for loop index try to loop over 4,000,000,000 (4 billion) samples. This caused a segment fault because an int can only hold up to 2,000,000,000 (2 billion). We fixed this by changing the int to an unsigned int. We also tried to increase sample size to 20,000,000,000 (20 billion), which did not work when we had an unsigned int. Although we changed it to a long int, which can hold more then 20 billion, another segment fault occurred, RAM availability. The purpose of this experiment is to find a good enough sample size where we can compare the three algorithms mentioned, thus 4,294,967,295 (max size of unsigned int) samples should be enough.