Fast algorithm computes histograms

Lukasz Sliwczynski, University of Mining and Metallurgy, Cracow, Poland -- EDN, 10/28/1999

A histogram is a convenient method of presenting statistical features of data; you can use a histogram for an estimate of the probability density function of a random phenomenon. To prepare a histogram entails dividing the interval occupied by data values into some number of smaller intervals and then counting the number of occurrences of data in each subinterval. You can easily generate histograms with the aid of standard mathematical packages, such as Matlab. The problem lies in the fact that calculating a histogram using the standard Matlab function hist{}from a large file of data (for example, 10 million data points) with high accuracy (for example, with 12-bit resolution) may take substantial time-many hours, or even days, depending on computer speed and memory. The problem is that hist{}treats its input data as "analog" data with almost infinite precision, and the routine must use program loops, which are very slow in Matlab.

However, you often perform measurements with the use of A/D conversion, with the result that the input data for the histogram-calculating procedure are discrete and that the smallest difference between any two data values is one. This fact allows you to write a special function for calculating the histogram. You can use the complete Matlab program for calculating histograms from discrete data (Listing 1). The program is intended for placement into a Matlab m-file, which creates a function from the routine. This function is accessible from the Matlab command window, just as any other function. Figure 1 illustrates the operation of the algorithm. The first line (1) of the program prepares input data and gives the data the form of the column vector g (Figure 1a). The second line (2) performs data sorting and arranges the data in ascending order (gs in Figure 1b). The difference vector gsdin Figure 1c has peaks only at those points where data changes; elsewhere, the vector has a value of zero. The distance between two consecutive peaks is equal to the number of data samples of the same values, and the height of a peak is equal to the distance between two consecutive numbers.

Next, the routine (4) calculates the indexes of data changes (Figure 1d) and calculates the values of data V (Figure 1e) appearing in input vector g (5). A "1" in this code line is necessary because a vector index cannot equal zero in Matlab. In the following step (7), the step vector Vindexes the output vector from the procedure N. This indexing operation is simple and fast because both indexes and data are natural numbers and belong to the same interval <0;2L-1>. At this point, the essential part of the procedure is finished. The numbers N(i) (Figure 1f) are equal to the total number of occurrences of data with the value i-1 in the input vector g. The last six lines of the program reduce the initial resolution of the A/D converter that you use to acquire data. To obtain faster execution of the program, you switch on this part of the routine only when necessary-for example, when W does not equal 1.

The effect of using this algorithm rather than the standard one in the Matlab package is significant. Using the standard function hist{}, the time to compute a histogram from 10,000 points is 110 sec; for the same data file with 12-bit resolution, the algorithm fasthist{}requires only 0.55 sec. These figures are based on using a 100-MHz Pentium machine with 33 Mbytes of RAM, running under Windows 95. Using fasthist{}, a histogram from a 10 million-point data file takes approximately a half-hour as opposed to more than a day using hist{}. These results occur using Matlab version 4.2. Version 5.2 gives somewhat different results: 68 sec for hist{}and 0.38 sec for fasthist{}, a 179-times reduction. (DI #2422)




ADVERTISEMENT

ADVERTISEMENT

Feedback Loop


Post a CommentPost a Comment

There are no comments posted for this article.

Related Content

 

By This Author

There are no additional articles written by this author.


ADVERTISEMENT

Knowledge Center



Technology Quick Links

EDN Marketplace


©1997-2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.
Use of this Web site is subject to its Terms of Use | Privacy Policy

Please visit these other Reed Business sites