Comparing Results Generated by Quartile Calculation Algorithms
The Experiment
The Java applet above illustrates the differences among the results generated by the
three quartile calculation algorithms. It runs an experiment
that draws 50,000 simple random samples of a specified size. For each sample, the first and third quartiles are
calculated using each of the three algorithms,
resulting in six sets of sample quartiles:
- Sample Q1 values, length = n-1
- Sample Q3 values, length = n-1
- Sample Q1 values, length = n
- Sample Q3 values, length = n
- Sample Q1 values, length = n+1
- Sample Q3 values, length = n+1
A five-number summary is generated for each of these sets
of sample quartiles and displayed as a boxplot.
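The experiment can be sketched in Python roughly as follows. This is not the applet's code. The three algorithms are not spelled out in this section, so the position rules below are an assumption: each treats the sorted sample as spanning a length of n-1, n, or n+1 and locates quantile p at the 1-based position 1 + p(n-1), 1/2 + pn, or p(n+1) respectively, with linear interpolation between neighbouring values. The function names, the reduced trial count, and the rule used inside the five-number summary are all illustrative choices.

```python
import random
import statistics

def value_at(sorted_xs, pos):
    """Linearly interpolate at a 1-based, possibly fractional, position."""
    n = len(sorted_xs)
    pos = min(max(pos, 1.0), float(n))  # clamp to the valid range [1, n]
    lo = int(pos)                       # floor, since pos >= 1
    if lo >= n:
        return sorted_xs[-1]
    return sorted_xs[lo - 1] + (pos - lo) * (sorted_xs[lo] - sorted_xs[lo - 1])

# ASSUMPTION: position rules standing in for the three algorithms.
POSITION_RULES = {
    "n-1": lambda p, n: 1 + p * (n - 1),
    "n": lambda p, n: 0.5 + p * n,
    "n+1": lambda p, n: p * (n + 1),
}

def sample_quartiles(xs):
    """Return {rule name: (Q1, Q3)} for one sample."""
    s = sorted(xs)
    n = len(s)
    return {
        name: (value_at(s, rule(0.25, n)), value_at(s, rule(0.75, n)))
        for name, rule in POSITION_RULES.items()
    }

def five_number_summary(values):
    """Min, Q1, median, Q3, max; the length = n rule is used here for illustration."""
    s = sorted(values)
    n = len(s)
    return (
        s[0],
        value_at(s, 0.5 + 0.25 * n),
        statistics.median(s),
        value_at(s, 0.5 + 0.75 * n),
        s[-1],
    )

def run_experiment(draw, size, trials=2000, seed=0):
    """Collect the six sets of sample quartiles; draw(rng) returns one value."""
    rng = random.Random(seed)
    results = {name: ([], []) for name in POSITION_RULES}
    for _ in range(trials):
        quartiles = sample_quartiles([draw(rng) for _ in range(size)])
        for name, (q1, q3) in quartiles.items():
            results[name][0].append(q1)
            results[name][1].append(q3)
    return results

if __name__ == "__main__":
    # Standard normal population, samples of size 10.
    res = run_experiment(lambda rng: rng.gauss(0.0, 1.0), size=10)
    for name, (q1s, q3s) in res.items():
        print("length =", name, "Q1:", five_number_summary(q1s))
        print("length =", name, "Q3:", five_number_summary(q3s))
```

Passing `lambda rng: rng.random()` as `draw` would sample from a uniform population on [0, 1) instead, and raising `trials` to 50,000 would match the number of samples described above at the cost of a longer run.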
The samples are drawn from one of two populations. The values in the first population are normally
distributed with μ = 0 and σ = 1 (the standard normal distribution). The
first quartile for this population is approximately -0.67449 and the third quartile is
approximately 0.67449. For a normal population, the distribution of sample quartiles
(first or third) is approximately normal.
The values in the second population are uniformly
distributed within the interval zero to one (0 ≤ x < 1). The
first quartile is 0.25 and the third quartile is 0.75. For a uniform
population, the distribution of sample quartiles (first or third) is
slightly skewed toward the median.
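The population quartiles quoted for both distributions can be checked with a few lines of Python. This is only a verification sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal population: mu = 0, sigma = 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)
normal_q1 = std_normal.inv_cdf(0.25)
normal_q3 = std_normal.inv_cdf(0.75)

# Uniform population on [0, 1): the quantile function is Q(p) = p,
# so the quartiles are simply 0.25 and 0.75.
uniform_q1, uniform_q3 = 0.25, 0.75

print(round(normal_q1, 5), round(normal_q3, 5))  # -0.67449 0.67449
print(uniform_q1, uniform_q3)                    # 0.25 0.75
```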
General Conclusions
Relative to the population quartiles, the length = n-1 algorithm tends to
yield Q1 values that are too high and Q3 values that are too low. The
length = n algorithm tends to yield Q1 values that are a bit too high and
Q3 values that are a bit too low. Nevertheless, of the three, the
length = n algorithm tends to yield the most accurate results. The
length = n+1 algorithm, on the other hand, tends to yield Q1 values that
are too low and Q3 values that are too high. As one would expect, the
differences among the results diminish as the sample size increases.
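One way to look at the effect of sample size is to tabulate the median of the simulated Q1 values for several sizes. The sketch below is a rough illustration and not the applet's code. It assumes the three algorithms correspond to placing Q1 at the 1-based positions 1 + (n-1)/4, 1/2 + n/4, and (n+1)/4 in the sorted sample, with linear interpolation. That mapping is an assumption, and the function names, seed, and trial count are hypothetical. Under these rules the three positions always lie within half a position of each other, so the results can differ by at most half the gap between two neighbouring order statistics, and that gap shrinks as the sample grows.

```python
import random
import statistics

def value_at(sorted_xs, pos):
    """Linearly interpolate at a 1-based, possibly fractional, position."""
    n = len(sorted_xs)
    pos = min(max(pos, 1.0), float(n))  # clamp to the valid range [1, n]
    lo = int(pos)
    if lo >= n:
        return sorted_xs[-1]
    return sorted_xs[lo - 1] + (pos - lo) * (sorted_xs[lo] - sorted_xs[lo - 1])

def median_q1_by_rule(size, trials=2000, seed=1):
    """Median of simulated Q1 values per rule, standard normal population."""
    rng = random.Random(seed)
    q1s = {"n-1": [], "n": [], "n+1": []}
    for _ in range(trials):
        s = sorted(rng.gauss(0.0, 1.0) for _ in range(size))
        # ASSUMPTION: position rules standing in for the three algorithms.
        q1s["n-1"].append(value_at(s, 1 + 0.25 * (size - 1)))
        q1s["n"].append(value_at(s, 0.5 + 0.25 * size))
        q1s["n+1"].append(value_at(s, 0.25 * (size + 1)))
    return {name: statistics.median(vals) for name, vals in q1s.items()}

if __name__ == "__main__":
    # The population Q1 is about -0.67449 for comparison.
    for size in (5, 20, 100):
        medians = median_q1_by_rule(size)
        print(size, {name: round(m, 4) for name, m in medians.items()})
```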