Here's a quick question, for what shape distribution does the mean convey the least useful information? Well, there are many answers, but a prime candidate is a one-sided long-tail distribution of the type exhibited by journal citations. The mean, standard deviation and Pearson correlation, are all summary statistics developed for the two-sided normal distribution (exercise for the reader: what are their non-parametric equivalents?). Applying them to anything else is like putting lipstick on a pig (ok, a poor analogy, but it sounds funny :-), but this porcine paintjob is exactly the method used to calculate a journal's Impact Factor.
So, what's the problem? In the context of a one-sided long-tail distribution, the mean is highly sensitive to outliers, and thus almost useless. Let's take an example. Let's suppose there were 99 papers published and each was cited once giving an Impact Factor of 1.0 (99*1/99). Now let's suppose a single additional paper was published which garnered 100 citations. The Impact Factor of the journal is now (99*1 + 1*100)/100 = 2.0. So a single paper, an outlier, has doubled the Impact Factor.
But that wouldn't happen in practice, right? No - you don't get it. The distribution of journal citations has a shape that guarantees this to happen; all those impact factors you read are just measures of outliers. How about instead "the Outlier Factor", or better still the "Extreme Value Factor"?
Still don't believe me? Well, let's take a concrete example. Thomson ISI has just deigned to give J. Cheminf. its first impact factor with a value of 3.42. Let's say that 65 papers have been taken into account, so that's about 222 citations in total. Now let's enter an outlier into the mix, say the Open Babel paper published in Oct of last year. I would expect about 30 to 60 citations a year once it gets going (based on prior citations of the software, as well as experience with the GaussSum paper) - let's just say 50 for a round number, so 100 citations in the 2-year period included in an impact factor. This means that all else being equal, in one year's time the journal's Impact Factor will rise to 4.1, and in two years to 4.9.
I just hope those Avogadro guys don't publish another outlier. :-)