
Thursday, 7 April 2022

Statistics

I'm picking a few bits out of my next book but three, Science is Like That.

Lord Rutherford is supposed to have said “If your experiment needs statistics, you ought to have done a better experiment”. Yet statistical analysis reveals the underlying truths in complex situations, the sort of messes that true physicists used to shy away from. It spoils the story a bit, but Rutherford once sat in on Horace Lamb’s lectures on mathematical statistics to improve his analysis of alpha particle deflections, a task which demanded some serious statistical work.

Once upon a time, simple patterns were solved by simple analysis, with simple mathematics revealing the laws that lay beneath the patterns. By the 19th century, nothing was quite so simple any more. The patterns were more complicated, and even physics needed statistics to help deal with the large masses of data. Most medical and biological research, all social science research and many other areas of modern scientific enquiry can only work by using statistics.

While modern statistics owe more to Karl Pearson, R. A. Fisher and J. B. S. Haldane, the first steps were taken by Adolphe Quételet, and then carried forward by Florence Nightingale. Quételet was a brilliant mathematician, who learned about probability from Pierre-Simon de Laplace while studying in Paris, before he returned to his native Belgium to run a new observatory there. While the observatory was being built, Quételet began exploring the ideas of ‘social physics’ and ‘moral statistics’.

He saw that there were many predictable sets of data. Crimes, suicides and marriages all involved individual free choice, but they happened at predictable rates in different age groups, giving him the starting point for his ‘moral statistics’.

Sad condition of the human race! We can tell beforehand how many will stain their hands with the blood of their fellow-creatures, how many will be forgers, how many poisoners, almost as one can foretell the number of births and deaths.
—Adolphe Quételet, Treatise on Man, 1835.

Florence Nightingale makes an excellent case study. We usually know her as the nurse who gained fame during the Crimean War, the Lady of the Lamp, but few people are aware that after this middle-aged spinster returned to England in 1856, she used statistics to argue for better nursing.

First, she prepared a pamphlet, based on the report of a Royal Commission, about the Crimean War campaign, in which Britain and France had fought Russia. Nightingale wanted to rally public support for nursing reforms.

The pamphlet showed where the problems lay, and her Mortality of the British Army featured the first use of pictorial charts to present data, those charts with tiny wheat bags, oil barrels or human figures lined up like so many paper dolls. She hammered away again in 1858 in her Report on the Crimea:

It is not denied that a large part of the British force perished from causes not the unavoidable or necessary results of war…(10,053 men, or sixty percent per annum, perished in seven months, from disease alone, upon an average strength of 28,939. This mortality exceeds that of the Great Plague)…The question arises, must what has here occurred occur again?
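Nightingale's "sixty percent per annum" can be checked against her own figures. The Python sketch below assumes a simple linear annualisation, which scales the seven-month rate by 12/7 with no compounding. The quotation doesn't say how she actually did the sum, and the function name `annualised_rate` is made up for illustration.

```python
def annualised_rate(deaths: int, average_strength: int, months: int) -> float:
    """Scale a death rate observed over `months` up to a 12-month rate, in percent.

    Assumption: simple linear scaling (no compounding), one plausible
    reading of Nightingale's "sixty percent per annum".
    """
    period_rate = deaths / average_strength   # fraction of the force that died in the period
    return period_rate * (12 / months) * 100  # stretch to a year and express as a percentage

# Figures quoted in the Report on the Crimea
rate = annualised_rate(10_053, 28_939, 7)
print(f"{rate:.1f}% per annum")
```

The seven-month death rate works out at about 34.7%. Stretched to a full year, that gives roughly 59.6%, which is consistent with her "sixty percent".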

In 1858, Nightingale was elected to the Statistical Society of London and turned her attention to hospital statistics on disease and mortality in Britain. You could never, she said, discover trends unless figures were recorded in the same way. She prepared a plan, published in 1859, for uniform hospital statistics. Her aim was to compare the death rates for each disease in different hospitals, which could not be done without a standardised recording system.

Others could also be counted as part-founders of statistics. John Graunt published his Observations on the Bills of Mortality of the City of London in 1662. This work has sometimes been attributed to Sir William Petty, but George Udny Yule showed by statistical analysis (how else?) that the sentence length in Observations did not match known samples of Petty’s writing. Yule turns up again in chapter 5 (but you may have to buy the book to find out about that).
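To give a feel for what a sentence-length comparison involves, here is a rough Python sketch. It splits a passage into sentences and summarises how many words each one contains. It is not a reconstruction of Yule's actual procedure. The naive splitting rule and the helper names `sentence_lengths` and `profile` are assumptions made for illustration.

```python
import re
from statistics import mean, stdev

def sentence_lengths(text: str) -> list[int]:
    """Count the words in each sentence of `text`.

    Naive assumption: '.', '!' and '?' always end a sentence, so
    abbreviations such as 'Dr.' will be split wrongly.
    """
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.strip()]

def profile(text: str) -> tuple[float, float]:
    """Mean and sample standard deviation of sentence length, in words.

    Needs at least two sentences, because stdev() needs two values.
    """
    lengths = sentence_lengths(text)
    return mean(lengths), stdev(lengths)

# Hypothetical passages standing in for a disputed text and a known sample
disputed = "The cat sat. The dog ran far away! Did it?"
known = "Short one. Another short one. A third."
print(sentence_lengths(disputed))  # [3, 5, 2]
print(profile(disputed), profile(known))
```

Two writers with similar habits can still produce similar averages. A serious attribution study would therefore compare whole distributions drawn from long samples, not two numbers from a few lines.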

Graunt’s figures became the basis of the first life insurance tables, but he also revealed that for a small fee, a death from “French-pox” (syphilis) could be listed as “consumption”, saving the family of the deceased much embarrassment while hiding a medical truth. Before the 19th century, statistics were just numbers describing the state of a nation, and this is what Mark Twain had in mind when he spoke of “Lies, damned lies, and statistics”.

After 1860, statistics began to take on a whole new meaning, with a statistic becoming a summary figure for a large number of measurements, a way of getting a handle on complex data. To experienced eyes, the mean and standard deviation of a set of measures are a quick summary, though lay people may still say statistics cannot be trusted.
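As a small illustration of that kind of summary, this Python snippet uses the standard library's `statistics` module to reduce a list of measurements to a mean and a standard deviation. The measurements are invented for the example.

```python
from statistics import mean, stdev

# Hypothetical repeated measurements (say, lengths in mm); not real data
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2]

m = mean(measurements)   # the typical value
s = stdev(measurements)  # sample standard deviation: the typical spread around m
print(f"mean = {m:.2f}, standard deviation = {s:.2f}")
```

`stdev` divides by n - 1, which gives the sample standard deviation. `pstdev` divides by n and is used when the list is the whole population rather than a sample.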

The simple fact is that figures don’t lie, but liars can figure. “Statistics” always need to be looked at carefully, but the use of statistics in science is fully justified. Statistical analysis can reveal such things as Burt’s fraudulent work on twins and inherited intelligence (chapter 9 of the book), or the likelihood that Mendel massaged his data, since his results fit his predicted ratios more closely than chance would normally allow. Statistics can also reveal amazing patterns, laws and truths.
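Checks like the one made on Mendel's data usually rest on a chi-square goodness-of-fit test. The test measures how far observed counts stray from the counts a theory predicts. The Python sketch below applies it to Mendel's published counts of round and wrinkled peas (5,474 and 1,850), tested against a 3:1 ratio. The function name is made up. The sketch handles only the two-category case, where the p-value can be computed exactly with `math.erfc`.

```python
import math

def chi_square_2cat(observed: list[int], ratio: list[float]) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for two categories (1 degree of freedom).

    Returns (statistic, p-value). With 1 df the upper-tail probability
    is erfc(sqrt(x / 2)), so no extra statistics library is needed.
    """
    if len(observed) != 2 or len(ratio) != 2:
        raise ValueError("this sketch only handles two categories")
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]  # counts the theory predicts
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # chance of a worse fit if the theory holds
    return stat, p_value

# Mendel's round vs wrinkled seeds, against a 3:1 ratio
stat, p = chi_square_2cat([5474, 1850], [3, 1])
print(f"chi-square = {stat:.3f}, p = {p:.2f}")
```

A p-value of around 0.6 means this single result is unremarkable. The suspicion about Mendel comes from pooling many such tests: taken across all his experiments, the results sit closer to the predicted ratios than chance would usually allow.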

Statistics would end up being the glue which tied together evolution and genetics in the 1920s, helping biologists to understand what was going on in large populations. In time, ecology would absorb pattern analysis as a powerful tool, just as numerical methods would find a place in biological taxonomy and classification. Tied in with this were tests of significance, which estimate how likely it is that a set of results could have arisen by chance alone.

It took statistics, wielded by epidemiologists, to prove what people had suspected in the 19th century: that tobacco causes lung cancer and other diseases. You can trust statistics if they are properly used. Mind you, in the data set <1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 10>, the mean is about 2.7 (30 divided by 11), the median is 2 and the mode is 1. The average statistician won’t tell you about that!
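The three averages in that example can be checked with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 10]

print(f"mean   = {mean(data):.2f}")  # 30 / 11, pulled upwards by the single large value 10
print(f"median = {median(data)}")    # the middle (6th) of the 11 sorted values
print(f"mode   = {mode(data)}")      # the most frequent value
```

The single large value drags the mean above the median. That is exactly why a careful reader asks which "average" is being quoted.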

Most modern scientific advances owe a great deal to statistical analysis, often in the form of correlation coefficients. Now if I can claim any special professional expertise aside from story-telling, it lies in the application of statistics, and in particular in telling honest uses of statistics from dishonest ones. I used statistical analysis to catch my frauds.
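For readers who have not met one before, a correlation coefficient (Pearson's r) measures how closely two sets of paired numbers rise and fall together, on a scale from -1 to +1. The Python sketch below computes it from first principles. The function name and the sample numbers are made up for illustration. Python 3.10 and later also provide `statistics.correlation`, which does the same job.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, with at least 2 values each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of products of deviations: positive when x and y move together
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a sample does not vary")
    return cov / (sx * sy)  # scaled so the result lies between -1 and +1

# Hypothetical paired observations
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```

A value near +1 means the points lie close to a rising straight line. A value near 0 means there is no linear relationship, although the two variables could still be related in some other way.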

But that's another story...
