On the use of statistics in considering ‘climate change’

I am not a statistician, and have never taken a course in the subject since my undergraduate days, though I have run seminars for postgraduate students about the uses of statistics in political science. I am best described as a counter, the sort of person who, while waiting for his meal in a restaurant, counts the number of diners, estimates the average cost of meals, estimates the number of employees and their average pay, and comes to a conclusion about the long-term survival of the restaurant. That’s arithmetic rather than statistics.

Over the years I’ve developed a degree of scepticism about the use of statistics in social science and elsewhere, notably climate science. I’ve written about some of that scepticism on this website, and it was what made me almost instantly suspicious about the forecasts of doom, when I began reading widely in the area around ten years ago. For example, I couldn’t believe that scientists thought you could talk seriously about changes to a temperature anomaly given to three decimal places when the accuracy of the instrument was at best one decimal place.

A couple of weeks ago, Judith Curry’s ‘Week in Review’ post sent me off to a number of other sites, one of them run by Matt Briggs, called Statistician to the Stars. I’ve been there before. Briggs (William M.) is a former meteorologist who did a PhD in statistics at Cornell, and has served as a professor of statistics. His interests are enormous, wider than mine, and he writes well, too.

What exercised him was a dispute in the Netherlands between sceptics and the orthodox. You can read it all here. The title, ‘Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series’ says it all, though I think the moral is wider than simply time series. I can’t do his essay justice in the space available here, because he used lots of stuff from the Dutch controversy. But I can summarise the several lessons he offers for those who want to use statistics to make their point. Here they are.

Lesson 1   Never homogenize.

Every time you move a thermometer, or make adjustments to its workings, you start a new series. The old one dies, a new one begins. If you say the mixed marriage of splicing the disjoint series does not matter, you are making a judgment. Is it true? How can you prove it? It doesn’t seem true on its face. Significance tests are circular arguments here. After the marriage, you are left with unquantifiable uncertainty.
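The point is easy to demonstrate with made-up numbers. Here is a minimal sketch, using two synthetic station records that have no warming at all, where the second thermometer simply reads about half a degree higher than the first (as might happen after a relocation). Splicing them manufactures an apparent warming step:

```python
import random

random.seed(42)

# Two synthetic station records with no underlying trend: the second
# thermometer reads about 0.5 degrees warmer than the first
# (hypothetical numbers, e.g. after a relocation).
old_series = [15.0 + random.gauss(0, 0.2) for _ in range(50)]
new_series = [15.5 + random.gauss(0, 0.2) for _ in range(50)]

# Naive splice: treat the two records as one continuous series.
spliced = old_series + new_series

# Mean of each half of the spliced record.
first_half = sum(spliced[:50]) / 50
second_half = sum(spliced[50:]) / 50

# The splice produces an apparent warming of roughly 0.5 degrees,
# even though neither station warmed at all.
print(round(second_half - first_half, 2))
```

Whether that half-degree step should be adjusted away, and by exactly how much, is a judgment, not a fact in the data.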

Lesson 2   Carry all uncertainty forward.

If you make any kind of statistical judgment, which includes adjustments for instrument changes and relocations, you must always state the uncertainty of the resulting data. If you don’t, any analysis you conduct “downstream” will be too certain. Confidence intervals and posteriors will be too narrow, p-values too small, and so on.
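A back-of-envelope sketch shows the effect. The numbers below are entirely hypothetical: a trend with a standard error from the regression fit alone, plus extra uncertainty from an estimated homogenisation offset. Ignoring the second source makes the interval look much tighter than it honestly is (adding the errors in quadrature assumes the two sources are independent):

```python
import math

# Hypothetical numbers: a 100-year trend estimated from adjusted data.
trend = 0.8            # degrees per century, from the adjusted series
se_fit = 0.10          # standard error reported by the regression alone
se_adjustment = 0.15   # extra uncertainty from the homogenisation offset

# Naive 95% interval ignores the adjustment uncertainty...
naive_halfwidth = 1.96 * se_fit

# ...the honest interval carries it forward, combining the two error
# sources in quadrature (assuming independence).
full_se = math.sqrt(se_fit ** 2 + se_adjustment ** 2)
honest_halfwidth = 1.96 * full_se

print(round(naive_halfwidth, 2), round(honest_halfwidth, 2))
```

With these invented numbers the honest interval is nearly twice as wide as the naive one; the trend estimate itself does not change, only our stated confidence in it.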

Lesson 3 Look at the data.

The data are what you have (this is DA speaking). Don’t ignore outliers — they’re telling you something. What is it? Don’t homogenise (see lesson 1). Jennifer Marohasy has been criticising the Bureau of Meteorology for doing this (see, for example, here), and she is right to do so.

Lesson 4 Define your question.

Everybody is intensely interested in “trends”. What is a “trend”? That is the question, to which the answer is: many different things. It could mean (A) the temperature has gone up more often than it has gone down, (B) that it is higher at the end than at the beginning, (C) that the arithmetic mean of the latter half is higher than the mean of the first half, (D) that the series increased on average at more or less the same rate, or (E) many other things. Most statisticians, perhaps anxious to show off their skills, say (F) that a trend parameter in a probability model exhibits “significance”.

All definitions except (F) make sense. With (A)-(E) all we have to do is look: if the data meet the definition, the trend is there; if not, not. End of story. Probability models are not needed to tell us what happened: the data alone are enough to tell us what happened (see lesson 3).
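“Just looking” really is arithmetic. Here is a minimal sketch on a made-up ten-value series, checking definitions (A), (B) and (C) directly, with no probability model anywhere:

```python
# A made-up annual series; checking each trend definition is arithmetic.
series = [14.1, 14.3, 14.2, 14.5, 14.4, 14.6, 14.8, 14.7, 14.9, 15.0]

# (A) Has the temperature gone up more often than down?
steps = [b - a for a, b in zip(series, series[1:])]
ups = sum(1 for s in steps if s > 0)
downs = sum(1 for s in steps if s < 0)
trend_A = ups > downs

# (B) Is it higher at the end than at the beginning?
trend_B = series[-1] > series[0]

# (C) Is the mean of the latter half above the mean of the first half?
half = len(series) // 2
trend_C = sum(series[half:]) / half > sum(series[:half]) / half

print(trend_A, trend_B, trend_C)
```

Each definition gets a plain yes-or-no answer from the data themselves; nothing needed “significance”.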

Lesson 5  Only the data are the data.

To create an anomaly is to replace the data with something that isn’t the data. It is common to take the average of each month’s temperature over 1961-1990 and subtract these monthly averages from all the other months. What makes the interval 1961-1990 so special? Nothing at all. It’s ad hoc, as it always must be. What happens if you change this 30-year-block to another 30-year-block? There are all sorts of possibilities, and they can give you different answers.

Which is the correct one? None and all — what was your question again? (see Lesson 4). And that’s just the 30-year-blocks. Why not try 20 years? Or 10? Or 40? You get the idea. We are uncertain of which picture is best, so recalling Lesson 2, we should carry all uncertainty forward.
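The baseline-dependence is easy to see in code. Below is a sketch on a synthetic series with a gentle rise: computing anomalies against the first 30-year block and then against the last one shifts every value by a constant, so whether any given year looks “warm” or “cool” depends entirely on the block chosen:

```python
# Synthetic record: 60 "years" with a gentle rise, to show that the
# anomaly depends on the chosen baseline block.
series = [14.0 + 0.01 * year for year in range(60)]

def anomalies(data, start, end):
    """Subtract the mean of data[start:end] from every value."""
    base = sum(data[start:end]) / (end - start)
    return [t - base for t in data]

# Same data, two different 30-year baseline blocks.
early_base = anomalies(series, 0, 30)   # first 30 years as the "normal"
late_base = anomalies(series, 30, 60)   # last 30 years as the "normal"

# Every anomaly shifts by the same constant: the apparent "warmth" of
# any given year depends entirely on which block you picked.
shift = early_base[0] - late_base[0]
print(round(shift, 2))
```

The shape of the curve survives the change of baseline, but the zero line, and therefore the rhetoric of “above normal” and “below normal”, does not.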

Lesson 6 The model is not the data.

The model most often used is a linear regression line plotted over the anomalies. Many, many other models are possible, the choice subject to the whim of the researcher. If you want to make a point about the data, you will find the model that does that best. But the model is not the data…
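A regression line can be fitted by hand in a few lines, which makes the distinction concrete. The anomaly values below are invented; the fitted line is a summary chosen by the analyst, and it passes through none of the actual observations:

```python
# Least-squares fit by hand on made-up anomaly values: the fitted line
# is a summary chosen by the analyst, not the observations themselves.
xs = list(range(10))
ys = [0.1, 0.4, 0.2, 0.6, 0.5, 0.9, 0.7, 1.1, 1.0, 1.3]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# The model reproduces none of the data points exactly: every residual
# is the gap between what happened and what the model says happened.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
hits_a_point = any(abs(r) < 1e-9 for r in residuals)
print(round(slope, 3), hits_a_point)
```

A different model choice (a quadratic, a loess smoother, a broken stick) would summarise the same points differently, which is exactly the researcher’s-whim problem.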

The conclusion

Now for the shocking conclusion. Ready?

Usually time series devotees will draw a regression line starting from some arbitrary point… and end at the last point available. This regression line is a model. It says the data should behave like the model; perhaps the model even says the data is caused by the structure of the model (somehow). If cause isn’t in it, why use the model?

But the model also logically implies that the data before the arbitrary point should have conformed to the model. Do you follow? The start point was arbitrary. The modeler thought a straight line was the thing to do, that a straight line is the best explanation of the data. That means the data that came before the start point should look like the model, too…

But they mostly don’t …
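The backcasting argument can be sketched with synthetic data: a series that is flat for fifty “years” and then rises for thirty. Fit a straight line to the rising segment only (starting at the arbitrary point, year 50) and project it backwards, and it misses the earlier data badly:

```python
# Synthetic record: flat for the first 50 "years", rising for the last 30.
early = [14.0] * 50
late = [14.0 + 0.02 * i for i in range(30)]
series = early + late

# Fit a straight line to the rising segment only (the arbitrary start
# point is year 50), using ordinary least squares by hand.
xs = list(range(50, 80))
ys = series[50:]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# If the straight line really explained the data, the years before the
# start point should sit on it too. Backcast to year 0 and compare.
backcast_at_zero = intercept
actual_at_zero = series[0]
print(round(backcast_at_zero, 2), actual_at_zero)
```

The backcast says year 0 should have been a full degree cooler than it actually was, which is Briggs’s point: the model’s logical implications extend before the chosen start date, and the earlier data refuse to cooperate.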

Read the original — it is great fun. And have a look at his ‘Fallacies’, too. I’ll do a post on them one day.


Comments

  • JMO says:

    Don, you question what makes the interval 1961-1990 so special. The reason is that it is the third Climate Normal. The first 30-year Climate Normal started on January 1, 1901.

    Towards the end of the 19th century a huge amount of weather observations had built up from numerous meteorological stations scattered around the world, the value of which could only be realised if the data were systematised. In 1873 the International Meteorological Organisation (IMO) was formed.

    At the Warsaw meeting in 1935, the IMO members agreed on an international standard of comparison by which longer-term climate change could be distinguished from variability. They agreed on a 30-year criterion as the appropriate time over which to average weather data, and on the period 1901-1930 as the first Climate Normal.

    So it annoys me when I hear alarmists, the BOM and pseudo climate scientists referring to a decade or even shorter (e.g. this decade was warmer than the last, or this month is the warmest in five years, etc.).

  • kvd says:

    “counts the number of diners, estimates the average cost of meals, estimates the number of employees and their average pay…” – that’s arithmetic

    “and comes to a conclusion about the long-term survival of the restaurant” – that’s lunacy.

    “That’s arithmetic rather than statistics” – no, it’s a single solitary datapoint, based upon very rough arithmetic – nothing more. But I hope the meal was nice.

    • Don Aitkin says:

      It’s not lunacy, because I don’t own the restaurant, or advise the owner. Rather it’s the sort of game I play in my head, in which counting and reasoning play a special part. There are nights on which the restaurant might simply be paying for itself, and other nights when there is a queue outside the door. To play the game properly you’d need to be there every night for quite a time…

  • David says:

    Don
    I agree with what you and Briggs have written about the pitfalls of statistical analysis in Lessons 1 through 6. And yes, some people manage these issues better than others. But in Lesson 7 Briggs says

    “The only reason to use statistics is to use models to predict data never before seen. If our anomaly regression or other modeled line was any good, it will make skillful forecasts. Let’s wait and see if it does.”

    The whole point of a predictive model is to predict, as in before the fact. To “wait and see” would be like me predicting the Melbourne Cup winner after the race had been run. I would obviously be correct. But what use is my prediction?

    If we want to predict then we need some sort of statistical model. We look at past observations and project to the future. We can then decide to either (i) do nothing or (ii) do something.

    • DaveW says:

      Hi David,
      I don’t know about you, but I would prefer to place my bets only when a model has been demonstrated to be able to predict winners. You are free to use your money any way you like, but I’d prefer that none of mine is thrown away on the forecasts of GCMs that seem to have no predictive value. Not wasting taxpayers’ money is not equivalent to ‘do nothing’, nor is using that money to address real problems.
      Cheers

      • David says:

        That’s fine, but you should realise you are still “placing a bet”, in the same way as someone who chooses not to insure their house is also “placing a bet”.

        • DaveW says:

          Not really. It is more like I am declining to pay for the insurance on Al Gore’s mansion.

          Say I thought I could construct a model that predicted the stock market, but it wasn’t validated and recent stock price fluctuations looked nothing like the model output. Should I put down the down payment on the mansion I’ve always wanted, or insure my house? That is more like the bet.


