Post by Chris Buckley
I thought I would go a little bit deeper into my worries about
the reliability of climate change models.
As I said, I do research in IR - Information Retrieval (think
of the algorithms underlying Google search). The focus of IR
is the construction of models of languages, and the testing of
how well those models perform in reality. So I've been doing
modeling for more than 30 years (mostly retired now).
My major worry about models is that there is a large gap
between the reliability of the models applied retrospectively,
where you know what the desired result is while constructing
the model, and applied predictively, where you don't. Much of
the work in IR infrastructure is setting up environments and
procedures so you can believe that the models tested work
predictively, not just retrospectively.
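To make that gap concrete, here is a toy sketch, entirely my own construction (the curve, noise level, and polynomial degrees are arbitrary, and it has nothing to do with any real IR or climate model): a high-degree polynomial fit to ten noisy points matches the "historical record" almost perfectly, but a fresh sample from the same process exposes how little that retrospective fit means.

```python
# Toy illustration of retrospective vs. predictive evaluation.
# (My own sketch; nothing here comes from an actual model.)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)

def observe():
    # The same underlying process, observed with fresh noise each time.
    return np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

y_hist = observe()  # the "historical record" the model is built on
y_new = observe()   # data the model never saw

def eval_degree(deg):
    coeffs = np.polyfit(x, y_hist, deg)  # fit ONLY on the history
    retro = np.mean((np.polyval(coeffs, x) - y_hist) ** 2)  # retrospective error
    pred = np.mean((np.polyval(coeffs, x) - y_new) ** 2)    # predictive error
    return retro, pred

retro3, pred3 = eval_degree(3)
retro9, pred9 = eval_degree(9)
print(f"degree 3: retrospective MSE {retro3:.4f}, predictive MSE {pred3:.4f}")
print(f"degree 9: retrospective MSE {retro9:.4f}, predictive MSE {pred9:.4f}")
```

The degree-9 fit passes through all ten historical points, so its retrospective error is essentially zero, while its error on the new sample stays at the level of the noise. Judged retrospectively it looks like the better model; judged predictively it isn't.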
I can construct models, giving very plausible explanations
every step of the way, that do quite well when tested on a
particular environment (that I had in mind), but that do quite
poorly in general. My model is overfitted to that one
environment.
In the climate model case, overfitting might occur, for
example, if the modeler determined that including one
particular interaction at a particular strength meant that the
model better matched the historical record. In reality,
though, the improvement came from the combined effect of two
other interactions. By including the one
interaction in the model, the retrospective power of the model
was improved, but the predictive power was hurt.
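That scenario can be sketched as a toy, too (again my own construction, with arbitrary sizes and noise levels, not anyone's climate code): the true process depends on two variables, and we add whichever of 100 irrelevant noise columns best improves the fit to a short "historical" period. The retrospective error is guaranteed to drop; on a held-out period, the selected term usually makes things worse rather than better.

```python
# Toy sketch of overfitting by post-hoc term selection.
# (My construction; the column count and noise level are arbitrary.)
import numpy as np

rng = np.random.default_rng(1)
n_hist, n_test, n_cand = 20, 200, 100

x1 = rng.normal(size=n_hist + n_test)
x2 = rng.normal(size=n_hist + n_test)
y = 1.0 * x1 - 0.5 * x2 + rng.normal(0.0, 0.5, n_hist + n_test)
# 100 candidate "interactions" that in truth have nothing to do with y
cand = rng.normal(size=(n_hist + n_test, n_cand))

def fit_mse(cols):
    # Least-squares fit on the historical period only,
    # evaluated both retrospectively and on the held-out period.
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X[:n_hist], y[:n_hist], rcond=None)
    retro = np.mean((y[:n_hist] - X[:n_hist] @ beta) ** 2)
    pred = np.mean((y[n_hist:] - X[n_hist:] @ beta) ** 2)
    return retro, pred

base_retro, base_pred = fit_mse([x1, x2])
# choose the candidate that most improves the *historical* fit
best = min(range(n_cand), key=lambda j: fit_mse([x1, x2, cand[:, j]])[0])
aug_retro, aug_pred = fit_mse([x1, x2, cand[:, best]])

print(f"true model only:    retro {base_retro:.3f}, predictive {base_pred:.3f}")
print(f"plus spurious term: retro {aug_retro:.3f}, predictive {aug_pred:.3f}")
```

The point of the sketch: with a short record and many plausible terms to choose from, some irrelevant term will always "explain" part of the history, and including it always looks like progress retrospectively.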
I have absolutely no doubt that the danger of overfitting was
known to climate change modelers from the beginning, and that
they tried their best to avoid it. But I have absolutely no
doubt that they failed. There simply is not enough information
in the very incomplete record we currently have of a single
planet's climate to distinguish among all of the possible
causes of effects in the historical record.
An example from IR showing the importance of information: in
IR, machine learning models (much easier for us to build than
for climate researchers!) were tried (e.g., by Fuhr) on the
early toy collections of the 1980s and the million-document
collections of the early 1990s, and weren't any more
successful than other models. In the late 1990s, with
collections in the tens of millions of documents, they started
to become a bit better. In the 2000s, with test collections of
a billion documents, machine learning models were clearly
better even in academia, and places like Google were showing
just how good search could be with document collections
several orders of magnitude larger, and trillions of sample
searches.
So given that the climate models are overfitted (and thus can
do very well on the historical record while not doing as well
predictively), are they importantly flawed? Nobody knows.
It's extremely difficult to come up with a measure of
overfitting that means anything, and scientists are very
reluctant to spend time publishing all of the details of
potential weaknesses of their work.
One worrisome indication of problems is the number of
different papers that were published that had new model
variations that explained why global temperatures had not gone
up for 10 years (this was before it was decided that they had
gone up). If the basic models can be tweaked that readily,
and in that many ways, then they seem to have important
problems with reliability.
But I'm not a climate scientist and don't know the details of
the models at all. As I said, scientists generally don't
discuss such weaknesses in public. However, in this
particular case of climate change, with millions of lives at
risk no matter what gets decided, it's important to understand
those weaknesses. I'm very upset at climate scientists who
try to shut down forums that discuss climate model weaknesses.