Simon Donner posted today about “climate data”: what it is and what it isn’t. I tweeted that I disagree. I have to admit that my brain omitted the term “climate” in the title, so my thoughts are not restricted to climate data.
His main point is that we misuse the term “data”, for example when we use constructs like “future climate data”. I had to google whether this phrase is really used, and indeed it is (e.g., Google, Google Scholar).
To be clear, I (hopefully) would not use the term “future climate data”. However, there is data from a simulation/model/numerical experiment on future climate projections.
Like Simon Donner, I think it is good to start from a definition of data: Merriam-Webster online defines “data” as “factual information […] used as a basis for reasoning, discussion, or calculation”.
I omitted something there: the idea that “data” has to be measured or, to put it more broadly, observed. I don’t think it has to be. The dictionary also gives an abbreviated definition: “facts or information used usually to calculate, analyze, or plan something”.
So I disagree with Donner when he writes, ‘[data …] was measured. […] they were observed and recorded. Without a time machine, there is no “data” about the future.’
Reading the dictionary definition and the Wikipedia entries on data and information, I find an interesting mixture of terms: for the one, data is information, but for the other, data is “collected and analyzed to create information”.
It is true that there cannot be observations about the future; that’s not open to discussion. Nevertheless, the output from a model can be seen as data. The “numbers” put out by the model are uninterpreted information: information that has to be reasoned about, discussed and possibly used in further calculations for meaningful interpretation and inferences.
Eric Winsberg writes in the Stanford Encyclopedia of Philosophy on Computer Simulations in Science: “simulations are meant to replace experiments and observations as sources of data about the world because the relevant experiments or observations are out of reach, for principled, practical, or ethical reasons.” This may not be an accepted definition, but rather Winsberg’s opinion.
So while Donner wants to distinguish between “actual data” and “model output/results”, I see model output and recorded observations (i.e. measurements), along with their derived quantities, as different forms of data. In writing, talking and communicating, we scientists have to be clear about the origin of our “data” and highlight the peculiarities of each kind. Indeed, this clarity about origin and peculiarities is sometimes, or even often, missing. Nevertheless, a global mean temperature for the year 2300 from an Earth system model is no less “data” than the temperature reading from the thermometer in my living room. I would probably accept neither at face value.
Much of my disagreement may be hair-splitting.
My point of view here is this: there are no first-class “actual data” and second-class “model results”. Members of both types of “data” are reliable to a specific degree and can be used to derive information. It is not the distinction between “recorded measurements” and “model output” that may bridge the gap between the expectations of society and the power of science, but clarity about the abilities of specific “data” sets.