
Blog Archive - February 2008
Tropical cyclone history - part II: Paleotempestology still in its infancy
While analyzing tropical cyclone records is difficult enough (see 'Tropical cyclone history - part I'), it is even more challenging to reliably estimate hurricane activity further back in time. Recently, Nature published an attempt to reconstruct past major hurricane activity back to 1730 (Nyberg et al. 2007). The authors concluded that the phase of enhanced hurricane activity since 1995 is not unusual compared to other periods of high hurricane activity in the record and thus appears to represent a recovery to normal hurricane activity. The paper was promoted in a Nature press release and received broad media attention.
Although the authors' approach is interesting, in my view the study has a number of problems, which I outline in a comment published in Nature today (Neu 2008):
The authors use a couple of coral records and a marine sediment core from the Caribbean to reconstruct first wind shear and then major hurricane activity in the tropical Atlantic. First they find a good correlation of their proxy records to wind shear measurements in the main development region (MDR). There are two interesting features here: the coral proxies show a negative correlation to wind shear over the MDR, but a positive correlation north of it. Thus, in relation to hurricane activity, wind shear has opposite effects in different regions (Fig. 2 of Nyberg et al.). We’ll come back to that later. The sediment proxy shows a positive correlation to wind shear over the MDR and no correlation north of it. Since the two proxy records are correlated to wind shear with opposite signs, one would expect them to show opposite patterns and trends. This is more or less true for the period 1950-1990 used to calibrate (or ‘train’) the statistical model. However, for the preceding 230 years, the long-term trend is the same for both proxies. This clearly indicates that at least one of the proxies has a wrong long-term trend: given their opposite correlations, a common trend in the two proxies implies opposite long-term trends in wind shear. Thus the long-term trend of a reconstruction using both proxies does not seem very reliable. Nyberg et al. did not comment on this basic problem in their reply (Nyberg et al. 2008).
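To make the sign argument concrete, here is a minimal sketch with purely synthetic data (the wind-shear series and both "proxies" are hypothetical, not the Nyberg et al. records). A proxy correlated positively with shear and one correlated negatively must trend in opposite directions if both faithfully record the same shear history; reading a common trend back through their opposite correlation signs yields contradictory shear trends.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def implied_shear_trend(proxy_trend, correlation_sign):
    """Sign of the wind-shear trend a proxy implies, given the sign of its correlation with shear."""
    return math.copysign(1.0, proxy_trend) * correlation_sign


years = list(range(1730, 2000))
# Hypothetical shear index: a weak upward trend plus a 60-year oscillation.
shear = [0.005 * (y - 1730) + math.sin(2 * math.pi * (y - 1730) / 60) for y in years]

# Idealised, noise-free proxies: one positively correlated with shear
# (like the sediment core over the MDR), one negatively (like the corals).
proxy_pos = list(shear)
proxy_neg = [-s for s in shear]

print(ols_slope(years, proxy_pos) > 0, ols_slope(years, proxy_neg) < 0)

# If both proxies instead show the SAME upward trend, they imply contradictory shear trends.
print(implied_shear_trend(+0.01, +1), implied_shear_trend(+0.01, -1))
```

Real proxies are noisy, so this only shows the sign logic, not how large a trend mismatch would have to be before it becomes significant.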
For the reconstruction of past major hurricane activity they use these two proxies again, together with a proxy for SSTs. Besides the problem of two proxies with opposite long-term trends, the coral proxy has, as mentioned, opposite correlations to wind shear within and outside the MDR. Since 1944, about 50% of major Atlantic hurricanes have reached major hurricane strength only outside the area of positive correlation (i.e. in the area north of 20ºN between 50 and 75ºW, and north of 25ºN outside this longitude band), that is, in an area with no correlation or an inverse correlation to the proxies. Since wind shear seems to be high in this area at times when it is low in the MDR, the relation of major Atlantic hurricane frequency to the coral proxy is not obvious on physical grounds. Moreover, the fraction of major hurricanes observed outside the MDR varies with time (Figure 1) and might have changed over the last decade, which further complicates the relationship.
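The regional criterion above can be written as a simple classification of best-track fixes. This is a hypothetical sketch: it assumes longitudes in degrees east (so 75ºW is -75), uses the Saffir-Simpson category 3 threshold of 96 kt for major hurricane status, and interprets "reaching major strength only outside" as every major-intensity fix of a track lying in the outer area. The three tracks are invented for illustration.

```python
MAJOR_KT = 96  # Saffir-Simpson category 3 threshold (knots)


def in_outer_region(lat, lon):
    """True if the fix lies north of 25ºN, or north of 20ºN between 75ºW and 50ºW.

    Longitude is in degrees east, so 75ºW is -75 (assumed convention).
    """
    if lat > 25:
        return True
    return lat > 20 and -75 <= lon <= -50


def is_major(track):
    return any(wind >= MAJOR_KT for _, _, wind in track)


def major_only_outside(track):
    """True if every fix at major-hurricane intensity lies in the outer region."""
    major_fixes = [(lat, lon) for lat, lon, wind in track if wind >= MAJOR_KT]
    return bool(major_fixes) and all(in_outer_region(lat, lon) for lat, lon in major_fixes)


def fraction_major_only_outside(tracks):
    majors = [t for t in tracks if is_major(t)]
    if not majors:
        raise ValueError("no major hurricanes in input")
    return sum(major_only_outside(t) for t in majors) / len(majors)


# Hypothetical tracks: lists of (lat ºN, lon ºE, max wind in kt).
track_a = [(15, -45, 80), (18, -55, 100), (22, -65, 110)]  # already major at 18ºN
track_b = [(22, -60, 90), (27, -70, 105), (30, -72, 115)]  # major only further north
track_c = [(14, -40, 60), (16, -50, 80)]                   # never a major hurricane

print(fraction_major_only_outside([track_a, track_b, track_c]))
```

Applied year by year to HURDAT, a classification like this would give the red curve of Figure 1; applied to all years since 1944, it gives the roughly 50% share quoted above.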

Figure 1. Major hurricanes outside the MDR. The annual number of major hurricane tracks that reach major hurricane status only north of 25ºN, or north of 20ºN between 50 and 75ºW (red line), and the total annual number of major hurricanes in the Atlantic (blue line), both shown as 5-year running means. Data are from the NOAA National Hurricane Center best track data set (HURDAT).
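The smoothing in Figure 1 can be reproduced with a centred 5-year running mean. The sketch below assumes a centred window and leaves the first and last two years undefined; the original figure may have treated the ends of the record differently.

```python
def running_mean(values, window=5):
    """Centred running mean over an odd-length window; None where the window is incomplete."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred mean")
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)
        else:
            out.append(sum(values[i - half:i + half + 1]) / window)
    return out


# Hypothetical annual counts, for illustration only.
print(running_mean([1, 2, 3, 4, 5, 6, 7]))
```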
Although the correlation patterns presented by Nyberg et al. suggest a good correlation of their Caribbean proxies to wind shear over a large part of the northern tropical Atlantic, this does not automatically mean that the correlation of the proxies to total Atlantic major hurricane activity is representative of the whole Atlantic. Firstly, hurricane activity is influenced by factors other than wind shear (which might vary between areas), and secondly, the effect of wind shear is evidently opposite in different regions.
There is a correlation of the proxies to major Atlantic hurricane activity over the period 1946-1995; however, the large increase in activity starts more than 5 years earlier in the proxies than in the hurricane record. The authors explain this time lag by the El Niño of the early 1990s. However, since El Niño partly acts through wind shear, its influence should already be included in the proxies.
Neither the reconstruction nor the calibration period covers the strong increase in major hurricane activity in the 1990s, but it is obvious from observed wind shear, and recognized by the authors, that this increase cannot be explained by wind shear and is probably due to the increase in SSTs, as several studies have pointed out (e.g. Hoyos et al. 2006). Since the training period of their reconstruction model covers a phase in which hurricane activity seems to vary mostly through wind shear (1946-1990), but not the following phase, in which SST changes are the most important factor, the model probably does not represent the influence of SSTs very well.
Now we come back to the question of the reliability of the hurricane record. One of the big problems of the Nyberg et al. reconstruction is that it deviates strongly from the hurricane record before 1944 (Figure 3 in Nyberg et al.). For the period 1850-1944, the reconstruction shows more than 3 (about 3.3) major hurricanes per year on average, while the hurricane record has fewer than 1.5 per year. Nyberg et al. (2007) explain this discrepancy by the unreliability of the hurricane record. This explanation seems too simple. There are, as discussed in 'Tropical cyclone history - part I', uncertainties in the hurricane record. These are, however, not infinitely large. I have tried to estimate an upper error bar for major hurricane activity. Since there are no estimates of the underreporting bias for major hurricanes, I assumed a constant proportion of major hurricanes to the total number of tropical storms. As Holland and Webster (2007) have shown, this proportion has some multidecadal variation, but these variations are not very large on the 50-year time scale, and there seems to be no significant long-term trend. I used the highest estimates of the tropical storm reporting bias by Landsea (2004, 2007), which are probably too high (as discussed in 'Tropical cyclone history - part I'): plus 3 tropical storms per year on average for 1851-1885, plus 2 for 1885-1965, and minus 1 for 2003-2006 (Figure 2, black line).
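The correction scheme described above can be sketched as follows. The period boundaries follow the text; assigning 1885 to the +3 period is my assumption, since the two periods as quoted overlap in that year. The ratio is pooled over all years (total majors divided by total storms), which is one of several reasonable ways to define an "average ratio". The storm counts in the example are hypothetical.

```python
def landsea_upper_bias(year):
    """Upper-end tropical-storm undercount correction (storms per year) used in the text.

    Assumption: 1885 is counted in the +3 period, since the quoted periods overlap there.
    """
    if 1851 <= year <= 1885:
        return 3
    if 1886 <= year <= 1965:
        return 2
    if 2003 <= year <= 2006:
        return -1
    return 0


def corrected_major_ratio(years, storms, majors, extra_majors_per_year=0):
    """Pooled ratio of major hurricanes to tropical storms after bias corrections.

    extra_majors_per_year is a flat addition to the major count (e.g. +1 before 1910).
    """
    total_storms = sum(s + landsea_upper_bias(y) for y, s in zip(years, storms))
    total_majors = sum(m + extra_majors_per_year for m in majors)
    return total_majors / total_storms


# Hypothetical decade: 8 reported storms and 1 major hurricane every year.
years = list(range(1900, 1910))
storms = [8] * 10
majors = [1] * 10
print(corrected_major_ratio(years, storms, majors))     # storms corrected to 10 per year
print(corrected_major_ratio(years, storms, majors, 1))  # plus 1 major hurricane per year
```

Comparing such a corrected ratio against the satellite-era value is what shows whether an additional major-hurricane correction is needed in a given period.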

Figure 2. Major hurricanes compared to the total number of tropical storms. The red line shows the 5-year moving average of the ratio between major hurricanes and the total number of tropical storms calculated from the best track data of the NOAA National Hurricane Center (HURDAT). The black line shows the ratio after correction of the observational bias proposed by Landsea et al. The green line shows the ratio after correction for a possible observational bias of major hurricanes before 1910.
Nyberg et al. claim that Landsea suggests 0-6 additional storms per year before 1885 and 0-4 per year before 1900, and thus that the ‘upper limit’ correction should be 6 and 4 storms, respectively. However, the range given by Landsea represents the year-to-year variation of the bias, not an uncertainty range for its average (I very much doubt that Landsea considers an average correction of zero possible).
The average ratio from 1910 to 1965, i.e. before the satellite era, is the same as after 1965 (21%). Thus there is no evidence of a significant underestimation of major hurricanes in that period. However, for the period before 1910 a correction of plus 1 major hurricane per year is needed to attain a mean ratio similar to that of the satellite period (21% major hurricanes; Fig. 2, green line). The adjusted record has a mean of just 2 major hurricanes per year between 1851 and 1940. Thus, even when a high observation bias is taken into account, the Nyberg et al. (2007) reconstruction overestimates major hurricane frequency before 1940 by at least 50%. Observational uncertainties therefore do not explain the mismatch between reconstruction and observations, as the authors suppose.
Even with an extreme correction of plus 5 TCs per year before 1900, the corrected record does not overlap the error bars given by Nyberg et al. (2007). Or, looking at it the other way round, to obtain the major hurricane frequency shown by the reconstruction, the observational bias prior to 1900 would have to be of the order of 10 tropical storms per year on average, which is not very likely.
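The arithmetic behind the last two paragraphs can be checked in a few lines. All inputs are numbers quoted in the text. The second calculation is only a rough cross-check of the "order of 10 storms" statement: it uses the 1850-1944 record average (fewer than 1.5 major hurricanes per year) as a stand-in for the pre-1900 record, so it is not the exact calculation behind that figure.

```python
RATIO = 0.21         # major hurricanes / tropical storms in the satellite era (from the text)
RECONSTRUCTED = 3.3  # Nyberg et al. mean major hurricanes per year before 1944 (from the text)
ADJUSTED = 2.0       # bias-adjusted record, 1851-1940 (from the text)
RAW = 1.5            # uncorrected record, 1850-1944, is below this (from the text)


def relative_overestimate(reconstructed, observed):
    """Fraction by which the reconstruction exceeds the observed value."""
    return reconstructed / observed - 1.0


def storms_needed(extra_majors, ratio=RATIO):
    """Extra tropical storms per year needed to add `extra_majors` at a fixed major fraction."""
    return extra_majors / ratio


over = relative_overestimate(RECONSTRUCTED, ADJUSTED)
# Using 1.5 (an upper bound on the raw record) makes this a lower bound on the storms needed.
extra = storms_needed(RECONSTRUCTED - RAW)
print(f"overestimate: {over:.0%}, missing storms needed: {extra:.1f} per year")
```

The first number (about 65%) is consistent with "at least 50%", and the second (roughly 8.6 storms per year) is of the same order as the figure given above.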
In summary, there are serious doubts about the representativeness of the proxies used by Nyberg et al., and there is a clear mismatch of reconstruction and earlier observation. Therefore the conclusions drawn in that paper are on very weak ground.
So what can we conclude from this discussion? Paleotempestology is a brand new field of study, and there is undoubtedly a long way to go before reconstructions of past extreme events (like hurricanes) will be anything more than suggestive. However, there is a lot more data out there waiting to be collected and analysed.
References:
Holland, G.J., and P.J. Webster (2007): Heightened tropical cyclone activity in the North Atlantic: natural variability or climate trend? Philos. Trans. R. Soc. Ser. A, 365, 2695– 2716, doi:10.1098/rsta.2007.2083.
Hoyos, C.D., P.A. Agudelo, P.J. Webster, & J.A. Curry (2006): Deconvolution of the Factors Contributing to the Increase in Global Hurricane Intensity. Science, 312, 94-97.
Landsea, C. W. (2007): Counting Atlantic Tropical Cyclones Back to 1900. Eos Trans. AGU, 88(18), 197-208.
Landsea, C. W., C. Anderson, N. Charles, G. Clark, J. Dunion, J. Fernandez-Partagas, P. Hungerford, C. Neumann, and M. Zimmer (2004): The Atlantic hurricane database re-analysis project: Documentation for the 1851–1910 alterations and additions to the HURDAT database, in Hurricanes and Typhoons: Past, Present and Future, edited by R. J. Murnane and K.-B. Liu, pp. 177–221, Columbia Univ. Press, New York.
Neu, U. (2008): Is recent hurricane activity normal? Nature, 451, E5 (21 February 2008)
Nyberg, J., B.A. Malmgren, A. Winter, M.R. Jury, K.H. Kilbourne & T.M. Quinn (2007): Low Atlantic hurricane activity in the 1970s and 1980s compared to the past 270 years. Nature, 447, 698-701.
Nyberg, J., B.A. Malmgren, A. Winter, M.R. Jury, K.H. Kilbourne & T.M. Quinn (2008): Reply to ‘Is recent hurricane activity normal?’ Nature, 451, E6 (21 February 2008)
Tropical cyclone history - part I: How reliable are past hurricane records?
When discussing the influence of anthropogenic global warming on hurricane or tropical cyclone (TC) frequency and intensity, it is important to examine observed past trends. As with all climate variables, the hurricane record becomes increasingly uncertain as we go back in time. However, the hurricane record has some peculiarities. First, hurricanes are highly confined structures, so you have to be in the right place at the right time to observe them. Secondly, hurricanes spend most of their life over the open ocean, i.e. in regions where there are very few people and no fixed observations. This means that the reliability of the long-term hurricane record depends on who was measuring them, and how, at any given time. The introduction of new observation methods, for example, might have altered the quality of the record considerably. But how much? This crucial question has been widely discussed in the recent scientific literature (e.g. Chang and Guo 2007, Holland and Webster 2007, Kossin et al. 2007, Landsea 2007, Mann et al. 2007). Where do we stand at the moment? This post concentrates on the North Atlantic, which has the longest record.
The official Atlantic hurricane record provided by the U.S. National Hurricane Center (HURDAT) is the reference database for most studies and contains all observed TCs, with their individual tracks and intensities. The record has been extended back to 1851, and the earlier period (up to 1914) has been re-analysed in recent years. Reanalysis work continues, and updates and corrections are reported regularly.
This record contains two important abrupt inhomogeneities. The introduction of aircraft reconnaissance flights in 1944 and the launch of the first geostationary satellite, ATS-1, in December 1966 mark two major improvements in measurement facilities and thus in the observational coverage of the area under examination. Landsea (2007) claims a third one in 2002, since new satellite instruments (the Advanced Microwave Sounding Unit and the QuikSCAT scatterometer) have led to the retrospective detection of additional tropical cyclones in the last few years. Some also argue for placing the start of the satellite era later, in the mid-1970s, based on the launch of the GOES satellites. Furthermore, ship track patterns changed after 1914 with the opening of the Panama Canal, and during the two world wars.
In addition, there have been a number of gradual observational improvements over time, e.g. the increasing quality of satellite images or, in earlier times, the increasing number of ship tracks and the growing population density along the coastlines, both of which raise the probability that a TC would be observed. And last but not least, there might be inhomogeneities due to the subjective component in the classification of tropical storms (the so-called Dvorak method, Dvorak 1984), which might lead to systematic differences between forecasters. However, a homogeneous reanalysis of the last 23 years has shown that this subjectivity and the improved observations have not led to a noticeable alteration of the long-term trend for Atlantic storms (Kossin et al. 2007).
Studies of climate change impacts on hurricanes generally focus on two key quantities, the frequency and the intensity of storms. There is no reason to assume that both quantities will change in the same way. In discussions of past activity, frequency is described by the number of tropical storms that occur annually in each basin. The maximum intensity (or even more complex metrics such as the Power Dissipation Index, which integrates intensity over space and time; Emanuel 2005) is harder to measure, because it requires detailed information about the storm along its track over its entire lifetime. Therefore most discussions of historical trends have focused on tropical storm frequency or, in some cases, the number of intense storms (e.g. the number of major hurricanes).
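In its simplified form (Emanuel 2005), the Power Dissipation Index is the integral of the cube of the maximum sustained wind over a storm's lifetime, summed over all storms in a season. A minimal sketch from 6-hourly best-track fixes, assuming winds in knots and a simple rectangle-rule integral (whether weak early and late fixes are included is a choice; here all fixes are summed), shows why the metric needs intensity along the whole track: because of the cubic dependence, a storm with twice the wind contributes eight times as much. The wind values are hypothetical.

```python
KT_TO_MS = 0.514444    # metres per second in one knot
DT_SECONDS = 6 * 3600  # best-track fixes are 6 hours apart


def storm_pdi(winds_kt):
    """Rectangle-rule approximation of the integral of V_max**3 dt over one storm (m^3 s^-2)."""
    return sum((w * KT_TO_MS) ** 3 * DT_SECONDS for w in winds_kt)


def annual_pdi(season):
    """PDI for one season: the sum over all storms in that season."""
    return sum(storm_pdi(winds) for winds in season)


# Hypothetical 6-hourly maximum winds (kt) for two storms.
weak = [35, 45, 50, 45, 35]
strong = [70, 90, 100, 90, 70]  # exactly twice the winds of `weak`
print(storm_pdi(strong) / storm_pdi(weak))  # 2**3 = 8, up to rounding
```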
Thus the key question is: how many storms did we miss in the past? Recently there have been a number of attempts to estimate this ‘undercount bias’ in the tropical storm record. These attempts have included:
- reconstruction of the observational bias by relating past observation density (e.g. ship tracks) to modern storm tracks
- using the relation between total TC number and better known subsets of the TC record (e.g. landfalling storms)
- using relationships of known underlying variables (e.g. relevant climate indices) to annual TC numbers to create a ‘predicted’ TC record, and compare it to the observed record.
All these approaches share a common caveat, namely the assumption that the relationships they rely upon are constant over time. The validity of this assumption therefore has to be examined in any study using such approaches. Let us consider some recent studies of this kind:
(1) Landsea [2007] performed a simple analysis to estimate the observational bias for the time from 1900 until the beginning of the satellite period in 1966. He examined the percentage of tropical cyclones that struck land (PTL) and noticed a considerable difference between the periods 1900-1965 (pre-satellite period, PTL=75%) and 1966-2006 (PTL=59%). He suggests that this difference indicates an underestimation of about 2 tropical cyclones per year before 1965.
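The reasoning behind the PTL estimate can be made explicit with a little arithmetic. The sketch below is my reading of the argument, not Landsea's code: it assumes every landfalling storm was recorded (so the landfalling count is complete) and that the true landfall fraction equals the satellite-era value; the observed storm count of 10 per year is hypothetical.

```python
def ptl_undercount(observed_storms, ptl_observed, ptl_reference):
    """Storms missed per year implied by the PTL argument.

    Assumes the landfalling count ptl_observed * observed_storms is complete and that the
    true landfall fraction equals ptl_reference (the satellite-era value).
    """
    landfalling = ptl_observed * observed_storms
    true_total = landfalling / ptl_reference
    return true_total - observed_storms


# Hypothetical: 10 recorded storms per year, PTL 75% versus 59% in the satellite era.
print(round(ptl_undercount(10, 0.75, 0.59), 2))
# The same formula with the 1851-1885 value of 61% implies almost no undercount.
print(round(ptl_undercount(10, 0.61, 0.59), 2))
```

The second line illustrates the inconsistency that the following paragraph points out: applied to the earliest period, the same proxy implies a smaller undercount than for 1900-1965, although observations were sparser.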
Unfortunately, Landsea does not discuss the evolution of PTL before 1900 (left side of the red dashed line in Figure 1). If PTL really is a proxy for underreporting due to decreasing observation density, PTL should further increase before 1900. However, there is a decrease. The period 1851-1899 has an average PTL of 67%, the period 1851-1885 even has an average PTL of 61%, which is not significantly different from the satellite period after 1966 (59%).

Figure 1. Percentage of all reported tropical storms, subtropical storms, and hurricanes that struck land 1851-2006. Extension of Fig. 2b in Landsea [2007].
Therefore it is questionable whether PTL really is a reliable proxy for underreporting. One might argue, as Landsea implicitly does, that after 1900 the population density on the coasts and islands was high enough to catch all tropical cyclones, and that underreporting is only due to sparse shipping tracks, while before 1900 some tropical cyclones that struck land were also missed. However, before 1900 not only population density but also shipping track density was lower, and therefore PTL should probably be at about the same level as after 1900, not significantly lower. In addition, Landsea contradicts that argument himself by stating that even in 2005 a retrospective analysis revealed a tropical cyclone that had made landfall in a sparsely populated area and was therefore not initially included as a landfalling storm.
Moreover, another study (Holland 2007) has shown that the natural variability of TC numbers differs between regions of the tropical Atlantic. Therefore, the proportion of TCs over the open sea also varies naturally, altering the landfall proportion for reasons unrelated to observation bias. Thus PTL does not seem to be a good proxy for observational biases. The study showed that the decrease in PTL in the 1960s is mainly due to a decrease in TC numbers in the Caribbean and the Gulf of Mexico. Since these regions are well observed by dense ship tracks and a number of islands, this decrease is very unlikely to be mainly an observational bias.
(2) Sticking to ship tracks, Chang and Guo (2007) performed a different type of analysis: They compared the ship tracks of the years before the satellite era with TC tracks of recent years. For example, they took the ship tracks of the year 1917 and overlaid the TC tracks of 1999 and determined how many of the 1999 tropical cyclones would have been observed if the ships had navigated as in 1917.
By comparing the years before the satellite era to all the ‘satellite’ years (after 1965) they obtained statistics for how many TCs would likely have been missed in earlier years if the distribution of TC tracks had been similar to that during the satellite period. In this way they estimated a TC undercount of about 2 per year in the period 1903-1914, 1-2 per year 1915-1925 and of less than 1 per year from 1925-1965.
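The overlay idea can be sketched with a much simplified detection rule, which is my assumption and not Chang and Guo's actual criterion: a storm counts as observed if any ship report from the historical year falls in the same 2.5° grid cell on the same day of the year as one of the storm's fixes from a modern season. Averaging the misses over many modern seasons gives the expected undercount for that historical shipping pattern. All tracks and ship reports below are hypothetical.

```python
def cell(lat, lon, size=2.5):
    """Index of the lat/lon grid cell (size in degrees) containing a point."""
    return (int(lat // size), int(lon // size))


def detected(track, ship_obs):
    """track: list of (day_of_year, lat, lon); ship_obs: set of (day_of_year, cell)."""
    return any((day, cell(lat, lon)) in ship_obs for day, lat, lon in track)


def expected_missed(past_ship_obs, modern_seasons):
    """Average number of modern storms per season the past shipping pattern would miss."""
    missed = [sum(not detected(t, past_ship_obs) for t in season) for season in modern_seasons]
    return sum(missed) / len(missed)


# Hypothetical ship reports from one historical year.
ship_obs = {(200, cell(15, -60)), (201, cell(25, -80))}

# Hypothetical modern seasons (lists of storm tracks).
season_a = [
    [(200, 15.5, -59), (201, 17, -62)],  # crosses a ship on day 200: detected
    [(230, 30, -40)],                    # no ship nearby: missed
]
season_b = [
    [(201, 25.2, -79)],                  # crosses a ship on day 201: detected
]

print(expected_missed(ship_obs, [season_a, season_b]))
```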
The adequacy of this estimate depends on several assumptions: a) that the distribution of hurricane tracks was about the same in the periods before the satellite era as during it. As we have seen (Holland 2007), there are shifts in the regional distribution over the 20th century. This could bias the estimate of underreporting in either direction; b) that all landfalling storms have been detected correctly. It is likely that some landfalling storms (or their true strength) were missed, which would bias the estimate artificially low; c) that ships did not avoid the storms (e.g. on the basis of forecasts). If they did, the undercount estimate might be too conservative; d) that ships measured wind correctly. Because this error is random, a significant systematic bias of the undercount estimate is unlikely; and e) that all ship tracks are recorded in the database used. There may have been other, unrecorded ship tracks that detected a storm missed by the known ship tracks. This would lead to an exaggerated undercount estimate. On balance, these assumptions probably lead to a slight underestimate of the undercount.
(3) In a third study (Mann et al. 2007; in full disclosure, I was a co-author of this paper), an alternative approach was used employing the statistical relationship between Atlantic TC numbers and three climate variables influencing Atlantic TC activity (1. August-October sea surface temperatures over the main development region (“MDR”); 2. the El Niño/Southern Oscillation, and 3. the North Atlantic Oscillation) during the modern period of reconnaissance flights and satellite observations (1944-2006). This relationship was then used to predict TC numbers for the period 1870-1943 as would be expected from the behavior of the three climate variables used over that period. These estimates were then compared to the observed TC record, the difference providing an estimate for the underreporting. The results yielded an undercount before 1944 of 1.2 TCs per year (best estimate), with a range of 0.5-2 TC per year.
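The logic of this approach (train on the well-observed period, predict the earlier one, and read the undercount from the mean difference) can be sketched with synthetic data. Ordinary least squares stands in here for whatever regression form the paper actually used, and the three climate indices are made-up periodic series, not real MDR SST, ENSO or NAO data. The early "observed" record is constructed to miss 1.2 storms per year, so the sketch simply checks that the procedure recovers that number.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def indices(year):
    """Hypothetical stand-ins for MDR SST, ENSO and NAO (synthetic, not real data)."""
    t = year - 1870
    return (math.sin(0.3 * t), math.cos(0.7 * t), math.sin(1.1 * t + 0.5))


def true_count(year):
    sst, enso, nao = indices(year)
    return 10 + 3 * sst - 2 * enso + nao


train_years = range(1944, 2007)  # reconnaissance and satellite era
early_years = range(1870, 1944)  # period to be checked for undercount

coef = fit_ols([indices(y) for y in train_years], [true_count(y) for y in train_years])

# Synthetic early record that misses 1.2 storms per year on average.
observed_early = [true_count(y) - 1.2 for y in early_years]
predicted_early = predict(coef, [indices(y) for y in early_years])
undercount = sum(p - o for p, o in zip(predicted_early, observed_early)) / len(observed_early)
print(round(undercount, 3))
```

With real data the residuals are large, so the mean difference comes with an uncertainty range (0.5-2 TCs per year in the study), which this noise-free sketch does not attempt to reproduce.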
This analysis also relies on several assumptions. Namely, that a) the underlying climate variables do not contain artificial trends or other inhomogeneities that might bias the results. That the results were insensitive to using different alternative SST datasets, or switching the role of training period and prediction period (i.e. training on 1870-1943 and predicting for 1944-2006) was taken, however, as evidence against this being a significant issue. b) that there are no long-term trends in other climate variables not included in the statistical model, but that do influence TC numbers (e.g. wind shear or vertical stability) or might influence the relationships between TC numbers and the variables that are used. Although such an influence cannot be excluded, cross-checks that were performed such as statistical validation and switching the order of training and prediction intervals, seem to argue against this being a problem.
In summary, according to current knowledge, the best estimate for the underreporting bias in the hurricane record seems to be about one tropical cyclone per year on average over the period 1920-1965, and between one and three tropical cyclones per year before 1920. With only a few years of data available, it is as yet difficult to estimate meaningfully the influence of the QuikSCAT analyses after 2002 discussed by Landsea.
References:
Chang, E. K. M., and Y. Guo (2007): Is the number of North Atlantic tropical cyclones significantly underestimated prior to the availability of satellite observations? Geophys. Res. Lett., 34, L14801, doi:10.1029/2007GL030169.
Holland, G. (2007): Misuse of landfall as a proxy for Atlantic tropical cyclone activity. Eos Trans. AGU, 88, 349.
Holland, G.J., and P.J. Webster (2007): Heightened tropical cyclone activity in the North Atlantic: natural variability or climate trend? Philos. Trans. R. Soc. Ser. A, 365, 2695– 2716, doi:10.1098/rsta.2007.2083.
Kossin, J. P., K. R. Knapp, D. J. Vimont, R. J. Murnane, B. A. Harper (2007): A globally consistent reanalysis of hurricane variability and trends. Geophys. Res. Lett., 34, L04815, doi:10.1029/2006GL028836.
Landsea, C. W. (2007): Counting Atlantic Tropical Cyclones Back to 1900. Eos Trans. AGU, 88(18), 197-208.
Mann, M.E., T.A. Sabbatelli, U. Neu (2007): Evidence for a modest undercount bias in early historical Atlantic tropical cyclone counts. Geophys. Res. Lett., 34, L22707, doi:10.1029/2007GL031781.
Antarctica is Cold? Yeah, We Knew That
Despite the recent announcement that the discharge from some Antarctic glaciers is accelerating, we often hear people remarking that parts of Antarctica are getting colder, and indeed the ice pack in the Southern Ocean around Antarctica has actually been getting bigger. Doesn’t this contradict the calculations that greenhouse gases are warming the globe? Not at all, because a cold Antarctica is just what calculations predict… and have predicted for the past quarter century.
It’s not just that Antarctica is covered with a gazillion tons of ice, although that certainly helps keep it cold. The ocean also plays a role, which is doubly important because of the way it has delayed the world’s recognition of global warming.
When the first rudimentary models of climate change were developed in the early 1970s, some modelers pointed out that as the increase of greenhouse gases added heat to the atmosphere, much of the energy would be absorbed into the upper layer of the oceans. While the water was warming up, the world’s perception of climate change would be delayed. Up to this point most calculations had started with a doubled CO2 level and figured out how the world’s temperature would look in equilibrium. But in the real world, when the rising level of gas reached that point the system would still be a long way from equilibrium. “We may not be given a warning until the CO2 loading is such that an appreciable climate change is inevitable,” a National Academy of Sciences panel warned in 1979.(1)
Modelers took a closer look and noticed some complications. As greenhouse gases increase, the heat seeps gradually deeper and deeper into the oceans. But when larger volumes of water are brought into play, they bring a larger heat capacity. Thus as the years passed, the atmospheric warming would increasingly lag behind what would happen if there were no oceans. In 1980 a New York University group reported that “the influence of deep sea thermal storage could delay the full value of temperature increment predicted by equilibrium models by 10 to 20 years” just between 1980 and 2000 A.D. (2)
The delay would not be the same everywhere. After all, the Southern Hemisphere is mostly ocean, whereas land occupies a good part of the Northern Hemisphere. A model constructed by Stephen Schneider and S.L. Thompson, highly simplified in modern terms but sophisticated for its time, suggested that the Southern Hemisphere would experience delays decades longer than the Northern. Schneider and Thompson warned that if people compared observations with what would be expected from a simple equilibrium model, “we may still be misled… in the decade A.D. 2000-2010.” (3)
The pioneer climate modelers Kirk Bryan and Syukuro Manabe took up the question with a more detailed model that revealed an additional effect. In the Southern Ocean around Antarctica the mixing of water went deeper than in Northern waters, so larger volumes of water were brought into play earlier. In their model, around Antarctica “there is no warming at the sea surface, and even a slight cooling over the 50-year duration of the experiment.” (4) In the twenty years since, computer models have improved by orders of magnitude, but they continue to show that Antarctica cannot be expected to warm up very significantly until long after the rest of the world’s climate has changed radically.
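The heat-capacity argument can be illustrated with the simplest possible energy-balance model: a single ocean mixed layer of depth h, with heat capacity C = ρ c_p h, responding to a step forcing F with feedback parameter λ, so that C dT/dt = F - λT. This is not the Hoffert et al., Schneider and Thompson, or Bryan and Manabe model. The forcing (3.7 W/m², a commonly quoted value for doubled CO2), the feedback value and the two mixing depths are illustrative assumptions. A deeper mixed layer, as around Antarctica, stretches the response time τ = C/λ (about 5 years for 50 m and about 54 years for 500 m with these values) and delays the warming.

```python
import math

RHO_CP = 1025 * 3990               # seawater density (kg/m^3) times specific heat (J/kg/K)
SECONDS_PER_YEAR = 365.25 * 86400


def warming(t_years, depth_m, forcing=3.7, feedback=1.2):
    """Warming (K) of a one-box mixed layer t_years after a step forcing is switched on.

    Solves C dT/dt = F - lambda*T with T(0) = 0:
    T(t) = (F / lambda) * (1 - exp(-t / tau)), tau = C / lambda.
    forcing in W/m^2 and feedback (lambda) in W/m^2/K are illustrative values.
    """
    heat_capacity = RHO_CP * depth_m  # J/m^2/K
    tau_years = heat_capacity / feedback / SECONDS_PER_YEAR
    return forcing / feedback * (1 - math.exp(-t_years / tau_years))


for depth in (50, 500):  # shallow versus deep mixing (hypothetical depths)
    print(depth, [round(warming(t, depth), 2) for t in (10, 20, 50)])
```

Both layers eventually reach the same equilibrium warming F/λ; only the timing differs, which is the point of the lag argument.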
Bottom line: A cold Antarctica and Southern Ocean do not contradict our models of global warming. For a long time the models have predicted just that.
(1) National Academy of Sciences, Climate Research Board (1979). Carbon Dioxide and Climate: A Scientific Assessment (Jule Charney, Chair). Washington, DC: National Academy of Sciences.
(2) Martin I. Hoffert, et al. (1980) J. Geophysical Research 85: 6667-6679.
(3) Stephen H. Schneider and S.L. Thompson (1981) J. Geophysical Research 86: 3135-3147.
(4) Kirk Bryan et al. (1988). J. Physical Oceanography 18: 851-67. For the story overall see Syukuro Manabe and Ronald J. Stouffer (2007) Journal of the Meteorological Society of Japan 85B: 385-403.
A day when Hell was frozen
I was honoured to be invited to the regional conference for Norwegian journalists, which takes place annually in a small town called ‘Hell’ (try 'Hell, Norway' in Google Earth). During this conference, I was asked to take part in a panel debate on the theme: ‘Climate – how should we [the media] deal with the world’s most pressing issue?’ (my translation from Norwegian; by the way, 'Gods expedition' means 'Cargo shipment' in 'old' Norwegian dialect).
This is the first time that I have been invited to such a gathering, and probably the first time that a Norwegian journalists' conference has invited a group of people to discuss the climate issue. My impression was that the journalists were now more or less convinced by the message of the IPCC assessment reports. This can also be seen in daily press reports, where contrarians feature less now than ~5 years ago. But the public seemed to think that scientists cannot agree on the reality or cause of climate change.
I find it problematic that the perception of the climate problem within the climate research community does not match that of the general public. What I learned is that this also seems to be true for journalists: it was stated that their perception of climate change and its causes was different from that of the general public too.
The panel in which I participated consisted of a social/political scientist who had investigated how media deals with the issue of climate change and the public perception thereof, a science journalist, an AGW-skeptic, and myself. Despite the name of the place, the debate was fairly civil and well-behaved (although the AGW-skeptic compared climate scientists to mosquitoes, and brought up some ad hominem attacks on Dr. Pachauri).
The science journalist on the panel advocated reporting on issues that are based on publications in the peer-reviewed scientific literature. I wholeheartedly concur. I would also advise journalists to search extensively through the publication record of the individuals involved, and to consider their affiliations: are they from a reputable institution? It is also advisable to consider the journal in which an article is published: an article on climate published in the Journal of American Physicians and Surgeons is less likely to have been reviewed by competent experts (peers) than one published in a mainstream geophysics journal. Finally, my advice is to try to trace the argument back to its source: does it come from one of those think tanks? But I didn't get the chance to say this, as the debate was conducted by a moderator whose agenda was focused on other questions.
Short of telling the journalists to start reading physics in order to understand the issues at hand, I recommended Spencer Weart’s ‘The Discovery of Global Warming’. The book is an easy read and gives a good background to the climate sciences. It also reveals that a number of arguments still put forward by AGW-skeptics are quite old and have been answered over time. Reading it gives a sense of déjà vu regarding the counter-arguments, the worries, the politics, and the perceived urgency of the problem. I would also strongly recommend the book to AGW-skeptics.
One frustration I had with the discussion was being cut off whenever I got into the science and the details. I felt as if I were taking part in a football match where the referee and all the spectators were blind, trying to convince them that I had scored a goal. The problem is that people without scientific training often find it hard to judge who is right and who is wrong. It seems that communication skills are more important than scientific skills for convincing the general public. Scientists are usually not renowned for their ability to explain complicated and technical matters, and rather tend to shy away from doing so.
I’d suggest that journalists try to attend the annual conferences of, for example, the European (EMS) and American (AMS) meteorological societies, to learn what is happening in research and to mingle with scientists and meteorologists; these conferences also have a lot to offer the media (e.g. media sessions). Just as journalists go to the Olympics, would it not be natural for them to attend these conferences? But I missed the opportunity to make this suggestion.
Hell seems to be fairly dead on a Sunday afternoon. I almost caught a cold during the freezing wait for the train, although the temperature was barely -3°C. This January ranked as the third warmest in Oslo, and I have started to acclimatise to all these mild winters (the mountain regions, however, have received an unusually large amount of snow). Our minister of finance was due to attend the meeting to talk about getting grief, but she didn't make it to Hell because of a snowstorm and chaos at the airport (large amounts of wet snow due to mild winter conditions).
The IPCC model simulation archive
In the lead-up to the 4th Assessment Report, all the main climate modelling groups (17 of them at last count) made a series of coordinated simulations for the 20th Century and various scenarios for the future. All of this output is publicly available in the PCMDI IPCC AR4 archive (now officially called the CMIP3 archive, in recognition of the two previous, though less comprehensive, collections). We've mentioned this archive before in passing, but we've never really discussed what it is, how it came to be, how it is being used and how it is (or should be) radically transforming the comparisons of model output and observational data.
First off, it's important to note that this effort was not organised by the IPCC itself. Instead, it was coordinated by the Working Group on Coupled Modelling (WGCM), an unpaid committee that is part of an alphabet soup of committees, nominally run by the WMO, that try to coordinate all aspects of climate-related research. In the lead-up to AR4, WGCM took on the task of deciding what the key experiments would be, what would be requested from the modelling groups and how the data archive would be organised. This was highly non-trivial, and adjustments to the data requirements were still being made right up until the last minute. While this may seem arcane, or even boring, the point I'd like to make is that just 'making data available' is the least of the problems in making data useful. There was a good summary of the process in the Bulletin of the American Meteorological Society last month.
Previous efforts to coordinate model simulations had come up against two main barriers: getting the modelling groups to participate and making sure enough data was saved that useful work could be done.
Modelling groups tend to work in cycles. That is, there will be a period of a few years of developing a new model, then a year or two of analysis and use of that model, until there is enough momentum and enough new ideas to upgrade the model and start a new round of development. These cycles can be driven by purchasing policies for new computers, staff turnover, general enthusiasm, developmental delays etc. and until recently were unique to each modelling group. When new initiatives are announced (and they come roughly once every six months), the decision of a modelling group to participate depends on where they are in their cycle. If they are in the middle of the development phase, they will likely not want to use their previous model (because the new one will almost certainly be better), but they might not be able to use the new one either because it just isn't ready. These phasing issues definitely impacted earlier attempts to produce model output archives.
What was different this time round is that the IPCC timetable has, after almost 20 years, managed to synchronise development cycles such that, with only a couple of notable exceptions, most groups were ready with their new models early in 2004 - which is when these simulations needed to start if the analysis was going to be available for the AR4 report being written in 2005/6. (It's interesting to compare this with nonlinear phase synchronisation in, for instance, fireflies).
The other big change this time around was the amount of data requested. The diagnostics in previous archives had been relatively sparse: the main atmospheric variables (temperature, precipitation, winds etc.) but not much else, and generally only at monthly resolution. This had limited the usefulness of the previous archives, because if something interesting was seen, it was almost impossible to diagnose why it had happened without access to more information. This time, the diagnostic requests for the atmosphere, ocean, land and ice components were much more extensive, and a significant amount of high-frequency data (i.e. 6-hourly fields) was requested as well. For the first time, this meant that outsiders could really look at the 'weather' regimes of the climate models.
The work involved in these experiments was significant and unfunded. At GISS, the simulations took about a year to do. That includes a few partial do-overs to fix small problems (like an inadvertent mis-specification of the ozone depletion trend), the processing of the data, the transfer to PCMDI and the ongoing checking to make sure that the data was what it was supposed to be. The amount of data was so large - about a dozen different experiments, a few ensemble members for most experiments, large amounts of high-frequency data - that transferring it to PCMDI over the internet would have taken years. Thus, all the data was shipped on terabyte hard drives.
Once the data was available from all the modelling groups (all in consistent netCDF files with standardised names and formatting), a few groups were given some seed money from NSF/NOAA/NASA to get cracking on various important comparisons. However, the number of people who registered to use the data (more than 1000) far exceeded the number of people who were actually being paid to look at it. Although some of the people looking at the data were from the modelling groups, the vast majority were from the wider academic community, and for many it was the first time that they'd had direct access to raw GCM output.
With that influx of new talent, many innovative diagnostics were examined, many of which hadn't been looked at by the modelling groups themselves, even internally. It is possibly under-appreciated that the number of possible model-data comparisons far exceeds the capacity of any one modelling center to examine them.
The main advantage of the database is the ability to address a number of different kinds of uncertainty: not everything, of course, but certainly more than was possible before. Specifically, it helps with the uncertainty in distinguishing forced from unforced variability and with the uncertainty due to model imperfections.
When comparing climate models to reality, the first problem to confront is the 'weather', defined loosely as the unforced variability (which exists on multiple timescales). Any particular realisation of a climate model simulation, say of the 20th Century, will have a different sequence of weather; that is, the weather pattern on Jan 31, 1967 in one realisation will be uncorrelated with the weather pattern on Jan 31, 1967 in another realisation, even though each run has the same climate forcing (increases in greenhouse gases, volcanoes etc.). There is no expectation that the weather in any one model will be correlated with that in the real world either. So any comparison of climate models and data needs to estimate how much of the change is due to the weather and how much is related to the forcing. In the real world that is difficult, because there is certainly a degree of unforced variability even at decadal scales (and possibly longer). However, in the model archive it is relatively easy to distinguish the two.
The standard trick is to look at the ensemble of model runs. If each run has different, uncorrelated weather, then averaging over the different simulations (the ensemble mean) gives an estimate of the underlying forced change. Normally this is done for one single model, and for metrics like the global mean temperature only a few ensemble members are needed to reduce the noise. For other metrics, like regional diagnostics, more ensemble members are required. There is another standard way to reduce weather noise, and that is to average over time, or over specific events. If you are interested in the impact of volcanic eruptions, running the same eruption 20 times with different starting points is basically equivalent to collecting together the responses to 20 different eruptions. The same can be done with the response to El Niño, for instance.
With the new archive, though, people have tried something new: averaging the results of all the different models. This is termed a meta-ensemble, and at first thought it doesn't seem very sensible. Unlike the weather noise, the differences between models are not drawn from a nicely behaved distribution, the models are not independent in any solid statistical sense, and no-one really thinks they are all equally valid. Thus many of the prerequisites for making this mathematically sound are missing or, at best, unquantified. Expectations of a meta-ensemble would therefore be low. But, and this is a curious thing, it turns out that the meta-ensemble of all the IPCC simulations actually outperforms any single model when compared to the real world. That implies that at least some part of the model differences is in fact random and can be cancelled out. Of course, many systematic problems remain even in a meta-ensemble.
There are lots of ongoing attempts to refine this. What happens if you try to exclude models that don't pass an initial screening? Can you weight the models in an optimal way to improve forecasts? Unfortunately, despite a few successful attempts, there doesn't seem to be any universal way to do this. More research on this question is definitely needed.
Note, however, that the ensemble mean or meta-ensemble mean only gives a measure of the central tendency or forced component. It does not help answer the question of whether the models are consistent with any observed change. For that, one needs to look at the spread of the model simulations, noting that each simulation is a potential realisation of the underlying assumptions in the models. Do not, for instance, confuse the uncertainty in the estimate of the ensemble mean with the spread!
Particularly important simulations for model-data comparisons are the forced coupled-model runs for the 20th Century, and 'AMIP'-style runs for the late 20th Century. 'AMIP' runs are atmospheric model runs that impose the observed sea surface temperatures instead of calculating them with an ocean model, optionally using other forcings as well. They are particularly useful if it matters that you get the timing and amplitude of El Niño correct in a comparison. No longer need one ask 'what do the models say?': you can ask them directly.
The usefulness of any comparison depends on whether it really provides a constraint on the models, and there are plenty of good examples of this. What is ideal are diagnostics that are robust in the models, not too affected by weather, and can be estimated in the real world, e.g. Ben Santer's paper on tropospheric trends, or the discussion we had on global dimming trends; the AR4 report is full of more examples. What isn't useful are short-period and/or limited-area diagnostics for which the ensemble spread is enormous.
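The ensemble-mean trick can be sketched with purely synthetic data. The sketch assumes that each hypothetical run is the same forced signal (here a simple linear trend) plus independent Gaussian 'weather' noise; the trend, noise amplitude and ensemble sizes are illustrative choices, not values from any archived model. Averaging N runs leaves the forced part intact while the noise shrinks roughly as 1/sqrt(N). Compositing 20 volcanic eruptions works the same way, with events playing the role of runs.

```python
import math
import random
import statistics

def make_run(forced, noise_sd, rng):
    """One hypothetical realisation: shared forced signal plus independent weather noise."""
    return [f + rng.gauss(0.0, noise_sd) for f in forced]

def ensemble_mean(runs):
    """Average across runs at each time step."""
    return [statistics.fmean(values) for values in zip(*runs)]

def rms_error(estimate, truth):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(statistics.fmean((e - t) ** 2 for e, t in zip(estimate, truth)))

rng = random.Random(42)
forced = [0.007 * year for year in range(100)]  # assumed forced warming, 0.7 K per century
noise_sd = 0.15                                 # assumed interannual weather noise (K)

for n_members in (1, 4, 16):
    runs = [make_run(forced, noise_sd, rng) for _ in range(n_members)]
    err = rms_error(ensemble_mean(runs), forced)
    print(f"{n_members:2d} members: RMS error vs forced signal = {err:.3f} K")
```

Quadrupling the ensemble size roughly halves the residual weather noise in the mean, which is why only a handful of members suffice for a global mean but regional diagnostics need more.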
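Why a multi-model mean can beat every individual model is easiest to see in a toy setting. The sketch below assumes, purely for illustration, that each hypothetical model's error is the sum of a bias shared by all models and a model-specific random part; the 17 models, field size and error magnitudes are invented. Averaging cancels much of the random part but none of the shared bias, mirroring the point that systematic problems survive in the meta-ensemble.

```python
import math
import random
import statistics

def rmse(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(statistics.fmean((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(1)
n_points, n_models = 500, 17
truth = [rng.gauss(15.0, 10.0) for _ in range(n_points)]      # synthetic "observed" field
shared_bias = [rng.gauss(0.0, 0.5) for _ in range(n_points)]  # error common to all models

models = []
for _ in range(n_models):
    own_error = [rng.gauss(0.0, 1.5) for _ in range(n_points)]  # model-specific error
    models.append([t + s + e for t, s, e in zip(truth, shared_bias, own_error)])

multi_model_mean = [statistics.fmean(vals) for vals in zip(*models)]
single_scores = [rmse(m, truth) for m in models]

print(f"best single model RMSE : {min(single_scores):.2f}")
print(f"multi-model mean RMSE  : {rmse(multi_model_mean, truth):.2f}")
print(f"shared-bias floor      : {rmse(shared_bias, [0.0] * n_points):.2f}")
```

If the model-specific errors were not at all random (e.g. all models making the same mistake), the mean would gain nothing; the real archive sits somewhere in between.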
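One simple weighting scheme, sketched here as an assumption and not as any method used for the archive or the AR4, is to weight each model by the inverse of its mean-squared error against observations over a calibration period and then apply those weights to a projection. The function names and numbers are hypothetical. Whether such weights actually improve forecasts out of sample is exactly the open question above.

```python
def inverse_mse_weights(hindcasts, observed):
    """Hypothetical skill weights: w_i proportional to 1 / MSE_i, normalised to sum to 1.

    Assumes no model matches the observations exactly (every MSE > 0).
    """
    raw = []
    for series in hindcasts:
        mse = sum((h - o) ** 2 for h, o in zip(series, observed)) / len(observed)
        raw.append(1.0 / mse)
    total = sum(raw)
    return [r / total for r in raw]

def weighted_projection(projections, weights):
    """Weighted average of each model's projected value."""
    return sum(w * p for w, p in zip(weights, projections))

observed = [0.0, 0.1, 0.2, 0.3]
hindcasts = [
    [0.0, 0.1, 0.2, 0.4],  # hypothetical model A: close to observations
    [0.2, 0.3, 0.4, 0.5],  # hypothetical model B: biased warm
]
weights = inverse_mse_weights(hindcasts, observed)
print([round(w, 3) for w in weights])
print(round(weighted_projection([2.0, 3.0], weights), 3))
```

Here model A gets about 94% of the weight, so the weighted projection lands close to A's value; good hindcast skill is, of course, no guarantee of good forecast skill.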
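The difference between the spread and the uncertainty of the ensemble mean can be made concrete in a few lines. The trend values and the observed trend below are invented for illustration. The standard error of the mean shrinks as 1/sqrt(N) while the spread (here the sample standard deviation across runs) does not, and it is the spread that matters when asking whether an observation is consistent with the simulations. The "within 2 spread units" test is a deliberately crude assumption.

```python
import math
import statistics

def summarise(run_trends):
    """Return ensemble mean, spread (sample std dev) and standard error of the mean."""
    mean = statistics.fmean(run_trends)
    spread = statistics.stdev(run_trends)
    std_error = spread / math.sqrt(len(run_trends))
    return mean, spread, std_error

def consistent_with_spread(observed, run_trends, k=2.0):
    """Crude check: is the observation within k spread units of the ensemble mean?"""
    mean, spread, _ = summarise(run_trends)
    return abs(observed - mean) <= k * spread

# Invented trends (K/decade) from hypothetical runs, and an invented observed trend.
run_trends = [0.12, 0.25, 0.18, 0.30, 0.08, 0.21, 0.15, 0.27]
observed = 0.10

mean, spread, std_error = summarise(run_trends)
print(f"mean={mean:.3f}, spread={spread:.3f}, std error={std_error:.3f}")
print("consistent with spread:", consistent_with_spread(observed, run_trends))
print("within 2 std errors of mean:", abs(observed - mean) <= 2 * std_error)
```

In this example the observation sits comfortably inside the spread, yet testing it against the standard error of the mean would wrongly flag it as inconsistent.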
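The warning about short-period diagnostics can be illustrated by fitting least-squares trends to synthetic series of different lengths. The sketch assumes the same forced trend in every hypothetical run plus independent white noise; the trend and noise values are illustrative. Real weather noise is autocorrelated, which makes short trends even less reliable than shown here.

```python
import random
import statistics

def ols_slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, ..., n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def trend_spread(n_years, n_runs=200, trend=0.02, noise_sd=0.15, seed=0):
    """Std dev of fitted trends (K/yr) across hypothetical runs sharing one forced trend."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_runs):
        ys = [trend * t + rng.gauss(0.0, noise_sd) for t in range(n_years)]
        slopes.append(ols_slope(ys))
    return statistics.stdev(slopes)

for years in (5, 10, 30):
    print(f"{years:2d}-year trends: spread = {trend_spread(years):.4f} K/yr")
```

With these assumptions the spread of 5-year trends is larger than the forced trend itself, so such a diagnostic places essentially no constraint on the models.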
CMIP3 2.0?
In such a large endeavor, it's inevitable that not everything is done to everyone's satisfaction and that in hindsight some opportunities were missed. The following items should therefore be read as suggestions for next time around, and not as criticisms of the organisation this time.
Initially the model output was only accessible to people who had registered and had a specific proposal to study the data. While this makes some sense in discouraging needless duplication of effort, it isn't necessary and discourages the kind of casual browsing that is useful for getting a feel for the output or spotting something unexpected. However, the archive will soon be available with no restrictions and hopefully that setup can be maintained for other archives in future.
Another issue with access is the sheer amount of data and the relative slowness of downloading it over the internet. Here some lessons could be taken from more popular high-bandwidth applications. Reducing the time to download videos or music has relied on distributed access to the data. Applications like BitTorrent can achieve download speeds far higher than direct downloads, because you end up getting data from dozens of locations at the same time, from people who have downloaded the same thing as you. Therefore the more popular an item, the quicker it is to download. There is much that could be learned from this data model.
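The piece-based idea behind this kind of distributed download can be sketched without any networking. The toy below illustrates the idea only; it is not the BitTorrent protocol, and the peers are just in-memory dictionaries. A file is split into fixed-size pieces with one published SHA-1 hash per piece. Each hypothetical peer holds only some pieces, and the client fetches different pieces from different peers in parallel, verifying each against its hash before reassembling. The speed benefit of real networks is not simulated here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PIECE_SIZE = 4  # bytes; real systems use much larger pieces

def split_pieces(data, size=PIECE_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def piece_hashes(pieces):
    """The 'manifest' a client would obtain first: one hash per piece."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]

def fetch_piece(index, peers, expected_hash):
    """Try each (hypothetical) peer holding the piece until one passes the hash check."""
    for peer in peers:
        piece = peer.get(index)
        if piece is not None and hashlib.sha1(piece).hexdigest() == expected_hash:
            return piece
    raise RuntimeError(f"no valid copy of piece {index}")

def download(hashes, peers, workers=4):
    """Fetch all pieces concurrently from whichever peers have them, then reassemble."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = pool.map(lambda i: fetch_piece(i, peers, hashes[i]), range(len(hashes)))
        return b"".join(pieces)

original = b"example model output bytes from one run"
pieces = split_pieces(original)
hashes = piece_hashes(pieces)

# Each hypothetical peer holds only part of the file; peer_c has a corrupted piece 0.
peer_a = {i: p for i, p in enumerate(pieces) if i % 2 == 0}
peer_b = {i: p for i, p in enumerate(pieces) if i % 2 == 1}
peer_c = {0: b"XXXX", **{i: p for i, p in enumerate(pieces) if i % 3 == 0 and i != 0}}

result = download(hashes, [peer_c, peer_a, peer_b])
print(result == original)
```

The per-piece hashes are what make it safe to accept data from many untrusted sources: a corrupted copy is simply rejected and fetched elsewhere.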
The other way to reduce download times is to make sure that you only download what is wanted. If you only want a time series of global mean temperatures, you shouldn't need to download the two-dimensional field and create your own averages. Thus for many purposes, automatic global, zonal-mean or vertical averaging would have saved an enormous amount of time.
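Averaging of this kind is not complicated to provide. A minimal sketch, assuming a regular latitude-longitude grid stored as nested lists (field[lat][lon]) and ignoring land masks and missing values: a zonal mean is a plain average along each latitude row, and a global mean weights each row by the cosine of its latitude, because grid cells shrink towards the poles. The grid and values are hypothetical.

```python
import math

def zonal_mean(field):
    """Average over longitude for each latitude row."""
    return [sum(row) / len(row) for row in field]

def global_mean(field, lats_deg):
    """Area-weighted global mean on a regular lat-lon grid (weights ~ cos(latitude))."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    zonal = zonal_mean(field)
    return sum(w * z for w, z in zip(weights, zonal)) / sum(weights)

# Hypothetical 3 x 4 grid: cold high latitudes, warm equator (degrees C).
lats = [-60.0, 0.0, 60.0]
field = [
    [-10.0, -12.0, -8.0, -10.0],
    [28.0, 26.0, 27.0, 27.0],
    [-5.0, -7.0, -6.0, -6.0],
]
print(zonal_mean(field))
print(round(global_mean(field, lats), 3))
```

An unweighted average of the three rows would give about 3.7 C instead of 9.5 C, overstating the contribution of the small high-latitude cells.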
Finally, the essence of the Web 2.0 movement is interactivity: consumers can also be producers. In the current CMIP3 setup, the modelling groups are the producers, but the return flow of information is rather limited. People who analyse the data have published many interesting papers (over 380 and counting), but their analyses have not been 'mainstreamed' into model development efforts. For instance, there is a great paper by Lin et al. on tropical intra-seasonal variability (such as the Madden-Julian Oscillation) in the models.
Their analysis was quite complex and would be a useful addition to the suite of diagnostics regularly tested in model development, but it is impractical to expect Dr. Lin to just redo his analysis every time the models change. A better approach would be for the archive to host the analysis scripts as well, so that they could be accessed as easily as the data. There are of course issues of citation with such an idea, but they needn't be insuperable. Similarly, how many times did different people calculate the NAO or Niño 3.4 indices in the models? Having some organised user-generated content could have saved a lot of time there.
Maybe some of these ideas (and any others readers might care to suggest) could even be tried out relatively soon…
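As an example of the kind of small, reusable script that could be hosted alongside the data, here is a sketch of a Niño 3.4 index: the SST averaged over the box 5°S to 5°N, 170°W to 120°W, expressed as an anomaly from each calendar month's mean. The grid layout (field[lat][lon], longitudes in degrees east), the use of the whole record as the base period and the absence of any smoothing are simplifying assumptions; operational definitions differ in such details. The two-year record is invented.

```python
import math

# Nino 3.4 box: 5S-5N, 170W-120W (written here as 190-240 degrees east).
NINO34_LAT = (-5.0, 5.0)
NINO34_LON = (190.0, 240.0)

def box_mean(field, lats, lons, lat_range, lon_range):
    """cos(latitude)-weighted mean of field[lat][lon] over grid points inside a box."""
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        w = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            # lon % 360 maps e.g. -170 to 190, so both longitude conventions work.
            if lon_range[0] <= lon % 360.0 <= lon_range[1]:
                total += w * field[i][j]
                weight_sum += w
    return total / weight_sum

def nino34_index(monthly_fields, lats, lons):
    """Box-mean SST minus the mean for the same calendar month over the whole record.

    Assumes the record starts in January and covers whole years.
    """
    series = [box_mean(f, lats, lons, NINO34_LAT, NINO34_LON) for f in monthly_fields]
    climatology = [sum(series[m::12]) / len(series[m::12]) for m in range(12)]
    return [value - climatology[k % 12] for k, value in enumerate(series)]

# Hypothetical two-year record on a tiny grid: a seasonal cycle plus a cool
# first year and a warm second year; points outside the box are set to 20 C.
lats = [-10.0, 0.0, 10.0]
lons = [180.0, 200.0, 220.0, 260.0]
fields = []
for k in range(24):
    anomaly = -1.0 if k < 12 else 1.0
    field = [[20.0] * len(lons) for _ in lats]
    field[1][1] = field[1][2] = 26.0 + 0.2 * (k % 12) + anomaly
    fields.append(field)

index = nino34_index(fields, lats, lons)
print([round(x, 2) for x in index[:3]], [round(x, 2) for x in index[12:15]])
```

Removing the monthly climatology strips out the seasonal cycle, leaving only the cool and warm years as anomalies of about -1 and +1 C.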
Conclusion
The analyses of the archive done so far are really only the tip of the iceberg compared to what could be done, and it is very likely that the archive will provide an invaluable resource for researchers for years to come. It is beyond question that the organisers deserve a great deal of gratitude from the community for having spearheaded this.