This Web-Project describes spatial and spatio-temporal statistical methodology for mapping CO2 globally, based on very-large-to-massive remotely sensed datasets.
The Global Carbon Cycle
The global carbon cycle describes where carbon is stored and the movement of carbon between these reservoirs. Atmospheric carbon in the form of CO2 has risen steadily since the industrial revolution (when it was about 280 ppm) to a current level of about 385 ppm; it was about 315 ppm in 1960. The rate of increase would certainly have been greater had it not been for the uptake of CO2 by the oceans and vegetation/soil (i.e., sinks). Sources of CO2 include anthropogenic emissions (e.g., burning coal, oil, or natural gas) and fires. Of the approximately 8 Gt per year that enters the atmosphere, about half is anthropogenic. About 4 Gt stays in the atmosphere, and the other 4 Gt is absorbed roughly equally by the oceans and terrestrial processes. This yearly increase of 4 Gt of atmospheric CO2 is almost certainly unsustainable in the long term.
Along with water vapor and other gases, CO2 traps the sun's energy and warms the earth's surface and lower atmosphere. This global warming will likely lead to very complex changes in our climate. The consequences could be enormous, displacing hundreds of millions of people due to melting ice caps and sea-level rise, highly variable weather events, and a scarcity of potable water due to droughts and melting glaciers.
Mitigation for and adaptation to climate change requires a deep understanding of the drivers of climate. However, atmospheric CO2's spatio-temporal cycle is largely unknown at scales where mitigation might be possible. Further, climate models require such knowledge for accurate projections into the future.
Statistical Remote Sensing
Remote sensing is an area of environmental science that is more than building an instrument and putting it in orbit. Hardware is useless without software, and an instrument on a satellite platform is scrap metal without algorithms to process the raw radiances and mapping technologies to give a noise-filtered complete map of the phenomenon being studied. Often, such a map becomes “data,” and these derived data have derived uncertainties associated with them (e.g., spatio-temporal variances and covariances). When remote sensing is put into a rigorous statistical framework, the information content in the raw data increases dramatically. Scientific inferences can be made, and information becomes knowledge. The study of global atmospheric CO2, informed by both data and scientific knowledge, is an important scientific endeavor in which spatial and spatio-temporal statistics can play an important role.
Spatial Analysis of CO2
We begin by describing the spatial-only analysis of large, global datasets of CO2. The goal of most spatial analyses is to predict the distribution of a process of interest based on a number of observations containing measurement error. Traditional spatial statistics requires inversion of the covariance matrix of the data, which requires on the order of n3 computations when there are n observations in the dataset. This is computationally feasible for n up to a few thousand, but it is not feasible when n is in the tens of thousands and above. In such large-data situations, one has to resort to dimension reduction. We do this here by making use of a Spatial Random Effects (SRE) model (Cressie and Johannesson, 2006, 2008), in which the process of interest is modeled as a linear combination of spatial basis functions plus a fine-scale-variation term. Inference using this geostatistical model, termed Fixed Rank Kriging (FRK) by Cressie and Johannesson (2006, 2008), is then feasible even for very large n (up to a million or more), as long as the number of spatial basis functions (i.e., the number of random effects) is much smaller than the number of observations.
From a noisy and incomplete dataset of CO2, complete global maps of predictions and accompanying standard errors can be obtained:
Parameter estimation or inference can be carried out from an empirical-Bayes or a fully Bayesian perspective. Cressie and Johannesson (2008) and Kang, Cressie, and Shi (2010) describe binned-method-of-moments parameter estimation, and Katzfuss and Cressie (2009) describe maximum-likelihood estimation via an EM algorithm. These two estimation methods are also described and compared in the Tutorial on Fixed Rank Kriging (FRK) of CO2 Data.
Bayesian estimation (Kang and Cressie, 2011) provides more accurate uncertainty quantification, but it is also more computationally intensive.
Spatio-Temporal Analysis of CO2
We now discuss extensions of the spatial-only statistical methods
given above to spatio-temporal models. Satellites only cover certain
tracks on the globe on any given day, and so the tracks covered on
previous days may be highly complementary. We could analyze the
data spatially in a sequence of moving windows, which was done for Arctic
carbon dioxide using a 9-day window and the SRE model. However,
the (spatio-)temporal dependence in the data can be incorporated
statistically by specifying the dynamical evolution of the spatial
random effects over time. This is done through a first-order vector
autoregressive model (VAR(1)) that is defined by a propagator matrix
and an innovations variance-covariance matrix. The resulting
spatio-temporal random effects (STRE) model (Cressie, Shi, and Kang,
2010) can then be used to exploit both spatial and temporal
dependencies in the process when predicting its distribution at
all locations in space and time.
Cressie, Shi, and Kang (2010) called inference in the STRE model Fixed Rank Filtering (FRF) when predictions are made for the current day based on data on the current day and previous days, and Fixed Rank Smoothing (FRS) when predictions are made retrospectively based on all data obtained during a time period of interest. Again, both empirical-Bayes inference and fully Bayesian inference are possible. From the empirical-Bayes perspective, binned-method-of-moments estimation of the parameters of the STRE model is described in Kang, Cressie, and Shi (2010), and EM estimation is described in Katzfuss and Cressie (2011); we refer to this as EM-FRS. Fully Bayesian inference for the STRE model is described in Katzfuss and Cressie (2012); we refer to this as FB-FRS. The prior distribution on the propagator matrix (which describes the temporal evolution of the process on the reduced space) in the latter article can be rather involved; we provide some results here based on the assumption that the propagator matrix is diagonal, but not the identity matrix. The prior here is a special case of the prior in Katzfuss and Cressie (2012), although it still results in complex spatio-temporal dependence.
The Atmospheric InfraRed Sounder (AIRS) on board NASA's Aqua satellite measures spectra from which mid-tropospheric (approximately between altitudes of 2km and 8km) CO2, at roughly 1:30 pm local time, can be retrieved using elaborate algorithms.
The resulting data is provided here and was originally obtained from the
AIRS (NASA) website. The unit of measurement is parts per million (ppm).
Figure 2: Spatio-temporal prediction of CO2 using AIRS data (top): Results from EM-FRS (left) and FB-FRS (right)
Combining Observations from Two Instruments: Spatio-Temporal Data Fusion
Up to this point, we have assumed that only one source of observations is available
(i.e., the AIRS instrument). However, more satellite instruments (e.g., the one on board the Japanese
GOSAT satellite) capable of retrieving CO2 are now in operation, and NASA's OCO-2 is under
development. It seems natural to combine the information from these two instruments to obtain better
predictions of the true CO2 field. However, the two instruments have different sensitivities to
CO2 in the atmospheric column, different measurement-error properties, and different footprints,
so making predictions based on the data observed by both instruments is not trivial. The general problem
of (spatial-only) data fusion is discussed in Nguyen et al. (2010). Nguyen et al. (2012) extends the data-fusion methodology to the spatio-temporal case and applies it to CO2.
Datasets and Matlab Code
For the spatial-only case, this .zip file
contains the Matlab functions FRK.m, EM.m (EM algorithm), variogram_estimation.m (variogram estimation of the fine-scale-variation variance and the measurement-error variance), and binest.m (binned method-of-moments estimation). The file called xco2analysis.m contains code for the analysis of the synthetic CO2 dataset (Figure 1) using these Matlab functions; click here to download the synthetic set of CO2 “measurements.” For more details, see the Tutorial on Fixed Rank Kriging (FRK) of CO2 Data.
For the spatio-temporal case, this .zip file contains the spatio-temporal functions EM_FRS.m (which implements the EM algorithm), and FRS_vxi.m (which implements the FRS procedure, for the case with heterogeneous fine-scale-variation variance). Click here to download the 16 days of AIRS CO2 data used in Figure 2.
The research was partially supported by NASA under grant NNX08AJ92G issued through the ROSES Carbon Cycle Science Program and grant NNH08ZDA001N issued through the Advanced Information Systems Technology ROSES 2008 Solicitation, and by the Office of Naval Research Grant N00014-08-1-0464.
Cressie, N., and Johannesson, G. (2006), Spatial prediction of massive datasets, in Mastering the Data Explosion in the Earth and Environmental Sciences: Proceedings of the Australian Academy of Science Elizabeth and Frederick White Conference, Canberra, Australia: Australian Academy of Science.
Cressie, N., and Johannesson, G. (2008), Fixed rank kriging for very large spatial data sets, Journal of the Royal Statistical Society, Series B, 70, 209–226.
Cressie, N., Shi, T., and Kang, E. L. (2010), Fixed rank fitering for spatio-temporal data, Journal of Computational and Graphical Statistics, 19, 724–745.
Kang, E. L., and Cressie, N. (2011), Bayesian inference for the Spatial Random Effects model, Journal of the American Statistical Association, 106, 972-983.
Kang, E. L., Cressie, N., and Shi, T. (2010), Using temporal variability to improve spatial mapping with application to satellite data, Canadian Journal of Statistics, 38, 271–289.
Katzfuss, M., and Cressie, N. (2009), Maximum likelihood estimation of covariance parameters in the spatial-random-effects model, Proceedings of the 2009 Joint Statistical Meetings. Alexandria, VA: American Statistical Association, 3378-3390.
Katzfuss, M., and Cressie, N. (2011), Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets, Journal of Time Series Analysis, 32, 430-446.
Katzfuss, M., and Cressie, N. (2012), Bayesian hierarchical spatio-temporal smoothing for massive datasets, Environmetrics, 23, 94-107.
Nguyen, H., Cressie, N., and Braverman, A. (2010), Spatial statistical data fusion for remote sensing applications. Department of Statistics Technical Report No. 849, The Ohio State University, Columbus, OH.
Nguyen, H., Katzfuss, M., Cressie, N., and Braverman, A. (2011). Spatio-temporal data fusion for very large remote sensing datasets, Department of Statistics Technical Report No. 859, The Ohio State University, Columbus, OH.