THE SSES PROGRAM
Massive Spatio-Temporal Datasets


Spatial prediction (using Fixed Rank Kriging) of log aerosol optical depth (AOD) from the MISR instrument on the Terra satellite.

Optimal Mapping when Datasets are Massive
Project Overview
Spatial statistics for large spatial datasets is challenging. The size of the dataset, n, causes problems when computing optimal spatial predictors such as kriging, whose computational cost grows as the cube of n because it requires solving a linear system involving the n x n covariance matrix of the data. In addition, a large dataset is often defined on a large spatial domain, so the spatial process of interest typically exhibits nonstationary behavior over that domain. In this research, a family of nonstationary covariance functions is defined using a fixed number of basis functions, motivated by a spatial random effects (SRE) model. This leads to a spatial prediction method we call Fixed Rank Kriging (FRK). When n is large, FRK exploits computational simplifications to obtain the spatial best linear unbiased predictor (BLUP) of a hidden spatial process and its mean squared prediction error. A weighted-least-squares method is derived to estimate the covariance-function parameters, and these estimates are substituted into the FRK equations. The article "Fixed rank kriging for very large spatial datasets" by Noel Cressie and Gardar Johannesson was published in 2008 in the Journal of the Royal Statistical Society, Series B.
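The computational simplification behind FRK can be illustrated with the Sherman-Morrison-Woodbury identity. The Python sketch below is a minimal illustration under simplifying assumptions, not the authors' code: the data covariance is taken to be Sigma = S K S' + D, where S is the n x r matrix of basis functions evaluated at the data locations, K is the r x r covariance of the random effects, and D is a diagonal matrix of error variances; the mean is assumed known and equal to zero (simple kriging, rather than kriging with an estimated trend); fine-scale variation is ignored; and the one-dimensional bisquare basis, the helper names bisquare, sigma_inv_apply and frk_predict, and all parameter values are hypothetical. Only r x r systems are solved, so for fixed r the cost grows linearly in n.

```python
import numpy as np


def bisquare(locs, centers, radius):
    """Hypothetical 1-D bisquare basis: (1 - (d/radius)^2)^2 if d < radius, else 0."""
    d = np.abs(locs[:, None] - centers[None, :])
    return np.where(d < radius, (1.0 - (d / radius) ** 2) ** 2, 0.0)


def sigma_inv_apply(S, K, d, B):
    """Return Sigma^{-1} @ B for Sigma = S K S^T + diag(d), via Sherman-Morrison-Woodbury.

    Only the r x r matrix K^{-1} + S^T D^{-1} S is factorized, so the cost is
    O(n r^2) rather than the O(n^3) of a direct n x n solve.
    """
    Dinv_B = B / d[:, None]                      # D^{-1} B
    Dinv_S = S / d[:, None]                      # D^{-1} S   (n x r)
    M = np.linalg.inv(K) + S.T @ Dinv_S          # K^{-1} + S^T D^{-1} S   (r x r)
    return Dinv_B - Dinv_S @ np.linalg.solve(M, S.T @ Dinv_B)


def frk_predict(S, K, d, z, S0):
    """Zero-mean FRK predictor of S0 @ eta and its mean squared prediction error.

    S0 holds the basis functions at the m prediction locations (m x r).
    """
    KSt = K @ S.T                                         # Cov(eta, z) = K S^T   (r x n)
    pred = S0 @ (KSt @ sigma_inv_apply(S, K, d, z[:, None])[:, 0])
    G = sigma_inv_apply(S, K, d, S @ K)                   # Sigma^{-1} S K   (n x r)
    A = K - KSt @ G                                       # MSPE matrix of the BLUP of eta
    mspe = np.einsum("ij,jk,ik->i", S0, A, S0)            # diag(S0 A S0^T)
    return pred, mspe


# Usage on simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n, r = 2000, 12
centers = np.linspace(0.0, 1.0, r)
S = bisquare(rng.uniform(0.0, 1.0, n), centers, radius=0.25)
L = rng.normal(size=(r, r))
K = L @ L.T + r * np.eye(r)                 # symmetric positive-definite K
d = np.full(n, 0.5)                         # measurement-error variances
eta = rng.multivariate_normal(np.zeros(r), K)
z = S @ eta + rng.normal(0.0, np.sqrt(d))
S0 = bisquare(np.linspace(0.0, 1.0, 50), centers, radius=0.25)
pred, mspe = frk_predict(S, K, d, z, S0)
```

Because Sigma is never formed explicitly, memory use also stays at order n x r rather than n x n.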

A related approach can be taken for spatio-temporal datasets, where the SRE model becomes a spatio-temporal random effects (STRE) model. Here, the hidden random effects are assumed to evolve dynamically over time. This results in a filtering methodology for massive data that we call Fixed Rank Filtering (FRF); Fixed Rank Smoothing and Fixed Rank Forecasting can also be derived. The article "Using temporal variability to improve spatial mapping with application to satellite data" by E.L. Kang, N. Cressie, and T. Shi will appear in the Canadian Journal of Statistics in 2010.
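The filtering idea can be sketched in the same spirit. The Python code below is a hypothetical illustration, not the FRF paper's implementation: it assumes a simplified STRE model in which the r-dimensional random effects follow a first-order vector autoregression, eta_t = H eta_{t-1} + zeta_t, and the observations are z_t = S_t eta_t + eps_t with a diagonal error covariance; any trend or fine-scale-variation terms of the published model are omitted, and the name frf_step, the matrices H and U, and all parameter values are assumptions. Each time step is an ordinary Kalman filter update written in information form, so, as in FRK, only r x r matrices are inverted, and pixels with no data at time t are handled simply by dropping their rows from S_t.

```python
import numpy as np


def frf_step(eta_f, P_f, H, U, S_t, d_t, z_t):
    """One forecast/update step of a Kalman filter on the r random effects.

    Assumed (hypothetical) simplified STRE model:
        eta_t = H eta_{t-1} + zeta_t,   zeta_t ~ N(0, U)
        z_t   = S_t eta_t + eps_t,      eps_t  ~ N(0, diag(d_t))
    S_t and d_t contain only the pixels observed at time t.
    """
    # Forecast step: propagate the filtered mean and covariance.
    eta_p = H @ eta_f
    P_p = H @ P_f @ H.T + U
    # Update step in information form: only r x r matrices are inverted.
    St_Dinv = S_t.T / d_t                                   # S_t^T D_t^{-1}   (r x n_t)
    P_new = np.linalg.inv(np.linalg.inv(P_p) + St_Dinv @ S_t)
    eta_new = eta_p + P_new @ (St_Dinv @ (z_t - S_t @ eta_p))
    return eta_new, P_new


# Usage on simulated data with about 30% of pixels observed at each time (values hypothetical).
rng = np.random.default_rng(0)
n, r, T = 500, 8, 20
S = rng.normal(size=(n, r))          # basis functions evaluated at all n pixels
H = 0.8 * np.eye(r)                  # propagator matrix
U = 0.3 * np.eye(r)                  # innovation covariance
eta_true = np.zeros(r)
eta_f, P_f = np.zeros(r), np.eye(r)
for t in range(T):
    eta_true = H @ eta_true + rng.multivariate_normal(np.zeros(r), U)
    obs = rng.uniform(size=n) < 0.3
    d_t = np.full(obs.sum(), 0.2)
    z_t = S[obs] @ eta_true + rng.normal(0.0, np.sqrt(d_t))
    eta_f, P_f = frf_step(eta_f, P_f, H, U, S[obs], d_t, z_t)
y_map = S @ eta_f                    # filtered map at all n pixels, observed or not
```

The information form gives the same result as the usual Kalman-gain update, P_p S_t' (S_t P_p S_t' + D_t)^{-1}, but avoids inverting the n_t x n_t matrix inside it.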

The dataset analyzed in the FRF paper is available for download. Level-2 aerosol data are collected at high spatial resolution (17.6 km x 17.6 km). Level-3 data products are generated from level-2 data at a much lower spatial resolution (0.5 deg. x 0.5 deg.) by averaging the level-2 observations that fall in each level-3 pixel during a given time period. The dataset analyzed consists of MISR daily level-3 AOD data from July 1, 2001 through August 9, 2001. There are 720 x 360 = 259,200 level-3 pixels, but only pixels that lie on the satellite's orbit and for which retrievals are obtained have AOD data. In the FRF paper, a rectangular region D between longitudes -125 deg. and +3 deg. and between latitudes -20 deg. and +44 deg. is chosen. It covers parts of North and South America, the western part of the Sahara Desert in Africa, the Iberian Peninsula in Europe, and parts of the Atlantic and Pacific Oceans; it was chosen because of the aerosol activity expected from the Sahara Desert.
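The level-3 averaging and the size of region D can be checked with a short sketch. It assumes a global 0.5 deg. grid whose pixel edges start at longitude -180 deg. and latitude -90 deg., and it counts a pixel as lying in D when its center does; these conventions, the function name level3_average, and the array layout are assumptions for illustration, not the MISR product specification. Under them, D contains 256 x 128 = 32,768 level-3 pixels.

```python
import numpy as np

RES = 0.5                                        # level-3 pixel size (degrees)
N_LON, N_LAT = int(360 / RES), int(180 / RES)    # 720 x 360 = 259,200 pixels


def level3_average(lon, lat, aod):
    """Average level-2 retrievals into level-3 pixels (assumed grid layout).

    Returns an (N_LAT, N_LON) array; pixels with no retrieval are NaN.
    """
    ix = np.clip(np.floor((lon + 180.0) / RES).astype(int), 0, N_LON - 1)
    iy = np.clip(np.floor((lat + 90.0) / RES).astype(int), 0, N_LAT - 1)
    flat = iy * N_LON + ix                       # row-major pixel index
    sums = np.bincount(flat, weights=aod, minlength=N_LON * N_LAT)
    counts = np.bincount(flat, minlength=N_LON * N_LAT)
    grid = np.full(N_LON * N_LAT, np.nan)
    has_data = counts > 0
    grid[has_data] = sums[has_data] / counts[has_data]
    return grid.reshape(N_LAT, N_LON)


# Pixel centers and the mask of region D (-125 to +3 deg. longitude, -20 to +44 deg. latitude).
lon_c = -180.0 + RES * (np.arange(N_LON) + 0.5)
lat_c = -90.0 + RES * (np.arange(N_LAT) + 0.5)
in_D = ((lat_c[:, None] > -20.0) & (lat_c[:, None] < 44.0)
        & (lon_c[None, :] > -125.0) & (lon_c[None, :] < 3.0))
print(N_LON * N_LAT, int(in_D.sum()))            # 259200 32768
```

Pixels with no level-2 retrieval stay NaN, which mirrors the gaps between satellite orbits in the daily level-3 data.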

Click here for the MISR AOD data used in the FRF paper.

Click here for more complete MISR AOD data.

In the Technical Report "Fixed-rank filtering for spatio-temporal data" by N. Cressie, T. Shi, and E.L. Kang, a simulation experiment compares FRF with FRK. The code used to perform the experiment is available:

Click here for simulation-experiment code to compare FRF with FRK.