
|
|
Tutorial on Bayesian Statistics for Geophysicists
|
|
Introduction
|
|
Modern studies of the behaviors of glaciers, ice sheets, and
ice streams rely heavily on both observations and physically based models.
Data acquired via
remote sensing provide critical information on geometry and movement
of ice over large sections of Antarctica and Greenland.
Though these datasets are significant advances in terms of spatial
coverage and the variety of processes we can observe, the physical
systems to be modeled are nevertheless imperfectly
observed. Uncertainties associated with measurement errors
are present, and physical models are also subject to
uncertainties. Hence, there is a need for combining observations and
models in a fashion that incorporates uncertainty and quantifies its
impact on conclusions.
The goal of combining models and
observations is hardly new in glaciology, or in the broad areas
of the geosciences (e.g., data assimilation as
practiced in numerical weather forecasting).
We focus on the development of statistical
models with strong reliance on physical modeling, a strategy
Berliner (2003) called physical-statistical modeling, and then use Bayes' Theorem to make inference on all unknowns
given the data. This is different from traditional
physical modeling, perhaps with data-based parameter
estimates, and traditional statistical modeling, perhaps relying on vague,
qualitative physical reasoning.
In the paragraphs that follow, we develop statistically enhanced versions of
a simple physical model of driving stress and a familiar model for velocity
based on stress. This presentation is based on the preprint, Berliner,
Jezek, Cressie, Kim, Lam, and van der Veen (2005).
Glaciological Motivations
Since glaciers flow under the force of gravity, important factors in
determining velocities include quantities such as
the ice thickness acting in combination
with forces acting along the sides and at the base of the glacier and
under the constraints of the constitutive relationship. In particular,
driving stress is associated with gravitational force acting on the
ice. Hence, spatial variations in the stress arise from longitudinal
gradients in ice-surface elevation and ice thickness. Based on existing
theory (e.g., Paterson 1994), we consider equating driving stress to
stresses acting on the sides and base of the glacier.
A simple approximation equates
driving stress along the flow to basal shear stress as follows:
 |
(1) |
where is ice-surface elevation, is the ice thickness,
is the density of ice,
and is the gravity constant.
Under these assumptions, it is reasonably straightforward to estimate
directly driving stress based on observations of and .
However, even though estimation may be relatively
straightforward, assessment of uncertainties in such estimates can be
difficult. Furthermore, a
concern in estimating driving stress from geometry is
that the reliance on the slope of the upper ice surface in
(1) implies that results are very sensitive to
small-scale variations in surface topography, and to small-scale,
perhaps unimportant variations in ice
thickness. From (1), there is no
theoretical requirement that driving stress be spatially averaged,
however it
is usually calculated over horizontal distances of a few ice
thicknesses or so to eliminate small-scale flow features not important
to the large-scale flow (e.g., Kamb and Echelmeyer 1986). Indeed,
if averaging is not done, the driving stress estimates exhibit
unreasonably large variations.
Data
We assembled surface topography and ice thickness observations for a
portion of the Northeast Ice Stream in Greenland; see Figure 1. The data were gathered as part of the Program for Arctic Climate
Regional Assessments (PARCA). Surface topography and ice thickness
were sampled every few hundred meters using equipment mounted on the
Wallops Flight Facility P-3 aircraft. Surface velocity data were
calculated by Ian Joughin and provided as part of the PARCA
dataset. The three derived datasets are: (Figure 2), surface topography;
(Figure 2), basal topography; and (Figure 3), surface velocities.

Figure 1. NE Ice Stream Showing PARCA Flight Line.

Figure 2. Surface and Basal Elevation.

Figure 3. Surface Velocities.
Bayesian Analysis
The primary output of a Bayesian analysis is a posterior distribution,
namely, the joint probability
distribution for unknown quantities conditional on the observed
data. Even in our simple illustration for the Northeast Ice Stream in Greenland, we have on the order of 8,000
unknowns, so explicit presentation of their joint distribution is
not feasible. Hence, a key aspect of Bayesian analysis in such
high-dimensional settings is the ability to generate realizations (or
ensembles) from the posterior distribution; the posterior is then studied
through statistical summaries of such ensembles. A separate
webpage can be viewed for an introduction to Bayesian Statistics:
Tutorial on Bayesian Statistics for Geophysicists
|
|
Physical-Statisical Modeling of the NE Ice Stream
|
|
Recall that our three datasets are: , surface observations; ,
basal observations; and , velocity data. The
corresponding processes of interest
are true surface topography , true basal topography
, and true velocities , where indexes a transect down the middle of the ice
stream. There are no observations on the stresses acting
on the ice, though, as we shall see, physical relations allow us to
make inference on modeled stresses.
Physics
We incorporate three physically based models. First, following the
discussion leading to
(1), we consider the stress,
 |
(2) |
where the ice thickness is , is the density of ice,
and is the gravity constant. The negative sign present in
(1) is omitted here because we model and
velocity in the negative- direction.
In all computations, we set
and
.
Second, under a laminar-flow assumption and
treating the flow parameter as
a constant, the surface velocity is given by,
 |
(3) |
where is the sliding velocity and
is a flow parameter (e.g., Paterson 1994, p. 251,
eq. 21).
Finally, as suggested by the analysis given in Paterson (1994, p. 243,
eq. 8), we consider the following basic model for the surface:
 |
(4) |
Bayesian Hierarchical Modeling
We see from
Tutorial on Bayesian Statistics for
Geophysicists that our main tasks are the
development of the following probability distributions:
where
denotes the collection of all model parameters.
The specifications of these
probability distributions are described in
detail in
Berliner et al. (2005). Our goal is to obtain the posterior distribution
, which then can be used to
obtain the posterior distribution of stresses,
.
Our main assumption regarding the data model is that we assume it
takes the form
![\begin{displaymath}[{\bf B},{\bf S},{\bf U}\mid b,s,u,\mbox{\boldmath$\theta$}]=...
...ath$\theta$}_S]
[{\bf U}\mid u,\mbox{\boldmath$\theta$}_U]\,,
\end{displaymath}](Text/ice/img48.png) |
(5) |
where notation such as
is used to indicate those
parameters (subsets of
)
explicity appearing in the indicated models.
A possible objection one might make to (5) is
that because the
basal data is actually computed as the
differences of surface and thickness observations, the assumed conditional
independence may not hold. We checked our posterior results for
indications of
degrees of departure from this assumption and found none that would
affect our results. See Berliner et al. (2005) for morre details.
Our process model begins with a probabilistic equality (i.e., this
is not an assumption, but a fact):
![\begin{displaymath}[b,s,u \mid \mbox{\boldmath$\theta$}]= [u \mid b,s,\mbox{\boldmath$\theta$}] [b, s \mid \mbox{\boldmath$\theta$}].
\end{displaymath}](Text/ice/img50.png) |
(6) |
Then assuming that the base and the surface are independent
conditional on the model parameters, we obtain:
![\begin{displaymath}[b,s,u \mid \mbox{\boldmath$\theta$}]=
[u \mid b,s,\mbox{\bo...
...{\boldmath$\theta$}_b] [s \mid \mbox{\boldmath$\theta$}_s]\,,
\end{displaymath}](Text/ice/img52.png) |
(7) |
where again notation such as
is used to indicate
appropriate subsets of
. It is critical to note here
that we are not assuming that
the base and surface are independent. Our modeling of both the base
and surface is conditional upon smooth processes included
in definitions of
and
. Our assumption then is
that the small-scale departures
from those large-scale processes are independent.
Finally, we turn to the conditional model for in
(7). We assume that the velocity
profile depends on the base and surface
only through their respective smoothed versions
and
; that is,
![\begin{displaymath}[u \mid b,s,\mbox{\boldmath$\theta$}_b, \mbox{\boldmath$\thet...
...$}_b, \mbox{\boldmath$\theta$}_s, \mbox{\boldmath$\theta$}_u].
\end{displaymath}](Text/ice/img55.png) |
(8) |
Once we get this far in the analysis, we assume
that the relationship is deterministic (this assumption could be
relaxed). That is, the probability
distribution on the right-hand side of (8) is
degenerate and is based on (3).
The parameter model (i.e., specification of prior
distributions) is given in Berliner et al. (2005).
|
|
Components of the Process Model
|
|
Bed Model
We choose to use wavelets to model the true basal topography (i) because
of their flexibility in representing highly variable processes, and
(ii) because we can easily control the smoothness of the fitted
wavelets. Wavelets do best
for equally spaced data where the number of data points is an integer
power of . Hence, we partition the domain of the data into bins of equal length ( ).
Let denote the 2048-dimensional vector constructed by
averaging within each bin. Note that is not observed;
define the associated basal data vector of length 2048 with
element given by the simple arithmetic average
of those basal observations lying in bin . The data model we propose for
in (5) implies that
the elements of are conditionally independent,
each being normally distributed with
mean equal the corresponding element of and variance
determined by the measurement error variability of an individual
observation, denoted by , and the number of observations
in the corresponding bin (see Berliner et al., 2005).
After converting to a discrete wavelet form and fixing the resolution,
we obtain a linear model for
in
(7); that is,
 |
(9) |
where is the matrix of discretized wavelet basis
functions, is the
vector of wavelet coefficients, is determined by
the chosen resolution,
is the correlation matrix of an
autoregressive process of order two (AR(2)) with variance
, and
.
The selection of an AR(2) error model to account for
spatial dependence among these model errors (i.e., local variations in
basal topography) was based on
preliminary data analysis and practicality;
our Bayesian computations
require repeated inversion of a
matrix involving the inverse of
, which
is simple for an AR(2) process.
Different choices of k lead to different resolutions of the mean basal
elevation; we performed analyses for four resolutions,
, corresponding to
and coefficients in (9); see Berliner et
al. (2005) for more details.
Surface Model
Our modeling strategy for
in
(7), separates the large-scale
and small-scale behaviors of the surface. We suppose
 |
(10) |
where the large-scale surface is given by a parameterized function
, assumed known up to a low-dimensional set of parameters,
and is a zero-mean spatial stochastic process,
described in Berliner et al. (2005). To
model , we rely on the physical model
(4); that is, we assume that
 |
(11) |
where , and are treated as the unknown parameters. In the analysis
here, we set , though we could model as an unknown as well.
We use only the large-scale surface (11) to compute
ice thickness, the derivative, and hence the stress in (2).
Enhancements that incorporate will be explored elsewhere.
Nevertheless, the presence of is important when determining
the data model in (5). Under the modeling strategy
that uses (11) to obtain the stress, we need
, which is
given in Berliner et al. (2005).
Velocity Model
Our data model
is again a basic
measurement-error model; that is, we assume
that conditional on the true velocities, the data vector has a
Gaussian distribution with mean equal to the vector of velocities at
the corresponding locations, common variances , and are
independent (see Berliner et al., 2005).
Turning to the process model, recall from
(8) that we want
based on
(2) and (3).
We consider (i) the corresponding smoothed versions of ice thickness,
defined as
 |
(12) |
where is the -dimensional vector of parameterized surface
elevations;
and (ii) similarly defined smoothed values of stress , defined as
 |
(13) |
where the right-hand side means each coordinate of the
-dimensional vector
is obtained as the elementwise
product of the corresponding coordinates
of the smoothed thickness and the derivative of the surface.
From (3), we should model , the vector of true
velocities at the observation locations, as a linear function
of the corresponding coordinates of
times the powers of coordinates of
.
But, in preliminary data analyses, we noted that at least two
models (one for small and another for
large ) are needed. Let be an unknown change point,
and consider different linear functions above and below the change
point. Finally, the model for the velocity data vector is
 |
(14) |
where the subscripts 1 and 2 indicate the varying
dimensions of the vectors (a vector with all
elements equal to 1) and
,
depending on the value of the
change point , and are errors primarily
representing measurement error associated with the velocity data. See Berliner et al. (2005) for more details.
|
|
Bayesian Calculations
|
|
A separate webpage can be viewed for an introduction to Bayesian
statistics: Tutorial on Bayesian Statistics for Geophysicists.
Though we can write down Bayes' Theorem for the posterior distribution
of all unknowns conditional on the observations, the result is
typically not computable in closed form. We use a Monte Carlo
approach that
produces an ensemble of realizations from the target
posterior distribution. The method relies on the emerging technology of Markov
Chain Monte Carlo (MCMC). The idea of MCMC is to
simulate a Markov chain that has been carefully designed so that its
stationary distribution coincides with the target posterior
distribution. It
follows that, after a burn-in or transience period, the generated
realizations of the chain comprise a simulated sample from the
posterior. Data analysis (often known as "output analysis")
is performed on this sample to produce the
desired inferences. In our case, direct use of MCMC is quite
challenging, primarily due to the nonlinearities present in (2)
and (3). Hence, we combine MCMC with the
technique of Importance Sampling Monte Carlo (ISMC). The basic idea of
ISMC begins with a setting in which direct simulation from a target
distribution is difficult or inefficient. One generates an
ensemble from another, more manageable distribution. The theory of
ISMC provides formulas for the calculation of weights that are
used to reweight the ensemble, permitting inferences relative to the
original target. General introductions to both MCMC and ISMC can be
found in Robert and Casella (1999). An illustration of these
technologies in a geophysical problem is given in Berliner, Milliff,
and Wikle (2003).
An outline of the calculations used here goes as follows. We first run
separate, independent MCMC algorithms for the basal model and the
surface model. These runs produce ensembles from the posterior distributions
and
. Due to the various
conditional-independence assumptions described above, these ensembles
are summaries of the posterior distribution of the unknowns
conditional on the two datasets and . They are then used in
conjunction with the velocity model
(recall
(8)) to simulate velocities conditional on and
.
To incorporate the velocity data , we
reweight all of these samples using ISMC results.
|
|
Bayesian-Analysis Results for the Northeast Ice Stream, Greenland
|
|
For each of the four resolutions, Figure 4 presents
10 realizations of the smoothed base
, superimposed on the
original data.
We see that
the posterior distributions of the smoothed base are increasingly
faithful to the basal data as the resolution is increased.
We tried even higher resolution wavelets,
but detected very little difference from the results for r = 4.

Figure 4. Posterior Smoothed Basal Topographies at Each Resolution.
Shown are basal data and 10 posterior realizations of smoothed basal
topographies for (a) r = 1, (b) r = 2, (c) r
= 3, (d) r = 4.
For each of the four resolutions, Figure 5 presents
50 realizations and the posterior mean, estimated using ensembles of
size 2000, of the smoothed stresses
(recall (13)).
For each resolution, Figure 6 presents 100 realizations and
the posterior means estimated using ensembles of
size 2000; the original velocity data is also shown in each plot.
Note that the change point
at x = 77.5 km is clearly seen in these graphs.

Figure 5. Posterior Realizations of Smoothed Stress at Each Resolution.
Shown are 50 posterior realizations of smoothed (kPa) and posterior mean of based on 2000 realizations for (a) r = 1, (b) r = 2, (c) r
= 3, (d) r = 4.

Figure 6. Summaries of Posterior Distributions for Velocities.
Shown are 100 posterior realizations of velocity profiles and their
posterior means based on 2000 realizations for (a) r = 1, (b) r = 2, (c) r
= 3, (d) r = 4, as well as the original velocity data.
The estimates (i.e., posterior means) for the other parameters in
the model of the
large-scale surface elevation (11) are
, , and
.
Prior and posterior means for other key model parameters are presented in
Berliner et al. (2005).
Variance Estimation and Model Selection
We estimated as follows. For each of our 2000 simulated
ensemble members, we
compute the variance, say , where the subscript m indicates the
ensemble member) of the "residuals", namely the observed
velocity data minus the generated velocities from the m-th ensemble.
The average then provides a
posterior estimate of (due to the very large
sample sizes, the prior distribution on "washes out").
For each resolution, the resulting
estimate of is about 50, corresponding to
a standard deviation of about 7-8 m/yr.
This compares fairly well to the suggestion that most
measurement errors in velocity data are expected to be
less than 10 m/yr (Goldstein, Engelhardt, Kamb, and
Frolich 1993).
To check on the plausibility of the treatment of as a
constant, we partitioned the length of our profile into eight
subintervals and estimated as described above, but from
data restricted to the subintervals. The results are summarized in
Figure 7.
First, we find evidence that the magnitudes of
the variations around the model differ substantially on either side of
the change point. In particular, we see very large variances to the
left of the change point, suggesting a severe (and anticipated)
breakdown of the laminar-flow approximation.
Indeed, the model predicts increasing
velocities when approaching the change point from
the left and decreasing velocities to the right, whereas the velocity
observations decrease relatively smoothly from left to right
through this region.
Also, we note differences in throughout the
profile. This suggests looking more closely at local
behavior of the velocities. For example, a model with mutliple change
points (i.e., corresponding to multiple, local
values of sliding velocity and flow parameter A) could be suggested.
Beyond assessments of local model misfit, Figure 7 also
contains information regarding comparisons of the
resolutions used for basal smoothing. Focusing on the region to the right of
the change point, we note that resolution r = 2 would be the preferred
choice even though it appears to severely smooth some features of the
basal topography (recall Figure 4).

Figure 7. Local Estimates of .
Estimates of using data restricted to eight subregions for each
resolution r = 1,...,4.
Model Stability and Predictive Power
To assess both model stability and predictive power, we did two
simple experiments. First, we re-ran the velocity model
using only 20% of the velocity data (every fifth observation).
The resulting analyses are reported in Figure 8. In
comparing this figure to Figure 6, we find fairly strong
similarities, suggesting good stability and interpolation properties.
To quantify the behavior, we computed an estimate of the
predictive variance of the velocity. Specifically, for the velocity
data left out of
the analysis, we computed the average squared prediction errors
(observed value minus posterior mean). We obtained the value of
50, which compares
very well with our estimates of based on all the data.

Figure 8. Posterior Distributions for Velocities for Subsampled Data.
Bayesian analysis using only every fifth velocity data point. Shown
are 100 posterior realizations of velocity profiles and their posterior means based on
2000 realizations for (a) r = 1, (b) r = 2, (c) r
= 3, (d) r = 4, as well as the original velocity data.
Second, we again repeated the analysis leaving out some velocity data,
but in this case we omitted all observations occurring between
x = 148 km
and x = 202 km. The posterior results are shown in
Figure 9. Models for resolutions 2,3, and 4
do reasonably well even at
predicting velocities in the unobserved region.
However, the additional smoothing associated with
r = 1 leads to very poor predictions in this region.
Note that the spreads
in the ensemble members are larger than in Figure 3, reflecting
the extra uncertainty.
It is also interesting that the models appear
to systematically predict slightly larger velocities in the unobserved
range compared to both the observed velocities and the Bayesian
analyses incorporating that data, though r = 2 again does the best job
in this region.

Figure 9. Posterior Distributions for Velocities with Region
Omitted.
Bayesian analysis using no velocity data from the range x = 150
km to x = 200 km; shown are 100 realizations of
posterior velocity profiles and posterior means of velocities based on
2000 realizations for (a) r = 1, (b) r = 2, (c) r
= 3, (d) r = 4, as well as the original velocity data.
Surface Model
We now address briefly the issue of changing the amount of smoothing
of the surface. In a straightforward analysis, where we did no
Bayesian or spatial modeling,
we fit a very simple class of local smoothing models to the original
surface data. We then examined
the differences between these fits and a single smoothed function,
looking for systematic differences locally in space. The most
interesting region corresponds to
. Note that from our Bayesian analysis, there are
systematic errors in this range: from 250-300 km we overestimate the
velocity, and from 300-390 km, we underestimate the velocity. These
regions correlate very well with regions in which our smoothed surface
derivative underestimates and then overestimates, respectively,
the surface derivative obtained from the local fit. However, using
such local surface models degrades the velocity model in
other regions. Ultimately, we should smooth both the surface and the
base interactively.
Acknowledgements
This research was supported by the National Science Foundation, Office
of Polar Programs and Probability and Statistics Program, under Grant
No. 0229292.
|
|
References
|
|
Berliner, L.M. (2003).
Physical-statistical modeling in geophysics.
Journal of Geophysical Research, 108(D24),
8776, doi: 10.1029/2002JD002865.
Berliner, L.M., Jezek, K., Cressie, N., Kim, Y., Lam, C.Q., and van der Veen, C.J. (2005).
Physical-statistical modeling of ice-stream dynamics.
Department of Statistics Preprint No. 759, The Ohio State University.
Berliner, L.M., Milliff, R.F., and Wikle, C.K. (2003).
Bayesian hierarchical modeling of air-sea interaction.
Journal of Geophysical Research, 108(C4), 3104,
doi:10.1029/ 2002JC001413.
Goldstein, R.M., Engelhardt, H., Kamb, B., and
Frolich, R.M.
(1993). Satellite radar interferometry for monitoring ice sheet motion:
Application to an Antarctic ice stream. Science, 262, 1526-1530.
Kamb, B., and Echelmeyer, K.A. (1986). Stress-gradient coupling in
glacier flow: I. Longitudinal averaging of the influence of ice
thickness and surface slope. Journal of Glaciology, 32, 267-284.
Paterson, W.S.P. (1994). The Physics of Glaciers,
3rd edn. Butterworth-Heinemann, Wolburn, MA.
Robert, C.P. and Casella, G. (1999). Monte Carlo
Statistical Methods. Springer-Verlag, New York.
|
|
Links
|
|
Tutorial on Bayesian Statistics for Geophysicists
Byrd Polar Research Center
|
|