|
The primary authors of this tutorial are L.M. Berliner and N. Cressie.
Essence of Bayesian Reasoning
One rationale for the Bayesian approach is that it is a
mathematically rigorous
method for combining information in the presence of uncertainty.
Uncertainty is a notion that applies to observations, models, and
parameters, in differing amounts depending on the problem under
investigation. This tutorial in combination with the SSES Web-Project: Ice Streams, demonstrates how uncertainty can be
accounted for and scientific inference can be made on ice-stream
dynamics using the Bayesian approach.
The
primary computational tool used is Bayes' Theorem, which is simply a
result in probability theory relating various conditional
distributions. However, the Bayesian view of statistical modeling and
analysis involves more than simple application of probability
theory. Indeed, it is a paradigm that involves the modeling of
unknowns as random variables and using observations to
update that modeling effort.
Two primary sources of information are available
for inference on the unknown quantities of interest:
(i) observations or data that convey some information
regarding those unknowns, and
(ii) prior information, based on scientific reasoning regarding the
unknowns, as well as past experience and data.
Both sources of information are subject to uncertainties.
The basis of the Bayesian approach is that all such uncertainties
are modeled probabilistically. Specifically, a data model or
statistical distribution for the data is developed conditional upon the
unknowns; the result is denoted by
. Similarly, prior information and uncertainty are
summarized through a prior probability distribution,
denoted by
.
Construction of a data model does not require that we actually
know the values of the unknowns (fortunately!). Rather, this
distribution is viewed as a function of both the unknowns and the
data. For example, if we observe the surface velocity of an ice-stream
at some point in space, we would not believe that the resulting
observation, , is exactly equal to the true value,
. The presence of an unobservable measurement error, ,
would be anticipated. A common assumption is that
 |
(T1) |
To complete the statistical argument, we
would suggest a probability distribution for . Suppose that it is
plausible to assert that is a
Gaussian or normally distributed random variable with mean zero and a
known (say from previous calibration studies)
variance ; in shorthand,
, where is read "is distributed as". This
suggests a statistical model for , conditional upon :
 |
(T2) |
The development of a prior distribution,
, is the
subject of much of the SSES Web-Project: Ice Streams. For now, suppose
this step has been accomplished. Bayes' Theorem then yields the
solution to our problem, which is the posterior distribution of the
unknowns conditional upon the observed (i.e., now fixed) data, given by
![\begin{displaymath}[\mbox{unknowns} \mid \mbox{data}]=
\frac{[\mbox{data} \mid
\mbox{unknowns}] \, [\mbox{unknowns}]}{[\mbox{data}]}.
\end{displaymath}](Text/ice/img19.png) |
(T3) |
The denominator, , in this expression can be viewed
simply as that constant, depending on the observed data but not the
unknowns, that normalizes the numerator, ensuring that
the posterior is indeed a probability distribution (physicists may
note the analogy to the partition function in
statistical physics).
Pursuing our simple example regarding velocity, suppose that our prior information suggests that the true velocity is normally distributed:
One can show that the posterior distribution is also Gaussian (e.g., Berger 1985, pp.127-128) with mean
and variance
Inspection of these formulas indicates how Bayesian analysis provides combinations of data and prior information. First, note that the posterior mean of is a weighted average of the prior mean (or guess) and the purely-data-based estimate . Also, note that if the datum is extremely precise relative to our prior information ( is small compared with ), the posterior mean is very close to . Next, note that the posterior variance is smaller than either and ; that is, we learn more via the combinaton of data and prior than each tells us separately. (These aspects are very common, but not universal, aspects of Bayesian analysis.)
It is instructive to compare the Bayesian approach to a
conventional statistical approach. In the latter case, primary attention is
directed to the data model (T2) only.
In particular, the optimal estimator in the sense of both maximum
likelihood and minimum squared errors of is the observation .
But notice that if we simply choose
the prior distribution to be uniform or constant everywhere,
(T3) reduces to
![\begin{displaymath}[\mbox{unknowns} \mid \mbox{data}]\propto
[\mbox{data} \mid \mbox{unknowns}],
\end{displaymath}](Text/ice/img21.png) |
(T4) |
suggesting that many traditional estimation procedures are
interpretable as modes (i.e., maxima) of posterior distributions corresponding
to uniform priors. This applies quite generally to
least-squares estimates, maximum-likelihood
estimates, including spline smoothers and certain three- and
four-dimensional variational data assimilation procedures, and control
methods applied in glaciology (see below). In this sense, the
Bayesian approach is more general since geophysical knowledge
typically leads to a prior distribution
that is
not uniform. For example, the force of gravity
counterbalanced by stresses on an ice stream can be incorporated into
, making it far less uncertain than the uniform
model that yields the conventional statistical approach.
A key issue in statistical inference is that it involves not only the
production of estimates, but also quantifications of uncertainly
associated with them. The Bayesian approach responds with the
production of the full posterior distribution. By constrast,
conventional statistical approaches must rely only on the data model. In our example, the only quantification of variability in
is . In classical statistics, this quantity can be used to produce confidence intervals for the true value : , where depends on the level of confidence desired (e.g., a 95% interval arises if =1.96). However, the correct interpretation of a 95% confidence interval is that if data were gathered under identical conditions 100 times, one would expect that 95 out of the 100 confidence intervals would contain the true value . Under the conventional statistical approach, is a constant and hence is not amenable to a probability statement about its value.
Rather than velocity at a single point, we are interested in velocity
profiles or velocity fields. Suppose
that a vector of observations is available for
estimation of the surface velocity profile for locations in
some domain. By analogy to (T2), we might postulate the
data model,
as a multivariate normal (Gaussian)
distribution,
, where is the
vector of the true velocities at the observation locations. Choosing the simple
estimate to estimate is the result of using conventional
statistics, but again it accounts for
none of our scientific understanding regarding the structure of
velocity fields and, furthermore, it does not suggest a method
for estimating velocity at unobserved locations.
A purely statistical modeling approach would postulate the
true profile to be a stochastic process with some mean function and some
spatial covariance function, followed by spatial prediction of the
field for all in the domain. This would require the tasks of
formulation and estimation of the covariances. Such approaches are
known as kriging or "objective analysis" (though the modifier
"objective" is subject to misinterpretation); kriging and related
spatial prediction methods are discussed in an introductory article by
Cressie (1989). Physical-statistical
modeling offers the opportunity to develop models based on physical
reasoning, and we shall use this to build the distribution,
; see SSES
Web-Project: Ice Streams. We note that control methods (see
below) also make formal
use of a physical model, though quantification of uncertainty remains a
difficult issue. The physical-statistical Bayesian approach relies on
physically based interrelationships among (unknown and sometimes
imperfectly observed) quantities of interest, but it
focuses on probability distributional interrelationships.
To accomplish our goal of scientific inference on the unknown
quantities of interest, we use the technology of hierarchical
statistical modeling.
(Bayesian) Hierarchical Statistical Modeling
Many procedures used in the analysis of geophysical
data are Bayesian or approximately Bayesian. For example, the
Kalman filter is a Bayesian procedure (e.g., Meinhold and
Singpurwalla, 1983).
As indicated above, a variety of other
standard procedures have
Bayesian derivations and interpretations (e.g., Evensen and van Leeuwen
2000; Lorenc 1986). General presentations of Bayesian
analysis intended for geophysical audiences are given in Epstein
(1985) and Tarantola (1987).
Nevertheless, the full power of the methodology is only recently
making inroads in geophysics (e.g.,
Berliner, Wikle, and Cressie 2000;
Berliner, Milliff, and Wikle 2003).
The two breakthroughs have been in modeling (through emphasis on
Bayesian hierarchical modeling) and in computations (through
variants of Markov chain Monte Carlo or MCMC).
Consider three basic collections of variables to be modeled:
data, our observations;
processes, those physical, state variables of interest
(e.g., velocities, stresses, etc.);
and parameters, unknown physical constants and parameters
introduced in the statistical components of the model.
Hierarchical thinking suggests a model with three primary components
(e.g., Berliner 1996), namely
Data model:
,
Prior processes model:
,
Prior on parameters:
.
Though the construction of these components is not easy,
once the modeling is complete, Bayes' Theorem yields the
posterior distribution,
.
Notice that processes and parameters together make up all that is unknown
in the problem. That is,
, and Bayes' Theorem given by (T3)
can be used to obtain the posterior distribution.
In the Bayesian treatment of parameters,
even unknown constants are treated
as if they are random.
This is important since the definitions of some
model parameters may be based on approximations.
For example, flow parameters
are not really unknown physical constants, but rather functions of
other unmodeled process variables (e.g., temperature, pressure,
etc.), which introduces uncertainty.
While the Bayesian approach may
appear complicated compared to a conventional
statistical approach, its considerable advantage is that it
quantifies and manages uncertainties through to final conclusions.
Comparison to Other Methods
For further perspective on the Bayesian approach, we offer a brief comparison
to a more familiar notion of inverse analysis using control methods,
for a glaciological problem described
in MacAyeal (1989). In their analysis of data from the NE Greenland Ice
Stream, Joughin, Fahnestock, MacAyeal, Bamber, and Gogineni (2001)
rely on a model that "determines the velocity and stress fields as
functions of ice thickness, surface elevation" and other
inputs. In the SSES Web-Project: Ice Streams, the same ice
stream is studied with updated
data along a transect down its center. Though their model is more
complicated, they have a central step that
is parallel to the use of (13) and
(14) in SSES
Web-Project: Ice Streams, using
"forward models" that rely on ice thickness and surface elevation.
Joughin et al. (2001) proceed "to find the model parameters (e.g.,
basal shear stress)
that minimize[s] the misfit between a model-derived velocity field and
its observed counterpart." In our context and notation, they find
, where is similiar to the model in
(14), and minimize it over the unknown
quantities.
One crucial point of departure from the fully Bayesian
approach is that in their analysis, no formal adjustments are made to
account for the fact that the surface and thickness are unknown, and
merely estimated by observations.
In a Bayesian analysis, the surface and thickness are random
quantities with posterior distributions based on the observations, but
sensitive to the uncertainties in those data. And, a
Bayesian analysis enables one to explicitly account for spatial
dependencies.
The second key
difference between the two approaches is that the Bayesian
approach does not rely solely on the
minimization of a cost function (in this case
) to
produce estimates of parameters. Instead, the full distribution of the
unknowns is developed. Summaries of the final posterior
distribution such as its mean or mode can be computed, and
we can also ascribe uncertainty
measures to the results, produce realizations from the posterior, and
so forth. To emphasize the point further, control
methods of inverse modeling produce posterior modes from
posterior distributions based on relatively vague priors. That is,
conventional statistical approaches are special cases of
an approximate Bayesian analysis.
We note that the uncertainty management of a Bayesian analysis
should not be expected to come for free. Bayesian modeling is
generally more complex, since inverse probability analysis (a
historical name for Bayesian analysis) may require more assumptions than
inverse estimation. Bayesian computations may also seem more
complicated, though nonlinear optimization is seldom easy, with
results that may not be stable. Indeed, some of the apparent complexities
in either the Bayesian approach or the conventional statistical
approach are arguably often matters of experience and
tradition.
Acknowledgments
The preparation of this tutorial was supported by the National Science
Foundation, Office of Polar Programs and Probability and Statistics
Program, under Grant No. 0229292.
|
|
Berger, J.O. (1985). Statistical Decision Theory and Bayesian
Analysis. Springer-Verlag, New York.
Berliner, L.M. (1996). Hierarchical Bayesian
time series models. In Maximum Entropy and Bayesian Methods, K. Hanson and R. Silver (Eds.),
Kluwer Academic Publishers, 15-22.
Berliner, L.M., Wikle, C.K., and Cressie, N. (2000).
Long-lead prediction of Pacific SST's via Bayesian dynamic modeling.
Journal of Climate, 13, 3953-3968.
Berliner, L.M., Milliff, R.F., and Wikle, C.K. (2003).
Bayesian hierarchical modeling of air-sea interaction.
Journal of Geophysical Research, 108(C4), 3104,
doi:10.1029/ 2002JC001413.
Cressie, N. (1989). Geostatistics. American Statistician,
43, 197-202.
Epstein, E.S. (1985). Statistical Inference and
Prediction in Climatology: A Bayesian Approach. American Meteorological
Society, Boston, MA.
Evensen, G. and van Leeuwen, P.J. (2000). An ensemble Kalman smoother
for nonlinear dynamics. Monthly Weather Review, 128, 1852-1867.
Joughin, I., Fahnestock, M., MacAyeal, D., Bamber, J.L., and Gogineni, P. (2001). Observation and analysis of ice flow in the largest
Greenland ice stream. Journal of Geophysical Research, 106(D24), 34,021-34,034.
MacAyeal, D.R. (1989). Large-scale flow over a viscous basal
sediment: Theory and application to Ice Stream B, Antarctica.
Journal of Geophysical Research, 94(B4), 4071-4088.
Meinhold, J. and Singpurwalla, N. (1983). Understanding the Kalman
filter. American Statistician, 37, 123-127.
Tarantola, A. (1987).
Inverse Problem Theory: Methods for Data Fitting and Model
Parameter Estimation. Elsevier, New York.
|