**Seminar Abstracts (spring 2010)**

No seminar

Peter Frazier, Cornell University

*Ranking and Selection of Many Alternatives using Correlated Knowledge Gradients*

**Abstract:** We consider the ranking and selection problem, in which one wishes to use a fixed sampling budget efficiently to find which of several alternatives is the best. By explicitly modeling the relationship between alternative values with a correlated Bayesian prior, sampling policies may perform efficiently even when the number of alternatives is very large. We propose the correlated knowledge-gradient sampling policy and show special cases in which it is Bayes-optimal. We apply it to two problems: drug discovery, and continuous global optimization.
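To make the ranking-and-selection setting concrete, here is a minimal Python sketch of a knowledge-gradient computation, in the simpler case of independent normal priors rather than the correlated prior of the talk; the function names and toy numbers are invented for illustration:

```python
import math

def kg_factors(means, sigmas, noise_sd):
    """Knowledge-gradient factor for each alternative under an
    independent normal prior (a simplified, uncorrelated stand-in
    for the correlated policy described in the talk)."""
    def pdf(z):
        return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    kg = []
    for i, (m, s) in enumerate(zip(means, sigmas)):
        # sd of the change in posterior mean caused by one noisy sample of i
        s_tilde = s ** 2 / math.sqrt(s ** 2 + noise_sd ** 2)
        best_other = max(mj for j, mj in enumerate(means) if j != i)
        z = -abs(m - best_other) / s_tilde
        kg.append(s_tilde * (z * cdf(z) + pdf(z)))  # expected improvement
    return kg

# toy posterior: alternative 1 looks best, but 0 is still very uncertain
kg = kg_factors([0.0, 0.5, 0.4], [1.0, 1.0, 0.2], noise_sd=1.0)
sample_next = max(range(len(kg)), key=lambda i: kg[i])
```

The policy samples the alternative whose measurement is expected to improve the final selection the most, not necessarily the one with the highest posterior mean.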

Seminar is cancelled due to weather

George Fann, Oak Ridge National Laboratory

*Fast Multiresolution Methods for Density Functional Theory using Petascale Computers*

**Abstract:** An overview of an adaptive pseudo-spectral approach for solving high-dimensional integro-partial differential equations on petaflop computers will be presented. The real-analysis-based algorithms build on multiresolution analysis and low-separation-rank approximations of functions and operators. We apply these methods to solve 3-D Schroedinger, Lippmann-Schwinger, and density functional equations to high precision without assumptions on symmetry. Each of the operators and wave functions has its own structure of refinement to achieve and guarantee the desired finite precision. We will discuss scaling up to 100K cores on the Cray XT-5 and extensions to higher-dimensional time-dependent problems. These types of problems are common in simulations of semiconductors, quantum chemistry, molecular electronics, and nuclear physics.

Alexandra Kolla, Institute for Advanced Study

*New Techniques for Using and Approximating Graph Spectra*

**Abstract:**
In the first part of the talk, we present new techniques for approximating a large graph with a smaller one. Namely, we show how, given a large graph G and a subgraph H of it, we can choose a very small number of edges H' out of H such that replacing H with H' does not change the spectrum of G by much. We also discuss significant implications of our techniques in speeding up linear system solvers and creating cost-efficient, well-connected networks.
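As a rough illustration of what "not changing the spectrum by much" means, the sketch below compares Laplacian quadratic forms before and after edge sampling; note that it uses naive uniform sampling on a dense graph as a toy stand-in, not the careful edge selection of the talk:

```python
import random

def laplacian_quadratic(edges, x):
    """x^T L x for a weighted edge list [(u, v, w), ...]."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

def sparsify_uniform(edges, keep_prob, rng):
    """Keep each edge with probability keep_prob, reweighted by
    1/keep_prob so quadratic forms are preserved in expectation.
    (The algorithms in the talk choose edges far more carefully,
    e.g. by effective resistance; uniform sampling only behaves
    reasonably on dense, well-connected graphs like this one.)"""
    return [(u, v, w / keep_prob) for u, v, w in edges
            if rng.random() < keep_prob]

rng = random.Random(0)
n = 30
edges = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]  # complete graph
sparse = sparsify_uniform(edges, keep_prob=0.3, rng=rng)

x = [rng.gauss(0, 1) for _ in range(n)]
q_full = laplacian_quadratic(edges, x)
q_sparse = laplacian_quadratic(sparse, x)
```

Preserving `x^T L x` for all vectors `x` is exactly spectral approximation, which is why the sparsified graph can replace the original in linear system solvers.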

In the second part of the talk, we present spectral techniques for investigating the validity of Khot's Unique Games Conjecture (UGC). The UGC is one of the most central open problems in computational complexity theory. It asserts that for a certain constraint satisfaction problem, it is NP-hard to decide whether there is a labeling that satisfies almost all the constraints or whether, for every labeling, the fraction of satisfied constraints is very small. Since its origin, the UGC has been applied with remarkable success to prove tight hardness-of-approximation results for several important NP-hard problems such as Vertex Cover and MAXCUT.

We present a novel spectral algorithm for deciding satisfiability of Unique Games. Given a $(1-\epsilon)$-satisfiable instance of UG, our algorithm recovers a good labeling (one that satisfies more than 99% of the constraints) for a broad class of graphs, including expanders and graphs known to be hard for semidefinite programming techniques.

Ciprian Crainiceanu, Johns Hopkins University

*Longitudinal Functional Principal Component Analysis*

**Abstract:** We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability, and measurement error. The model can be viewed as the functional analog of the classical mixed effects model, where random effects are replaced by random processes. The methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients.

David Ruth, United States Naval Academy

*Applications of Assignment Algorithms to Nonparametric Tests for Homogeneity*

**Abstract:** Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting “change points” arises in a variety of applications including quality control, health diagnosis, biosurveillance, and image analysis. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multi-dimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present new non-parametric statistical tests for change points based on non-bipartite matching techniques. These tests are capable of detecting changes of quite general nature and require no distributional assumptions. The most powerful among these new tests compares favorably to parametric competitors even when parametric assumptions are met.
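One concrete statistic in this spirit is a cross-match count built on a non-bipartite matching of the pooled sample: match all observations into pairs by minimum total distance and count how many pairs join the two groups. The sketch below uses a brute-force matching and invented toy data; it is an illustration of the idea, not the tests of the talk:

```python
import math
import random

def min_weight_matching(points):
    """Brute-force minimum-weight perfect (non-bipartite) matching,
    fine for a tiny example; real tests would use a polynomial-time
    matching algorithm."""
    def solve(rem):
        if not rem:
            return 0.0, []
        i = rem[0]
        best_w, best_pairs = float("inf"), []
        for j in rem[1:]:
            rest = [k for k in rem if k not in (i, j)]
            w, pairs = solve(rest)
            w += math.dist(points[i], points[j])
            if w < best_w:
                best_w, best_pairs = w, pairs + [(i, j)]
        return best_w, best_pairs
    return solve(list(range(len(points))))[1]

def crossmatch_count(sample_a, sample_b):
    """Number of matched pairs joining the two samples; unusually few
    cross-pairs is evidence that the samples differ in distribution."""
    pairs = min_weight_matching(sample_a + sample_b)
    na = len(sample_a)
    return sum(1 for i, j in pairs if (i < na) != (j < na))

rng = random.Random(2)
before = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
after = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(6)]
changed = crossmatch_count(before, after)  # shifted distribution: few cross-pairs
null = crossmatch_count(before[:3] + after[:3], before[3:] + after[3:])
```

Because the statistic depends only on pairwise distances, it needs no distributional assumptions and works in any dimension.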

*SPRING BREAK*

Corina Tarnita, Harvard University

**Abstract:**

Matthew Macauley, Clemson University

*Graph Dynamical Systems -- A Mathematical Framework for Interaction-Based Systems*

**Abstract:**
Many complex systems can be modeled in an agent-based manner, where the dynamics are governed by purely local interactions. Analyzing these large systems is difficult due to non-homogeneity and the lack of mathematical tools of the kind used to analyze traditional continuous models. As a result, agent-based systems are often analyzed using computer simulations, with little or no formal validation of the results. Graph dynamical systems (GDSs) provide a natural framework for the modeling and simulation of such systems and are an alternative to more traditional continuous models. I will give an overview of some basic GDS theory and present three example applications where the GDS framework is used: traffic simulations, epidemiology, and gene annotation algorithms. This is ongoing work with the Network Dynamics and Simulation Science Laboratory (NDSSL) at Virginia Tech.
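A minimal sketch of the GDS idea, with a synchronous update scheme and a toy majority rule of our own choosing (the talk's applications use different local functions and update orders):

```python
def gds_step(adj, states, local_fn):
    """One synchronous step of a graph dynamical system: every vertex
    updates via a local function of its own state and its neighbors'."""
    return [local_fn(states[v], [states[u] for u in adj[v]])
            for v in range(len(adj))]

def majority(s, nbrs):
    """Toy local rule (our invention): adopt the majority state among
    the vertex and its neighbors."""
    votes = [s] + nbrs
    return 1 if 2 * sum(votes) > len(votes) else 0

adj = [[(v - 1) % 6, (v + 1) % 6] for v in range(6)]  # cycle on 6 vertices
states = [1, 1, 0, 1, 0, 0]
for _ in range(3):
    states = gds_step(adj, states, majority)
```

The mathematical questions of GDS theory concern exactly such iterations: fixed points, cycle structure, and how the answers change with the graph, the local functions, and the update order.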

Jonathon Peterson, Cornell University

*Traps, slowdowns, and bridges of one-dimensional transient random walks in a random environment*

**Abstract:**
In classical random walks on the integers $\mathbb Z$, the transition probabilities governing the motion of the random walk are the same at every site. In contrast, for random walks in random environments (RWRE) the transition probabilities at each site are randomly chosen according to some probability measure.

This seemingly simple modification to classical random walks can create very surprising behaviors such as transience in the opposite direction of the average drift and non-Gaussian limiting distributions.

This unexpected behavior of RWRE is due to the existence of ``traps'' in the environment: portions of the environment that hold the random walk for an unusually long time. An important tool for analyzing the trapping effects of the environment is what is known as the potential of the environment.

In this talk I will explain how the potential of the environment and some simple probability computations can be used to study ``bridges'' of RWRE (an RWRE conditioned to return to its starting location after $2n$ steps). This is based on joint work with Nina Gantert.
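A small sketch of the potential of a one-dimensional environment, with an invented i.i.d. distribution for the transition probabilities:

```python
import math
import random

def potential(env):
    """Potential of the environment: V(k) = sum_{i<=k} log(rho_i),
    rho_i = (1 - p_i)/p_i, where p_i is the probability of stepping
    right from site i."""
    V = [0.0]
    for p in env:
        V.append(V[-1] + math.log((1 - p) / p))
    return V

rng = random.Random(3)
# i.i.d. environment, transient to the right: E[log rho] < 0 here
env = [rng.uniform(0.45, 0.95) for _ in range(1000)]
V = potential(env)
drift = V[-1] / len(env)  # overall downward drift of the potential
# upward excursions of V mark "traps" that hold the walk back
has_trap = any(V[k + 1] > V[k] for k in range(len(env)))
```

Even though the potential drifts downward overall (so the walk is transient to the right), its upward excursions are the traps responsible for slowdowns and non-Gaussian limits.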

**The 2010 A. J. Duncan Lectures**

Andreas Buja, The Wharton School, University of Pennsylvania

**Abstract of Thursday's lecture:**

This talk is a tour of four statistical applications where visualization proves useful, maybe even essential. Two applications are about teaching certain abstract concepts, and two more applications involve discovery in data analysis. In all four applications the visualizations are dynamic and interactive. It may be surprising that this feature (1) is quite useful and (2) can be entirely programmed in the R language.

The teaching applications are about

1) explaining "regression to the mean" to Statistics 101 students, and

2) illustrating "proper scoring rules" to machine learning researchers.

The data analytic applications show

3) space-time aspects of commercial ocean fishing data, and

4) associations among more than 1,000 variables in autism data.

The last example raises some issues of statistical inference.

**Abstract of Friday's lecture:**

*Statistical Inference for Exploratory Data Analysis and Model Diagnostics*

We propose to furnish statistical data visualization with an inferential framework and protocol, modeled after confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of "discoveries" is measured by having the human viewer compare the plot of the real dataset with collections of plots of simulated datasets. A simple but rigorous protocol that provides inferential validity is modeled after the "lineup" popular from criminal legal procedures. Another protocol, modeled after the "Rorschach" inkblot test well-known from (pop-)psychology, will help analysts acclimatize to random variability before being exposed to the plot of the real data. The proposed protocols will be useful for exploratory data analysis, with reference datasets simulated under a null assumption that structure is absent. The framework is also useful for model diagnostics, in which case reference datasets are simulated from the model in question. Adopting the protocols will mean an adjustment in working procedures for data analysts, adding more rigor, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking.
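A minimal sketch of the lineup protocol, with a numeric statistic standing in for the human viewer; the data-generating choices here are ours, purely for illustration:

```python
import math
import random

def lineup(real_data, null_generator, rng, n_decoys=19):
    """The 'lineup' protocol: hide the real dataset at a random
    position among datasets simulated under the null."""
    datasets = [null_generator(rng) for _ in range(n_decoys)]
    pos = rng.randrange(n_decoys + 1)
    datasets.insert(pos, real_data)
    return pos, datasets

def abs_corr(d):
    """|Pearson correlation|, standing in for the viewer's eye."""
    n = len(d)
    mx = sum(x for x, _ in d) / n
    my = sum(y for _, y in d) / n
    sxy = sum((x - mx) * (y - my) for x, y in d)
    sxx = sum((x - mx) ** 2 for x, _ in d)
    syy = sum((y - my) ** 2 for _, y in d)
    return abs(sxy) / math.sqrt(sxx * syy)

rng = random.Random(4)
def null_gen(r):  # null: x and y independent
    return [(r.gauss(0, 1), r.gauss(0, 1)) for _ in range(100)]
# "real" data with genuine structure (a linear association)
real = [(x, 0.8 * x + rng.gauss(0, 0.5))
        for x in (rng.gauss(0, 1) for _ in range(100))]

pos, datasets = lineup(real, null_gen, rng)
picked = max(range(len(datasets)), key=lambda i: abs_corr(datasets[i]))
```

If a viewer singles out the real plot from a lineup of 20, that is evidence of structure at roughly the p = 0.05 level, which is what makes the protocol inferentially valid.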

04/22/2010

Federico Bandi, Johns Hopkins University

*Nonparametric Leverage Effects*

**Abstract:** Vast empirical evidence points to the existence of a negative correlation, named "leverage effect," between shocks in volatility and shocks in returns. We provide a nonparametric theory of leverage estimation in the context of a continuous-time stochastic volatility model with jumps in returns, jumps in volatility, or both. Leverage is defined as a flexible function of the state of the firm, as summarized by the spot volatility level. We show that its point-wise functional estimates have asymptotic properties (in terms of rates of convergence, limiting biases, and limiting variances) which crucially depend on the likelihood of the individual jumps and co-jumps as well as on the features of the jump size distributions. Empirically, we find economically important time-variation in leverage with more negative values associated with higher volatility levels.

04/29/2010

Andrew Barron, Yale University

*Information-Theoretic Validation of L1 Penalized Likelihood*

**Abstract:** Simplicity and likelihood are linked by principles of information theory, armed with which various likelihood penalties are developed for model selection. We provide methods for analysis of these procedures based both on properties of statistical risk and properties of data compression. It is classical that penalties based on the number of parameters have these information-theoretic properties, as we review. Here we show also that penalties based on L1 norms of coefficients in regression are both risk valid and codelength valid.
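For concreteness, here is a sketch of one standard L1-penalized regression estimator, coordinate descent with soft-thresholding; the talk's contribution is the information-theoretic validation of such penalties, not this particular solver:

```python
import random

def soft_threshold(z, gamma):
    """The shrinkage operator induced by an L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """Coordinate descent for (1/2)*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual with coordinate j removed
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n))
            nj = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(zj, lam) / nj
    return b

rng = random.Random(5)
n, p = 30, 3
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
b_true = [2.0, 0.0, -1.0]
y = [sum(X[i][j] * b_true[j] for j in range(p)) + rng.gauss(0, 0.1)
     for i in range(n)]
b_hat = lasso_cd(X, y, lam=1.0)  # small penalty: estimates near b_true
```

The L1 penalty both shrinks and sparsifies: as `lam` grows, coefficients are driven exactly to zero, and the results of the talk establish when such penalized fits are valid in both the risk and codelength senses.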
