Seminar Abstracts (Spring 2010)
Peter Frazier, Cornell University
Ranking and Selection of Many Alternatives using Correlated Knowledge Gradients
Abstract: We consider the ranking and selection problem, in which one wishes to use a fixed sampling budget efficiently to find which of several alternatives is the best. By explicitly modeling the relationship between alternative values with a correlated Bayesian prior, sampling policies may perform efficiently even when the number of alternatives is very large. We propose the correlated knowledge-gradient sampling policy and show special cases in which it is Bayes-optimal. We apply it to two problems: drug discovery, and continuous global optimization.
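A minimal illustration (not from the talk): the knowledge-gradient idea scores each alternative by the expected improvement in the best posterior mean after one more sample. The sketch below implements the simpler independent-beliefs version of the policy for normal priors and normal sampling noise; the talk's correlated version generalizes this by propagating one sample's information to all alternatives through the prior covariance.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def kg_factor(z):
    # f(z) = z*Phi(z) + phi(z): expected positive part of a normal shift
    return z * norm_cdf(z) + norm_pdf(z)

def knowledge_gradient(mu, sigma, noise_sd):
    """KG value of sampling each alternative once, under independent
    normal beliefs (mu, sigma) and sampling noise noise_sd."""
    kg = []
    for i in range(len(mu)):
        # std dev of the change in posterior mean i after one sample
        sigma_tilde = sigma[i] ** 2 / math.sqrt(sigma[i] ** 2 + noise_sd ** 2)
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        kg.append(sigma_tilde * kg_factor(z))
    return kg

mu = [0.0, 0.3, 0.28]      # posterior means of three alternatives
sigma = [1.0, 0.2, 0.9]    # posterior standard deviations
kg = knowledge_gradient(mu, sigma, noise_sd=0.5)
best = kg.index(max(kg))   # alternative the KG policy samples next
```

Note how a poorly explored alternative (large sigma) can out-score one with a slightly higher mean: the policy values information, not just current estimates.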
Seminar is cancelled due to weather
George Fann, Oak Ridge National Laboratory
Fast Multiresolution Methods for Density Functional Theory using Petascale Computers
Abstract: An overview of an adaptive pseudo-spectral approach for solving high-dimensional integro-partial differential equations using petaflop computers will be presented. The real-analysis-based algorithms rely on multiresolution analysis and low-separation-rank approximations of functions and operators. We apply these methods to solve 3-D Schroedinger, Lippmann-Schwinger, and density functional equations to high precision without assumptions on symmetry. Each of the operators and wave functions has its own refinement structure to achieve and guarantee the desired finite precision. We will discuss scaling up to 100K cores on the Cray XT-5 and extensions to higher-dimensional time-dependent problems. These types of problems are common in simulations of semiconductors, quantum chemistry, molecular electronics, and nuclear physics.
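The key mechanism, each function carrying its own refinement structure, can be illustrated with a toy 1-D version (this is only a sketch of the adaptivity idea, not the multiwavelet machinery of the actual codes): subdivide an interval until a local interpolant matches the function to the requested precision, so fine boxes appear only where the function varies rapidly.

```python
import math

def adaptive_refine(f, a, b, tol, depth=0, max_depth=20):
    """Subdivide [a, b] until linear interpolation of f matches f at
    the midpoint to within tol -- a toy stand-in for a multiresolution
    refinement criterion. Returns the list of accepted boxes."""
    mid = 0.5 * (a + b)
    interp = 0.5 * (f(a) + f(b))   # linear interpolant value at midpoint
    if abs(interp - f(mid)) <= tol or depth >= max_depth:
        return [(a, b)]
    return (adaptive_refine(f, a, mid, tol, depth + 1, max_depth)
            + adaptive_refine(f, mid, b, tol, depth + 1, max_depth))

# a sharply peaked function: refinement should concentrate at the peak
f = lambda x: math.exp(-200.0 * (x - 0.5) ** 2)
boxes = adaptive_refine(f, 0.0, 1.0, tol=1e-3)
widths = [b - a for a, b in boxes]
# widths are small near x = 0.5 and large in the flat regions
```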
Alexandra Kolla, Institute for Advanced Study
New Techniques for Using and Approximating Graph Spectra
Abstract: In the first part of the talk, we present new techniques for approximating a large graph with a smaller one. Namely, we show how, given a large graph G and a subgraph H of it, we can choose a very small set of edges H' out of H such that replacing H with H' does not change the spectrum of G by much. We also discuss significant implications of our techniques for speeding up linear system solvers and for creating cost-efficient, well-connected networks.
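What "does not change the spectrum by much" means can be checked through quadratic forms: two graphs are spectrally close when x^T L x agrees for all vectors x, where L is the graph Laplacian. The sketch below uses naive uniform edge sampling with reweighting on a dense graph purely to illustrate that test; the talk's construction chooses edges far more carefully than this.

```python
import random

def laplacian_quadratic(edges, weights, x):
    """x^T L x = sum over edges of w_e * (x_u - x_v)^2."""
    return sum(w * (x[u] - x[v]) ** 2 for (u, v), w in zip(edges, weights))

random.seed(1)
n = 30
# dense graph G: the complete graph on n vertices, unit weights
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
weights = [1.0] * len(edges)

# naive sparsifier: keep each edge with probability p, reweight by 1/p
p = 0.3
kept = [(e, 1.0 / p) for e in edges if random.random() < p]
s_edges = [e for e, _ in kept]
s_weights = [w for _, w in kept]

# compare quadratic forms on random test vectors; ratios near 1
# mean the sparse graph approximates the spectrum of the dense one
ratios = []
for _ in range(5):
    x = [random.gauss(0, 1) for _ in range(n)]
    full = laplacian_quadratic(edges, weights, x)
    sparse = laplacian_quadratic(s_edges, s_weights, x)
    ratios.append(sparse / full)
```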
In the second part of the talk, we present spectral techniques for investigating the validity of Khot's Unique Games Conjecture (UGC). The UGC is one of the most central open problems in computational complexity theory. It asserts that for a certain constraint satisfaction problem, it is NP-hard to decide whether there is a labeling that satisfies almost all the constraints or whether, for every labeling, the fraction of satisfied constraints is very small. Since its origin, the UGC has been applied with remarkable success to prove tight hardness-of-approximation results for several important NP-hard problems such as Vertex Cover and MAXCUT.
We present a novel spectral algorithm for deciding satisfiability of Unique Games. Given a $(1-\epsilon)$-satisfiable instance of UG, our algorithm recovers a good labeling (one that satisfies more than 99% of the constraints) for a plethora of graphs, including expanders and graphs that are shown to be hard for semidefinite programming techniques.
Ciprian Crainiceanu, Johns Hopkins University
Longitudinal Functional Principal Component Analysis
Abstract: We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability, and measurement error. The model can be viewed as the functional analog of the classical mixed effects model, where random effects are replaced by random processes. The methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis patients.
David Ruth, United States Naval Academy
Applications of Assignment Algorithms to Nonparametric Tests for Homogeneity
Abstract: Given a sequence of observations, has a change occurred in the underlying probability distribution with respect to observation order? This problem of detecting “change points” arises in a variety of applications including quality control, health diagnosis, biosurveillance, and image analysis. Detecting change points in high-dimensional settings is challenging, and most change-point methods for multi-dimensional problems rely upon distributional assumptions or the use of observation history to model probability distributions. We present new nonparametric statistical tests for change points based on non-bipartite matching techniques. These tests are capable of detecting changes of quite general nature and require no distributional assumptions. The most powerful among these new tests compares favorably to parametric competitors even when parametric assumptions are met.
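To give the flavor of the matching idea (this sketch follows the cross-match statistic in the style of Rosenbaum, not necessarily the talk's exact tests): pool the two samples, compute a minimum-weight non-bipartite matching of all points, and count pairs that join one point from each sample. Few cross-sample pairs suggests the samples come from different distributions. The brute-force matcher below is only suitable for tiny inputs.

```python
import math, random

def min_weight_matching(points):
    """Brute-force minimum-weight perfect (non-bipartite) matching on
    an even number of points; exponential time, tiny inputs only."""
    def solve(idx):
        if not idx:
            return 0.0, []
        i = idx[0]
        best = (math.inf, [])
        for j in idx[1:]:
            rest = [k for k in idx[1:] if k != j]
            d = math.dist(points[i], points[j])
            cost, pairs = solve(rest)
            if d + cost < best[0]:
                best = (d + cost, pairs + [(i, j)])
        return best
    return solve(list(range(len(points))))[1]

def cross_match_count(sample_a, sample_b):
    """Number of matched pairs joining the two samples; unusually few
    cross-matches is evidence the underlying distributions differ."""
    pooled = sample_a + sample_b
    pairs = min_weight_matching(pooled)
    n_a = len(sample_a)
    return sum(1 for i, j in pairs if (i < n_a) != (j < n_a))

random.seed(0)
a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
b = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(4)]  # shifted
c = cross_match_count(a, b)  # shifted samples tend to match within-sample
```

No distributional assumption enters anywhere: the statistic depends only on inter-point distances, which is what makes the approach work in high dimensions.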
Corina Tarnita, Harvard University
Matthew Macauley, Clemson University
Graph Dynamical Systems -- A Mathematical Framework for Interaction-Based Systems
Abstract: Many complex systems can be modeled in an agent-based manner, where the dynamics are governed by purely local interactions. Analyzing these large systems is difficult due to non-homogeneity and the lack of mathematical tools of the kind used to analyze traditional continuous models. As a result, agent-based systems are often analyzed using computer simulations, with little or no formal validation of the results. Graph dynamical systems (GDSs) represent a natural framework for the modeling and simulation of such systems, and are an alternative to more traditional continuous models. I will give an overview of some basic GDS theory, and give three example applications where a GDS framework is used: traffic simulations, epidemiology, and gene annotation algorithms. This is ongoing work with the Network Dynamics and Simulation Science Laboratory (NDSSL) at Virginia Tech.
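A minimal example of a GDS, chosen for illustration rather than taken from the talk: vertices carry Boolean states, each vertex applies a local threshold rule to its neighbors' states, and all vertices update synchronously. Even this tiny system exhibits the fixed-point behavior that GDS theory studies.

```python
def gds_step(adj, state, threshold=2):
    """One synchronous step of a toy graph dynamical system: a vertex
    turns on iff at least `threshold` neighbors are on; otherwise it
    keeps its current state."""
    return [1 if sum(state[v] for v in adj[u]) >= threshold else state[u]
            for u in range(len(adj))]

# cycle graph C6 with one chord (1-4), as adjacency lists
adj = [[1, 5], [0, 2, 4], [1, 3], [2, 4], [3, 5, 1], [4, 0]]
state = [1, 0, 1, 0, 1, 0]      # initial configuration
trajectory = [state]
for _ in range(5):
    state = gds_step(adj, state)
    trajectory.append(state)
# this configuration flows to the all-ones fixed point in one step
```

Questions such as which update orders give the same dynamics, and what the fixed points and limit cycles are, are exactly what the mathematical GDS framework makes tractable.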
Jonathon Peterson, Cornell University
Traps, slowdowns, and bridges of one-dimensional transient random walks in a random environment
Abstract: In classical random walks on the integers $\mathbb Z$, the transition probabilities governing the motion of the random walk are the same at every site. In contrast, for random walks in random environments (RWRE) the transition probabilities at each site are randomly chosen according to some probability measure.
This seemingly simple modification to classical random walks can create very surprising behaviors such as transience in the opposite direction of the average drift and non-Gaussian limiting distributions.
This unexpected behavior of RWRE is due to the existence of ``traps'' in the environment: portions of the environment that detain the random walk for an unusually long time. An important tool for analyzing the trapping effects of the environment is what is known as the potential of the environment.
In this talk I will explain how the potential of the environment and some simple probability computations can be used to study ``bridges'' of RWRE (an RWRE conditioned to be back at its starting location after $2n$ steps). This is based on joint work with Nina Gantert.
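A toy simulation (an illustration of the setup, with parameter choices of my own) makes the objects concrete: draw a right-step probability $p_i$ at each site, define the potential $V(k) = \sum_{i<k} \log((1-p_i)/p_i)$, and run the walk. Sites with $p_i < 1/2$ create uphill stretches of $V$, which are exactly the traps that slow the walk down.

```python
import math, random

random.seed(42)
L = 200
# random environment: at site i step right w.p. p[i], left w.p. 1 - p[i];
# the range (0.3, 0.9) gives rightward transience but allows trap sites
p = [random.uniform(0.3, 0.9) for _ in range(L)]

# the potential of the environment: traps are steep uphill runs of V
V = [0.0]
for pi in p:
    V.append(V[-1] + math.log((1 - pi) / pi))

def walk_until(target, p, max_steps=10**6):
    """Run the RWRE (reflecting at 0) until it first hits `target`;
    returns the number of steps taken."""
    pos, steps = 0, 0
    while pos < target and steps < max_steps:
        if pos == 0 or random.random() < p[pos]:
            pos += 1
        else:
            pos -= 1
        steps += 1
    return steps

hitting_time = walk_until(L - 1, p)
# hitting_time exceeds L - 1 by the time spent bouncing inside traps
```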
04/15/2010 and 04/16/2010
The 2010 A. J. Duncan Lectures
Andreas Buja, the Wharton School, University of Pennsylvania
Abstract of Thursday's lecture:
Seeing is Believing: Statistical Visualization for Teaching and Data Analysis
This talk is a tour of four statistical applications where visualization proves useful, maybe even essential. Two applications are about teaching certain abstract concepts, and two more applications involve discovery in data analysis. In all four applications the visualizations are dynamic and interactive. It may be surprising that this feature (1) is quite useful and (2) can be entirely programmed in the R language.
The teaching applications are about
1) explaining "regression to the mean" to Statistics 101 students, and
2) illustrating "proper scoring rules" to machine learning researchers.
The data analytic applications show
3) space-time aspects of commercial ocean fishing data, and
4) associations among more than 1,000 variables in autism data.
The last example raises some issues of statistical inference.
Abstract of Friday's lecture:
Statistical Inference for Exploratory Data Analysis and Model Diagnostics
We propose to furnish statistical data visualization with an
inferential framework and protocol, modeled after confirmatory
statistical testing. In this framework, plots take on the role of
test statistics, and human cognition the role of statistical tests.
Statistical significance of "discoveries" is measured by having the
human viewer compare the plot of the real dataset with collections of
plots of simulated datasets. A simple but rigorous protocol that
provides inferential validity is modeled after the ``lineup'' familiar
from criminal legal procedure. Another protocol, modeled after the ``Rorschach'' inkblot test, well known from (pop-)psychology, will help
analysts acclimatize to random variability before being exposed to the
plot of the real data. The proposed protocols will be useful for
exploratory data analysis, with reference datasets simulated by using
a null assumption that structure is absent. The framework is also
useful for model diagnostics in which case reference datasets are
simulated from the model in question. Adopting the protocols will
mean an adjustment in working procedures for data analysts, adding
more rigor, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking.
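The logic of the lineup protocol can be sketched without graphics (my own stripped-down rendering: a scalar statistic stands in for the plot, and permutation stands in for the null simulation): the real dataset is hidden among 19 null datasets simulated under "no structure"; if a viewer can pick out the real one, that is a valid test at level 1/20 = 0.05.

```python
import random

random.seed(7)

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# "real" data with genuine structure: y depends on x
xs = [random.gauss(0, 1) for _ in range(50)]
ys = [0.8 * x + random.gauss(0, 0.6) for x in xs]

# 19 null datasets: permuting y destroys any x-y association
null_stats = []
for _ in range(19):
    perm = ys[:]
    random.shuffle(perm)
    null_stats.append(abs(correlation(xs, perm)))

real_stat = abs(correlation(xs, ys))
# in an actual lineup the 20 scatterplots would be shown in random
# order; "picking out the real plot" corresponds to real_stat
# exceeding all 19 null statistics
rank = sum(s < real_stat for s in null_stats)
```

The same recipe covers model diagnostics: replace the permutation step with simulation from the fitted model.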
Federico Bandi, Johns Hopkins University
Nonparametric Leverage Effects
Abstract: Vast empirical evidence points to the existence of a negative correlation, named "leverage effect," between shocks in volatility and shocks in returns. We provide a nonparametric theory of leverage estimation in the context of a continuous-time stochastic volatility model with jumps in returns, jumps in volatility, or both. Leverage is defined as a flexible function of the state of the firm, as summarized by the spot volatility level. We show that its point-wise functional estimates have asymptotic properties (in terms of rates of convergence, limiting biases, and limiting variances) which crucially depend on the likelihood of the individual jumps and co-jumps as well as on the features of the jump size distributions. Empirically, we find economically important time-variation in leverage with more negative values associated with higher volatility levels.
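The phenomenon itself is easy to reproduce in simulation (a toy discretized stochastic-volatility model with parameters chosen by me; this is not the paper's nonparametric estimator, which works from observed prices alone): generate correlated return and volatility shocks and check that their sample correlation recovers the negative leverage parameter.

```python
import math, random

random.seed(3)
n, dt = 20000, 1.0 / 252
rho = -0.7                      # leverage: corr(return shock, vol shock)
kappa, theta, xi = 5.0, 0.04, 0.3   # mean reversion, long-run var, vol-of-vol

v = theta
ret_shocks, vol_shocks = [], []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
    r = math.sqrt(v * dt) * z1                          # return shock
    dv = kappa * (theta - v) * dt + xi * math.sqrt(v * dt) * z2  # vol shock
    ret_shocks.append(r)
    vol_shocks.append(dv)
    v = max(v + dv, 1e-8)       # keep variance positive

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

leverage = corr(ret_shocks, vol_shocks)   # close to rho
```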
Andrew Barron, Yale University
Information-Theoretic Validation of L1 Penalized Likelihood
Abstract: Simplicity and likelihood are linked by principles of information theory, armed with which various likelihood penalties are developed for model selection. We provide methods for analysis of these procedures based both on properties of statistical risk and properties of data compression. It is classical that penalties based on the number of parameters have these information-theoretic properties, as we review. Here we show that penalties based on L1 norms of coefficients in regression are also both risk valid and codelength valid.
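The L1-penalized likelihood in its most familiar form is the lasso. As a concrete reference point (an illustration of the estimator being validated, not of the talk's risk or codelength analysis), here is coordinate descent with soft-thresholding on a small regression problem:

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding: the proximal map of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n)) / n
            nj = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(zj, lam) / nj   # exact 1-D minimizer
    return b

random.seed(5)
n = 100
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
# the true model uses only the first coordinate
y = [2.0 * row[0] + random.gauss(0, 0.5) for row in X]
b_small = lasso(X, y, lam=0.05)   # light penalty: near least squares
b_large = lasso(X, y, lam=1.0)    # heavy penalty: shrinks toward zero
```

The penalty trades fit for simplicity exactly as in the two-part-code view: a larger lam buys a shorter description of the coefficients at the cost of a longer description of the residuals.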