**Seminar Abstracts (Fall 2011)**

**Carey Priebe **(Johns Hopkins University)

*(1) Clustering Gene Effectors and
(2) Optimizing the Quantity/Quality Trade-off in Connectome Inference*

**Abstract: **JHU AMS has a formal collaborative relationship with
Howard Hughes Medical Institute's Janelia Farm Research Campus.

Opportunities abound.
This seminar will present motivation and preliminary results
for two ongoing projects.

This is joint work with
(1) Youngser Park and Marta Zlatic, and
(2) Joshua Vogelstein and Davi Bock.

**Richard Stanley** (Massachusetts Institute of Technology)

* A Survey of Alternating Permutations *

**Abstract:** An alternating permutation w = a_1 ... a_n of

1,2,...,n is a permutation such that a_i>a_{i+1} if and only if

i is odd. If E_n (called an Euler number) denotes the

number of alternating permutations of 1,...,n, then

\sum_n E_n x^n/n! = sec x + tan x.

We will discuss such topics as other occurrences of Euler numbers in mathematics, umbral enumeration of classes of alternating permutations, and the statistical behavior of longest alternating subsequences of permutations.

**Alan Izenman** (Temple University)

On High-Dimensionality in Multivariate Regression Problems

**Abstract:** As enormous data sets become the norm rather than the exception, statistics as a scientific discipline is changing to keep up with this development. Of particular interest are regression problems in which attention to high dimensionality has become an important part in determining how to proceed. In multiple regression, regularization and sparsity considerations have led to new methodologies for dealing with the high-dimensionality, low sample size situation. In multivariate regression, rank restrictions have led to a reduced-rank regression model that incorporates many of the classical dimensionality-reduction methodologies as special cases. In this talk I will discuss problems of working with regression data when there are a large number of variables and a relatively small number of observations, and I will explore some new graphical ideas in the multivariate case.

**Daniel Robinson** (Johns Hopkins University)

Matrix splitting methods for bound-constrained quadratic programming and linear complementarity problems

**Abstract:** I present two-phase matrix splitting methods for solving bound-constrained quadratic programs (BQPs) and linear complementarity problems (LCPs). The method for solving BQPs uses matrix splitting iterations to generate descent directions that drive convergence of the iterates and rapidly identify those variables that are active at the solution. The second-phase uses this prediction to further refine the active set and to accelerate convergence. The method for solving LCP combines matrix splitting iterations with a "natural" merit function. This combination allows one to prove convergence of the method and maintain excellent practical performance. Once again, a second subspace phase is used to accelerate convergence. I present numerical results for both algorithms on CUTEr test problems, randomly generated problems, and the pricing of American options.

**Genevera Allen** (Baylor College of Medicine, Texas Children's Hospital & Rice University)

Generalizing Principal Components Analysis

**Abstract:** Variables in high-dimensional data sets common in neuroimaging, metabolomics, environmetrics and time series often exhibit complex dependencies that can arise, for example, from spatial and/or temporal processes or latent network structures. Conventional multivariate analysis techniques often ignore these relationships. We propose a generalization of the singular value decomposition that is appropriate for transposable matrix data, or data in which neither the rows nor the columns can be considered independent instances. By finding the best low rank approximation of the data with respect to a transposable quadratic norm, we introduce a novel decomposition, entitled the *Generalized least squares Matrix Decomposition* (GMD). This decomposition can be used to perform generalized principal components analysis (GPCA) which directly accounts for dependencies in the data. We also introduce a framework for regularizing the factors of the GMD that can be used to perform sparse GPCA, functional GPCA, non-negative GPCA, and sparse non-negative GPCA. Additionally, we develop fast numerical algorithms allowing one to apply our methods to massive data sets. Through simulations and real data examples on functional MRI data and NMR spectroscopy, we demonstrate the utility of GPCA and sparse GPCA for dimension reduction, signal recovery, and feature selection with high-dimensional transposable data.

**Michael Damron** (Princeton University)

*Limit Shapes Outside the Percolation Cone*

**Abstract:** First-passage percolation deals with the large-scale
geometry of the randomly weighted graph obtained by placing i.i.d.
non-negative weights on the edges of the standard nearest-neighbor
graph on the square lattice. The “shape theorem” of Durrett and
Liggett states that under mild conditions, if B(r) is the ball of
radius r about the origin in the weighted graph metric, then with
probability one, B(r)/r converges uniformly to a deterministic compact
convex set C. In joint work with Mike Hochman and more recent work
with Tuca Auffinger, we investigate the limiting shape for a broad
class of distributions for which there is a coupling with an
underlying oriented percolation model. We can show that in these cases
the limit shape must be non-polygonal. I will explain this theorem and
its various consequences, relating to infinite coexistence in
stochastic growth and competition models, and, if time permits, shape
fluctuation theorems.

**Joshua Epstein** (Johns Hopkins University - Dept of Emergency Medicine)

*Mathematical and Agent-Based Computational Modeling of Contagion Dynamics*

**Abstract:** Following a compact development of the classical infectious disease model, Epstein will present mathematical and agent-based computational extensions,

impinging on such phenomena as pandemic flu, political upheaval, and contagious fear. Models at scales ranging from the school playground to the entire planet will be presented.

**Jan Hannig** (University of North Carolina)

*On Generalized Fiducial Inference*

**Abstract: **R. A. Fisher's fiducial inference has been the subject of many discussions and controversies ever since he
introduced the idea during the 1930's. The idea experienced a bumpy ride, to say the least, during its early years
and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears
to have made a resurgence recently under various guises. For example under the new name generalized inference fiducial
inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good
properties were previously unavailable. Therefore we believe that the fiducial argument of R.A. Fisher deserves a fresh look from a new angle.

In this talk we first generalize Fisher's fiducial argument and obtain a fiducial recipe applicable in virtually any situation. We demonstrate this fiducial recipe on examples of varying complexity. We also investigate, by simulation and by theoretical considerations, some properties of the statistical procedures derived by the fiducial recipe showing they often possess good repeated sampling, frequentist properties. As an example we apply the recipe to interval observed mixed linear model.

Portions of this talk are based on a joined work with Hari Iyer, Thomas C.M. Lee and Jessi Cisewski.

**Yulia Gel** (Visiting Associate Professor at Johns Hopkins University from University of Waterloo)

*Banded estimation and prediction for linear time series*

**Abstract: **
This talk discusses banded regularization of an empirical autocovariance matrix and its impact on model estimation and forecasting of a linear weakly dependent time series which does not degenerate to a finite dimensional representation. In particular, we show that banding enables us to employ an approximating model of a much higher order than typically suggested by AIC, while controlling how many parameters are to be estimated precisely and the level of accuracy. We present results on asymptotic consistency of banded autocovariance matrices under the Frobenius norm and the same realization of time series, and provide a theoretical justification on optimal band selection using cross-validation. Remarkably, the cross-validation loss function for banded prediction is related to the conditional mean square prediction error (MSPE) and, thus, may be viewed as an alternative model selection criterion. The proposed procedure is illustrated by simulations and application to predicting sea surface temperature (SST) index in the Nino 3.4 region. This is a joint work with Peter Bickel, University of California, Berkeley.

**Robert Koyak** (Naval Postgraduate School)

*Detecting Change in Multivariate Data Streams Using Minimum Subgraphs*

**Abstract: **Consider a sequence of independent, multivariate observations on which we test whether their sampling distributions are the same (homogeneity) or change in a systematic manner (heterogeneity). Such change may consist of single “jump” to a new distribution at an unspecified point (“time”) in the observation sequence or a gradual drift such that the metricized distributional change increases with time separation. A matrix of inter-point distances provides a rich store of information about the distributional structure of the observations. When used as edge weights in a complete, undirected graph with vertices consisting of the sequence labels, interesting possibilities arise for tapping into this information with minimum subgraphs of various kinds, including spanning trees and nonbipartite matchings. These subgraphs can yield nonparametric tests for multivariate homogeneity that are quite powerful. With the exception of minimum spanning trees, however, the theory behind these procedures is largely undeveloped. We begin by reviewing these procedures, and then discuss both computational and theoretical challenges to broadening their applicability.

**The 2011 A. J. Goldman Lecture**

**Dimitris Bertsimas** (Massachusetts Institute of Technology)

*A computationally tractable theory of performance analysis in stochastic systems*

**Abstract:**Modern probability theory, whose foundation is based on the axioms set forth by Kolmogorov, is currently the major tool for performance analysis in stochastic systems. While it offers insights in understanding such systems, probability theory is really not a computationally tractable theory. Correspondingly, some of its major areas of application remain unsolved when the underlying systems become multidimensional: Queuing networks, network information theory, pricing multi-dimensional financial contracts, auction design in multi-item, multi-bidder auctions among others.

We propose a new approach to analyze stochastic systems based on robust optimization. The key idea is to replace the Kolmogorov axioms as primitives of probability theory, with some of the asymptotic implications of probability theory: the central limit theorem and law of large numbers and to define appropriate robust optimization problems to perform performance analysis. In this way, the performance analysis questions become highly structured optimization problems (linear, conic, mixed integer) for which there exist efficient, practical algorithms that are capable of solving truly large scale systems.

We demonstrate that the proposed approach achieves computationally tractable methods for (a) analyzing multiclass queuing networks, (b) characterizing the capacity region of network information theory and associated coding and decoding methods generalizing the work of Shannon, (c) pricing multi-dimensional financial contracts generalizing the work of Black, Scholes and Merton, (d) designing multi-item, multi-bidder auctions generalizing the work of Myerson.

This is joint work with my doctoral student at MIT Chaithanya Bandi.**Jason Eisner **(Johns Hopkins University)

*A Non-Parametric Bayesian Approach to Inflectional Morphology*

**Abstract: **Have you ever studied a foreign language and had to memorize verb conjugations? How many regular and irregular verbs did you have to study before you could generalize the patterns to new verbs? Were you able to acquire more new verbs and new patterns by reading text in the foreign language? These are problems of statistical inference. You are inferring a distribution over vectors of related strings: <amo, amas, amat, amamus, amatis, amant>.

In this talk, we formalize this problem and give approximate inference algorithms. We first develop a model that stochastically generates a new language's grammatical patterns, dictionary, and text. (There is no bound on the length of the dictionary or the length of the strings in the dictionary.) We then show how we infer posterior marginals and modes of the unobserved random variables, using dynamic programming wrapped in loopy belief propagation inside Monte Carlo EM.

This is joint work with Markus Dreyer.