Seminar Abstracts (Spring 2014)
Warren B. Powell (Princeton University)
Energy and Uncertainty: From the laboratory to policy, unifying the fields of stochastic optimization
Abstract: Errors in forecasting wind and clouds, generator failures, market responses to price signals, unexpected outcomes in laboratory experiments, breakthroughs in storage technology, the heavy-tailed volatility of electricity prices, our shifting understanding of climate change, and the policies of state and federal governments. These are all examples of the different types of uncertainty that we have to deal with as we plan an energy future with the diminished footprint of fossil fuels. We have to understand these sources of uncertainty as we search for new materials, design and operate the power grid, work with energy markets, manage the energy demands of buildings and residences, and guide policy. We would like to guide these decisions using analytics, but we lack even a fundamental vocabulary for writing down these problems mathematically. Stochastic optimization is a jungle of mathematically arcane, computationally complex models and algorithms which have to be tuned for each application. In this talk, I will provide a brief overview of a mathematical framework that can be taught to any undergraduate in science and engineering. I will then unify a sprawling literature using four fundamental classes of policies. Finally, I will illustrate these using several applications in energy systems, closing with a description of SMART-ISO.
Clearing the Jungle of Stochastic Optimization
Abstract: Mathematical programming has given us a widely used canonical framework for modeling deterministic optimization problems. However, if we introduce a random variable, the field fragments into a number of competing communities with names such as stochastic programming, dynamic programming, stochastic search, simulation optimization, and optimal control. Differences between these fields are magnified by notation and terminology, which hide subtle but more important differences in problem characteristics. Complicating matters further is a misunderstanding of basic terms such as dynamic program, policy, and state variable. While deterministic optimization problems are defined by solving for decisions, sequential stochastic optimization problems are characterized by finding functions known as policies. I will identify four fundamental classes of policies which unify the competing subcommunities into a common framework I call computational stochastic optimization. These ideas will be illustrated using applications drawn from energy, health, freight transportation, and vehicle navigation.
John Benedetto (University of Maryland)
Fourier operators in applied harmonic analysis
Abstract: The short-time Fourier transform (STFT) is formulated to address problems in waveform design and optimal ambiguity function behavior, to define vector-valued Discrete Fourier Transforms (DFTs) for modelling of multi-sensor environments, and to prove non-uniform sampling formulas and applicable pseudo-differential operator inequalities in terms of balayage. The technology combines methods from harmonic analysis, number theory, finite Gabor systems associated with compressive sensing, and the theory of frames.
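As a toy illustration of the short-time Fourier transform mentioned above (not taken from the talk; the window and hop choices are arbitrary), one can compute an STFT by sliding a window along the signal and taking an FFT of each frame:

```python
import numpy as np

def stft(x, win_len=64, hop=32):
    """Short-time Fourier transform via sliding windowed FFTs."""
    window = np.hanning(win_len)
    frames = [x[i:i + win_len] * window
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

# A chirp whose instantaneous frequency rises over time.
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * (50 + 200 * t) * t)
S = stft(signal)
print(S.shape)  # (number of frames, win_len // 2 + 1) = (31, 33)
```

Each row of `S` is the local spectrum of one time slice; the magnitude array `np.abs(S)` is the familiar spectrogram.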
Roger Peng (Johns Hopkins University/School of Public Health, Biostatistics)
Estimating the health benefits of reducing particulate matter air pollution
Abstract: There is increasing evidence that particulate matter (PM) air
pollution is harmful to human health and can cause a range of
cardiovascular and respiratory problems. The relationship between PM
and health outcomes in humans has been studied using a variety of
study designs ranging from large-scale ecologic studies to small-scale
challenge studies, with each design having advantages and
disadvantages. In this talk we explore an intermediate study design
that involves introducing an intervention designed to lower air
pollution levels without explicitly controlling its level and describe
the statistical methods used to analyze data from such studies. We
present results from a randomized air cleaner intervention study
conducted in a population of Baltimore City children with asthma.
Using the method of principal stratification, we found that the
number of asthma-related symptom-free days increased significantly in
the subgroup of children for whom the intervention decreased indoor
pollutant levels. This effect was larger than the overall treatment
effect. We also found some evidence of a positive treatment effect in
the subgroup of children for whom the intervention had no effect on
pollutant levels. Overall, this study found evidence that modifying
indoor pollution levels improves symptom outcomes in children with
asthma and that interventions designed to lower pollution levels may
have other benefits that warrant further exploration.
Peter Bickel (University of California, Berkeley)
Surprising phenomena in inference for iid p-dimensional data for 0 < p/n < 1 and for p/n -> oo
Abstract: I. We study the behavior of robust regression and least squares in the unpenalized case for 0 < p/n < 1. Surprisingly, it turns out that individual coefficients are still asymptotically normal and unbiased at rate 1/sqrt(n), but the variance differs from the fixed-p situation, with important implications for inference.
Heuristics: Noureddine El Karoui, Derek Bean, and Bin Yu (2012)
Proofs: Noureddine El Karoui (2013); D. Donoho and Andrea Montanari (2013)
II. It is recognized, through earlier work of Huber and of Diaconis and Freedman, that if the coordinates are iid (or under weaker conditions), then as p and n -> oo the marginal distributions of almost all projections are asymptotically Gaussian, so that non-Gaussian empirical distributions suggest nonlinear phenomena. Suppose that the data are, in fact, coordinatewise iid Gaussian, so that the distribution of every projection is Gaussian. We show that, if p/n -> oo, then given any distribution F, a projection can be found whose empirical df is arbitrarily close to F.
(with Boaz Nadler)
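The first phenomenon in part II, that projections of iid non-Gaussian coordinates look Gaussian, is easy to see numerically. The following sketch (an illustration under arbitrary choices of distribution and dimensions, not code from the talk) projects iid centered exponential data onto a random unit direction and checks that the projection's low moments match a standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 200

# iid non-Gaussian coordinates: centered exponential, mean 0, variance 1.
X = rng.exponential(1.0, size=(n, p)) - 1.0

# A random unit-norm projection direction.
u = rng.standard_normal(p)
u /= np.linalg.norm(u)

proj = X @ u  # n projected data points

# Despite the skewed coordinates, the projection is close to N(0, 1).
print(round(float(proj.mean()), 2), round(float(proj.var()), 2))
```

The contrast with part II's second claim is that when p/n -> oo there are so many directions available that some projection can be made to match essentially any target distribution, even for genuinely Gaussian data.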
Sofia Olhede (University College London)
Network histograms and universality of blockmodel approximation
Abstract: Networks are fast becoming part of the modern statistical landscape. Yet we lack a full understanding of their large-sample properties in all but the simplest settings, hindering the development of models and estimation methods that admit theoretical performance guarantees. A network histogram is obtained by fitting a stochastic blockmodel to a single observation of a network dataset. Blocks of edges play the role of histogram bins, and community sizes that of histogram bandwidths or bin sizes. Just as standard histograms allow for varying bandwidths, different blockmodel estimates can all be considered valid representations of an underlying probability model, subject to bandwidth constraints. We show that under these constraints, the mean integrated square error of the network histogram tends to zero as the network grows large, and we provide methods for optimal bandwidth selection, thus making the blockmodel a universal representation. With this insight, we discuss the interpretation of network communities in light of the fact that many different community assignments can all give an equally valid representation of the network. To demonstrate the fidelity-versus-interpretability tradeoff inherent in considering different numbers and sizes of communities, we show an example of detecting and describing new network community microstructure in political weblog data.
This is joint work with Patrick Wolfe (UCL).
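The "blocks as histogram bins" idea can be made concrete in a few lines. The sketch below (an illustration with made-up parameters, not code from the work) simulates a two-block stochastic blockmodel and then recovers the histogram heights as the average edge density within each block pair, assuming the community labels are given:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-block stochastic blockmodel: within-block density 0.5, between 0.1.
n = 60
z = np.repeat([0, 1], 30)                       # known community labels
P = np.array([[0.5, 0.1], [0.1, 0.5]])[np.ix_(z, z)]
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                                      # symmetric, no self-loops

# Histogram heights: average edge density in each block pair ("bin").
theta = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        block = A[np.ix_(z == a, z == b)]
        mask = np.ones_like(block, dtype=bool)
        if a == b:
            np.fill_diagonal(mask, False)        # drop the self-loop diagonal
        theta[a, b] = block[mask].mean()
print(theta.round(2))
```

The estimated `theta` matrix recovers the generating densities (about 0.5 and 0.1 here); in the network-histogram framework, the number and sizes of the communities play the role of the bandwidth being optimized.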
Pablo A. Parrilo (Massachusetts Institute of Technology)
Convex sets, conic matrix factorizations and rank lower bounds
Abstract: In optimization one often represents convex sets in terms of convex cones. Such representations or 'lifts' of a convex set are especially useful if the cone admits efficient algorithms for linear optimization over its affine slices, as in the classical cases of linear and semidefinite programming. Despite the fact that these techniques are widely used, there are many aspects (particularly, existence and efficiency) that are still poorly understood. In this talk we discuss the relationship between conic representations of convex sets, and a special "conic" factorization of an operator associated to the convex set, generalizing earlier results of Yannakakis on polyhedral lifts of polytopes and nonnegative factorizations. When the cones live in a family, our results lead to the definition of the rank of a convex set with respect to this family (e.g., the positive semidefinite rank of a convex set), as well as techniques for lower bounding these ranks. We will provide a gentle introduction to these techniques, emphasizing geometric intuition, open questions as well as recent results. Based on joint work with Joao Gouveia, Hamza Fawzi, James Saunderson and Rekha Thomas.
Alberto Del Pia (IBM Watson Research Center)
Integer Polynomial Programming in the Plane
Abstract: In this seminar, we attempt to classify the complexity of integer programming in two variables when polynomials enter as the objective and/or the constraints. We show that in two variables, optimizing a quadratic polynomial over the integer points in a polyhedral region is solvable in polynomial time, while optimizing a quartic polynomial over the same type of region is NP-hard; moreover, the optimal solutions can be so large that writing any one of them down in binary requires space exponential in the size of the problem.
Laurent Saloff-Coste (Cornell University)
THE DUNCAN LECTURES
Groups and Random Walks
Abstract: The notion of group is both familiar to all mathematicians and astonishing in the complexity of the objects it describes. Random walks offer natural ways to explore these extraordinary structures. In the early 1960s, Harry Kesten suggested that only very few groups could carry random walks that never got lost at infinity. It took 25 years before N. Varopoulos proved Kesten right. Using concrete examples, this lecture will take us through the history of this subject from Karl Pearson, Henri Poincaré, and George Pólya, to Mark Kac leaving Johns Hopkins for Cornell in the summer of 1939, to some recent progress in our understanding of random walks on groups.
04/04/14 - 10:00am - WH 304
Random Walk Invariants of Groups
Abstract: This talk will be centered around the fact that the probability of return of a random walk on a group can be viewed as an invariant of the group. Namely, for a large natural class of random walks, the large-n behavior of the probability of return at time n depends only on the group, not on the particular random walk considered. This property makes the study of this quantity both very natural and possible.
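For the simplest example, the group Z with the nearest-neighbor walk, the return probability at time n decays like n^(-1/2), and any walk on Z in the relevant class shows the same power. A Monte Carlo sketch (a toy illustration, not from the talk):

```python
import random

def return_prob(steps, trials=20000, seed=42):
    """Monte Carlo estimate of P(simple walk on Z is at 0 after `steps` steps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = sum(rng.choice((-1, 1)) for _ in range(steps))
        if pos == 0:
            hits += 1
    return hits / trials

# Quadrupling n roughly halves the return probability: the n^(-1/2) decay.
for n in (16, 64, 256):
    print(n, return_prob(n))
```

The exact values here are binomial, P(S_n = 0) = C(n, n/2) / 2^n for even n, which is about 0.196 at n = 16 and 0.099 at n = 64; the point of the invariance result is that the n^(-1/2) exponent is a property of Z itself.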
Carey Priebe (Johns Hopkins University)
Testing the cortical column conjecture
Abstract: Many contemporary theories of neural information processing suggest that the neocortex employs hierarchical algorithms composed of repeated instances of a limited set of computing primitives. There is a recognized need for tools for interrogating the structure of the cortical microcircuits believed to embody these primitives. The cortical column conjecture suggests that neurons are connected in a graph that exhibits motifs representing repeated processing modules. The question we investigate herein is this: Can we design a test that will allow us to decide whether an observed cortical graph satisfies the conjecture? For our purposes, we model a cortical graph as a hierarchical block model on n vertices (neurons) which contains induced subgraphs which are themselves independent stochastic block models. To enable the desired inference we must first identify subgraphs and then test whether these identified subgraphs correspond to repeated motifs. We demonstrate the efficacy of our preliminary dabbling via bio-inspired simulation. (This is joint work with Youngser Park & Vince Lyzinski & Minh Tang & Daniel Sussman & Joshua T. Vogelstein & R. Jacob Vogelstein &c.)
Andreas Waechter (Northwestern University)
Exploiting Inexact Subproblem Solutions in Nonlinear Optimization Algorithms
Abstract: Numerical algorithms for nonlinear constrained nonconvex optimization problems require the solution of subproblems, typically linear systems or quadratic optimization problems, in order to compute a new iterate. In many application areas, including problems involving ordinary or partial differential equations, the solution of these subproblems is a computational bottleneck, and it is highly desirable to work with approximate but quickly computable subproblem solutions. The challenge with such an approach lies in the development of suitable termination criteria for the subproblem solver that still lead to a convergent overall algorithm, particularly in the presence of inequality constraints. In this talk we present new frameworks for interior point and sequential quadratic programming methods. Numerical results on PDE-constrained and optimal control problems are presented to illustrate the reliability and efficiency of the proposed approach.
Rafael Mendoza-Arriaga (McCombs School of Business at the University of Texas at Austin)
Random Time Changes in Quantitative Finance
Abstract: Subjecting a stochastic process to a random time change is a classical technique in stochastic analysis. In this talk we survey our recent work on using random time changes as a flexible model-building tool for designing stochastic processes that possess rich features required for empirical realism in financial modeling. These features include state-dependent and time-inhomogeneous jumps and stochastic volatility. Moreover, our modeling framework offers analytical and computational tractability, which are required for operational purposes. We sketch applications across credit, commodity, equity, power generation systems and insurance.
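A minimal numerical sketch of the time-change idea (an illustration with arbitrary parameters, not a model from the talk): evaluating Brownian motion at a random "business clock" drawn from a gamma distribution, rather than at calendar time, produces increments with the heavy tails seen in financial returns.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Calendar-time increments: B(1) is standard Gaussian.
calendar = rng.standard_normal(n)

# Time-changed increments: B(tau) with tau a gamma-distributed random clock
# (a variance-gamma-style construction); conditionally N(0, tau).
tau = rng.gamma(shape=1.0, scale=1.0, size=n)
time_changed = np.sqrt(tau) * rng.standard_normal(n)

def excess_kurtosis(x):
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean() ** 2 - 3.0

# ~0 for the Gaussian sample, strictly positive after the time change.
print(round(excess_kurtosis(calendar), 1),
      round(excess_kurtosis(time_changed), 1))
```

Richer versions of this construction, with state-dependent and time-inhomogeneous clocks, are what deliver the jumps and stochastic volatility described in the abstract while keeping the model analytically tractable.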
Mengjie Chen (Yale University)
Two high dimensional statistical methods for data integration challenges in biology
Abstract: The last ten years have witnessed the delivery of an incredible amount of information through big-biology consortia, where multiple data types have been cataloged for the same samples. The integrative nature of these studies necessitates the development of novel statistical methods that can leverage multiple types of information. In this presentation, I’ll talk about two studies. In the first study, we developed a sparse canonical correlation analysis (sCCA) method for joint characterization of two sets of random variables in high-dimensional settings. Our method is capable of incorporating general covariance structure with computational efficiency. In addition, it is the first work to establish the convergence rate for sCCA. We applied this method to investigate the relationship between genome-wide methylation sites and gene expression that are marginally associated with clinical outcomes. The second study was motivated by the need to detect gene-gene connections while removing the effect of genotypes. The problem has been formulated as joint estimation of the multiple regression coefficients and the precision matrix in Gaussian settings. We have developed a tuning-free sparse estimation procedure, ANTAC, which provides asymptotically normal estimation for covariate-adjusted Gaussian graphical models. A case study using a yeast eQTL dataset will be discussed.
Jim Fill (Johns Hopkins University)
Tamás Budavári (Johns Hopkins University)
Inference from Large-Scale Astronomy Data
Abstract: Dedicated telescopes systematically survey the sky every night across the entire electromagnetic spectrum. From these observations we expect to have 100PB of images in the next 10 years. Astronomy projects typically require data from a number of telescopes, and many rely on multiple epochs to study the variable sky. We will discuss recent developments that properly combine separate collections and enable their joint analysis.