Statistics plays a pivotal role in today’s big/small data challenges. My research interests lie in developing Bayes theory and methods for a broad range of statistical problems, such as high-dimensional data analysis, nonparametric problems, uncertainty quantification, and large-scale heterogeneous data problems. I also develop new Bayesian methods for various applications, including electronic health records, dynamic treatment regimens, cancer genomics, early detection of Alzheimer’s disease, mental health in people with HIV, early-phase clinical trial designs, and material engineering. Examples of previous and ongoing research conducted by the research group are listed below, including 1) Bayes theory and methods for high-dimensional data; 2) Methods and applications to HIV related studies; 3) Reinforcement learning and Dynamic treatment regimens for precision medicine; 4) Bayesian early-phase clinical trial designs; 5) Interpretable Augmented Intelligence for Material Engineering.
In contemporary statistics, datasets are often high-dimensional, with a dimension that can be significantly larger than the sample size. In the high-dimensional setting, additional structural assumptions are often necessary to address the challenges associated with statistical inference. For example, sparsity is introduced for sparse covariance/precision matrix estimation, and low-rank structure is enforced in spiked covariance matrix models. Take network data analysis as an example. Latent position graphs have proven useful for a variety of network analysis problems, and we focus on a particular class of latent position graphs: the random dot product graphs. They are simple in architecture but can be used as building blocks for approximating more general latent position graphs with positive definite link functions. So far, the techniques for statistical analysis of random dot product graphs have focused on spectral methods, e.g., the adjacency spectral embedding (ASE), whereas the likelihood information is neglected. Furthermore, two questions remain open: what is the minimax risk for estimating the latent positions, and how can one construct a useful estimator that achieves it? The overall goal is to establish a complete theoretical framework for Bayesian models of random dot product graphs by showing both first-order and second-order optimality.
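To make the random dot product graph and the spectral baseline concrete, the following is a minimal sketch under simplifying assumptions, not the group's Bayesian method. It assumes one-dimensional latent positions x_i in (0, 1), so that edges are drawn independently with probability x_i * x_j, and it computes a rank-one ASE by power iteration using only the Python standard library. The function names simulate_rdpg and ase_rank_one are hypothetical and introduced here purely for illustration.

```python
import math
import random


def simulate_rdpg(latent, rng):
    """Sample a symmetric adjacency matrix with no self-loops,
    where P(A_ij = 1) = latent[i] * latent[j] for i != j."""
    n = len(latent)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < latent[i] * latent[j]:
                adj[i][j] = adj[j][i] = 1
    return adj


def ase_rank_one(adj, iters=100):
    """Rank-one adjacency spectral embedding: sqrt(top eigenvalue) times
    the top eigenvector, with the eigenpair found by power iteration."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n  # positive start keeps the sign fixed
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in adj]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # empty graph: nothing to embed
            return [0.0] * n
        v = [x / norm for x in w]
    av = [sum(a * x for a, x in zip(row, v)) for row in adj]
    eigval = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient
    return [math.sqrt(max(eigval, 0.0)) * x for x in v]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [rng.uniform(0.3, 0.9) for _ in range(150)]
    estimate = ase_rank_one(simulate_rdpg(latent, rng))
    err = sum(abs(a - b) for a, b in zip(latent, estimate)) / len(latent)
    print(f"mean absolute error of the ASE estimate: {err:.3f}")
```

The estimator uses only the leading eigenpair of the adjacency matrix and ignores the Bernoulli likelihood of the edges, which is the information a likelihood-based or Bayesian approach would additionally exploit.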
Selected Publications:
Grants:
Although combination antiretroviral therapy (ART) is highly effective in suppressing viral load for people with HIV (PWH), many ART agents may exacerbate central nervous system (CNS)-related adverse effects, including depression. Therefore, understanding the effects of ART drugs on CNS function, especially mental health, can help clinicians personalize medicine with fewer adverse effects for PWH and keep them from discontinuing ART, thereby avoiding undesirable health outcomes and an increased likelihood of HIV transmission. The emergence of electronic health records offers researchers unprecedented access to HIV data, including individuals’ mental health records, drug prescriptions, and clinical information over time. However, modeling such data is very challenging due to the high dimensionality of the drug combination space, individual heterogeneity, and the sparseness of the observed drug combinations. We develop Bayesian approaches to learn longitudinal drug effects and drug combination effects on mental health in PWH, adjusting for socio-demographic, behavioral, and clinical factors. Our method has clinical utility in guiding clinicians to prescribe more informed and effective personalized treatment based on individuals’ treatment histories and clinical characteristics.
Selected Publications:
Selected Grants:
Selected Collaborators:
Traditional statistical methods for dynamic treatment regimes usually focus on estimating an optimal sequence of treatments at given medical interventions, but overlook the important question of “when should this intervention happen?” This project fills this gap by building a generative probabilistic model for a sequence of medical interventions (which are discrete events in continuous time) with a marked temporal point process, where the mark is the assigned treatment or dosage. This decision model is then embedded into a Bayesian joint framework that also models clinical observations, including longitudinal clinical measurements and time-to-event data. We also develop a policy gradient method that trains the decision model through interaction with the observation model, in order to learn the personalized optimal clinical decision with the goal of optimizing patients’ health outcomes. Moreover, we have built an R package, doct (short for “Decisions Optimized in Continuous Time”), so that users can apply the proposed method to datasets in a similar setup that involves longitudinal decision making and an objective reward to optimize.
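As a rough illustration of what a marked temporal point process for interventions looks like, the sketch below simulates (time, mark) pairs by thinning. It is not the model in the paper and not the API of the doct package: the intensity function, the mark distribution, and all names (simulate_marked_tpp, visit_intensity, dose_mark) are hypothetical choices made only for this example, written in Python with the standard library.

```python
import math
import random


def simulate_marked_tpp(intensity, lambda_max, sample_mark, horizon, rng):
    """Simulate events on (0, horizon] by thinning.

    intensity(t, events) is the conditional intensity given the history and
    must never exceed lambda_max; sample_mark draws the mark of an accepted
    event. Returns a list of (time, mark) pairs in increasing time order.
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(lambda_max)  # candidate from the dominating process
        if t > horizon:
            return events
        # Accept the candidate with probability intensity / lambda_max.
        if rng.random() * lambda_max < intensity(t, events):
            events.append((t, sample_mark(t, events, rng)))


def visit_intensity(t, events):
    """Hypothetical intensity: baseline 0.5 plus a bump that decays after
    the most recent intervention, so it is bounded above by 1.5."""
    if not events:
        return 0.5
    return 0.5 + math.exp(-(t - events[-1][0]))


def dose_mark(t, events, rng):
    """Hypothetical mark: a high dose is more likely after a short gap."""
    gap = t - events[-1][0] if events else t
    return "high" if rng.random() < 1.0 / (1.0 + gap) else "low"


if __name__ == "__main__":
    history = simulate_marked_tpp(
        visit_intensity, 1.5, dose_mark, horizon=10.0, rng=random.Random(0)
    )
    for time, dose in history:
        print(f"t = {time:5.2f}  dose = {dose}")
```

In a decision model of this kind, both the intensity (when to intervene) and the mark distribution (which treatment or dosage to assign) would depend on the patient's observed history, and those are the components a policy gradient method would tune.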
Selected Publications:
Hua W+, Mei H, Zohar S, Giral M and Xu Y#, “Personalized Dynamic Treatment Regimes in Continuous Time: A Bayesian Joint Model for Optimizing Clinical Decisions with Timing.” arXiv:2007.04155 / R package doct
Xu Y, Müller P, Wahed A and Thall P, “Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times (with discussion).” Journal of the American Statistical Association 111.515 (2016): 921-950. arXiv:1405.2656 / software / supplement
(Winner of the 2015 David P. Byar Young Investigator Travel Award Sponsored by ASA Biometrics Section)
Grants:
Developing targeted therapies based on patients’ baseline characteristics and genomic profiles, such as biomarkers, has attracted growing interest in recent years. Depending on patients’ clinical characteristics and the expression of specific biomarkers or their combinations, different patient subgroups could respond differently to the same treatment. An ideal design, especially at the proof-of-concept stage, should search for such subgroups and adapt dynamically as the trial goes on. When no prior knowledge is available on whether the treatment works in the all-comer population or only in a subgroup defined by one or several biomarkers, it is necessary to incorporate adaptive estimation of the heterogeneous treatment effect into the decision-making at interim analyses. To address this problem, we propose an Adaptive Subgroup-Identification Enrichment Design, ASIED, to simultaneously search for predictive biomarkers, identify the subgroups with differential treatment effects, and modify study entry criteria at interim analyses when justified. More importantly, we construct robust quantitative decision-making rules for population enrichment when the interim outcomes are heterogeneous, in the context of a multilevel target product profile, which defines the minimal and targeted levels of treatment effect. Through extensive simulations, ASIED is shown to achieve desirable operating characteristics and to compare favorably against alternatives.
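The idea of an interim rule tied to minimal and targeted effect levels can be sketched generically as follows. This is not the ASIED rule: it assumes binary responses, independent Beta(1, 1) priors, a pre-specified subgroup, and hypothetical values for the minimal level (0.10), the targeted level (0.25), and the probability cutoff (0.8). The function names are likewise invented for this illustration, which uses only the Python standard library.

```python
import random


def prob_effect_exceeds(arm_data, level, rng, draws=4000):
    """Monte Carlo estimate of P(p_trt - p_ctl > level | data) under
    independent Beta(1, 1) priors on the two response rates.

    arm_data = (responders_trt, n_trt, responders_ctl, n_ctl).
    """
    resp_trt, n_trt, resp_ctl, n_ctl = arm_data
    hits = 0
    for _ in range(draws):
        p_trt = rng.betavariate(1 + resp_trt, 1 + n_trt - resp_trt)
        p_ctl = rng.betavariate(1 + resp_ctl, 1 + n_ctl - resp_ctl)
        if p_trt - p_ctl > level:
            hits += 1
    return hits / draws


def interim_decision(all_comers, subgroup, rng,
                     minimal=0.10, target=0.25, cutoff=0.8):
    """Hypothetical rule: keep the all-comer population if its effect likely
    exceeds the minimal level, otherwise enrich to the subgroup if the
    subgroup effect does, otherwise stop. Also reports the posterior
    probabilities of reaching the targeted level."""
    probs = {
        "all_minimal": prob_effect_exceeds(all_comers, minimal, rng),
        "all_target": prob_effect_exceeds(all_comers, target, rng),
        "sub_minimal": prob_effect_exceeds(subgroup, minimal, rng),
        "sub_target": prob_effect_exceeds(subgroup, target, rng),
    }
    if probs["all_minimal"] >= cutoff:
        decision = "continue all-comers"
    elif probs["sub_minimal"] >= cutoff:
        decision = "enrich to subgroup"
    else:
        decision = "stop for futility"
    return decision, probs


if __name__ == "__main__":
    decision, probs = interim_decision(
        all_comers=(14, 40, 12, 40), subgroup=(12, 15, 3, 15),
        rng=random.Random(0),
    )
    print(decision, probs)
```

A full adaptive enrichment design would additionally search over candidate biomarkers to define the subgroup and would calibrate the cutoffs by simulation to obtain the desired operating characteristics.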
Selected Publications:
Materials discovery and development depend on understanding and harnessing complexity and dynamics across scales, from 3D atomic-level detail to component-level performance. This project will use recent advances in data science to understand structure-property relationships in materials and to make accurate and robust property predictions. We will use available data more efficiently by combining it with physical rules and prior knowledge, in order to develop an interpretable augmented intelligence (AI) system that learns the principles behind the association of input structures with material properties, with uncertainty quantification. Our interdisciplinary team includes both domain scientists and data scientists, shown below.
Grants:
Collaborators: