Course Descriptions - Statistical Analysis/Decision Theory/Data Analysis
550.413 Applied Statistics and Data Analysis: An introduction to basic concepts, techniques, and major computer software packages in applied statistics and data analysis. Topics include numerical descriptive statistics, observations and variables, sampling distributions, statistical inference, linear regression, multiple regression, design of experiments, nonparametric methods, and sample surveys. Real-life data sets are used in lectures and computer assignments. Intensive use of statistical packages such as S+ to analyze data. Prerequisite: 550.112 or equivalent.
550.430 Introduction to Statistics: Introduction to the basic principles of statistical reasoning and data analysis. Emphasis on techniques of application. Classical parametric estimation, hypothesis testing, and multiple decision problems; linear models, analysis of variance, and regression; nonparametric and robust procedures; decision-theoretic setting, Bayesian methods. Prerequisite: 550.420 or approved alternative.
550.432 Linear Statistical Models: The general linear model in matrix terms. Techniques of applications, with use of statistical computer packages. Multiple regression, polynomial regression, stepwise regression, multicollinearity, reparametrization, normal correlation models and analysis; basic and multifactor analysis of variance, fixed and random effects. Prerequisites: 550.430, 550.291 or approved alternative.
550.436 Data Mining: Data mining is a relatively new term used in the academic and business world, often associated with the development and quantitative analysis of very large databases. Its definition covers a wide spectrum of analytic and information technology topics, such as machine learning, artificial intelligence, statistical modeling, and efficient database development. This course will review these broad topics, and cover specific analytic and modeling techniques such as advanced data visualization, decision trees, neural networks, nearest neighbor, clustering, logistic regression, and association rules. Although some of the mathematics underlying these techniques will be discussed, our focus will be on the application of the techniques to real data and the interpretation of results. Because use of the computer is extremely important when “mining” large amounts of data, we will make substantial use of data mining software tools to learn the techniques and analyze datasets. Prerequisite: 550.310 or equivalent. Recommended prerequisite: 550.413.
550.439 Time Series Analysis: Time series analysis from the frequency and time domain approaches. Descriptive techniques; regression analysis; trends, smoothing, prediction; linear systems; serial correlation; stationary processes; spectral analysis. Prerequisites: 550.310, 550.311 or equivalent calculus-based probability course, 110.201 or 550.291 and mathematical maturity.
550.630 Statistical Theory I: The fundamentals of mathematical statistics. Distribution theory for statistics of normal samples; exponential statistical models; sufficiency principle; least squares, maximum likelihood, and UMVU estimation; hypothesis testing, the Neyman-Pearson lemma, likelihood ratio procedures; the general linear model, the Gauss-Markov theorem, multiple comparisons; contingency tables, chi-square methods, goodness-of-fit; nonparametric and robust methods; decision theory, Bayes and minimax procedures. Prerequisite: 550.420 or 550.620.
550.631 Statistical Theory II: Advanced concepts and tools fundamental to research in mathematical statistics and statistical inference: asymptotic theory; optimality; various mathematical foundations. Prerequisite: 550.630.
550.632 Multivariate Statistical Theory: Theory of statistics when data are in the form of multivariate observations. The multivariate normal distribution; Wishart distributions; inference on means, Hotelling’s T²; multivariate linear models; regression, ANOVA; inference on covariances; classification and discrimination; principal components; canonical correlations; canonical variables. Prerequisites: 550.630, 550.692.
550.633 Time Series Analysis: Time series analysis from the frequency and time domain approaches. Descriptive techniques; regression analysis; trends, smoothing, prediction; linear systems; serial correlation; stationary processes; spectral analysis. Prerequisites: 550.630, 550.692.
550.640 Machine Learning: This course will focus on theoretical and practical aspects of statistical learning. We will review a collection of learning algorithms for classification and regression estimation, including linear methods, kernel methods, tree-based and boosting methods; we will also discuss unsupervised methods for linear and nonlinear data reduction and clustering. We will introduce fundamental concepts of the theory of model selection and validation: the bias/variance dilemma, penalty methods, and some measures of complexity; the course will also include standard validation algorithms, such as cross-validation and the bootstrap. Prerequisite: 550.430.


