550.400 Mathematical Modeling and Consulting
Spring 2010
Lecture notes

Lecture 1 - 1/25/10

Overview of the course, syllabus. Getting acquainted.

R installation

Getting started with R:

writing scripts and passing commands to the console

creating a vector of numbers, strings or TRUE/FALSE values

the objects() command

the history() command

copying the R icon to a folder

setting up the "Start in" folder

leaving R and returning

organizing projects in folders
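A few first commands, as a minimal sketch of the items above. The variable names are arbitrary and chosen only for illustration.

```r
# Numeric, character and logical (TRUE/FALSE) vectors are all built with c()
nums  <- c(2.5, 7, 11)
words <- c("alpha", "beta", "gamma")
flags <- c(TRUE, FALSE, TRUE)

# Shortcuts for numeric vectors
idx   <- 1:5
evens <- seq(2, 10, by = 2)

# Each vector holds values of a single mode (type)
mode(nums)    # "numeric"
mode(words)   # "character"
mode(flags)   # "logical"

# objects() (equivalent to ls()) lists what is stored in the workspace
objects()
```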

R Script for Lecture 1


Lecture 2 - 1/27/10

Reading assignment: Read Chapters 1 & 2 of An Introduction to R by Friday, and Chapters 3 & 4 by Monday

R Script for Lecture 2

Some statistical concepts discussed:

Properties of data vs. properties of an idealized model that describes how the data are generated

empirical cdf for data

quantile function for data

Statistical model: data come from an exponential distribution:

cdf: F(x) = 1-exp(-lambda*x) for x>0

pdf: f(x) = lambda*exp(-lambda*x) for x>0

fitting the model to data by estimating lambda: lambda-hat = 1/(mean of data)

If we sample from a continuous distribution F to get X1, X2, ..., then F(X1), F(X2), ... should look like a Uniform(0,1) sample
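A sketch of the ideas above on simulated data. The sample size, the seed and the "true" rate lambda = 2 are assumptions made only for illustration, not the data used in class.

```r
set.seed(1)
x <- rexp(200, rate = 2)              # hypothetical data; true lambda = 2

Fn <- ecdf(x)                         # empirical cdf (a step function)
Fn(0.5)                               # fraction of observations <= 0.5
quantile(x, c(0.25, 0.5, 0.75))       # empirical quantiles

lambda_hat <- 1 / mean(x)             # estimate of lambda

# Fitted model cdf F(x) = 1 - exp(-lambda * x); pexp() computes the same thing
F_hat <- function(q) 1 - exp(-lambda_hat * q)

# If the model is right, F(X1), F(X2), ... should look like a Uniform(0,1) sample:
# each fifth of [0, 1] should hold roughly 20% of the values
u <- F_hat(x)
table(cut(u, breaks = seq(0, 1, by = 0.2))) / length(u)
```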

Working with R

Lecture 3 1/29/10

The sample() command

The (non-parametric) bootstrap method for estimating the sampling variability of an estimator (illustrated with an estimate of the exponential rate parameter).
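A minimal sketch of the nonparametric bootstrap for lambda-hat = 1/mean, using sample() to resample with replacement. The data are simulated, and B = 2000 resamples is an arbitrary choice.

```r
set.seed(2)
x <- rexp(50, rate = 2)          # hypothetical observed data
lambda_hat <- 1 / mean(x)

B <- 2000
boot_est <- numeric(B)
for (b in 1:B) {
  xb <- sample(x, replace = TRUE)   # resample the data with replacement (same size)
  boot_est[b] <- 1 / mean(xb)       # recompute the estimator on the resample
}

sd(boot_est)                          # bootstrap standard error of lambda-hat
quantile(boot_est, c(0.025, 0.975))   # simple percentile interval
```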

R Script for Lecture 3

Working with R


Lecture 4 2/1/10

Reading assignment: Read Chapters 6 & 7 by Friday 2/5/10

R script for Lecture 4

Working with R


Lecture 5 2/3/10

R script for Lecture 5

Working with R

Lecture 6 2/5/10

R script for Lecture 6

Some notes (pdf)


Lecture 7 2/15/10

Reading assignment - Chapters 7 & 8 by Wednesday

Quiz Friday - sample questions will be posted on Wednesday

R script for Lecture 7

pop data for states (link to Wikipedia)

.csv file (table copied and pasted into Excel, then exported to .csv)

tax data for states (link to Wikipedia)

.csv file (table copied and pasted into Excel, then exported to .csv)


Lecture 8 2/17/10

R script for Lecture 8


Lecture 9 2/19/10

R script for Lecture 9

Linear models continued

Lecture 10 2/22/10

Notes on normal distributions & the linear model - pdf

R script for Lecture 10

Univariate normal distribution

Bivariate normal distribution

Multivariate normal distribution

The linear model in matrix form

Are the Pearson Father and Son data sampled from a bivariate normal distribution?

Example of a heavy-tailed distribution (the ratio of two independent mean-0 normals, i.e. a Cauchy distribution) to show that not all distributions can be assumed to have means

Testing whether any of the variables help to predict workers' wages, leading up to the F-test

Lecture 11 2/24/10

Notes on normal distributions & the linear model (stuff added) - pdf

R script for Lecture 11

Multivariate normal distribution properties

F distribution

Linear model in matrix form

The F-test

Doing the matrix calculations performed by lm() in R

F-test in R
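A sketch of the matrix form of least squares and the overall F-test. It uses R's built-in mtcars data as a stand-in, since the wage data from class are not reproduced here.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)                       # n x p design matrix (intercept column included)
y <- mtcars$mpg
beta_hat <- solve(t(X) %*% X, t(X) %*% y)    # solves (X'X) beta = X'y
cbind(beta_hat, coef(fit))                   # same numbers as lm()

# Overall F-test of H0: all slope coefficients are zero
n <- nrow(X); p <- ncol(X)
rss_full <- sum(residuals(fit)^2)
rss_null <- sum((y - mean(y))^2)             # RSS of the intercept-only model
F_stat  <- ((rss_null - rss_full) / (p - 1)) / (rss_full / (n - p))
p_value <- pf(F_stat, p - 1, n - p, lower.tail = FALSE)

summary(fit)$fstatistic                      # R's value of the same statistic
anova(lm(mpg ~ 1, data = mtcars), fit)       # the same test as a model comparison
```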

Lecture 12 2/26/10

Paul Maiste (Lityx) - Monday Lecture

R script for Lecture 12

Notes on R-squared & Interactions

What is R squared?

Writing your own R functions

What are interactions for pairs of categorical variables?
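A small user-written function for R-squared, plus an interaction between two categorical variables. The function name r_squared is hypothetical, and mtcars stands in for the class data.

```r
# R^2 = 1 - RSS / TSS for a fitted lm object
r_squared <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)
  tss <- sum((y - mean(y))^2)
  1 - rss / tss
}

fit <- lm(mpg ~ wt, data = mtcars)
r_squared(fit)
summary(fit)$r.squared            # should agree
cor(mtcars$mpg, mtcars$wt)^2      # with one predictor, R^2 is the squared correlation

# Interaction of two categorical variables: the fit has a separate mean for every
# (cyl, am) cell instead of additive cylinder and transmission effects
fit_int <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
```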

Lecture 13 3/1/10

Paul Maiste

Lecture 14 3/3/10

R script for Lecture 14

Handwritten notes (pdf) on Akaike's information criterion & maximum likelihood

Applications of looping in R

Stepwise regression

Akaike's information criterion
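A sketch relating AIC to the maximized log-likelihood, AIC = -2 log L + 2k, followed by backward stepwise selection with step(). The model and mtcars data are illustrative assumptions.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

ll <- logLik(fit)                  # maximized log-likelihood
k  <- attr(ll, "df")               # number of estimated parameters (includes sigma)
aic_by_hand <- -2 * as.numeric(ll) + 2 * k
c(aic_by_hand, AIC(fit))

# Backward stepwise selection; step() compares models with extractAIC(), which for
# lm differs from AIC() by a constant, so the ranking of models is the same
fit_step <- step(fit, direction = "backward", trace = 0)
formula(fit_step)
```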

Lecture 15 3/5/10


R script for Lecture 15

Implementing leave-one-out cross-validation

The which() command

Regression trees (pdf file)
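A minimal leave-one-out cross-validation loop, using the built-in cars data as an assumed example. For least squares there is a known shortcut, e_i / (1 - h_i), which the loop should reproduce.

```r
n <- nrow(cars)
loo_err <- numeric(n)
for (i in 1:n) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])          # fit without observation i
  loo_err[i] <- cars$dist[i] - predict(fit_i, newdata = cars[i, ])
}
mean(loo_err^2)                       # LOOCV estimate of prediction MSE

# Shortcut for linear least squares: LOO residual = e_i / (1 - h_i)
fit <- lm(dist ~ speed, data = cars)
shortcut <- residuals(fit) / (1 - hatvalues(fit))

# which() returns the indices of observations with unusually large LOO errors
which(abs(loo_err) > 2 * sd(loo_err))
```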

Lecture 16 3/8/10

Quiz on Wednesday

R script for Lecture 16

Notes on logistic regression, likelihood for logistic regression, parametric bootstrap

Analysis of students' snail ring predictions

The order() command

Modeling binary response data

Logit model

Maximum likelihood

Structure of log-likelihood for logit model

R's glm function

Parametric bootstrap for sampling variability

Simulating binary response data
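A sketch tying these items together: simulate binary data from a logit model, fit it with glm(), check the log-likelihood by hand, and use a parametric bootstrap (resimulating from the fitted model) for the slope's variability. The true coefficients, n and B are hypothetical choices.

```r
set.seed(3)
n <- 200
x <- rnorm(n)
beta0 <- -0.5; beta1 <- 1.5            # hypothetical true values
p <- plogis(beta0 + beta1 * x)         # logit model: log(p / (1 - p)) = beta0 + beta1 * x
y <- rbinom(n, size = 1, prob = p)     # simulated binary responses

fit <- glm(y ~ x, family = binomial)
coef(fit)

# Log-likelihood at the MLE: sum of y*log(p) + (1 - y)*log(1 - p)
p_hat <- fitted(fit)                   # fitted probabilities (response scale)
ll_by_hand <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))
logLik(fit)

# Parametric bootstrap: draw new responses from the *fitted* model and refit
B <- 500
boot_b1 <- replicate(B, {
  y_star <- rbinom(n, size = 1, prob = p_hat)
  coef(glm(y_star ~ x, family = binomial))[2]
})
sd(boot_b1)                                    # bootstrap SE of the slope
summary(fit)$coefficients[2, "Std. Error"]     # glm's asymptotic SE, for comparison
```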

Lecture 17 3/10/2010

R script for Lecture 17

GLM logistic regression output

Fitted values - two types


GLM predictions

Separating training and testing

Stepwise fitting

Lecture 18 3/12/2010

Susan Wierman, Executive Director

Mid-Atlantic Regional Air Management Association

A Brief Introduction to Air Quality Data Analysis (ppt file)

Lecture 19 3/22/2010

R script for Lecture 19

Handwritten notes on ROC and nonparametric bootstrap

ROC curves

sapply, tapply, lapply
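A sketch of an empirical ROC curve built with sapply() over thresholds, with the area under the curve from the trapezoid rule. The scores and labels are simulated assumptions; tapply() and lapply() appear at the end for comparison.

```r
set.seed(4)
labels <- rep(c(1, 0), each = 100)                        # 1 = positive case
scores <- c(rnorm(100, mean = 1), rnorm(100, mean = 0))   # positives tend to score higher

thresholds <- sort(unique(c(-Inf, scores, Inf)), decreasing = TRUE)

# sapply() returns a 2 x (number of thresholds) matrix; t() makes one row per threshold
roc <- t(sapply(thresholds, function(th) {
  pred <- scores >= th                                  # classify as positive
  c(fpr = mean(pred[labels == 0]), tpr = mean(pred[labels == 1]))
}))

# Area under the ROC curve by the trapezoid rule
auc <- sum(diff(roc[, "fpr"]) * (head(roc[, "tpr"], -1) + tail(roc[, "tpr"], -1)) / 2)
auc

tapply(scores, labels, mean)             # a summary for each group (vector result)
lapply(split(scores, labels), range)     # lapply() always returns a list
```

With continuous scores (no ties), this AUC equals the proportion of (positive, negative) pairs in which the positive case has the higher score.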

Lecture 20 3/24/2010

Carl Liggio - Analyzing Power Prices - notes were emailed

Lecture 21 3/26/2010

R script for Lecture 21

Classification trees

ROC curve comparison

adding a legend()

Various applications of sapply()

applying a function to a vector

vector output

passing additional arguments to the function

nonparametric bootstrap
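A sketch of the sapply() uses listed above: vector output, matrix output, extra arguments passed through to the function, and a nonparametric bootstrap (here of the median). The data are made up for illustration.

```r
# 1. One value per element -> vector output
sq <- sapply(1:5, function(k) k^2)

# 2. Several named values per element -> matrix output (one column per input)
stats <- sapply(list(a = c(1, 2, 3), b = c(10, 20, 30, 40)),
                function(v) c(mean = mean(v), n = length(v)))

# 3. Arguments after the function (here x = 1:100) are passed on to it
q <- sapply(c(0.1, 0.5, 0.9), function(p, x) quantile(x, p, names = FALSE), x = 1:100)

# 4. Nonparametric bootstrap of the median
set.seed(5)
x <- rexp(40)
boot_med <- sapply(1:1000, function(b) median(sample(x, replace = TRUE)))
sd(boot_med)
```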

Lecture 22 3/29/2010

Yichen Qin - R tcltk package lecture (examples)

Lecture 23 3/31/2010

Quiz Friday

Handwritten notes on survival analysis

R Script for Lecture 23

Survival functions

Exponential survival distributions

Estimation of the survival curve when there is no censoring

Censored data

Probabilities as products of conditional probabilities

Kaplan-Meier estimator

R's aml dataset
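A sketch of the Kaplan-Meier estimator computed by hand as a product of conditional survival probabilities, checked against survfit() from the survival package (which ships with R). All aml patients are pooled here; that grouping choice is just for illustration.

```r
library(survival)

tt <- aml$time
st <- aml$status                      # 1 = event (relapse), 0 = censored

# S(t) = product over event times t_j <= t of (1 - d_j / n_j)
event_times <- sort(unique(tt[st == 1]))
n_risk  <- sapply(event_times, function(s) sum(tt >= s))            # at risk just before s
n_event <- sapply(event_times, function(s) sum(tt == s & st == 1))  # events at s
km <- cumprod(1 - n_event / n_risk)
cbind(event_times, n_risk, n_event, km)

# The same estimate from the survival package
fit <- survfit(Surv(time, status) ~ 1, data = aml)
summary(fit)
```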

Lecture 24 4/2/2010

R script for Lecture 24

Hazard function

R's survival package

Surv() function and objects it creates

The survfit() function

Lecture 25 4/5/2010

Quiz #5 - Monday April 12th

R script/template for HW#5

R script for Lecture 25

Cox proportional hazards model

The Rossi criminal recidivism dataset (.csv file)

Stepwise Cox proportional hazards modeling in R

Cox Proportional-Hazards Regression for Survival Data by John Fox

Lecture 26 4/7/2010

Guidelines for end of semester projects:
Looking for a report with the following components:

1) Introduction to the problem - goals
2) Description of data including source
3) Statistical methodology implemented
4) Report on findings w/ text, tables & figures
5) Discussion of strengths and weaknesses of approach
6) Conclusions
7) References
8) Summary of contribution of each team member

Grading will be based on evidence of:

1) Creativity
2) Clarity of presentation
3) Thoroughness of investigation
4) Thoughtfulness of investigation
5) Effort expended
6) Critical thinking

R script for Lecture 26

The data() command

Quantile regression

Quantiles as solutions to a minimization problem

Regression quantiles

Analysis of the Engel food expenditure data

Quantiles as solutions to a minimization problem & quantile regression
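A sketch showing that a sample quantile minimizes the "check" (pinball) loss, sum of rho_tau(x_i - q) with rho_tau(u) = u * (tau - I(u < 0)). The data, tau = 0.25 and n = 101 are assumptions; with n * tau not an integer the minimizer is a single order statistic, matching quantile(type = 1).

```r
check_loss <- function(q, x, tau) sum((x - q) * (tau - (x < q)))

set.seed(6)
x <- rnorm(101)
tau <- 0.25

# optimize() passes x and tau on to check_loss
opt <- optimize(check_loss, interval = range(x), x = x, tau = tau)
opt$minimum
quantile(x, tau, type = 1)    # sample quantile (inverse of the empirical cdf)

# Regression quantiles minimize the same loss over a linear predictor, e.g. with the
# quantreg package: library(quantreg); rq(foodexp ~ income, tau = 0.5, data = engel)
```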

Lecture 27 4/9/2010

Reminder: quiz Monday

R script for Lecture 27

Nonparametric regression: kernel smoothing

Why we smooth

Predicting temperature on a given day in the future

Finding the best window size

Smoothing in R

nonparametric regression kernels
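A sketch of a Nadaraya-Watson kernel smoother with a Gaussian kernel, with the window size (bandwidth h) picked by leave-one-out cross-validation over a small grid. The temperature-like data, the grid and the helper names nw_smooth and loocv_score are hypothetical; base R's ksmooth() offers a built-in alternative with its own bandwidth scaling.

```r
# Locally weighted average of y around each point in x0
nw_smooth <- function(x0, x, y, h) {
  sapply(x0, function(pt) {
    w <- dnorm((x - pt) / h)          # Gaussian kernel weights
    sum(w * y) / sum(w)
  })
}

# Leave-one-out prediction error for bandwidth h
loocv_score <- function(h, x, y) {
  pred <- sapply(seq_along(x), function(i) nw_smooth(x[i], x[-i], y[-i], h))
  mean((y - pred)^2)
}

set.seed(7)
x <- sort(runif(150, 0, 365))                                # day of year
y <- 10 - 12 * cos(2 * pi * x / 365) + rnorm(150, sd = 3)    # hypothetical temperatures

h_grid <- c(2, 5, 10, 20, 40, 80)
cv <- sapply(h_grid, loocv_score, x = x, y = y)
h_best <- h_grid[which.min(cv)]
fit <- nw_smooth(x, x, y, h_best)     # smoothed curve at the observed days
```

Small h follows the noise, large h flattens the seasonal cycle; cross-validation trades these off.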


Lecture 28 4/12/2010

R script for Lecture 28

Reading sampled lines from large datasets

Time series models

Autoregressive models

Moving average models

Lecture 29 4/14/2010

R script for Lecture 29

Time series models (continuation of written notes from last lecture)

Moving average models

Autocovariance/Autocorrelation functions (ACFs)

ACF of a moving average model

Fitting AR, MA models

ARMA models

Writing formatted output using cat() and sink() functions
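A sketch comparing the theoretical ACF of an MA(1), rho(1) = theta / (1 + theta^2) and rho(k) = 0 for k > 1, with a simulated series, then fitting MA(1) and AR(1) models and printing a formatted summary with cat(). The value theta = 0.6 and n = 500 are assumptions.

```r
theta <- 0.6
ARMAacf(ma = theta, lag.max = 4)     # theoretical ACF at lags 0..4
theta / (1 + theta^2)

set.seed(8)
z <- arima.sim(model = list(ma = theta), n = 500)
acf(z, lag.max = 4, plot = FALSE)    # sample ACF
fit_ma <- arima(z, order = c(0, 0, 1))
fit_ar <- arima(z, order = c(1, 0, 0))

# Formatted output; surround with sink("results.txt") ... sink() to write it to a file
cat(sprintf("MA(1): theta-hat = %.3f, AIC = %.1f\n", coef(fit_ma)["ma1"], AIC(fit_ma)))
cat(sprintf("AR(1): phi-hat   = %.3f, AIC = %.1f\n", coef(fit_ar)["ar1"], AIC(fit_ar)))
```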

Lecture 30 4/16/2010

R script for Lecture 30

Time series models (continued from last lecture)

Differencing for stationarity

Returns for financial series

ARIMA models
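A sketch of differencing and returns on simulated series (the random walk and the price path are hypothetical). Differencing a random walk recovers its white-noise steps, and an ARIMA(p,1,q) model is an ARMA(p,q) model for the differenced series.

```r
set.seed(9)
e  <- rnorm(300)
rw <- cumsum(e)                  # random walk: not stationary
d1 <- diff(rw)                   # first difference: the white-noise steps e[2], ..., e[300]

# Returns for a positive price series
price <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))
simple_ret <- diff(price) / head(price, -1)
log_ret    <- diff(log(price))   # log(1 + simple return); close to it when returns are small

# ARIMA(1,1,0): an AR(1) for the differenced series (here ar1 should be near 0)
fit <- arima(rw, order = c(1, 1, 0))
coef(fit)
```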

Lecture 31 4/19/2010

R script for Lecture 31

Time series models (continued) - AR(1) autocorrelation function

Bootstrapping ARIMA models

Time Series in the Frequency Domain

Discrete Fourier transforms
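A sketch checking two facts: the AR(1) autocorrelation is rho(k) = phi^k, and fft() computes the discrete Fourier transform X_k = sum over t of x_t exp(-2 pi i k t / n), without normalization. The helper dft() is a hypothetical O(n^2) version written out for comparison.

```r
phi <- 0.7
ARMAacf(ar = phi, lag.max = 5)
phi^(0:5)

# Direct DFT: X_k = sum_{t=0}^{n-1} x_t exp(-2 pi i k t / n), k = 0, ..., n-1
dft <- function(x) {
  n  <- length(x)
  tt <- 0:(n - 1)
  sapply(0:(n - 1), function(k) sum(x * exp(-2i * pi * k * tt / n)))
}

set.seed(10)
x <- rnorm(16)
all.equal(dft(x), fft(x))    # fft() gives the same sums in O(n log n) operations
```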

Lecture 32 4/21/2010

R script for Lecture 32

Discrete Fourier Transforms (continued)

Explaining the Fast Fourier Transform (FFT)

What do periodicities look like?

What happens when we transform white noise?

Linearity and superposition

Plotting a complex series

Why we look at real & imaginary parts

The periodogram
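A sketch of the periodogram of a cosine with 8 cycles in 128 points plus white noise (both choices hypothetical). The peak should appear at frequency 8/128 cycles per time step, and by Parseval's relation the periodogram ordinates |X_k|^2 / n sum to the sum of squares of the series.

```r
n  <- 128
tt <- 0:(n - 1)
set.seed(11)
x <- 2 * cos(2 * pi * 8 * tt / n) + rnorm(n)

X     <- fft(x)
freq  <- (0:(n - 1)) / n            # frequency of X[k + 1], in cycles per time step
pgram <- Mod(X)^2 / n               # periodogram ordinates
half  <- 2:(n / 2 + 1)              # skip frequency 0, keep frequencies up to 1/2
freq[half][which.max(pgram[half])]  # location of the peak

Re(X[9]); Im(X[9])                  # real and imaginary parts at frequency 8/128
```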

Lecture 33 4/23/2010

R script for Lecture 33

More properties of discrete Fourier transforms

Spectrum of an AR(1)

Smoothing to estimate a continuous spectrum

Multivariate autoregression models using the mAr package

Sampling using mAr.sim()

Estimation using mAr.est()
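For the "Spectrum of an AR(1)" item, the spectral density of x_t = phi x_{t-1} + w_t, with |phi| < 1 and white-noise variance sigma^2, is shown below. This assumes frequency omega measured in cycles per time unit; texts that use angular frequency differ by factors of 2 pi.

```latex
f(\omega) = \frac{\sigma^2}{\lvert 1 - \phi e^{-2\pi i \omega} \rvert^2}
          = \frac{\sigma^2}{1 - 2\phi\cos(2\pi\omega) + \phi^2},
\qquad -\tfrac{1}{2} \le \omega \le \tfrac{1}{2}
```

With phi > 0 the power is concentrated at low frequencies; with phi < 0 it shifts toward omega = 1/2. A smoothed periodogram of a simulated AR(1) should roughly follow this curve.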

Lecture 34 4/26/2010

R script for Lecture 34 (part 1 - mAr analysis of Dow + Treasuries)

R script for Lecture 34 (part 2 - cluster analysis)

Cluster analysis with hclust()
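A sketch of hierarchical clustering with hclust() on R's built-in USArrests data (an assumed example, not the course data). The complete linkage and the choice of 4 clusters are arbitrary.

```r
d  <- dist(scale(USArrests))            # Euclidean distances between standardized states
hc <- hclust(d, method = "complete")    # agglomerative clustering, complete linkage
groups <- cutree(hc, k = 4)             # cut the dendrogram into 4 clusters
table(groups)
aggregate(USArrests, by = list(cluster = groups), FUN = mean)   # cluster profiles
```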

Lecture 35 4/28/2010

R script

Introduction to spatial statistics

John Snow & the 1854 cholera outbreak

Snow's spot map

Baddeley's Analysing Spatial Point Patterns in R (200 pages)

Homogeneous Poisson point processes (complete spatial randomness)

Inhomogeneous Poisson point processes
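A base-R sketch of simulating Poisson point processes on the unit square: a homogeneous process (Poisson count, then uniform locations) and an inhomogeneous one by thinning. The helper names and intensities are hypothetical; spatstat's rpoispp() provides this directly for ppp objects.

```r
sim_hpp <- function(lambda, xlim = c(0, 1), ylim = c(0, 1)) {
  area <- diff(xlim) * diff(ylim)
  n <- rpois(1, lambda * area)                 # count ~ Poisson(lambda * area)
  data.frame(x = runif(n, xlim[1], xlim[2]),   # given the count, points are i.i.d. uniform
             y = runif(n, ylim[1], ylim[2]))
}

# Thinning: simulate at lambda_max and keep a point at (x, y)
# with probability lambda(x, y) / lambda_max
sim_ipp <- function(lambda_fun, lambda_max) {
  pts  <- sim_hpp(lambda_max)
  keep <- runif(nrow(pts)) < lambda_fun(pts$x, pts$y) / lambda_max
  pts[keep, ]
}

set.seed(12)
csr <- sim_hpp(200)
lambda_hat <- nrow(csr) / 1    # MLE of a constant intensity: n / area (area = 1 here)
ipp <- sim_ipp(function(x, y) 400 * x, lambda_max = 400)   # intensity rises from left to right
```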

Lecture 36 4/30/2010

R script


Lecture 37 5/3/2010

R script

Baltimore homicide data

Creating a ppp object with a polygonal boundary

Image objects

Density plots

Perspective plots

Contour plots

Fitting a homogeneous Poisson point process

Fitting an inhomogeneous Poisson point process

Lecture 38 5/5/2010

R script

Operations on windows

Dirichlet/Voronoi tessellation

Delaunay tessellation

An analog of quadrat sampling with a covariate

Fitting a point process model to a ppp with covariates

Inference for parameters via vcov

Generating realizations from the fitted model

Matern type I and type II processes
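A base-R sketch of the two Matérn hard-core constructions on the unit square, ignoring edge effects (a proper simulation would generate the primary points on a window enlarged by r). Type I deletes every point that has any other primary point within distance r; type II gives each point a random birth time and deletes it only if an older primary point lies within r. The function names and parameter values are hypothetical.

```r
sim_matern1 <- function(lambda, r) {
  n <- rpois(1, lambda)
  pts <- cbind(x = runif(n), y = runif(n))
  if (n < 2) return(pts)
  d <- as.matrix(dist(pts))
  diag(d) <- Inf
  keep <- apply(d, 1, min) > r                 # no other point closer than r
  pts[keep, , drop = FALSE]
}

sim_matern2 <- function(lambda, r) {
  n <- rpois(1, lambda)
  pts <- cbind(x = runif(n), y = runif(n))
  if (n < 2) return(pts)
  birth <- runif(n)                            # random marks ("birth times")
  d <- as.matrix(dist(pts))
  diag(d) <- Inf
  keep <- sapply(seq_len(n), function(i) !any(d[i, ] < r & birth < birth[i]))
  pts[keep, , drop = FALSE]
}

set.seed(13)
m1 <- sim_matern1(200, 0.05)
m2 <- sim_matern2(200, 0.05)
c(type1 = nrow(m1), type2 = nrow(m2))          # type II typically retains more points
```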

Lecture 39 5/7/2010

Future topics