# MATH 6643 Summer 2012 Applications of Mixed Models

Thought of the Week

“... but what did they want to learn? ...” – Nancy Reid possibly commenting on a presentation of a statistical consultation.

News

• NEW: July 10: Prepare two questions, suitable as exam questions (requiring 10 to 15 minutes to answer) on the material of the course. Try to prepare one on the earlier material and one on the later material. You can post your question in /Submitted sample exam questions. Please post them before July 17 so we can have a look at them in class.
• May 17: We decided that, from now on, instead of preparing an exercise question on the current chapter of Snijders and Bosker before each class, we would each prepare a 'discussion question' to be placed in the same file so it can easily be viewed and discussed. The file is Snijders and Bosker: Discussion Questions.
• July 2: spidadev update for OS X and Linux. For *unix users wanting to install the update, simply download http://blackwell.math.yorku.ca/R/spidadev.tar.gz to your desktop. Then, in X11 or Terminal, go to your desktop directory and run: R CMD INSTALL spidadev.tar.gz

Course Schedule

Tuesday Thursday
May 8: Class 1
10: Class 2
15: Class 3
17: Class 4
22

No class: DLSPH/SORA/TABA workshop

24

No class: SPIDA

29

No class: SPIDA

31

No class: SPIDA

June 5

No class: SSC 2012 meetings in Guelph

7: Class 5
12: Class 6
14: Class 7
19: Class 8
21: Class 9
26: Class 10
28: Class 11
July 3: Class 12
5: Class 13
10: Class 14
12: Class 15
17: Class 16
19: Class 17
24: Class 18
26: Final exam

## Things we'll do

1. Read Snijders and Bosker (2012) Multilevel Analysis, 2nd. ed., Sage
This is one of the best books I've read on the concepts and on the 'art' of mixed models. Almost every sentence is rich with meaning. The authors very rarely skip an important point. I would venture to say that it is the best book written yet on applications of mixed models. The first edition in 1999 held the title for a number of years.
• We will discuss a new chapter and the beginning of each class. Someone will be chosen at random to lead the discussion.
• Since the book doesn't have exercises, you will prepare exercises for each chapter. Before the chapter is discussed, you will prepare 'simple, straightforward' exercises. After the chapter is discussed you will prepare deeper more conceptual and challenging exercises. You can use various sources of multilevel data for the exercises.
• Conveniently, the book has 18 chapters and we have 18 classes. We will begin with chapter 2 in class 2 and continue. You should prepare a simple exercises and post it on your 'student' page before the class. Be prepared to discuss the chapter in class. Each day someone will be chosen at random to lead off the discussion with a five-minute summary of the key points in the chapter. Post a deeper question on the same chapter before the next class.
2. Write a paper. Maybe publish it.
• This will be done in teams of two or, exceptionally, three. If you wish to be 'randomly' matched to others who wish to be randomly matched let me know. The idea is stolen from Gary King (2006). Read Gary King's article to find out how to proceed. You need to find an article with data that lends itself to longitudinal or multilevel analysis. You will find many examples of King's students replication papers including their data and Rcode of the Harvard Dataverse. Choose the article before May 22 and send me an email message. During the 'break' created by SPIDA and the SSC meeting, you should work on replicating the original analysis.
3. Complete a few assignments early in the course intended to cover the 'math' of multilevel models.
4. Write a function or a small package in R that does something useful and that can be added to the 'spidadev' package. Some ideas. Due at class 17.
5. Write a final exam.

## Some Data Sources with Publication Listings

1. The Longitudinal Study of American Youth ([1]).
2. The University of Wisconsin's BADGIR (Better Access to Data for Global Interdisciplinary Research, email registration is easy to gain access. If you decide to use the National Survey of Families and Households data I have a reduced longitudinal dataset created just ask me (Ryan) for it, it might be useful[2]).

It would be so much more fun just to learn for its own sake! But...

• [30%] Snijders and Bosker has 18 chapters. We will discuss and do exercises for 17 of them. That's a total of 17 discussions and 34 exercises. Each discussion and exercise is graded out of 10. I will discard the 11 lowest grades of the 51. That gives you a maximum possible score of 400.
• [30%] Replication, reanalysis and paper.
• [10%] Assignments
• [15%] R function or package.
• [20%] Final exam

## Things to learn

Applying a statistical methodology requires being aware of many things:

• Some basic theory behind the method
• What assumptions the theory implies, how they fail and when they're reasonable although wrong
• How to turn substantive research questions into statistical questions
• How to formulate and explore those questions statistically
• How to avoid too common but execrable errors: e.g. interpreting main effects with interactions, interpreting lists of p-values from regression output, interpreting multiple p-values for dummy variables of a factor
• How to program
• How to produce informative and stimulating graphics
• Numerical savvy: avoiding X'X, using svd and qr instead

We will try to weave these themes throughout the course

## Class 1: May 8

### Assignment 1

Due: May 15 before class

You must work individually but you can use any written source provided you cite it. Hint: you might like to refresh your memory about the Spectral Decomposition Theorem for symmetric matrices.

1. [10] Prove the Sherman-Morrison identity (state appropriate assumptions). Note that U and V need not be square matrices.
(A + UDV) − 1 = A − 1A − 1U(D − 1 + VA − 1U) − 1VA − 1
Hint: It might be easier to first prove a special case for (I + UV) − 1 and then use basic facts about inverses and products, e.g. (AB) − 1 = B − 1A − 1
2. [10] Consider a linear regression model Y = Xβ + ε where X is a $n \times (k+1)$ matrix whose first column consists of 1's, β = (β01,...,βk)' and Var(ε) = σ2I. Let ΣXX be the $k \times k$variance matrix of the predictor variables and let sE be the residual standard error. Find and prove an expression for $Var( (\hat{\beta_1}, ... , \hat{\beta_k} )')$
3. [10] Let Σ be symmetric. Show that Σ is positive-definite if and only there exists a non-singular matrix A such that Σ = AA'.
4. [10] Show that a symmetic matrix Σ is a variance matrix if and only if there exists a matrix A such that Σ = AA'.
5. [10] Let A and B be square matrices such that AA' = BB' = Σ with Σ positive definite. Show that there exists an orthogonal matrix Γ such that A = BΓ.
6. [50] Retrieve the "Arrests" data set from "library(effects)" in R. You can get information about the variables in the data set in the usual way with
> ?Arrests
> install.packages('effects')
The Toronto Star has published some stories claiming that this data set reveals a pattern of discrimination in police behaviour. You have been hired by the National Post to study the data set and produce an independent opinion. Your opinion may agree, disagree or otherwise qualify the claim that this data shows a conclusive pattern of discrimination. Write a report with suitable graphs and include the details of your analysis with appropriate graphs as an appendix.

## Class 2: May 10

• Slides:
• Notes:
• Notes on
• Linear transformations of random vectors: mean and variance
• Multivariate Normal Distribution
• Marginal and conditional distributions
• Visualizing the bivariate normal
• Screen capture of Class 2 (sorry -- I forgot to turn it back on after the break)

## Class 3: May 15

### Assignment 2

Due June 7 before class

You must write up your own work based on your own understanding but you can do anything you want to develop your understanding.

1. [10] A basis for $\mathbb{R}^p$ that is a conjugate basis with respect to a positive definite matrix M is a sequence of vectors x1,x2,...,xp in $\mathbb{R}^p$ such that x'iMxi = 1 and x'iMxj = 0 if $i \neq j$. Show that the columns of a non-singular matrix A form a conjugate basis with respect to Σ − 1 if Σ = AA'. Note that a conjugate basis is merely an orthogonal basis (actually, an 'orthonormal' basis) with respect to the metric defined by | | x | | 2 = x − 1x.
2. [10] We will call a "square root" of a square matrix M any square matrix A such that M = AA'. Show that a square matrix has a square root if and only if it is a variance matrix.
3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.
4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.
5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command.
6. [10] Consider a $2 \times 2$ variance matrix $\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}$ for a random vector $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$. Verify that the Cholesky matrix $C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}$ is a square root of Σ.
Show that the Cholesky matrix can be written as $\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}$ where β21 is the regression coefficient of Y2 on Y1.
Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of C relative to the ellipse.
7. [10] Show that a non-singular $2 \times 2$ variance matrix, Σ can be factored so that Σ = AA' with A an upper triangular matrix [in contrast with problem 6 where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6.
8. [20] Generate 100 observations for three variables Y, X and Z so that in the regression of Y on both X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriatenes of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-value achieve significance?
9. [20] Generate 100 observations for three variables Y, X and Z so that in the separate simple regressions of Y on each of X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of Y on both X and Z is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriatenes of forward stepwise regression to identify a suitable model to predict Y using both X and Z?

## Class 4: May 17

### Class Notes

• Class notes:
• Learning statistics: The How? vs The When?
• What does adding $\bar{X}_j$ do to your model and why. [Ask me why the added-variable plot explains everything -- almost].
• Compositional effect (aka between effect) = Contextual effect + Within effect
• The 'empty' random intercept model
• Generating data
• Estimating β0j with a BLUE
• Three ways of estimating γ00
• Overall mean
• Mean of means
• Inverse variance weighted mean of means (= mixed model approach)
• Back to estimating β0j -- with a BLUP (= empirical Bayes estimator = shrinkage estimator)
• Exchangeability: when to shrink and when not to shrink

## Class 7: June 14

### Assignment 3

Due: June 21 The purpose of this assignment is to give you a chance to try out what we're learning by working on a dataset and to have the sobering experience of split-half validation. Using the 'hsfull' data set in 'spida', choose a random sample of half the schools. To make everything replicable, first choose a random seed at random (an odd integer between 1 and 2^32-1), record the number and use 'set.seed' with the number just before selecting the random sample. Using the half sample, explore the role ses, Sex, Minority and Sector in their relationship with math achievement. Discuss your strategy as you go along. At some point(s) in the analysis, illustrate that you can

1. Correctly test the significance of interactions and of effects with multiple degrees of freedom.
2. Avoid incorrect interpretations of p-values in regression output (you should do this at every point)
3. Test hypotheses involving the random effects structure of the model
4. Choose between various random effects structures
5. Carry out a test involving setting up a linear hypothesis matrix with more than one row
6. Estimate and plot conditional effects of a factor that interacts with ses. You should first plot an 'effect' plot showing the estimated response for each level as a function of ses and, next, graph the estimate gap with standard error bands.
7. Run one of your final models on the other half of the data. Comment on differences. Create the 'gap plot' with the other half of the data and comment on the differences.

### Effect sizes in mixed models

• An interesting message on an R mailing list: "... in psychology, reviewers will bludgeon you for an effect size..." but with some cause:
Quoting from Roberts (2006), the APA Publication Manual (2001):
The general principle to be followed, however, is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)
Effect sizes. Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes in a practical and theoretical context.

## Class 8: June 19

### Splitting a Dataframe by Cluster

Phil and I wrote a function that partitions a dataframe into a training and a test set. The function (and instructions) are found here [8]. --Msigal 10:39, 27 June 2012 (EDT).

### Heteroskedasticity in its many forms

Here's the start of a wiki page and script discussing heteroskedasticity in its various forms.

## Class 10: June 26

### Short assignment 4

Don't spend more than 90 minutes on this if you attended class on June 26. Assuming that you've read Chapter 10, if you missed class on June 26 then you might have to spend 90 + 120 = 210 minutes! If you haven't read Chapter 10 then ...

Go though Lab 2 on the SPIDA page. Compare the diagnostics in Lab 2 with those in Snijders & Bosker. Identify diagnostics suggested in Snijders and Bosker that are missing in Lab 2.

Send your 'assignment' to georges@yorku.ca. Include '6643 Assignment 4' in the subject line.

Longer voluntary assignment: Program some missing diagnostics.

## Class 17: July 19

### Generating G and R matrices in various packages

Function Notes
lme

in package nlme

Linear mixed effects: normal response

G side and R side modelling
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower


or

 Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )

lmer

in package lme4

Linear mixed models for gaussian response with Laplace approximation

G side modeling only, R = σ2I
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmer

in package lme4

• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side only, no R side
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmmPQL

in packages MASS/nlme

Generalized linear mixed models with Penalized Quasi Likelihood
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side and R side as in lme
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower

MCMCglmm

in package MCMCglmm

Generalized linear mixed models with MCMC
• family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,

cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.
Model syntax:

 Y ~ X * W, random = ~ us(1 + X):id  [Note: id should be a factor, us=unstructured]


For nested effect:

 Y ~ X * W, random =  ~us(1 + X):higher + us(1 + X):higher:lower


For crossed effect:

 Y ~ X * W, random =  ~us(1 + X):id1+ us(1 + X):id2