## Multilevel and Longitudinal Modeling Using Stata, Second Edition

Authors: Sophia Rabe-Hesketh and Anders Skrondal

Copyright: 2008

ISBN-13: 978-1-59718-040-5

Pages: 562; paperback

Price: $59.00

New edition now available


*Multilevel and Longitudinal Modeling Using Stata, Second Edition*, by
Sophia Rabe-Hesketh and Anders Skrondal, looks specifically at Stata’s
treatment of generalized linear mixed models, also known as multilevel or
hierarchical models. These models are “mixed” because they allow
fixed and random effects, and they are “generalized” because
they are appropriate for continuous Gaussian responses as well as binary,
count, and other types of limited dependent variables.

The second edition has much to offer for readers of the first edition,
reading more like a sequel than an update. The text has almost doubled in
length from the original, coming in at 562 pages. This second edition
incorporates three new chapters: a chapter on standard linear regression, a
chapter on discrete-time survival analysis, and a chapter on longitudinal
and panel data containing an
expanded discussion of random-coefficient and growth-curve models. The
authors have updated this edition for Stata 10, expanding on discussions in
the original edition and adding new in-text examples and end-of-chapter
exercises. In particular, the authors have thoroughly covered the new Stata
commands **xtmelogit** and **xtmepoisson**.

The first chapter provides a review of the methods of linear regression. Rabe-Hesketh and Skrondal then begin with the comparatively simple random-intercept linear model without covariates, developing the mixed model from first principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective.
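As a rough illustration of that starting point, a random-intercept model without covariates is commonly written as shown below. The symbols ($\beta$, $\zeta_j$, $\epsilon_{ij}$, $\psi$, $\theta$) and the normality assumptions are chosen here for illustration and are not quoted from this page.

$$
y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0, \psi), \qquad \epsilon_{ij} \sim N(0, \theta)
$$

$$
\rho = \frac{\psi}{\psi + \theta}
$$

Here $y_{ij}$ is the response for unit $i$ in cluster $j$, $\beta$ is the population mean, $\zeta_j$ is the cluster-specific random intercept, and $\epsilon_{ij}$ is the within-cluster residual. Under these assumptions, $\rho$ is the intraclass correlation: the share of the total variance $\psi + \theta$ that lies between clusters.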

Once the authors have established the mixed-model foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random-effects). The authors then discuss models with random coefficients, followed by models for growth curves. The middle chapters of the book apply the concepts for Gaussian models to models for binary responses (e.g., logit and probit), ordinal responses (e.g., ordered logit and ordered probit), and count responses (e.g., Poisson).

The text continues with a discussion of how to use multilevel methods in discrete-time survival analysis, for example, using complementary log-log regression to fit the proportional hazards model. The authors then consider models with multiple levels of random variation and models with crossed (nonnested) random effects. In its examples and end-of-chapter exercises, the book contains real datasets and data from the medical, social, and behavioral sciences literature.
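The link between complementary log-log regression and proportional hazards can be sketched as follows. The notation is assumed for illustration: $h_0(u)$ is a continuous-time baseline hazard, $\tau_{t-1} < \tau_t$ are the interval boundaries, and $h_t(\mathbf{x})$ is the discrete-time hazard for interval $t$.

$$
h(u \mid \mathbf{x}) = h_0(u)\exp(\mathbf{x}'\boldsymbol{\beta})
\quad\Longrightarrow\quad
h_t(\mathbf{x}) = 1 - \exp\!\left\{-\exp(\mathbf{x}'\boldsymbol{\beta}) \int_{\tau_{t-1}}^{\tau_t} h_0(u)\,du\right\}
$$

$$
\ln\{-\ln[1 - h_t(\mathbf{x})]\} = \alpha_t + \mathbf{x}'\boldsymbol{\beta},
\qquad
\alpha_t = \ln \int_{\tau_{t-1}}^{\tau_t} h_0(u)\,du
$$

Under these assumptions, the coefficients $\boldsymbol{\beta}$ of a complementary log-log model fit to interval-censored data have the same log hazard-ratio interpretation as in the continuous-time proportional hazards model, with one intercept $\alpha_t$ per interval.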

The book has several applications of generalized mixed models performed in
Stata. Rabe-Hesketh and Skrondal developed **gllamm**, a Stata program
that can fit many latent-variable models, of which the generalized linear
mixed model is a special case. As of version 10, Stata contains the
**xtmixed**, **xtmelogit**, and **xtmepoisson** commands for
fitting multilevel models, in addition to other **xt** commands for
fitting standard random-intercept models. The types of models fit by these
commands sometimes overlap; when this happens, the authors highlight the
differences in syntax, data organization, and output for the two (or more)
commands that can be used to fit the same model. The authors also point out
the relative strengths and weaknesses of each command when used to fit the
same model, based on considerations such as computational speed, accuracy,
available predictions, and available postestimation statistics.

In reference to the first edition, a reviewer for *American
Statistician* commends Rabe-Hesketh and Skrondal for promoting the
appropriate use of multilevel and longitudinal modeling. The reviewer writes
in the August 2006 issue, “All too often computer manuals leave off ...
important aspects of an analysis, but the authors have been careful to
provide a well-rounded and complete approach to model fitting and
interpretation.”

In summary, this book is the most complete, up-to-date depiction of Stata's capacity for fitting generalized linear mixed models. The authors provide an ideal introduction for Stata users wishing to learn about this powerful data-analysis tool.

List of Tables

List of Figures

Preface

I Preliminaries

1 Review of linear regression

1.1 Introduction

1.2 Is there gender discrimination in faculty salaries?

1.3 Independent-samples t test

1.4 One-way analysis of variance

1.5 Simple linear regression

1.6 Dummy variables

1.7 Multiple linear regression

1.8 Interactions

1.9 Dummies for more than two groups

1.10 Other types of interactions

1.10.1 Interaction between dummy variables

1.10.2 Interaction between continuous covariates

1.11 Nonlinear effects

1.12 Residual diagnostics

1.13 Summary and further reading

1.14 Exercises

II Two-level linear models

2 Variance-components models

2.1 Introduction

2.2 How reliable are peak-expiratory-flow measurements?

2.3 The variance-components model

2.3.1 Model specification and path diagram

2.3.2 Error components, variance components, and reliability

2.3.3 Intraclass correlation

2.4 Fixed versus random effects

2.5 Estimation using Stata

2.5.1 Data preparation

2.5.2 Using xtreg

2.5.3 Using xtmixed

2.5.4 Using gllamm

2.6 Hypothesis tests and confidence intervals

2.6.1 Hypothesis test and confidence interval for the population mean

2.6.2 Hypothesis test and confidence interval for the between-cluster variance

2.7 More on statistical inference

2.7.1 Different estimation models

2.7.2 Inference for β

Estimate and standard error: Balanced case

Estimate: Unbalanced case

2.8 Crossed versus nested effects

2.9 Assigning values to the random intercepts

2.9.1 Maximum likelihood estimation

Implementation via OLS regression

Implementation via the mean total residual

2.9.2 Empirical Bayes prediction

2.9.3 Empirical Bayes variances

2.10 Summary and further reading

2.11 Exercises

3 Random-intercept models with covariates

3.1 Introduction

3.2 Does smoking during pregnancy affect birthweight?

3.3 The linear random-intercept model with covariates

3.3.1 Model specification

3.3.2 Residual variance and intraclass correlation

3.4 Estimation using Stata

3.4.1 Using xtreg

3.4.2 Using xtmixed

3.4.3 Using gllamm

3.5 Coefficients of determination or variance explained

3.6 Hypothesis tests and confidence intervals

3.6.1 Hypothesis tests for regression coefficients

Hypothesis tests for individual regression coefficients

Joint hypothesis tests for several regression coefficients

3.6.2 Predicted means and confidence intervals

3.6.3 Hypothesis test for between-cluster variance

3.7 Between and within effects

3.7.1 Between-mother effects

3.7.2 Within-mother effects

3.7.3 Relations among estimators

3.7.4 Endogeneity and different within- and between-mother effects

3.7.5 Hausman endogeneity test

3.8 Fixed versus random effects revisited

3.9 Residual diagnostics

3.10 More on statistical inference for regression coefficients

3.10.1 Consequences of using ordinary regression for clustered data

3.10.2 Power and sample-size determination

3.11 Summary and further reading

3.12 Exercises

4 Random-coefficient models

4.1 Introduction

4.2 How effective are different schools?

4.3 Separate linear regressions for each school

4.4 Specification and interpretation of a random-coefficient model

4.4.1 Specification of random-coefficient model

4.4.2 Interpretation of the random-effects variances and covariances

4.5 Estimation using Stata

4.5.1 Using xtmixed

Random-intercept model

Random-coefficient model

4.5.2 Using gllamm

Random-intercept model

Random-coefficient model

4.6 Testing the slope variance

4.7 Interpretation of estimates

4.8 Assigning values to the random intercepts and slopes

4.8.1 Maximum likelihood estimation

4.8.2 Empirical Bayes prediction

4.8.3 Model visualization

4.8.4 Residual diagnostics

4.8.5 Inferences for individual schools

4.9 Two-stage model formulation

4.10 Some warnings about random-coefficient models

4.11 Summary and further reading

4.12 Exercises

5 Longitudinal, panel, and growth-curve models

5.1 Introduction

5.2 How and why do wages change over time?

5.3 Data structure

5.3.1 Missing data

5.3.2 Time-varying and time-constant variables

5.4 Time scales in longitudinal data

5.5 Random- and fixed-effects approaches

5.5.1 Correlated residuals

5.5.2 Fixed-intercept model

Using xtreg

Using anova

5.5.3 Random-intercept model

5.5.4 Random-coefficient model

5.5.5 Marginal mean and covariance structure induced by random effects

Marginal mean and covariance structure for random-intercept models

Marginal mean and covariance structure for random-coefficient models

5.6 Marginal modeling

5.6.1 Covariance structures

Compound symmetric or exchangeable structure

Random-coefficient structure

Autoregressive residual structure

Unstructured covariance matrix

5.6.2 Marginal modeling using Stata

5.7 Autoregressive- or lagged-response models

5.8 Hybrid approaches

5.8.1 Autoregressive response and random effects

5.8.2 Autoregressive responses and autoregressive residuals

5.8.3 Autoregressive residuals and random or fixed effects

5.9 Missing data

5.9.1 Maximum likelihood estimation under MAR: A simulation

5.10 How do children grow?

5.10.1 Observed growth trajectories

5.11 Growth-curve modeling

5.11.1 Random-intercept model

5.11.2 Random-coefficient model

5.11.3 Two-stage model formulation

5.12 Prediction of trajectories for individual children

5.13 Prediction of mean growth trajectory and 95% band

5.14 Complex level-1 variation or heteroskedasticity

5.15 Summary and further reading

5.16 Exercises

III Two-level generalized linear models

6 Dichotomous or binary responses

6.1 Introduction

6.2 Single-level models for dichotomous responses

6.2.1 Generalized linear model formulation

6.2.2 Latent-response formulation

Logistic regression

Probit regression

6.3 Which treatment is best for toenail infection?

6.4 Longitudinal data structure

6.5 Population-averaged or marginal probabilities

6.6 Random-intercept logistic regression

6.7 Estimation of logistic random-intercept models

6.7.1 Using xtlogit

6.7.2 Using xtmelogit

6.7.3 Using gllamm

6.8 Inference for logistic random-intercept models

6.9 Subject-specific vs. population-averaged relationships

6.10 Measures of dependence and heterogeneity

6.10.1 Conditional or residual intraclass correlation of the latent responses

6.10.2 Median odds ratio

6.11 Maximum likelihood estimation

6.11.1 Adaptive quadrature

6.11.2 Some speed considerations

Advice for speeding up gllamm

6.12 Assigning values to random effects

6.12.1 Maximum likelihood estimation

6.12.2 Empirical Bayes prediction

6.12.3 Empirical Bayes modal prediction

6.13 Different kinds of predicted probabilities

6.13.1 Predicted population-averaged probabilities

6.13.2 Predicted subject-specific probabilities

Predictions for hypothetical subjects

Predictions for the subjects in the sample

6.14 Other approaches to clustered dichotomous data

6.14.1 Conditional logistic regression

6.14.2 Generalized estimating equations (GEE)

6.15 Summary and further reading

6.16 Exercises

7 Ordinal responses

7.1 Introduction

7.2 Single-level cumulative models for ordinal responses

7.2.1 Generalized linear model formulation

7.2.2 Latent-response formulation

7.2.3 Proportional odds

7.2.4 Identification

7.3 Are antipsychotic drugs effective for patients with schizophrenia?

7.4 Longitudinal data structure and graphs

7.4.1 Longitudinal data structure

7.4.2 Plotting cumulative proportions

7.4.3 Plotting estimated cumulative logits and transforming the time scale

7.5 A single-level proportional odds model

7.5.1 Model specification

7.5.2 Estimation using Stata

7.6 A random-intercept proportional odds model

7.6.1 Model specification

7.6.2 Estimation using Stata

7.7 A random-intercept proportional odds model

7.7.1 Model specification

7.7.2 Estimation using gllamm

7.8 Different kinds of predicted probabilities

7.8.1 Predicted population-averaged probabilities

7.8.2 Predicted patient-specific probabilities

7.9 Do experts differ in the grading of student essays?

7.10 A random-intercept probit model with grader bias

7.10.1 Model specification

7.10.2 Estimation

7.11 Including grader-specific measurement error variances

7.11.1 Model specification

7.11.2 Estimation

7.12 Including grader-specific thresholds

7.12.1 Model specification

7.12.2 Estimation

7.13 Summary and further reading

7.14 Exercises

8 Discrete-time survival

8.1 Introduction

8.1.1 Censoring and truncation

8.1.2 Time-varying covariates and different time-scales

8.1.3 Discrete- versus continuous-time survival data

8.2 Single-level models for discrete-time survival data

8.2.1 Discrete-time hazard and discrete-time survival

8.2.2 Data expansion for discrete-time survival analysis

8.2.3 Estimation via regression models for dichotomous responses

8.2.4 Including covariates

Time-constant covariates

Time-varying covariates

8.2.5 Handling left-truncated data

8.3 How does birth history affect child mortality?

8.4 Data expansion

8.5 Proportional hazards and interval censoring

8.6 Complementary log-log models

8.7 A random-intercept complementary log-log model

8.7.1 Model specification

8.7.2 Estimation using Stata

8.8 Marginal and conditional survival probabilities

8.9 Summary and further reading

8.10 Exercises

9 Counts

9.1 Introduction

9.2 What are counts?

9.2.1 Counts versus proportions

9.2.2 Counts as aggregated event-history data

9.3 Single-level Poisson models for counts

9.4 Did the German health-care reform reduce the number of doctor visits?

9.5 Longitudinal data structure

9.6 Single-level Poisson regression

9.6.1 Model specification

9.6.2 Estimation using Stata

9.7 Random-intercept Poisson regression

9.7.1 Model specification

9.7.2 Estimation using Stata

Using xtpoisson

Using xtmepoisson

Using gllamm

9.8 Random-coefficient Poisson regression

9.8.1 Model specification

9.8.2 Estimation using Stata

Using xtmepoisson

Using gllamm

9.8.3 Interpretation of estimates

9.9 Overdispersion in single-level models

9.9.1 Normally distributed random intercept

9.9.2 Negative binomial models

Mean dispersion or NB2

Constant dispersion or NB1

9.9.3 Quasilikelihood or robust standard errors

9.10 Level-1 overdispersion in two-level models

9.11 Other approaches to two-level count data

9.11.1 Conditional Poisson regression

9.11.2 Conditional negative binomial regression

9.11.3 Generalized estimating equations

9.11.4 Marginal and conditional estimates when responses are MAR

Simulation

9.12 How does birth history affect child mortality?

9.12.1 Simple piecewise exponential survival model

9.12.2 Piecewise exponential survival model with covariates and frailty

9.13 Which Scottish counties have a high risk of lip cancer?

9.14 Standardized mortality ratios

9.15 Random-intercept Poisson regression

9.15.1 Model specification

9.15.2 Estimation using gllamm

9.15.3 Prediction of standardized mortality ratios

9.16 Nonparametric maximum likelihood estimation

9.16.1 Specification

9.16.2 Estimation using gllamm

9.16.3 Prediction

9.17 Summary and further reading

9.18 Exercises

IV Models with nested and crossed random effects

10 Higher-level models with nested random effects

10.1 Introduction

10.2 Do peak-expiratory-flow measurements vary between methods?

10.3 Two-level variance-components models

10.3.1 Model specification

10.3.2 Estimation using xtmixed

10.4 Three-level variance-components models

10.4.1 Model specification

10.4.2 Different types of intraclass correlation

10.4.3 Three-stage formulation

10.4.4 Estimating using xtmixed

10.4.5 Empirical Bayes prediction using xtmixed

10.5 Did the Guatemalan immunization campaign work?

10.6 A three-level logistic random-intercept model

10.6.1 Model specification

10.6.2 Different types of intraclass correlations for the latent responses

10.6.3 Different kinds of median odds ratios

10.6.4 Three-stage formulation

10.7 Estimation of three-level logistic random-intercept models using Stata

10.7.1 Using gllamm

10.7.2 Using xtmelogit

10.8 A three-level logistic random-coefficient model

10.9 Estimation of three-level logistic random-coefficient models using Stata

10.9.1 Using gllamm

10.9.2 Using xtmelogit

10.10 Prediction of random effects

10.10.1 Empirical Bayes prediction

10.10.2 Empirical Bayes modal prediction

10.11 Different kinds of predicted probabilities

10.11.1 Predicted marginal probabilities

10.11.2 Predicted median or conditional probabilities

10.11.3 Predicted posterior mean probabilities

10.12 Summary and further reading

10.13 Exercises

11 Crossed random effects

11.1 Introduction

11.2 How does investment depend on expected profit and capital stock?

11.3 A two-way error-components model

11.3.1 Model specification

11.3.2 Residual intraclass correlations

11.3.3 Estimation

11.3.4 Prediction

11.4 How much do primary and secondary schools affect attainment at age 16?

11.5 An additive crossed random-effects model

11.5.1 Specification

11.5.2 Estimation using xtmixed

11.6 Including a random interaction

11.6.1 Model specification

11.6.2 Intraclass correlations

11.6.3 Estimation using xtmixed

11.6.4 Some diagnostics

11.7 A trick requiring fewer random effects

11.8 Do salamanders from different populations mate successfully?

11.9 Crossed random-effects logistic regression

11.10 Summary and further reading

11.11 Exercises

A Syntax for gllamm, eq, and gllapred: The bare essentials

B Syntax for gllamm

C Syntax for gllapred

D Syntax for gllasim

References

Author index

Subject index