**List of figures**

**List of tables**

**List of boxed tips**

**Preface**

**Support materials for the book**

**Glossary of acronyms**

**Glossary of mathematical and statistical symbols**

**1 Getting started**

1.1 Conventions

1.2 Introduction

1.3 The Stata screen

1.4 Using an existing dataset

1.5 An example of a short Stata session

1.6 Video aids to learning Stata

1.7 Summary

1.8 Exercises

**2 Entering data**

2.1 Creating a dataset

2.2 An example questionnaire

2.3 Developing a coding system

2.4 Entering data using the Data Editor

2.4.1 Value labels

2.5 The Variables Manager

2.6 The Data Editor (Browse) view

2.7 Saving your dataset

2.8 Checking the data

2.9 Summary

2.10 Exercises

**3 Preparing data for analysis**

3.1 Introduction

3.2 Planning your work

3.3 Creating value labels

3.4 Reverse-code variables

3.5 Creating and modifying variables

3.6 Creating scales

3.7 Saving some of your data

3.8 Summary

3.9 Exercises

**4 Working with commands, do-files, and results**

4.1 Introduction

4.2 How Stata commands are constructed

4.3 Creating a do-file

4.4 Copying your results to a word processor

4.5 Logging your command file

4.6 Summary

4.7 Exercises

**5 Descriptive statistics and graphs for one variable**

5.1 Descriptive statistics and graphs

5.2 Where is the center of a distribution?

5.3 How dispersed is the distribution?

5.4 Statistics and graphs—unordered categories

5.5 Statistics and graphs—ordered categories and variables

5.6 Statistics and graphs—quantitative variables

5.7 Summary

5.8 Exercises

**6 Statistics and graphs for two categorical variables**

6.1 Relationship between categorical variables

6.2 Cross-tabulation

6.3 Chi-squared test

6.3.1 Degrees of freedom

6.3.2 Probability tables

6.4 Percentages and measures of association

6.5 Odds ratios when dependent variable has two categories

6.6 Ordered categorical variables

6.7 Interactive tables

6.8 Tables—linking categorical and quantitative variables

6.9 Power analysis when using a chi-squared test of significance

6.10 Summary

6.11 Exercises

**7 Tests for one or two means**

7.1 Introduction to tests for one or two means

7.2 Randomization

7.3 Random sampling

7.4 Hypotheses

7.5 One-sample test of a proportion

7.6 Two-sample test of a proportion

7.7 One-sample test of means

7.8 Two-sample test of group means

7.8.1 Testing for unequal variances

7.9 Repeated-measures t test

7.10 Power analysis

7.11 Nonparametric alternatives

7.11.1 Mann–Whitney two-sample rank-sum test

7.11.2 Nonparametric alternative: Median test

7.12 Video tutorial related to this chapter

7.13 Summary

7.14 Exercises

**8 Bivariate correlation and regression**

8.1 Introduction to bivariate correlation and regression

8.2 Scattergrams

8.3 Plotting the regression line

8.4 An alternative to producing a scattergram, binscatter

8.5 Correlation

8.6 Regression

8.7 Spearman’s rho: Rank-order correlation for ordinal data

8.8 Power analysis with correlation

8.9 Summary

8.10 Exercises

**9 Analysis of variance**

9.1 The logic of one-way analysis of variance

9.2 ANOVA example

9.3 ANOVA example with nonexperimental data

9.4 Power analysis for one-way ANOVA

9.5 A nonparametric alternative to ANOVA

9.6 Analysis of covariance

9.7 Two-way ANOVA

9.8 Repeated-measures design

9.9 Intraclass correlation—measuring agreement

9.10 Power analysis with ANOVA

9.10.1 Power analysis for one-way ANOVA

9.10.2 Power analysis for two-way ANOVA

9.10.3 Power analysis for repeated-measures ANOVA

9.10.4 Summary of power analysis for ANOVA

9.11 Summary

9.12 Exercises

**10 Multiple regression**

10.1 Introduction to multiple regression

10.2 What is multiple regression?

10.3 The basic multiple regression command

10.4 Increment in R-squared: Semipartial correlations

10.5 Is the dependent variable normally distributed?

10.6 Are the residuals normally distributed?

10.7 Regression diagnostic statistics

10.7.1 Outliers and influential cases

10.7.2 Influential observations: DFbeta

10.7.3 Combinations of variables may cause problems

10.8 Weighted data

10.9 Categorical predictors and hierarchical regression

10.10 A shortcut for working with a categorical variable

10.11 Fundamentals of interaction

10.12 Nonlinear relations

10.12.1 Fitting a quadratic model

10.12.2 Centering when using a quadratic term

10.12.3 Do we need to add a quadratic component?

10.13 Power analysis in multiple regression

10.14 Summary

10.15 Exercises

**11 Logistic regression**

11.1 Introduction to logistic regression

11.2 An example

11.3 What is an odds ratio and a logit?

11.3.1 The odds ratio

11.3.2 The logit transformation

11.4 Data used in the rest of the chapter

11.5 Logistic regression

11.6 Hypothesis testing

11.6.1 Testing individual coefficients

11.6.2 Testing sets of coefficients

11.7 Margins: More on interpreting results from logistic regression

11.8 Nested logistic regressions

11.9 Power analysis when doing logistic regression

11.10 Next steps for using logistic regression and its extensions

11.11 Summary

11.12 Exercises

**12 Measurement, reliability, and validity**

12.1 Overview of reliability and validity

12.2 Constructing a scale

12.2.1 Generating a mean score for each person

12.3 Reliability

12.3.1 Stability and test–retest reliability

12.3.2 Equivalence

12.3.3 Split-half and alpha reliability—internal consistency

12.3.4 Kuder–Richardson reliability for dichotomous items

12.3.5 Rater agreement—kappa (*κ*)

12.4 Validity

12.4.1 Expert judgment

12.4.2 Criterion-related validity

12.4.3 Construct validity

12.5 Factor analysis

12.6 PCF analysis

12.6.1 Orthogonal rotation: Varimax

12.6.2 Oblique rotation: Promax

12.7 But we wanted one scale, not four scales

12.7.1 Scoring our variable

12.8 Summary

12.9 Exercises

**13 Structural equation and generalized structural equation modeling**

13.1 Linear regression using sem

13.1.1 Using the sem command directly

13.1.2 SEM and working with missing values

13.1.3 Exploring missing values and auxiliary variables

13.1.4 Getting auxiliary variables into your SEM command

13.2 A quick way to draw a regression model

13.3 The gsem command for logistic regression

13.3.1 Fitting the model using the logit command

13.3.2 Fitting the model using the gsem command

13.4 Path analysis and mediation

13.5 Conclusions and what is next for the sem command

13.6 Exercises

**14 Working with missing values—multiple imputation**

14.1 Working with missing values—multiple imputation

14.2 What variables do we include when doing imputations?

14.3 The nature of the problem

14.4 Multiple imputation and its assumptions about the mechanism for missingness

14.5 Multiple imputation

14.6 A detailed example

14.6.1 Preliminary analysis

14.6.2 Setup and multiple-imputation stage

14.6.3 The analysis stage

14.6.4 For those who want an *R*² and standardized *β*s

14.6.5 When impossible values are imputed

14.7 Summary

14.8 Exercises

**15 An introduction to multilevel analysis**

15.1 Questions and data for groups of individuals

15.2 Questions and data for a longitudinal multilevel application

15.3 Fixed-effects regression models

15.4 Random-effects regression models

15.5 An applied example

15.5.1 Research questions

15.5.2 Reshaping data to do multilevel analysis

15.6 A quick visualization of our data

15.7 Random-intercept model

15.7.1 Random intercept—linear model

15.7.2 Random-intercept model—quadratic term

15.7.3 Treating time as a categorical variable

15.8 Random-coefficients model

15.9 Including a time-invariant covariate

15.10 Summary

15.11 Exercises

**16 Item response theory (IRT)**

16.1 How are IRT measures of variables different from summated scales?

16.2 Overview of three IRT models for dichotomous items

16.2.1 The one-parameter logistic (1PL) model

16.2.2 The two-parameter logistic (2PL) model

16.2.3 The three-parameter logistic (3PL) model

16.3 Fitting the 1PL model using Stata

16.3.1 The estimation

16.3.2 How important is each of the items?

16.3.3 An overall evaluation of our scale

16.3.4 Estimating the latent score

16.4 Fitting a 2PL IRT model

16.4.1 Fitting the 2PL model

16.5 The graded response model—IRT for Likert-type items

16.5.1 The data

16.5.2 Fitting our graded response model

16.5.3 Estimating a person’s score

16.6 Reliability of the fitted IRT model

16.7 Using the Stata menu system

16.8 Extensions of IRT

16.9 Exercises

**A What’s next?**

A.1 Introduction to the appendix

A.2 Resources

A.2.1 Web resources

A.2.2 Books about Stata

A.2.3 Short courses

A.2.4 Acquiring data

A.2.5 Learning from the postestimation methods

A.3 Summary

**References**