## An Introduction to Survival Analysis Using Stata, Second Edition

Mario Cleves, William W. Gould, Roberto G. Gutierrez, and Yulia Marchenko

Copyright 2008. ISBN-13: 978-1-59718-041-2. Pages: 372; paperback. Price: $56.00.

*An Introduction to Survival Analysis Using Stata, Second Edition* is
the ideal tutorial for professional data analysts who want to learn survival
analysis for the first time or who are well versed in survival analysis but
not as dexterous in using Stata to analyze survival data. This text also
serves as a valuable reference to those who already have experience using
Stata’s survival analysis routines.

The second edition has been updated for Stata 10. It adds a new chapter
on power and sample-size calculations for survival studies and sections that
describe how to fit regression models (**stcox** and **streg**) in the
presence of complex survey data. Other enhancements include discussions
of nonparametric estimation of mean/median survival, survival graphs with
embedded at-risk tables, better hazard graphs through the use of boundary
kernels, and concordance measures for assessing the predictive accuracy of
the Cox model, as well as an expanded discussion of model-building
strategies, including the use of fractional polynomials.

This book provides statistical theory, step-by-step procedures for analyzing
survival data, an in-depth usage guide for Stata’s most widely used
**st** commands, and a collection of tips for using Stata to analyze
survival data and present the results. This book develops from first
principles the statistical concepts unique to survival data and assumes only
a knowledge of basic probability and statistics and a working knowledge of
Stata.

The first three chapters of the text cover basic theoretical concepts:
hazard functions, cumulative hazard functions, and their interpretations;
survivor functions; hazard models; and a comparison of nonparametric,
semiparametric, and parametric methodologies. Chapter 4 deals with censoring
and truncation. The next three chapters cover the formatting, manipulation,
**stset**ting, and error checking involved in preparing survival data for
analysis using Stata’s **st** analysis commands. Chapter 8 covers
nonparametric methods, including the Kaplan–Meier and
Nelson–Aalen estimators, and the various nonparametric tests for the
equality of survival experience.
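
To make the two estimators named above concrete, here is a minimal Python sketch of the Kaplan–Meier survivor estimate and the Nelson–Aalen cumulative hazard estimate. It is an illustration only: it is not taken from the book, it is not how Stata computes these quantities, and the function name `km_and_na` and its arguments are hypothetical. It assumes right-censored data with no delayed entry or gaps, and it treats subjects censored at a failure time as still at risk at that time.

```python
from collections import Counter


def km_and_na(times, failed):
    """Return [(t, S_hat, H_hat)] at each distinct failure time.

    times  -- observed analysis time for each subject
    failed -- 1 if the subject failed at that time, 0 if right-censored
    S_hat is the Kaplan-Meier survivor estimate and H_hat is the
    Nelson-Aalen cumulative hazard estimate.
    """
    deaths = Counter(t for t, d in zip(times, failed) if d)
    exits = Counter(times)      # failures and censorings leaving the risk set
    at_risk = len(times)        # everyone is at risk at time 0 (assumption)
    surv, cumhaz = 1.0, 0.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # product-limit step
            cumhaz += d / at_risk       # Nelson-Aalen increment
            curve.append((t, surv, cumhaz))
        at_risk -= exits[t]             # drop everyone observed at t
    return curve


curve = km_and_na([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

For these five made-up observations, the survivor estimate steps through 0.8, 0.6, and 0.3 at times 1, 2, and 3 (up to floating-point rounding), and the censored observation at time 4 produces no step.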

Chapters 9–11 discuss Cox regression and include various examples of
fitting a Cox model, obtaining predictions, interpreting results, building
models, and model diagnostics. The next four chapters cover parametric
models, which are fit using Stata’s **streg** command. These
chapters include detailed derivations of all six parametric models currently
supported in Stata and methods for determining which model is appropriate,
as well as information on obtaining predictions, stratification, and
advanced topics such as frailty models. The final chapter is devoted to
power and sample-size calculations for survival studies.

List of Tables

List of Figures

Notation and Typography

1 The problem of survival analysis

1.1 Parametric modeling

1.2 Semiparametric modeling

1.3 Nonparametric analysis

1.4 Linking the three approaches

2 Describing the distribution of failure times

2.1 The survivor and hazard functions

2.2 The quantile function

2.3 Interpreting the cumulative hazard and hazard rate

2.3.1 Interpreting the cumulative hazard

2.3.2 Interpreting the hazard rate

2.4 Means and medians

3 Hazard models

3.1 Parametric models

3.2 Semiparametric models

3.3 Analysis time (time at risk)

4 Censoring and truncation

4.1 Censoring

4.1.1 Right censoring

4.1.2 Interval censoring

4.1.3 Left censoring

4.2 Truncation

4.2.1 Left truncation (delayed entry)

4.2.2 Interval truncation (gaps)

4.2.3 Right truncation

5 Recording survival data

5.1 The desired format

5.2 Other formats

5.3 Example: Wide-form snapshot data

6 Using stset

6.1 A short lesson on dates

6.2 Purposes of the stset command

6.3 The syntax of the stset command

6.3.1 Specifying analysis time

6.3.2 Variables defined by stset

6.3.3 Specifying what constitutes failure

6.3.4 Specifying when subjects exit from the analysis

6.3.5 Specifying when subjects enter the analysis

6.3.6 Specifying the subject-ID variable

6.3.7 Specifying the begin-of-span variable

6.3.8 Convenience options

7 After stset

7.1 Look at stset’s output

7.2 List some of your data

7.3 Use stdescribe

7.4 Use stvary

7.5 Perhaps use stfill

7.6 Example: Hip fracture data

8 Nonparametric analysis

8.1 Inadequacies of standard univariate methods

8.2 The Kaplan–Meier estimator

8.2.1 Calculation

8.2.2 Censoring

8.2.3 Left truncation (delayed entry)

8.2.4 Interval truncation (gaps)

8.2.5 Relationship to the empirical distribution function

8.2.6 Other uses of sts list

8.2.7 Graphing the Kaplan–Meier estimate

8.3 The Nelson–Aalen estimator

8.4 Estimating the hazard function

8.5 Estimating mean and median survival times

8.6 Tests of hypothesis

8.6.1 The log-rank test

8.6.2 The Wilcoxon test

8.6.3 Other tests

8.6.4 Stratified tests

9 The Cox proportional hazards model

9.1 Using stcox

9.1.1 The Cox model has no intercept

9.1.2 Interpreting coefficients

9.1.3 The effect of units on coefficients

9.1.4 Estimating the baseline cumulative hazard and survivor functions

9.1.5 Estimating the baseline hazard function

9.1.6 The effect of units on the baseline functions

9.2 Likelihood calculations

9.2.1 No tied failures

9.2.2 Tied failures

The marginal calculation

The partial calculation

The Breslow approximation

The Efron approximation

9.2.3 Summary

9.3 Stratified analysis

9.3.1 Obtaining coefficient estimates

9.3.2 Obtaining estimates of baseline functions

9.4 Cox models with shared frailty

9.4.1 Parameter estimation

9.4.2 Obtaining estimates of baseline functions

9.5 Cox models with survey data

9.5.1 Declaring survey characteristics

9.5.2 Fitting a Cox model with survey data

9.5.3 Some caveats of analyzing survival data from complex survey designs

10 Model building using stcox

10.1 Indicator variables

10.2 Categorical variables

10.3 Continuous variables

10.3.1 Fractional polynomials

10.4 Interactions

10.5 Time-varying variables

10.5.1 Using stcox, tvc() texp()

10.5.2 Using stsplit

10.6 Modeling group effects: fixed-effects, random-effects, stratification, and clustering

11 The Cox model: Diagnostics

11.1 Testing the proportional-hazards assumption

11.1.1 Tests based on reestimation

11.1.2 Test based on Schoenfeld residuals

11.1.3 Graphical methods

11.2 Residuals

Reye's syndrome data

11.2.1 Determining functional form

11.2.2 Goodness of fit

11.2.3 Outliers and influential points

12 Parametric models

12.1 Motivation

12.2 Classes of parametric models

12.2.1 Parametric proportional hazards models

12.2.2 Accelerated failure-time models

12.2.3 Comparing the two parameterizations

13 A survey of parametric regression models in Stata

13.1 The exponential model

13.1.1 Exponential regression in the PH metric

13.1.2 Exponential regression in the AFT metric

13.2 Weibull regression

13.2.1 Weibull regression in the PH metric

Fitting null models

13.2.2 Weibull regression in the AFT metric

13.3 Gompertz regression (PH metric)

13.4 Lognormal regression (AFT metric)

13.5 Loglogistic regression (AFT metric)

13.6 Generalized gamma regression (AFT metric)

13.7 Choosing among parametric models

13.7.1 Nested models

13.7.2 Nonnested models

14 Postestimation commands for parametric models

14.1 Use of predict after streg

14.1.1 Predicting the time of failure

14.1.2 Predicting the hazard and related functions

14.1.3 Calculating residuals

14.2 Using stcurve

15 Generalizing the parametric regression model

15.1 Using the ancillary() option

15.2 Stratified models

15.3 Frailty models

15.3.1 Unshared frailty models

15.3.2 Example: Kidney data

15.3.3 Testing for heterogeneity

15.3.4 Shared frailty models

16 Power and sample-size determination for survival analysis

16.1 Estimating sample size

16.1.1 Multiple-myeloma data

16.1.2 Comparing two survivor functions nonparametrically

16.1.3 Comparing two exponential survivor functions

16.1.4 Cox regression models

16.2 Accounting for withdrawal and accrual of subjects

16.2.1 The effect of withdrawal or loss to follow-up

16.2.2 The effect of accrual

16.2.3 Examples

16.3 Estimating power and effect size

16.4 Tabulating or graphing results

References

Author index

Subject index