Generalized Linear Models and Extensions, Fourth Edition 

$68.00 Print; $58.00 VitalSource; $51.00 Amazon Kindle


Comment from the Stata technical group

Generalized linear models (GLMs) extend linear regression to models with a non-Gaussian or even discrete response. GLM theory is predicated on the exponential family of distributions, a class so rich that it includes the commonly used logit, probit, and Poisson models. Although one can fit these models in Stata by using specialized commands (for example, logit for logit models), fitting them as GLMs with Stata’s glm command offers some advantages. For example, model diagnostics may be calculated and interpreted similarly regardless of the assumed distribution.

This text thoroughly covers GLMs, both theoretically and computationally, with an emphasis on Stata. The theory consists of showing how the various GLMs are special cases of the exponential family, showing general properties of this family of distributions, and showing the derivation of maximum likelihood (ML) estimators and standard errors. Hardin and Hilbe show how iteratively reweighted least squares, another method of parameter estimation, is a consequence of ML estimation using Fisher scoring. The authors also discuss different methods of estimating standard errors, including robust methods, robust methods with clustering, Newey–West, outer product of the gradient, bootstrap, and jackknife. The thorough coverage of model diagnostics includes measures of influence such as Cook’s distance, several forms of residuals, the Akaike and Bayesian information criteria, and various R²-type measures of explained variability.

After presenting general theory, Hardin and Hilbe then break down each distribution. Each distribution has its own chapter that explains the computational details of applying the general theory to that particular distribution. Pseudocode plays a valuable role here because it lets the authors describe computational algorithms relatively simply.
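
To make concrete the statement that iteratively reweighted least squares follows from ML estimation with Fisher scoring, here is a minimal sketch. It is written in Python rather than Stata and is not taken from the book. It assumes a Poisson family with the canonical log link and a single covariate; the function name irls_poisson and the example counts are hypothetical and chosen only for illustration.

```python
import math

def irls_poisson(x, y, max_iter=25, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by iteratively reweighted least squares.

    Illustrative assumption: Poisson family, log link, one covariate.
    """
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        # Accumulate the 2x2 weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi            # linear predictor
            mu = math.exp(eta)            # inverse log link
            w = mu                        # Fisher-scoring weight for Poisson/log
            z = eta + (yi - mu) / mu      # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up counts that lie exactly on exp(x * log 2).
b0, b1 = irls_poisson([0, 1, 2, 3, 4], [1, 2, 4, 8, 16])
print(b0, b1)
```

Because the illustrative counts lie exactly on exp(x log 2), the estimates should approach 0 and log 2. Each pass is a weighted least-squares fit of the working response on the covariate, which is the same update Fisher scoring gives for this model. In Stata, the comparable fit would come from glm with the family(poisson) and link(log) options.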
Devoting an entire chapter to each distribution (or family, in GLM terms) also allows for the inclusion of real-data examples showing how Stata fits such models, as well as the presentation of certain diagnostics and analytical strategies that are unique to that family. The chapters on binary data and on count (Poisson) data are excellent in this regard. Hardin and Hilbe give ample attention to the problems of overdispersion and zero inflation in count-data models.

The final part of the text concerns extensions of GLMs. First, the authors cover multinomial responses, both ordered and unordered. Although multinomial responses are not strictly a part of GLM, the theory is similar in that one can think of a multinomial response as an extension of a binary response. The examples presented in these chapters often use the authors’ own Stata programs, augmenting official Stata’s capabilities. Second, GLMs may be extended to clustered data through generalized estimating equations (GEEs), and one chapter covers GEE theory and examples. GLMs may also be extended by programming one’s own family and link functions for use with Stata’s official glm command, and the authors detail this process. Finally, the authors describe extensions for multivariate models and Bayesian analysis.

The fourth edition includes two new chapters. The first introduces bivariate and multivariate models for binary and count outcomes. The second covers Bayesian analysis and demonstrates how to use the bayes: prefix and the bayesmh command to fit Bayesian models for many of the GLMs that were discussed in previous chapters. Additionally, the authors added discussions of the generalized negative binomial models of Waring and Famoye. New explanations of working with heaped data, clustered data, and bias-corrected GLMs are included. The new edition also incorporates more examples of creating synthetic data for models such as Poisson, negative binomial, hurdle, and finite mixture models.

About the authors

James W. Hardin is a professor and the Biostatistics division head in the Department of Epidemiology and Biostatistics at the University of South Carolina. He is also the associate dean for Faculty Affairs and Curriculum of the Arnold School of Public Health at the University of South Carolina. Hardin is the coauthor, along with Phillip Good, of four editions of Common Errors in Statistics (And How to Avoid Them). He is also the coauthor of more than 200 refereed journal articles and several book chapters. With Hilbe, he wrote the glm command, on which the current Stata command is based. He teaches courses on generalized linear models, generalized estimating equations, count data modeling, and logistic regression through statistics.com. Hardin serves on the editorial board of the Stata Journal.

Joseph M. Hilbe was a professor emeritus at the University of Hawaii and an adjunct professor of sociology and statistics at Arizona State University. A Fellow of the Royal Statistical Society and the American Statistical Association, he wrote many journal articles and book chapters. He also wrote Negative Binomial Regression, Practical Guide to Logistic Regression, Modeling Count Data, and with Hardin, Generalized Estimating Equations. Hilbe was also the lead statistician at several major research corporations, CEO of National Health Economics and Research, and president of Health Outcomes Technologies in Pennsylvania. He passed away in March 2017.
