{smcl}
{* 02dec2003}{...}
{hline}
help for {hi:simex}{right:(SJ3-4: st0049, st0050, st0051)}
{hline}

{title:Simulation Extrapolation}

{p 8 12 2}
{cmd:simex} {it:depvar} [{it:indepvars}] ({it:label:varlist})
[({it:label:varlist}) ... ({it:label:varlist})]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{cmd:,} {cmd:bstrap}
{cmd:brep}{cmd:(}{it:#}{cmd:)} {cmd:ltolerance}{cmd:(}{it:#}{cmd:)}
{cmd:iterate}{cmd:(}{it:#}{cmd:)}
{cmd:family}{cmd:(}{it:familyname}{cmd:)}
{cmd:link}{cmd:(}{it:linkname}{cmd:)}
{cmd:suuinit}{cmd:(}{it:matrixname}{cmd:)}
{cmd:theta}{cmd:(}{it:matrixname}{cmd:)} {cmd:linear}|{cmd:rational}
{cmd:nlrep}{cmd:(}{it:#}{cmd:)} {cmd:nleps}{cmd:(}{it:#}{cmd:)}
{cmd:median} {cmd:srep}{cmd:(}{it:#}{cmd:)}
{cmd:btrim}{cmd:(}{it:#}{cmd:)}
{cmd:saving}{cmd:(}{it:filename}{cmd:)} {cmd:replace}
{cmd:seed}{cmd:(}{it:#}{cmd:)}
{cmd:scale}{cmd:(}{cmd:x2}|{cmd:dev}|{it:#}{cmd:)}
{cmd:message}{cmd:(}{it:#}{cmd:)}]


{p 4 4 2}where {it:familyname} is one of

{p 8 8 2}{cmdab:gau:ssian} | {cmdab:ig:aussian} |
{bind:{cmdab:b:inomial} [{it:varnameN}|{it:#N}]} | {cmdab:p:oisson} |
{bind:{cmdab:nb:inomial} [{it:#k}]} | {cmdab:gam:ma}

{p 4 4 2}and {it:linkname} is one of

{p 8 8 2}{cmdab:i:dentity} | {cmd:log} | {cmdab:l:ogit} | {cmdab:p:robit} |
{cmdab:c:loglog} | {cmdab:opo:wer} {it:#} | {cmdab:pow:er} {it:#} |
{cmdab:nb:inomial} | {cmdab:logl:og} | {cmd:logc}

{p 4 4 2} and {it:label:varlist} describes a variable measured with error.
The {it:label} is for the unknown measurement error covariate
({it:label} cannot be the same as an existing variable in the data
set). {it:varlist} is a list of variables with the replicate measurements 
for the unknown {it:label} covariate (see comments for restrictions).

{p 4 4 2}
{cmd:by} {it:...} {cmd::} may be used with {cmd:simex}; see help {help by}.

{p 4 4 2}
No {cmd:predict} is implemented for {cmd:simex}.


{title:Description}

{p 4 4 2} 
{cmd:simex} fits generalized linear models for measurement error data
using IRLS (maximum quasi-likelihood) and is similar in syntax to
{cmd:rcal}.  The command is implemented by Stata's plug-in
mechanism. {cmd:simex} allows one or more (see comments) covariates
measured with error and uses simulation extrapolation to estimate the
missing covariates. It accepts either replicate data or a user-specified
measurement error covariance matrix. It supports a very fast internal
bootstrap (different from the regular Stata {cmd:bootstrap} command).


{title:Options}

{p 4 8 2}
{cmd:bstrap} specifies that the bootstrap estimate of variance be used.

{p 4 8 2}
{cmd:brep}{cmd:(}{it:#}{cmd:)} specifies the number of bootstrap
samples to consider in forming the bootstrap estimate of variance.  The
default is {cmd:brep(199)}.

{p 4 8 2}
{cmd:ltolerance}{cmd:(}{it:#}{cmd:)} specifies the convergence
criterion for the change in deviance between iterations;
{cmd:ltolerance(1e-6)} is the default.

{p 4 8 2}
{cmd:iterate}{cmd:(}{it:#}{cmd:)} specifies the maximum number of
iterations allowed in fitting the model; {cmd:iterate(100)} is the
default.  You should seldom need to specify {cmd:iterate()}.

{p 4 8 2}
{cmd:family}{cmd:(}{it:familyname}{cmd:)} specifies the distribution of
{it:depvar}; {cmd:family(gaussian)} is the default.

{p 4 8 2}
{cmd:link}{cmd:(}{it:linkname}{cmd:)} specifies the link function; the
default is the canonical link for the {cmd:family()} specified.

{p 4 8 2}
{cmd:suuinit}{cmd:(}{it:matrixname}{cmd:)} specifies the measurement error
covariance matrix.  If this option is not specified, the matrix is calculated
from the replications in the measurement error variables.
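
{p 8 8 2}
A minimal sketch of supplying the matrix directly, using the variables
generated in the Examples section.  The matrix name {cmd:suu} is arbitrary,
and the value .09 (the square of the error standard deviation .3 used to
simulate {cmd:a1}) assumes the matrix holds the error variance of a single
measurement; check this scaling against your own data before relying on it.

{p 8 12 2}{cmd:. * 1 x 1 covariance matrix for one error-prone covariate (assumed scaling)}{p_end}
{p 8 12 2}{cmd:. matrix suu = (.09)}{p_end}
{p 8 12 2}{cmd:. simex (y=x1 x2) (w3: a1), suuinit(suu)}{p_end}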

{p 4 8 2}
{cmd:theta}{cmd:(}{it:matrixname}{cmd:)} specifies the matrix of thetas used
by {cmd:simex}.  The default is {cmd:theta=(0, .5, 1, 1.5, 2)}.

{p 4 8 2}
{cmd:linear}|{cmd:rational} selects the extrapolation method.  The default
is quadratic regression.  Specify {cmd:linear} to use linear regression
extrapolation or {cmd:rational} to use the rational extrapolant (see
{cmd:nlrep()}, {cmd:nleps()}, and the comments below).

{p 4 8 2}
{cmd:nlrep}{cmd:(}{it:#}{cmd:)} specifies the maximum number of iterations
the optimizer will use when fitting the rational extrapolant.

{p 4 8 2}
{cmd:nleps}{cmd:(}{it:#}{cmd:)} specifies the convergence criterion
(tolerance) for the optimizer when fitting the rational extrapolant.

{p 4 8 2}
{cmd:median} uses the median instead of the default mean of the simulated
estimators (see {cmd:srep()}).

{p 4 8 2}
{cmd:srep}{cmd:(}{it:#}{cmd:)} specifies the number of replications
(simulations) for each theta.

{p 4 8 2}
{cmd:btrim}{cmd:(}{it:#}{cmd:)} specifies the percent bootstrap trimming.
The default is {cmd:btrim(.02)}.

{p 4 8 2}
{cmd:saving}{cmd:(}{it:filename}{cmd:)} saves the bootstrap results to the
specified file.

{p 4 8 2}
{cmd:replace} replaces the existing bootstrap results file if it exists.

{p 4 8 2}
{cmd:seed}{cmd:(}{it:#}{cmd:)} specifies the seed for the random-number
generator used by {cmd:simex}, which makes runs reproducible.  This
option is generally not specified.
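
{p 8 8 2}
A short sketch of checking reproducibility, using the variables generated in
the Examples section.  The seed value is arbitrary, and the sketch assumes
the coefficient vector is returned in {cmd:e(b)}; {cmd:eret list} shows what
is actually stored.

{p 8 12 2}{cmd:. simex (y=x1 x2) (w3: a1 a2), seed(12345)}{p_end}
{p 8 12 2}{cmd:. matrix b1 = e(b)}{p_end}
{p 8 12 2}{cmd:. simex (y=x1 x2) (w3: a1 a2), seed(12345)}{p_end}
{p 8 12 2}{cmd:. matrix b2 = e(b)}{p_end}
{p 8 12 2}{cmd:. * relative difference between the two coefficient vectors}{p_end}
{p 8 12 2}{cmd:. display mreldif(b1, b2)}{p_end}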

{p 4 8 2}
{cmd:scale}{cmd:(}{cmd:x2}|{cmd:dev}|{it:#}{cmd:)} overrides the
default scale parameter.  By default, {cmd:scale(1)} is assumed for
discrete distributions (binomial, Poisson, negative binomial)
and {cmd:scale(x2)} for continuous distributions
(Gaussian, gamma, inverse Gaussian).

{p 8 8 2}
{cmd:scale(x2)} specifies the scale parameter be set to the Pearson
chi-squared (or generalized chi-squared) statistic divided by the residual
degrees of freedom.

{p 8 8 2}
{cmd:scale(dev)} sets the scale parameter to the deviance divided by the
residual degrees of freedom.  This provides an alternative to {cmd:scale(x2)}
for continuous distributions and over- or under-dispersed discrete
distributions.

{p 8 8 2}
{cmd:scale}{cmd:(}{it:#}{cmd:)} sets the scale parameter to {it:#}.

{p 4 8 2}
{cmd:message}{cmd:(}{it:#}{cmd:)} specifies the message or debug level
of the plug-in module.  The default is {cmd:message(2)}.


{title:Special comments on multiple measurement error covariates}

{p 4 4 2}
The number of replications for a covariate measured with error can
vary across observations. When two or more measurement error
covariates exist, they must all have the same number of replications
across observations.
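
{p 4 4 2}
As an illustration, the sketch below fits a model with two covariates
measured with error, each with two replicates for every observation.  It
builds on the variables generated in the Examples section; the names
{cmd:b1}, {cmd:b2}, and {cmd:w2} are made up for this illustration.

{p 4 8 2}{cmd:. * two replicate measurements of x2, in addition to a1 and a2 for x3}{p_end}
{p 4 8 2}{cmd:. gen b1 = x2 + .3*invnorm(uniform())}{p_end}
{p 4 8 2}{cmd:. gen b2 = x2 + .3*invnorm(uniform())}{p_end}
{p 4 8 2}{cmd:. simex (y=x1) (w2: b1 b2) (w3: a1 a2)}{p_end}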


{title:Special comments on standard errors}

{p 4 4 2} 
By default, {cmd:simex} does not calculate standard errors.  The only
supported standard errors are bootstrap standard errors.  Use the
{cmd:bstrap} and {cmd:brep()} options, but note that this calculation can
take considerable time.  An estimated time to completion is printed if the
bootstrap will require more than 30 seconds.
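
{p 4 4 2}
For instance, the following sketch requests bootstrap standard errors with
more replications than the default and keeps the bootstrap results on disk.
It uses the variables generated in the Examples section; the replication
count and the file name {cmd:simexboot} are arbitrary choices for
illustration.

{p 4 8 2}{cmd:. simex (y=x1 x2) (w3: a1 a2), bstrap brep(499) saving(simexboot) replace}{p_end}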


{title:Special comments on the rational extrapolant}

{p 4 4 2} 
The rational extrapolant requires fitting a nonlinear curve to the
estimated coefficients using very few data points (the number of
thetas).  This can encounter numerical difficulties and produce an
error message.  We suggest the default quadratic extrapolant.
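
{p 4 4 2}
One way to handle such failures is to try the rational extrapolant first and
fall back to the default only if an error is returned.  This is a sketch
using the variables generated in the Examples section; the {cmd:nlrep()} and
{cmd:nleps()} values are illustrative, not recommended settings.

{p 4 8 2}{cmd:. * capture keeps an error from stopping execution; noisily still shows output}{p_end}
{p 4 8 2}{cmd:. capture noisily simex (y=x1 x2) (w3: a1 a2), rational nlrep(200) nleps(1e-6)}{p_end}
{p 4 8 2}{cmd:. * nonzero return code: refit with the default quadratic extrapolant}{p_end}
{p 4 8 2}{cmd:. if _rc != 0 simex (y=x1 x2) (w3: a1 a2)}{p_end}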


{title:Plotting the effect of measurement error on parameter estimates}

{p 4 4 2} 
Use {cmd:simexplot} to view the effect of measurement error on
parameter estimates.  It gives a visual representation of how the
parameter estimates are derived.  The line shows the extrapolation back
to -1.  Note that if the rational extrapolant was used, no extrapolant
line is drawn.  See the examples below.
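
{p 4 4 2}
As a rough sketch of what the plotted line represents under the default
quadratic extrapolation (this follows the usual description of SIMEX; the
internals of the plug-in may differ in detail), the simulated estimates of
each coefficient are regressed on theta,

{p 8 8 2}
b(theta) = g0 + g1*theta + g2*theta^2

{p 4 4 2}
and the reported estimate is the fitted curve evaluated at theta = -1, that
is, g0 - g1 + g2.  With {cmd:linear}, the theta^2 term is dropped, and the
estimate would be g0 - g1.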


{title:Remarks}

{p 4 4 2}
The allowed link functions are

{center:Link function            {cmd:glm} option     }
{center:{hline 40}}
{center:identity                 {cmd:link(identity)} }
{center:log                      {cmd:link(log)}      }
{center:logit                    {cmd:link(logit)}    }
{center:probit                   {cmd:link(probit)}   }
{center:complementary log-log    {cmd:link(cloglog)}  }
{center:odds power               {cmd:link(opower} {it:#}{cmd:)} }
{center:power                    {cmd:link(power} {it:#}{cmd:)}  }
{center:negative binomial        {cmd:link(nbinomial)}}
{center:log-log                  {cmd:link(loglog)}   }
{center:log-complement           {cmd:link(logc)}     }

{p 4 4 2}
The allowed distribution families are

{center:Family                 {cmd:glm} option       }
{center:{hline 40}}
{center:Gaussian (normal)      {cmd:family(gaussian)} }
{center:Inverse Gaussian       {cmd:family(igaussian)}}
{center:Bernoulli/binomial     {cmd:family(binomial)} }
{center:Poisson                {cmd:family(poisson)}  }
{center:Negative binomial      {cmd:family(nbinomial)}}
{center:Gamma                  {cmd:family(gamma)}    }

{p 4 4 2}
Reasonable combinations of {cmd:family()} and {cmd:link()} are

	  {c |} id  log  logit  probit  clog  pow  opower  nbinomial  loglog  logc
{hline 10}{c +}{hline 67}
Gaussian  {c |}  x   x                         x
inv. Gau. {c |}  x   x                         x
binomial  {c |}  x   x     x      x       x    x     x                  x      x
Poisson   {c |}  x   x                         x
neg. bin. {c |}  x   x                         x              x
gamma     {c |}  x   x                         x

{p 4 11 2}
Note:  Nonstandard combinations other than those marked out above
are allowed, but the user is responsible for seeing that the data fit
the combination and for the interpretation of the results.

{p 4 4 2}
If you specify {cmd:family()} but not {cmd:link()}, you obtain the canonical
link for the family:

{center:{cmd:family()}                default {cmd:link()}}
{center:{hline 38}}
{center:{cmd:family(gaussian)}        {cmd:link(identity)}}
{center:{cmd:family(igaussian)}       {cmd:link(power -2)}}
{center:{cmd:family(binomial)}        {cmd:link(logit)}   }
{center:{cmd:family(poisson)}         {cmd:link(log)}     }
{center:{cmd:family(nbinomial)}       {cmd:link(log)}     }
{center:{cmd:family(gamma)}           {cmd:link(power -1)}}
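
{p 4 4 2}
For example, with a binary outcome the link can be left at its default or
overridden.  This sketch uses the variables generated in the Examples
section; the outcome {cmd:d} and its coefficients are made up for
illustration.

{p 4 8 2}{cmd:. * hypothetical binary outcome that depends on the true x3}{p_end}
{p 4 8 2}{cmd:. gen d = uniform() < invlogit(-2 + x1 + x2 + 2*x3)}{p_end}
{p 4 8 2}{cmd:. * family(binomial) alone uses the canonical logit link}{p_end}
{p 4 8 2}{cmd:. simex (d=x1 x2) (w3: a1 a2), family(binomial)}{p_end}
{p 4 8 2}{cmd:. * override the default link}{p_end}
{p 4 8 2}{cmd:. simex (d=x1 x2) (w3: a1 a2), family(binomial) link(probit)}{p_end}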


{title:Examples}

{p 4 8 2}{cmd:. * generate some data}{p_end}
{p 4 8 2}{cmd:. set obs 1000}{p_end}
{p 4 8 2}{cmd:. gen x1 = uniform()}{p_end}
{p 4 8 2}{cmd:. gen x2 = uniform()}{p_end}
{p 4 8 2}{cmd:. gen x3 = uniform()}{p_end}
{p 4 8 2}{cmd:. gen err = invnorm(uniform())}{p_end}
{p 4 8 2}{cmd:. gen y = 1+2*x1+3*x2+4*x3+err}{p_end}

{p 4 8 2}{cmd:. * estimate with x3 known}{p_end}
{p 4 8 2}{cmd:. qvf y x1 x2 x3, bstrap}{p_end}

{p 4 8 2}{cmd:. * simulate measurement error covariate}{p_end}
{p 4 8 2}{cmd:. gen a1 = x3 + .3*invnorm(uniform())}{p_end}
{p 4 8 2}{cmd:. gen a2 = x3 + .3*invnorm(uniform())}{p_end}

{p 4 8 2}{cmd:. * estimate x1, x2 & w3 using simex & plot the extrapolation}{p_end}
{p 4 8 2}{cmd:. simex (y=x1 x2) (w3: a1 a2), bstrap}{p_end}
{p 4 8 2}{cmd:. simexplot w3}{p_end}
{p 4 8 2}{cmd:. simexplot}{p_end}

{p 4 8 2}{cmd:. eret list}{p_end}

{p 4 8 2}{cmd:. mat theta=(0,.5,1,1.5,2,2.5,3,3.5)}{p_end}
{p 4 8 2}{cmd:. simex (y=x1 x2) (w3: a1 a2), bstrap theta(theta) median}{p_end}
{p 4 8 2}{cmd:. simexplot w3}{p_end}

{p 4 4 2}
See {cmd:rcal} for further examples of options.

{title:Also see}

{p 4 13 2}
Online:  help for {help qvf}, {help simex}, {help simexplot}