Our empirical analysis is done with maximum likelihood (ML) and partial likelihood (PL) estimation. This section gives a brief overview of how these two kinds of estimation strategies are used with event-history data.
1. Maximum Likelihood
The likelihood function L is defined as the joint probability (or joint probability density) of the sample observations. When observations on different sample members are statistically independent, the likelihood equals the product of the contributions $L_i$ of each sample member $i$ (assuming random sampling):

$$L = \prod_{i=1}^{I} L_i .$$
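An estimate of the lower bound of the variance-covariance matrix of $\hat{\Phi}$, the vector of parameters estimated by ML, is given by the inverse of the information matrix evaluated at the ML estimates:

$$\widehat{\operatorname{Var}}(\hat{\Phi}) = \left[ -E\left( \frac{\partial^2 \log L}{\partial \Phi \, \partial \Phi'} \right) \right]^{-1}_{\Phi = \hat{\Phi}} .$$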
The diagonal elements of this matrix provide asymptotic estimates of the variances of the parameter estimates. We use this information to test hypotheses about single parameters. In all tables, we report estimated asymptotic standard errors in parentheses below the point estimates of the parameters.
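A minimal Python sketch of these steps for the simplest case: a constant (exponential) rate with no covariates and right-censored durations. The function names and data are hypothetical, and this is not the algorithm LIFEREG uses. The log-likelihood is the sum of the log contributions of independent cases, and the standard error is the square root of the inverse of the negative second derivative of the log-likelihood at the estimate, approximated here by a central finite difference.

```python
import math

def exp_loglik(rate, durations, events):
    """Log-likelihood of a constant-rate (exponential) model with right censoring.

    An uncensored case contributes log(rate) - rate * t (log density);
    a censored case contributes -rate * t (log survivor function).
    Summing the logs is equivalent to taking the product of the contributions.
    """
    return sum(d * math.log(rate) - rate * t for t, d in zip(durations, events))

def exp_mle_with_se(durations, events, h=1e-4):
    """Closed-form ML estimate of the rate plus a numerical standard error."""
    n_events = sum(events)
    exposure = sum(durations)
    rate_hat = n_events / exposure  # solves d logL / d rate = 0
    # Central finite-difference approximation to the second derivative at the estimate
    step = h * rate_hat
    second = (exp_loglik(rate_hat + step, durations, events)
              - 2 * exp_loglik(rate_hat, durations, events)
              + exp_loglik(rate_hat - step, durations, events)) / step ** 2
    se = math.sqrt(-1.0 / second)  # inverse of the (observed) information
    return rate_hat, se

durations = [2.0, 5.0, 1.5, 8.0, 3.5]  # hypothetical waiting times
events = [1, 0, 1, 1, 0]               # 1 = event observed, 0 = right-censored
rate_hat, se = exp_mle_with_se(durations, events)
print(f"rate = {rate_hat:.4f} ({se:.4f})")
```

For this model the result can be checked in closed form: the estimated rate $\hat{r}$ is the number of events $d$ divided by total exposure, and its asymptotic standard error is $\hat{r}/\sqrt{d}$.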
We also report the logarithms of the maximized likelihoods. These numbers can be used to compute likelihood ratio tests on sets of parameters. In the typical table we report, the models on the left are nested within some of those to the right. Usually the left-most columns report estimates of simple models that exclude the effects of some of the covariates; the effects of these covariates are then included in the more complex models in the columns on the right-hand side. The likelihood ratio test statistic is defined as

$$\lambda = \frac{L_0}{L_1},$$

where $L_0$ and $L_1$ denote the maximized likelihoods of the null model (with, say, $k$ constraints imposed) and the alternative model that relaxes the constraints, respectively. In suitably large samples, $-2 \log \lambda$ is distributed approximately as $\chi^2$ with $k$ degrees of freedom. So $-2$ times the difference between the log-likelihoods of a pair of hierarchical models in our tables has approximately a $\chi^2$ distribution under the null hypothesis that the $k$ constraints hold in the population. We refer often to such tests in discussing our findings.
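The test can be sketched in Python using only the standard library; the maximized log-likelihoods below are made-up numbers, not results from our tables. The chi-square upper-tail probability comes from a power series for the regularized incomplete gamma function, which is adequate for moderate statistics; in practice a statistics library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_sf(x, k, tol=1e-15, max_iter=10_000):
    """Upper-tail probability P(X > x) for a chi-square variate with k df.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = k/2 and z = x/2, and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter + 1):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    log_lower = a * math.log(z) - z + math.log(total) - math.lgamma(a)
    return max(0.0, 1.0 - math.exp(log_lower))

def lr_test(loglik_null, loglik_alt, k):
    """Likelihood ratio test: -2 log(lambda) = -2 (log L0 - log L1), with k df."""
    stat = -2.0 * (loglik_null - loglik_alt)
    return stat, chi2_sf(stat, k)

# Hypothetical maximized log-likelihoods for two nested models differing by 3 constraints
stat, p = lr_test(loglik_null=-520.3, loglik_alt=-512.1, k=3)
print(f"-2 log lambda = {stat:.2f}, p = {p:.4f}")
```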
When the dates of events are known, we use PROC LIFEREG (SAS Institute 1985) to obtain ML estimates of the parameters of regression models of the form

$$\log T = -\beta' x + \sigma W,$$

as discussed earlier. Letting $f(W)$ and $G(W)$ denote the probability density function and the survivor function of the random variable $W$, the log-likelihood can be written, following Kalbfleisch and Prentice (1980), as

$$\log L = \sum_{i=1}^{I} \Big\{ d_i \big[ \log f(w_i) - \log \sigma \big] + (1 - d_i) \log G(w_i) \Big\}, \qquad w_i = \frac{\log t_i + \beta' x_i}{\sigma},$$
where $d_i$ is an indicator variable that equals unity if observation $i$ is uncensored and zero if it is censored. LIFEREG maximizes this log-likelihood under one of several assumptions about the distribution of $W$. In the exponential model, $W$ is assumed to have the extreme-value distribution and $\sigma$ is constrained to equal unity. The Weibull model makes the same distributional assumption but leaves the scale parameter $\sigma$ unconstrained. In the gamma model, $W$ is assumed to have a gamma distribution with parameter $k$, and the scale parameter $\sigma$ is constrained to equal unity. The generalized gamma model relaxes the latter constraint. Finally, in the log-logistic model, $W$ is assumed to have a logistic distribution.
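As a sketch of the log-likelihood above for the Weibull case, where $W$ has the standard extreme-value distribution with $f(w) = \exp(w - e^w)$ and $G(w) = \exp(-e^w)$, the following Python function evaluates it at given parameter values. The function and argument names are hypothetical, and the sketch only evaluates the log-likelihood; LIFEREG also carries out the maximization. Setting $\sigma = 1$ gives the exponential model.

```python
import math

def weibull_aft_loglik(beta, sigma, durations, events, covariates):
    """Censored log-likelihood for log T = -beta'x + sigma*W, W standard extreme-value.

    With w = (log t + beta'x) / sigma:
      uncensored case: log f(w) - log sigma = w - exp(w) - log sigma
      censored case:   log G(w)             = -exp(w)
    This is the likelihood of log T; it differs from the likelihood of T
    only by -sum_i d_i log t_i, which does not involve the parameters.
    """
    total = 0.0
    for t, d, x in zip(durations, events, covariates):
        w = (math.log(t) + sum(b * xj for b, xj in zip(beta, x))) / sigma
        if d:  # event observed: density term
            total += (w - math.exp(w)) - math.log(sigma)
        else:  # right-censored: survivor-function term
            total += -math.exp(w)
    return total

durations = [2.0, 5.0, 1.5, 8.0, 3.5]            # hypothetical data
events = [1, 0, 1, 1, 0]
covariates = [[1.0], [0.0], [1.0], [0.0], [1.0]]
print(weibull_aft_loglik([0.3], 0.9, durations, events, covariates))
```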
In reporting estimates of these models, we report $\beta$ rather than $-\beta$; that is, we report the negative of the vector of regression coefficients. We do so because we are interested in the effects of the covariates on the rates rather than on the expected waiting times. In the Weibull and generalized gamma models, the negative of the vector of regression coefficients actually estimates $\beta^* = \beta\sigma$, as we pointed out earlier. So the estimates we report should be divided by $\hat{\sigma}$ to estimate $\beta$.
As we have noted repeatedly, $\sigma = \rho^{-1}$ in all models for which this parameter is defined. We report estimates of $\sigma$ in the tables of findings; these are labeled "scale parameters." ML estimates of $\rho$ are found by forming $\hat{\sigma}^{-1}$. The tables also report a "shape parameter" for models that use the gamma distribution; in the conventions of PROC LIFEREG, the shape parameter equals $1/k^2$.
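The conversions described in the two preceding paragraphs are simple arithmetic on the reported estimates. A minimal Python sketch for the Weibull case; the function name and the numbers are hypothetical.

```python
def weibull_rate_parameters(reported_coefs, sigma_hat):
    """Convert reported coefficients (estimates of beta* = beta * sigma) and the
    estimated scale parameter into rate-model quantities.

    Returns beta = beta* / sigma_hat and rho = 1 / sigma_hat.
    """
    beta = [b / sigma_hat for b in reported_coefs]
    rho = 1.0 / sigma_hat
    return beta, rho

# Hypothetical values as they might appear in a table of Weibull estimates
beta, rho = weibull_rate_parameters([0.42, -1.10], sigma_hat=0.80)
print(beta, rho)
```

These are point estimates only; a standard error for $\hat{\beta}^*/\hat{\sigma}$ also depends on the sampling variance of $\hat{\sigma}$ and its covariance with $\hat{\beta}^*$ (for example, via the delta method).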
When the dates are known only to the year, we use ML to estimate parameters of Poisson regression models; that is, we calculate ML estimates of the vector of regression coefficients, $\beta$, in (8.18b) and their asymptotic standard errors. We use LIMDEP (Greene 1986) to perform these calculations. Since these models are a special case of log-linear models for counted data, we report $G^2$ as a measure of fit. Let $b_t$ denote the count of events in year $t$ and $\hat{b}_t$ the fitted count under a particular model. Then

$$G^2 = 2 \sum_t b_t \log\left( \frac{b_t}{\hat{b}_t} \right).$$
In comparisons of nested models, the difference between the $G^2$ values is distributed approximately as $\chi^2$ with $k$ degrees of freedom under the null hypothesis that the $k$ constraints embodied in the simpler model hold.
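$G^2$ and the difference between two nested models can be computed directly from observed and fitted yearly counts, as in this Python sketch. The counts and fitted values are invented for illustration, not output from a fitted model; they are chosen so that fitted and observed totals agree, as they do for ML fits of Poisson models with an intercept, in which case $G^2$ equals the Poisson deviance.

```python
import math

def g_squared(observed, fitted):
    """Fit statistic G^2 = 2 * sum_t b_t * log(b_t / b_hat_t).

    Years with zero observed events contribute nothing, since b log b -> 0 as b -> 0.
    """
    return 2.0 * sum(b * math.log(b / bh) for b, bh in zip(observed, fitted) if b > 0)

counts = [3, 0, 5, 2, 4]                         # hypothetical yearly event counts
fitted_simple = [2.8, 2.8, 2.8, 2.8, 2.8]        # hypothetical fit, constrained model
fitted_complex = [2.5, 0.9, 4.6, 2.4, 3.6]       # hypothetical fit, less constrained model
diff = g_squared(counts, fitted_simple) - g_squared(counts, fitted_complex)
print(f"G2 difference = {diff:.3f}")  # compare with chi-square, k df
```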
2. Partial Likelihood
Cox’s (1972, 1975) method of partial likelihood is useful when transition rates vary over time in some unknown way but the variation is the same for all organizations at risk of the event. This method applies only to models of proportional hazards (or rates):

$$r(t) = q(t)\,\Phi(x_t). \tag{8.20}$$
Both the exponential and Weibull models fall in this class. The first component on the right-hand side of (8.20) is an unknown, possibly time-dependent nuisance function $q(t)$, which affects the rate of every member of the population in the same way. The second component, $\Phi(x_t)$, is a function of a vector of (possibly time-varying) observed causal variables $x_t$, and possibly also of time $t$.
Although the PL approach to event-history analysis does not require specification of $q(t)$, it does require parametric assumptions about the dependence of $\Phi(\cdot)$ on the observed variables $x_t$. The simplest model we use holds that

$$r(t) = q(t)\exp(\beta' x_t). \tag{8.21}$$
The parameters in (8.21) cannot be estimated by ML because the specific form of q(t) is unspecified, which means that a parametric expression cannot be written for the survivor function, a key component of the full likelihood. However, PL estimators can be used.
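Why $q(t)$ can be left unspecified is easiest to see numerically: under $r(t) = q(t)\exp(\beta' x)$ with time-constant covariates, the ratio of the rates of two organizations is $\exp(\beta'(x_a - x_b))$ whatever the baseline. The coefficients, covariate values, and the two baseline functions in this Python sketch are arbitrary, hypothetical choices.

```python
import math

def rate(t, x, beta, q):
    """Proportional-rates model r(t) = q(t) * exp(beta'x) for one organization."""
    return q(t) * math.exp(sum(b * xj for b, xj in zip(beta, x)))

beta = [0.7, -0.3]                   # hypothetical coefficients
x_a, x_b = [1.0, 2.0], [0.0, 1.0]    # hypothetical covariate vectors

# Two arbitrary (hypothetical) baseline functions q(t)
baselines = [lambda t: 0.1 + 0.05 * t, lambda t: math.exp(-t) + 0.2]
for q in baselines:
    ratios = [rate(t, x_a, beta, q) / rate(t, x_b, beta, q) for t in (0.5, 2.0, 10.0)]
    print([round(r, 6) for r in ratios])  # same ratio exp(beta'(x_a - x_b)) in every case
```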
The PL estimator is formed as follows. The data are first arranged in the order in which events occur in the sample, assuming for simplicity that only one sample member has an event at any moment. So, for example, $t_{(1)}$ denotes the time (duration) of the first event observed in the sample, $x_{(1)}$ denotes the value of the vector of explanatory variables for the sample member that has the first event, and so forth. Usually the times of events are observed for only a subset $I^*$ of the sample, $I^* \le I$. The remaining $I_c = I - I^*$ cases are right-censored: no events occur within the period during which they are observed. Let the collection of cases still at risk of experiencing the event at duration (or age) $t_{(i)}$, the risk set, be denoted by $R(t_{(i)})$. The PL for the whole sample is the product of $I^*$ such terms:

$$PL = \prod_{i=1}^{I^*} \frac{\exp(\beta' x_{(i)})}{\sum_{j \in R(t_{(i)})} \exp(\beta' x_j)}. \tag{8.22}$$
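A minimal Python sketch of the logarithm of the PL for a single time-constant covariate, assuming no tied event times; the names and data are hypothetical. Because the product runs only over observed events, the cases need not be sorted first; each event's risk set is every case whose observed duration is at least as long as the event time, so censored cases enter only through the denominators.

```python
import math

def cox_log_pl(beta, durations, events, x):
    """Log partial likelihood for one time-constant covariate, no tied event times.

    Each observed event at t_(i) contributes
    beta * x_(i) - log(sum over the risk set R(t_(i)) of exp(beta * x_j)).
    """
    log_pl = 0.0
    for t_i, d_i, x_i in zip(durations, events, x):
        if not d_i:
            continue  # censored cases contribute no factor of their own
        risk_sum = sum(math.exp(beta * x_j) for t_j, x_j in zip(durations, x) if t_j >= t_i)
        log_pl += beta * x_i - math.log(risk_sum)
    return log_pl

durations = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]  # hypothetical durations
events = [1, 1, 0, 1, 1, 1]                  # 0 = right-censored
x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]           # hypothetical covariate
for b in (-0.5, 0.0, 0.5, 1.0):
    print(b, round(cox_log_pl(b, durations, events, x), 4))
```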
The PL estimate of β in (8.21) is obtained by finding the value that maximizes the PL in (8.22); one treats the PL as though it were a full likelihood. Cox (1975) claimed that the PL estimators of β are consistent; Efron (1977) proved that under fairly general conditions the PL estimators are efficient (see also Cox and Oakes 1984). Tsiatis (1981) proved that the PL estimators are consistent.
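Maximizing the PL as if it were a full likelihood can be sketched with Newton-Raphson for a single covariate. The score is the sum over events of the covariate minus its risk-set average weighted by $\exp(\beta x_j)$, and the information is the sum of the corresponding weighted variances; the standard error is the inverse square root of the information at the estimate. Names and data are hypothetical, ties are assumed absent, and the data must admit a finite maximum (which fails, for example, if every event occurs to the case with the largest covariate value in its risk set).

```python
import math

def score_and_info(beta, durations, events, x):
    """Score (first derivative) and information (negative second derivative)
    of the log partial likelihood for one time-constant covariate, no ties."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(durations, events, x):
        if not d_i:
            continue  # censored cases appear only in risk sets
        risk_x = [x_j for t_j, x_j in zip(durations, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk_x]
        s0 = sum(weights)
        mean = sum(w * xj for w, xj in zip(weights, risk_x)) / s0
        mean_sq = sum(w * xj * xj for w, xj in zip(weights, risk_x)) / s0
        score += x_i - mean          # observed covariate minus risk-set weighted mean
        info += mean_sq - mean ** 2  # risk-set weighted variance
    return score, info

def cox_fit(durations, events, x, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the partial likelihood; returns (beta_hat, se)."""
    for _ in range(max_iter):
        score, info = score_and_info(beta, durations, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta, durations, events, x)  # information at the estimate
    return beta, 1.0 / math.sqrt(info)

durations = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]  # hypothetical durations
events = [1, 1, 0, 1, 1, 1]                  # 0 = right-censored
x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]           # hypothetical covariate
beta_hat, se = cox_fit(durations, events, x)
print(f"beta = {beta_hat:.4f} ({se:.4f})")
```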
This concludes our review of the models and methods used to estimate effects of theoretically relevant covariates on the transition rates of organizations. We turn now to the empirical studies that implement these methods.
Source: Hannan Michael T., Freeman John (1993), Organizational Ecology, Harvard University Press; Reprint edition.