An event-history (or sample-path) observation plan records information on *all* changes in state within some observation period. Let *Y(t)* denote the random variable indicating an organization’s position at time *t*, and let *y(t)* denote a particular realization of this random variable. The set of all distinct values that *Y(t)* can take is called the state space of *Y*, whose size we denote by ψ. We concentrate on qualitative variables that have a countable number of states. The states in our analyses usually consist of a set of forms and a set of ending events such as merger and disbanding.

The state entered at the nth event is denoted by the random variable $Y_n$. The period of time between successive events is called an *episode* or *spell*. The nth episode refers to the period between the (n − 1)th and nth events. The length of time from event n − 1 until event n is called the *waiting time* to the nth event. We often refer to the waiting time until an event as the *duration* in a state. In the case of the first spell, the waiting time in a state is also the age of the organization at the time of the first change of state.

When data on *Y(t)* are lacking for the beginning of the history, the data are said to be left-censored; when data are lacking for the end of the history, they are said to be right-censored. Event-history data available to organizational researchers are almost always censored on the right (that is, we do not know what will happen in the future). Often they are censored on the left as well. Censoring presents special problems in analyzing event-history data. Right-censoring turns out to be a manageable problem given assumptions that are usually plausible. Left-censoring is far more difficult to handle. Because our designs usually avoid problems of left-censoring, we focus here on analysis of event histories that are censored on the right but not on the left.
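The vocabulary above (events, spells, waiting times, right-censoring) maps naturally onto a small data structure. The following is only an illustrative sketch: the names `Spell` and `to_spells`, the `absorbing` argument, and the example states and dates are assumptions made here, not part of the original text.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Spell:
    number: int                 # n: the spell that ends with the nth event
    origin: str                 # state occupied during the spell
    destination: Optional[str]  # state entered at the nth event (None if censored)
    start: float                # time of event n - 1 (or of founding when n = 1)
    end: float                  # time of event n, or the end of observation
    right_censored: bool

    @property
    def waiting_time(self) -> float:
        """Duration in the origin state."""
        return self.end - self.start


def to_spells(
    initial_state: str,
    founded_at: float,
    events: Sequence[Tuple[float, str]],
    observed_until: float,
    absorbing: frozenset = frozenset(),
) -> List[Spell]:
    """Split one history into spells; `events` is (time, state entered), sorted by time."""
    spells = []
    state, t0 = initial_state, founded_at
    for n, (t, entered) in enumerate(events, start=1):
        spells.append(Spell(n, state, entered, t0, t, False))
        state, t0 = entered, t
    # A history that is still open when observation stops ends in a
    # right-censored spell, unless the last state is an ending event.
    if state not in absorbing and observed_until > t0:
        spells.append(Spell(len(events) + 1, state, None, t0, observed_until, True))
    return spells


endings = frozenset({"disbanded", "merged"})
closed = to_spells("craft union", 0.0,
                   [(12.0, "industrial union"), (30.5, "disbanded")], 40.0, endings)
still_open = to_spells("craft union", 0.0, [(5.0, "industrial union")], 20.0, endings)
for spell in closed + still_open:
    print(spell.number, spell.origin, spell.destination,
          spell.waiting_time, spell.right_censored)
```

For the first spell of each history the waiting time equals the organization's age at its first change of state, as noted above. Left-censoring is deliberately not represented, since the sketch assumes every history is observed from founding.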

### 1. Survivor Function

The first step in analyzing a set of histories is to summarize them in a compact way. One useful summary is the survivor function. This function tells the probability that the event of interest does not occur before time *t*:

$$
G_j(t) = \Pr\{T_j \ge t\},
$$

where $T_j$ is a random variable that denotes the waiting time in state *j*.

In our empirical analyses, more than one event can occur: organizations change form more than once, they absorb other organizations repeatedly, they are absorbed and then re-emerge, and so forth. Since the timing of events can depend on previous history, the survivor function should be expressed as a function of previous history in any general treatment. We suppress this dependence to avoid cluttering the notation. This does not mean that we believe previous history is unimportant; in the chapters that follow, we explore dependence on history when it seems substantively important. We usually do so by including indicators of previous history as covariates in our models.

The standard approach to summarizing event-history data uses Kaplan and Meier’s (1958) product-limit estimator, a nonparametric maximum-likelihood estimator of the survivor function. Since we are typically interested in the logarithm of the survivor function (for reasons discussed in this section), we rely instead on an estimator of the latter.
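For readers who have not seen the product-limit estimator written out, here is a minimal sketch in plain Python. It assumes the textbook form, in which the estimate is multiplied by (1 − d/n) at each event time, where d is the number of events at that time and n the number of cases still at risk. The function name `kaplan_meier` and the toy data are assumptions made for illustration, and cases censored at a tied time are treated as still at risk at that time.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (event time, estimated share surviving past that time)."""
    events = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)      # events plus censorings at each time
    at_risk = len(durations)
    surviving = 1.0
    steps = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surviving *= 1.0 - d / at_risk
            steps.append((t, surviving))
        at_risk -= leaving[t]         # everyone who left at t exits the risk set
    return steps


# Hypothetical waiting times in years; False marks a right-censored spell.
durations = [2.0, 3.0, 3.0, 5.0, 8.0]
observed = [True, True, False, True, False]
for t, g in kaplan_meier(durations, observed):
    print(f"t = {t:4.1f}   survivor estimate = {g:.3f}")
```

The estimate only changes at observed event times; censored spells contribute by shrinking the risk set, which is why right-censoring is manageable here.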

### 2. Hazard Function

The hazard function tells the rate at which transitions *from* a state occur. It is the limiting probability of “failing” (that is, leaving a state) at time *t*, given that failure has not occurred before *t*. The hazard function for the nth event is defined in terms of the corresponding survivor function:

$$
h_j(t) = -\frac{d \ln G_j(t)}{dt}.
$$

Solving this differential equation (with initial condition $G_j(0) = 1$) gives the *integrated hazard function*:

$$
H_j(t) = \int_0^t h_j(s)\, ds = -\ln G_j(t). \tag{8.1}
$$

The relationship in (8.1) plays an important role in estimating hazard functions from empirical data. It shows that the hazard function of a process defines the survivor function, and vice versa.
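The step from the hazard to the integrated hazard can be written out explicitly. This is a sketch that assumes the usual definition of the hazard as the negative derivative of the log survivor function, together with the initial condition $G_j(0) = 1$; the dummy variable *s* is introduced here only for the integration.

$$
\begin{aligned}
h_j(s) &= -\frac{d \ln G_j(s)}{ds} \\
\int_0^t h_j(s)\, ds &= -\left[\ln G_j(t) - \ln G_j(0)\right] = -\ln G_j(t) \\
G_j(t) &= \exp\left[-H_j(t)\right], \qquad H_j(t) = \int_0^t h_j(s)\, ds
\end{aligned}
$$

As a worked case, a constant hazard $h_j(t) = h_j$ gives $H_j(t) = h_j t$ and $G_j(t) = e^{-h_j t}$. With an illustrative value of $h_j = 0.1$ per year, the probability of remaining in the state for five years is $e^{-0.5} \approx 0.61$.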

The integrated hazard function has direct substantive significance. For instance, estimates of this function tell whether hazard functions are constant. When the hazard is indeed constant across sample members and over time for each sample member, the integral in (8.1) is a linear function of the waiting time, that is, $H_j(t) = h_j t$. This means that a plot of an integrated hazard function against duration (or age in the case of first spells) will be approximately linear if the hazard is constant in the population. Thus departures from linearity in such plots are indications that the hazard varies among organizations (there is population heterogeneity), varies over time for an individual organization (there is time dependence), or both.
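The linearity diagnostic can be checked numerically on known survivor functions. The sketch below is illustrative only: the three survivor functions and their parameter values are assumptions chosen to represent a constant hazard, population heterogeneity (an equal mixture of two constant hazards), and time dependence (a Weibull form with declining hazard).

```python
import math
from typing import Callable, List, Sequence


def constant_hazard(t: float, h: float = 0.2) -> float:
    return math.exp(-h * t)


def mixed_population(t: float, h1: float = 0.05, h2: float = 0.5) -> float:
    # Half the population has hazard h1, half has h2; each hazard is constant.
    return 0.5 * math.exp(-h1 * t) + 0.5 * math.exp(-h2 * t)


def declining_hazard(t: float, scale: float = 0.2, shape: float = 0.5) -> float:
    # Weibull survivor function; shape < 1 means the hazard falls with duration.
    return math.exp(-((scale * t) ** shape))


def slopes(survivor: Callable[[float], float], times: Sequence[float]) -> List[float]:
    """Average slope of the integrated hazard, -ln G(t), over each interval."""
    H = [-math.log(survivor(t)) for t in times]
    return [(H[i + 1] - H[i]) / (times[i + 1] - times[i]) for i in range(len(times) - 1)]


times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
for name, fn in [("constant", constant_hazard),
                 ("heterogeneous", mixed_population),
                 ("time-dependent", declining_hazard)]:
    print(name, [round(s, 3) for s in slopes(fn, times)])
```

Only the constant-hazard case yields equal slopes in every interval. The mixture and the Weibull case both bend downward, which illustrates why a nonlinear plot cannot by itself distinguish heterogeneity from time dependence.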

According to (8.1), a hazard function can be estimated from a plot of the negative logarithm of an estimated survivor function versus time. The estimated hazard function at any time *t* is the slope of $H_j(t)$. So any graphic or analytic technique for evaluating this function’s slope is a possible method for estimating the hazard function. The negative logarithm of the Kaplan-Meier estimator can be used as an estimator of the cumulative hazard function. Since the logarithm is a monotonic transformation and the large-sample properties of maximum likelihood (ML) are preserved under monotonic transformations, this estimator of the cumulative hazard function has good asymptotic properties. However, because of the nonlinearity of the transformation, this estimator of the cumulative hazard function is biased in small samples.

Aalen (1978) has used martingale theory to derive a better estimator. His nonparametric estimator of the cumulative hazard function is

$$
\hat{H}_j(t) = \sum_{i:\, t_{(i)} \le t} \frac{1}{I - i + 1 - c(t_{(i)})},
$$

where *I* is the number of cases at risk initially, $t_{(i)}$ is the time of the *i*th event, (*i*) is the count (order) of the event at $t_{(i)}$, and $c(t_{(i)})$ is the number of cases lost to right-censoring by $t_{(i)}$. This estimator is unbiased, consistent, and asymptotically normal. Its variance is

$$
\operatorname{Var}\left[\hat{H}_j(t)\right] = \sum_{i:\, t_{(i)} \le t} \frac{1}{\left[I - i + 1 - c(t_{(i)})\right]^2}.
$$

This estimator forms the basis of a powerful approach to nonparametric estimation of the cumulative hazard function.
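A minimal sketch of an estimator of this kind follows. It assumes the common Nelson-Aalen form, in which each event adds the reciprocal of the number of cases still at risk to the cumulative hazard and the squared reciprocal to the variance estimate. The function name `nelson_aalen`, the tie-handling rule (events are processed before censorings at the same time), and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def nelson_aalen(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (event time, cumulative hazard, variance estimate) after each event."""
    # Sort by time; at tied times, events come before censored cases so that
    # censored cases still count as at risk when the event occurs.
    order = sorted(range(len(durations)), key=lambda i: (durations[i], not observed[i]))
    at_risk = len(durations)          # cases at risk initially
    cum_hazard = 0.0
    variance = 0.0
    path = []
    for i in order:
        if observed[i]:
            cum_hazard += 1.0 / at_risk
            variance += 1.0 / at_risk ** 2
            path.append((durations[i], cum_hazard, variance))
        at_risk -= 1                  # an event or a censoring removes one case
    return path


# Hypothetical waiting times in years; False marks a right-censored spell.
durations = [2.0, 3.0, 3.0, 5.0, 8.0]
observed = [True, True, False, True, False]
for t, H, v in nelson_aalen(durations, observed):
    print(f"t = {t:4.1f}   H = {H:.3f}   se = {v ** 0.5:.3f}")
```

Plotting the second column against the first gives the kind of integrated hazard plot discussed earlier; the slope of that plot over an interval is a rough estimate of the hazard there.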

### 3. Conditional Survivor Function

The notion of a survivor function can be generalized to apply to the case of organizational demography and ecology in which several destinations are possible, the case of competing risks. If ψ different destinations (risks) are possible, each organization can be thought of as having ψ unobserved or latent waiting times, one for each destination. Let these latent random variables be denoted by $T_{jk}$, $k = 1, \ldots, \psi$. The conditional survivor function for a particular destination is defined in terms of the corresponding latent waiting time:

$$
G_{jk}(t) = \Pr\{T_{jk} \ge t\}.
$$

If no event has occurred by some *t*, this means that $T_{jk} > t$ for all *k*. Unless we note otherwise, we assume that the ψ processes are *independent*. Notice that

$$
G_j(t) = \prod_{k=1}^{\psi} G_{jk}(t) \tag{8.2}
$$

when the competing risks are independent.
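The latent-waiting-time idea can be illustrated by simulation: draw one independent latent time per destination, treat the smallest as the observed waiting time, and compare the share surviving overall with the product of the destination-specific shares. Everything specific in this sketch is an assumption made for illustration, including the exponential form of the latent times, the rate values, the destination labels, and the fixed random seed.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

RATES = {"disbanding": 0.10, "merger": 0.05, "change of form": 0.02}


def draw_latent_times(rates: Dict[str, float], n: int, seed: int = 0) -> List[Dict[str, float]]:
    """One independent exponential latent waiting time per destination, n cases."""
    rng = random.Random(seed)
    return [{k: rng.expovariate(r) for k, r in rates.items()} for _ in range(n)]


def share_surviving(times: Iterable[float], t: float) -> float:
    times = list(times)
    return sum(x >= t for x in times) / len(times)


def compare_at(draws: List[Dict[str, float]], rates: Dict[str, float], t: float) -> Tuple[float, float, float]:
    # Overall survival: no latent time has elapsed, so the minimum is at least t.
    overall = share_surviving((min(d.values()) for d in draws), t)
    # Product of the destination-specific (conditional) survivor estimates.
    product = math.prod(share_surviving((d[k] for d in draws), t) for k in rates)
    # Exact value for independent exponential latent times.
    exact = math.exp(-sum(rates.values()) * t)
    return overall, product, exact


draws = draw_latent_times(RATES, 20_000)
for t in (1.0, 5.0, 10.0):
    overall, product, exact = compare_at(draws, RATES, t)
    print(f"t = {t:4.1f}  overall = {overall:.3f}  product = {product:.3f}  exact = {exact:.3f}")
```

In real data only the smallest latent time and its destination are observed, so the destination-specific shares cannot be computed this directly; the simulation only shows what the independence assumption implies.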

### 4. Instantaneous Transition Rates

The hazard rate for a particular state tells the speed with which that state is vacated. When only one destination is possible, knowledge of the rate of leaving suffices to describe the process. However, as we noted earlier, organizational ecology usually considers processes in which multiple destinations are possible. For example, organizations can leave a population by disbanding, by merging with another, or by changing form. In such cases, we need to know both the rate at which some state is vacated and the relative odds of moving to the various destinations. A standard way to consider these kinds of issues is in terms of the instantaneous transition rate.

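A transition rate is the limit of an ordinary discrete-time transition probability:

$$
r_{jk}(t) = \lim_{\Delta t \downarrow 0} \frac{\Pr\{Y(t + \Delta t) = k \mid Y(t) = j\}}{\Delta t}.
$$

Because the hazard is the rate of leaving a state irrespective of destination,

$$
h_j(t) = \sum_{k=1}^{\psi} r_{jk}(t). \tag{8.3}
$$

That is, the rate of leaving state *j* (that is, of entering any state *k*) is identical to the hazard function for state *j*.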
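The two ideas in this passage, that a rate is the limit of a transition probability per unit time and that destination-specific rates add up to the hazard, can be checked numerically in the simplest case. The sketch assumes constant transition rates and destinations that are ending events, so that no further moves occur within the interval; the rate values and labels are illustrative assumptions.

```python
import math
from typing import Dict

RATES = {"disbanding": 0.10, "merger": 0.05, "change of form": 0.02}


def hazard(rates: Dict[str, float]) -> float:
    """Rate of leaving the state irrespective of destination."""
    return sum(rates.values())


def transition_probability(rates: Dict[str, float], k: str, dt: float) -> float:
    """Probability of having moved to destination k within an interval of length dt."""
    h = hazard(rates)
    # Pr(leave within dt) = 1 - exp(-h * dt); expm1 keeps precision for tiny dt.
    return (rates[k] / h) * (-math.expm1(-h * dt))


for dt in (1.0, 0.1, 0.001):
    ratios = {k: transition_probability(RATES, k, dt) / dt for k in RATES}
    print(dt, {k: round(v, 4) for k, v in ratios.items()})

# With constant rates, the share of exits going to each destination is r_k / h.
shares = {k: r / hazard(RATES) for k, r in RATES.items()}
print({k: round(v, 3) for k, v in shares.items()})
```

As the interval shrinks, the probability divided by the interval length approaches the rate itself, and the ratios of the rates give the relative odds of the destinations.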

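The relationship between the transition rate and the survivor function is important in estimation. Equations (8.1) and (8.3) imply that

$$
G_j(t) = \exp\left[-\int_0^t \sum_{k=1}^{\psi} r_{jk}(s)\, ds\right] = \prod_{k=1}^{\psi} \exp\left[-\int_0^t r_{jk}(s)\, ds\right].
$$

When the competing risks are independent so that (8.2) holds, this last statement implies that

$$
G_{jk}(t) = \exp\left[-\int_0^t r_{jk}(s)\, ds\right],
$$

which provides a definition for a generalization of the integrated hazard to the case of multiple destinations, called the *integrated transition rate*:

$$
R_{jk}(t) = \int_0^t r_{jk}(s)\, ds = -\ln G_{jk}(t). \tag{8.4}
$$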

The relationship between the integrated hazard and the instantaneous transition rate can be seen by differentiating (8.4):

$$
r_{jk}(t) = \frac{dR_{jk}(t)}{dt} = -\frac{d \ln G_{jk}(t)}{dt}.
$$

So the slope of $R_{jk}(t)$ at time *t* gives an estimate of the corresponding transition rate. Aalen’s estimator is readily extended to provide an estimator of $R_{jk}(t)$, the cumulative transition rate. Because the term “cumulative hazard” is often used to apply both to hazards of leaving a state and transition rates to particular states, we sometimes refer to $R_{jk}$ as a cumulative hazard.
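One way such an extension might look is sketched below. It assumes the standard destination-specific form of the Nelson-Aalen estimator: an event adds the reciprocal of the number at risk to the cumulative rate for its own destination, while every exit, whether an event or a censoring, shrinks the risk set. The function name, the destination labels, and the data are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def cumulative_transition_rates(
    durations: Sequence[float], destinations: Sequence[Optional[str]]
) -> Dict[str, List[Tuple[float, float]]]:
    """Map each destination to its (event time, cumulative transition rate) path.

    A destination of None marks a right-censored spell.
    """
    # Sort by time; at tied times, events come before censored cases.
    order = sorted(range(len(durations)),
                   key=lambda i: (durations[i], destinations[i] is None))
    at_risk = len(durations)
    totals: Dict[str, float] = {}
    paths: Dict[str, List[Tuple[float, float]]] = {}
    for i in order:
        k = destinations[i]
        if k is not None:
            totals[k] = totals.get(k, 0.0) + 1.0 / at_risk
            paths.setdefault(k, []).append((durations[i], totals[k]))
        at_risk -= 1                  # any exit removes the case from the risk set
    return paths


# Hypothetical spells: waiting time in years and the state entered (None = censored).
durations = [2.0, 3.0, 3.0, 5.0, 8.0, 9.0]
destinations = ["disbanding", "merger", None, "disbanding", None, "merger"]
for k, path in cumulative_transition_rates(durations, destinations).items():
    print(k, [(t, round(R, 3)) for t, R in path])
```

Summing the final values across destinations reproduces the cumulative hazard of leaving the state by any route, which mirrors the way destination-specific rates add up to the hazard.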

Transition rates cannot be observed directly. They can be estimated, however, using estimates of *R(t)* and the relationship in (8.4). Later in this chapter we discuss how this is actually done. But first we turn to issues concerning causal models for organizational transition rates. Henceforth we suppress the subscript *jk* in order to simplify the notation. With this simpler notation, *r* refers to a rate of transition between two unspecified states. The context tells the particular states involved.

Source: Hannan Michael T., Freeman John (1993), *Organizational Ecology*, Harvard University Press; Reprint edition.