Survival Analysis I
- crystal0108wong
- Apr 30, 2017
Survival analysis is time-to-event analysis. Time to event is the time it takes until a pre-defined event occurs, and these times are called survival times. An example in marketing is the length of a subscription or membership, where the event is cancellation. For some subjects the event has not yet occurred when observation ends (a subscriber who is still active, for instance), so we only know that their survival time exceeds the observed length. Since the incomplete part of such an observation lies in the right tail of the time axis, these observations are called right censored.
Survival time data measure the time to a certain event, such as failure, death, or a response. These times are subject to random variation and, like any random variable, follow a distribution. The distribution of survival times is usually characterized by three mathematically equivalent functions:
The survival (survivorship) function
The probability density function
The hazard function
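To make "mathematically equivalent" concrete, here is a short sketch of the standard relationships, assuming T is a continuous, nonnegative random variable with S(0) = 1, and writing f(t) for the density and h(t) for the hazard. Knowing any one of the three functions determines the other two:

```latex
\begin{align}
f(t) &= -\frac{d}{dt} S(t), \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t), \\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right).
\end{align}
```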
The survival function
The survival function, denoted S(t), is the probability that the survival time T is greater than t. For example, the probability that an individual subscribes to your magazine for longer than t = 5 days is S(5):
S(t) = P(survival time is greater than t) = P(T > t); S(5) = P(T > 5)
Here S(t) is a nonincreasing function of time t: the larger t is, the less likely one is to survive longer than t. Survival curves typically take one of two shapes:
A steep survival curve represents a low survival rate or short survival time: the probability of surviving drops quickly.
A gradual or flat curve represents a high survival rate or longer survival time: the probability of surviving drops slowly.
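As a rough illustration of steep versus flat curves, the sketch below computes an empirical S(t) as the fraction of durations longer than t. The function name `empirical_survival` and both data sets are hypothetical, and the calculation assumes every duration is fully observed (no censoring).

```python
def empirical_survival(durations, t):
    """Fraction of observed durations strictly greater than t (no censoring)."""
    if not durations:
        raise ValueError("durations must be non-empty")
    return sum(1 for d in durations if d > t) / len(durations)


# Hypothetical subscription lengths in days, all fully observed.
short_lived = [1, 2, 2, 3, 4, 5, 6, 30]          # steep curve
long_lived = [5, 20, 40, 60, 80, 90, 100, 120]   # flat curve

for t in (0, 5, 30):
    print(t, empirical_survival(short_lived, t), empirical_survival(long_lived, t))
```

At t = 5 the first group has already dropped to 0.25 while the second is still at 0.875, which is the difference between a steep and a gradual curve.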

The probability density function
Like any other continuous random variable, the survival time T has a probability density function f(t). It is the limit of the probability of failure in a small interval, per unit time, as the interval shrinks. The density function f(t) is also known as the unconditional failure rate.
$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$
Letting T denote the time until an event, you can loosely think of f(t) as P(T = t), just as S(t) is P(T > t). For example, with time measured in days, f(30) can be thought of as the probability of unsubscribing on day 30.
The hazard function
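The hazard function h(t) of the survival time T is the conditional failure rate: the probability of failure in a small interval, per unit time, given that the individual has survived to the beginning of the interval.
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
Letting T denote the time until an event, you can loosely think of h(t) as P(T = t | T ≥ t). For example, h(45) can be thought of as the probability of membership cancellation happening on day 45 given that it has not occurred before day 45.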
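The difference between the unconditional and the conditional failure rate is easier to see with numbers. The sketch below uses the discrete-time intuition (time in whole days) and assumes no censoring; `discrete_functions` and the cancellation days are made up for illustration.

```python
from collections import Counter


def discrete_functions(durations):
    """Return {t: (f, S, h)} for each observed time t, assuming no censoring.

    f = P(T = t), S = P(T > t), h = P(T = t | T >= t).
    """
    n = len(durations)
    counts = Counter(durations)
    result = {}
    at_risk = n  # number of subjects with T >= t
    for t in sorted(counts):
        events = counts[t]
        f = events / n          # share of ALL subjects failing at t
        h = events / at_risk    # share of those STILL AT RISK failing at t
        at_risk -= events
        s = at_risk / n         # share surviving beyond t
        result[t] = (f, s, h)
    return result


# Hypothetical cancellation days for 10 members, all fully observed.
days = [10, 20, 20, 30, 30, 30, 45, 45, 60, 60]
for t, (f, s, h) in discrete_functions(days).items():
    print(f"t={t}: f={f:.2f}  S={s:.2f}  h={h:.2f}")
```

On day 45, two of the ten members cancel, so f(45) = 0.2. Only four members were still subscribed going into day 45, so h(45) = 2/4 = 0.5.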
Estimation of the Survival Function
Kaplan-Meier estimator
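Let t(1) < t(2) < ... denote the distinct observed event times. Denote the number at risk at time t(i) as ni and the observed number of deaths at time t(i) as di. The Kaplan-Meier estimate of the survival function at time t is obtained from the equation
$$\hat{S}(t) = \prod_{i:\, t_{(i)} \le t} \frac{n_i - d_i}{n_i}$$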
This formula is a product of conditional probabilities. The term (ni - di)/ni is the estimated probability of surviving past time t(i) given that one has made it to time t(i). Censored observations do not change the value of the KM estimate at the time they occur. However, they are taken into account by being in the risk set before they occur and not being in the risk set afterwards.
When an event and a censored observation occur at the same time, we assume that the time to event for the censored observation is longer than that for the event. This is the case since the event has not occurred yet for the censored observation. Thus the censored observation is considered in the risk set of the observed event and not afterwards.
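The product formula and the tie-handling rule can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kaplan_meier`, its arguments, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def kaplan_meier(times, observed):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    times    -- follow-up time for each subject
    observed -- True if the event occurred at that time, False if censored
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    deaths = defaultdict(int)    # d_i: events at each time
    leaving = defaultdict(int)   # events plus censorings at each time
    for t, event in zip(times, observed):
        leaving[t] += 1
        if event:
            deaths[t] += 1

    at_risk = len(times)  # n_i: subjects whose time is >= t_i
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at t are still counted in at_risk here,
            # i.e. a tied censoring is treated as occurring after the event.
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical data: subscription length in days, False = still subscribed.
times = [5, 8, 8, 12, 15, 15, 20]
observed = [True, True, False, False, True, True, False]
for t, s in kaplan_meier(times, observed):
    print(f"t={t}: S_hat={s:.3f}")
```

In this toy data, day 8 has one cancellation and one censored subscriber. Both are in the risk set of 6 for that day, so the factor is 5/6, and both are removed before day 12.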