Survival Analysis II - Comparison of Survival Function
- crystal0108wong
- Apr 6, 2017
This post is about comparing the survival experience across the levels of a categorical explanatory variable. For example, we may wish to see whether gender has an effect on the length of a magazine subscription. This is similar to using a t-test to see whether gender has an effect on some response, except that now we are working with censored data. Through such a comparison we can answer questions such as: do women have a higher probability than men of keeping a magazine subscription longer? Plotting the KM estimate for both genders can be done in SAS with proc lifetest. Previously, when plotting the KM estimate, the following code was used in SAS:
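proc lifetest data=whas100 plot=(s) ;  /* plot the estimated survival function */
time years*fstat(0);                   /* follow-up time in years; fstat=0 marks a censored observation */
run;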
Now, to obtain the comparison, a strata statement for the variable gender is added:
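proc lifetest data=whas100 plot=(s) ;  /* plot the estimated survival function */
time years*fstat(0);                   /* follow-up time in years; fstat=0 marks a censored observation */
strata gender;                         /* one curve per gender, plus tests of equality across the strata */
run;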
That will give us a graph like this:

The estimated survival function for males lies completely above that for females. In general, the pattern of one survival function lying above another means the group defined by the upper curve lived longer: at any point in time, the estimated probability of living past that time point is greater for the group represented by the upper curve. The statistical question is whether this observed difference is statistically significant.
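To make the "upper curve" reading concrete, here is a minimal Python sketch of the KM (product-limit) estimate for two groups. The function names `kaplan_meier` and `survival_at` and the toy data are hypothetical illustrations, not the whas100 values and not what proc lifetest does internally.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the death was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

def survival_at(curve, t):
    """Evaluate the step function S(t) from a KM curve."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s

# Hypothetical toy data for two groups (not the whas100 values)
group_a = kaplan_meier([2, 3, 5, 7, 8], [1, 0, 1, 1, 0])
group_b = kaplan_meier([1, 2, 2, 4, 6], [1, 1, 1, 0, 1])

# Compare the estimated probability of surviving past time 4 in each group
print(survival_at(group_a, 4), survival_at(group_b, 4))
```

In this toy example the curve for group A sits above the curve for group B at time 4, which is the pattern described above.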
To test whether the overall survival experience for males differs from that for females, the null hypothesis assumes that the probability of a death at a given time point is the same for both groups, and that this holds at every time point. Under this assumption, and similarly to a chi-square test, the expected number of deaths for a group at a time point is the total number of deaths at that time point, times the number at risk in that group at that time point, divided by the total number at risk at that time point.
$$\hat{e}_{1i} = \frac{d_i \, n_{1i}}{n_i}$$
where $d_i$ is the total number of deaths at time t(i), $n_{1i}$ is the number at risk in the 1st group at time t(i), and $n_i$ is the total number at risk at time t(i).
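As a small worked example of the expected-deaths calculation, the following Python sketch applies the rule to one time point. The function name `expected_deaths` and the counts are hypothetical and chosen only for illustration.

```python
def expected_deaths(d_total, n_group, n_total):
    """Expected deaths in one group at one time point under the null.

    d_total : total number of deaths at this time point (both groups)
    n_group : number at risk in the group of interest at this time point
    n_total : total number at risk at this time point (both groups)
    """
    return d_total * n_group / n_total

# Hypothetical counts: 3 deaths in total, 20 at risk in group 1, 30 in group 2
e1 = expected_deaths(3, 20, 50)
e2 = expected_deaths(3, 30, 50)
print(e1, e2)
```

The two expected counts add up to the total number of deaths at that time point, since the deaths are simply split in proportion to the numbers at risk.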
Let:
$w_i$ denote the weight at time t(i)
$d_{1i}$ denote the number of deaths in the 1st group at time t(i)
$\hat{e}_{1i}$ denote the estimated expected number of deaths in the 1st group at time t(i)
$\hat{v}_{1i}$ denote the estimated variance of $d_{1i}$ at time t(i)
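The test statistics then have the form:
$$Q = \frac{\left[\sum_{i} w_i \left(d_{1i} - \hat{e}_{1i}\right)\right]^2}{\sum_{i} w_i^2 \, \hat{v}_{1i}}$$
where the sums run over the distinct death times t(i).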
In the formula, each term is multiplied by a “weight” $w_i$. For the generalized Wilcoxon test statistic, the weight is the total number at risk at that time point. This means that more weight is placed on differences between the survival functions at the earlier time points (when fewer people have died and more are at risk). The log-rank test uses 1 for every weight, so all time points are treated the same. If there are large differences between the survival functions, this will correspond to large differences between the observed number of deaths and the expected number of deaths calculated assuming the null is true.
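The effect of the two weighting schemes can be sketched in Python as below. This is an illustration under stated assumptions, not the SAS implementation: the function name `two_group_test` and the toy data are hypothetical, and the variance term uses the usual hypergeometric form, which the text above does not spell out.

```python
def two_group_test(times, events, groups, weight="logrank"):
    """Weighted two-group test statistic.

    times  : follow-up time for each subject
    events : 1 if the death was observed, 0 if censored
    groups : 1 for the 1st group, 0 for the other group
    weight : "logrank" (w_i = 1) or "wilcoxon" (w_i = total number at risk)
    """
    rows = list(zip(times, events, groups))
    death_times = sorted({t for t, e, _ in rows if e == 1})
    numerator = 0.0
    variance = 0.0
    for t in death_times:
        n = sum(1 for u, _, _ in rows if u >= t)                # total at risk
        n1 = sum(1 for u, _, g in rows if u >= t and g == 1)    # at risk, 1st group
        d = sum(1 for u, e, _ in rows if u == t and e == 1)     # total deaths
        d1 = sum(1 for u, e, g in rows if u == t and e == 1 and g == 1)
        e1 = d * n1 / n                                         # expected deaths, 1st group
        # Assumed hypergeometric variance of d1; zero when only one subject is at risk
        v1 = 0.0 if n < 2 else n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
        w = 1.0 if weight == "logrank" else float(n)
        numerator += w * (d1 - e1)
        variance += w * w * v1
    return numerator ** 2 / variance

# Hypothetical toy data: no censoring, the other group dies first
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]

q_logrank = two_group_test(times, events, groups, weight="logrank")
q_wilcoxon = two_group_test(times, events, groups, weight="wilcoxon")
print(q_logrank, q_wilcoxon)
```

Both statistics are built from the same observed-minus-expected differences; only the weights differ, so the two tests can disagree when the curves differ mainly early or mainly late.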
If the null hypothesis is true, the discrepancy between the number of deaths we expected and the number we actually observed should be small. These discrepancies are referred to as errors and, again, large errors are evidence against the null. When the errors are large, the sum of these errors will be large. Thus large test statistics are evidence against the null hypothesis, and we reject the null for large values of the test statistic.
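Deciding what counts as a "large" value needs a reference distribution. The text above does not name one; a common choice for a two-group comparison is the chi-square distribution with 1 degree of freedom, and the Python sketch below assumes it. The function name `chi2_1df_pvalue` is hypothetical.

```python
import math

def chi2_1df_pvalue(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumes the test statistic is referred to chi-square(1), which holds
    approximately for a two-group comparison under the null hypothesis.
    """
    # A chi-square(1) variable is a squared standard normal, so
    # P(X > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2))
    return math.erfc(math.sqrt(q / 2.0))

# Hypothetical statistic values, for illustration only
p_small = chi2_1df_pvalue(0.5)
p_large = chi2_1df_pvalue(6.0)
print(p_small, p_large)
```

A larger statistic gives a smaller p-value, which matches the rule of rejecting the null for large values of the test statistic.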