Statistical tests Part I: Student’s t-test

I am planning to post a brief review of statistical tests, especially those used in medical and biomedical experiments. Today, I will start with one of the most used and flexible of statistical tests: Student's t-test! (I will follow Stephen Senn's nice magazine article, cited below.)

(W.S. Gosset, better known as Student, invented the t-test, and R.A. Fisher took up Gosset's work and extended and generalized it; these are the two men who shaped its development.)

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.

For its strict validity, the t-test depends upon a number of assumptions: that the data involved in its calculation are independent, that the data are normally distributed, and that any sample variances pooled together are estimates of the same population variance. Of these assumptions the first is crucial, the second mainly affects the power of the test to detect effects, and whether the third matters depends very much on application and circumstance. We now look at each in turn.

Declarations of independence:

What do we mean by independent? There is a formal mathematical definition, but it gets us nowhere. It says, effectively, that if the variance of a single measurement is σ^2 and the variance of the mean of n′ measurements is not σ^2/n′, then the measurements are not independent! In practice, independence depends on procedure and circumstance. For example, if we measure the diastolic blood pressure of 10 patients drawn at random from a patient population, we have 10 independent measurements of patient blood pressure and we can use n′ = 10 in our formula for the variance of a mean. However, if we take 10 blood pressure measurements from a single, randomly drawn patient, then while we may be entitled to claim that n′ = 10 for the purpose of measuring his or her blood pressure, for the purpose of estimating the variability of blood pressure in the patient population we do not have independence; effectively n′ = 1.
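To make the distinction concrete, here is a minimal simulation sketch (assuming NumPy, with hypothetical between- and within-patient standard deviations): for independent patients the variance of the mean shrinks like σ^2/n′, but for repeated readings on one patient it does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 10, 20_000
between_sd, within_sd = 10.0, 4.0  # hypothetical between- and within-patient SDs

# Case 1: one reading from each of 10 independently drawn patients.
readings = rng.normal(80, between_sd, (reps, n)) + rng.normal(0, within_sd, (reps, n))
means_indep = readings.mean(axis=1)

# Case 2: ten readings from a single, randomly drawn patient.
patient = rng.normal(80, between_sd, (reps, 1))  # shared patient level
means_single = (patient + rng.normal(0, within_sd, (reps, n))).mean(axis=1)

print("var of mean, 10 independent patients:", means_indep.var())   # ~ (10^2 + 4^2)/10
print("var of mean, 1 patient, 10 readings: ", means_single.var())  # ~ 10^2 + 4^2/10
```

With these numbers, the first variance is about (10^2 + 4^2)/10 ≈ 11.6, while the second stays near 10^2 + 4^2/10 ≈ 101.6, so treating the single-patient readings as 10 independent observations would understate the uncertainty almost tenfold.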

Using the t-test as if observations were independent when they are not is a very serious error and it can lead to claiming significance when it is not really warranted. The effect of data that are not normally distributed, however, tends in the opposite direction.

t = (mean(x) – μ0) / SE, where SE = s / sqrt(n′) is the estimated standard error and μ0 is the hypothesized mean (here, zero).
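As a quick check, here is a small sketch (assuming SciPy is available, with made-up data) that computes the one-sample t statistic by hand and compares it against scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, -0.4, 0.8, 2.1, 0.3, 1.5, -0.2, 0.9])  # hypothetical sample
mu0 = 0.0                                   # hypothesized mean under the null

se = x.std(ddof=1) / np.sqrt(len(x))        # estimated standard error s / sqrt(n')
t_manual = (x.mean() - mu0) / se

res = stats.ttest_1samp(x, popmean=mu0)
print(t_manual, res.statistic, res.pvalue)  # the two t values agree
```

Note that dividing by the estimate s/sqrt(n′), rather than the true σ/sqrt(n′), is exactly what makes the statistic t-distributed rather than normal.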

A basic assumption behind the t-test is that we can reasonably estimate the variability of the mean from the variances of the observations. Now consider the so-called two-sample t-test, an extension, by Fisher, of Student's t-test. This compares two means but, strictly speaking, requires that the variances σ1^2 and σ2^2 of the two populations from which the samples are drawn are identical.

Does this matter? Stephen Senn argues in his article that it does not matter much for experimental work when we are just testing a null hypothesis, since, as Fisher pointed out, the hypothesis that there is no difference between treatments implies that the variances, not just the means, are identical. Nor does it matter very much for other purposes provided that the sample sizes are the same. However, if they are not, there can be a problem. Why is this? The reason is that the true variance of the difference between the two sample means is σ1^2/n1′ + σ2^2/n2′. This means that, all other things being equal, the biggest contribution to the true variance comes from the smaller sample (variances of means are inversely proportional to sample size). The tail wags the dog: it is the smaller sample that is crucial. However, if we assume σ1^2 = σ2^2 = σ^2 and proceed to estimate the variance by pooling the samples, then it is the larger sample that contributes more to the variance estimate.

This mismatch, with the smaller sample contributing most to the variation in the treatment effect but least to the estimation of that variation, means that the test can behave badly if the variances are unequal; the simulation sketch below makes this concrete. Even worse can happen if there is wider pooling of variances from treatments not directly involved in the comparison. The habit of doing this made sense in trials in agriculture, where degrees of freedom were scarce and agriculture was the field (pun intended) in which Fisher developed his theories, but the fact that we still tend to do it for clinical trials, where degrees of freedom are often abundant, shows that habit rather than logic sometimes dictates how we use the t-test.
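Here is a hedged simulation sketch (NumPy + SciPy, with hypothetical sample sizes and standard deviations). With the null hypothesis true, the pooled two-sample t-test rejects far more often than the nominal 5% when a small high-variance sample is paired with a large low-variance one; Welch's unequal-variance t-test (SciPy's equal_var=False, the standard remedy, not discussed above) stays close to the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reps, alpha = 20_000, 0.05
n1, sd1 = 10, 4.0   # small sample, large variance: dominates the true variance
n2, sd2 = 100, 1.0  # large sample, small variance: dominates the pooled estimate

rej_pooled = rej_welch = 0
for _ in range(reps):
    a = rng.normal(0.0, sd1, n1)  # both groups share the same mean: null is true
    b = rng.normal(0.0, sd2, n2)
    rej_pooled += stats.ttest_ind(a, b, equal_var=True).pvalue < alpha
    rej_welch += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha

print("pooled t-test rejection rate:", rej_pooled / reps)  # well above 0.05
print("Welch  t-test rejection rate:", rej_welch / reps)   # close to 0.05
```

With these numbers, the true variance of the difference in means is 16/10 + 1/100 ≈ 1.61, while the pooled estimate averages about 2.25 × (1/10 + 1/100) ≈ 0.25, so the pooled t statistic is badly inflated.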

A few links that include some stats notes and tutorials:

http://www.princeton.edu/~otorres/Stata/statnotes

http://www.significancemagazine.org/details/magazine/868881/The-t-test-tool.html

will continue….
