Student's t-test: Comparison of two means
Theory
Among the most commonly used statistical significance tests applied to small data sets (population samples) is the family of Student's t-tests. One of these tests compares two means and applies to many practical cases. Typical examples are:
Example 1: Comparison of analytical results obtained with the same method on samples A and B, in order to confirm whether both samples contain the same percentage of the measured analyte or not.
Example 2: Comparison of analytical results obtained with two different methods A and B on the same sample, in order to confirm whether both methods provide similar analytical results or not.
General aspects of significance tests
The outcome of these tests is the acceptance or rejection of the null hypothesis (H0). The null hypothesis generally states that: "Any differences, discrepancies, or suspiciously outlying results are purely due to random and not systematic errors". The alternative hypothesis (Ha) states exactly the opposite.
The null hypothesis for the aforementioned examples is:
The means are the same, i.e. in Example 1: both samples contain the same percentage of the analyte; in Example 2: both methods provide the same analytical results. The differences observed (if any) are purely due to random errors.
The alternative hypothesis is:
The means are significantly different, i.e. in Example 1: each sample contains a different percentage of the analyte; in Example 2: the methods provide different analytical results (so at least one method yields systematic analytical errors).
An erroneous rejection of H0 (even though it is true) constitutes a Type 1 error, whereas an erroneous acceptance of H0 (even though it is false) constitutes a Type 2 error.
All significance tests provide results at a predefined confidence level (CL%). Commonly used confidence levels are 90%, 95% and 99%, with 95% being the most usual (at least in the field of chemical analysis).
A CL of 95% means that, if H0 is actually true, the test will wrongly reject it in no more than 5% of cases. In other words, we accept a probability of no more than (100-95)/100 = 0.05 for a Type 1 error.
The confidence level of a significance test can be decreased or increased, but the following pitfalls must be considered:
(a) Decreasing the CL, say to 90% (thus making the rejection of H0 easier), increases the probability of a Type 1 error.
(b) Increasing the CL, say to 99% (thus making the rejection of H0 harder), increases the probability of a Type 2 error.
A CL of 95% is generally considered a fair compromise between these two risks.
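The trade-off between the two error types can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the sample size (5 per set), the true means, the unit standard deviation, the 1.5 SD shift and the function names are all hypothetical choices made for illustration, and the critical t values are the usual two-tailed table values for 8 degrees of freedom.

```python
import math
import random
import statistics

# Two-tailed critical t values for 8 degrees of freedom (nA = nB = 5),
# taken from a standard t table.
T_CRIT_DF8 = {90: 1.860, 95: 2.306, 99: 3.355}

def t_statistic(a, b):
    """Pooled-variance t statistic for two samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

def rejection_rates(true_shift, trials=4000, n=5, seed=1):
    """Fraction of simulated experiments in which H0 is rejected at each CL."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = {cl: 0 for cl in T_CRIT_DF8}
    for _ in range(trials):
        # Hypothetical populations: normal, SD = 1, means 50 and 50 + shift.
        a = [rng.gauss(50.0, 1.0) for _ in range(n)]
        b = [rng.gauss(50.0 + true_shift, 1.0) for _ in range(n)]
        t = abs(t_statistic(a, b))
        for cl, t_crit in T_CRIT_DF8.items():
            if t > t_crit:
                counts[cl] += 1
    return {cl: c / trials for cl, c in counts.items()}

# H0 true (no shift): every rejection is a Type 1 error.
type1 = rejection_rates(true_shift=0.0)
# H0 false (means differ by 1.5 SD): every non-rejection is a Type 2 error.
type2 = {cl: 1 - r for cl, r in rejection_rates(true_shift=1.5).items()}

for cl in (90, 95, 99):
    print(f"CL {cl}%: Type 1 rate ~ {type1[cl]:.3f}, Type 2 rate ~ {type2[cl]:.3f}")
```

In such a run the Type 1 rate should fall as the CL rises, while the Type 2 rate should rise, which is the compromise described above.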
Student's t-test for the comparison of two means
This test (as described below) assumes: (a) a normal (Gaussian) distribution for the populations of the random errors, and (b) no significant difference between the standard deviations of the two population samples.
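The two means and the corresponding standard deviations are calculated by using the following equations (nA and nB are the number of measurements in data set A and data set B, respectively):
$$\bar{x}_A = \frac{1}{n_A}\sum_{i=1}^{n_A} x_{A,i}, \qquad \bar{x}_B = \frac{1}{n_B}\sum_{i=1}^{n_B} x_{B,i}$$
$$s_A = \sqrt{\frac{\sum_{i=1}^{n_A}\left(x_{A,i}-\bar{x}_A\right)^2}{n_A-1}}, \qquad s_B = \sqrt{\frac{\sum_{i=1}^{n_B}\left(x_{B,i}-\bar{x}_B\right)^2}{n_B-1}}$$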
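Then, the pooled estimate of standard deviation sAB is calculated from the standard deviations sA and sB of the two data sets:
$$s_{AB} = \sqrt{\frac{(n_A-1)\,s_A^2 + (n_B-1)\,s_B^2}{n_A+n_B-2}}$$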
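Finally, the statistic texp (experimental t value) is calculated from the two means and the pooled standard deviation:
$$t_{exp} = \frac{\bar{x}_A - \bar{x}_B}{s_{AB}\sqrt{\dfrac{1}{n_A}+\dfrac{1}{n_B}}}$$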
The texp value is compared with the critical (theoretical) value tth corresponding to the given degrees of freedom N (in the present case N = nA + nB - 2) and the chosen confidence level. Tables of critical t values can be found in any book on statistical analysis, as well as in many quantitative analysis textbooks. If |texp| > tth, then H0 is rejected; otherwise H0 is retained.
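The whole procedure (means, standard deviations, pooled standard deviation, texp and the comparison with tth) can be sketched in a few lines of Python. The two data sets below are invented purely for illustration, and the critical value 2.306 is the two-tailed table value for 8 degrees of freedom at CL 95%; the variable names are assumptions, not part of the original text.

```python
import math
import statistics

# Hypothetical measurements (e.g. % analyte found in samples A and B).
set_a = [10.2, 10.4, 10.1, 10.3, 10.5]
set_b = [10.6, 10.8, 10.7, 10.9, 10.5]

n_a, n_b = len(set_a), len(set_b)
mean_a, mean_b = statistics.mean(set_a), statistics.mean(set_b)
# statistics.stdev uses the (n - 1) denominator, i.e. the sample SD.
sd_a, sd_b = statistics.stdev(set_a), statistics.stdev(set_b)

# Pooled estimate of the standard deviation.
s_ab = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

# Experimental t value.
t_exp = (mean_a - mean_b) / (s_ab * math.sqrt(1 / n_a + 1 / n_b))

# Critical two-tailed t value for N = n_a + n_b - 2 = 8 at CL 95% (from tables).
t_th = 2.306

print(f"mean A = {mean_a:.3f}, SD A = {sd_a:.3f}")
print(f"mean B = {mean_b:.3f}, SD B = {sd_b:.3f}")
print(f"s_AB = {s_ab:.4f}, t_exp = {t_exp:.3f}")
print("H0 rejected" if abs(t_exp) > t_th else "H0 retained")
```

For these invented values |texp| is about 4, which exceeds 2.306, so H0 would be rejected at CL 95%.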
How this test and the other significance tests are performed using a statistical analysis program
Nowadays, the rather tedious calculation of statistics (such as texp) has been greatly simplified by statistical analysis programs. Furthermore, there is no need for statistical tables of critical values. Instead, after loading the data and running the program, a numerical value P is calculated internally (usually by mathematically complicated procedures) and displayed. P is the probability of obtaining a difference at least as large as the one observed if H0 were true; equivalently, it is the risk of a Type 1 error that we would take by rejecting H0 on the basis of the given data. This is adequate information for the user to decide on the acceptance or the rejection of the null hypothesis.
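To give an idea of what such a program does internally, the following sketch obtains a two-tailed P from a t value by numerically integrating the density of Student's t distribution with Simpson's rule. This is only one possible approach chosen for illustration (real statistical packages typically use more refined routines), and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_exp, df, steps=2000):
    """Two-tailed P: area of the t density outside [-|t_exp|, +|t_exp|]."""
    upper = abs(t_exp)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3   # area between 0 and |t_exp|
    return 1 - 2 * central_half    # the density is symmetric about 0

# The tabulated critical value for 8 degrees of freedom at CL 95% is 2.306,
# so the P computed for it should come out very close to 0.05.
print(round(two_tailed_p(2.306, 8), 3))
```

The sketch is meant for the small data sets discussed here; for very large degrees of freedom `math.gamma` would overflow and a different formulation would be needed.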
For example, supposing that we have decided to work at CL 95% (i.e. we accept a probability of a Type 1 error not greater than 0.05), then:
(a) A value of P = 0.085 means that H0 must be accepted; otherwise we would risk an unacceptably high probability (more than 0.05) of a Type 1 error.
(b) A value of P = 0.021 means that H0 must be rejected, because the probability of a Type 1 error is sufficiently low (less than 0.05).
Accordingly, if we had decided to work at CL 90%, in both cases (P = 0.085, P = 0.021) H0 would have been rejected, whereas if we had decided to work at CL 99%, in both cases H0 would have been accepted.
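The decision rule described above amounts to comparing P with (100 - CL)/100. A minimal sketch, with a hypothetical helper name, applied to the two P values of the example:

```python
def decisions(p, confidence_levels=(90, 95, 99)):
    """Return the decision on H0 at each confidence level for a given P value."""
    result = {}
    for cl in confidence_levels:
        alpha = (100 - cl) / 100  # maximum accepted Type 1 error probability
        result[cl] = "reject H0" if p < alpha else "retain H0"
    return result

for p in (0.085, 0.021):
    print(p, decisions(p))
```

With P = 0.085 only the 90% level leads to rejection, while with P = 0.021 both the 90% and 95% levels do; at 99% H0 is retained in both cases.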
Applet
This applet provides a demonstration of the aforementioned Student's t test. The user can easily create two demo sets (A and B) of measurements by left-clicking on the provided areas. All demo data values (45
By clicking "CALCULATE", the number of measurements (N), the means and the standard deviations (SD) for each set are displayed in the areas next to the plots, as shown below (the positions of the means and the +/-1 SD width are also shown in deep blue):
whereas the texp and P values are shown in the t-test report text area:
In addition, the outcome is stated explicitly (although redundantly) as a positive or negative detection of a difference between the means at CL 90%, 95% and 99%.
The user can study the effect of the spread of the measurements and of the number of measurements on the final outcome. Thus, for means differing by the same amount, a significant difference may be reported when the data spread is low (small SD values), whereas no significant difference may be reported when the data spread is high (large SD values).
This applet can also be used (by clicking the "user's data" radio button) to test actual numerical values provided by the user, acting exactly like a statistical analysis program performing this type of significance test.
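The effect of data spread can also be checked numerically without the applet. The sketch below computes texp directly from summary statistics for two hypothetical scenarios that share the same means (52 and 50) and the same number of measurements (5 per set) but differ in SD; the numbers and the function name are assumptions made for illustration, and 2.306 is the two-tailed table value for 8 degrees of freedom at CL 95%.

```python
import math

T_CRIT_95_DF8 = 2.306  # two-tailed critical t, 8 degrees of freedom, CL 95%

def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic computed from means, SDs and sample sizes."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Same difference between the means, low versus high spread (hypothetical values).
for sd in (0.5, 3.0):
    t_exp = t_from_summary(52.0, sd, 5, 50.0, sd, 5)
    verdict = "significant" if abs(t_exp) > T_CRIT_95_DF8 else "not significant"
    print(f"SD = {sd}: t_exp = {t_exp:.3f} -> difference {verdict} at CL 95%")
```

With the small SD the difference of 2 units is reported as significant, whereas with the large SD the same difference is not.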