Hypothesis Testing
ABSTRACT
Hypothesis testing involves formulating statements about relationships between two or more variables and then stating the hypothesis of interest. A null hypothesis is used to describe what is true if the hypothesis of interest is incorrect. An appropriate statistical test is then conducted. The results are used to decide whether or not to reject the null hypothesis.
Chance occurrence?
In any given population (e.g. a group of 3-year-olds in a nursery; all adults who had a stroke last year; children receiving speech therapy for stuttering in a particular clinic) there are likely to be random variations, i.e. characteristics that occur simply by chance. Now, suppose that we have been providing speech therapy for stuttering to a group of children and we observe that it is the females who progress more rapidly. Can we assume that these seeming gender differences are real, i.e. that it is the child’s sex that is the influencing factor? Indeed, can we even be certain that it is our intervention program that has caused the improvements (whether for males or females)? Perhaps the improvements would have come about anyway, without our intervention? And maybe the sex of the child has nothing to do with causing the apparent differences?
In situations such as this, whenever we need to determine whether or not some observed characteristic of a population is real and not a chance occurrence (because we just happen to have selected this particular group to study) then we need to test a hypothesis.
Definition of hypothesis
A hypothesis is an assertion about the relationship between two or more variables or concepts. Here are some examples:
b > g
boys use more interruptions in conversations than girls
boys use a greater range of verbs than girls in story-telling
b < g
boys use fewer questions in conversations than girls
boys are less likely to engage in conversations with strangers than girls
b = g
boys and girls are equally likely to present with dyspraxia
there is no difference between boys’ and girls’ fluency when speaking on the telephone
b <> g
boys and girls differ in their ability to use past tenses of regular verbs
boys and girls do not use the same range of nouns in their written work
Testing hypotheses
In order to perform statistical tests hypotheses must be stated explicitly in terms precise enough to allow the collection of relevant data which can then be tested. Reconsider one of the examples given above:
‘boys use a greater range of verbs than girls in storytelling’
This statement is insufficiently explicit: it lacks the necessary precision. For example, what is meant by ‘boys’? A three-month-old male baby is a boy but is unlikely to be included in this statement, as we would not expect an infant of that age to be able to engage in ‘storytelling’. And when does a male stop being a boy? At age 16 years? 18 years? Similar questions will be posed about the definition of ‘girls’. And what is meant by ‘verbs’? Are only lexical verbs to be included? Just auxiliary verbs? Both? The point here is that each variable in the statement needs to be operationalized, i.e. precisely defined so that it can be measured and/or expressed quantitatively or qualitatively.
Once the variables have been fully operationalized the process of testing a hypothesis is as follows.
1. State the hypothesis
This is the so-called hypothesis of interest and is a statement about what we believe is actually true for the population we are studying. For example:
‘Second Grade children with dyspraxia and Second Grade children without any speech-language impairments differ in their average oral reading comprehension scores on the Gray Oral Reading Tests (GORT-4)’
2. State the null hypothesis
This describes what is true in the population if the hypothesis of interest is incorrect, i.e. when it is null. It is used as a frame of reference to evaluate the hypothesis of interest. For example:
‘Second Grade children with dyspraxia and Second Grade children without any speech-language impairments do not differ in their average oral reading comprehension scores on the Gray Oral Reading Tests (GORT-4)’
3. Perform a statistical test
In our example hypothesis we recognize two groups:
- the group of children with dyspraxia
- the group of children with no speech-language impairments
If the scores (for oral reading comprehension) differ between the two groups then an appropriate statistical test is used. This calculates the probability that we would observe a difference at least as large as the one we have observed in our sample if the null hypothesis were true, i.e. if there were no difference between the groups in the population.
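To make this probability concrete, here is a minimal sketch of a permutation test in Python. It is one possible way of estimating the probability, not the only one. The scores are invented purely for illustration (they are not GORT-4 data) and the function name permutation_p_value is an assumption of this sketch. The reasoning is that, if the null hypothesis is true, the group labels are interchangeable, so we can shuffle the labels many times and count how often the shuffled difference in means is at least as large as the one we observed.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate the probability of a difference in means at least as large
    as the observed one, assuming the group labels are interchangeable."""
    rng = random.Random(seed)
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Adding 1 to both counts keeps the estimate from being exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


# Hypothetical scores, invented for illustration only.
dyspraxia = [6, 7, 5, 8, 6, 7, 5, 6, 7, 6]
no_impairment = [9, 10, 8, 11, 9, 10, 9, 8, 10, 9]

p = permutation_p_value(dyspraxia, no_impairment)
print(f"estimated probability if the null hypothesis is true: {p:.4f}")
```

A small value means that a difference this large would rarely arise from random variation alone.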
In our example we might use a test such as the Student’s t-test. This is a robust test used to determine whether the means of two samples (children with dyspraxia; children with no speech-language impairments) are significantly different.
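As a rough sketch of what the Student’s t-test computes, the following Python code calculates the t statistic for two independent groups using only the standard library. The scores are invented for illustration, and the function name student_t is an assumption of this sketch. The test assumes that both groups share the same population variance and that the scores are approximately normally distributed. Converting t into an exact probability requires the t distribution, which a statistics library would normally supply; here the statistic is simply compared with the standard two-tailed critical value for 18 degrees of freedom at the 0.05 level.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t, degrees of freedom) for Student's two-sample t-test,
    which pools the two sample variances into a single estimate."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical scores, invented for illustration only.
dyspraxia = [6, 7, 5, 8, 6, 7, 5, 6, 7, 6]
no_impairment = [9, 10, 8, 11, 9, 10, 9, 8, 10, 9]

t, df = student_t(dyspraxia, no_impairment)

# Two-tailed critical value of t for 18 degrees of freedom at the 0.05 level.
CRITICAL_T_DF18 = 2.101

print(f"t = {t:.2f} with {df} degrees of freedom")
print("statistically significant at 0.05:", abs(t) > CRITICAL_T_DF18)
```

With these invented scores the absolute value of t is well beyond the critical value, so the difference between the two sample means would be called statistically significant.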
4. Decide
Reject the null hypothesis
Only reject the null hypothesis if the probability is small: usually only if a difference at least as large as the one observed would arise by chance less than 1 time in 20 when the null hypothesis is true, i.e. a probability of less than 0.05. If the probability is indeed less than 0.05 then the result is said to be statistically significant and we can, therefore, reject the null hypothesis.
We are, therefore, claiming that the assertion that there is no difference between the groups is incorrect – we are rejecting the assertion. Consequently, we are claiming that we have evidence of an alternative hypothesis that there is indeed a difference between the groups. Of course, this alternative hypothesis is the hypothesis of interest that we stated at the outset, i.e. that there is a difference between the groups.
Do not reject the null hypothesis
Do not reject the null hypothesis if the probability is large, i.e. if a difference at least as large as the one observed would arise by chance 1 time in 20 or more often when the null hypothesis is true (a probability of 0.05 or more).
Now, just because we do not reject the null hypothesis this does not mean that, by default, we accept it. We simply do not reject the null hypothesis. We remain uncertain as to whether or not the groups differ, because there are two possibilities:
- it may be that there is, in fact, no difference between the groups
- there may indeed be an essential difference between the groups but we have been unable to detect it – this can often occur if the sample sizes are too small
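The effect of sample size can be sketched with a small simulation in Python. Everything here is an assumption made for illustration: the two populations genuinely differ (means of 10.0 and 8.4 with a standard deviation of 2.0), the names rejects_null and detection_rate are hypothetical, and each simulated study is judged with a Student’s t-test at the 0.05 level using standard table critical values. The simulation repeats the study many times and reports how often the real difference is detected.

```python
import random
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values of Student's t at the 0.05 level,
# keyed by degrees of freedom (standard table values).
CRITICAL_T = {8: 2.306, 18: 2.101, 60: 2.000}


def rejects_null(group_a, group_b):
    """Apply the decision rule: reject only if |t| exceeds the critical value."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return abs(t) > CRITICAL_T[df]


def detection_rate(n_per_group, n_studies=1000, seed=1):
    """Proportion of simulated studies in which a real difference is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_studies):
        # Hypothetical populations that really do differ (assumed values).
        a = [rng.gauss(10.0, 2.0) for _ in range(n_per_group)]
        b = [rng.gauss(8.4, 2.0) for _ in range(n_per_group)]
        if rejects_null(a, b):
            detected += 1
    return detected / n_studies


for n in (5, 10, 31):
    rate = detection_rate(n)
    print(f"n = {n:2d} per group: difference detected in {rate:.0%} of studies")
```

Under these assumptions the smallest samples miss the real difference in most of the simulated studies, which is why failing to reject the null hypothesis cannot be read as evidence that the groups are the same.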
Interpretation
Just because we are able to detect a statistically significant difference between groups with respect to some characteristic, this does not necessarily mean that the result is important. It may have little or no practical value whatsoever. Some differences are so small that they are only detectable through statistical testing but, to all intents and purposes, they would go unnoticed in everyday situations.
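One common way of expressing how large a difference is, as opposed to whether it is statistically significant, is a standardized effect size such as Cohen’s d (the difference between the means divided by the pooled standard deviation). The Python sketch below uses invented summary statistics for two very large groups; the function names and all numbers are assumptions made for illustration. It shows a difference that is clearly statistically significant yet very small in standardized terms.

```python
from math import sqrt


def pooled_variance(sd_a, sd_b, n_a, n_b):
    return ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Difference between the means in units of the pooled standard deviation."""
    return (mean_a - mean_b) / sqrt(pooled_variance(sd_a, sd_b, n_a, n_b))


def t_statistic(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Student's t computed from summary statistics."""
    standard_error = sqrt(pooled_variance(sd_a, sd_b, n_a, n_b)
                          * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical summary statistics: two groups of 50,000, means half a point apart.
stats = dict(mean_a=100.5, mean_b=100.0, sd_a=15.0, sd_b=15.0,
             n_a=50_000, n_b=50_000)

d = cohens_d(**stats)
t = t_statistic(**stats)

# With samples this large the two-tailed 0.05 critical value is about 1.96.
print(f"t = {t:.2f} (statistically significant: {abs(t) > 1.96})")
print(f"Cohen's d = {d:.3f}")
```

Here the test statistic comfortably exceeds 1.96, yet the means differ by only about one thirtieth of a standard deviation, far below the value of 0.2 that is conventionally described as a small effect.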
Reference
Wiederholt, J.L. and Bryant, B.R. (2001) Gray Oral Reading Tests-Fourth Edition (GORT-4). San Antonio, TX: Pearson.