Introduction
Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to support a claim about a population. It is used in many fields, including climate science, where it is applied to questions such as whether human activities are causing climate change.

Here’s an example of how hypothesis testing can be used in climate science:

**Hypothesis:** Human activities are causing climate change.

**Null hypothesis:** Human activities are not causing climate change.

To test this hypothesis, scientists collect data on temperature changes and greenhouse gas emissions over time. They then use statistical methods to determine whether the observed changes are consistent with the hypothesis or not.

For example, scientists have found that the observed increase in global temperatures over the past century is consistent with the hypothesis that human activities are causing climate change^{1}. They have also found that the observed increase in atmospheric carbon dioxide levels is consistent with the hypothesis^{1}.

How does hypothesis testing work?

Hypothesis testing involves four steps:

- State the null and alternative hypotheses.
- Select a significance level.
- Calculate the test statistic.
- Make a decision about the null hypothesis.

First, scientists state the null and alternative hypotheses they want to test. These hypotheses should be based on previous research and observations.

The significance level is the probability of rejecting the null hypothesis when it is actually true. This is also known as the alpha level. The significance level is typically set at 0.05, which means that there is a 5% chance of rejecting the null hypothesis when it is actually true.

The test statistic is a number that is calculated from the data and used to determine whether to reject the null hypothesis. The specific test statistic that is used depends on the type of hypothesis test being conducted.

The decision about the null hypothesis is made based on the test statistic and the significance level. If the test statistic is more extreme than a certain value, called the critical value, then the null hypothesis is rejected. Otherwise, the null hypothesis is not rejected.
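As a sketch, the four steps above can be carried out in a few lines of Python using the critical-value approach just described. The sample values and hypothesized mean are invented for illustration, and the hard-coded critical value is the standard two-sided t value for a significance level of 0.05 with 9 degrees of freedom.

```python
# A minimal sketch of the four-step procedure with a one-sample t-test.
# The data and hypothesized mean are invented for illustration.
import math
import statistics

# Step 1: state the hypotheses.
# H0: the population mean equals 20; Ha: it does not (two-sided test).
mu_0 = 20.0
sample = [22.1, 19.8, 23.4, 21.0, 20.5, 22.8, 21.7, 19.9, 23.1, 21.4]

# Step 2: select a significance level. For alpha = 0.05 and
# df = 9, the two-sided critical t value (from a t-table) is 2.262.
alpha = 0.05
t_critical = 2.262

# Step 3: calculate the test statistic.
n = len(sample)
t_stat = (statistics.mean(sample) - mu_0) / (statistics.stdev(sample) / math.sqrt(n))

# Step 4: make a decision about the null hypothesis.
if abs(t_stat) > t_critical:
    print(f"t = {t_stat:.2f} is more extreme than {t_critical}: reject H0")
else:
    print(f"t = {t_stat:.2f} is not more extreme than {t_critical}: fail to reject H0")
```

Statistical software normally reports a p-value instead of making you look up a critical value, but the decision rule is equivalent.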

What are the types of hypothesis tests?

There are many different types of hypothesis tests, but some of the most common ones include:

- Two-sample t-tests are used to compare the means of two populations.
- One-sample t-tests are used to compare the mean of a population to a known value.
- Chi-squared tests are used to test for an association between two or more categorical variables.
- F-tests are used to compare the variances of two or more populations.

Of these, the t-test and the chi-squared test are especially common. The t-test compares two groups of data to determine whether their means are significantly different from each other, and the chi-squared test determines whether there is a significant association between two categorical variables.
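As a hedged sketch, here is how these two tests might be run with SciPy (assuming `scipy` is available). The measurements and the contingency-table counts are invented for illustration.

```python
# Sketch: a two-sample t-test and a chi-squared test on made-up data.
from scipy import stats

# Two-sample t-test: compare the means of two groups.
group_a = [5.1, 4.9, 5.6, 5.3, 4.8, 5.2]
group_b = [6.0, 5.8, 6.3, 5.9, 6.1, 5.7]
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Chi-squared test of independence: association between two
# categorical variables, given as a contingency table of counts.
table = [[30, 10],
         [20, 40]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"t-test: t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"chi-squared: chi2 = {chi2:.2f}, p = {p_chi2:.4f}, df = {dof}")
```

In each case the p-value is then compared against the chosen significance level, exactly as in the four-step procedure above.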

When should hypothesis testing be used?

Statisticians define two types of errors in hypothesis testing. Creatively, they call these errors Type I and Type II errors. Both types of error relate to incorrect conclusions about the null hypothesis.

The significance level is an evidentiary standard that you set to determine whether your sample data are strong enough to reject the null hypothesis. Hypothesis tests define that standard using the probability of rejecting a null hypothesis that is actually true. You set this value based on your willingness to risk a false positive.

Using the significance level to set the Type I error rate

When the significance level is 0.05 and the null hypothesis is true, there is a 5% chance that the test will reject the null hypothesis incorrectly. If you set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1% seems even better, right? As you’ll see, there is a tradeoff between Type I and Type II errors. If you hold everything else constant, as you reduce the chance for a false positive, you increase the opportunity for a false negative.

Type I errors are relatively straightforward. Statisticians designed hypothesis tests to incorporate everything that affects this error rate so that you can specify it for your studies. As long as your experimental design is sound, you collect valid data, and the data satisfy the assumptions of the hypothesis test, the Type I error rate equals the significance level that you specify. However, if there is a problem in one of those areas, it can affect the false positive rate.

Warning about a potential misinterpretation of Type I errors and the significance level

When the null hypothesis is correct for the population, the probability that a test produces a false positive equals the significance level. However, when you look at a statistically significant test result, you cannot state that there is a 5% chance that it represents a false positive.

Why is that the case? Imagine that we perform 100 studies on a population where the null hypothesis is true. If we use a significance level of 0.05, we’d expect that five of the studies will produce statistically significant results—false positives. Afterward, when we go to look at those significant studies, what is the probability that each one is a false positive? Not 5 percent but 100%!
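The scenario above can be checked with a small simulation: run many "studies" on a population where the null hypothesis is true and count how many come out statistically significant. The sketch below uses only the standard library and an approximate two-sample z-test, so the observed false positive rate will land near, not exactly at, 5%.

```python
# Sketch: simulate many studies on a population where H0 is true
# (no difference between groups) and count the false positives.
import math
import random
import statistics

random.seed(1)
alpha = 0.05
n_studies = 1000
false_positives = 0

for _ in range(n_studies):
    # Both samples come from the SAME population, so H0 is true.
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    # Approximate two-sample z-test.
    se = math.sqrt(statistics.variance(a) / 30 + statistics.variance(b) / 30)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    if abs(z) > 1.96:  # two-sided critical value for alpha = 0.05
        false_positives += 1

print(f"{false_positives}/{n_studies} significant results, all false positives "
      f"(rate close to alpha = {alpha:.0%})")
```

Every significant result in this simulation is, by construction, a false positive, which is the point of the 100-studies thought experiment above.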

That scenario also illustrates a broader point: the true picture becomes more evident after repeated experimentation. Given a pattern of results that is predominantly not significant, it is unlikely that an effect exists in the population.

Type II Error: False Negatives

When you perform a hypothesis test and your p-value is greater than your significance level, your results are not statistically significant. That’s disappointing because your sample provides insufficient evidence for concluding that the effect you’re studying exists in the population. However, there is a chance that the effect is present in the population even though the test results don’t support it. If that’s the case, you’ve just experienced a Type II error. The probability of making a Type II error is known as beta (β).

What causes Type II errors? Whereas Type I errors are caused by one thing, sample error, there are a host of possible reasons for Type II errors—small effect sizes, small sample sizes, and high data variability. Furthermore, unlike Type I errors, you can’t set the Type II error rate for your analysis. Instead, the best that you can do is estimate it before you begin your study by approximating properties of the alternative hypothesis that you’re studying. When you do this type of estimation, it’s called power analysis.

To estimate the Type II error rate, you create a hypothetical probability distribution that represents the properties of a true alternative hypothesis. However, when you’re performing a hypothesis test, you typically don’t know which hypothesis is true, much less the specific properties of the distribution for the alternative hypothesis. Consequently, the true Type II error rate is usually unknown!

Fire alarm analogy for the types of errors

A fire alarm provides a good analogy for the types of hypothesis testing errors. Preferably, the alarm rings when there is a fire and does not ring in the absence of a fire. However, if the alarm rings when there is no fire, it is a false positive, or a Type I error in statistical terms. Conversely, if the fire alarm fails to ring when there is a fire, it is a false negative, or a Type II error.

Hypothesis testing should be used when you want to determine whether there is enough evidence to support a claim about a population. For example, you might want to use hypothesis testing to determine whether there is a difference in the average height of men and women, or whether a new treatment is effective in reducing the symptoms of a disease.

Conclusion

Hypothesis testing is a powerful statistical tool that can be used to make inferences about populations. However, it is important to use hypothesis testing correctly in order to avoid making incorrect conclusions.

Type I & Type II Errors in Hypothesis Testing

In hypothesis testing, a Type I error is a false positive while a Type II error is a false negative.

Hypothesis tests use sample data to make inferences about the properties of a population. You gain tremendous benefits by working with random samples because it is usually impossible to measure the entire population.

However, there are tradeoffs when you use samples. The samples we use are typically a minuscule percentage of the entire population. Consequently, they occasionally misrepresent the population severely enough to cause hypothesis tests to make Type I and Type II errors.

Potential Outcomes in Hypothesis Testing

Hypothesis testing is a procedure in inferential statistics that assesses two mutually exclusive theories about the properties of a population. For a generic hypothesis test, the two hypotheses are as follows:

Null hypothesis: There is no effect

Alternative hypothesis: There is an effect.

The sample data must provide sufficient evidence to reject the null hypothesis and conclude that the effect exists in the population. Ideally, a hypothesis test fails to reject the null hypothesis when the effect is not present in the population, and it rejects the null hypothesis when the effect exists.

Using hypothesis tests correctly improves your chances of drawing trustworthy conclusions. However, errors are bound to occur.

Unlike the fire alarm analogy, there is no sure way to determine whether an error occurred after you perform a hypothesis test. Typically, a clearer picture develops over time as other researchers conduct similar studies and an overall pattern of results appears. Seeing how your results fit in with similar studies is a crucial step in assessing your study’s findings.

Let’s look at each type of error in more depth.

Type I Error: False Positives

When you see a p-value that is less than your significance level, you get excited because your results are statistically significant. However, it could be a Type I error. The supposed effect might not exist in the population. Again, there is usually no warning when this occurs.

Why do these errors occur? It comes down to sample error. Your random sample has overestimated the effect by chance. It was the luck of the draw. This type of error doesn’t indicate that the researchers did anything wrong. The experimental design, data collection, data validity, and statistical analysis can all be correct, and yet this type of error still occurs.

Even though we don’t know for sure which studies have false positive results, we do know their rate of occurrence. The rate of occurrence for Type I errors equals the significance level of the hypothesis test, which is also known as alpha (α).

Type II errors and the power of the analysis

The Type II error rate (beta) is the probability of a false negative. Therefore, the complement of the Type II error rate is the probability of correctly detecting an effect. Statisticians refer to this concept as the power of a hypothesis test. Consequently, 1 – β = the statistical power. Analysts typically estimate power rather than beta directly.

If you read my post about power and sample size analysis, you know that the three factors that affect power are sample size, variability in the population, and the effect size. As you design your experiment, you can enter estimates of these three factors into statistical software and it calculates the estimated power for your test.

Suppose you perform a power analysis for an upcoming study and calculate an estimated power of 90%. For this study, the estimated Type II error rate is 10% (1 – 0.9). Keep in mind that variability and effect size are based on estimates and guesses. Consequently, power and the Type II error rate are just estimates rather than something you set directly. These estimates are only as good as the inputs into your power analysis.

Low variability and larger effect sizes decrease the Type II error rate, which increases the statistical power. However, researchers usually have less control over those aspects of a hypothesis test. Typically, researchers have the most control over sample size, making it the critical way to manage your Type II error rate. Holding everything else constant, increasing the sample size reduces the Type II error rate and increases power.
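The sample-size lever can be illustrated with a Monte Carlo sketch: estimate β by simulation at several sample sizes, for an invented true effect of 0.5 standard deviations and an approximate two-sample z-test. The exact numbers are only estimates, but the direction of the relationship (larger n, smaller β, higher power) is the point.

```python
# Sketch: estimate the Type II error rate (beta) by simulation at several
# sample sizes, for an invented true effect of 0.5 standard deviations.
import math
import random
import statistics

random.seed(2)

def estimate_beta(n, effect=0.5, sims=2000):
    """Fraction of simulated studies that miss a real effect (false negatives)."""
    misses = 0
    for _ in range(sims):
        a = [random.gauss(0, 1) for _ in range(n)]       # control group
        b = [random.gauss(effect, 1) for _ in range(n)]  # the effect is real
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(z) <= 1.96:  # not significant -> Type II error
            misses += 1
    return misses / sims

for n in (10, 30, 100):
    beta = estimate_beta(n)
    print(f"n = {n:3d}: beta ~ {beta:.2f}, power ~ {1 - beta:.2f}")
```

Holding the effect size and variability fixed, the estimated β drops sharply as n grows, which is exactly the tradeoff a power analysis quantifies before the study begins.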

Is One Error Worse Than the Other?

As you’ve seen, the nature of the two types of error, their causes, and the certainty of their rates of occurrence are all very different.

A common question is whether one type of error is worse than the other. Statisticians designed hypothesis tests to control Type I errors, while Type II errors are much less well defined. Consequently, many statisticians state that it is better to fail to detect an effect when it exists than to conclude an effect exists when it doesn’t. That is to say, there is a tendency to assume that Type I errors are worse.

However, reality is more complex than that. You should carefully consider the consequences of each type of error for your specific test.

Suppose you are assessing the strength of a new jet engine part that is under consideration. People’s lives are riding on the part’s strength. A false negative in this scenario merely means that the part is strong enough but the test fails to detect it. This situation does not put anyone’s life at risk. On the other hand, Type I errors are worse in this situation because they indicate the part is strong enough when it is not.

Now suppose that the jet engine part is already in use but there are concerns about it failing. In this case, you want the test to be more sensitive to detecting problems even at the risk of false positives. Type II errors are worse in this scenario because the test fails to recognize the problem and leaves these problematic parts in use for longer.

Using hypothesis tests effectively requires that you understand their error rates. By setting the significance level and estimating your test’s power, you can manage both error rates so they meet your requirements.

6a.1 – Introduction to Hypothesis Testing

Basic Terms

The first step in hypothesis testing is to set up two competing hypotheses. These hypotheses are the most important part of the procedure: if they are set up incorrectly, your conclusion will also be incorrect.

The two hypotheses are named the null hypothesis and the alternative hypothesis.

Null hypothesis

The null hypothesis is typically denoted as H₀. The null hypothesis states the “status quo”. This hypothesis is assumed to be true until there is evidence to suggest otherwise.

Alternative hypothesis

The alternative hypothesis is typically denoted as Hₐ or H₁. This is the statement that one wants to conclude. It is also called the research hypothesis.

The goal of hypothesis testing is to see if there is enough evidence against the null hypothesis. In other words, to see if there is enough evidence to reject the null hypothesis. If there is not enough evidence, then we fail to reject the null hypothesis.

Consider the following example where we set up these hypotheses.

Example 6-1

A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or innocent. Set up the null and alternative hypotheses for this example.

Answer

Putting this in a hypothesis testing framework, the hypotheses being tested are:

- The man is guilty
- The man is innocent

Let’s set up the null and alternative hypotheses.

H₀: Mr. Orangejuice is innocent

Hₐ: Mr. Orangejuice is guilty

Remember that we assume the null hypothesis is true and try to see if we have evidence against the null. Therefore, it makes sense in this example to assume the man is innocent and test to see if there is evidence that he is guilty.

The Logic of Hypothesis Testing

We want to know the answer to a research question. We determine our null and alternative hypotheses. Now it is time to make a decision.

The decision is either going to be…

reject the null hypothesis or…

fail to reject the null hypothesis.

Note! Why can’t we say we “accept the null”? The reason is that we are assuming the null hypothesis is true and trying to see if there is evidence against it. Therefore, the conclusion should be in terms of rejecting the null.

Consider the following table. The table shows the decision/conclusion of the hypothesis test and the unknown “reality”, or truth. We do not know if the null is true or if it is false. If the null is false and we reject it, then we made the correct decision. If the null hypothesis is true and we fail to reject it, then we made the correct decision.

| Decision | H₀ is true | H₀ is false |
| --- | --- | --- |
| Reject H₀ (conclude Hₐ) |  | Correct decision |
| Fail to reject H₀ | Correct decision |  |

So what happens when we do not make the correct decision?

When doing hypothesis testing, two types of mistakes may be made, and we call them Type I error and Type II error. If we reject the null hypothesis when it is true, then we have made a Type I error. If the null hypothesis is false and we fail to reject it, we have made another error called a Type II error.

| Decision | H₀ is true | H₀ is false |
| --- | --- | --- |
| Reject H₀ (conclude Hₐ) | Type I error | Correct decision |
| Fail to reject H₀ | Correct decision | Type II error |

Types of errors

Type I error

When we reject the null hypothesis when the null hypothesis is true.

Type II error

When we fail to reject the null hypothesis when the null hypothesis is false.

The “reality”, or truth, about the null hypothesis is unknown and therefore we do not know if we have made the correct decision or if we committed an error. We can, however, define the likelihood of these events.

α (‘Alpha’)

The probability of committing a Type I error. Also known as the significance level.

β (‘Beta’)

The probability of committing a Type II error.

Power

Power is the probability that the null hypothesis is rejected given that it is false (i.e., power = 1 − β).

α and β are probabilities of committing an error, so we want these values to be low. However, we cannot decrease both: as α decreases, β increases.

Note! Type I error is also thought of as the event that we reject the null hypothesis GIVEN the null is true. In other words, Type I error is a conditional event and α is a conditional probability. The same idea applies to Type II error and β.
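The tradeoff can be made concrete with a small worked example: a one-sided z-test of H₀: μ = 0 against an assumed true mean of 1, with σ = 2 and a sample of size 25 (all numbers invented for illustration). The sketch below uses only the standard library and hard-coded one-sided critical z-values from a normal table.

```python
# Sketch: how tightening alpha raises beta, for a one-sided z-test.
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu0, mu1 = 0.0, 1.0        # mean under H0 vs assumed true mean under Ha
sigma, n = 2.0, 25
se = sigma / math.sqrt(n)  # standard error of the sample mean

# One-sided critical z-values for two significance levels.
for alpha, z_crit in [(0.05, 1.645), (0.01, 2.326)]:
    cutoff = mu0 + z_crit * se            # reject H0 if the sample mean exceeds this
    beta = norm_cdf((cutoff - mu1) / se)  # P(fail to reject | true mean is mu1)
    print(f"alpha = {alpha}: beta ~ {beta:.3f}, power ~ {1 - beta:.3f}")
```

In this setup, lowering α from 0.05 to 0.01 raises β from roughly 0.20 to roughly 0.43: exactly the direction of the tradeoff described in the note above.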

Example 6-1 Cont’d…

A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or not guilty. We found before that…

H₀: Mr. Orangejuice is innocent

Hₐ: Mr. Orangejuice is guilty

Interpret the Type I error, α, and the Type II error, β.

Answer

Type I Error:

A Type I error is committed if we reject H₀ when it is true. In other words, when the man is innocent but found guilty.

α:

α is the probability of a Type I error; in other words, it is the probability that Mr. Orangejuice is innocent but found guilty.

Type II Error:

A Type II error is committed if we fail to reject H₀ when it is false. In other words, when the man is guilty but found not guilty.

β:

β is the probability of a Type II error; in other words, it is the probability that Mr. Orangejuice is guilty but found not guilty.

As you can see here, the Type I error (putting an innocent man in jail) is the more serious error. Ethically, it is more serious to put an innocent man in jail than to let a guilty man go free. So to minimize the probability of a Type I error, we would choose a smaller significance level.
