t-Test Calculator

One-sample, two-sample (Welch or Student), or paired t-tests — from summary statistics or your raw data. You get the t statistic, df, p-value, the confidence interval, an effect size, and a plain reading of what the p-value actually licenses you to say.

Input

Sample 1: x̄₁, s₁, n₁
Sample 2: x̄₂, s₂, n₂
Options: Variances, Alternative hypothesis, Significance level α / CI

In plain English

A t-test asks one simple question: is the difference you can see real, or could it just be random luck? It weighs the size of the difference against how noisy the data are.

t statistic: The difference between groups, measured in units of noise. The further from 0, the harder it is to wave away as chance.
p-value: If there were truly no difference, this is how often you'd see a gap at least this big just by luck. A small p (say under 0.05) means data like yours would be surprising under the “no difference” story; it is not the probability that the story is true.
degrees of freedom (df): Roughly how much information your data carry — it grows with sample size.
confidence interval: The plausible range for the real size of the difference.
Cohen's d: How big the difference is in practical terms (small / medium / large) — a separate question from whether it's statistically “significant.”
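The quantities above can be sketched in a few lines of standard-library Python for the two-sample Welch case, working from summary statistics. This is a minimal illustration, not the calculator's own code: the function names (`t_sf`, `t_crit`, `welch_from_summary`) are hypothetical, the tail probability is approximated by numerical integration rather than a library routine, and Cohen's d uses the pooled standard deviation, which is one common convention among several.

```python
import math

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t, via Simpson's rule."""
    # Log of the normalising constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3            # probability mass between 0 and |t|
    upper = max(0.0, 0.5 - area)    # mass beyond |t| on one side
    return upper if t >= 0 else 1.0 - upper

def t_crit(df, alpha=0.05):
    """Two-sided critical value q with P(|T| > q) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_sf(hi, df) > alpha / 2:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_from_summary(m1, s1, n1, m2, s2, n2, alpha=0.05):
    """Welch two-sample t-test (two-sided) from means, SDs and sample sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)                      # the "noise" in the difference
    diff = m1 - m2
    t = diff / se                                # difference in units of noise
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * t_sf(abs(t), df)                     # two-sided p-value
    q = t_crit(df, alpha)
    ci = (diff - q * se, diff + q * se)          # (1 - alpha) confidence interval
    # Assumption: Cohen's d standardised by the pooled SD.
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return {"t": t, "df": df, "p": p, "ci": ci, "d": diff / sp}

# Hypothetical summary statistics, for illustration only.
result = welch_from_summary(10.0, 2.0, 16, 8.0, 2.0, 16)
print(result)
```

With these made-up inputs the two groups have equal spread and size, so the Welch df equals n₁ + n₂ − 2 = 30 and d is exactly 1 (a two-unit gap over a pooled SD of two).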

Frequently asked

What's the difference between a paired and a two-sample t-test?

A paired t-test compares two measurements on the same subjects (before/after) and tests the average within-pair difference. A two-sample t-test compares two independent groups. Using the wrong one invents or discards the pairing and changes the answer.
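To see how much the choice matters, the sketch below runs both statistics on the same made-up before/after scores. The data and the function names `paired_t` and `welch_t` are hypothetical; only the t statistics are computed, using the standard library.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and df for the mean within-pair difference."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def welch_t(x, y):
    """Welch t statistic, treating x and y as independent groups."""
    vx = stdev(x) ** 2 / len(x)
    vy = stdev(y) ** 2 / len(y)
    return (mean(y) - mean(x)) / math.sqrt(vx + vy)

# Hypothetical scores for six subjects measured twice.
before = [72, 85, 64, 90, 78, 69]
after = [75, 87, 68, 92, 82, 71]

t_paired, df_paired = paired_t(before, after)
t_indep = welch_t(before, after)
print(f"paired: t = {t_paired:.2f} (df = {df_paired})")
print(f"two-sample (Welch): t = {t_indep:.2f}")
```

Every subject improves by a few points, so the within-pair differences are very consistent and the paired statistic is large. Treated as independent groups, the same improvement is swamped by the spread between subjects and the two-sample statistic is far smaller.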

Should I use Welch's or Student's t-test?

Welch's, by default. It doesn't assume the two groups have equal variances, and it loses very little power compared with Student's when the variances really are equal, so there's rarely a reason to prefer the equal-variance version.
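For reference, the two versions differ only in the standard error and the degrees of freedom. The textbook forms, written with the same symbols as the inputs above (x̄, s, n for each sample), are:

```latex
% Student's t-test (assumes equal variances; uses the pooled variance s_p^2)
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, &
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, &
\mathrm{df} &= n_1 + n_2 - 2
\end{aligned}

% Welch's t-test (no equal-variance assumption; Welch-Satterthwaite df)
\begin{aligned}
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, &
\mathrm{df} &\approx
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
     {\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}
\end{aligned}
```

When s₁ = s₂ and n₁ = n₂ the two tests give the same t and the same df. They diverge as the variances and sample sizes become unequal, which is where the equal-variance assumption starts to matter.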

What does a significant t-test tell me?

That the observed difference is larger than you'd comfortably expect from chance alone — not that it is large or important. Always read the effect size (Cohen's d) and the confidence interval alongside the p-value.
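A short sketch with made-up numbers shows why the t statistic and the effect size have to be read together. The helper `welch_t_and_d` is hypothetical, and Cohen's d is again computed with the pooled standard deviation as an assumed convention.

```python
import math

def welch_t_and_d(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and pooled-SD Cohen's d from summary statistics."""
    t = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return t, (m1 - m2) / sp

# Hypothetical case A: a tiny gap measured on very large samples.
t_a, d_a = welch_t_and_d(100.5, 15, 20000, 100.0, 15, 20000)
# Hypothetical case B: a large gap measured on small samples.
t_b, d_b = welch_t_and_d(112.0, 15, 8, 100.0, 15, 8)

print(f"A: t = {t_a:.2f}, d = {d_a:.3f}")
print(f"B: t = {t_b:.2f}, d = {d_b:.3f}")
```

Case A gives t ≈ 3.33 with d ≈ 0.033: comfortably "significant" at α = 0.05, yet the gap is a thirtieth of a standard deviation. Case B gives t = 1.6 with d = 0.8: a large difference by the usual labels, but with eight observations per group it does not reach significance at α = 0.05.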

What are the assumptions of a t-test?

That the data are roughly normal, or the sample is large enough for the central limit theorem to take over; that observations are independent; and, for Student’s two-sample test, that the two groups have equal variances (Welch’s drops this requirement, which makes it the safer default). The t-test is fairly robust to mild non-normality, but heavy skew or clear outliers call for a different approach, such as a nonparametric test, and dependent observations need a method that accounts for the dependence, such as a paired test.
