T-Test Calculator
Perform one-sample, two-sample, and paired t-tests online. Calculate t-statistic, p-value, and confidence intervals with step-by-step statistical analysis.
About This Calculator
The t-test is one of the most widely used statistical tests for comparing means. It helps determine whether there's a statistically significant difference between groups or whether a sample mean differs from a hypothesized value. This calculator performs one-sample, two-sample, and paired t-tests with complete statistical output.
What is a t-test? The t-test uses the t-distribution to test hypotheses about population means when the population standard deviation is unknown and the sample size is relatively small. It was developed by William Sealy Gosset (publishing under the pseudonym "Student") while working at Guinness Brewery.
Types of t-tests:
- One-sample t-test: Compare a sample mean to a known or hypothesized population mean
- Two-sample t-test: Compare means of two independent groups
- Paired t-test: Compare means of matched pairs (before/after, matched subjects)
When to use the t-test:
- Data is approximately normally distributed
- Data is continuous (interval/ratio scale)
- Sample size is sufficient (generally n ≥ 15-30)
- Observations are independent (except for paired test)
Key outputs:
- t-statistic: Measures how many standard errors the sample mean is from the hypothesized mean
- p-value: Probability of observing this result if null hypothesis is true
- Confidence interval: Range likely containing the true parameter
For comparing more than two groups, see our ANOVA Calculator. For correlation analysis, try our Correlation Calculator.
How to Use the T-Test Calculator
1. Select the appropriate t-test type for your data.
2. Choose the tail type (two-tailed for ≠, one-tailed for < or >).
3. Set your significance level (α), typically 0.05.
4. Enter the required statistics for your chosen test.
5. Review the t-statistic and p-value.
6. Check if the result is statistically significant.
7. Examine the confidence interval.
8. Consider the effect size (Cohen's d) for practical significance.
9. Interpret results in context of your research question.
10. Report all relevant statistics in your conclusions.
One-Sample t-test
Compare a sample mean to a known or hypothesized population value.
When to Use
Use a one-sample t-test when you want to determine if a sample mean differs significantly from a specific value.
Examples:
- Is the average height of students different from the national average?
- Does the mean test score differ from 100?
- Is the average processing time different from the target?
The Formula
t = (x̄ - μ₀) / (s / √n)
Where:
- x̄ = sample mean
- μ₀ = hypothesized population mean
- s = sample standard deviation
- n = sample size
- df = n - 1
Hypotheses
Two-tailed:
- H₀: μ = μ₀
- H₁: μ ≠ μ₀
One-tailed (right):
- H₀: μ ≤ μ₀
- H₁: μ > μ₀
Example
Research question: Is the average IQ of a group different from 100?
Data:
- Sample mean: 105.3
- Sample SD: 12.4
- Sample size: 36
Calculation:
- SE = 12.4 / √36 = 2.067
- t = (105.3 - 100) / 2.067 = 2.565
- df = 35
- p-value ≈ 0.015
Conclusion: At α = 0.05, we reject H₀ and conclude the mean IQ is significantly different from 100.
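The calculation above can be reproduced from summary statistics alone. A minimal Python sketch (the function name and the hard-coded critical value are illustrative; a p-value would normally come from the t-distribution CDF, e.g. `scipy.stats.t.sf`, omitted here to keep the sketch dependency-free):

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """t-statistic and degrees of freedom for a one-sample t-test."""
    se = s / math.sqrt(n)              # standard error of the mean
    return (x_bar - mu0) / se, n - 1

t, df = one_sample_t(x_bar=105.3, mu0=100, s=12.4, n=36)
print(round(t, 3), df)                 # 2.565 35, matching the worked example

# The two-tailed critical value for df = 35 at alpha = 0.05 is about 2.030,
# so |t| = 2.565 exceeds it and H0 is rejected.
```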
Two-Sample t-test
Compare means of two independent groups.
When to Use
Use a two-sample t-test when comparing means of two separate, unrelated groups.
Examples:
- Compare test scores between two teaching methods
- Compare recovery times between two treatments
- Compare salaries between two departments
Two Versions
Pooled (Equal Variance)
Assumes both groups have equal population variances.
Formula: t = (x̄₁ - x̄₂) / (sp × √(1/n₁ + 1/n₂))
Where sp = pooled standard deviation
Welch's (Unequal Variance)
Does not assume equal variances - generally more robust.
Formula: t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Choosing Between Them
Use Welch's test when:
- Sample sizes are unequal
- Variances appear different
- You're uncertain about equal variance assumption
Rule of thumb: If larger variance / smaller variance > 2, use Welch's
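The rule of thumb can be encoded directly. A tiny sketch (the helper name and threshold default are illustrative):

```python
def prefer_welch(s1, s2, ratio_threshold=2.0):
    """Rule of thumb: larger variance / smaller variance > 2 -> use Welch's."""
    v1, v2 = s1 ** 2, s2 ** 2          # variances from the two sample SDs
    return max(v1, v2) / min(v1, v2) > ratio_threshold

print(prefer_welch(4.5, 5.2))          # False: 27.04 / 20.25 is about 1.34
print(prefer_welch(3.0, 5.0))          # True: 25 / 9 is about 2.78
```

In practice many analysts skip the check and default to Welch's test, as noted above.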
Example
Research question: Do two medications have different effects on blood pressure?
Drug A: n=25, mean=-8.2 mmHg, SD=4.5
Drug B: n=28, mean=-5.1 mmHg, SD=5.2
Using Welch's test:
- SE = √(4.5²/25 + 5.2²/28) = 1.33
- t = (-8.2 - (-5.1)) / 1.33 = -2.33
- df ≈ 51.0
- p-value ≈ 0.024
Conclusion: Drug A shows significantly greater blood pressure reduction.
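The Welch computation, including the Welch–Satterthwaite degrees of freedom, can be sketched as follows (the function name is illustrative; SciPy offers the equivalent `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`, which also returns the p-value):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # per-group variance of the mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(-8.2, 4.5, 25, -5.1, 5.2, 28)
print(round(t, 2), round(df, 1))             # -2.33 51.0
```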
Paired t-test
Compare means of matched or paired observations.
When to Use
Use a paired t-test when observations come in pairs:
- Before/after measurements on same subjects
- Matched pairs (twins, matched controls)
- Two measurements on each subject
The Concept
Instead of comparing two groups, analyze the differences within each pair.
Why it's more powerful:
- Controls for individual variation
- Each subject serves as their own control
- Reduces variability in the comparison
The Formula
t = d̄ / (sd / √n)
Where:
- d̄ = mean of differences
- sd = standard deviation of differences
- n = number of pairs
- df = n - 1
Calculating Differences
| Subject | Before | After | Difference (d) |
|---|---|---|---|
| 1 | 150 | 142 | -8 |
| 2 | 165 | 158 | -7 |
| 3 | 145 | 140 | -5 |
| ... | ... | ... | ... |
Mean difference: d̄
SD of differences: sd
Example
Research question: Does a weight loss program reduce weight?
Data: 20 participants, before and after weights
- Mean difference: -3.5 kg
- SD of differences: 2.8 kg
Calculation:
- SE = 2.8 / √20 = 0.626
- t = -3.5 / 0.626 = -5.59
- df = 19
- p-value < 0.001
Conclusion: The program produces significant weight loss.
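Using the three subjects from the before/after table earlier in this section, the pair-to-difference reduction can be sketched with the standard library alone (function name is illustrative):

```python
import math
import statistics

def paired_t(before, after):
    """Paired t-test: reduce each pair to a difference, then run a
    one-sample t-test on the differences against zero."""
    d = [a - b for a, b in zip(after, before)]
    d_bar = statistics.mean(d)                 # mean difference
    s_d = statistics.stdev(d)                  # sample SD of differences ("sd" above)
    t = d_bar / (s_d / math.sqrt(len(d)))
    return t, len(d) - 1

# Subjects 1-3 from the table: before 150/165/145, after 142/158/140
t, df = paired_t([150, 165, 145], [142, 158, 140])
print(round(t, 2), df)                         # -7.56 2
```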
Understanding p-values
Interpreting the probability value from t-tests.
What p-value Means
The p-value is the probability of obtaining results at least as extreme as observed, assuming the null hypothesis is true.
NOT:
- The probability H₀ is true
- The probability the result is due to chance
- The effect size
Interpreting p-values
| p-value | Interpretation |
|---|---|
| p < 0.001 | Very strong evidence against H₀ |
| p < 0.01 | Strong evidence against H₀ |
| p < 0.05 | Moderate evidence against H₀ |
| p < 0.10 | Weak evidence against H₀ |
| p ≥ 0.10 | Little evidence against H₀ |
Common Significance Levels (α)
- α = 0.05: Most common, good balance
- α = 0.01: More conservative, fewer false positives
- α = 0.10: More liberal, fewer false negatives
One-tailed vs. Two-tailed
Two-tailed (default): Tests for any difference (≠)
- Use when direction of difference is unknown
One-tailed: Tests for specific direction (< or >)
- Use only when direction is predicted in advance
- p-value is half the two-tailed value (when the observed effect is in the predicted direction)
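The halving relationship can be captured in a hypothetical converter that also handles the wrong-direction case:

```python
def one_tailed_p(two_tailed_p, effect_in_predicted_direction):
    """Convert a two-tailed p-value to a one-tailed one.

    Halve it when the observed effect points the predicted way; otherwise
    the one-tailed p-value is the complement of that half.
    """
    half = two_tailed_p / 2
    return half if effect_in_predicted_direction else 1 - half

print(one_tailed_p(0.04, True))    # 0.02
print(one_tailed_p(0.04, False))   # 0.98
```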
Statistical vs. Practical Significance
Important: A statistically significant result may not be practically meaningful!
Example:
- Treatment reduces pain by 0.5 points (p = 0.01)
- Statistically significant? Yes
- Practically meaningful? Maybe not (small effect)
Always report effect size alongside p-value!
Effect Size: Cohen's d
Measuring the magnitude of the difference.
What is Effect Size?
Effect size quantifies the magnitude of a result independent of sample size. Cohen's d is the most common measure for mean differences.
Formula
Cohen's d = (Mean₁ - Mean₂) / Pooled SD
For one-sample: d = (x̄ - μ₀) / s
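Both variants take only a few lines; here the two-sample version is checked against the worked report in the Reporting Standards subsection (M = 45.2 vs 38.7) and the one-sample version against the IQ example (function names are illustrative):

```python
import math

def cohens_d_two_sample(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def cohens_d_one_sample(x_bar, mu0, s):
    """Cohen's d for a one-sample test."""
    return (x_bar - mu0) / s

print(round(cohens_d_two_sample(45.2, 8.5, 25, 38.7, 9.1, 28), 2))   # 0.74
print(round(cohens_d_one_sample(105.3, 100, 12.4), 2))               # 0.43
```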
Interpretation
| Cohen's d | Interpretation | Example |
|---|---|---|
| 0.2 | Small | Barely noticeable |
| 0.5 | Medium | Noticeable |
| 0.8 | Large | Obvious |
| 1.2 | Very large | Substantial |
| 2.0 | Huge | Massive |
Why Effect Size Matters
Scenario 1: Large sample, small effect
- n = 10,000
- Difference = 1 point
- p < 0.001 (significant!)
- d = 0.1 (trivial effect)
Scenario 2: Small sample, large effect
- n = 20
- Difference = 15 points
- p = 0.06 (not significant)
- d = 1.5 (large effect)
Reporting Standards
Always report:
- Descriptive statistics (means, SDs, n)
- Test statistic (t)
- Degrees of freedom
- p-value
- Effect size (Cohen's d)
- Confidence interval
Example report: "The treatment group (M = 45.2, SD = 8.5, n = 25) scored significantly higher than the control group (M = 38.7, SD = 9.1, n = 28), t(51) = 2.78, p = .008, d = 0.74, 95% CI [1.77, 11.23]."
Assumptions and Violations
Understanding when t-tests are valid.
Key Assumptions
1. Normality
Data should be approximately normally distributed.
Checking:
- Histograms/Q-Q plots
- Shapiro-Wilk test
Violation handling:
- t-tests are robust to mild violations
- Use non-parametric tests (Mann-Whitney, Wilcoxon) for severe violations
- Large samples (n > 30) are generally fine
2. Independence
Observations should be independent.
Exceptions:
- Paired t-test handles dependence within pairs
- Two-sample: groups must be independent
Violation handling:
- Use paired test for matched data
- Consider mixed models for complex dependencies
3. Equal Variance (Two-sample)
Groups should have similar variances.
Checking:
- Levene's test
- Compare SD ratio
Violation handling:
- Use Welch's t-test
- Generally recommended as default
Sample Size Considerations
| Situation | Minimum n per group |
|---|---|
| Normal data, equal n | 10-15 |
| Slightly non-normal | 20-25 |
| Unequal groups | 15-20 in smaller group |
| Very non-normal | Use non-parametric |
Power Analysis
Before conducting study, determine needed sample size:
For α = 0.05, power = 0.80:
| Effect Size (d) | n per group |
|---|---|
| Small (0.2) | ~400 |
| Medium (0.5) | ~65 |
| Large (0.8) | ~25 |
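The table's values can be approximated without a statistics package via the normal approximation n ≈ 2((z_α/2 + z_β)/d)²; a sketch, noting that it slightly undershoots exact t-based calculations (such as statsmodels' `TTestIndPower`), which add a subject or two per group:

```python
import math

Z_ALPHA_2 = 1.959964   # inverse normal CDF at 0.975, for two-tailed alpha = 0.05
Z_BETA = 0.841621      # inverse normal CDF at 0.80, for power = 0.80

def approx_n_per_group(d):
    """Normal-approximation sample size per group for a two-sample t-test."""
    return math.ceil(2 * ((Z_ALPHA_2 + Z_BETA) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, approx_n_per_group(d))   # 393, 63, 25 -- close to the table
```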
Pro Tips
- 💡 Always visualize your data before running statistical tests.
- 💡 Report effect size (Cohen's d) alongside p-values for complete interpretation.
- 💡 Use Welch's t-test as default for two-sample comparisons.
- 💡 Check assumptions: normality, independence, and equal variance when required.
- 💡 One-tailed tests require a priori justification - don't choose based on data.
- 💡 Large sample sizes can make trivial differences statistically significant.
- 💡 Confidence intervals provide more information than p-values alone.
- 💡 Use paired t-test when you have matched or repeated measurements.
- 💡 Statistical significance doesn't imply practical importance.
- 💡 Plan sample size before collecting data using power analysis.
- 💡 Report complete statistics: means, SDs, n, t, df, p, d, and CI.
- 💡 Consider multiple testing corrections when running many t-tests.
Frequently Asked Questions
What's the difference between a one-tailed and a two-tailed test?
A two-tailed test checks if the mean differs in either direction (≠), while a one-tailed test checks only one direction (< or >). Two-tailed is the default and more conservative. Use one-tailed only when you have a strong theoretical reason to predict the direction of the effect before seeing the data.

