Linear Regression Calculator

Calculate linear regression equation, R-squared, correlation coefficient, and predictions. Perform least squares regression with step-by-step analysis and residual statistics.

About This Calculator

Linear regression is the foundation of predictive modeling - it finds the best-fitting straight line through a set of data points. This calculator performs simple linear regression (one predictor), calculating the regression equation, correlation coefficient, R-squared, and making predictions with confidence intervals.

What is Linear Regression? Linear regression models the relationship between a dependent variable (Y) and an independent variable (X) using a straight line: y = mx + b (or y = β₀ + β₁x in statistics notation).

The Least Squares Method: The "best fit" line minimizes the sum of squared residuals (differences between observed and predicted values). This is why it's called "least squares regression."

Key Output Measures:

  • Slope (β₁): The change in Y for each unit change in X
  • Intercept (β₀): The predicted Y when X equals zero
  • R² (coefficient of determination): Proportion of variance explained by the model
  • r (correlation): Strength and direction of the linear relationship

When to Use Linear Regression:

  • Predicting outcomes based on one predictor
  • Understanding relationships between variables
  • Making forecasts from historical data
  • Quality control and process improvement

This calculator handles simple linear regression. For correlation analysis, see our Correlation Calculator. For hypothesis testing, try our T-Test Calculator.

How to Use the Linear Regression Calculator

  1. Enter your X values (independent variable) separated by commas or spaces.
  2. Enter your Y values (dependent variable) in the same order.
  3. Ensure both lists have the same number of values.
  4. Review the regression equation (y = mx + b).
  5. Check R-squared to see how well the model fits.
  6. Examine the correlation coefficient (r) for relationship strength.
  7. Optionally enter an X value to get a predicted Y.
  8. Check the residuals table for potential outliers.
  9. Review coefficient statistics for significance testing.
  10. Use the ANOVA table for overall model evaluation.

The Regression Equation

Understanding the line of best fit.

The Formula

ŷ = β₀ + β₁x

Where:

  • ŷ (y-hat) = predicted value of Y
  • β₀ = y-intercept (value when x = 0)
  • β₁ = slope (change in y per unit change in x)
  • x = independent variable value

Calculating the Coefficients

Slope (β₁): β₁ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²

Intercept (β₀): β₀ = ȳ - β₁x̄

Where x̄ and ȳ are the sample means.

Example

  X    Y
  1    2.1
  2    4.3
  3    5.8
  4    8.2
  5    9.9

Calculations:

  • x̄ = 3, ȳ = 6.06
  • Σ(xi - x̄)(yi - ȳ) = 19.5
  • Σ(xi - x̄)² = 10
  • β₁ = 19.5 / 10 = 1.95
  • β₀ = 6.06 - 1.95(3) = 0.21

Equation: ŷ = 0.21 + 1.95x

Interpretation

Slope = 1.95: Each 1-unit increase in X is associated with a 1.95-unit increase in Y.

Intercept = 0.21: When X = 0, the predicted Y is 0.21 (which may or may not be meaningful depending on context).
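
The coefficient formulas above can be checked with a short script. This is a minimal sketch in plain Python (no libraries) that applies the least-squares formulas directly to the example data:

```python
# Least-squares fit for the worked example: x = 1..5, y as in the table.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# beta1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

beta1 = sxy / sxx               # slope
beta0 = y_bar - beta1 * x_bar   # intercept

print(f"y-hat = {beta0:.2f} + {beta1:.2f}x")
```

Running this reproduces the equation above, ŷ = 0.21 + 1.95x.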

R-Squared and Model Fit

Measuring how well the regression fits the data.

What is R²?

R² = 1 - (SS_residual / SS_total)

Or equivalently: R² = SS_regression / SS_total

Where:

  • SS_total = Σ(yi - ȳ)² (total variance)
  • SS_regression = Σ(ŷi - ȳ)² (explained variance)
  • SS_residual = Σ(yi - ŷi)² (unexplained variance)

Interpretation

R² represents the proportion of variance in Y explained by X.

  R² Value     Interpretation
  0.00-0.25    Very weak fit
  0.25-0.50    Moderate fit
  0.50-0.75    Good fit
  0.75-0.90    Strong fit
  0.90-1.00    Excellent fit

Example: R² = 0.85 means 85% of the variation in Y is explained by the linear relationship with X.

Adjusted R²

Adjusted R² = 1 - [(1-R²)(n-1)/(n-p-1)]

Where p = number of predictors.

Adjusted R² penalizes for adding predictors that don't improve the model. For simple regression (one predictor), it's slightly lower than R².
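The sum-of-squares decomposition and the adjusted-R² formula above can be sketched as a short Python script. It fits the example data from the earlier section, then computes both statistics:

```python
# R-squared and adjusted R-squared via the sum-of-squares decomposition.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit the least-squares line first
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar

y_hat = [beta0 + beta1 * xi for xi in x]
ss_total = sum((yi - y_bar) ** 2 for yi in y)   # total variance
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained

r_squared = 1 - ss_resid / ss_total
p = 1  # one predictor in simple regression
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```

As the text notes, adjusted R² comes out slightly below R² for this single-predictor fit.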

Cautions

  • R² doesn't indicate if a model is correct
  • High R² doesn't mean causation
  • Nonlinear relationships may have low R² but strong association
  • Always examine residual plots

Correlation Coefficient

Understanding the relationship between r and R².

Correlation (r)

r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]

Range: -1 to +1

Interpretation

  r Value         Interpretation
  0.9 to 1.0      Very strong positive
  0.7 to 0.9      Strong positive
  0.5 to 0.7      Moderate positive
  0.3 to 0.5      Weak positive
  0 to 0.3        Very weak/no correlation
  -0.3 to 0       Very weak/no correlation
  -0.5 to -0.3    Weak negative
  -0.7 to -0.5    Moderate negative
  -0.9 to -0.7    Strong negative
  -1.0 to -0.9    Very strong negative

Relationship with R²

For simple linear regression: R² = r²

Examples:

  • r = 0.9 → R² = 0.81
  • r = -0.8 → R² = 0.64
  • r = 0.5 → R² = 0.25
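
The identity R² = r² can be confirmed numerically. This sketch computes Pearson's r from its definition for the example data used earlier:

```python
import math

# Pearson r for the example data, confirming r^2 equals R^2.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Squaring the printed r gives the same value as the R² computed from the sum-of-squares decomposition.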

Testing Significance

Null hypothesis: ρ = 0 (no correlation in population)

Test statistic: t = r√(n-2) / √(1-r²)

with df = n - 2

A significant t-value suggests the correlation isn't zero in the population.
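
The t statistic above is easy to compute by hand or in code. The sketch below wraps the formula in a small function; the example values (r = 0.80, n = 20) are illustrative, not from this page's data:

```python
import math

# t statistic for H0: rho = 0, using t = r * sqrt(n-2) / sqrt(1 - r^2)
def correlation_t(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Illustrative example: r = 0.80 observed in n = 20 pairs, so df = 18.
t = correlation_t(0.80, 20)
print(f"t = {t:.2f} with df = {20 - 2}")
# Compare |t| against the two-sided 0.05 critical value for df = 18
# (about 2.101); here |t| far exceeds it, so H0 would be rejected.
```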

Residuals and Assumptions

Checking if the regression model is appropriate.

What are Residuals?

Residual = Actual - Predicted = yi - ŷi

Residuals represent the error or unexplained portion for each observation.

Properties of Good Residuals

1. Normality

Residuals should be approximately normally distributed.

  • Check: Histogram, Q-Q plot
  • Violation: May affect confidence intervals and tests

2. Constant Variance (Homoscedasticity)

Residuals should have constant spread across all X values.

  • Check: Plot residuals vs. predicted values
  • Violation (heteroscedasticity): Fan-shaped pattern

3. Independence

Residuals should be independent (no pattern).

  • Check: Plot residuals vs. order
  • Violation: Wavy pattern suggests autocorrelation

4. No Outliers

Large residuals may indicate outliers or influential points.

  • Check: Standardized residuals > |3| are concerning
  • Action: Investigate, don't automatically remove
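
The outlier check above can be sketched in a few lines. This version divides each residual by the residual standard error, a simplification: fully studentized residuals also adjust for each point's leverage.

```python
import math

# Residuals and (simply) standardized residuals for the example data,
# flagging any observation with |standardized residual| > 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar

residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# Residual standard error: s = sqrt(SS_residual / (n - 2))
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

for xi, e in zip(x, residuals):
    z = e / s  # simple standardization (ignores leverage)
    flag = "  <-- investigate" if abs(z) > 3 else ""
    print(f"x = {xi}: residual = {e:+.2f}, standardized = {z:+.2f}{flag}")
```

For this tight-fitting example no point comes close to the |3| threshold.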

Residual Plots to Create

  1. Residuals vs. Fitted Values: Check for patterns
  2. Residuals vs. X: Check for nonlinearity
  3. Normal Q-Q Plot: Check normality
  4. Residuals vs. Order: Check independence

What Violations Mean

  • Nonlinear pattern: Consider polynomial or other nonlinear models
  • Funnel shape: Consider weighted regression or transformation
  • Autocorrelation: May need time series methods

Making Predictions

Using the regression equation for forecasting.

Point Prediction

Simply plug X into the equation: ŷ = β₀ + β₁x

Example: If ŷ = 2.5 + 1.8x, then for x = 10: ŷ = 2.5 + 1.8(10) = 20.5

Confidence vs. Prediction Intervals

Confidence Interval for Mean Y

Estimates where the average Y falls for a given X.

CI = ŷ ± t(α/2) × SE(ŷ)

SE(ŷ) = s × √[1/n + (x-x̄)²/Σ(xi-x̄)²]

Prediction Interval for Individual Y

Estimates where a single new observation might fall.

PI = ŷ ± t(α/2) × SE(pred)

SE(pred) = s × √[1 + 1/n + (x-x̄)²/Σ(xi-x̄)²]

Key difference: Prediction intervals are always wider because they include individual variation.
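
Both interval formulas can be sketched directly in code. This version fits the example data from the regression section and evaluates the intervals at x = 4; the critical value t(0.025, df = 3) ≈ 3.182 is supplied by hand rather than computed:

```python
import math

# 95% confidence and prediction intervals at a new x, using the SE
# formulas above, for the 5-point example data.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
beta0 = y_bar - beta1 * x_bar

resid = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # residual SE

x_new, t_crit = 4.0, 3.182   # t(0.025, df = n - 2 = 3)
y_hat = beta0 + beta1 * x_new
se_mean = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(f"point = {y_hat:.2f}")
print(f"95% CI for mean Y:       ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for individual Y: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The printed prediction interval is visibly wider than the confidence interval, as the key-difference note says it must be.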

Extrapolation Warning

  • Interpolation: Predicting within the range of your data (safe)
  • Extrapolation: Predicting beyond your data range (risky!)

The relationship may not hold outside observed X values. Extrapolation assumes the linear trend continues, which may not be true.

Example Comparison

For x = 15 with the equation ŷ = 2.5 + 1.8x:

  • Point prediction: 29.5
  • 95% CI for mean: (28.2, 30.8)
  • 95% PI for individual: (24.1, 34.9)

Statistical Significance Testing

Testing if the regression relationship is real.

Testing the Slope

Null hypothesis: β₁ = 0 (no linear relationship)

Test statistic: t = β₁ / SE(β₁)

with df = n - 2

If |t| > t(critical), reject H₀ and conclude the slope is significant.

The F-Test (ANOVA)

Tests overall model significance.

F = MS_regression / MS_residual

Equivalently: F = (SS_regression / 1) / (SS_residual / (n - 2))

For simple regression, F = t² (they give the same p-value).
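
The slope t test and the F = t² identity can be verified numerically. This sketch uses SE(β₁) = s / √Sxx, which follows from the coefficient formulas above, on the example data:

```python
import math

# Slope t test and the equivalent F test for the 5-point example data.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.8, 8.2, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
beta0 = y_bar - beta1 * x_bar

resid = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # residual SE

se_beta1 = s / math.sqrt(sxx)   # SE(beta1) = s / sqrt(Sxx)
t = beta1 / se_beta1
F = t ** 2                      # equals MS_regression / MS_residual

print(f"t = {t:.2f} (df = {n - 2}), F = {F:.2f}")
```

With |t| near 28 against a critical value of about 3.18 (df = 3, two-sided 0.05), the slope is clearly significant for this example.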

ANOVA Table Structure

  Source       SS        df     MS        F
  Regression   SS_R      1      MS_R      F
  Residual     SS_res    n-2    MS_res
  Total        SS_T      n-1

Interpreting p-values

For slope:

  • p < 0.05: Significant relationship (at 95% confidence)
  • p ≥ 0.05: Cannot conclude relationship exists

Caution:

  • Significance depends on sample size
  • Large samples can detect tiny, unimportant effects
  • Small samples may miss real effects
  • Always report effect size (R²) alongside p-value

Confidence Intervals for Coefficients

95% CI for slope: β₁ ± t(0.025, n-2) × SE(β₁)

If interval doesn't include 0, slope is significant at 0.05 level.
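
The interval check above can be sketched in a few lines. The inputs here (β₁ = 1.95, SE(β₁) ≈ 0.07, t(0.025, 3) ≈ 3.182) are rounded values from the worked example, supplied by hand:

```python
# 95% confidence interval for the slope: beta1 +/- t_crit * SE(beta1).
beta1 = 1.95
se_beta1 = 0.07   # SE(beta1), rounded, from the example fit
t_crit = 3.182    # t(0.025, df = 3)

lower = beta1 - t_crit * se_beta1
upper = beta1 + t_crit * se_beta1
print(f"95% CI for slope: ({lower:.2f}, {upper:.2f})")
# The interval excludes 0, so the slope is significant at the 0.05 level.
```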

Pro Tips

  • 💡Always plot your data before running regression to check for linearity.
  • 💡Examine residual plots to validate model assumptions.
  • 💡Don't confuse correlation with causation - regression shows association, not cause.
  • 💡Check for outliers that might unduly influence the regression line.
  • 💡Report both R² and the slope to convey model fit and effect size.
  • 💡Be cautious about extrapolating predictions beyond your data range.
  • 💡Use prediction intervals (not confidence intervals) for individual predictions.
  • 💡Consider transforming variables if relationship appears nonlinear.
  • 💡Sample size matters - aim for at least 15-20 observations.
  • 💡A significant p-value doesn't mean the effect is large or important.
  • 💡Check that independent and dependent variables make theoretical sense.
  • 💡Remember that high R² doesn't validate the model - residuals do.

Frequently Asked Questions

What's the difference between correlation and regression?

Correlation measures the strength of linear association between two variables (r ranges from -1 to 1) without distinguishing independent/dependent variables. Regression goes further by fitting a line to predict Y from X, providing an equation for prediction. For simple linear regression, R² = r².

Written by Nina Bao, Content Writer
Updated January 17, 2026
