Linear Regression Calculator
Calculate linear regression equation, R-squared, correlation coefficient, and predictions. Perform least squares regression with step-by-step analysis and residual statistics.
About This Calculator
Linear regression is the foundation of predictive modeling - it finds the best-fitting straight line through a set of data points. This calculator performs simple linear regression (one predictor): it computes the regression equation, correlation coefficient, and R-squared, and makes predictions with confidence intervals.
What is Linear Regression? Linear regression models the relationship between a dependent variable (Y) and an independent variable (X) using a straight line: y = mx + b (or y = β₀ + β₁x in statistics notation).
The Least Squares Method: The "best fit" line minimizes the sum of squared residuals (differences between observed and predicted values). This is why it's called "least squares regression."
Key Output Measures:
- Slope (β₁): The change in Y for each unit change in X
- Intercept (β₀): The predicted Y when X equals zero
- R² (coefficient of determination): Proportion of variance explained by the model
- r (correlation): Strength and direction of the linear relationship
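As a sketch, all four of these outputs can be computed directly from their definitions; the function and variable names below are illustrative, not part of the calculator:

```python
import math

def simple_regression(xs, ys):
    """Simple least-squares regression: returns slope, intercept, r, and R²."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # Σ(x-x̄)(y-ȳ)
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # Σ(x-x̄)²
    syy = sum((y - y_bar) ** 2 for y in ys)                       # Σ(y-ȳ)²
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r ** 2

slope, intercept, r, r2 = simple_regression([1, 2, 3, 4, 5],
                                            [2.1, 4.3, 5.8, 8.2, 9.9])
```

For serious use, library routines such as `statistics.linear_regression` (Python 3.10+) or `scipy.stats.linregress` compute the same quantities with additional diagnostics.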
When to Use Linear Regression:
- Predicting outcomes based on one predictor
- Understanding relationships between variables
- Making forecasts from historical data
- Quality control and process improvement
This calculator handles simple linear regression. For correlation analysis, see our Correlation Calculator. For hypothesis testing, try our T-Test Calculator.
How to Use the Linear Regression Calculator
1. Enter your X values (independent variable) separated by commas or spaces.
2. Enter your Y values (dependent variable) in the same order.
3. Ensure both lists have the same number of values.
4. Review the regression equation (y = mx + b).
5. Check R-squared to see how well the model fits.
6. Examine the correlation coefficient (r) for relationship strength.
7. Optionally enter an X value to get a predicted Y.
8. Check the residuals table for potential outliers.
9. Review coefficient statistics for significance testing.
10. Use the ANOVA table for overall model evaluation.
The Regression Equation
Understanding the line of best fit.
The Formula
ŷ = β₀ + β₁x
Where:
- ŷ (y-hat) = predicted value of Y
- β₀ = y-intercept (value when x = 0)
- β₁ = slope (change in y per unit change in x)
- x = independent variable value
Calculating the Coefficients
Slope (β₁): β₁ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
Intercept (β₀): β₀ = ȳ - β₁x̄
Where x̄ and ȳ are the sample means.
Example
| X | Y |
|---|---|
| 1 | 2.1 |
| 2 | 4.3 |
| 3 | 5.8 |
| 4 | 8.2 |
| 5 | 9.9 |
Calculations:
- x̄ = 3, ȳ = 6.06
- Σ(xi - x̄)(yi - ȳ) = 19.5
- Σ(xi - x̄)² = 10
- β₁ = 19.5 / 10 = 1.95
- β₀ = 6.06 - 1.95(3) = 0.21
Equation: ŷ = 0.21 + 1.95x
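The calculations above can be reproduced in a few lines of Python (names like `sxy` and `sxx` are just shorthand for the deviation sums):

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.3, 5.8, 8.2, 9.9]
n = len(xs)
x_bar = sum(xs) / n                                           # 3.0
y_bar = sum(ys) / n                                           # 6.06
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # ≈ 19.5
sxx = sum((x - x_bar) ** 2 for x in xs)                       # 10.0
b1 = sxy / sxx                                                # slope ≈ 1.95
b0 = y_bar - b1 * x_bar                                       # intercept ≈ 0.21
```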
Interpretation
Slope = 1.95: Each 1-unit increase in X is associated with a 1.95-unit increase in Y.
Intercept = 0.21: When X = 0, the predicted Y is 0.21 (which may or may not be meaningful, depending on context).
R-Squared and Model Fit
Measuring how well the regression fits the data.
What is R²?
R² = 1 - (SS_residual / SS_total)
Or equivalently: R² = SS_regression / SS_total
Where:
- SS_total = Σ(yi - ȳ)² (total variation)
- SS_regression = Σ(ŷi - ȳ)² (explained variation)
- SS_residual = Σ(yi - ŷi)² (unexplained variation)
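Both forms of R² give the same answer because the sums of squares decompose as SS_total = SS_regression + SS_residual. A small self-contained check, using the five-point example data from earlier:

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.3, 5.8, 8.2, 9.9]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Fit the least-squares line, then form the fitted values.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)              # total variation
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained
r2_a = 1 - ss_res / ss_total
r2_b = ss_reg / ss_total                                  # same value
```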
Interpretation
R² represents the proportion of variance in Y explained by X.
| R² Value | Interpretation |
|---|---|
| 0.00-0.25 | Very weak fit |
| 0.25-0.50 | Moderate fit |
| 0.50-0.75 | Good fit |
| 0.75-0.90 | Strong fit |
| 0.90-1.00 | Excellent fit |
Example: R² = 0.85 means 85% of the variation in Y is explained by the linear relationship with X.
Adjusted R²
Adjusted R² = 1 - [(1-R²)(n-1)/(n-p-1)]
Where p = number of predictors.
Adjusted R² penalizes for adding predictors that don't improve the model. For simple regression (one predictor), it's slightly lower than R².
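The formula translates directly to code; the R², n, and p values below are illustrative:

```python
def adjusted_r2(r2, n, p):
    # Penalizes R² by degrees of freedom; requires n > p + 1.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj = adjusted_r2(0.85, n=20, p=1)  # slightly below the raw R² of 0.85
```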
Cautions
- R² doesn't indicate if a model is correct
- High R² doesn't mean causation
- Nonlinear relationships may have low R² but strong association
- Always examine residual plots
Correlation Coefficient
Understanding the relationship between r and R².
Correlation (r)
r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
Range: -1 to +1
Interpretation
| r Value | Interpretation |
|---|---|
| 0.9 to 1.0 | Very strong positive |
| 0.7 to 0.9 | Strong positive |
| 0.5 to 0.7 | Moderate positive |
| 0.3 to 0.5 | Weak positive |
| 0 to 0.3 | Very weak/no correlation |
| -0.3 to 0 | Very weak/no correlation |
| -0.5 to -0.3 | Weak negative |
| -0.7 to -0.5 | Moderate negative |
| -0.9 to -0.7 | Strong negative |
| -1.0 to -0.9 | Very strong negative |
Relationship with R²
For simple linear regression: R² = r²
Examples:
- r = 0.9 → R² = 0.81
- r = -0.8 → R² = 0.64
- r = 0.5 → R² = 0.25
Testing Significance
Null hypothesis: ρ = 0 (no correlation in population)
Test statistic: t = r√(n-2) / √(1-r²)
with df = n - 2
A significant t-value suggests the correlation isn't zero in the population.
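The test statistic is straightforward to compute; the r and n below are illustrative, and the resulting t would be compared against a t-table at df = n - 2:

```python
import math

def corr_t_stat(r, n):
    # t statistic for H0: rho = 0, with df = n - 2; requires |r| < 1 and n > 2.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_stat = corr_t_stat(0.8, 27)  # illustrative r and sample size
df = 27 - 2
```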
Residuals and Assumptions
Checking if the regression model is appropriate.
What are Residuals?
Residual = Actual - Predicted = yi - ŷi
Residuals represent the error or unexplained portion for each observation.
Properties of Good Residuals
1. Normality
Residuals should be approximately normally distributed.
- Check: Histogram, Q-Q plot
- Violation: May affect confidence intervals and tests
2. Constant Variance (Homoscedasticity)
Residuals should have constant spread across all X values.
- Check: Plot residuals vs. predicted values
- Violation (heteroscedasticity): Fan-shaped pattern
3. Independence
Residuals should be independent (no pattern).
- Check: Plot residuals vs. order
- Violation: Wavy pattern suggests autocorrelation
4. No Outliers
Large residuals may indicate outliers or influential points.
- Check: Standardized residuals > |3| are concerning
- Action: Investigate, don't automatically remove
Residual Plots to Create
- Residuals vs. Fitted Values: Check for patterns
- Residuals vs. X: Check for nonlinearity
- Normal Q-Q Plot: Check normality
- Residuals vs. Order: Check independence
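The outlier check above can be sketched numerically. This is a simplified standardization that divides each residual by the residual standard error and ignores leverage (a full calculator would use studentized residuals); the data and fitted values are illustrative:

```python
import math

def flag_outliers(ys, y_hat, threshold=3.0):
    """Standardize residuals by s = sqrt(SSE / (n - 2)); flag |z| > threshold."""
    resid = [y - yh for y, yh in zip(ys, y_hat)]
    n = len(resid)
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # residual std. error
    std = [e / s for e in resid]
    return std, [i for i, z in enumerate(std) if abs(z) > threshold]

std_resid, flags = flag_outliers([2.1, 4.3, 5.8, 8.2, 9.9],
                                 [2.16, 4.11, 6.06, 8.01, 9.96])
```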
What Violations Mean
- Nonlinear pattern: Consider polynomial or other nonlinear models
- Funnel shape: Consider weighted regression or a transformation
- Autocorrelation: May need time series methods
Making Predictions
Using the regression equation for forecasting.
Point Prediction
Simply plug X into the equation: ŷ = β₀ + β₁x
Example: If ŷ = 2.5 + 1.8x, then for x = 10: ŷ = 2.5 + 1.8(10) = 20.5
Confidence vs. Prediction Intervals
Confidence Interval for Mean Y
Estimates where the average Y falls for a given X.
CI = ŷ ± t(α/2) × SE(ŷ)
SE(ŷ) = s × √[1/n + (x-x̄)²/Σ(xi-x̄)²]
Prediction Interval for Individual Y
Estimates where a single new observation might fall.
PI = ŷ ± t(α/2) × SE(pred)
SE(pred) = s × √[1 + 1/n + (x-x̄)²/Σ(xi-x̄)²]
Key difference: Prediction intervals are always wider because they include individual variation.
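A sketch of both intervals from the formulas above. The critical value must come from a t-table at df = n - 2 (here 3.182 for df = 3 at 95%); the data, fitted coefficients, and x value are illustrative:

```python
import math

def intervals(x_new, xs, ys, b0, b1, t_crit):
    """Return (CI for mean Y, PI for individual Y) at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual standard error s from the fitted line.
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                      for x, y in zip(xs, ys)) / (n - 2))
    y_hat = b0 + b1 * x_new
    se_mean = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return ci, pi

ci, pi = intervals(3.5, [1, 2, 3, 4, 5], [2.1, 4.3, 5.8, 8.2, 9.9],
                   b0=0.21, b1=1.95, t_crit=3.182)
```

Note the extra `1 +` inside the prediction-interval square root: that single term is what makes the PI wider than the CI at every x.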
Extrapolation Warning
- Interpolation: Predicting within the range of your data (safe)
- Extrapolation: Predicting beyond your data range (risky!)
The relationship may not hold outside observed X values. Extrapolation assumes the linear trend continues, which may not be true.
Example Comparison
For x = 15 with the equation ŷ = 2.5 + 1.8x:
- Point prediction: 29.5
- 95% CI for mean: (28.2, 30.8)
- 95% PI for individual: (24.1, 34.9)
Statistical Significance Testing
Testing if the regression relationship is real.
Testing the Slope
Null hypothesis: β₁ = 0 (no linear relationship)
Test statistic: t = β₁ / SE(β₁)
with df = n - 2
If |t| > t(critical), reject H₀ and conclude the slope is significant.
The F-Test (ANOVA)
Tests overall model significance.
F = MS_regression / MS_residual
F = (SS_regression/1) / (SS_residual/(n-2))
For simple regression, F = t² (they give the same p-value).
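Both tests can be sketched together, using the five-point example data from earlier; the code confirms that F = t² for a single predictor:

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.3, 5.8, 8.2, 9.9]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))            # residual standard error
se_b1 = s / math.sqrt(sxx)              # standard error of the slope
t_stat = b1 / se_b1                     # t-test of the slope, df = n - 2

ssr = sum(((b0 + b1 * x) - y_bar) ** 2 for x in xs)
f_stat = (ssr / 1) / (sse / (n - 2))    # ANOVA F statistic
```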
ANOVA Table Structure
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Regression | SS_R | 1 | MS_R | F |
| Residual | SS_res | n-2 | MS_res | |
| Total | SS_T | n-1 | | |
Interpreting p-values
For slope:
- p < 0.05: Significant relationship (at 95% confidence)
- p ≥ 0.05: Cannot conclude relationship exists
Caution:
- Significance depends on sample size
- Large samples can detect tiny, unimportant effects
- Small samples may miss real effects
- Always report effect size (R²) alongside p-value
Confidence Intervals for Coefficients
95% CI for slope: β₁ ± t(0.025, n-2) × SE(β₁)
If interval doesn't include 0, slope is significant at 0.05 level.
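A minimal sketch of this check; the slope, standard error, and critical value below are illustrative (t_crit would be read from a t-table for the actual df):

```python
def slope_ci(b1, se_b1, t_crit):
    # 95% CI for the slope: b1 ± t(0.025, n-2) × SE(b1).
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1

lo, hi = slope_ci(1.95, 0.07, 3.182)   # e.g. df = 3 at 95%
significant = not (lo <= 0 <= hi)      # interval excludes 0 → significant
```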
Pro Tips
- 💡Always plot your data before running regression to check for linearity.
- 💡Examine residual plots to validate model assumptions.
- 💡Don't confuse correlation with causation - regression shows association, not cause.
- 💡Check for outliers that might unduly influence the regression line.
- 💡Report both R² and the slope to convey model fit and effect size.
- 💡Be cautious about extrapolating predictions beyond your data range.
- 💡Use prediction intervals (not confidence intervals) for individual predictions.
- 💡Consider transforming variables if relationship appears nonlinear.
- 💡Sample size matters - aim for at least 15-20 observations.
- 💡A significant p-value doesn't mean the effect is large or important.
- 💡Check that independent and dependent variables make theoretical sense.
- 💡Remember that high R² doesn't validate the model - residuals do.
Frequently Asked Questions
What is the difference between correlation and regression?
Correlation measures the strength of linear association between two variables (r ranges from -1 to +1) without distinguishing between independent and dependent variables. Regression goes further by fitting a line to predict Y from X, providing an equation for prediction. For simple linear regression, R² = r².

