Coefficient of Determination (R²) Calculator
R², the coefficient of determination, is the share of the variation in an outcome that a model accounts for: 0 means it explains nothing, 1 means it explains everything. It is the most quoted goodness-of-fit number and the most over-read: a high R² says the line tracks the points, not that the model is right, the relationship causal, or the predictions unbiased. This calculator computes it from your data or from observed-versus-predicted values, splits the variation into explained and unexplained parts, and is blunt about what the number cannot tell you.
Fits a straight line by least squares and reports R² — for a simple linear fit this equals the square of the correlation r.
Works for any model’s predictions, not just a line: R² = 1 − SSE ∕ SST. Here R² can even go negative, when the model fits worse than a flat mean.
In plain English
Start with how much the outcome varies on its own: the spread of the y values around their average. A model tries to account for some of that spread. R² is the fraction it manages: the variation the model explains, divided by the total variation there was to explain. An R² of 0.75 means three-quarters of the variation in the data is accounted for by the model, and a quarter is left over as variation the model does not capture, whether that is noise or structure the model has missed.
- coefficient of determination (R²)
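- The proportion of variance in the outcome explained by the model: R² = 1 − SSE ∕ SST, which equals SSR ∕ SST for a least-squares fit with an intercept. For such a fit it runs from 0 (explains nothing) to 1 (explains everything); scored on an independent model’s predictions it can fall below 0.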
- total variation (SST)
- Σ(y − ȳ)² — how much the outcomes vary around their own mean. The baseline a model has to beat.
- explained vs residual (SSR, SSE)
- SSR = Σ(ŷ − ȳ)² is the part the model captures; SSE = Σ(y − ŷ)² is the leftover. For a least-squares fit with an intercept they add up to SST, and R² is the explained share.
- adjusted R²
- R² never falls when you add a predictor, even a useless one, so adjusted R² docks it for each variable used — the honest number to compare models of different size.
- what R² is NOT
- Not a verdict that the model is correct, the relationship causal, or the predictions unbiased. A systematically wrong model can score high; a perfectly real but noisy effect can score low.
- r vs R²
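- For a simple straight-line fit, R² is exactly the square of Pearson’s correlation r between x and y. With more than one predictor, R² is no longer the squared correlation with any single predictor; for a least-squares fit with an intercept it is instead the squared correlation between the observed y and the fitted ŷ.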
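The definitions above can be checked numerically. What follows is a minimal Python sketch, assuming a straight-line least-squares fit with an intercept. The data values and the helper name `fit_line` are made up for illustration and are not the calculator's own code.

```python
from math import sqrt


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data, chosen only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # left in the residuals

r_squared = 1 - sse / sst

# Pearson's r between x and y, for comparison with R².
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = sxy / sqrt(sxx * sst)

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")
print(f"R^2={r_squared:.4f}  r^2={r * r:.4f}")
```

Up to floating-point rounding, SSR + SSE matches SST and R² matches r². Both identities rely on the line being fitted by least squares with an intercept; they need not hold for predictions that come from elsewhere.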
Frequently asked
How do you calculate R² (the coefficient of determination)?
R² = 1 − SSE ∕ SST, where SST = Σ(y − ȳ)² is the total variation of the outcome around its mean and SSE = Σ(y − ŷ)² is the variation left unexplained by the model’s predictions ŷ. For a least-squares fit with an intercept this equals SSR ∕ SST, the explained share, and lies between 0 and 1. For a simple linear regression you can also just square the correlation: R² = r². The result is the fraction of the outcome’s variance the model accounts for.
What is a “good” R²?
It depends entirely on the field. In physics a fitted law might be expected to reach 0.99; in human behaviour an R² of 0.3 can be a genuine, useful finding, because people are noisy. So there is no universal threshold — a “high” R² in one domain is a failure in another. More importantly, a high R² is not the goal in itself: it does not mean the model is correct or the predictions trustworthy, only that the fitted curve tracks this particular sample of points.
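The point that a high R² does not show the model is correct can be illustrated with a deliberately wrong model. The sketch below, in Python, fits a straight line to data that actually follow a curve (y = x²). The data are constructed for the purpose and the helper `fit_line` is a hypothetical name, not part of the calculator.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Constructed data: the true relationship is a curve, not a line.
x = [float(v) for v in range(1, 11)]
y = [xi ** 2 for xi in x]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum(e ** 2 for e in residuals)
r_squared = 1 - sse / sst

print(f"R^2 = {r_squared:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

R² comes out at roughly 0.95, yet the residuals are positive at both ends and negative in the middle. That systematic pattern is the sign of a misspecified model, and R² alone does not reveal it; a residual plot would.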
Can R² be negative?
From a least-squares regression with an intercept, scored on the same data it was fitted to, no: it lands between 0 and 1. But when you score an independent model’s predictions with R² = 1 − SSE ∕ SST (the observed-versus-predicted mode here), it can go below zero. That happens when the model’s squared errors add up to more than those from simply predicting the mean every time. A negative R² is a blunt signal that the model is worse than that flat-mean baseline.
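A small Python sketch of the observed-versus-predicted calculation shows how the sign can flip. The function name `r_squared` and all the numbers are illustrative assumptions, not the calculator's internals.

```python
def r_squared(observed, predicted):
    """R² = 1 - SSE / SST for any set of predictions."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - sse / sst


# Hypothetical observations and three hypothetical sets of predictions.
observed = [3.0, 5.0, 7.0, 9.0]
close_model = [3.2, 4.9, 7.1, 8.8]     # small errors
mean_only = [6.0, 6.0, 6.0, 6.0]       # always predicts the mean of observed
reversed_model = [9.0, 7.0, 5.0, 3.0]  # gets the trend backwards

print(r_squared(observed, close_model))
print(r_squared(observed, mean_only))
print(r_squared(observed, reversed_model))
```

Predicting the mean every time scores exactly 0, which is why 0 is the baseline. The reversed model has four times the squared error of that baseline, so its R² is negative.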
What is the difference between R² and adjusted R²?
Plain R² can only rise — or stay flat — as you add predictors, because every extra variable gives the fit a little more room, even if it is pure noise. That makes R² useless for comparing models with different numbers of predictors. Adjusted R² penalises each predictor used, so it rises only if the new variable earns its keep, and can fall if it does not. Use adjusted R² (or out-of-sample testing) when choosing between models; use plain R² to describe a single fit.
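The page does not spell out a formula for adjusted R², so the sketch below assumes the usual one: adjusted R² = 1 − (1 − R²)(n − 1) ∕ (n − p − 1), with n observations and p predictors (the intercept not counted). The function name and the two example fits are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical comparison: a third predictor nudges plain R² up slightly.
n = 30
smaller = adjusted_r_squared(0.750, n=n, p=2)
larger = adjusted_r_squared(0.752, n=n, p=3)

print(f"2 predictors: R^2=0.750, adjusted={smaller:.4f}")
print(f"3 predictors: R^2=0.752, adjusted={larger:.4f}")
```

In this made-up case plain R² rises from 0.750 to 0.752, but the adjusted value falls, because the gain is too small to cover the penalty for the extra predictor.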