SST = SSR+SSE
where the terms are defined as follows:
- Total sum of squares, SST: the total variation in the observed values of the response variable.
- Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression.
- Error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression.
The coefficient of determination, R^2, is then
R^2 = SSR/SST = 1 - SSE/SST
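To make the definitions concrete, here is a minimal sketch in plain Python (not gnuplot) that computes the three sums of squares for a straight-line least-squares fit and derives R^2 from them. The sample data and the helper names `linear_least_squares` and `sums_of_squares` are made up for illustration and are not part of the original post.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def linear_least_squares(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through the data."""
    a, b = linear_least_squares(xs, ys)
    mean_y = sum(ys) / len(ys)
    fitted = [a * x + b for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)              # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)          # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(xs, ys)
r2 = ssr / sst
print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  R^2={r2:.4f}")
```

For a least-squares line with an intercept, SSR + SSE reproduces SST up to rounding, which is why R^2 can be written either as SSR/SST or as 1 - SSE/SST.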
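gnuplot has a 'fit' command.
The tricky part is obtaining SST with it. fit is a curve-fitting command, but after each run it also sets some useful variables, among them FIT_WSSR (the weighted sum of squared residuals) and FIT_NDF (the number of degrees of freedom, i.e. the number of data points minus the number of fitted parameters). Fitting a constant to the data yields the mean, so the residuals of that fit are the deviations from the mean that SST is built from. For details, refer to gnuplot tricks: Basic statistics.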
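mean(x) = m
fit mean(x) 'your data file' using 1:2 via m # column 1 is x, column 2 is y; fitting a constant gives the mean of y
SST = FIT_WSSR/(FIT_NDF+1) # total sum of squares divided by the number of data points N (= FIT_NDF+1, one fitted parameter)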
f(x) = a*x + b
fit f(x) 'your data file' using 1:2 via a, b
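SSE = FIT_WSSR/(FIT_NDF+2) # two parameters were fitted, so FIT_NDF+2 is the number of data points N, the same divisor as for SST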
SSR=SST-SSE
R2=SSR/SST
set label sprintf("f(x)=%fx+%f\nR^2=%f", a, b, R2) at graph 0.05, 0.9 # show the fitted line and R^2 inside the plot
plot 'your data file' using 1:2 title 'data', f(x) notitle
Is the same code valid for a power-law fit?
Try:
stats 'your datafile' using 1:2
That prints the correlation coefficient r directly (it is also stored in STATS_correlation). For a straight-line fit you just have to square it to get R^2.
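The claim that squaring the correlation coefficient gives R^2 can be checked numerically. The following is a rough sketch in plain Python with made-up data and hypothetical helper names (`pearson_r`, `r_squared_from_fit`). It assumes a straight-line least-squares fit with an intercept, and the identity should not be expected to carry over to other model functions.

```python
import math

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / sst


r = pearson_r(xs, ys)
print(f"r^2 from correlation: {r ** 2:.6f}")
print(f"R^2 from the fit:     {r_squared_from_fit(xs, ys):.6f}")
```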
I think your calculation of the SSE is slightly incorrect: since you fit a function with two parameters, you need to divide by `FIT_NDF + 2` instead of `FIT_NDF` to use the correct number of records. (Compare this with the calculation of the SST, which added one to the number of degrees of freedom because it fit a function with one parameter.)
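The point about the divisors can be illustrated numerically. R^2 depends only on the ratio SSE/SST, so a divisor common to both cancels, while mismatched divisors shift the result. Below is a minimal sketch in plain Python with made-up data. The names `wssr_mean`, `wssr_line`, `ndf_mean` and `ndf_line` are hypothetical stand-ins for what FIT_WSSR and FIT_NDF would hold after each of the two fits, assuming unit weights.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y = a*x + b.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
a = sxy / sxx
b = mean_y - a * mean_x

# Stand-ins for FIT_WSSR and FIT_NDF after each fit (unit weights assumed).
wssr_mean = sum((y - mean_y) ** 2 for y in ys)                    # constant fit
wssr_line = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))   # line fit
ndf_mean = n - 1   # one fitted parameter (m)
ndf_line = n - 2   # two fitted parameters (a, b)

# Raw sums of squares: no divisor at all.
r2_raw = 1.0 - wssr_line / wssr_mean

# Both divided by N: the divisor cancels and R^2 is unchanged.
sst_n = wssr_mean / (ndf_mean + 1)
sse_n = wssr_line / (ndf_line + 2)
r2_same_divisor = 1.0 - sse_n / sst_n

# SST divided by N but SSE divided by N - 2: R^2 comes out too small.
sse_mixed = wssr_line / ndf_line
r2_mixed = 1.0 - sse_mixed / sst_n

print(f"raw sums:          {r2_raw:.6f}")
print(f"same divisor (N):  {r2_same_divisor:.6f}")
print(f"mixed divisors:    {r2_mixed:.6f}")
```

With many data points the difference between N and N - 2 becomes small, so the discrepancy is most visible for short data files.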