Tuesday, April 30, 2013

Calculating R^2 with gnuplot

Some relevant background from ANOVA (analysis of variance):

SST = SSR+SSE

where each term represents the following:
  • Total sum of squares, SST: the total variation in the observed values of the response variable.
  • Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression.
  • Error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression.
Also, the coefficient of determination, R^2, is

 R^2 = SSR/SST
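
For reference, the three sums can be written out explicitly. The notation here is an assumption added for illustration and is not from the original post: y_i are the n observed values, \bar{y} is their mean, and \hat{y}_i is the value predicted by the regression. The decomposition holds for a least-squares fit that includes an intercept term.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
SSR &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
\end{aligned}
```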

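gnuplot has a 'fit' command for curve fitting.

The tricky part is obtaining SST. Although fit is a curve-fitting command, it also sets some useful variables: FIT_WSSR (the weighted sum of squared residuals) and FIT_NDF (the number of degrees of freedom). Fitting a constant to the data yields the mean, and the residuals of that fit measure the total variation. For details, refer to gnuplot tricks: Basic statistics.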
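
A short derivation may make the constant-fit trick clearer. It assumes an unweighted fit of a one-parameter constant model m to n observed values y_i with mean \bar{y}; these symbols are introduced here for illustration only. Least squares picks the m that minimizes the sum of squared residuals, and that minimum sits at the mean:

```latex
\begin{aligned}
\frac{d}{dm} \sum_{i=1}^{n} (y_i - m)^2 &= -2 \sum_{i=1}^{n} (y_i - m) = 0
\quad\Longrightarrow\quad m = \bar{y} \\
\min_{m} \sum_{i=1}^{n} (y_i - m)^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2
\end{aligned}
```

With one fitted parameter, the number of degrees of freedom reported for this fit is n - 1.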

mean(x) = m
fit mean(x) 'your data file' using 1:2 via m # column 1 is x, column 2 is y
SST = FIT_WSSR # raw sum of squared residuals about the mean, so that SST = SSR + SSE holds

f(x) = a*x + b
fit f(x) 'your data file' using 1:2 via a, b
SSE = FIT_WSSR # raw sum of squared residuals about the fitted line

SSR = SST - SSE
R2 = SSR/SST

set label sprintf("f(x)=%fx+%f\nR^2=%f", a, b, R2) at graph 0.05, 0.9 # show the fitted line and R^2 in the upper-left corner
plot 'your data file' using 1:2 title 'data', f(x) notitle
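
One way to sanity-check the result: for a straight-line fit with an intercept, R^2 should equal the square of the Pearson correlation coefficient, which the stats command reports as STATS_correlation. The following is a minimal sketch under stated assumptions, not a definitive implementation. It uses the raw residual sums of squares from fit, the sample values in the inline datablock are made up for illustration, the names R2_fit and R2_stats are hypothetical, and it assumes gnuplot 5.0 or later (for datablocks and 'set fit quiet').

```gnuplot
# Hypothetical sample data in an inline datablock (gnuplot 5.0 or later)
$data << EOD
1 2.1
2 3.9
3 6.2
4 7.8
5 10.1
EOD

set fit quiet                       # keep the fit log off the terminal

# Constant model: its residual sum of squares is SST
m = 1.0                             # explicit starting value
mean(x) = m
fit mean(x) $data using 1:2 via m
SST = FIT_WSSR

# Straight line: its residual sum of squares is SSE
a = 1.0; b = 1.0                    # explicit starting values
f(x) = a*x + b
fit f(x) $data using 1:2 via a, b
SSE = FIT_WSSR

R2_fit = 1.0 - SSE/SST              # same as (SST - SSE)/SST

# Independent check: squared Pearson correlation from the stats command
stats $data using 1:2 nooutput
R2_stats = STATS_correlation**2

print sprintf("R^2 from fit = %.6f, from stats = %.6f", R2_fit, R2_stats)
```

The two printed values should agree to within the fit's convergence tolerance. The check only applies to a linear model with an intercept; for other model functions the squared correlation is not the same quantity.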