
Tuesday, April 30, 2013

Calculating R^2 with gnuplot

Some relevant background from ANOVA (analysis of variance):

SST = SSR+SSE

where each term is defined as follows:
  • Total sum of squares, SST: the total variation in the observed values of the response variable.
  • Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression.
  • Error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression.
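In symbols, writing y_i for the N observed values, \hat{y}_i for the fitted values and \bar{y} for the mean of the y_i (this notation is mine, not from the original post):

```latex
\mathrm{SST} = \sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{N} \left(\hat{y}_i - \bar{y}\right)^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2
```

The identity SST = SSR + SSE holds exactly for a least-squares fit that includes an intercept, such as the straight line f(x) = a*x + b fitted below.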
The coefficient of determination, R^2, is then

 R^2 = SSR/SST = 1 - SSE/SST
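As a sanity check of these definitions, here is a small Python sketch (not gnuplot) that fits a straight line by ordinary least squares, computes the three sums directly, and confirms that SST = SSR + SSE and that SSR/SST equals 1 - SSE/SST. The helper names and the data points are made up for illustration.

```python
# Sketch: verify SST = SSR + SSE and R^2 = SSR/SST = 1 - SSE/SST
# for an ordinary least-squares straight line. Data is made up.

def linear_least_squares(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def anova_sums(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through the data."""
    a, b = linear_least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [a * x + b for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))     # left in the residuals
    return sst, ssr, sse

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse = anova_sums(xs, ys)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
print(f"SSR/SST = {ssr / sst:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```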

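gnuplot has a 'fit' command for least-squares curve fitting.

The tricky part is obtaining SST. Although fit is meant for curve fitting, after each run it sets two useful variables: FIT_WSSR, the weighted sum of squared residuals (with no error column given, simply the sum of squared residuals), and FIT_NDF, the number of degrees of freedom, i.e. the number of data points minus the number of fitted parameters. Fitting a constant m gives m = mean of y, so the residuals are the deviations from the mean and FIT_WSSR equals SST; fitting f(x) = a*x + b makes FIT_WSSR equal SSE. For details, refer to gnuplot tricks: Basic statistics.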
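Why does fitting a constant give SST? The constant that minimizes the sum of squared residuals is the mean, so the leftover residual sum is exactly the total variation. The Python sketch below reproduces by hand what FIT_WSSR and FIT_NDF should hold after `fit mean(x) ... via m` on unweighted data; it does not call gnuplot, and `fit_constant` is a hypothetical helper name.

```python
# Sketch: emulate the constant fit y = m by hand (no gnuplot involved).
# fit_constant is a hypothetical helper; the data is made up.

def fit_constant(ys):
    """Least-squares fit of y = m. Returns (m, wssr, ndf)."""
    n = len(ys)
    m = sum(ys) / n                          # minimizer of sum((y - m)**2) is the mean
    wssr = sum((y - m) ** 2 for y in ys)     # residual sum of squares, i.e. SST
    ndf = n - 1                              # data points minus one fitted parameter
    return m, wssr, ndf

ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, wssr, ndf = fit_constant(ys)
print(f"m = {m:.4f}, SST = {wssr:.4f}, NDF = {ndf}, SST/(NDF+1) = {wssr / (ndf + 1):.4f}")
```

Moving m away from the mean in either direction only increases the residual sum, which is why the fitted m lands on the mean.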

mean(x) = m
fit mean(x) 'your data file' using 1:2 via m   # column 1 is x, column 2 is y
SST = FIT_WSSR/(FIT_NDF+1)   # one parameter, so FIT_NDF+1 = N: this is SST/N

f(x) = a*x + b
fit f(x) 'your data file' using 1:2 via a, b
SSE = FIT_WSSR/(FIT_NDF+2)   # two parameters, so FIT_NDF+2 = N: this is SSE/N

SSR = SST - SSE
R2 = SSR/SST                 # the common factor 1/N cancels

set label sprintf("f(x)=%fx+%f\nR^2=%f", a, b, R2) at graph 0.05, graph 0.95   # print R^2 on the plot
plot 'your data file' using 1:2 title 'data', f(x) notitle
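To see why the divisors matter, here is the same arithmetic redone by hand in Python. It is a sketch, not gnuplot itself: wssr_const/ndf_const and wssr_line/ndf_line are my stand-ins for FIT_WSSR and FIT_NDF after the constant and straight-line fits, assuming unweighted data (so FIT_NDF is N - 1 and N - 2), and the data points are made up. Dividing both sums by N (FIT_NDF+1 and FIT_NDF+2 respectively) leaves R^2 unchanged, while dividing SSE by FIT_NDF alone does not.

```python
# Sketch: mirror the gnuplot script's arithmetic to compare divisors.
# wssr_* / ndf_* stand in for FIT_WSSR / FIT_NDF; data is made up.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.8, 4.3, 5.9, 8.4, 9.7, 12.2]
n = len(xs)

# Constant fit: the best m is the mean, and its residuals give SST.
y_bar = sum(ys) / n
wssr_const = sum((y - y_bar) ** 2 for y in ys)
ndf_const = n - 1

# Straight-line fit f(x) = a*x + b by ordinary least squares; residuals give SSE.
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b = y_bar - a * x_bar
wssr_line = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
ndf_line = n - 2

r2_exact = 1 - wssr_line / wssr_const

# Both sums divided by N: the 1/N factor cancels in the ratio.
sst_n = wssr_const / (ndf_const + 1)
sse_n = wssr_line / (ndf_line + 2)
r2_matched = (sst_n - sse_n) / sst_n

# SSE divided by N - 2 but SST by N: the normalizations no longer match.
r2_mismatched = (sst_n - wssr_line / ndf_line) / sst_n

print(f"exact {r2_exact:.6f}  matched {r2_matched:.6f}  mismatched {r2_mismatched:.6f}")
```

Since SSE/(N-2) is larger than SSE/N whenever the fit is not perfect, the mismatched version always understates R^2.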




4 comments:

  1. Does the same code work for a power-law (potential) fit?

  2. This comment has been removed by the author.

  3. Try:

    stats 'your datafile' using 1:2

    That prints the correlation coefficient r directly, so you just have to square it to get R^2.

  4. I think your calculation of SSE is slightly incorrect: since you fit a function with two parameters, you need to divide by `FIT_NDF + 2` instead of `FIT_NDF` so that you divide by the number of data points. (Compare this with the calculation of SST, which added one to the number of degrees of freedom because it fit a function with one parameter.)

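The `stats` shortcut from comment 3 works because, for a straight-line least-squares fit with an intercept, R^2 is exactly the square of Pearson's correlation coefficient r, which `stats` reports for two columns (gnuplot also stores it in the variable STATS_correlation). A quick by-hand check in Python, as a sketch; the helper names and data are made up. It only applies to the straight line: for a nonlinear model such as a power law, R^2 = 1 - SSE/SST has to come from that model's residuals instead.

```python
import math

# Sketch: compare r^2 (Pearson correlation squared) with R^2 from a
# straight-line least-squares fit. Helper names and data are made up.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_linear(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line y = a*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - a * x_bar
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

xs = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5]
ys = [3.2, 2.9, 2.1, 1.7, 0.8, 0.4]   # decreasing trend, so r is negative
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 from the fit = {r_squared_linear(xs, ys):.4f}")
```

Squaring discards the sign, so a strong negative correlation still yields an R^2 close to 1.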