# Linear Regression: Coefficient of Determination in Python

Last Update: February 21, 2022

The coefficient of determination can be estimated in Python with the `statsmodels` package. The `ols` function in the `statsmodels.formula.api` module fits a linear regression. The fitted results object provides a `summary` method that prints the regression results, and `rsquared` and `rsquared_adj` properties that return the coefficient of determination and its adjusted version. The main parameters of `ols` are `formula`, a model description string of the form `"y ~ x1 + … + xp"`, and `data`, a data frame containing the model variables.
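For reference, the two quantities returned by `rsquared` and `rsquared_adj` are usually defined as follows for a model with an intercept. The notation is an assumption made here and is not taken from the original text: $n$ is the number of observations, $p$ the number of explanatory variables, $\hat{y}_i$ the fitted values and $\bar{y}$ the sample mean of the response.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

The adjusted version penalizes additional explanatory variables, so it is never larger than $R^2$ when $p \geq 1$.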

As an example, we estimate the coefficients of multiple determination from a multiple linear regression of house price on lot size and number of bedrooms, using the `HousePrices` data set from the `AER` R package [1].

First, we import the `statsmodels` modules used for downloading the data and fitting the model [2].

``````In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
``````

Second, we create the `houseprices` data object with the `get_rdataset` function and display the first five rows and first three columns of the data using the `print` function and the `head` data frame method to view its structure.

``````In [2]:
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
print(houseprices.iloc[:, 0:3].head())
``````
``````Out [2]:
     price  lotsize  bedrooms
0  42000.0     5850         3
1  38500.0     4000         2
2  49500.0     3060         3
3  60500.0     6650         3
4  61000.0     6360         2
``````

Third, we fit the model with the `ols` function using variables in the `houseprices` data object and store the result in the `mlr` object. Within `ols`, the argument `formula="price ~ lotsize + bedrooms"` specifies a model in which house price is explained by lot size and number of bedrooms.

``````In [3]:
mlr = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
``````

Fourth, we print the `mlr` model summary, which includes the estimated coefficients of multiple determination, using its `summary` method.

``````In [4]:
print(mlr.summary())
``````
``````Out [4]:
                            OLS Regression Results
==============================================================================
Dep. Variable:                  price   R-squared:                       0.370
Model:                            OLS   Adj. R-squared:                  0.368
Method:                 Least Squares   F-statistic:                     159.6
Date:                Wed, 25 Aug 2021   Prob (F-statistic):           2.95e-55
Time:                        18:41:02   Log-Likelihood:                -6213.1
No. Observations:                 546   AIC:                         1.243e+04
Df Residuals:                     543   BIC:                         1.245e+04
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   5612.5997   4102.819      1.368      0.172   -2446.741    1.37e+04
lotsize        6.0530      0.424     14.265      0.000       5.219       6.887
bedrooms    1.057e+04   1247.676      8.470      0.000    8116.488     1.3e+04
==============================================================================
Omnibus:                       77.789   Durbin-Watson:                   1.193
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              146.854
Skew:                           0.833   Prob(JB):                     1.29e-32
Kurtosis:                       4.919   Cond. No.                     2.60e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.6e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
``````
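To make the `R-squared` entry of the summary more concrete, the following is a minimal sketch that recomputes it from the residuals as one minus the ratio of the residual sum of squares to the total sum of squares. It uses synthetic data rather than the `HousePrices` data set so that it runs without a download; the variable names `x1`, `x2`, `y`, the sample size and the true coefficients are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical, not the HousePrices data set)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(size=n)

# Fit the regression with the same formula interface used above
fit = smf.ols(formula="y ~ x1 + x2", data=df).fit()

# Residual and total sums of squares
ss_res = float(np.sum(fit.resid ** 2))
ss_tot = float(np.sum((df["y"] - df["y"].mean()) ** 2))

# Coefficient of determination computed by hand
r2_manual = 1.0 - ss_res / ss_tot

# Both values should agree up to floating-point rounding
print(r2_manual, fit.rsquared)
```

The hand-computed value and the `rsquared` property should agree up to floating-point rounding, because the formula includes an intercept by default.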

Fifth, we can also print the estimated coefficient of determination and adjusted coefficient of determination of the `mlr` model using its `rsquared` and `rsquared_adj` properties.

``````In [5]:
print(mlr.rsquared)
``````
``````Out [5]:
0.37026934405815837
``````
``````In [6]:
print(mlr.rsquared_adj)
``````
``````Out [6]:
0.3679498941283542
``````
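As a quick check, the adjusted value can be reproduced from the unadjusted one with the usual adjustment formula. The sketch below uses only numbers reported above: 546 observations, 2 explanatory variables and the `rsquared` value from `Out [5]`. The variable names are illustrative.

```python
# Values taken from the regression output above
n_obs = 546          # No. Observations
n_predictors = 2     # Df Model (lotsize, bedrooms)
r_squared = 0.37026934405815837

# Adjusted R-squared: penalizes R-squared for the number of predictors
adj_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(adj_r_squared)
```

The result agrees with the value shown in `Out [6]` up to floating-point rounding.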

References

[1] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

[2] Seabold, Skipper, and Josef Perktold. (2010). “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference.
