How to use Python to run regressions with fixed effects and clustered standard errors
This post is mainly based on linearmodels (PyPI, Documentation), a package that extends statsmodels with panel-data and instrumental-variable regression models.
The sample data used in this article can be downloaded here.
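It may help to see what a fixed-effects regression with clustered standard errors actually computes. The sketch below is a hand-rolled NumPy version, not the linearmodels implementation: it demeans `y` and `X` within each entity (the within estimator) and then builds the cluster-robust sandwich covariance with the entity as the cluster. The function name `within_ols_clustered` and the simulated panel are hypothetical, and no small-sample correction is applied, so packaged standard errors will differ slightly.

```python
import numpy as np


def within_ols_clustered(y, X, entity):
    """Entity fixed-effects (within) OLS with entity-clustered standard errors.

    y: (n,) outcome; X: (n, k) regressors without a constant column;
    entity: (n,) labels used both for the fixed effects and as clusters.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Map entity labels to integer codes 0..G-1
    _, inv = np.unique(entity, return_inverse=True)
    inv = inv.ravel()
    G = inv.max() + 1
    counts = np.bincount(inv, minlength=G).astype(float)

    # Within transformation: subtract each entity's mean from y and X
    y_means = np.bincount(inv, weights=y, minlength=G) / counts
    X_sums = np.zeros((G, X.shape[1]))
    np.add.at(X_sums, inv, X)
    y_dm = y - y_means[inv]
    X_dm = X - (X_sums / counts[:, None])[inv]

    # OLS on the demeaned data gives the fixed-effects slope estimates
    bread = np.linalg.inv(X_dm.T @ X_dm)
    beta = bread @ (X_dm.T @ y_dm)
    resid = y_dm - X_dm @ beta

    # Cluster-robust "meat": sum over entities of s_g s_g', s_g = X_g' u_g
    score_sums = np.zeros((G, X.shape[1]))
    np.add.at(score_sums, inv, X_dm * resid[:, None])
    meat = score_sums.T @ score_sums

    # Sandwich covariance, without any small-sample adjustment
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated panel (hypothetical data): 50 entities observed for 10 periods
rng = np.random.default_rng(0)
n_entities, n_periods = 50, 10
entity = np.repeat(np.arange(n_entities), n_periods)
alpha = rng.normal(size=n_entities)[entity]          # entity fixed effects
x = (rng.normal(size=entity.size) + alpha)[:, None]  # regressor correlated with alpha
y = 2.0 * x[:, 0] + alpha + rng.normal(size=entity.size)

beta, se = within_ols_clustered(y, x, entity)
print("beta:", beta, "clustered SE:", se)
```

On this simulated data, pooled OLS without entity effects would be pulled toward roughly 2.5, because `x` is built to correlate with `alpha`; demeaning within entities removes that bias and the estimate lands near the true slope of 2.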
Panel Models
Basic Linear Regression
First, we load a sample dataset and use it to run a simple regression.
```python
import numpy as np
from statsmodels.datasets import cpunish
data = cpunish.load_pandas().data
data.head()
```
| | EXECUTIONS | INCOME | PERPOVERTY | PERBLACK | VC100k96 | SOUTH | DEGREE |
|---|---|---|---|---|---|---|---|
| 0 | 37.0 | 34453.0 | 16.7 | 12.2 | 644.0 | 1.0 | 0.16 |
| 1 | 9.0 | 41534.0 | 12.5 | 20.0 | 351.0 | 1.0 | 0.27 |
| 2 | 6.0 | 35802.0 | 10.6 | 11.2 | 591.0 | 0.0 | 0.21 |
| 3 | 4.0 | 26954.0 | 18.4 | 16.1 | 524.0 | 1.0 | 0.16 |
| 4 | 3.0 | 31468.0 | 14.8 | 25.9 | 565.0 | 1.0 | 0.19 |
statsmodels bundles several other example datasets, which can be listed from the `sm.datasets` module:

```python
import statsmodels.api as sm
dir(sm.datasets)
```

```
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'anes96',
'cancer',
'ccard',
'check_internet',
'china_smoking',
'clear_data_home',
'co2',
'committee',
'copper',
'cpunish',
'elnino',
'engel',
'fair',
'fertility',
'get_data_home',
'get_rdataset',
'grunfeld',
'heart',
'longley',
'macrodata',
'modechoice',
'nile',
'randhie',
'scotland',
'spector',
'stackloss',
'star98',
'statecrime',
'strikes',
'sunspots',
'utils',
'webuse']
```
Written on April 23, 2018