How to Use Python to Run Regressions with Fixed Effects and Clustered Standard Errors

This post is mainly based on linearmodels (PyPI, documentation), a convenient interface for regression analysis.

The sample data used in this article can be downloaded here.

Panel Models

Basic Linear Regression

First, we load a sample dataset and use it to run a simple regression.

import numpy as np
from statsmodels.datasets import cpunish
data = cpunish.load_pandas().data

data.head()
   EXECUTIONS   INCOME  PERPOVERTY  PERBLACK  VC100k96  SOUTH  DEGREE
0        37.0  34453.0        16.7      12.2     644.0    1.0    0.16
1         9.0  41534.0        12.5      20.0     351.0    1.0    0.27
2         6.0  35802.0        10.6      11.2     591.0    0.0    0.21
3         4.0  26954.0        18.4      16.1     524.0    1.0    0.16
4         3.0  31468.0        14.8      25.9     565.0    1.0    0.19
# List everything exposed by statsmodels.datasets, including the bundled datasets
import statsmodels.api as sm
dir(sm.datasets)
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'anes96',
 'cancer',
 'ccard',
 'check_internet',
 'china_smoking',
 'clear_data_home',
 'co2',
 'committee',
 'copper',
 'cpunish',
 'elnino',
 'engel',
 'fair',
 'fertility',
 'get_data_home',
 'get_rdataset',
 'grunfeld',
 'heart',
 'longley',
 'macrodata',
 'modechoice',
 'nile',
 'randhie',
 'scotland',
 'spector',
 'stackloss',
 'star98',
 'statecrime',
 'strikes',
 'sunspots',
 'utils',
 'webuse']
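Among the datasets listed above, grunfeld is a firm-by-year panel, so it can illustrate the fixed effects and clustering named in the title. The sketch below uses only statsmodels rather than linearmodels: firm fixed effects are added as dummy variables through the formula term C(firm), and standard errors are clustered by firm. The model specification (invest regressed on value and capital) and the variable names firm_codes and fe_results are assumptions made for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.datasets import grunfeld

# Grunfeld investment panel: one row per firm and year
panel = grunfeld.load_pandas().data

# Integer codes identifying each firm, used as the cluster variable
firm_codes = pd.factorize(panel["firm"])[0]

# C(firm) adds one dummy per firm (firm fixed effects);
# cov_type="cluster" requests standard errors clustered on the given groups
fe_results = smf.ols("invest ~ value + capital + C(firm)", data=panel).fit(
    cov_type="cluster",
    cov_kwds={"groups": firm_codes},
)

# Slope estimates and their cluster-robust standard errors
print(fe_results.params[["value", "capital"]])
print(fe_results.bse[["value", "capital"]])
```

With only a handful of firms, cluster-robust standard errors rest on few clusters, so they should be read with caution. The dummy-variable approach is fine for a small panel like this one; dedicated panel estimators such as those in linearmodels avoid building one column per entity.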
Written on April 23, 2018