DoubleML Trainings: Getting Started
Welcome to the DoubleML Trainings!
We are very happy to welcome you to our Trainings in Causal Machine Learning with DoubleML! Please have a look at the following instructions to get ready for our DoubleML trainings.
Virtual Meetings and Communication 💻
We will send the invite links for our virtual meetings to the email address you provided when signing up on Eventbrite. Our sessions will be hosted on Microsoft Teams. You can either install the Microsoft Teams app on your machine or join the meetings from your browser.
We will use a Slack workspace for communication during the training. The course organizers will either send you an invite link or add you directly.
Materials: Slides and Notebooks
You will receive a link to the materials (slides, notebooks, etc.) in the days before our training starts.
Installation
During our trainings, we will work with DoubleML and other packages in Python. Please make sure you have access to a working Python environment on your local machine or on a cloud service.
Installing DoubleML for Python
Please read the installation instructions and make sure that a recent release of DoubleML (version 0.7.0 or later) is installed on your local machine prior to our tutorial. If you have an earlier version of DoubleML installed, please update your installation.
To install DoubleML via pip without a virtual environment, type
pip install -U DoubleML
or use conda
conda install -c conda-forge doubleml
Please check that you have installed DoubleML version 0.7.0 or higher by typing
import doubleml
print(doubleml.__version__)
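If you prefer an automatic check over reading the printed version by eye, the comparison against 0.7.0 can be sketched as follows. This is only a minimal sketch: the helper names `release_tuple` and `meets_minimum` are made up for this illustration and are not part of DoubleML, and the comparison looks only at the numeric release parts, ignoring suffixes such as `rc1` or `dev0`.

```python
import re
from importlib import metadata

MIN_VERSION = "0.7.0"  # minimum release requested for the training


def release_tuple(version):
    """Turn '0.10.1' into (0, 10, 1); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Compare numeric release parts, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = metadata.version("DoubleML")
except metadata.PackageNotFoundError:
    print("DoubleML is not installed in this environment.")
else:
    status = "OK" if meets_minimum(installed) else "please update"
    print(f"DoubleML {installed}: {status}")
```

Comparing tuples of integers rather than raw strings matters here, because a plain string comparison would rank "0.10.0" below "0.7.0".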
For more information on installing DoubleML, read our online installation guide.
Installing Additional Packages
In addition to DoubleML and its dependencies, we will use the packages xgboost, lightgbm and networkx. To install these packages, please run
pip install xgboost lightgbm networkx
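To confirm that the installation worked before running the full example, you can ask Python which of the packages it can locate. The sketch below uses only the standard library. The helper `missing_packages` and the list `REQUIRED` are illustrative names chosen here, and the list assumes the usual import names (`sklearn` for scikit-learn).

```python
from importlib.util import find_spec

# Import names of the packages used in the training (assumed list).
REQUIRED = ["doubleml", "sklearn", "xgboost", "lightgbm", "networkx"]


def missing_packages(names):
    """Return the names that Python cannot locate in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("All required packages can be imported.")
```

This only checks that each package can be found; it does not import the packages or verify their versions.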
Getting Ready for the Tutorial
Run the following example to check whether you are ready for the tutorial.
Load the DoubleML package once the installation is complete.
import doubleml as dml
Load the Bonus data set.
from doubleml.datasets import fetch_bonus
# Load bonus data
df_bonus = fetch_bonus('DataFrame')
print(df_bonus.head(5))
index abdt tg inuidur1 inuidur2 female black hispanic othrace \
0 0 10824 0 2.890372 18 0 0 0 0
1 3 10824 0 0.000000 1 0 0 0 0
2 4 10747 0 3.295837 27 0 0 0 0
3 11 10607 1 2.197225 9 0 0 0 0
4 12 10831 0 3.295837 27 0 0 0 0
dep ... recall agelt35 agegt54 durable nondurable lusd husd muld \
0 2 ... 0 0 0 0 0 0 1 0
1 0 ... 0 0 0 0 0 1 0 0
2 0 ... 0 0 0 0 0 1 0 0
3 0 ... 0 1 0 0 0 0 0 1
4 1 ... 0 0 1 1 0 1 0 0
dep1 dep2
0 0.0 1.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
4 1.0 0.0
[5 rows x 26 columns]
Create a data backend.
# Specify the data and variables for the causal model
from doubleml import DoubleMLData
dml_data_bonus = DoubleMLData(df_bonus,
                              y_col='inuidur1',
                              d_cols='tg',
                              x_cols=['female', 'black', 'othrace', 'dep1', 'dep2',
                                      'q2', 'q3', 'q4', 'q5', 'q6', 'agelt35', 'agegt54',
                                      'durable', 'lusd', 'husd'])
print(dml_data_bonus)
================== DoubleMLData Object ==================
------------------ Data summary ------------------
Outcome variable: inuidur1
Treatment variable(s): ['tg']
Covariates: ['female', 'black', 'othrace', 'dep1', 'dep2', 'q2', 'q3', 'q4', 'q5', 'q6', 'agelt35', 'agegt54', 'durable', 'lusd', 'husd']
Instrument variable(s): None
No. Observations: 5099
------------------ DataFrame info ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5099 entries, 0 to 5098
Columns: 26 entries, index to dep2
dtypes: float64(3), int64(23)
memory usage: 1.0 MB
Create two learners for the nuisance components using scikit-learn.
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
# Base learner; each nuisance component gets its own unfitted copy
learner = RandomForestRegressor(n_estimators=500, max_features='sqrt', max_depth=5)
ml_l_bonus = clone(learner)
ml_m_bonus = clone(learner)
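To give a rough idea of what the two nuisance learners are used for, the sketch below imitates the partialling-out approach on simulated data: one learner predicts the outcome from the covariates, the other predicts the treatment from the covariates, and the treatment effect is then estimated from the two sets of out-of-fold residuals. This is a simplified illustration and not DoubleML's implementation. It assumes a single covariate, replaces the random forests with a plain least-squares line, and assigns folds by index instead of at random. The names `fit_line` and `plr_partialling_out` are made up for this example.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def plr_partialling_out(x, d, y, n_folds=5):
    """Cross-fitted partialling-out estimate of the effect of d on y."""
    n = len(y)
    y_res = [0.0] * n
    d_res = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        test = [i for i in range(n) if i % n_folds == k]
        x_train = [x[i] for i in train]
        # Stand-in for ml_l: predict the outcome from the covariate
        a_l, b_l = fit_line(x_train, [y[i] for i in train])
        # Stand-in for ml_m: predict the treatment from the covariate
        a_m, b_m = fit_line(x_train, [d[i] for i in train])
        # Residuals are computed only on the held-out fold
        for i in test:
            y_res[i] = y[i] - (a_l + b_l * x[i])
            d_res[i] = d[i] - (a_m + b_m * x[i])
    # Regress outcome residuals on treatment residuals (no intercept)
    num = sum(dr * yr for dr, yr in zip(d_res, y_res))
    den = sum(dr * dr for dr in d_res)
    return num / den


# Simulated data with a known treatment effect (values chosen for illustration)
rng = random.Random(3141)
true_theta = 0.5
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [true_theta * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = plr_partialling_out(x, d, y)
print(f"true effect: {true_theta}, estimate: {theta_hat:.3f}")
```

In the tutorial example, the random forest copies `ml_l_bonus` and `ml_m_bonus` take the place of the two fitted lines, and DoubleML handles the sample splitting and inference.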
Create a new instance of a causal model, here a partially linear regression model via DoubleMLPLR.
import numpy as np
from doubleml import DoubleMLPLR
# Fix the seed so that the sample splitting is reproducible
np.random.seed(3141)
obj_dml_plr_bonus = DoubleMLPLR(dml_data_bonus, ml_l_bonus, ml_m_bonus)
obj_dml_plr_bonus.fit()
print(obj_dml_plr_bonus)
================== DoubleMLPLR Object ==================
------------------ Data summary ------------------
Outcome variable: inuidur1
Treatment variable(s): ['tg']
Covariates: ['female', 'black', 'othrace', 'dep1', 'dep2', 'q2', 'q3', 'q4', 'q5', 'q6', 'agelt35', 'agegt54', 'durable', 'lusd', 'husd']
Instrument variable(s): None
No. Observations: 5099
------------------ Score & algorithm ------------------
Score function: partialling out
------------------ Machine learner ------------------
Learner ml_l: RandomForestRegressor(max_depth=5, max_features='sqrt', n_estimators=500)
Learner ml_m: RandomForestRegressor(max_depth=5, max_features='sqrt', n_estimators=500)
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[1.200303]]
Learner ml_m RMSE: [[0.47419634]]
------------------ Resampling ------------------
No. folds: 5
No. repeated sample splits: 1
------------------ Fit summary ------------------
coef std err t P>|t| 2.5 % 97.5 %
tg -0.076684 0.035411 -2.165549 0.030346 -0.146087 -0.00728
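The columns of the fit summary are connected by simple arithmetic, which makes for a quick sanity check of your own output. The sketch below recomputes the t statistic, p-value and 95% confidence interval from the coefficient and standard error shown above. It assumes a standard normal approximation, and the function `summary_stats` is a name made up for this illustration. Small differences in the last digit can occur because the inputs are already rounded.

```python
from statistics import NormalDist


def summary_stats(coef, std_err, level=0.95):
    """Recompute t, two-sided p-value and confidence interval (normal approx.)."""
    normal = NormalDist()
    t_stat = coef / std_err
    p_value = 2 * (1 - normal.cdf(abs(t_stat)))
    # Normal quantile for a two-sided interval, e.g. about 1.96 for 95%
    z = normal.inv_cdf(1 - (1 - level) / 2)
    return t_stat, p_value, coef - z * std_err, coef + z * std_err


# Coefficient and standard error for 'tg' taken from the summary above
t_stat, p_value, ci_lower, ci_upper = summary_stats(-0.076684, 0.035411)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
```

Because the random forests and the sample split involve randomness, your own numbers may differ slightly from the ones shown here if your package versions or seed differ.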
Ready to Go 🚀
Once you are able to run this code, you are ready for our tutorial!
Questions and Contact
In case you have any questions, please contact us via trainings@economicai.com.