Get Started with dte_adj
========================

This page covers basic usage of the dte_adj library.

Generate data for training the cumulative distribution function:

.. code-block:: python

    import numpy as np

    def generate_data(n, d_x=100, rho=0.5):
        """
        Generate data according to the described data generating process (DGP).

        Args:
            n (int): Number of samples.
            d_x (int): Number of covariates. Default is 100.
            rho (float): Success probability for the Bernoulli distribution. Default is 0.5.

        Returns:
            X (np.ndarray): Covariates matrix of shape (n, d_x).
            D (np.ndarray): Treatment variable array of shape (n,).
            Y (np.ndarray): Outcome variable array of shape (n,).
        """
        # Generate covariates X from a uniform distribution on (0, 1)
        X = np.random.uniform(0, 1, (n, d_x))

        # Generate treatment variable D from a Bernoulli distribution with success probability rho
        D = np.random.binomial(1, rho, n)

        # Define beta_j and gamma_j according to the problem statement
        beta = np.zeros(d_x)
        gamma = np.zeros(d_x)

        # Set the first 50 values of beta and gamma to 1
        beta[:50] = 1
        gamma[:50] = 1

        # Compute the outcome Y
        U = np.random.normal(0, 1, n)  # Error term
        linear_term = np.dot(X, beta)
        quadratic_term = np.dot(X**2, gamma)

        # Outcome equation
        Y = 5 * D + linear_term + quadratic_term + U

        return X, D, Y

    n = 1000  # Sample size
    X, D, Y = generate_data(n)

Then, build an empirical cumulative distribution function (CDF).

.. code-block:: python

    import dte_adj
    from dte_adj.plot import plot

    estimator = dte_adj.SimpleDistributionEstimator()
    estimator.fit(X, D, Y)
    locations = np.linspace(Y.min(), Y.max(), 20)
    cdf = estimator.predict(1, locations)

The distributional treatment effect (DTE) can be computed with the following code.

.. code-block:: python

    dte, lower_bound, upper_bound = estimator.predict_dte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        locations=locations,
        variance_type="simple",
    )

A convenience function is available to visualize distribution effects. It can also be used for other distribution parameters, including the Probability Treatment Effect (PTE) and the Quantile Treatment Effect (QTE).

.. code-block:: python

    plot(locations, dte, lower_bound, upper_bound, title="DTE of simple estimator")

.. image:: _static/dte_empirical.png
   :alt: DTE of empirical estimator
   :height: 300px
   :width: 450px
   :align: center

To initialize the adjusted distribution estimator, pass a base model for the conditional distribution function. The following example uses logistic regression. Make sure that your base model implements the ``fit`` and ``predict_proba`` methods.

.. code-block:: python

    from sklearn.linear_model import LogisticRegression

    logit = LogisticRegression()
    estimator = dte_adj.AdjustedDistributionEstimator(logit, folds=3)
    estimator.fit(X, D, Y)
    cdf = estimator.predict(1, locations)

The DTE can be computed and visualized with the following code.

.. code-block:: python

    dte, lower_bound, upper_bound = estimator.predict_dte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        locations=locations,
        variance_type="simple",
    )
    plot(locations, dte, lower_bound, upper_bound,
         title="DTE of adjusted estimator with simple confidence band")

.. image:: _static/dte_simple.png
   :alt: DTE of adjusted estimator with simple confidence band
   :height: 300px
   :width: 450px
   :align: center

Confidence bands can be computed in different ways. The following code uses the moment method to compute the confidence band.

.. code-block:: python

    dte, lower_bound, upper_bound = estimator.predict_dte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        locations=locations,
        variance_type="moment",
    )
    plot(locations, dte, lower_bound, upper_bound,
         title="DTE of adjusted estimator with moment confidence band")

.. image:: _static/dte_moment.png
   :alt: DTE of adjusted estimator with moment confidence band
   :height: 300px
   :width: 450px
   :align: center

A uniform confidence band is computed when ``uniform`` is specified for the ``variance_type`` argument.

.. code-block:: python

    dte, lower_bound, upper_bound = estimator.predict_dte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        locations=locations,
        variance_type="uniform",
    )
    plot(locations, dte, lower_bound, upper_bound,
         title="DTE of adjusted estimator with uniform confidence band")

.. image:: _static/dte_uniform.png
   :alt: DTE of adjusted estimator with uniform confidence band
   :height: 300px
   :width: 450px
   :align: center

To compute the PTE, use the ``predict_pte`` method.

.. code-block:: python

    pte, lower_bound, upper_bound = estimator.predict_pte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        width=1,
        locations=locations,
        variance_type="simple",
    )
    plot(locations, pte, lower_bound, upper_bound, chart_type="bar",
         title="PTE of adjusted estimator with simple confidence band")

.. image:: _static/pte_empirical.png
   :alt: PTE of adjusted estimator with simple confidence band
   :height: 300px
   :width: 450px
   :align: center

To compute the QTE, use the ``predict_qte`` method. The confidence band is computed by the bootstrap method.

.. code-block:: python

    quantiles = np.array([0.1 * i for i in range(1, 10)], dtype=np.float32)
    qte, lower_bound, upper_bound = estimator.predict_qte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        quantiles=quantiles,
        n_bootstrap=30,
    )
    plot(quantiles, qte, lower_bound, upper_bound, title="QTE of adjusted estimator")

.. image:: _static/qte.png
   :alt: QTE of adjusted estimator
   :height: 300px
   :width: 450px
   :align: center

You can use any model with a ``predict_proba`` or ``predict`` method to adjust the distribution function estimation. For example, the following code uses an XGBoost classifier to estimate the conditional distribution.

.. code-block:: python

    import xgboost as xgb

    estimator = dte_adj.AdjustedDistributionEstimator(xgb.XGBClassifier(), folds=3)
    estimator.fit(X, D, Y)
    cdf = estimator.predict(1, locations)

The ``predict_dte`` and ``predict_pte`` methods provide an option to train a model for multiple locations simultaneously. To enable this feature, pass ``is_multi_task=True``.

.. code-block:: python

    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    estimator = dte_adj.AdjustedDistributionEstimator(model, folds=3)
    estimator.fit(X, D, Y)
    dte, lower_bound, upper_bound = estimator.predict_dte(
        target_treatment_arm=1,
        control_treatment_arm=0,
        is_multi_task=True,
        locations=locations,
        variance_type="moment",
    )
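
To make the estimated quantity concrete, the following NumPy-only sketch computes an unadjusted DTE as the difference between the empirical CDFs of two treatment arms, evaluated at each location. It is an illustration rather than the library's implementation: the helper names ``empirical_cdf`` and ``naive_dte`` are hypothetical, the "target minus control" sign convention is an assumption, and no confidence band is computed.

.. code-block:: python

    import numpy as np

    def empirical_cdf(y, locations):
        """Return the share of outcomes at or below each location."""
        y = np.asarray(y, dtype=float)
        locations = np.asarray(locations, dtype=float)
        # Compare every outcome with every location, then average over outcomes.
        return (y[:, None] <= locations[None, :]).mean(axis=0)

    def naive_dte(D, Y, locations, target=1, control=0):
        """Difference of empirical CDFs: target arm minus control arm (assumed convention)."""
        D = np.asarray(D)
        Y = np.asarray(Y, dtype=float)
        cdf_target = empirical_cdf(Y[D == target], locations)
        cdf_control = empirical_cdf(Y[D == control], locations)
        return cdf_target - cdf_control

    # Toy data: the treated arm (D == 1) has uniformly larger outcomes.
    Y_toy = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
    D_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    toy_locations = np.array([2.0, 5.0, 8.0])

    toy_dte = naive_dte(D_toy, Y_toy, toy_locations)
    print(toy_dte)

Negative values indicate that, at a given location, a smaller share of treated outcomes falls at or below it than control outcomes, which is what a positive shift in the outcome distribution would produce.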