dmlalg

The dmlalg package contains implementations of double machine learning algorithms in R.

Installation

You can install the released version of dmlalg from CRAN with:

install.packages("dmlalg")

Partially linear models with confounding variables

The aim of this first set of functions is to perform inference for the linear parameter in partially linear models with confounding variables. The standard DML estimator of the linear parameter has a two-stage least squares interpretation (a short plain-R sketch of this interpretation follows the example below), which can lead to a large variance and overwide confidence intervals. We apply regularization to reduce the variance of the estimator, which yields narrower confidence intervals that remain approximately valid. Nuisance terms can be flexibly estimated with machine learning algorithms.

This algorithm is described in Emmenegger and Bühlmann (2021b) and implemented in the function regsdml.

  • regsdml computes the estimate of the linear parameter in a partially linear model with endogenous variables using regularized and standard double machine learning methods.
  • summary method for objects fitted with regsdml
  • confint method for objects fitted with regsdml
  • coef method for objects fitted with regsdml
  • vcov method for objects fitted with regsdml
  • print method for objects fitted with regsdml

Example

The following basic example shows how to generate data, fit a model with regsdml, and apply the summary, confint, coef, and vcov methods:

library(dmlalg)

## Generate some data:
set.seed(19)
# true linear parameter
beta0 <- 1
n <- 40
# observed confounder
w <- pi * runif(n, -1, 1)
# instrument
a <- 3 * tanh(2 * w) + rnorm(n, 0, 1)
# unobserved confounder
h <- 2 * sin(w) + rnorm(n, 0, 1)
# linear covariate
x <- -1 * abs(a) - h - 2 * tanh(w) + rnorm(n, 0, 1)
# response
y <- beta0 * x - 3 * cos(pi * 0.25 * h) + 0.5 * w ^ 2 + rnorm(n, 0, 1)

## Estimate the linear coefficient from x to y
## (The parameters are chosen small enough to make estimation fast):
## Caveat: A spline estimator is extrapolated, which raises a warning message. 
## Extrapolation lies in the nature of our method. To omit the warning message
## resulting from the spline estimator, another estimator may be used. 
fit <- regsdml(a, w, x, y,
               gamma = exp(seq(-4, 1, length.out = 4)),
               S = 3,
               do_regDML_all_gamma = TRUE,
               cond_method = c("forest",  # for E[A|W]
                               "spline",  # for E[X|W]
                               "spline"), # for E[Y|W]
               params = list(list(ntree = 1), NULL, NULL))
#> Warning in print_W_E_fun(errors, warningMsgs): 
#> Warning messages:
#> some 'x' values beyond boundary knots may cause ill-conditioned bases
## parm = c(2, 3) prints an additional summary for the 2nd and 3rd gamma-values
summary(fit, parm = c(2, 3),
        correlation = TRUE,
        print_gamma = TRUE) 
#> 
#> Coefficients :
#> regsDML (2.72e+00) :
#>    Estimate Std. Error  z value     Pr(>|z|)
#> b1 0.910255  0.1731559 5.256852 1.465421e-07
#> 
#> regDMLall (9.70e-02) :
#>     Estimate Std. Error  z value     Pr(>|z|)
#> b1 0.7986392  0.1514027 5.274935 1.328031e-07
#> 
#> regDMLall (5.13e-01) :
#>    Estimate Std. Error  z value     Pr(>|z|)
#> b1 0.846176  0.1651298 5.124308 2.986318e-07
#> 
#> 
#> Variance-covariance matrices :
#> regsDML (2.72e+00) :
#>            b1
#> b1 0.02998297
#> 
#> regDMLall (9.70e-02) :
#>            b1
#> b1 0.02292277
#> 
#> regDMLall (5.13e-01) :
#>            b1
#> b1 0.02726785
confint(fit, parm = c(2, 3),
        print_gamma = TRUE) 
#> 
#> Two-sided confidence intervals at level 0.95 : 
#> 
#> regsDML (2.72e+00) :
#>        2.5 %   97.5 %
#> b1 0.5708757 1.249634
#> 
#> regDMLall (9.70e-02) :
#>        2.5 %   97.5 %
#> b1 0.5018955 1.095383
#> 
#> regDMLall (5.13e-01) :
#>        2.5 %   97.5 %
#> b1 0.5225276 1.169824
coef(fit) # coefficients
#>     regsDML
#> b1 0.910255
vcov(fit) # variance-covariance matrices
#> 
#> Variance-covariance matrices :
#> regsDML :
#>            b1
#> b1 0.02998297

## Alternatively, provide the data in a single data frame
## (see also caveat above):
data <- data.frame(a = a, w = w, x = x, y = y)
fit <- regsdml(a = "a", w = "w", x = "x", y = "y", data = data,
               gamma = exp(seq(-4, 1, length.out = 4)),
               S = 3)
#> Warning in print_W_E_fun(errors, warningMsgs): 
#> Warning messages:
#> some 'x' values beyond boundary knots may cause ill-conditioned bases

## With more realistic parameter choices:
if (FALSE) {
  fit <- regsdml(a, w, x, y,
                 cond_method = c("forest",  # for E[A|W]
                                 "spline",  # for E[X|W]
                                 "spline")) # for E[Y|W]
  summary(fit)
  confint(fit)

  ## Alternatively, provide the data in a single data frame:
  ## (see also caveat above):
  data <- data.frame(a = a, w = w, x = x, y = y)
  fit <- regsdml(a = "a", w = "w", x = "x", y = "y", data = data)
}
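
To make the two-stage least squares interpretation mentioned above concrete, here is a minimal sketch in plain R. It is an illustration only, not what regsdml implements internally (in particular, it uses neither sample splitting/cross-fitting nor the regularized regDML/regsDML estimators): the variables a, x, and y from the example above are residualized on w with smoothing splines, and the just-identified instrumental-variable estimate is then computed from the residuals.

## Sketch of the two-stage least squares interpretation of the standard
## DML estimator (illustration only; no cross-fitting, no regularization):
r_a <- a - predict(smooth.spline(w, a), w)$y  # estimate of A - E[A|W]
r_x <- x - predict(smooth.spline(w, x), w)$y  # estimate of X - E[X|W]
r_y <- y - predict(smooth.spline(w, y), w)$y  # estimate of Y - E[Y|W]
sum(r_a * r_y) / sum(r_a * r_x)  # IV estimate of beta0 with instrument r_a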

Estimating linear coefficients in partially linear mixed-effects models with repeated measurements using double machine learning

The aim of this second set of functions is to estimate and perform inference for the linear coefficient in a partially linear mixed-effects model with DML. Machine learning algorithms allow us to incorporate more complex interaction structures and high-dimensional variables.

This algorithm is described in Emmenegger and Bühlmann (2021a) and implemented in the function mmdml.
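
As a rough schematic (the exact formulation and assumptions are given in Emmenegger and Bühlmann, 2021a; the notation here is only illustrative), the response of group i is modeled as

Y_i = X_i \beta_0 + g(W_i) + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, \Sigma_b), \quad \varepsilon_i \sim N(0, \sigma^2 I),

where \beta_0 is the linear (fixed-effects) coefficient of interest, g is a potentially complex function of the covariates W_i that is estimated with machine learning, Z_i is the random-effects design matrix (specified through the z and z_formula arguments of mmdml), and b_i are the group-specific random effects.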

  • mmdml computes the estimate of the linear parameter in a partially linear mixed-effects model using double machine learning methods.
  • confint method for objects fitted with mmdml
  • fixef method for objects fitted with mmdml
  • print method for objects fitted with mmdml
  • ranef method for objects fitted with mmdml
  • residuals method for objects fitted with mmdml
  • sigma method for objects fitted with mmdml
  • summary method for objects fitted with mmdml
  • vcov method for objects fitted with mmdml
  • VarCorr method for objects fitted with mmdml

Example

The following basic example shows how to generate data, fit models with mmdml, and apply the accessor methods listed above:

library(dmlalg)

## generate data
RNGkind("L'Ecuyer-CMRG")
set.seed(19)
data1 <- example_data_mmdml(beta0 = 0.2)
data2 <- example_data_mmdml(beta0 = c(0.2, 0.2))

## fit models
## Caveat: Warning messages are displayed because the small number of
## observations results in a singular random effects model
fit1 <-
  mmdml(w = c("w1", "w2", "w3"), x = "x1", y = "resp", z = c("id", "cask"),
        data = data1, z_formula = "(1|id) + (1|cask:id)", group = "id", S = 3)
#> Warning in mmdml(w = c("w1", "w2", "w3"), x = "x1", y = "resp", z = c("id", : 
#> Warning messages:
#> boundary (singular) fit: see ?isSingular

fit2 <-
  mmdml(w = c("w1", "w2", "w3"), x = c("x1", "x2"), y = "resp", z = c("id", "cask"),
        data = data2, z_formula = "(1|id) + (1|cask:id)", group = "id", S = 3)
#> Warning in mmdml(w = c("w1", "w2", "w3"), x = c("x1", "x2"), y = "resp", : 
#> Warning messages:
#> boundary (singular) fit: see ?isSingular

## apply methods
confint(fit2)
#>           2.5%     97.5%
#> x1 -0.03415795 0.3480103
#> x2  0.15930098 0.3893938
fixef(fit2)
#>        x1        x2 
#> 0.1569261 0.2743474
print(fit2)
#> Semiparametric mixed model fit by maximum likelihood ['mmdml']
#> Random effects:
#>  Groups   Name        Std.Dev. 
#>  cask:id  (Intercept) 1.908e-06
#>  id       (Intercept) 1.107e-01
#>  Residual             2.756e-01
#> Number of obs: 46, groups:  cask:id, 20; id, 10
#> Fixed Effects:
#>     x1      x2  
#> 0.1569  0.2743  
#> optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
ranef(fit2)
#> $`cask:id`
#>        (Intercept)
#> 1:1  -0.0023043914
#> 1:10 -0.0050894736
#> 1:2   0.0024571669
#> 1:3   0.0007708872
#> 1:4  -0.0012417525
#> 1:5   0.0029010344
#> 1:6   0.0012307712
#> 1:7  -0.0028418387
#> 1:8  -0.0015618712
#> 1:9  -0.0048037635
#> 2:1   0.0100768089
#> 2:10 -0.0031560819
#> 2:2  -0.0033427429
#> 2:3  -0.0044928425
#> 2:4  -0.0054049237
#> 2:5  -0.0021157461
#> 2:6  -0.0023122280
#> 2:7   0.0038004751
#> 2:8   0.0148222090
#> 2:9   0.0026385335
#> 
#> $id
#>     (Intercept)
#> 1   0.100740957
#> 10 -0.124434023
#> 2  -0.036918731
#> 3  -0.030230821
#> 4  -0.081051109
#> 5   0.018887512
#> 6  -0.006711504
#> 7   0.025545300
#> 8   0.235373382
#> 9  -0.020965920
residuals(fit2)
#> [[1]]
#>  [1] -0.1311195998  0.5733692328  0.1398125051 -0.0705463911 -0.1196552839
#>  [6] -0.0354080600  0.6205378654 -0.1057642425 -0.4355021749 -0.0633888854
#> [11]  0.0070044016 -0.1777683530 -0.0214893719  0.0052358066  0.1594839987
#> [16] -0.2353753755 -0.2216497409 -0.1034882421  0.0175984650 -0.0388497525
#> [21]  0.4636325671 -0.2597143034  0.3528825573 -0.4739722035  0.0007039458
#> [26]  0.0700307380 -0.1315655000 -0.1617002846  0.2162465843  0.0934414339
#> [31] -0.0480554546 -0.1342562672 -0.2349311153  0.4021334289  0.6761796261
#> [36]  0.3514207835 -0.0918140917 -0.2144924370 -0.3184478283 -0.2704273590
#> [41] -0.1953366308  0.7209607369 -0.1050645053 -0.2895904461 -0.2737160112
#> [46]  0.0941353224
#> 
#> [[2]]
#>  [1]  0.066708484  0.381936532  0.083961541 -0.244607521 -0.116940987
#>  [6] -0.015024540  0.605540877  0.128223071 -0.186010749 -0.119432458
#> [11] -0.101885530 -0.153724682 -0.214346785 -0.126400135  0.090522034
#> [16] -0.103818112 -0.170763502 -0.102507199  0.047067741 -0.026325741
#> [21]  0.472126666 -0.231575911  0.324749223 -0.423215690 -0.013990681
#> [26]  0.066537726 -0.086954711 -0.025470109  0.227756255  0.224213587
#> [31] -0.070700603  0.081484834 -0.226268534  0.615553468  0.723110460
#> [36]  0.333538915 -0.076459138 -0.198241935 -0.245660371 -0.366166157
#> [41] -0.142947352  0.677671159 -0.047532882 -0.305555800 -0.379445954
#> [46]  0.007159723
#> 
#> [[3]]
#>  [1]  0.09685066  0.34629638  0.09582384 -0.27150981 -0.12653048 -0.02387543
#>  [7]  0.62488259  0.12730531 -0.19466784 -0.12227940 -0.07635676 -0.16470188
#> [13] -0.20223445 -0.11432450  0.13844295 -0.12234863 -0.18662475 -0.09034621
#> [19]  0.07330126 -0.02704395  0.51049151 -0.23716208  0.36116367 -0.42669942
#> [25] -0.02948530  0.10139429 -0.06858354 -0.03611104  0.19153360  0.21971922
#> [31] -0.04085530  0.09453877 -0.20903814  0.60734696  0.69658489  0.33318587
#> [37] -0.09082740 -0.21317885 -0.24276713 -0.34992920 -0.09491974  0.68198892
#> [43] -0.07291051 -0.24350682 -0.40714805  0.05067157
sigma(fit2)
#> [1] 0.2756384
summary(fit2)
#> Semiparametric mixed model fit by maximum likelihood ['mmdml']
#> Scaled residuals (nr_res = 3): 
#>     Min      1Q  Median      3Q     Max 
#> -1.7195 -0.6674 -0.2394  0.3637  2.6234 
#> 
#> Random effects:
#>  Groups   Name        Variance  Std.Dev. 
#>  cask:id  (Intercept) 3.641e-12 1.908e-06
#>  id       (Intercept) 1.226e-02 1.107e-01
#>  Residual             7.598e-02 2.756e-01
#> Number of obs: 46, groups:  cask:id, 20; id, 10
#> 
#> Fixed effects:
#>    Estimate Std. Error z value Pr(>|z|)    
#> x1  0.15693    0.09749   1.610    0.107    
#> x2  0.27435    0.05870   4.674 2.96e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>    x1    
#> x2 -0.029
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see ?isSingular
vcov(fit2)
#> 2 x 2 Matrix of class "dpoMatrix"
#>               x1            x2
#> x1  9.505018e-03 -9.208662e-05
#> x2 -9.208662e-05  3.445483e-03
VarCorr(fit2)
#>  Groups   Name        Std.Dev.  
#>  cask:id  (Intercept) 1.9083e-06
#>  id       (Intercept) 1.1074e-01
#>  Residual             2.7564e-01

Emmenegger, Corinne, and Peter Bühlmann. 2021a. “Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements.” Preprint arXiv:2108.13657. https://arxiv.org/abs/2108.13657.

———. 2021b. “Regularizing Double Machine Learning in Partially Linear Endogenous Models.” Preprint arXiv:2101.12525. https://arxiv.org/abs/2101.12525.

Version: 1.0.2
License: GPL (>= 3)
Maintainer: Corinne Emmenegger
Last Published: February 3rd, 2022

Functions in dmlalg (1.0.2)

  • coef.regsdml: Accessing the coefficients of regsdml fits
  • example_data_mmdml: Generate data from a partially linear mixed-effects model
  • confint.regsdml: Confidence Intervals for coefficient estimates of regsdml fits
  • mmdml: Estimating linear coefficients in partially linear mixed-effects models with repeated measurements using double machine learning
  • confint.mmdml: Confidence Intervals for coefficient estimates of mmdml fits
  • print.mmdml: Printing mmdml fits
  • lme4-extractors: Extract Components from 'mmdml' Fits Imported from 'lme4'
  • dmlalg: dmlalg: double machine learning algorithms
  • print.regsdml: Printing regsdml fits
  • vcov.regsdml: Accessing the variance-covariance matrices of regsdml fits
  • residuals.mmdml: Accessing the residuals of mmdml fits
  • sigma.mmdml: Extract Residual Standard Deviation 'Sigma' from mmdml Fits
  • summary.regsdml: Summarizing regsdml fits
  • summary.mmdml: Summarizing mmdml fits
  • regsdml: Estimating linear coefficients with double machine learning (DML)