Abstract
We consider estimation of and inference for the mean outcome under the optimal dynamic two time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to depend only on a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism. This contrasts with the current literature that relies on parametric assumptions. We establish that the mean of the counterfactual outcome under the optimal dynamic treatment is a pathwise differentiable parameter under conditions, and develop a targeted minimum loss-based estimator (TMLE) of this target parameter. We establish asymptotic linearity and statistical inference for this estimator under specified conditions. In a sequentially randomized trial the statistical inference relies upon a second-order difference between the estimator of the optimal dynamic treatment and the optimal dynamic treatment to be asymptotically negligible, which may be a problematic condition when the rule is based on multivariate time-dependent covariates. To avoid this condition, we also develop TMLEs and statistical inference for data adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment. In particular, we develop a novel cross-validated TMLE approach that provides asymptotic inference under minimal conditions, avoiding the need for any empirical process conditions. We offer simulation results to support our theoretical findings.
1 Introduction
Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial (RCT) in which one follows up a group of subjects, and treatment assignment at two time points is sequentially randomized, where the probability of receiving treatment might be determined by a baseline covariate for the first-line treatment, and time-dependent intermediate covariate (such as a biomarker of interest) for the second-line treatment [1]. Such trials are often called sequential multiple assignment randomized trials (SMART). A dynamic treatment rule deterministically assigns treatment as a function of the available history. If treatment is assigned at two time points, then this dynamic treatment rule consists of two rules, one for each time point [1–4]. The mean outcome under a dynamic treatment is a counterfactual quantity of interest representing what the mean outcome would have been if everybody had received treatment according to the dynamic treatment rule [5–11]. Dynamic treatments represent prespecified multiple time-point interventions that at each treatment-decision stage are allowed to respond to the currently available treatment and covariate history. Examples of multiple time-point dynamic treatment regimes are given in Lavori and Dawson [12, 13]; Murphy [14]; Rosthøj et al. [15]; Thall et al. [16, 17]; Wagner et al. [18]; Petersen et al. [19]; van der Laan and Petersen [20]; and Robins et al. [21], ranging from rules that change the dose of a drug, change or augment the treatment, to making a decision on when to start a new treatment, in response to the history of the subject.
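To make the notion of a two time-point dynamic rule and its mean outcome concrete, the following Python sketch simulates a toy SMART and compares a dynamic rule with a static one. Everything in it is an illustrative assumption rather than part of this article: the data-generating mechanism, the two example rules, and the use of a plain inverse probability weighted estimator with the known randomization probabilities (the estimators studied in this article are TMLEs, not this simple weighted mean).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical SMART: both treatments are randomized with probability 1/2.
L0 = rng.normal(size=n)                        # baseline covariate
A0 = rng.integers(0, 2, size=n)                # first-line treatment
L1 = 0.5 * L0 + 0.3 * A0 + rng.normal(size=n)  # intermediate covariate
A1 = rng.integers(0, 2, size=n)                # second-line treatment
# Illustrative outcome: each treatment helps only when the latest covariate is positive.
Y = A0 * np.sign(L0) + A1 * np.sign(L1) + rng.normal(size=n)


def rule_0(l0):
    """First-stage rule: treat when the baseline covariate is positive."""
    return (l0 > 0).astype(int)


def rule_1(a0, l1):
    """Second-stage rule: treat when the intermediate covariate is positive."""
    return (l1 > 0).astype(int)


def treat_all_0(l0):
    """Static first-stage rule: always treat."""
    return np.ones(len(l0), dtype=int)


def treat_all_1(a0, l1):
    """Static second-stage rule: always treat."""
    return np.ones(len(l1), dtype=int)


def ipw_mean_under_rule(Y, L0, A0, L1, A1, d0, d1, g0=0.5, g1=0.5):
    """Weighted mean of Y among subjects whose observed treatments match the rule.

    g0 and g1 are the known randomization probabilities of the observed
    treatment at each time point (assumed constant in this toy example).
    """
    follows = (A0 == d0(L0)) & (A1 == d1(A0, L1))
    return float(np.mean(follows * Y / (g0 * g1)))


mean_dynamic = ipw_mean_under_rule(Y, L0, A0, L1, A1, rule_0, rule_1)
mean_static = ipw_mean_under_rule(Y, L0, A0, L1, A1, treat_all_0, treat_all_1)
print(f"dynamic rule: {mean_dynamic:.3f}, always treat: {mean_static:.3f}")
```

In this toy setting the dynamic rule responds to the covariate history and therefore attains a higher mean outcome than the static always-treat regime.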
More recently, SMART designs have been implemented in practice: Lavori and Dawson [12, 22]; Murphy [14]; Thall et al. [16]; Chakraborty et al. [23]; Kasari [24]; Lei et al. [25]; Nahum-Shani et al. [26, 27]; Jones [28]. For an extensive list of SMARTs, we refer the reader to the website http://methodology.psu.edu/ra/adap-inter/projects. For an excellent and recent overview of the literature on dynamic treatments we refer to Chakraborty and Murphy [29].
We define the optimal dynamic multiple time-point treatment regime as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to only respond to a user-supplied subset of the baseline and intermediate covariates. The literature on Q-learning shows that we can describe the optimal dynamic treatment among all dynamic treatments in a sequential manner [14, 30–33]. The optimal rule can be learned through fitting the likelihood and then calculating the optimal rule under this fit of the likelihood. This approach can be implemented with maximum likelihood estimation based on parametric models. It has been noted (e.g., Robins [32], Chakraborty and Murphy [29]) that the estimator of the parameters of one of the regressions (except the first one) when using parametric regression models is a non-smooth function of the estimator of the parameters of the previous regression, and that this results in non-regularity of the estimators of the parameter vector. This raises challenges for obtaining statistical inference, even when assuming that these parametric regression models are correctly specified. Chakraborty and Murphy [29] discuss various approaches and advances that aim to resolve this delicate issue such as inverting hypothesis testing [32], establishing non-normal limit distributions of the estimators (E. Laber, D. Lizotte, M. Qian, S. Murphy, submitted), or using the m out of n bootstrap.
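The sequential description used in Q-learning can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from this article: linear working models with treatment-by-covariate interactions, a toy data-generating mechanism, and hypothetical function names. The `np.maximum` step is the non-smooth operation that makes the first-stage fit a non-smooth function of the second-stage fit.

```python
import numpy as np


def second_design(l0, a0, l1, a1):
    """Design matrix of an assumed linear working model for E[Y | L(0), A(0), L(1), A(1)]."""
    return np.column_stack([np.ones_like(l0), l0, a0, a0 * l0, l1, a1, a1 * l1])


def first_design(l0, a0):
    """Design matrix of an assumed linear working model for the pseudo-outcome given (L(0), A(0))."""
    return np.column_stack([np.ones_like(l0), l0, a0, a0 * l0])


def least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def q_learning(L0, A0, L1, A1, Y):
    """Backward induction: fit the second-stage regression first, then the first-stage one."""
    beta2 = least_squares(second_design(L0, A0, L1, A1), Y)

    def q2(l0, a0, l1, a1_value):
        return second_design(l0, a0, l1, np.full_like(l1, a1_value)) @ beta2

    def d1(l0, a0, l1):
        # Second-stage rule: pick the treatment with the larger fitted mean.
        return (q2(l0, a0, l1, 1.0) > q2(l0, a0, l1, 0.0)).astype(int)

    # Pseudo-outcome: fitted mean under the better second-stage treatment (non-smooth max).
    pseudo = np.maximum(q2(L0, A0, L1, 1.0), q2(L0, A0, L1, 0.0))
    beta1 = least_squares(first_design(L0, A0), pseudo)

    def q1(l0, a0_value):
        return first_design(l0, np.full_like(l0, a0_value)) @ beta1

    def d0(l0):
        # First-stage rule, assuming the second-stage rule d1 is followed afterwards.
        return (q1(l0, 1.0) > q1(l0, 0.0)).astype(int)

    return d0, d1


# Toy data (an assumption for illustration): treatment helps when the covariate is positive.
rng = np.random.default_rng(0)
n = 5000
L0 = rng.normal(size=n)
A0 = rng.integers(0, 2, size=n).astype(float)
L1 = 0.5 * L0 + rng.normal(size=n)
A1 = rng.integers(0, 2, size=n).astype(float)
Y = A0 * L0 + A1 * L1 + rng.normal(size=n)

d0, d1 = q_learning(L0, A0, L1, A1, Y)
grid = np.array([-1.0, 1.0])
print(d0(grid), d1(grid, np.ones(2), grid))
```

Because the working models here happen to be correctly specified for the toy data, the fitted rules recover the sign-based optimal rules; with misspecified parametric models no such guarantee holds.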
Murphy [30] and Robins [31, 32] developed structural nested mean models tailored to optimal dynamic treatments. These models assume a parametric model for the “blip function” defined as the additive effect of a blip in current treatment on a counterfactual outcome, conditional on the observed past, in the counterfactual world in which future treatment is assigned optimally. Statistical inference for the parameters of the blip function proceeds accordingly, but Robins [32] points out the irregularity of the estimator, resulting in some serious challenges for statistical inference as referenced above. Structural nested mean models have also been generalized to blip functions that condition on a (counterfactual) subset of the past, thereby allowing the learning of optimal rules that are restricted to only using this subset of the past [32] and Section 6.5 in van der Laan and Robins [34].
An alternative approach, referenced as the direct approach in Chakraborty and Murphy [29], uses marginal structural models (MSMs) for the dynamic regime-specific mean outcome for a user-supplied class of dynamic treatments. If one assumes the marginal structural models are correctly specified, then the parameters of the marginal structural model map into a dynamic treatment that is optimal among the user-supplied class of dynamic regimes. In addition, the MSM also provides the complete dose–response curve, that is, the mean counterfactual outcome for each dynamic treatment in the user-supplied class. This generalization of the original marginal structural models for static interventions to MSMs for dynamic treatments was developed independently by Orellana et al. [35]; van der Laan and Petersen [20]. These articles present inverse probability of treatment and censoring weighted (IPCW) estimators and double robust augmented IPCW estimators based on general longitudinal data structures, allowing for right censoring, time-dependent covariates, and survival outcomes. Double robust estimating equation-based methods that estimate the nuisance parameters with sequential parametric regression models using clever covariates were developed for static intervention MSMs by Bang and Robins [36]. An analogous targeted minimum loss-based estimator (TMLE) [37–39] was developed for marginal structural models for a user-supplied class of dynamic treatments by Petersen et al. [40]. This estimator builds on the TMLE for the mean outcome for a single dynamic treatment developed by van der Laan and Gruber [41]. Additional application papers of interest are [42–44] which involve fitting MSMs for dynamic treatments defined by treatment-tailoring threshold using IPCW methods.
Each of the above referenced approaches for learning an optimal dynamic treatment that also aims to provide statistical inference relies on parametric assumptions: obviously, Q-learning based on parametric models, but also the structural nested mean models and the marginal structural models both rely on parametric models for the blip function and dose–response curve, respectively. As a consequence, even in a SMART, the statistical inference for the optimal dynamic treatment heavily relies on assumptions that are generally believed to be false, and will thus be expected to be biased.
To avoid such biases, we define the statistical model for the data distribution as nonparametric, beyond possible knowledge about the treatment mechanism (e.g., known in an RCT) and censoring mechanism. This forces us to define the optimal dynamic treatment and the corresponding mean outcome as parameters defined on this nonparametric model, and to develop data adaptive estimators of the optimal dynamic treatment. In order to not only consider the most ambitious fully optimal rule, we define the V-optimal rule as the optimal rule that only uses a user-supplied subset V of the available covariates. This allows us to consider suboptimal rules that are easier to estimate and thereby allow for statistical inference for the counterfactual mean outcome under the suboptimal rule. This is analogous to the generalized structural nested mean models whose blip functions only condition on a counterfactual subset of the past. In a companion article we describe how to estimate the V-optimal rule.
In Example 4 of Robins et al. [45], the authors develop an asymptotic confidence set for the optimal treatment regime in an RCT under a large semiparametric model that only assumes that the treatment mechanism is known. This confidence set is certainly of interest and warrants further consideration in the optimal treatment literature. They get this confidence set by deriving the efficient influence curve for the mean squared blip function. They propose selecting a data adaptive estimate of the optimal treatment rule by a particular cross-validation scheme over a set of basis functions, and show that this estimator achieves a data adaptive rate of convergence under smoothness assumptions on the blip function. Our work is distinct from this earlier work in that the earlier work does not directly consider the mean outcome under the optimal rule and only considers data generated by a point treatment RCT.
In this article we describe how to obtain semiparametric inference about the mean outcome under the two time point V-optimal rule. We will show that the mean outcome under the optimal rule is a pathwise differentiable parameter of the data distribution, indicating that it is possible to develop asymptotically linear estimators of this target parameter under conditions. In fact, we obtain the surprising result that the pathwise derivative of this target parameter equals the pathwise derivative of the mean counterfactual outcome under a given dynamic treatment rule set at the optimal rule, treating the latter as known. By a reference to the current literature for double robust and efficient estimation of the mean outcome under a given rule, we then obtain a TMLE for the mean outcome under the optimal rule. Subsequently, we prove asymptotic linearity and efficiency of this TMLE, allowing us to construct confidence intervals for the mean outcome under the optimal dynamic treatment or its contrast with respect to a standard treatment. Thus, contrary to the irregularity of the estimators of the unknown parameters in the semiparametric structural nested mean model, we can construct regular estimators of the mean outcome under the optimal rule in the nonparametric model.
In a SMART the statistical inference would only rely upon a second-order difference between the estimator of the optimal dynamic treatment and the optimal dynamic treatment itself to be asymptotically negligible. This is a reasonable condition if we restrict ourselves to rules only responding to a one-dimensional time-dependent covariate, or if we are willing to make smoothness assumptions. To avoid this condition, we also develop TMLEs and statistical inference for data adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment (see van der Laan et al. [46] for a general approach for statistical inference for data adaptive target parameters). In particular, we develop a novel cross-validated TMLE (CV-TMLE) approach that provides asymptotic inference under minimal conditions.
For the sake of presentation, we focus on two time point treatments in this article. In the appendices of our earlier technical reports [47, 48] we generalize these results to general multiple time point treatments, and develop general (sequential) super-learning based on the efficient CV-TMLE of the risk of a candidate estimator. In these appendices we also develop a TMLE of a projection of the blip functions on a parametric working model (with corresponding statistical inference, which presents a result of interest in its own right). We emphasize that this technical report is distinct from our companion paper in this issue, which focuses on the data adaptive estimation of optimal treatment strategies.
1.1 Organization of article
Section 2 defines the mean outcome under the optimal rule as a causal parameter and gives identifiability assumptions under which the causal parameter is identified with a statistical parameter of the observed data distribution.
The remainder of the paper describes strategies to estimate the counterfactual mean outcome under the optimal rule and related quantities. This paper assumes that we have an estimate of the optimal rule in our semiparametric model. In our companion paper we describe how to obtain estimates of the V-optimal rule.
The first part of this article concerns estimation of the mean outcome under the optimal rule. Section 3 establishes the pathwise differentiability of the mean outcome under the V-optimal rule under conditions. A closed form expression for the efficient influence curve for this statistical parameter is given, which represents a key ingredient in semiparametric inference for the statistical target parameter. We obtain the surprising result that, under straightforward conditions, estimating the mean outcome under the unknown optimal treatment rule is the same in first order as estimating the mean outcome under the optimal rule when the rule is known from the outset. Section 4 presents the key properties of a TMLE for the mean outcome under the optimal rule, which is presented in detail in “TMLE of the mean outcome under a given rule” in Appendix B due to its similarity to TMLEs presented previously in the literature. Section 5 presents an asymptotic linearity theorem for this TMLE and corresponding statistical inference.
The second part of this article concerns statistical inference for data adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment, thereby avoiding the consistency and rate condition for the fitted V-optimal rule as required for asymptotic linearity of the TMLE of the mean outcome under the actual V-optimal rule. These results are of interest in practice because an estimated, possibly suboptimal, rule will be implemented in the population, not some unknown optimal rule. Section 6 presents an asymptotic linearity theorem for the TMLE presented in Section 4, but now with the target parameter defined as the mean outcome under the estimated rule. In Section 7 we present the CV-TMLE framework. A specific CV-TMLE algorithm is described in “CV-TMLE of the mean outcome under data adaptive V-optimal rule” in Appendix B due to its similarity to CV-TMLEs presented previously in the literature. The CV-TMLE provides asymptotic inference under minimal conditions for the mean outcome under a dynamic treatment fitted on a training sample, averaged across the different splits in training sample and validation sample. Both results allow us to construct confidence intervals that have the correct asymptotic coverage of the random true target parameter, and the fixed mean outcome under the optimal rule under conditions, but statistical inference based on the CV-TMLE does not require an empirical process condition that would put a brake on the allowed data adaptivity of the estimator.
Section 8 presents the simulation methods. The simulations estimate the optimal rule using an ensemble algorithm presented in our companion paper, and then, given this estimate, apply the estimators of the mean outcome under the optimal rule presented in this paper. Section 9 presents the coverage and efficiency of the various estimators in our simulation. Appendix C gives analytic intuition as to why some of the simulation results may have occurred. Section 10 closes with a discussion and directions for future work.
All proofs can be found in Appendix A.
2 Formulation of optimal dynamic treatment estimation problem
Suppose we observe n i.i.d. copies O1,…,On∈O of
O=(L(0),A(0),L(1),A(1),Y),
where A(j)=(A1(j),A2(j)), A1(j) is a binary treatment, and A2(j) is an indicator of not being right censored at “time” j, j=0,1. That is, A2(0)=0 implies that (L(1),A1(1),Y) is not observed, and A2(1)=0 implies that Y is not observed. Each time point j has covariates L(j) that precede treatment, j=0,1, and the outcome of interest is given by Y and occurs after time point 1. For a time-dependent process X(⋅), we use the notation ˉX(t)=(X(s):s≤t), where ˉX(−1)=∅. Let M be a statistical model that makes no assumptions on the marginal distribution Q0,L(0) of L(0) and the conditional distribution Q0,L(1) of L(1), given A(0),L(0), but might make assumptions on the conditional distributions g0A(j) of A(j), given ˉA(j−1),ˉL(j), j=0,1. We will refer to g0 as the intervention mechanism, which can be factorized into a treatment mechanism g01 and censoring mechanism g02 as follows:
In particular, the data might have been generated by a SMART, in which case g01 is known.
Let V(1) be a function of (L(0),A(0),L(1)), and let V(0) be a function of L(0). Let V=(V(0),V(1)). Consider dynamic treatment rules V(0)→dA(0)(V(0))∈{0,1}×{1} and (A(0),V(1))→dA(1)(A(0),V(1))∈{0,1}×{1} for assigning treatment A(0) and A(1), respectively, where the rule for A(0) is only a function of V(0), and the rule for A(1) is only a function of (A(0),V(1)). Note that these rules are restricted to set the censoring indicators A2(j)=1, j=0,1. Let D be the set of all such rules. We assume that V(0) is a function of V(1) (i.e., observing V(1) includes observing V(0)), but in the theorem below we indicate an alternative assumption. For d∈D, we let
If we assume a structural equation model [7] for variables stating that
where the collection of functions f=(fL(0),fA(0),fL(1),fA(1)) is unspecified or partially specified, we can define counterfactuals Yd defined by the modified system in which the equations for A(0),A(1) are replaced by A(0)=dA(0)(V(0)) and A(1)=dA(1)(A(0),V(1)), respectively. Denote the distribution of these counterfactual quantities as P0,d, where we note that P0,d is implied by the collection of functions f and the joint distribution of exogenous variables (UL(0),UA(0),UL(1),UA(1),UY). We can now define the causally optimal rule under P0,d as
where
More generally, for a distribution
where
For the remainder of this article, if for a static or dynamic intervention d, we use notation
The strong positivity assumption will be defined as the above assumption, but where the 0 is replaced by a
We now define a statistical parameter representing the mean outcome
For a distribution P, define the V-optimal rule as
For simplicity, we will write
Under our identifiability assumptions,
The next theorem presents an explicit form of the V-optimal individualized treatment rule
Theorem 1. Suppose
where
then the above expression for the V-optimal rule
3 The efficient influence curve of the mean outcome under V-optimal rule
In this section we establish the pathwise differentiability of
where
Above
At times it will be convenient to write
Whenever
where
Theorem 2. Suppose
That is,
The above theorem is proved as Theorem 8 in van der Laan and Luedtke [48] so the proof is omitted here.
We will at times denote
Theorem 3. Let
where for all
From the study of the statistical target parameter
The following lemma bounds
Lemma 1. Let
where the expression in each expectation is taken to be 0 when the indicator is 0. Fix
where
The conditions in (6) are moment bounds which ensure that
Using the upper bound on
Hence
The bounds given in Lemma 1 are loose. It is not in general necessary to estimate the blip functions
Convergence rates of estimators of
p | Sufficient | |
2 | 1 | |
2 | ||
4 | 1 | |
2 | ||
1 | ||
2 | ||
4 TMLE of the mean outcome under V-optimal rule
Throughout this and the next section we assume that condition (5) holds at
Here we note some of the key properties of the TMLE. Let
that estimates
where we have applied the TMLE in the appendix to the case where
Recall that
Further, one can show using standard M-estimator analysis that the targeted
5 Asymptotic efficiency of the TMLE of the mean outcome under the V-optimal rule
We now wish to analyze the TMLE
Theorem 4. Assume
where
where
The proof of the above theorem, which is given in the appendix, makes use of the fact that the TMLE satisfies (7). We now give two sets of conditions which control the remainder term
Corollary 1. Suppose the conditions of Theorem 4 hold, and further suppose that
That is,
The next corollary is more general in that it applies to situations where the intervention mechanism
Corollary 2. Suppose all of the conditions of Theorem 4 hold, and that
for some
for some function
If it is also known that
where
Equation (11) is a corollary of Theorem 2.3 of van der Laan and Robins [34]. The rest of the theorem is the result of a simple rearrangement of terms, so the proof is omitted.
Condition (9) is trivially satisfied in a randomized clinical trial without missingness, where we can take
5.1 Asymptotic linearity of TMLE in a SMART setting
Suppose the data is generated by a sequential RCT and there is no missingness so that
If there is right censoring, then
5.2 Statistical inference
Suppose one wishes to estimate the mean outcome under the optimal rule
so that one could use
An asymptotic 95% confidence interval for
6 Statistical inference for mean outcome under data adaptively determined dynamic treatment
Let
That is, we construct an estimator
where
We do not assume that (5) holds in this section, but we do implicitly make the weaker assumption that
As shown in the proof of Theorem 3,
This relation is key to the proof of the following theorem, which is analogous to Theorem 4. Note crucially that the theorem does not have any conditions on the remainder term
Theorem 5. Assume
Assume
for some
If
The proof of the above theorem is nearly identical to the proof of Theorem 4 so is omitted. For general
7 Statistical inference for the average of sample-split specific mean counterfactual outcomes under data adaptively determined dynamic treatments
Again let
In this section, we present a method that provides an estimator and statistical inference for the data adaptive target parameter
Note that
One applies the estimator
The next subsection defines the general CV-TMLE for data adaptive target parameters. We subsequently present an asymptotic linearity theorem allowing us to construct asymptotic 95% confidence intervals.
7.1 General description of CV-TMLE
Here we give a general overview of the CV-TMLE procedure. In “CV-TMLE of the mean outcome under data adaptive V-optimal rule” in Appendix B we present a particular CV-TMLE which satisfies all of the properties described in this section. Denote the realizations of
represent an initial estimate of
where these submodels rely on an estimate
Let
where
The CV-TMLE implementation presented in the appendix satisfies this equation with
In the current literature we have referred to this estimator as the CV-TMLE [53–56]. We give a concrete CV-TMLE algorithm for
7.2 Statistical inference based on the CV-TMLE
We now proceed with the analysis of this CV-TMLE
Theorem 6. Let
for some
Note that
Corollary 3. Suppose the conditions from Theorem 6 hold with
for some
We can conclude that:
The proof of the above result is just a rearrangement of terms so is omitted. Consider our setting. Suppose
of the asymptotic variance
Now consider the case where
is second order, that is,
8 Simulation methods
We start by presenting two single time point simulations. In earlier technical reports we directly describe the single time point problem [47, 48]. Here, we instead note that a single time point optimal treatment is a special case of a two time point treatment when only the second treatment is of interest. In particular, we can see this by taking
8.1 Data
8.1.1 Single time point
We simulate 1,000 data sets of 1,000 observations from an RCT without missingness. We have that:
where Y is a Bernoulli random variable and H is an unobserved
We consider two choices for
8.1.2 Two time point
We again simulate 1,000 data sets of 1,000 observations from an RCT without missingness. The observed variables have the following distribution:
where
Note that
Static treatments yield mean outcomes
8.2 Optimal rule estimation methods
For now suppose we have estimators of the optimal rule with reasonable convergence properties, by which we mean that the true mean outcome under the fitted rule is close to the mean outcome under the optimal rule. In our companion paper in this volume we describe these estimators and show precisely how close these estimators come to achieving the optimal mean outcome. Here we note that our estimation algorithms correspond to using the full candidate library of weighted classification and blip function-based estimators proposed in Table 2 of our companion paper, with the weighted log loss function used to determine the convex combination of candidates. We provide oracle inequalities for this estimator in our companion paper, and argue that it represents a powerful approach to data adaptively estimate the optimal rule without over- or underfitting the data. For a sample size n, we denote the rule estimated on the whole sample by
8.3 Inference procedures
We use four procedures to estimate the mean outcome under the fitted rule. All inference procedures rely on the intervention mechanism
The first method uses the TMLE described in “TMLE of the mean outcome under a given rule” in Appendix B. The second method uses the analogous estimating equation approach that uses the double robust inverse probability of censoring weighted (DR-IPCW) estimating equation implied by
All inference procedures also rely on an estimate of
For the single time point case, we compare plugging in the true value of
over the empirical distribution of
The procedures used to estimate the optimal rule rely on similar means, and we supply these estimation procedures with the incorrect value
The simulation was implemented in R [57]. The code used to run the simulations is available upon request. We are currently looking to implement the methods in this paper and the companion paper in an R package.
8.4 Evaluating performance
We use the coverage of asymptotic 95% confidence intervals to evaluate the performance of the various methods. As we establish in the earlier parts of this paper, each inference approach yields two interesting target parameters with respect to which we can compute coverage. All approaches give asymptotically valid inference for the mean outcome under the optimal rule under conditions, and thus the coverage with respect to this parameter is assessed across all methods.
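The coverage evaluation described above can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the simulation code used for this article: the estimator is a plain sample mean standing in for any asymptotically linear estimator, the interval is the usual Wald-type interval built from the empirical variance of estimated influence curve values, and all function names are hypothetical. The `targets` argument is an array so that a replication-specific (data adaptive) target could be supplied in place of a fixed one.

```python
import numpy as np


def wald_interval(estimate, ic_values, z=1.96):
    """Wald-type interval using the empirical variance of estimated influence curve values."""
    se = np.sqrt(np.var(ic_values) / len(ic_values))
    return estimate - z * se, estimate + z * se


def coverage_with_mc_error(lower, upper, targets, z=1.96):
    """Empirical coverage and a 95% interval reflecting the finite number of Monte Carlo draws."""
    covered = (lower <= targets) & (targets <= upper)
    cov = covered.mean()
    mc_se = np.sqrt(cov * (1 - cov) / len(covered))
    return cov, (cov - z * mc_se, cov + z * mc_se)


# Toy Monte Carlo study: 1,000 data sets of 1,000 observations.
rng = np.random.default_rng(1)
n_datasets, n_obs, truth = 1000, 1000, 0.3
lower = np.empty(n_datasets)
upper = np.empty(n_datasets)
for m in range(n_datasets):
    y = rng.normal(loc=truth, size=n_obs)
    est = y.mean()
    # For the sample mean, the estimated influence curve is simply y - est.
    lower[m], upper[m] = wald_interval(est, y - est)

# A fixed target here; a data adaptive target would differ across replications.
targets = np.full(n_datasets, truth)
cov, cov_ci = coverage_with_mc_error(lower, upper, targets)
print(f"coverage: {cov:.3f}, Monte Carlo interval: ({cov_ci[0]:.3f}, {cov_ci[1]:.3f})")
```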
The TMLE and DR-IPCW estimating equation-based approaches also estimate the data adaptive target parameter
The CV-TMLE and cross-validated DR-IPCW estimating equation approaches estimate the data adaptive target parameter
9 Simulation results
Figure 1 shows that the (CV-)TMLE is more efficient than the (CV-)DR-IPCW estimating equation methods in our single time point simulation, except for the cross-validated methods when
Figure 1. Relative efficiency of TMLE and DR-IPCW methods compared to both $E_{P_0} Y_{d_0}$ and the data adaptive parameter $E_{P_0}(\psi_{n}-\psi_{0n})^2$ for the TMLE and DR-IPCW, and $E_{P_0}(\psi_{n}-\tilde{\psi}_{0n})^2$ for the cross-validated methods. Results are provided both for the cases where the estimate $E_n[Y|\bar{A}(1),W]$ of $E_{P_0}[Y|\bar{A}(1),W]$ is correctly specified and the case where this estimate is incorrectly specified with the constant function 1/2. Error bars indicate 95% confidence intervals to account for uncertainty from the finite number of Monte Carlo draws in our simulation. (a) V=L1(1), (b) V=L1(1), …, L4(1).
Figure 2. Coverage of 95% confidence intervals from the TMLE and DR-IPCW methods with respect to both $E_{P_0} Y_{d_0}$ and the data adaptive parameter $\psi_{0n}$ for the TMLE and DR-IPCW and $\tilde{\psi}_{0n}$ for the cross-validated methods. Results are provided both for the cases where the estimate $E_n[Y|\bar{A}(1),W]$ of $E_{P_0}[Y|\bar{A}(1),W]$ is correctly specified and the case where this estimate is incorrectly specified with the constant function 1/2. The (CV-)TMLE outperforms the (CV-)DR-IPCW estimating equation approach for almost all settings. Error bars indicate 95% confidence intervals to account for uncertainty from the finite number of Monte Carlo draws in our simulation. (a) V=L1(1), (b) V=L1(1), …, L4(1).
Figure 3a shows that the (CV-)TMLE is always more efficient than the (CV-)DR-IPCW estimating equation methods for our two time point simulation. Figure 3b shows that this increased efficiency does not come at the expense of coverage: the (CV-)TMLE always has better coverage than the (CV-)DR-IPCW estimators in our two time point simulation. In general, we see that the cross-validated methods always achieve approximately 95% coverage for the data adaptive parameter. This is to be expected because the cross-validated methods only learn the optimal rule on training sets, and thus avoid finite sample bias when the conditional means of the outcome are averaged over the validation samples.
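The sample-splitting idea behind the cross-validated methods can be sketched in Python for a single time point RCT. This is a simplified illustration and not the CV-TMLE of this article: the mean outcome under each fitted rule is estimated on the held-out portion with a plain inverse probability weighted mean using the known randomization probability, and the rule learner `fit_rule`, the data-generating mechanism, and all names are assumptions made for the example.

```python
import numpy as np


def fit_rule(W, A, Y):
    """Hypothetical rule learner: arm-specific linear regressions, treat when the fitted difference is positive."""
    X = np.column_stack([np.ones_like(W), W])
    beta_treated = np.linalg.lstsq(X[A == 1], Y[A == 1], rcond=None)[0]
    beta_control = np.linalg.lstsq(X[A == 0], Y[A == 0], rcond=None)[0]
    diff = beta_treated - beta_control
    return lambda w: (diff[0] + diff[1] * w > 0).astype(int)


def cv_mean_under_fitted_rule(W, A, Y, n_folds=10, g=0.5, seed=0):
    """Average over splits of the held-out estimate of the mean outcome under the rule fit on the rest."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    fold_estimates = []
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        rule = fit_rule(W[train], A[train], Y[train])   # rule never sees the held-out data
        follows = A[valid] == rule(W[valid])
        fold_estimates.append(np.mean(follows * Y[valid] / g))
    return float(np.mean(fold_estimates))


# Toy RCT (an assumption for illustration): treatment helps when W is positive.
rng = np.random.default_rng(2)
n = 20_000
W = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
Y = A * W + rng.normal(size=n)

cv_estimate = cv_mean_under_fitted_rule(W, A, Y)
print(f"cross-validated estimate: {cv_estimate:.3f}")
```

Because each rule is evaluated only on observations that were not used to fit it, the held-out averages do not inherit the overfitting bias that can arise when the same sample is used for both steps.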
Figure 3. (a) Relative efficiency of TMLE and DR-IPCW methods compared to both $E_{P_0} Y_{d_0}$ and the data adaptive parameter $E_{P_0}(\psi_{n}-\psi_{0n})^2$ for the TMLE and DR-IPCW, and $E_{P_0}(\psi_{n}-\tilde{\psi}_{0n})^2$ for the cross-validated methods. (b) Coverage of 95% confidence intervals from the TMLE and DR-IPCW methods with respect to both $E_{P_0} Y_{d_0}$ and the data adaptive parameter $\psi_{0n}$ for the TMLE and DR-IPCW and $\tilde{\psi}_{0n}$ for the cross-validated methods. Both (a) and (b) give results both for the cases where the estimates of $E_{P_0}[Y|\bar{A}(1)=d_n(A(0),V),\bar{L}(1)]$ and $E_{P_0}[Y_{d_n}|L(0)]$ are correctly specified and the case where these estimates are incorrectly specified with the constant function 1/2. Error bars indicate 95% confidence intervals to account for uncertainty from the finite number of Monte Carlo draws in our simulation.
It may at first be surprising that the TMLE outperforms the DR-IPCW estimating equation method in a randomized clinical trial, especially given that the CV-TMLE and CV-DR-IPCW achieve similar coverage. In Appendix C we give intuition as to why this may be the case in a single time point randomized clinical trial. In short, this difference in coverage appears to occur because our proposed TMLE only fluctuates the conditional means for individuals who received the fitted treatment, thereby reducing finite sample bias that may result from estimating the optimal rule on the same sample that is used to estimate the mean outcome under this fitted rule.
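The contrast described above can be illustrated with a minimal sketch. It assumes a single time point trial with known randomization probability 1/2, a fixed (not fitted) rule, and the misspecified constant initial fit 1/2 used in the simulations. The function name `aipw_and_linear_tmle`, the simulated data-generating distribution, and the rule `W > 0` are hypothetical choices made for illustration only, and the linear fluctuation shown is one simple way to target the initial fit, not necessarily the exact procedure used in our simulations.

```python
import numpy as np

def aipw_and_linear_tmle(W, A, Y, rule, Qbar, g=0.5):
    """Estimate the mean outcome under a fixed rule in a single time point RCT.

    W: covariate, A: binary treatment, Y: outcome (all arrays of length n)
    rule: function W -> {0, 1}; Qbar: function (a, W) -> initial outcome regression
    g: known probability of each treatment arm (assumed constant here)
    """
    d = rule(W)
    follows = (A == d).astype(float)   # indicator that observed treatment matches the rule
    H = follows / g                    # clever covariate I(A = d(W)) / g
    Qd = Qbar(d, W)                    # initial fit evaluated at a = d(W)
    resid = Y - Qbar(A, W)

    # DR-IPCW (augmented IPW) estimating equation estimator
    aipw = np.mean(Qd + H * resid)

    # Linear fluctuation Qbar + eps * H, with eps fitted by least squares,
    # so only individuals with A = d(W) contribute to the update
    eps = np.sum(H * resid) / np.sum(H ** 2)
    tmle = np.mean(Qd + eps / g)       # H equals 1/g when evaluated at a = d(W)
    return aipw, tmle

# Hypothetical simulation: the treatment effect on the outcome is 0.2 * W
rng = np.random.default_rng(0)
n = 20000
W = rng.uniform(-1, 1, n)
A = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, 0.5 + 0.2 * A * W).astype(float)

rule = lambda w: (w > 0).astype(int)
Qbar_half = lambda a, w: np.full(len(w), 0.5)   # misspecified constant initial fit
aipw, tmle = aipw_and_linear_tmle(W, A, Y, rule, Qbar_half)
print(aipw, tmle)
```

Under these assumptions the targeted estimate reduces to the average outcome among individuals who followed the rule, whereas the estimating equation estimator rescales the same residual average by the ratio of the observed to the expected proportion of rule followers. The two estimators therefore differ only through that finite sample ratio.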
We also looked at the average confidence interval width across Monte Carlo simulations for each method and simulation setting. For a given simulation setting, all four estimation methods gave approximately the same (
10 Discussion
This article investigated semiparametric statistical inference for the mean outcome under the V-optimal rule and statistical inference for the data adaptive target parameter defined as the mean outcome under a data adaptively determined V-optimal rule (treating the latter as given).
We proved a surprising and useful result stating that the mean outcome under the V-optimal rule is represented by a statistical parameter whose pathwise derivative is identical to what it would have been if the unknown rule had been treated as known, under the condition that the data is generated by a non-exceptional law [52]. As a consequence, the efficient influence curve is immediately known, and any of the efficient estimators for the mean outcome under a given rule can be applied at the estimated rule. In particular, we demonstrate a TMLE, and present asymptotic linearity results. However, the dependence of the statistical target parameter on the unknown rule affects the second-order terms of the TMLE, and, as a consequence, the asymptotic linearity of the TMLE requires that a second-order difference between the estimated rule and the V-optimal rule converges to zero at a rate faster than $$1/\sqrt{n}$$.
Therefore, we proceeded to pursue statistical inference for so-called data adaptive target parameters. Specifically, we presented statistical inference for the mean outcome under the dynamic treatment regime we fitted based on the data. We showed that statistical inference for this data adaptive target parameter does not rely on the convergence rate of our estimated rule to the optimal rule, and in fact only requires that the data adaptively fitted rule converges to some (possibly suboptimal) fixed rule. However, even in a sequential RCT, the asymptotic linearity theorem still relies on an empirical process condition that limits the data adaptivity of the estimator of the rule. So, even though the assumptions are much weaker, they can still cause problems in finite samples when V is high dimensional, and possibly even asymptotically.
Therefore, we proceeded with the average of sample split specific target parameters, as in general proposed by van der Laan et al. [46], where we show that statistical inference can now avoid the empirical process condition. Specifically, our data adaptive target parameter is now defined as an average across J sample splits in training and validation sample of the mean outcome under the dynamic treatment fitted on the training sample. We presented CV-TMLE of this data adaptive target parameter, and we established an asymptotic linearity theorem that does not require that the estimated rule is consistent for the optimal rule, let alone at a particular rate. The CV-TMLE also does not require the empirical process condition. As a consequence, in a sequential RCT, this method provides valid asymptotic statistical inference without any conditions, beyond the requirement that the estimated rule converges to some (possibly suboptimal) fixed rule.
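The sample-split construction can be sketched as follows. This is an illustration under stated assumptions, not the CV-TMLE itself: it assumes a single time point trial with known treatment probability `g`, uses a simple inverse probability weighted mean on each validation sample in place of the targeted estimator, and relies on a hypothetical rule learner `fit_rule` that regresses a weighted outcome on a scalar covariate. All function names and the simulated data are assumptions introduced for this example.

```python
import numpy as np

def fit_rule(W, A, Y, g=0.5):
    """Hypothetical rule learner: linear regression of a weighted outcome on W.

    With g = P(A = 1 | W) known, (A/g - (1-A)/(1-g)) * Y has conditional mean
    equal to the treatment effect given W, so its sign suggests whom to treat.
    """
    pseudo = (A / g - (1 - A) / (1 - g)) * Y
    X = np.column_stack([np.ones_like(W), W])
    beta, *_ = np.linalg.lstsq(X, pseudo, rcond=None)
    return lambda w: (beta[0] + beta[1] * w > 0).astype(int)

def cv_mean_under_fitted_rule(W, A, Y, J=5, g=0.5, seed=0):
    """Average over J splits of the validation-sample IPW mean outcome
    under the rule fitted on the corresponding training sample."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(Y)) % J            # balanced random fold labels
    estimates = []
    for j in range(J):
        train, valid = folds != j, folds == j
        d_j = fit_rule(W[train], A[train], Y[train], g)   # rule uses training data only
        d_valid = d_j(W[valid])
        prob_follow = np.where(d_valid == 1, g, 1 - g)    # P(A = d(W) | W)
        follows = (A[valid] == d_valid)
        estimates.append(np.mean(follows / prob_follow * Y[valid]))
    return float(np.mean(estimates))

# Hypothetical simulation: the treatment effect on the outcome is 0.2 * W
rng = np.random.default_rng(1)
n = 5000
W = rng.uniform(-1, 1, n)
A = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, 0.5 + 0.2 * A * W).astype(float)

psi_cv = cv_mean_under_fitted_rule(W, A, Y)
print(psi_cv)
```

Because each rule is fitted without access to the validation observations on which it is evaluated, each fold-specific estimate treats the fitted rule as fixed, which is the feature that removes the need for an empirical process condition on the rule estimator.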
We supported our theoretical findings with simulations, both in the single and two time point settings. Our simulations supported our claim that it is easier to have good coverage of the proposed data adaptive target parameters than the mean outcome under the optimal rule, though the results for this harder mean outcome under the optimal rule parameter were also promising. In future work we hope to apply these methods to actual data sets of interest, generated by observational controlled trials as well as RCTs.
It might also be of interest to propose working models for the mean outcome
Drawing inferences concerning optimal treatment strategies is an important topic that will hopefully help guide future health policy decisions. We believe that working with a large semiparametric model is desirable because it helps to ensure that the projected health benefits from implementing an estimated treatment strategy are not due to bias from a misspecified model. The TMLEs presented in this article have many desirable statistical properties and represent one way to get estimates and make inference in this large model. We look forward to future advances in statistical inference for parameters that involve optimal dynamic treatment regimes.
Funding statement: National Institute of Allergy and Infectious Diseases (Grant/Award Number: R01 AI074345-06)
Acknowledgements
This research was supported by an NIH grant R01 AI074345-06. AL was supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) program. The authors would like to thank the anonymous reviewers and Erica Moodie for their invaluable comments and suggestions to improve the quality of the paper. The authors would also like to thank Oleg Sofrygin for valuable discussions.
Appendix A
Proofs
Proof of Theorem 1. Let
For each value of
However, by assumption, the latter function only depends on
Given we found
where we used the iterative conditional expectation rule, taking the conditional expectation of
The following lemma will be useful for proving Theorem 2.
Lemma A.1. Recall the definitions of
where
Proof of Lemma A.1. For a point treatment data structure
□
Proof of Theorem 3. By the definition of
□
Proof of Lemma 1. Below we omit the dependence of
The first term in the final equality is always 0 because
where the final inequality holds by Hölder’s inequality. The above also holds when the limit is taken as
Proof of Theorem 4. By Theorem 3, we have
where
The Donsker condition and the mean square consistency of
see, for example, van der Vaart and Wellner [58]. By assumption,
as desired. □
Proof of Theorem 6. For all
Summing over j and using (13) gives:
We also have that:
The above follows from the first by applying the law of total expectation conditional on the training sample, and then noting that each
It follows that:
□
Appendix B: Estimators of the mean outcome under the optimal rule
TMLE of the mean outcome under a given rule
This TMLE for a fixed dynamic treatment rule has been presented in the literature, but for the sake of being self-contained it will be briefly described here. The TMLE yields a substitution estimator that empirically solves the estimating equations corresponding to the efficient influence curve, analogous to Theorem 2 for general d. By substitution estimator, we mean that the TMLE can be written as the mapping
Assume without loss of generality that
Regress
Note that we have only used individuals who are not right censored at time 1 to obtain this fit. The above regression can be fitted using a data adaptive technique such as super-learning [59]. To estimate
where we remind the reader that we are treating the rule
where
Let
as offset. This defines a targeted estimate
of the regression function, where we remind the reader that the targeted estimate is chosen to ensure that the empirical mean of the component
We now develop a targeted estimator of the second regression function in
on
One can estimate this quantity using the super-learner algorithm among all individuals who are not right censored at time 0. For honest cross-validation in the super-learner algorithm, the nuisance parameter
For an estimate of
where
Let
on
Plugging the targeted regressions and
Let
CV-TMLE of the mean outcome under data adaptive V-optimal rule
Let
represent an initial estimate of
where
Note that the fluctuation
where
for all
as offset. Thus each observation i is paired with nuisance parameters that are fit on the training sample which does not contain observation i. This defines a targeted estimate
of
We now aim to get a targeted estimate of
in the same manner as we estimated the quantity in (18), with the caveat that we replace
Consider the fluctuation submodel
where
Again the fluctuation
where
on
as offset. This defines a targeted estimate
of
Let
Further, each
The only modification relative to the original CV-TMLE presented in Zheng and van der Laan [55] is that in the above description we change our target on each training sample into the training sample-specific target parameter implied by the fit
Appendix C: Why the TMLE may have better coverage than the estimating equation approach in a randomized clinical trial
We wrote this section after performing our simulations because we wanted to understand why the TMLE is outperforming the DR-IPCW estimating equation approach by such a wide margin. The two approaches do not typically give such disparate estimates in a randomized clinical trial, so it is natural to ask why this is happening in our simulations. Part of this section is conjecture (which is in line with our simulations), but we offer some justification to support this conjecture.
We now offer a heuristic explanation of why the TMLE may have better coverage than the DR-IPCW estimating equation approach when estimating the data adaptive parameter
where g is the intervention mechanism under P. Again we have that
For any fixed rule
where
Further,
where the expectation is over the observed sample
The DR-IPCW estimating equation gives the estimator:
This estimator has bias
Consider the simple linear TMLE which fluctuates
where we recall that
if
This linear fluctuation TMLE has bias
The arguments presented in this section are mainly interesting if
The term
References
1. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods-application to control of the healthy worker survivor effect. Math Mod 1986;7:1393–512. doi:10.1016/0270-0255(86)90088-6
2. Robins JM. Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. In Proceedings of the Biopharmaceutical Section. American Statistical Association, 1993.
3. Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent variable modeling and applications to causality. New York: Springer, 1997:69–117. doi:10.1007/978-1-4612-1842-5_4
4. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry D, editors. Statistical models in epidemiology, the environment, and clinical trials (Minneapolis, MN, 1997). New York: Springer, 2000:95–133. doi:10.1007/978-1-4612-1284-3_2
5. Holland PW. Statistics and causal inference. J Am Stat Assoc 1986;810:945–60. doi:10.1080/01621459.1986.10478354
6. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: essai des principes (1923). Excerpts reprinted (1990) in English (D. Dabrowska and T. Speed), trans. Stat Sci 1990;5:463–72.
7. Pearl J. Causality: models, reasoning and inference, 2nd ed. New York: Cambridge University Press, 2009. doi:10.1017/CBO9780511803161
8. Robins JM. Addendum to: “A new approach to causal inference in mortality studies with a sustained exposure period–application to control of the healthy worker survivor effect”. Comput Math Appl 1987;140:923–45. ISSN 0097-4943. doi:10.1016/0898-1221(87)90238-0
9. Robins JM. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. J Chron Dis (40, Supplement) 1987;2:139s–161s. doi:10.1016/S0021-9681(87)80018-8
10. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66:688–701. doi:10.1037/h0037350
11. Rubin DB. Matched sampling for causal effects. Cambridge, MA: Cambridge University Press, 2006. doi:10.1017/CBO9780511810725
12. Lavori P, Dawson R. A design for testing clinical strategies: biased adaptive within-subject randomization. J R Stat Soc Ser A 2000;163:29–38. doi:10.1111/1467-985X.00154
13. Lavori P, Dawson R. Adaptive treatment strategies in chronic disease. Annu Rev Med 2008;59:443–53. doi:10.1146/annurev.med.59.062606.122232
14. Murphy S. An experimental design for the development of adaptive treatment strategies. Stat Med 2005;24:1455–81. doi:10.1002/sim.2022
15. Rosthøj S, Fullwood C, Henderson R, Stewart S. Estimation of optimal dynamic anticoagulation regimes from observational data: a regret-based approach. Stat Med 2006;88:4197–215. doi:10.1002/sim.2694
16. Thall P, Millikan R, Sung H-G. Evaluating multiple treatment courses in clinical trials. Stat Med 2000;19:1011–28. doi:10.1002/(SICI)1097-0258(20000430)19:8<1011::AID-SIM414>3.0.CO;2-M
17. Thall P, Sung H, Estey E. Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. J Am Stat Assoc 2002;39:29–39. doi:10.1198/016214502753479202
18. Wagner E, Austin B, Davis C, Hindmarsh M, Schaefer J, Bonomi A. Improving chronic illness care: translating evidence into action. Health Aff 2001;20:64–78. doi:10.1377/hlthaff.20.6.64
19. Petersen ML, Deeks SG, Martin JN, van der Laan MJ. History-adjusted marginal structural models to estimate time-varying effect modification. Am J Epidemiol 2007;166:985–93. doi:10.1093/aje/kwm232
20. van der Laan MJ, Petersen ML. Causal effect models for realistic individualized treatment and intention to treat rules. Int J Biostat 2007;3:Article 3. doi:10.2202/1557-4679.1022
21. Robins J, Orellana L, Rotnitzky A. Estimation and extrapolation of optimal treatment and testing strategies. Stat Med 2008;27:4678–721. doi:10.1002/sim.3301
22. Lavori P, Dawson R. Dynamic treatment regimes: practical design considerations. Clin Trials 2004;1:9–20. doi:10.1191/1740774S04cn002oa
23. Chakraborty B, Murphy SA, Strecher V. Inference for non-regular parameters in optimal dynamic treatment regimes. Stat Methods Med Res 2010;19:317–43. doi:10.1177/0962280209105013
24. Kasari C. Developmental and augmented intervention for facilitating expressive language. ClinicalTrials.gov database, updated Apr. 26, 2012, Natl. Inst:0 accessed July 24, 2013, 2009.
25. Lei H, Nahum-Shani I, Lynch K, Oslin D, Murphy S. A SMART design for building individualized treatment sequences. Annu Rev Clin Psychol 2011;8:21–48. doi:10.1146/annurev-clinpsy-032511-143152
26. Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, et al. Experimental design and primary data analysis methods for comparing adaptive interventions. Psychol Methods 2012;17:457–77. doi:10.1037/a0029372
27. Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, et al. Q-learning: a data analysis method for constructing adaptive interventions. Psychol Methods 2012;17:478–94. doi:10.1037/a0029373
28. Jones H. Reinforcement-based treatment for pregnant drug abusers. ClinicalTrials.gov database, updated October 19, 2012, Natl. Inst:0 accessed July 24, 2013, 2010.
29. Chakraborty B, Murphy SA. Dynamic treatment regimens. Annu Rev Stat Appl 2013;1:1–18. doi:10.1146/annurev-statistics-022513-115553
30. Murphy SA. Optimal dynamic treatment regimes. J R Stat Soc Ser B 2003;65:331–55. doi:10.1111/1467-9868.00389
31. Robins JM. Discussion of “optimal dynamic treatment regimes” by Susan A. Murphy. J R Stat Soc Ser B 2003;65:355–66. doi:10.1111/1467-9868.00389
32. Robins JM. Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium on Biostatistics 2004:189–326. doi:10.1007/978-1-4419-9076-1_11
33. Sutton R, Barto AG. Reinforcement learning: an introduction. Cambridge, MA: MIT Press, 1998.
34. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer, 2003. doi:10.1007/978-0-387-21700-0
35. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: main content. Int J Biostat 2010;6:Article 8. doi:10.2202/1557-4679.1200
36. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics 2005;61:962–72. doi:10.1111/j.1541-0420.2005.00377.x
37. van der Laan MJ. The construction and analysis of adaptive group sequential designs. Technical Report 232, Division of Biostatistics, University of California, Berkeley, CA, 2008.
38. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer, 2012.
39. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat 2006;2:Article 11. doi:10.2202/1557-4679.1043
40. Petersen M, Schwab J, Gruber S, Blaser N, Schomaker M, van der Laan MJ. Targeted minimum loss based estimation of marginal structural working models. J Causal Inference 2013;submitted.
41. van der Laan MJ, Gruber S. Targeted minimum loss based estimation of causal effects of multiple time point interventions. Int J Biostat 2012;8:Article 9. doi:10.1515/1557-4679.1370
42. Cotton C, Heagerty P. A data augmentation method for estimating the causal effect of adherence to treatment regimens targeting control of an intermediate measure. Stat Biosci 2011;3:28–44. doi:10.1007/s12561-011-9038-1
43. Hernán MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic Clin Pharmacol 2006;98:237–42. doi:10.1111/j.1742-7843.2006.pto_329.x
44. Shortreed S, Moodie E. Estimating the optimal dynamic antipsychotic treatment regime: evidence from the sequential-multiple assignment randomized CATIE Schizophrenia Study. J R Stat Soc C 2012;61:577–99. doi:10.1111/j.1467-9876.2012.01041.x
45. Robins JM, Li L, Tchetgen E, van der Vaart AW. Higher order influence functions and minimax estimation of non-linear functionals. In Probability and statistics: essays in honor of David A. Freedman. Beachwood, OH: Institute of Mathematical Statistics, 2008:335–421. doi:10.1214/193940307000000527. Available at: http://projecteuclid.org/euclid.imsc/1207580092
46. van der Laan MJ, Hubbard AE, Kherad S. Statistical inference for data adaptive target parameters. Technical Report 314, Division of Biostatistics, University of California, Berkeley, CA, 2013.
47. van der Laan MJ. Targeted learning of an optimal dynamic treatment and statistical inference for its mean outcome. Technical Report 317, UC Berkeley, CA, 2013.
48. van der Laan MJ, Luedtke AR. Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome. Technical Report 329, UC Berkeley, CA, 2014.
49. Robins JM, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran ME, Berry D, editors. Statistical models in epidemiology, the environment and clinical trials. IMA Volumes in Mathematics and Its Applications. Springer, 1999. doi:10.1007/978-1-4612-1284-3_1
50. Bickel PJ, Klaassen CA, Ritov Y, Wellner J. Efficient and adaptive estimation for semiparametric models. New York: Springer, 1997.
51. van der Vaart AW. Asymptotic statistics. New York: Cambridge University Press, 1998.
52. Robins J, Rotnitzky A. Discussion of “dynamic treatment regimes: technical challenges and applications”. Electron J Stat 2014;8:1273–89. doi:10.1214/14-EJS908. URL http://dx.doi.org/10.1214/14-EJS908
53. Díaz I, van der Laan MJ. Targeted data adaptive estimation of the causal dose response curve. Technical Report 306, Division of Biostatistics, University of California, Berkeley, CA, submitted to JCI, 2013.
54. van der Laan MJ, Petersen ML. Targeted learning. In: Zhang C, Ma Y, editors. Ensemble machine learning. New York: Springer, 2012. doi:10.1007/978-1-4419-9326-7_4
55. Zheng W, van der Laan MJ. Asymptotic theory for cross-validated targeted maximum likelihood estimation. Technical Report 273, Division of Biostatistics, University of California, Berkeley, CA, 2010. doi:10.2202/1557-4679.1181
56. Zheng W, van der Laan MJ. Cross-validated targeted minimum loss based estimation. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental studies. New York: Springer, 2012.
57. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. Available at: http://www.R-project.org/
58. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. New York: Springer, 1996. doi:10.1007/978-1-4757-2545-2
59. van der Laan MJ, Polley E, Hubbard A. Super learner. Stat Appl Genet Mol Biol 2007;6:Article 25. doi:10.2202/1544-6115.1309
60. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health 2006;60:578–86. doi:10.1136/jech.2004.029496
©2015 by De Gruyter