Abstract
This work focuses on designing experiments for pharmacometrics studies using Non-Linear Mixed Effects Models including covariates to describe between-subject variability. Before collecting and modelling new clinical trial data, choosing an appropriate design is crucial. Assuming a known model with covariate effects and a joint distribution for covariates in the target population from previous clinical studies, we propose to optimise the allocation of covariates among the subjects to be included in the new trial. It aims achieving better overall parameter estimations and therefore increase the power of statistical tests on covariate effects to detect significance, and more importantly, clinical relevance or non-relevance of relationships. We suggested dividing the domain of continuous covariates into clinically meaningful intervals and optimised their proportions, along with the proportion of each category for the discrete covariates. We used the Fisher Information Matrix and developed a fast and deterministic computation method, leveraging Gaussian quadrature and copula modelling. The optimisation problem was formulated as a convex problem subject to linear constraints, allowing resolution using Projected Gradient Descent algorithm. Different scenarios for a pharmacokinetics model were explored. We showed the benefit of covariate optimisation in reducing the number of subjects needed to achieve desired power in covariate tests.
1 Introduction
In pharmacometrics longitudinal data collected during clinical trials are leveraged through Non-Linear Mixed Effect Models (NLMEM). In particular, NLMEM allow to quantify the relationships between covariates and parameters [1]. Indeed, covariates are often included in pharmacometrics models to describe predictable between subjects variability, thereby enhancing model fit and/or model-based predictions [2].
Before collecting and modelling the data, a key step is to chose an appropriate design. In NLMEM, a population design refers to the number of subjects and their allocation to elementary designs, an elementary design being a combination of values for the design variables, such as number of samples, sampling times, dosage regimen, etc. Usually, covariates are not considered at this step.
In its Population Pharmacokinetics (PK) Guidance for Industry 2022 [3], the FDA emphasises that “Simulations and optimal design methods can maximize the utility of population PK data collection and analyses” and stresses in particular that these methods can aim to estimate “major covariate effects of interest” with greater precision. Nevertheless, the method proposed for power assessment and sample size computation in this recommendation is clinical trial simulations (CTS), although it is computationally expensive and less exhaustive than optimal design strategies. The traditional theory of optimal experimental design, initially developed for non-linear models [4], have already been imported to the field of NLMEM and pharmacometrics [5, 6]. It relies on Fisher Information Matrix (FIM) computation, through linearization of the structural model, or stochastic simulations or Gaussian approximation [7–9]. Thus, before the start of a new study, given a NLMEM and parameters value, one can compute the expected standard error (SE) of the model parameters and deduce whether enough information will be collected to meet the objectives of the study. Thereafter, design optimisation consist in finding the design of experiments that provides the most informative data for estimating model parameters while respecting some design constraints, such as number of subjects, number of sampling times or samplings times windows. These optimal design strategies require to assume a model and a vector of population parameters values. Methods to compute robust designs to the prior choice of parameter values or/and to the prior choice of model in the context of NLMEM have already been explored [10–13] but will not be considered here.
Various optimisation software especially dedicated to design optimisation in the context of clinical trial using NLMEM have been developed in the past 20 years. Among them we can cite PFIM [14], which was the first software tool proposed in 2001 for evaluating a design without using simulations, especially, the R package PFIM 6.1 [15, 16] (CRAN release October 2024); PopED [17, 18] and NONMEM$DESIGN [19].
Numerous studies have demonstrated the benefits of using this methodology to optimise sampling times but very few studies have addressed the question of which covariate values would give the most information.
Nevertheless, if a study aims to detect whether some covariate relationships are significant or relevant, this design optimisation procedure needs to account for covariates. In a previous work, we explored methods for handling covariates in FIM computation [20]. There are based on Monte-Carlo (MC) approximation, and propose to leverage either an historical covariate dataset, a known covariate distribution or a copula. Their accuracy in predicting uncertainty and power of significance and relevance tests was assessed in a simulation study [20] using a simple PK example and under various scenarios, including different sample sizes, sampling points, and IIV. In addition the FIM computation using a prior covariate dataset was applied to a Phase 3 example with a model including 27 covariate relationships and was found to accurately predict uncertainty even in this more complex setting. These methods have not yet been used for computing the FIM during design optimisation process, but it does not present any particular technical difficulties. In this work we propose to go a step further: assuming a covariate distribution within the target population, we propose, for a fixed design, NLMEM and parameter vector, to optimise the selection of subjects to be included in the future trial from this distribution.
Previous works have looked at how to account for covariates when allocating patients to different arms, and how an unbalanced design could improve efficiency and ethics, including model-based approaches (see for instance [21] for a review). However, these approaches do not use the longitudinal data framework nor NLMEM. In addition, they aim to optimised the design sequentially and seek, for each new patient entering the trial, to allocate them to the best possible arm in terms of efficacy and ethical considerations, while maintaining randomisation. Consequently, it does not correspond to our setting aiming to optimise the design prior to setting up the study. In [22], a new method based of the FIM for optimising design using NLMEM where covariates are design variables is proposed. However, the proposed methods require the FIM to be evaluated for a set of fixed covariate values, without taking into account the distribution of these covariates, so the combinatorial approach can result in a very large number of elementary designs to be evaluated and the optimisation result is discrete, making it difficult to manage continuous covariates. In addition, there are no recommendations on how to take into account multiple covariates that are correlated with each other, or on the practical management of covariate distributions.
To this end, we developed in this work a new method aiming to reduce the computational cost of the FIM computation, to handle continuous and correlated covariates and to optimise their distribution. Indeed, with optimisation purposes, the required number of FIM evaluation increases and run time becomes a challenge. Therefore, avoiding MC simulations is a lever to reduce this cost. As a consequence, we introduce a Gaussian quadrature [23] (GQ) based approach, thus faster and deterministic. We have coupled this approach with the use of copula to describe the distribution of covariates, as it is very flexible and if already available, it can be used without individual data. Regarding optimisation, a hurdle to be overcome is that optimising the distribution of continuous covariates is not trivial as it is unrealistic to recommend including a given number of patients matching an exact value of the covariate. Therefore, we propose to segment the continuous covariates into different intervals and to optimise the proportion of each of them among the trial population. This has the combined advantage to remains realistic in practice while defining discrete parameters to be optimised. Finally, we propose a method for the constrained optimisation of covariates using a projected gradient method. The FIM is computed over a finite set combining elementary designs and covariate distributions and each combination is optimally weighted, given the constraints, using the Projected Gradient Descent (PGD) algorithm [24]. Once covariate distribution is optimised, the sample size can be optimised to reach desired level of precision or desired power in statistical test on covariate effects .
All this methods have been implemented using the R language [25] in a working version of PFIM 6. Scripts are available in the Zenodo repository https://doi.org/10.5281/zenodo.14778034. Based on the previously used simple PK model, we developed a workflow to assess the number of nodes required in the quadrature for an accurate FIM computation. Because prior data are not always available at design step, we used public database from the National Health and Nutrition Examination Surveys (NHANES) database [26] to get prior covariate distributions. Thereafter we explored various scenarios suing a simple PK model demonstrating the benefit of covariate distribution optimisation.
In Section 2, the notations and methods are detailed. In Section 3 we go through the example settings and results are presented in Section 4. Lastly global results are discussed in section 5.
2 Methods
In this section we detail notations and methods on study design and NLMEM. We expose the FIM computation accounting for covariate distribution, and especially we describe the segmentation of continuous covariates for optimisation purposes and how it is handle in the FIM computation. Then, we explain the FIM integration using copula and Gauss-Legendre Quadrature (GLQ). Lastly, we present optimisation of covariate distributions using PGD.
2.1 Notations on study designs and NLMEM
In the following, a population design is denoted Ξ = {N, (ξ1, …, ξN)}, where N is the number of subjects and the ξi are their elementary designs.
The observation vector yi for the ith subject is modelled as given in equation (1) where f denotes the structural non-linear model and θi = u (µ, β, Zi, ηi), the p-vector of individual parameters. The latter are defined as a function u, depending on fixed effects, composed of the typical values denoted µ and of the covariate effects β, and on individual random effects ηi, where ηi ∼ 𝒩 (0, Ω). Individual parameters also depend on the vector of possibly transformed individual covariates Zi.
We denote Zi = (ZDis,i, ZCont,i) a covariate vector when considered as a random vector, and zi = (zDis,i, zCont,i) as a realisation of this random vector. The lower .Dis refers to the discrete sub-vector of the covariate vector, while .Cont refers to the continuous sub-vector.
In pharmacometrics modelling, parameters are generally log-normally distributed with additive covariate relationships on the log scale, i.e. log θi = log µ + β (zi − zpop) + ηi.
The residual error is modelled by the function g, depending on ξi, θi and some parameters σ, and by the random variable εi, with .
The P -vector of population parameters is denoted Ψ = {µ, β, λ}, where λ contains the variance parameters (elements of Ω and of the residual error model σ).
To assess the magnitude of a covariate effect, one can compute the ratio of change in primary parameters when covariate values change relative to a reference value zref . Detailed formulas are given in [20], and for a log-normally distributed parameter with an additive covariate relationship on the log scale, the ratio writes if the covariate is binary, and the ratio associated to a given percentile PX of the covariate distribution if continuous writes
, where P 50 denotes the median of the covariate distribution.
2.2 Elementary and population FIM
Given an elementary design ξi, a population parameters vector Ψ, and a covariate realisation zi the expected elementary FIM, MF (ξi, Ψ, zi), is defined as the covariance matrix of the Fisher score (equation (2)) where l(yi; Ψ, zi, ξi) is the likelihood of the vector of observations yi for the population parameters Ψ, given the individual vector of covariates zi and the elementary design ξi (equation (3)).
Knowing the distribution pZ for the covariate vector, the expected FIM can be expressed as an expectation over the covariate vector as given in equation (4).
Assuming that the covariate distribution pZ is the same for all the elementary design, the expected population FIM is the sum of the elementary FIM (equation (5)).
In the following, we consider that the N subjects have the same elementary design ξ, thus the expected population FIM is MF (Ξ, Ψ, pZ) = NMF (ξ, Ψ, pZ).
2.2.1 Elementary FIM with discrete covariates only
In the case with only D discrete covariates, each with Qd, d = 1, …, D categories. The covariate vector follows a discrete distribution with possible values, denoted zDis,d, d = 1, …, n(QD), each having the probability pDis,d, d = 1, …, n(QD). Therefore, the elementary FIM for ξ writes as a weighted sum as given in equation (6).
2.2.2 Elementary FIM with discrete and continuous covariates
In case there are both non-independent discrete and continuous covariates, we denote pDis the distribution of the discrete sub-vector, and for each of its possible value zDis,d, we denote pCont|z Dis,d the distribution of the continuous sub-vector conditionally to zDis,d. The elementary FIM for a given ξ is given in equation (7).
2.3 Segmentation of continuous covariates
We assume here that a priori distributions are known for the covariates, and our aim is to distribute the subjects in the study. To achieve this, we suggested to segment the continuous covariates into different intervals, chosen for their clinical meaning. We then balanced the allocation between the different continuous intervals and between the discrete covariate categories. The advantage is that it maintains the continuous nature of the covariate in the FIM calculus, while offering a discrete segmentation, which is useful both for reducing dimensionality during optimisation and for inclusion of subjects based on the optimisation results.
We take the general framework, with respectively D discrete and C continuous covariates. The domain of each continuous covariate, indexed by c, is segmented in intervals denoted
. Combining those continuous intervals leads to
possible intervals for ZCont . Finally, there are L = n(QD) × n(ℐ) possible distributions for the covariate vector Z.
Conditionally to ZDis = zDis,d, the distribution of is denoted
. The proportion of subjects within the trial population and for which zDis,i = zDis,d and
is denoted:
.
This segmentation leads to the elementary FIM formula given in equation (8), which now depends on x.
We denote x the vector of size L containing the and for a better clarity in the notations, we denote xl the lth element of x and
the associated distribution for Z. The overall distribution for Z is denoted pZ(x). These notations result in the FIM formula given in equation (9).
Finally, the multivariate distribution of covariates within the trial is described as the weighting of L multivariate distributions, and it is these proportions that we are seeking to optimise.
2.4 FIM integration using copula and Gauss-Legendre Quadrature
We describe here how copula modelling and GLQ can be leverage to compute an elementary FIM MF ξ, Ψ, zDis, pCont, z .
The first aspect of our method is to characterise the covariate distribution through copula modelling to account for correlations in FIM computation as already proposed [20].
It is well known that given a random vector (Z1, Z2, …, ZC) which follows the distribution pCont, z|Dis and has continuous marginals F1, …, FC, the random vector (U1, U2, …, UC) = (F1(Z1), F2(Z2), …, FC(ZC)) has marginals uniformly distributed on [0; 1]. By definition, the copula of (Z1, Z2, …, ZC) is the cumulative distribution function (cdf) of (U1, U2, …, UC). The Rosenblatt transform [27] denoted TCont, z| Dis is defined in equation (10) and its inverse in equation (11).
Therefore, applying the inverse Rosenblatt transform to a vector of random uniform variables return a vector that has the same distribution as (Z1, Z2, …, ZC).
Consequently, MF ξ, Ψ, zDis, pCont, |zDis can be expressed using as given in equation (12).
The second aspect is to approximate this integral through a GQ [23]. The latter consists in approximating an integral by a weighted sum of the function to be integrated, evaluated at a number n(Q) of given points in the integration domain, called quadrature nodes. The advantage of GQ integration over the MC is that, since the nodes are chosen to represent the distribution harmoniously, equivalent accuracy is achieved with fewer GQ nodes than with random draws for MC [28].
In our case the integration domain for each of the C integrals is [0; 1]. Therefore, we apply C GLQ with a substitution, as GLQ allows to integrate over the interval [-1, 1]. Denoting vq and wq the qth node and its weight in GLQ, the substitutions are: and
.
Finally, MF ξ, Ψ, zDis, pCont, |zDis can be expressed as given in equation (13).
This methodology can thereafter be applied to compute each of the L matrices that appears in equation (9).
2.5 Optimisation of covariate distributions
We describe here the proposed algorithm to optimise the proportions xl of the L distributions of covariates within the population.
2.5.1 Optimisation criteria
In order to compare the FIM between different designs, a criteria mapping the matrix to a scalar needs to be defined. Among various optimality criterion used in theory of optimal design (see for instance [4]), the D-criterion is a popular one and is wieldy used in the field of pharmacometrics. It uses the determinant of the FIM, as given in equation (14) and maximizing it ensures that the parameter estimates are as precise as possible. Indeed, a larger determinant of the FIM corresponds to a smaller volume of the confidence ellipsoid for the parameter estimates, indicating higher precision.
The comparison of covariate repartition x against a reference xref for the D-criterion is done by calculating the D-efficiency ED (equation (15)).
2.5.2 Optimising covariate proportions using Projected Gradient Descent
Our aim is to maximize the D-criterion, for a fixed design Ξ by balancing between the different covariate distributions weighted by the xl, and given some constraints on these proportions.
Linear constraints
First, as x represents a vector of proportions, it should sum to 1 and each xl has to be in [0; 1]. Additional constraints can be set on the xl, for instance to make sure that a given interval for a continuous covariate does not represent more than a desired proportion and not less than another proportion. Therefore there are both equality and inequality constraints, and the optimisation problem is given in equation (16). We denote 𝒬 the constraint space.
Projected Gradient Descent
As the D-criterion is a convex function, the PGD [24] can be used to solve the problem given in equation (16).
Starting from an initial point x0 ∈ 𝒬, PGD iterates the equation given in (17), where αk ∈ [0; ∞] is the gradient step-size at the kth iteration, ϕD the gradient of ∇ϕD with respect to x and P𝒬 is the projection on the constraint space. This projection is also the solution of a minimization problem, given in equation (18).
PGD iterates until a stopping condition is met. The detailed computation for ∇ϕD (Ξ, Ψ, pZ, xk) when all the subjects have the same elementary design ξ is given in [29] and the result is recall in equation (19), where tr denotes the trace operator.
2.6 Optimisation of sample size
After having optimised the proportion of each covariate distribution, we can compute the sample size needed, NSN, all other things being equal, to reach desired confidence level on parameter estimates or on power of statistical tests on covariate effect.
Denoting SE(Ξ, Ψ, pZ(x))β the standard error on a parameter β ∈ Ψ for a design Ξ, population vector Ψ and a covariate distribution defined by pZ and x, the number of subjects needed NSN to reach a desired level SE* on this parameter is given in equation (20).
From this, we can calculate the NSN required to achieve the desired power for significance tests on covariates [20, 30], as well as relevance tests on covariate relationships [20] for a given equivalence interval [Rinf ; Rsup]. The non-relevance of a covariate effect is assessed by a TOST procedure at level α, with the null hypothesis is H0: “the covariate effect is relevant”, i.e. rl,c ∉ [Rinf ; Rsup] while the alternative hypothesis is H1: “the covariate is not relevant”, i.e. rl,c ∈ [Rinf ; Rsup]. This two sided null hypothesis can be split into two, respectively H0,inf and H0,sup :
Thus H0 is rejected if both H0,inf and H0,sup are rejected.
For log-normally distributed parameter with additive covariate relationships on the log-scale, and at level 1 − α, the null hypothesis is rejected if
and if
where depending on the sign of z − zref, Bsup and Binf equal
or
. The power is given by the equation (21), which holds if
.
The NSN to reach a desired non-relevance power P* is found using a round finder approach to solve equation (22).
2.7 Implementation
All these methods have been implementing using the R language [25]. For FIM evaluation an extension of the package PFIM 6, based on previous work [20] have been done. All scripts are available online at NEWZENODO. The quadrature nodes were computed using the function gauss.quad() from the package statmod [31]. For PGD, the projection on the constraint space was made using the function lsei() from the package limSolve [32].
3 Evaluation example using a simple PK model
In this section we explored an example to demonstrate the benefit of covariate distribution optimisation in reducing the sample size needed to draw conclusions about the relevance of covariate relationships.
3.1 Settings
3.1.1 Model
The PK model was a one compartment model with IV bolus and linear elimination, , with two parameters: the clearance (Cl) and the volume of distribution (V). The residual error was modelled as a combined error model: g = a+bf and exponential random effects without correlation were assumed for V and
and
. All covariate effects were modelled as additive on the log-scale: log θi = log θpop + βθ,ZZi + ηθ,i.
Firstly, three univariate scenarios were considered, with the model including either an effect of the Sex (SEX) on V, an effect of the Body Mass Index (BMI) on V, or an effect of the creatinine clearance (CLCR) on Cl. Both BMI and CLCR were first normalised and log transformed, applying respectively log and log
, where BMIpop and CLCRpop respectively denoted the average BMI and CLCR among Male individuals with Normal RF and Healthy Weight for BMI. These scenarios are hereafter referred to as ‘SEX only’, ‘BMI only’ and ‘CLCR only’. Then, the three effects were taken into account simultaneously, and this scenario is referred to as the ‘Three covariates’ scenario.
3.1.2 Covariate distribution specification
Because covariate data may not be available at design step, we used the public database from the National Health and Nutrition Examination Surveys (NHANES) database [26] to get prior covariate distributions. Covariate data from 2009 to 2020 were extracted, keeping only subjects between 18 and 80 years old, with BMI higher or equal to 18.5 kg/m2 and for whom there were no missing data for Age, Body Weight (BW), BMI, Height, Creatinine, and SEX. The body weight was adjusted for ideal body weight computed with Robinson’s formula [33] and the Cockcroft and Gault equation with adjusted body weight [34] was then used to estimate the creatinine clearance, CLCR. 29 463 covariate vectors were selected in the NHANES database and Table 1 provides the summary statistics of these covariates.
The BMI was segmented into three usual intervals, namely Obesity if BMI ≥ 30 kg/m2, Overweight if 25 ≤ BMI < 30 kg/m2 and Healthy Weight if 18.5 ≤ BMI < 25 kg/m2. The CLCR was segmented into the four usual intervals to define the renal function (RF), namely Normal RF if CLCR ≥ 90 mL/min, Mild RF if 60 ≤ CLCR < 90 mL/min, Moderate RF if 30 ≤ CLCR < 60 mL/min and Severe RF if CLCR < 30 mL/min. Of note, BMIpop = 22.9 kg/m2 and CLCRpop = 112.2 mL/min.
While there was a good balance between Male and Female and between the different BMI intervals, Severe RF was poorly represented with only 1.2% of the data.
Combining SEX, BMI intervals and CLCR intervals, there were 2 × 4 × 3 = 24 possible covariate combinations. Nevertheless, those including Severe RF were poorly represented in the dataset (e.g. only 40 subjects are Male, with Severe RF and Obesity). Therefore, Severe RF individuals were pooled into two classes depending on their SEX and without taking into account their BMI status. This pooling was carried out to ensure the adequacy of copula fits. The detailed repartition between the resulted 20 covariate combinations is given in the first panel of Figure 7.
3.1.3 Parameter values
The PK parameter values are given in Table 2, such as ratios of change caused by being a Female on V, being Obese on V and having Severe RF on Cl. With ratios of 0.70, i.e. outside of [0.80; 1.25], both the effect of being a Female on V and having Severe RF on Cl were tested for relevance; while rV,BMI (Obesity) was within the equivalence interval, thus the effect of being Obese on V was tested for non-relevance. It should be noted that rV,BMI (Obesity) and rCl,CLCR (Severe RF) were computed for the threshold values for belonging to these intervals, namely for 30 kg/m2 and 30 mL/min.
3.1.4 Design
The considered design includes N = 24 subjects receiving 250 mg of drug at time 0, and for each of them, three samples are collected at 1, 4 and 12 hours post-dose. The concentration curves f (t) accounting for univariate covariate effects and with parameters given in Table 2 are shown on Figure 1. Because being a Female reduces the volume, the typical concentration curve for Female is above the Male one. Because the higher the BMI the higher the volume, the typical area for concentration curve in individual with Healthy Weight is above the one for Overweight, which is above the one for Obesity. Regarding CLCR status, the lower the CLCR, the lower the typical curve, since renal impairment decreases CLCR.
Evolution of the concentration f (t) according to the fixed effects in the three univariate scenarios: the first panel corresponds to an effect of SEX on V, the second panel to an effect of BMI on V and the third panel to an effect of CLCR on Cl.For the second and third panels, the black lines correspond to the limits of the intervals of the continuous covariates.
3.2 FIM evaluation using copula and quadrature
3.2.1 Copula fitting
In the example with SEX only (i.e. without continuous covariate), the population FIM was computed for each SEX and the total FIM was obtained as the weighted sum, each FIM weighted by the proportion of the corresponding SEX in the database. With BMI only, three copula were fitted: one for each of the three datasets corresponding to BMI intervals. In the same way, in the example with CLCR only, four copula were fitted: one for each of the four datasets corresponding to CLCR intervals.
For the example with three covariates, a copula was fitted for each of the 20 covariate combination on the associated subset of data.
All copula were fitted as vine copula, using the R package rvinecopulib0.6.3.1.1 [35]. Copula fit evaluation was conducted through a simulation-based strategy as described in [36] using 100 replicates. The relative error (RE) between summary metrics Msim in copula-simulated populations and in the NHANES dataset was computed as . Summary metrics were the mean, the standard deviation, the median and the 5th and 95th percentiles. Overlaps between simulated and true distributions were also computed such as the simulated and observed correlations between BMI and CLCR. Diagnostics plots were explored using the R package pmxcopula [37] with minor edits.
3.2.2 Quadrature settings
According to the Gauss-Legendre Quadrature rule, the approximation using n(Q) nodes is exact for integrands that are polynomials of degree 2n(Q) − 1 or less. Nevertheless, in our case, the integrand is the composition of the FIM and the inverse Rosenblatt transform of the copula describing the continuous covariate distribution, therefore, in a general setting, it is not a polynomial function. Consequently, the quadrature remains an approximation and increasing n(Q) increases the accuracy.
To determine the number of nodes required in the quadrature for an acceptable accuracy in FIM evaluation, we computed the FIM for n(Q) between 1 and 25, and looked for stabilization in the D-criterion. The same number of nodes was used for all the continuous covariates and for each copula. As a benchmark, we also computed the FIM using copula with Monte Carlo (MC) simulations. This computation was performed using n(MC) = 10, 100, 500, 1000, and 2000 Monte Carlo samples drawn from each copula. Additionally, since it is a stochastic method, the process was repeated S = 100 times to assess the uncertainty of the computed FIM. The target value was chosen as the average ϕD obtained with 1000, and 2000 Monte-Carlo samples, and the RE on the D-criterion was computed with respect to this value. In the same time, we computed the run time as a function of n(Q), and the average run time in MC procedure as a function of the number of samples.
This comparison was performed for both BMI only and CLCR only, and because of computation already done, optimisation was then carried out with the higher number of nodes, i.e. n(Q) = 25. Because of reasonable run time, n(Q) = 25 was also used for the Three covariate scenario and MC approach using n(MC) = 500 was also explored as a reference.
3.3 Covariate distribution optimisation
In all scenarios, covariate distribution optimisation was performed using PGD. The step-size α was constant and set to 0.001 and the stopping condition was that the objective function should not change by more than ϵ = 10−7 between two iterations.
In univariate cases, optimisation was done for various values of the covariate effects, while the other PK parameter values were the same as those given in Table 2.
Different constraints were applied depending on the scenario.
3.3.1 SEX only
The optimal balance between Male and female was explored for βV,Sex ∈ [−1.20; 0.00] with a 0.05 step, with no other constraint than that the proportions should be within [0; 1] and sum to 1.
3.3.2 BMI only
Optimisation between the three BMI intervals was explored for βV,BMI ∈ [0.00; 1.00] with a 0.05 step, and for two optimisation settings. First without any other constraints than proportion definition. Then, optimisation among the three BMI intervals was performed with the additional constraint that each interval should represent at least 5% of the population. This setting is hereafter referred to as ‘Lower constraints’.
3.3.3 CLCR only
Optimisation between the four CLCR intervals was explored for βCl,CLCR ∈ [0.00; 1.00] with a 0.05 step. First, the two same settings as BMI only were explored. In addition, because Severe RF subjects are rare in the population, a third optimisation scenario was explored by adding to the lower limit constraint an upper limit set at 10% for Severe RF in the trial population. This setting is hereafter referred to as ‘Lower and Upper constraints’.
3.3.4 Three covariates
For the case with the three covariates, the three same optimisation settings as for CLCR only were explored.
3.4 Optimised distributions evaluation
The initial distribution between the SEX, CLCR and BMI intervals corresponds to the observed proportions in the selected subset from the NHANES database.
For each optimised distribution, the D-Efficiency relative to the design with the initial distribution was computed, in addition with the NSN to conclude with a 80% power on the statistical significance for each covariate effects, and the NSN to conclude with a 80% power on the relevance (or non relevance) of relationship for the effect of being a Female on V, of having Obesity on V and of having Severe RF on Cl.
4 Results
4.1 Copula fitting
The performance of copula model was evaluated for each covariate combination.
4.1.1 BMI only
For the scenario with BMI only, mean, standard deviation, median and 5th and 95th percentiles of the simulated population agreed with the observed one, with a small RE within ±2.5% (Appendix A.1.1, Figure A.1.1). Visual predicted checks (VPC) were also in accordance with the observed percentiles (Appendix A.1.1, Figure A.1.2).
4.1.2 CLCR only
Similarly, with CLCR only, RE on summary statistics was satisfactory, within ±5% for all the intervals (Appendix A.1.2, Figure A.1.3) and VPC were also in accordance with the observed percentiles (Appendix A.1.2, Figure A.1.4).
4.1.3 Three covariates
As shown in Appendix A.1.3, Figure A.1.5, RE was within ±10% for all the summary statistics for almost all the intervals of BMI and CLCR. Indeed the RE on the 5th or 95th percentiles of the simulations were a little higher for sd on BMI for Male, with Obesity and Moderate RF; for sd on BMI, P 5% and P 50% on CLCR for both Male and Female with Moderate RF. The simulated correlations from the copula models were very similar to observed correlations between BMI and CLCR (Appendix A.1.3, Figure A.1.7) and overlap between simulated and observed join distribution for BMI and CLCR was always above 80%, except for Female with Severe RF, for which the overlap was above 80% in only in three quarters of the simulations (Appendix A.1.3, Figure A.1.6). Moreover, the VPC given in Figure 2 show a a good match between simulated and observed level lines, for all combinations.
Three Covariates - Visual predictive checks based on the contours of the bivariate density between BMI and CLCR. For each of the 20 covariate combination, the virtual population was simulated 100 times from the copula and the 99% prediction intervals of percentile contours of the joint distribution was derived and compared to the contours observed in the NHANES database.
The ribbon areas correspond to the 99% prediction interval of 5th , 50th
and 95th
percentile contours of the joint distribution of BMI and CLCR. The lines corresponds to the observed 5th
, 50th
and 95th
percentile contours.
Overall, copula fits were judged to be very satisfactory.
4.2 FIM evaluation using quadrature
4.2.1 BMI only
As shown in Figure 3, quadrature method was much faster than MC. Indeed, with 25 nodes, the FIM computation was perform in 1.7 seconds, whereas even with only 100 MC samples,the average run time was 6.4 seconds, and it increased to 63.8 and 126.8 seconds for respectively 1000 and 2000 MC samples.
BMI only - Run time for ϕD computation (top) and relative error (RE) in ϕD computation (bottom), with FIM integration using on the left copula and MC and on the right copula and GLQ. The target value was the average ϕD obtained with n(MC) = 1000 and n(MC) = 2000.
The green line corresponds to 0. The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. The diamond corresponds to the mean. ϕD: refers to the D-criterion, FIM: Fisher Information Matrix MC: Monte-Carlo, GLQ: Gauss-Legendre Quadrature
Regarding the accuracy in ϕD computation, the first observation is that MC integration led to a non-negligible uncertainty when the number of MC samples n(MC) is low (Figure 3 and Figure A.2.1 given in Appendix A.2.1). In our example, the uncertainty is however very similar between 500, 1000 and 2000 MC sample, suggesting that 500 MC sample are acceptable for the FIM computation. For the quadrature approach, the RE to the target, defined as the average ϕD over the repetitions with 1000 or 2000 MC sample, was bellow 1% starting from 4 quadrature nodes and higher (Figure 3).
4.2.2 CLCR only
Once again, GQ method was much faster than MC with a run time of 2.3 seconds with 25 nodes whereas even with only 100 MC samples, the average run time was 8.6 seconds, and it increased to 86.0 and 170.4 seconds for respectively 1000 and 2000 MC samples (Figure A.2.3 given in Appendix A.2.2).
The uncertainty in ϕD computation with the MC approach was also very similar between 500, 1000 and 2000 MC sample, suggesting that 500 MC sample are acceptable for this FIM computation (Figure A.2.2 given in Appendix A.2.2).
With GQ, the relative error to the target was bellow 1% starting from 3 quadrature nodes and higher (Figure A.2.3 given in Appendix A.2.2).
4.2.3 Three Covariates
In this scenario, as the number of covariate combinations was higher, the run time was longer for n(Q) = 25, 533.0 seconds than the average run time with 500 MC samples: 412.7 seconds (sd = 2.1). Both methods gave very similar results with an average D-criterion of 62.5 with 500 MC samples (sd = 0.2) and 62.5 with n(Q) = 25 (Figure A.2.4 given in Appendix A.2.3).
4.3 Covariate distribution optimisation
4.3.1 SEX only
Optimal balance between Male and Female for the D-criteria and as a function of βV,SEX is given in Figure 4. The optimal distribution did not vary a lot and remained close to the 50-50 balance. For βV,SEX = 0, the optimal balance was perfect equilibrium between Male and Female. When βV,SEX decreased, the optimal weight for Female increased up to 67.8% for βV,SEX = −1.20. For the value used in the scenario with the three covariates, i.e. βV,SEX = −0.35, the optimal Female proportion in this univariate case was 62.0%.
SEX only - Optimal proportions for the covariate distributions for the D-criterion (top), ϕD, D-Efficiency and NSN to reach 80% power (bottom), as a function of βV,SEX
The purple vertical line corresponds to the value used in the Three covariates scenario. The grey area corresponds to the non-relevance interval. ϕD: refers to the D-criterion, NSN: Number of subjects needed
In Figure 4 are given the D-criterion, the D-Efficiency relative to the initial design, the NSN to conclude on the significance of βV,SEX, and the NSN to conclude on the relevance/non-relevance (when appropriate) of βV,SEX are given as a function of βV,SEX for both the initial design (51% of Females) and the Optimal design. In this context, the benefit of using the optimal covariate distribution was very small. As expected, NSNsignif (βV,SEX) and NSNrelev(V, SEX, (F emale)) decreased when βV,SEX get further from 0. For the value used in the scenario with the three covariates, the relative D-Efficiency was quite low: only 1.01. The optimal design increased slightly the uncertainty on βV,SEX therefore NSNsignif (βV,SEX) which initially was 28 became 30, and similarly NSNrelev(V, SEX, (F emale)) has been increased from 165 to 176.
4.3.2 BMI only
The optimal distribution between the three BMI intervals as a function of βV,BMI is given in Figure 5. With no constraints on the representation of all the statuses, we can see that the balance was only between the extremes: when βV,BMI = 0, the optimal was having 53.7% of Obese and 46.3% of Healthy Weight. When the BMI effect increased, the optimal proportion of Obese individual decreased, being 36.0% for βV,BMI = 1. With the lower constraint of having at least 5% of each interval, the optimal proportion of Overweight was always the minimum of 5%. With this lower constraint, the proportion of Obesity decreased less as the BMI effect increased (-2.2% when βV,BMI = 0 and -1.7% when βV,BMI = 1).
BMI only - Optimal proportions for the covariate distributions for the D-criterion (top), ϕD, D-Efficiency and NSN to reach 80% power (bottom), as a function of βV,BMI
The purple vertical line corresponds to the value used in the Three covariates scenario. The grey area corresponds to the relevance interval. ϕD: refers to the D-criterion, NSN: Number of subjects needed
For the value used in the scenario with the three covariates, i.e. βV,BMI = 0.5, the optimal Obesity proportion without constraint was 39.7% and with the Lower constraints, 38.0%.
The higher βV,BMI, the more there was benefit in using optimal distribution, no matter the constraint, in terms of overall estimation as the D-Efficiency increased with βV,BMI, from 1.05 when βV,BMI = 0 to respectively 1.15 and 1.14 when βV,BMI = 1 without and with lower constraints (Figure 5). Moreover, the addition of the lower constraints, which makes the distribution more realistic in terms of clinical requirements, was still very valuable, as the ϕD curve for the lower constraints was much closer to the ϕD curve without constraints than to the curve with the initial distribution. In the same way, the NSN to reach 80% in statistical tests with the lower constraints were closer to those without constraint than to those with the initial distribution. For high effect, the NSN to conclude of the significance of βV,BMI was all ready low (18 for βV,BMI = 1) and decreased with optimal distribution (respectively 14 and 15 without and with constraints). Bellow , the ratio of change caused by the Obesity on V was bellow 1.25, and obviously the NSN needed to conclude to the non-relevance of this effect decreased with βV,BMI, and was minimal in 0. With the initial distribution, NSNnonrelev(V, BMI, (Obesity)) = 28, and it decreased to 20 and 21 without and with constraints.
For the value used in the scenario with the three covariates, NSNsignif (βV,BMI) was respectively 70, 52 and 55 for the initial, without constraint and with lower constraints distributions, while the NSNnonrelev(V, BMI, (Obesity)) was 130, 97, 101.
4.3.3 CLCR only
With CLCR only, a similar pattern as for BMI, with allocation to the extreme statuses was observed without constraint (Figure 6). Thus, for βCl,CLCR = 0, the optimal distribution was having 52.2% of Severe RF and 47.8% of Normal RF. When the CLCR effect increased, the optimal proportion of Severe individual decreased, first slowly until βCl,CLCR = 0.85 (26.7%) then fell to 4.2% for βCl,CLCR = 1. In the same time, while the proportion of Moderate RF was null for βCl,CLCR = 0.85, it increased up to 19.3% for βCl,CLCR = 1. In other words, when the effect was stronger, less extreme small CLCR values were sufficient and better than the smaller ones.
CLCR only - Optimal proportions for the covariate distributions for the D-criterion (top), ϕD, D-Efficiency and NSN to reach 80% power (bottom), as a function of βCl,CLCR
The purple vertical line corresponds to the value used in the Three covariates scenario. The grey area corresponds to the non-relevance interval. ϕD: refers to the D-criterion, NSN: Number of subjects needed
When adding the lower constraint of having at least 5% of each CLCR interval, until βCl,CLCR = 0.85, 5% was the optimal for both Mild and Moderate RF. With a greater effect and as previously, the optimal proportion of Moderate increased up to 15.1 for βCl,CLCR = 1.
With the additional upper constraint of 10% for the Severe RF, there was a shift from Severe to Moderate, while Mild remained at the lower limit of 5%.
For the value used in the scenario with the three covariates, i.e. βCl,CLCR = 0.27, the optimal distribution Normal - Mild - Moderate - Severe were respectively 65.0 - 0 - 0 35.0 %, 58.6 - 5.0 - 5.0 - 31.4 % and 69.4 - 5 - 15.6 - 10.0 % without constraint, with lower constraints, and with lower and upper constraints.
As in the scenario with BMI only, adding the lower constraint on the proportions did not deteriorate too much the performance of the optimal distribution in terms of ϕD, D-efficiency and NSN . Constraining Severe RF being bellow 10% reduced the information and therefore increased the NSN but was still valuable compared to the initial distribution. The greater difference in D-Efficiency when using an optimal distribution compared to the initial one was observed for a null effect; because it was in this case than the optimal proportion of Severe RF was the higher while there were only 1.2% Severe in the initial distribution. Above, , the ratio of change caused by the Severe RF on CLCR was above 1.25, and logically the NSN needed to conclude to the relevance of this effect decrease with the increasing in βCl,CLCR.
For the value used in the scenario with the three covariates, NSNsignif (βCl,CLCR) was respectively 121, 32, 35 and 62 for the initial, without constraint, with lower constraints and with lower and upper constraints distributions, while the NSNrelev(Cl, CLCR, (Severe)) was respectively 684, 179, 194 and 348. Therefore, the use of the more constrained optimal distribution still made it possible to reduce by almost 50% the NSN to conclude on the relevance of the effect of having a Severe RF on Cl.
4.3.4 Three Covariates
The optimal distribution between the 20 covariate combinations in the scenario with three covariates are given in Figure 7 for the initial distribution and for the three optimisation settings. The first observation is that theses distributions were very sparse in the sense that many categories had a weight of 0.
Three Covariates - Optimal proportions for the covariate distributions for the D-criterion for the different sets of constraints
Because being a Female reduces V while V increases with the BMI, in optimal distributions, Female were more associated with Healthy Weight while Male were associated with Obesity. Indeed, this balance allows to maximise the difference in the theoretical concentration curves, higher for Healthy Female than for Obese Male. In the same way, because having a Severe RF decreases Cl and therefore slows the elimination and leads to higher concentration curves, in optimal distribution, only Female Severe RF were present. We thus find the same tendency to distribute between extremes as in the univariate analysis.
Marginal distributions are given in Appendix A.3 in Table 4. Regarding the SEX, no matter the constraint settings, the optimal balance in close to 66.3% Female, not so far from the 62.0% observed in the univariate case for SEX only.
For the BMI intervals, even without constraint, there were Overweight in the optimal distribution because of the pool of the BMI statuses in the Severe RF combinations. Nonetheless, the proportions of Obesity were close to the univariate scenario: 42.0% without constraint and 39.7% in univariate for the optimisation without constraint, 40.6% and 38.0% with lower constraints, and finally 43.0% with additional upper constraint on Severe RF.
Once again for the CLCR intervals, when there were no constraint, Mild and Moderate RF were absent from the optimal distribution, and the balance Normal - Severe was 68.7 - 31.3% therefore close to 65.0 - 35.0 % from the univariate case. With lower constraints, Mild and Moderate were still to their minimum and with the additional upper constraint, Severe RF was replaced by Moderate RF.
Regarding performances, having both lower and upper constraints remained close to the two other optimisation settings in terms of D-Efficiency and still allowed to reduce the NSN . Indeed the higher NSN to conclude on the significance of the three covariates effects, on the relevance of the effect of being a Female on V and of the effect of having Severe RF on Cl, and on the non-relevance of having Obesity on V was 596 with the initial design and fell to respectively 197 for both without constraint and with lower constraint optimisation and to 229 with the additional upper bound on Severe. Thus, even the more constrained optimisation allowed to reduce the NSN by more than 60%.
5 Discussion
From an application point of view, we proposed in this work to optimise the allocation of the covariates among the patients to be including in a future study. The main objective was to improve the estimation, but we were particularly interested in the effects of the covariates and the power of the statistical tests relating to them, as well as the NSN to conclude on their relevance with a sufficient degree of confidence. For relevance assessment, we used the equivalence region [0.80; 1.25] which is the one used in bioequivalence studies to define whether a dosing adjustment is required but alternative methods have been proposed to chose equivalence regions appropriate to a specific context [2]. This optimisation was carried out between categories for discrete covariates, while for continuous ones, we introduced a segmentation of their domains into intervals and optimised the proportion between them. In terms of methods, we have introduced a new way of integrating the FIM for the covariate distribution by leveraging copula modelling and GLQ, and we solved the optimisation problem using the PGD algorithm. All these methods have been implemented in R based on the PFIM 6.1 package, and the source code is available online at the following link: https://doi.org/10.5281/zenodo.14778034.
We applied the proposed workflow on a PK example with one discrete covariate and two continuous ones. First, univariate analysis showed classical and intuitive results, namely an equal distribution between the more extreme classes and a disappearance of intermediate values in cases where there are no constraints. Constraints were proposed to better reflect real life conditions for patients’ inclusion in clinical studies. Even with these constraints, we showed that in the scenarios with the three covariate effects, the optimal distribution with the most realistic constraints allowed to reduce by more than 60% the NSN to conclude on the clinical relevance (or non-relevance) of the relationships. The challenge in implementing this optimal distribution is the need to include 23 females with Severe RF, while with the initial distribution, it was just 5 subjects without restriction on the SEX. Nevertheless, as the reduction in the sample size is otherwise significant, it may be worth concentrating on this difficult inclusion.
We have shown that using GLQ greatly reduces the computation time compared with MC, while providing good accuracy in FIM computation. However, the choice of the quadrature settings has not been examined in depth. Indeed, we compared the results depending on the number of quadrature nodes with the ones obtained with MC simulations; but using this approach in practice negates all the benefits of run time reduction. In practice, a suggestion can be to increase progressively the number of nodes and just look for a stabilization of the D-criterion without reference. In addition, we have seen that the computation time remains quite low even with a high number of nodes, in our case 25. Nonetheless, this holds for three covariates but if the number of continuous covariates C increases, due to the quadrature of each of them, the number of FIM evaluations (n(Q))C increases exponentially. This difficulty could be countered, for example, by refining the number of nodes required for each distribution rather than using the same number for all the covariates, or by using adaptive methods that require fewer nodes (see for instance a review in [38]). However, this point should be qualified by the fact that, in practice, it seems unlikely that the number of covariates to be optimised will be very large. Moreover, it would still be faster than CTS, which requires simulation and then estimation on a large number of data sets.
We note that we did not explore the PGD hyper-parameters such as the step size nor some of performance-improving variants because the need was not felt in our situation, but it can be an option in more complex cases. However, in the current version of this algorithm, only linear constraints can be used for an easy projection on the constraint space, while in practice some non-linear ones may be useful (e.g., to impose the power of a test or a SE level on a parameter). To address this, a penalty term could be added to the objective function or another class of algorithms could be explored. Regarding the optimisation criteria, we only used the D-criterion while others exist such as the DS-criterion which allows focus on a specific subset of the parameters (see for instance [13] for an application). In our case, the covariates effects could have been the only parameters of interest. This change of criterion is not straightforward in the sense that its derivative has to be calculated, but apart from this calculation, it presents no particular difficulty.
In this work, we have assumed the same elementary design for all the subjects, but the computations can easily be extended to the case where there are several. It would therefore certainly be more beneficial to adapt the elementary design to the covariate combination. Then, an interesting path may be to jointly optimize the elementary designs and the covariates. If discrete optimization is performed for elementary designs, it can be combined with the covariates and the global process remains the same, only the number of FIM evaluations increases. We can easily imagine that this joint optimisation would enable us to improve the designs, for example in our case, for subjects with poor elimination, having a higher final observation time would guarantee a better estimation. On the other hand, for continuous optimization, the extension is harder. In both cases, it will first be necessary to consider the practical and ethical constraints and discuss the actual implementation with experts.
In our example we used a covariate database and the goodness of fit diagnosis suggested that a large amount of data is required but copulas could have been used directly if available, which is one of their great interests in pharmaceutical applications as they can be made public without threatening anonymity. In addition, the optimization approach can be coupled with another method for FIM computation if preferred.
Regarding FIM computation, we used a linearization approach but methods avoiding this approximation have already been explored using MC and GQ [39] [40]; and future work could extend them to include the integration of the covariate distribution.
Furthermore, we need to point out the possible difficulties in implementing these optimal design distributions. For example, it seems implausible not to include patients because they are not in the most extreme interval, when inclusion processes are already long and costly. The spirit of this optimisation would therefore be rather to aim towards the optimal in practice and above all to be aware of the conditions on the distribution of covariates and on the number of subjects that would be needed to reach a conclusion.
Last but not least, this approach assumes that the model and its parameters are known, including the effects of the covariates, whereas the trial to be designed aims to better estimate them, and is not robust to misspecifications. In the univariate example, we computed optimal allocation and NSN for a range of β and this could be extended into a robust approach as already explored in the NLMEM context [10–13]. Moreover, this methodology could be used to design a Phase 3 study based on Phase 2 results; therefore, the SE from the covariate effects estimates based on the Phase 2 data may be used to specify a prior distribution on the β. This absence of robustness also holds for the covariate distribution assumed for FIM computation and covariate optimisation. In this case, Adaptive Designs and especially Two-stage approaches [10, 29, 41] could be a good avenue to explore for refining the distribution of covariates halfway through inclusion.
Data Availability
All the R scripts are available at the following link: https://doi.org/10.5281/zenodo.14778034
A Appendix
A.1 Copula diagnostics
A.1.1 BMI only
BMI only - Relative error (RE) of distribution metrics (mean, standard deviation and percentiles) of BMI for the three intervals, as compared to the statistics of the NHANES database. For each BMI interval, the virtual population was simulated 100 times from the copula.
The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. sd refers to the standard deviation, and P 5%, P 50% and P 95% to the 5th, 50th and 95th percentiles respectively.
BMI only - Visual predictive checks based on the percentile of the BMI distribution. The virtual population was simulated 100 times from the copula and the 99% prediction intervals of the percentiles the distribution was derived and compared to the percentiles observed in the NHANES database.
The ribbon areas correspond to the 99% prediction interval of 5th , 50th
and 95th
percentiles of the BMI distribution. The lines correspond to the observed 5th
, 50th
and 95th
percentiles of the CLCR distribution.
A.1.2 CLCR only
CLCR only - Relative error (RE) of distribution metrics (mean, standard deviation and percentiles) of CLCR for the four intervals, as compared to the statistics of the NHANES database. For each CLCR interval, the virtual population was simulated 100 times from the copula.
The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. sd refers to the standard deviation, and P 5%, P 50% and P 95% to the 5th, 50th and 95th percentiles respectively.
CLCR only - Visual predictive checks based on the percentile of the CLCR distribution. The virtual population was simulated 100 times from the copula and the 99% prediction intervals of the percentiles the distribution was derived and compared to the percentiles observed in the NHANES database.
The ribbon areas correspond to the 99% prediction interval of 5th , 50th
and 95th
percentiles of the CLCR distribution. The lines correspond to the observed 5th
, 50th
and 95th
percentiles of the CLCR distribution.
A.1.3 Three covariates
Three Covariates - Relative error (RE) of marginal metrics (mean, standard deviation and percentiles) of continuous covariates for the 20 covariate combinations, as compared to the statistics of the NHANES database. For each combination, the virtual population was simulated 100 times from the copula.
The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. sd refers to the standard deviation, and P 5%, P 50% and P 95% to the 5th, 50th and 95th percentiles respectively.
Three Covariates - Overlap metric of 95th density contours of virtual population relative simulated 100 times from the copula to the NHANES database.
Overlap metric of 95th density contours of virtual population relative to observed population. Virtual population was simulated 100 times. The gray dashed line indicates 85% overlap percentage.
Three Covariates - Correlations of logBM I and logCLCR pair in the NHANES database (red diamond) and in virtual population relative simulated 100 times from the copula (boxplot).
The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. sd refers to the standard deviation, and P 5%, P 50% and P 95% to the 5th, 50th and 95th percentiles respectively.
A.2 FIM integration: MC vs GQ
A.2.1 BMI only
BMI only - ϕD computation with FIM integration using copula and MC (left) and copula and GLQ (right). The green line corresponds to the average ϕD obtained with n(MC) = 1000 and n(MC) = 2000. The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. The diamond corresponds to the mean. ϕD: refers to the D-criterion, FIM: Fisher Information Matrix MC: Monte-Carlo, GLQ: Gauss-Legendre Quadrature
A.2.2. CLCR only
CLCR only - ϕD computation with FIM integration using copula and MC (left) and copula and GLQ (right). The green line corresponds to the average ϕD obtained with n(MC) = 1000 and n(MC) = 2000. The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. The diamond corresponds to the mean. ϕD: refers to the D-criterion, FIM: Fisher Information Matrix MC: Monte-Carlo, GLQ: Gauss-Legendre Quadrature
CLCR only - Run time for ϕD computation (top) and relative error (RE) in ϕD computation (bottom), with FIM integration using on the left copula and MC and on the right copula and GLQ. The target value was the average ϕD obtained with n(MC) = 1000 and n(MC) = 2000. The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. The diamond corresponds to the mean. ϕD: refers to the D-criterion, FIM: Fisher Information Matrix MC: Monte-Carlo, GLQ: Gauss-Legendre Quadrature
A.2.3 Three Covariates
Three covariates - ϕD with FIM integration using copula and MC (left) and copula and GLQ (right). The boxplot displays the median, the 25th and 75th percentiles, while the whiskers are 5th and 95th percentiles. The diamond corresponds to the mean. ϕD: refers to the D-criterion, FIM: Fisher Information Matrix MC: Monte-Carlo, GLQ: Gauss-Legendre Quadrature
A.3 Optimisation results: marginal distributions
Footnotes
Funding statement This work was financed by a CIFRE agreement (Conventions Industrielles de Formation par la Recherche) of the ANRT (Association Nationale de la Recherche et de la Technologie). The CIFRE agreement is a partnership between a public laboratory and a company, here the UMR (Unité Mixte de Recherche) 1137 and Ipsen, respectively.