Random-effect based test for multinomial logistic regression: choice of the reference level and its impact on the testing
=========================================================================================================================

* Qianchuan He
* Yang Liu
* Meiling Liu
* Michael C. Wu
* Li Hsu

## ABSTRACT

Random-effect score test has become an important tool for studying the association between a set of genetic variants and a disease outcome. While a number of random-effect score test approaches have been proposed in the literature, similar approaches for multinomial logistic regression have received less attention. In a recent effort to develop random-effect score test for multinomial logistic regression, we made the observation that such a test is not invariant to the choice of the reference level. This is intriguing because binary logistic regression is well-known to possess the invariance property with respect to the reference level. Here, we investigate why the multinomial logistic regression is not invariant to the reference level, and derive analytic forms to study how the choice of the reference level influences the power. Then we consider several potential procedures that are invariant to the reference level, and compare their performance through numerical studies. Our work provides valuable insights into the properties of multinomial logistic regression with respect to random-effect score test, and adds a useful tool for studying the genetic heterogeneity of complex diseases.

Keywords
*   Genetic variants set
*   invariance property
*   multinomial logistic regression
*   random-effect score test
*   score statistics
*   statistical power

Random-effect based score test has been widely used to investigate the association between a set of genetic variants and a health outcome/trait (Wu et al., 2011; Maity et al., 2012; Sun et al., 2013). While various outcomes/traits have been considered for random-effect based score test, the multinomial outcome has received little attention until recently. Multinomial outcome analysis has important practical applications, such as the subtype analysis which concerns the association between genetic variants and multiple subtypes of a disease (Eckel-Passow et al., 2019). In multinomial analysis, one level is specified as the reference level, and the other levels are compared to this level to examine the association between the outcome and genotypes. It is generally anticipated that a statistical test should be invariant to the choice of the reference level. However, in a recent study, we made the observation that such a test in general is not invariant to the choice of the reference level (Liu et al., 2021). This is intriguing, because the logistic regression model -a model often considered as a special case of the multinomial logistic regression model -has long been observed to possess the invariance property. Moreover, the lack of invariance property for multinomial logistic regression is highly undesirable in practice, because practitioners may make potentially contradictory conclusions due to different choices of reference levels. Here, we elaborate this issue and conduct investigations to understand the fundamental cause of the problem. We first explain why the considered test is not invariant to the choice of the reference level, and then derive the analytical form of the power function when a given level is used as the reference. We next use simulations to compare several potential ways to deal with the non-invariance issue, and then provide practical guidelines at the end of the letter.

Consider a multinomial logistic regression model with *J* levels and *n* subjects. For *j* = 1, …, *J* and *i* = 1, …, *n*, let *Y**ji* = 1 if the *i*th person belongs to *j*th level, and *Y**ji* = 0 otherwise. Let *X**i* be the adjusting covariates with the first element being the intercept and *G**i* be the genotypes of *p* variants. Assume that the *J* th level is the reference level, then the model can be written as ![Graphic][1]</img>, for *j* = 1, *…, J* − 1, where *α**j* and *β**j* are the regression parameters. Let *P* (*Y**ji* = 1) = *µ**ji* for *j* = 1, *…, J*. Note that ![Graphic][2]</img> and ![Graphic][3]</img>. Then under *H* : *β*1 = *…* = *β*(*J*−1) = 0, the log-likelihood can be written as ![Formula][4]</img>  where ![Graphic][5]</img>. Let ![Graphic][6]</img> be the maximal likelihood estimator of (*α*1, *…, α*(*J*−1)) under *H*. Then, the estimated *µ**ji*’s are ![Graphic][7]</img>. Let *G* = (*G*1, *…, G**n*)T, *Y**j* = (*Y**j*1, *…, Y**jn*)T and ![Graphic][8]</img>. Then, the half score of the random effects for *β**j*(*j* = 1, *…, J* − 1) can be derived as ![Formula][9]</img>  Let I*J*−1 = diag(1, *…*, 1)(*J*−1)*×*(*J*−1), *X* = (*X*1, *…, X**n*)T, 𝕏= I*J*−1 ⊗ *X*, 𝔾 = I*J*−1 ⊗ *G*, and ![Graphic][10]</img>, where ![Formula][11]</img>  and ![Formula][12]</img>  Then the score statistic ![Graphic][13]</img>, where *λ**j*’s are eigenvalues of *V*. Let the p-value of this score statistic be *P**J*.

Suppose that we now wish to consider a different level as the reference level. Not to lose generality, let us consider the first level as the reference level. The model can be written as ![Graphic][14]</img>, for *j* = *J*, 2, *…, J* − 1 and *j′* = 1, 2, *…, J* − 1, where *γ**j′* and *ξ**j′* are the regression parameters. Then under the null hypothesis that *ξ*1 = *…* = *ξ*(*J*−1) = 0, the log-likelihood has a similar form as equation (1) and can be written as ![Formula][15]</img>  where ![Graphic][16]</img>. Since the likelihood in (2) is equal to that in (1), one can show that the parameter estimators ![Formula][17]</img>  Therefore ![Graphic][18]</img> based on equation (2) is the same as that based on equation (1). Then the half score of the random effects for *ξ**j*(*j* = 1, *…, J* − 1) can be written as ![Formula][19]</img>  It follows that ![Formula][20]</img>  Then the score statistic ![Graphic][21]</img>, where *ψ**j*’s are eigenvalues of *V*∗, where *V*∗ is the counterpart of *V*. Let the p-value of this score statistic be *P*1. In a similar manner, one can obtain *Q**j* and *P**j* when *j*th level is chosen as the reference level.

Recall that *S*1, *…, S**J*−1 are the scores when *J* th level is set as the reference, while *R*1, *…, R**J*−1 are the scores when the first level is set as the reference. The above derivation shows that there is a close relationship between *S*1, *…, S**J*−1 and *R*1, *…, R**J*−1. Indeed, using these results, we can further derive that ![Formula][22]</img>  matrix, then it follows that *R* = *AS* and the covariance matrix of *R, Cov*(*R*) = *ACov*(*S*)*A*T. Therefore we obtain the key results that *R*T*R* = *S*T*A*T*AS* and *V*∗ = *AV A*T.

The above results indicate that, when a different level is chosen as the reference level, the random-effect score statistics (*Q**j* and *Q**j**′*) will have different values, and the covariance matrices for the scores will also differ. Thus, *P**j* in general is not equal to *P**j**′*. In other words, the p-values of the described statistics are not invariant to the choice of the reference level. Then an interesting question arises, that is, why does the logistic regression model, which is a special case of the multinomial logistic regression model, indeed have the invariance property? It turns out that for *J* = 2, one has *A* = −I*p* and *R* = *AS* = −*S*. Then it follows that *R*T*R* = *S*T*S* and *V*∗ = *V*. Thus, when *J* = 2, i.e., in the case of logistic regression, the *p*-value remains the same regardless of which level is chosen as the reference level.

Since the p-value varies with the choice of reference level, we investigate how this choice influences the statistical power. For ease of presentation, let us consider *J* = 3, i.e., three levels for the outcome. Using the relationship between *R* and *S*, we have ![Formula][23]</img>  To facilitate presentation, define ![Graphic][24]</img>, then ![Graphic][25]</img>. Subsequently, we have that

*   the score statistic using *Y*3 as the reference is ![Graphic][26]</img>;

*   the score statistic using *Y*1 as the reference is ![Graphic][27]</img>;

*   the score statistic using *Y*2 as the reference is ![Graphic][28]</img>.

To study the asymptotical distribution of the test statistic *Q**j* under the alternative hypothesis, let us consider a special case: *X* = 1*n* ≡ (1, *…*, 1)*T*. Then it can be shown that ![Graphic][29]</img> and ![Graphic][30]</img>. Thus ![Formula][31]</img>  where ![Graphic][32]</img>. It is known that asymptotically *G**T* *HY*1 ∼ *N* (*G**T* *Hµ*1, Δ1 = *G**T* *H*Σ1*HG*), where Σ1 = diag(*µ*1(1 − *µ*1)). It follows that ![Graphic][33]</img> asymptotically follows a mixed noncentral chi-squared distribution: ![Formula][34]</img>  where *λ*1*r*’s are the eigenvalues of ![Graphic][35]</img> is the noncentral parameter, and *u*1*r*’s are the corresponding eigenvectors of Δ1. Similarly, let Σ2 = diag(*µ*2(1 − *µ*2)) and Δ2 = *G**T* *H*Σ2*HG*, then ![Formula][36]</img>  asymptotically, where *λ*2*r*’s are the eigenvalues of Δ2, and *u*2*r*’s are the corresponding eigenvectors of Δ2. To find the distribution for ![Graphic][37]</img>, let *µ*12 = (*G**T* *Hµ*1, *G**T* *Hµ*2)*T*, Σ12 = *Cov*(*Y*1, *Y*2) = diag(−*µ*1*µ*2), and ![Formula][38]</img>  Then one can derive that ![Formula][39]</img>  asymptotically. Therefore, ![Graphic][40]</img> asymptotically, where *λ**r*’s are eigenvalues of Δ12, and *u**r*’s are the corresponding eigenvectors of Δ12. In a similar manner, we can derive the asymptotical distributions of *Q*1 and *Q*2, respectively.

Recall that the power function is Ψ*j*(*Q**j* *≥ c**j*), where Ψ*j* is the cumulative distribution function of *Q**j* under *H*1, and *c**j* is the critical value determined by the distribution of *Q**j* under *H*. It is tempting to directly compare *Q**j*’s power functions using the above derived asymptotical distributions, but it is challenging to do so. This is because when the reference level is replaced, the *Q**j*’s asymptotical distributions under both the null and the alternative hypotheses will change, making it extremely difficult to compare the power across difference reference levels. On the other hand, when the *J* subtypes have similar proportions among the *n* subjects and there is no adjusting covariate, one can show that the asymptotical distributions of the *Q**j*’s are approximately equal to each other under the null hypothesis. Then it follows that the larger the statistic *Q**j* is, the more likely one will reject the *H*. Recall that ![Graphic][41]</img>. This suggests that, to maximize the power, the level with the smallest ![Graphic][42]</img> should be chosen as the reference. Our simulation studies confirmed this derivation (see Supplementary Material). The size of ![Graphic][43]</img> has practical interpretations. Recall that *S**j* is the inner product between ![Graphic][44]</img> and *G*. Thus, ![Graphic][45]</img> can be roughly seen as the correlation between *Y**j* and genotype. This suggests the level that has the weakest correlation with the genotype should be chosen as the reference level, which well matches intuitions.

The above analysis provides theoretical insights into the power of the random effects score test. However, in practice, it is generally unknown which ![Graphic][46]</img> is the smallest among the *J* levels. This can be seen from the following. Taking *S*1 as an example, we have ![Formula][47]</img>  where *P* (*Y*1 = 1|*G, X*) is a *n*-length vector with each element being ![Formula][48]</img>  Clearly, *E*(*S*1|*G, X*) is a quantity related to *G, X, α*1, *α*2, *β*1, *β*2. Since *α*1, *α*2, *β*1, *β*2 are unknown parameters, it is difficult to evaluate the size of ![Graphic][49]</img> accordingly. Hence, practical data analysis will need statistical tests that are invariant to the choice of the reference level. In the following, we consider three procedures to tackle this issue, and compare the performance of the three methods through simulation studies.

## I. A Bonferroni procedure

We use each of the *J* levels as the reference level, and based on *Q**j*’s, obtain the corresponding p-values *P*1, *…, P**J*. Then, use ![Graphic][50]</img> as the final p-value. The multiplication of *J* is a Bonferroni correction to ensure that correct type I error is maintained.

## II. A Cauchy procedure

We propose to adapt to a Cauchy procedure (Liu and Xie, 2020) to combine the *J* p-values, *P*1, *…, P**J*. Specifically, let ![Graphic][51]</img>, where ![Graphic][52]</img> and *c**j* is the pre-specified weight to accommodate prior knowledge on *j*th level. When there is no prior knowledge on the *J* levels, all *c**j* = 1*/J*. Then under the null hypothesis, the p-value of *T* can be approximated by (1*/*2 − (arctan *T*)*/π*) based on the Cauchy distribution.

## III. An integrative procedure

Consider a statistic *L* = (*WDS*)*T* (*WDS*), where *W* = diag(*w*1, *…, w**J*) ⊗ I*p*, *D* = (I*J*−1, −**1***J*−1)*T* ⊗ I*p*, and *w**j* is a pre-specified weight for *j*th level. When *w**j*’s are all equal, this statistic reduces to a statistic in Liu et al. (2021). Using the relationship that ![Graphic][53]</img>, we can show that ![Graphic][54]</img>, which is invariant to the choice of the reference level. Alternatively, *L* can be written as ![Graphic][55]</img>, where ![Graphic][56]</img> is a weighted version of *Q**j*. Thus, *L* can be seen as an integrative statistic that consists of all the *Q**j*’s. *L* asymptotically follows ![Graphic][57]</img>, where *λ**r* are the eigen values of *WDV D*T*W* T, and and ![Graphic][58]</img> are independent ![Graphic][59]</img> random variables.

We conducted simulation studies to examine the type I errors of these procedures. We considered three levels for the response variable, and generated an adjusting covariate from *N* (0, 1). The regression coefficients for the intercept and the adjusting covariate were set as *γ*1 = (0.3, 1.2)T and *γ*2 = (0.3, 0.9)T. Next we simulated a *p*-vector of mutations with each element generated from a Bernoulli(0.05). To examine the type I error, we set *ξ**j* = **** for *j* = 1, 2 and considered *n ∈* {300, 500, 1000} for *p* = 10, 15. We evaluated the type I error at significance level *α* = 10−3. A total of 106 simulated datasets were generated for each setting. As shown in Table 1, all considered procedures are able to control the type I error. Next, we examined the power of these procedures. We considered two scenarios:

View this table:
[Table 1:](http://medrxiv.org/content/early/2021/04/19/2021.04.13.21255272/T1)

Table 1: 
Empirical type I error (*×*10−3)

1.  60% of *ξ**js*’s were generated from Uniform(0.3, 1.5), and 40% of *ξ**js*’s were generated from Uniform(−1.5, −0.3).

2.  60% of *ξ**js*’s were generated from *N* (0, 1.42), and 40% of *ξ**js*’s were set to 0.

*ξ**js*’s were fixed over all replicates. Each scenario was replicated 104 times. The power for scenarios I and II is summarized in Tables 2 and 3. The Bonferroni procedure has the lowest power, due to its conservativeness in controlling type I error. The integrative procedure properly accounts for the correlations among the *J* statistics, and tends to have the best performance among the considered procedures. Thus, we recommend the integrative procedure for practical data analysis.

View this table:
[Table 2:](http://medrxiv.org/content/early/2021/04/19/2021.04.13.21255272/T2)

Table 2: 
Power for scenario I

View this table:
[Table 3:](http://medrxiv.org/content/early/2021/04/19/2021.04.13.21255272/T3)

Table 3: 
Power for scenario II

In summary, we have shown that the random-effect score test for multinomial logistic regression is not invariant to the choice of the reference level. Our results provide analytical explanation to this issue, and simulation studies confirmed that the choice of the reference level influences the statistical power. We considered several procedures that can yield p-values (or statistics) that are not dependent upon the reference level, and the integrative procedure appears to have a more favorable performance. Overall, our study provides new insights into the random-effect score test for multinomial logistic regression, and will aid in the ongoing study of genetic heterogeneity for complex diseases.

## Supporting information

Supplementary Figure S1 [[supplements/255272_file02.pdf]](pending:yes)

## Data Availability

Data used in this manuscript are simulated.

*   Received April 13, 2021.
*   Revision received April 13, 2021.
*   Accepted April 19, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## REFERENCES

1.  Eckel-Passow, J.E., Decker, P.A., Kosel, M.L., Kollmeyer, T.M., Molinaro, A.M., Rice, T., Caron, A.A., Drucker, K.L., Praska, C.E., Pekmezci, M. and Hansen, H.M. (2019). Using germline variants to estimate glioma and subtype risks. Neurooncology, 21(4), 451–461.
    
    
2.  Liu, M., Liu, Y., Wu, M.C., Hsu, L. and He, Q. (2021). A Method for Subtype Analysis with Somatic Mutations. Bioinformatics, doi: 10.1093/bioinformatics/btaa1090. Online ahead of print.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btaa1090&link_type=DOI) 

3.  Liu, Y. and Xie, J. (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115(529), 393–402.
    
    
4.  Maity, A., Sullivan, P.F. and Tzeng, J.I. (2012). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic epidemiology, 36(7), 686–695.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.21663&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22899176&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F19%2F2021.04.13.21255272.atom) 

5.  Sun, J., Zheng, Y. and Hsu, L. (2013). A unified mixed-effects model for rare-variant association in sequencing studies. Genetic epidemiology, 37(4), 334–344.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.21717&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23483651&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F19%2F2021.04.13.21255272.atom) 

6.  Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics, 89(1), 82–93.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2011.05.029&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21737059&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F19%2F2021.04.13.21255272.atom)

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif
 [4]: /embed/graphic-1.gif
 [5]: /embed/inline-graphic-4.gif
 [6]: /embed/inline-graphic-5.gif
 [7]: /embed/inline-graphic-6.gif
 [8]: /embed/inline-graphic-7.gif
 [9]: /embed/graphic-2.gif
 [10]: /embed/inline-graphic-8.gif
 [11]: /embed/graphic-3.gif
 [12]: /embed/graphic-4.gif
 [13]: /embed/inline-graphic-9.gif
 [14]: /embed/inline-graphic-10.gif
 [15]: /embed/graphic-5.gif
 [16]: /embed/inline-graphic-11.gif
 [17]: /embed/graphic-6.gif
 [18]: /embed/inline-graphic-12.gif
 [19]: /embed/graphic-7.gif
 [20]: /embed/graphic-8.gif
 [21]: /embed/inline-graphic-13.gif
 [22]: /embed/graphic-9.gif
 [23]: /embed/graphic-10.gif
 [24]: /embed/inline-graphic-14.gif
 [25]: /embed/inline-graphic-15.gif
 [26]: /embed/inline-graphic-16.gif
 [27]: /embed/inline-graphic-17.gif
 [28]: /embed/inline-graphic-18.gif
 [29]: /embed/inline-graphic-19.gif
 [30]: /embed/inline-graphic-20.gif
 [31]: /embed/graphic-11.gif
 [32]: /embed/inline-graphic-21.gif
 [33]: /embed/inline-graphic-22.gif
 [34]: /embed/graphic-12.gif
 [35]: /embed/inline-graphic-23.gif
 [36]: /embed/graphic-13.gif
 [37]: /embed/inline-graphic-24.gif
 [38]: /embed/graphic-14.gif
 [39]: /embed/graphic-15.gif
 [40]: /embed/inline-graphic-25.gif
 [41]: /embed/inline-graphic-26.gif
 [42]: /embed/inline-graphic-27.gif
 [43]: /embed/inline-graphic-28.gif
 [44]: /embed/inline-graphic-29.gif
 [45]: /embed/inline-graphic-30.gif
 [46]: /embed/inline-graphic-31.gif
 [47]: /embed/graphic-16.gif
 [48]: /embed/graphic-17.gif
 [49]: /embed/inline-graphic-32.gif
 [50]: /embed/inline-graphic-33.gif
 [51]: /embed/inline-graphic-34.gif
 [52]: /embed/inline-graphic-35.gif
 [53]: /embed/inline-graphic-36.gif
 [54]: /embed/inline-graphic-37.gif
 [55]: /embed/inline-graphic-38.gif
 [56]: /embed/inline-graphic-39.gif
 [57]: /embed/inline-graphic-40.gif
 [58]: /embed/inline-graphic-41.gif
 [59]: /embed/inline-graphic-42.gif