Covid-19 automated diagnosis and risk assessment through Metabolomics and Machine-Learning

Jeany Delafiori; Luiz Claudio Navarro; Rinaldo Focaccia Siciliano; Gisely Cardoso de Melo; Estela Natacha Brandt Busanello; José Carlos Nicolau; Geovana Manzan Sales; Arthur Noin de Oliveira; Fernando Fonseca Almeida Val; Diogo Noin de Oliveira; Adriana Eguti; Luiz Augusto dos Santos; Talia Falcão Dalçóquio; Adriadne Justi Bertolin; João Carlos Cardoso Alonso; Rebeca Linhares Abreu-Netto; Rocio Salsoso; Djane Baía-da-Silva; Vanderson Souza Sampaio; Carla Cristina Judice; Fabio Trindade Maranhão Costa; Nelson Durán; Mauricio Wesley Perroud; Ester Cerdeira Sabino; Marcus Vinicius Guimarães Lacerda; Leonardo Oliveira Reis; Wagner José Fávaro; Wuelton Marcelo Monteiro; Anderson Rezende Rocha; Rodrigo Ramos Catharino

doi:10.1101/2020.07.24.20161828

ABSTRACT

COVID-19 is still placing a heavy health and financial burden worldwide. Impairments in patient screening and risk management play a fundamental role on how governments and authorities are directing resources, planning reopening, as well as sanitary countermeasures, especially in regions where poverty is a major component in the equation. An efficient diagnostic method must be highly accurate, while having a cost-effective profile.

We combined a machine learning-based algorithm with instrumental analysis using mass spectrometry to create an expeditious platform that discriminate COVID-19 in plasma samples within minutes, while also providing tools for risk assessment, to assist healthcare professionals in patient management and decision-making. A cross-sectional study with 728 patients (369 confirmed COVID-19 and 359 controls) was enrolled from three Brazilian epicentres (São Paulo capital, São Paulo countryside and Manaus) in the months of April, May, June and July 2020.

We were able to elect and identify 21 molecules that are related to the disease’s pathophysiology and 26 features to patient’s health-related outcomes. With specificity >97% and sensitivity >83% from blinded data, this screening approach is understood as a tool with great potential for real-world application.

INTRODUCTION

Coronaviruses (CoVs) are enveloped, single-stranded positive RNA viruses from the Coronaviridae family (1). The recent pandemic, caused by a newly discovered strand of coronavirus, SARS-CoV-2, was denominated COVID-19 (2), a disease that disseminated fast and is responsible for hundreds of thousands of deaths worldwide. Measures to control disease spread have led most countries to adopt social distancing and population screening (3). Given its global economic, sanitary and social impact, thousands of new studies aiming to understand viral pathology and targets for virus dissemination control are being conducted, which directly impact in strategies to provide treatments, vaccines, screening tests and patient prognosis.

Special efforts have been directed towards the development of alternatives for mass testing with population-wide capabilities. Currently, available tests are based on the direct detection of

SARS-CoV-2 virus through antigens or RNA amplification (RT-PCR), serological tests to evaluate patient immunity, and the combination of RT-PCR and chest CT (computed-tomography). COVID-19 testing urgency comprises the need for medical decision-making tools for patient’s risk stratification and management, which is poorly achieved by standard methodologies. Even though the basis for these procedures are well-documented in the literature, there are increased concerns about test’s sensitivity and specificity achieved on the field, time and costs associated with procedures, reagents and trained personnel availability, and the testing window (4-6).

Difficulties for an accurate diagnosis of SARS-CoV-2 and patient’s risk categorization are consequences of COVID-19 complexity. SARS-CoV-2 infection pathophysiology reflects in a broad spectrum of patient symptoms, ranging from mild flu-like manifestations, such as fever, cough, and fatigue, to life-threatening acute respiratory distress syndrome (ARDS), vascular dysfunction, and sepsis (2, 7). In an effort to eliminate the pathogen, the body response to SARS-CoV-2 severe pulmonary infection involves the reduction of natural killer (NK) cells, increased pro-inflammatory cytokines (IL-6, IFN-γ, TNFα and others) and lung infiltration, especially by macrophages and monocytes (2, 7, 8), possibly resulting in tissue damage and organ injury (8, 9).

Furthermore, changes in lipid homeostasis, a common characteristic of viral infections, have been associated with SARS-CoV-2 pathology (9-11). In lipidomic and metabolomic profiling of plasma samples, Song et. al. (2020) suggested that exosomes enriched with monosialodihexosyl ganglioside (GM3) are associated with the severity of COVID-19. In the same study, the decrease of circulating acyl-carnitines indicates disturbance in oxidative stress and cellular energy support (9). Moreover, Fan et al. (2020) proposed the relationship between progressive decrease in serum low-density lipoprotein (LDL) and cholesterol within deceased patients (10). Moreover, individual susceptibility to COVID-19 symptoms are not fully understood, thereby hampering any potential outcome prediction.

Panels of biomarkers that translate disease pathophysiology and contribute to SARS-CoV-2 detection may be proposed through “omics” techniques (9, 11, 12). The current trend in associating artificial intelligence-explained algorithms and “omics” techniques has yielded platforms involving machine learning (ML) to analyze mass spectrometry (MS) data, aiming at biomarker identification of diseases, including COVID-19 severity assessment (11, 13). However, applying traditional untargeted mass spectrometry for diagnostic purposes is laborious, since it requires further method development and validation steps (13, 14).

Considering that the testing tool for COVID-19 introduced in this contribution is based on metabolites from actual patients, it may be considered a new approach for SARS-CoV-2 screening. The proposed end-to-end mass spectrometry and machine learning combination aims at predictively identifying and modeling putative biomarkers for COVID-19 identification and risk assessment. This is critical for effective implementation on a real-world setting, adding robustness to the model in spite of variations in the input data; issues due to noise and minor different variations in acquisition conditions will, therefore, not play a major interference in the final output. Therefore, using the potential of MS-ML techniques in COVID-19 fighting (15), we enrolled a cohort of 728 individuals for the development of this independent platform that simultaneously functions as an automated screening test using plasma samples with high specificity and sensitivity, and provides metabolic information related to the presence and severity risk for the disease.

METHODS

Study design and patient recruitment

Participants were recruited from selected sites with proven expertise in research and high volume of patients with COVID-19 to increase data variability: Central Institute of the Clinical Hospitals, University of São Paulo Medical School (localized in São Paulo, capital of the São Paulo State), Sumaré State Hospital (localized in the state of São Paulo inland), and Hospital Delphina Rinaldi Abdel Aziz (localized in Manaus, capital of Amazonas State localized in the North of the country). The study was conducted according to principles expressed in Declaration of Helsinki and approved by local Ethics Committees (CAAE 32077020.6.0000.0005, CAAE 31049320.7.1001.5404 and CAAE 30299620.7.0000.0068). Inclusion criteria for COVID-19 group (CV) were adult patients with one or more clinical symptoms of SARS-CoV-2 infection in the last seven days (fever, dry cough, malaise and/or dyspnea) and positive SARS-CoV-2 RT-PCR in nasopharyngeal samples, following local hospital testing protocols based on Charité protocol and WHO recommendations (16). A control group (CT) was formed by symptomatic RT-PCR-negative participants (SN) with SARS-CoV-2 discarded by clinical and tomographic picture, and non-infected controls (AS).

In this study, 728 participants were included, classified according to symptoms, RT-PCR testing results and respective risk (Figure 1a). CV was composed of 487 plasma samples from 369 symptomatic SARS-CoV-2 confirmed cases upon hospital arrival, and 118 samples representing a second collection from hospitalized patients (median 11 days, SD 3.8) that recovered (R) or deceased (D). The high-risk group (HRSP) comprised patients with moderate and severe symptoms that required hospitalization (n = 197) and the low-risk (LRSP) category (n = 172) contained those with mild symptoms redirected to home care. Gender, age, and fasting restrictions were not applied, to simulate real-world conditions and to provide results with no patient bias. CT group was formed by 29 SN and 330 AS, totaling 359 individuals Table S1 (supplementary material) shows detailed demographic information and participant breakdown.

Figure 1

End to end process for putative biomarkers determination and diagnosis test generation. a) Based on clinical symptoms and diagnosis results, subjects were grouped in low-risk symptomatic positive (LRSP), high-risk symptomatic positive (HRSP), recovered (R) or deceased (D), symptomatic negative (SN) and asymptomatic negative (AN). Samples were prepared injected in a high-resolution mass spectrometry (HR-MS) equipment for data acquisition and datasets generated for data analysis according to the partitions; b) Sequential steps of machine learning data analysis and metabolomics biomarkers determination were followed for diagnosis model generation and deployment.

Mass spectrometry sample preparation

Plasma samples from peripheral venous blood were frozen at -80°C until analysis. A 20-µL aliquot of each participant plasma was diluted in 200 µL of tetrahydrofuran, followed by homogenization for 30 seconds at room temperature. Thus, 780 µL of methanol was added followed by a second homogenization for 30 seconds and centrifugation for 5 min, 3400 x rpm at 4°C. An aliquot of 5 µL of the supernatant was diluted in 495 µL of methanol and positively ionized by the addition of formic acid (0·1% final concentration) prior to direct infusion in a high-resolution mass spectrometer.

Mass spectrometry analysis and biomarker elucidation

Samples from CT and CV groups were randomized for data acquisition intra- and inter-daily. Samples were directly infused in a HESI-Q-Orbitrap®-MS (Thermo Scientific, Bremen, Germany) and scanned with 140,000 FWHM of mass resolution on positive ion mode. MS parameters were set as follows: m/z range 150-1,700, 10 mass spectral acquisition per sample, sheath gas flow rate five units, capillary temperature 320°C, aux gas heater temperature 33°C, spray voltage 3·70 kV, automatic gain control (AGC) at 1 × 10⁶, S-lens RF level 50, and injection time < 2 ms. After machine leaning modeling, the presence of each discriminant m/z determined by the algorithm was confirmed in mass spectra using Xcalibur 3.0 software (Thermo, Bremen, Germany). Molecule identification was proposed using METLIN (Scripps Center for Metabolomics, https://metlin.scripps.edu), HMDB (Human Metabolome Database, http://www.hmdb.ca/) and LIPIDMAPS (Lipidomics Gateway, https://lipidmaps.org) databases and literature search with mass accuracy ≤ 5 ppm.

Biomarker pathway analysis and meaning were attributed based on Kegg database (Kyoto Encyclopedia of Genes and Genomes, https://www.genome.jp/kegg/) information and scientific literature.

Machine learning data analysis

The MS-ML platform presented in this study for COVID-19 automated diagnosis and risk determination consists of two primary data analysis phases. The first phase comprises developing a machine-learning model (ML) using a classification algorithm over MS data to determine potential m/z biomarkers for diagnosis and risk determination. The second phase entails a prediction model for diagnosing and determining a high-risk versus low-risk program, which will be used for individuals screening in the field.

Data processing is divided into the sequential steps described in Figure 1. First, mass spectrometric data are pre-processed for ion annotation (intensity, width, resolution, and m/z values), alignment, normalization, and denoising. Three different partitions of resulting data are segregated according to the best practices of machine learning, consisting of a fitting partition (training and validation, shuffled in all ten rounds of ML experiments), test partition, and blind test. The final classification results are reported using the blind partition (see process in Figure 1a).

The most discriminant features are determined using the ML algorithms (ADA Tree Boosting (ADA), Gradient Tree Boosting (GDB), Random Forest (RF), and Extreme Random Forest (XRF), which are based on decision trees. In addition, we also explored the Partial Least Squares (PLS) method, which is a linear space transformation (17-19)), in which a recursive fitting is applied to training and validation data (see Figure 1b), with the annotation of averaging and computing the related standard deviation of selected performance metrics.

In all experiments, we adopted the performance metrics defined in Table S2 (supplementary material) for each round of validation (optimized through accuracy, F1score, MCC). After the observation of performance metrics versus ranked features length, discriminant m/z features are evaluated through ΔJ importance (see Table S2 and Figure 2a) and selected for metabolomics biomarkers identification (see section Mass Spectrometry Analysis and Biomarker Elucidation). The marker importance is given by a cumulative distribution function (CDF) analysis: for a specific m/z, a CDF of the feature values for the negative samples (CT group) is compared with the CDF of positive samples (CV group) used in the fitting partition.

Figure 2

Putative biomarkers elucidation and related class/ pathways. a) Recursive fitting of mass spectra data followed by model optimization processes allowed the determination of putative biomarkers ranked by DeltaJ importance and group contribution. b) Proposed role of identified biomarkers in COVID-19 pathophysiology. Abbreviations: ARDS – acute respiratory distress syndrome, COX-2 – cyclooxygenase-2, DeoxyGU, deoxyguanosine, LPCAT1 - lysophosphatidylcholine acyltransferase 1, LysoPC – lysophosphatidylcholine, PC – phosphatidylcholine, PLA₂ – Phospholipase A₂

The CDF comparison uses first the Kolmogorov-Smirnov (KS-test) two samples equality hypothesis test to determine whether those distributions are different (failed on equality hypothesis). Then the ⍰J metric defined in Table S2 is used to determine if the features contribute negatively ⍰J < 0, which means the negative samples (CT) have a higher probability of presenting higher values over the median of negatives, or positively ⍰J < 0, which means that the positive samples (CV) have a higher probability of presenting higher values over the median of positives. Features are discarded if CDFs are equal according to KS-test or ⍰J = 0. The selected biomarkers undergo a second round of training and validation with the algorithms mentioned above with the development software (Figure 1b, see testing results in Tables 2 and 3). As putative biomarkers are validated through the development process, they are submitted to the second phase of the machine-learning process targeting the final model to deliver an applied untargeted metabolomics diagnosis software. In this phase, a pairwise model is created (Figure 1b), where the relationship between the putative biomarkers are used instead of their intensity (or relative abundance) provided in each spectrum.

RESULTS

COVID-19 testing through MS-ML platform: modeling and performance

The full dataset resulting from the spectrometer acquisition has 846 biological samples, with ten replicates each, on average. Table 1 shows the data preparation for the fitting process (shuffled in 10 rounds of training and validation), and testing.

View this table:

Table 1

Dataset subdivisions for model fitting (training and validation), testing and blind test for COVID-19 diagnosis and risk assessment.

In this study, we employed a novel sequential processing of metabolomics data with Machine-Learning algorithms for building a model divided into two phases. First, a predictive modeling for putative biomarker identification. Then, a combination of biomarker features into relative pairs, composing the predictive model used by the diagnosis and risk assessment in the field (recursive fitting shown in Figure 1b).

The analysis for diagnosis was performed with the full dataset, while the risk assessment relied on 369 COVID-19 positive subjects, as this is a second-stage analysis. Out of the COVID-19 positive subjects, 197 achieved local clinical criteria for hospitalization while the remaining 172 individuals were forwarded to homecare. Tables 2 and 3 show results for the pairwise features for COVID-19 automated diagnosis and risk assessment classifiers, respectively. The best results were obtained with Gradient Tree Boosting (GDB): COVID-19 automated diagnosis with 97.6% of specificity and 83.8% of sensitivity, and risk assessment with 76.2% of specificity and 87.2% of sensitivity, both in the blind test.

View this table:

Table 2

Performance metrics for diagnostics model using pairwise features on the 10 validation tests with 6 different classifier algorithms for Covid-19 positive/negative diagnosis, final development testing and deployed software blind test. Numbers correspond to individual’s classification average and standard deviations in parenthesis.

View this table:

Table 3

Performance metrics for the risk assessment model using pairwise features on the 10 validation tests with six different classifier algorithms, final development testing, and deployed software blind test. Numbers correspond to individual’s classification average and standard deviations in parenthesis.

Panel of discriminant metabolites for COVID-19 patients using untargeted metabolomics

Thirty ions were selected by the ML method and used for COVID-19 diagnosis using the introduced pairwise model (see Table 3 for metrics) and further validated through mass spectrometric data. From those, we proposed 21 discriminant biomarkers for COVID-19 condition, divided into ten with positive (mean values higher for the positive group) and 11 with a negative contribution to the condition. Out of 21 molecules, eight belong to the glycerophospholipid class, three glycerolipids, three fatty acids, two cholesterol derivatives, one purine metabolite, one prostanoid, one plasmalogen, and two unknown peptides. The remaining ten molecules have not yet been identified, a common element of non-targeted metabolomics (14). Valid biomarkers and unknown features are available in Table 4.

View this table:

Table 4

Proposed biomarkers to m/z discriminant features elected by Machine Learning algorithm group first by model contribution (COVID-19 diagnosis and risk assessment), followed by metabolic function and deltaJ.

For risk assessment, 26 ions were used to achieve the metrics displayed in Table 4. Among them, nine biomarkers contributed to the COVID-19 higher risk condition and 17 biomarkers contributed to lower risk. The main findings shown in Table 4 pointed to a relative reduction of certain species of lysophosphatidylcholine (LysoPC), phospholipids, cholesteryl ester (CE) and triacylglycerols (TG) in moderate/severe cases in comparison to patients with mild symptoms (Figure 2a). In Table 4 the biomarkers were first grouped by type of contribution, followed by metabolic class/function and importance reflected through ⍰J metric. A representation of biomarkers class and ΔJ metrics are displayed in Figure 2a.

DISCUSSION

MS-ML elected biomarkers and COVID-19 pathophysiology

The use of AI-explained algorithms allowed us to create reliable models that facilitate decision-making in clinics and the investigation of the pathophysiological meaning of the distinct biomarker’s levels. Viral recognition is an essential step for initial host immune response, and the rapid course and cytokine storm associated with SARS-CoV infection may be involved with the guanosine- and uridine-rich (GU) single-strand RNA potential role as PAMP (pathogen-associated molecular patterns) (1). Deoxyguanosine [268·1050, [M+H]⁺), a metabolite from purine metabolism (Kegg hsa00230), triggers the enhanced signalling of TLR7 in the presence of ssRNA, inducing cytokine secretion in macrophages (20). Therefore, further investigations are required to understand the potential role of deoxyguanosine in SARS-CoV-2 immune hyperactivation and pathology.

The main lipidic findings pointed to a remodelling of glycerophospholipid metabolism. We identified enhanced presence of phosphatidylglycerol (PG) [PG(35:4), PG(35:1), PG(33.1)] and phosphatidylethanolamine (PE) [PE(38:4)], and a diminishment of lysophosphatidylcholines (LysoPC) [LysoPC(16:0), LysoPC(16:1), LysoPC(18:0), LysoPC(18:2)] and phospatidylserine plasmalogens (PS-PL) (21) [PS(O-36:2) and/or PS(P-36:1)] in COVID-19 positive patients, as illustrated in Figure 2a by glycerophospholipid pathway recurrence.

LysoPCs [LysoPC(16:0) and LysoPC(18:2)] were also found as negative contributors in plasma samples from patients who required hospitalization (moderate and severe cases). Cell responses to various stimuli may be mediated by phospholipids, which actively participates in inflammation processes. The relative intensities decrease of Lysophosphatidylcholines in positive and, and some of them, in moderate to severely-ill patients, are in accordance with recent studies of metabolic changes in acute respiratory distress syndrome (ARDS) and sepsis (22, 23), important characteristics of COVID-19 severity (2, 7).

LysoPC is formed through the cleavage of PC mediated by phospholipase A₂, (PLA₂), whose modulation has a crucial role in inflammation processes (see LysoPCs’ related pathways in Figure 2b). PLA₂ up-regulation promotes fatty acids formation, precursors of eicosanoids, and LysoPCs (24). Data show that SARS-CoV nucleocapsid protein stimulates the expression of Ciclooxygenase-2 (COX-2), an essential enzyme in the catalyses of prostanoids production from fatty acids, as those found at m/z 407.1821 in positive group (25). Although we identified an ion correlated to eicosanoid biosynthesis that indicates PLA₂ and COX-2 activity in positive patients, LysoPCs were relatively decreased in this group. The availability of LysoPCs is also finely regulated by the acyltransferase activity of LCAT (Lysophosphatidylcholine Acyltransferase 1), which may promote the restoration of PCs via Lands cycle. The most abundant lipid species found in alveolar surfactant formed by LCAT1 activity over LysoPC is Dipalmitoylphosphatidylcholine (DPPC, PC(16:0/16:0)). This molecule corresponds to 70-80% of surfactant lipid composition, and the dysregulation of surfactant film is directly related to lung injury and ARDS (24). Since DPPC formation is dependent on the availability of lipid substrates and the Lands cycle functioning, interferences in this process may disturb LysoPC availability. In a metabolomic study, Ferrarini et al (2017) described a decrease in LysoPC species and increased MG(18:1) [m/z 379.2807] in serum of patients with ARDS derived from Influenza infection and sepsis, reinforcing our findings (22).

Moreover, COVID-19 pathophysiology seems to impair cholesterol homeostasis (9, 10). We found cholesteryl ester (CE) associated with mild symptoms, which was similarly reported by Song et. al (2020). They demonstrated the correlation between CE abundance and BMP(38:5), a lipid that influences cellular exportation of cholesterol from endosomes. During recovering progression, it was found an increased alveolar macrophages BMP with enhanced CEs (9). Cholesterol and LDL (low-density lipoprotein) lowering was also observed in clinical practice associated with COVID-19 poor prognosis (10), such as triacylglycerol in ARDS (26).

Herein, based on the proposed m/z ions we discriminated COVID-19 patients using a diagnostic and risk assessment classifier generated from a MS-ML combination. Although the proposed biomarkers correlates COVID-19 pathophysiology to the mathematical process, a more comprehensive biomarker evaluation is needed to better understand their contribution to COVID-19, and identify the unknowns.

Use of untargeted metabolomics and ML for automated COVID-19 diagnosis and risk assessment

The combination of artificial intelligence algorithms for biomarker mining in complex data is a common approach for problem-solving and implementing new technologies in health sciences. The use of machine learning as a mean for the discrimination of diseases from mass spectrometric data aims to develop diagnostic and prognostic biomarkers, treatment targets and patient management systems (13).

Our methodology introduced the pairwise m/z analysis, an essential advance in untargeted metabolomics application. By combining different m/z, this approach supports the spectra acquired by different mass spectrometers, including the robust use of flow-injection mass spectrometry (FI-MS), in an effort to overcome the ion competition effect (27).

The model optimization with pairwise features can be easily transferred to an independent diagnosis platform. Given that the process key is available from biological sample “ion-fishing”, this approach does not require chromatography and biomarkers quantitation for independent diagnosis. Moreover, the proposed MS-ML platform for COVID-19 presented reliable qualitative results, with specificity of 97·6% and sensitivity of 83·8% (in a blind test data), similar or even better in performance when compared to available serology (5) and RT-PCR methods (6). Our analysis also brings molecular information about disease pathophysiology that may aid in prognostic markers and treatment targets for COVID-19. Overall, our test aggregates, in one solution, an alternative for populational COVID-19 screening and guidance for public health efforts through risk classification. The same approach may be applied to other diseases involved with patient management during the pandemic and contribute to the COVID-19 MS Coalition’s collective effort (15) by consolidating the combination of mass spectrometry and artificial intelligence in a real-world setting.

FUNDING

This work was supported by São Paulo Research Foundation (FAPESP) [2019/05718-3 to J.D., 2018/10052-1 to W.J.F., 2020/04705-2 to R.F.S., 2020/05369-6 to F.M.T.C., and 2020/04305-2 to J.C.N. and T.F.D.], Amazonas State Government, Superintendence of the Manaus Free Trade Zone (SUFRAMA), Coordination for the Improvement of Higher Education Personnel (CAPES), Department of Science and Technology (DECIT) -Brazilian Ministry of Health (MS), Ministry of Science, Technology and Innovation - National Council for Scientific and Technological Development (CNPq) [grant 403253/2020 to M.V.G.L.]

AUTHOR CONTRIBUTIONS

R.S.F., J.C.N., T.F.D., A.J.B., E.C.S., L.O.R., W.J.F., M.W.P.J., M.V.G.L., G.C.M., W.M.M., F.F.A.V., D.B.S., V.S.S., J.D., and R.R.C were involved with study design and ethics approval. R.S.F., T.F.D., A.J.B., R.S., E.C.S., A.E., L.O.R., W.J.F., N.D., L.A.S, J.C.C.A., M.V.G.L., G.C.M., W.M.M., F.F.A.V., R.L.A.N., D.B.S. and V.S.S. contributed to patient data acquisition and analysis, clinical support and network feasibility. L.C.N., J.D., A.R.R. and R.R.C., conceived and developed the MS-ML method. J.D., E.N.BB., G.M.S., A.N.O., D.N.O. performed mass spectrometry experiments and data interpretation. J.D., L.C.N., R.F.S. prepared tables, figures and wrote the manuscript. E.N.B.B., G.M.S., A.N.O., D.N.O., F.T.M.C., N.D., W.F.J., L.O.R., D.B.S., F.F.A.V., W.M.M., G.C.M., J.C.N., A.R.R. and R.R.C. revised the manuscript. RRC idealized the project and managed the research group. All authors read and approved the manuscript.

ADDITIONAL INFORMATION

Competing Interests

The authors declare no competing interests.

Data availability

The data generated in this study will be anonymized to attend patient’s privacy restrictions and will be available from corresponding authors upon request after publication.

ACKNOWLEDGMENTS

The authors would like to thank the network involved in sample collection, clinical and diagnosis support, and Thermo Scientific and LADETEC (UFRJ) for technology support.

REFERENCES

1.↵
Li Y, Chen M, Cao H, Zhu Y, Zheng J, Zhou H. Extraordinary GU-rich single-strand RNA identified from SARS coronavirus contributes an excessive innate immune response. Microbes and infection. 2013;15(2):88–95.
OpenUrl Google Scholar
2.↵
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506.
OpenUrl CrossRef PubMed Google Scholar
3.↵
Coronavirus Disease (COVID-19) Dashboard [Internet]. World Health Organization. 2020 [cited 2020/7/4]. Available from: https://covid19.who.int/.
Google Scholar
4.↵
La Marca A, Capuzzo M, Paglia T, Roli L, Trenti T, Nelson SM. Testing for SARS-CoV-2 (COVID-19): a systematic review and clinical guide to molecular and serological in-vitro diagnostic assays. Reproductive BioMedicine Online. 2020.
Google Scholar
5.↵
Döhla M, Boesecke C, Schulte B, Diegmann C, Sib E, Richter E, et al. Rapid point-of-care testing for SARS-CoV-2 in a community screening setting shows low sensitivity. Public health. 2020.
Google Scholar
6.↵
Li Y, Yao L, Li J, Chen L, Song Y, Cai Z, et al. Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19. Journal of medical virology. 2020.
Google Scholar
7.↵
Li H, Liu L, Zhang D, Xu J, Dai H, Tang N, et al. SARS-CoV-2 and viral sepsis: observations and hypotheses. The Lancet. 2020.
Google Scholar
8.↵
Zhang W, Zhao Y, Zhang F, Wang Q, Li T, Liu Z, et al. The use of anti-inflammatory drugs in the treatment of people with severe coronavirus disease 2019 (COVID-19): The experience of clinical immunologists from China. Clinical Immunology. 2020:108393.
Google Scholar
9.↵
Song J-W, Lam SM, Fan X, Cao W-J, Wang S-Y, Tian H, et al. Omics-driven systems interrogation of metabolic dysregulation in COVID-19 pathogenesis. Cell Metabolism. 2020.
Google Scholar
10.↵
Fan J, Wang H, Ye G, Cao X, Xu X, Tan W, et al. Low-density lipoprotein is a potential predictor of poor prognosis in patients with coronavirus disease 2019. Metabolism. 2020;In press:154243.
Google Scholar
11.↵
Shen B, Yi X, Sun Y, Bi X, Du J, Zhang C, et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell. 2020.
Google Scholar
12.↵
Ihling C, Tänzler D, Hagemann S, Kehlen A, Hüttelmaier S, Arlt C, et al. Mass spectrometric identification of SARS-CoV-2 proteins from gargle solution samples of COVID-19 patients. Journal of Proteome Research. 2020.
Google Scholar
13.↵
Liebal UW, Phan AN, Sudhakar M, Raman K, Blank LM. Machine Learning Applications for Mass Spectrometry-Based Metabolomics. Metabolites. 2020;10(6):243.
OpenUrl Google Scholar
14.↵
Naz S, Vallejo M, García A, Barbas C. Method validation strategies involved in non-targeted metabolomics. Journal of Chromatography A. 2014;1353:99–105.
OpenUrl PubMed Google Scholar
15.↵
Struwe W, Emmott E, Bailey M, Sharon M, Sinz A, Corrales FJ, et al. The COVID-19 MS Coalition—accelerating diagnostics, prognostics, and treatment. The Lancet. 2020.
Google Scholar
16.↵
Corman V, Bleicker T, Brünink S, Drosten C, Zambon M. Diagnostic detection of 2019-nCoV by real-time RT-PCR. World Health Organization, Jan. 2020;17.
Google Scholar
17.↵
Bishop CM. Pattern recognition and machine learning: springer; 2006.
Google Scholar
18.
Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems: O’Reilly Media; 2019.
Google Scholar
19.↵
Murphy K. Machine learning: a probabilistic perspective. Cambridge, Mass.[ua]. MIT Press; 2013.
Google Scholar
20.↵
Davenne T, Bridgeman A, Rigby RE, Rehwinkel J. Deoxyguanosine is a TLR7 agonist. European journal of immunology. 2020;50(1):56–62.
OpenUrl Google Scholar
21.↵
Ivanova PT, Milne SB, Brown HA. Identification of atypical ether-linked glycerophospholipid species in macrophages by mass spectrometry. Journal of Lipid Research. 2010;51(6):1581–90.
OpenUrl Abstract/FREE Full Text Google Scholar
22.↵
Ferrarini A, Righetti L, Martínez MP, Fernández-López M, Mastrangelo A, Horcajada JP, et al. Discriminant biomarkers of acute respiratory distress syndrome associated to H1N1 influenza identified by metabolomics HPLC-QTOF-MS/MS platform. Electrophoresis. 2017;38(18):2341–8.
OpenUrl Google Scholar
23.↵
Drobnik W, Liebisch G, Audebert F-X, Fröhlich D, Glück T, Vogel P, et al. Plasma ceramide and lysophosphatidylcholine inversely correlate with mortality in sepsis patients. Journal of lipid research. 2003;44(4):754–61.
OpenUrl Abstract/FREE Full Text Google Scholar
24.↵
Cañadas O, Olmeda B, Alonso A, Pérez-Gil J. Lipid–Protein and Protein–Protein Interactions in the Pulmonary Surfactant System and Their Role in Lung Homeostasis. International Journal of Molecular Sciences. 2020;21(10):3708.
OpenUrl Google Scholar
25.↵
Yan X, Hao Q, Mu Y, Timani KA, Ye L, Zhu Y, et al. Nucleocapsid protein of SARS-CoV activates the expression of cyclooxygenase-2 by binding directly to regulatory elements for nuclear factor-kappa B and CCAAT/enhancer binding protein. The international journal of biochemistry & cell biology. 2006;38(8):1417–28.
OpenUrl Google Scholar
26.↵
Maile MD, Standiford TJ, Engoren MC, Stringer KA, Jewell ES, Rajendiran TM, et al. Associations of the plasma lipidome with mortality in the acute respiratory distress syndrome: a longitudinal cohort study. Respiratory research. 2018;19(1):1–8.
OpenUrl Google Scholar
27.↵
Sarvin B, Lagziel S, Sarvin N, Mukha D, Kumar P, Aizenshtein E, et al. Fast and sensitive flow-injection mass spectrometry metabolomics by analyzing sample-specific ion distributions. Nature Communications. 2020;11(1):1–11.
OpenUrl Google Scholar

Posted July 27, 2020.

Download PDF

Author Declarations

Supplementary Material

Data/Code

Citation Tools

Get QR code

Tweet Widget

COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv

Subject Area

Infectious Diseases (except HIV/AIDS)

Subject Areas

All Articles

Addiction Medicine (408)
Allergy and Immunology (721)
Anesthesia (214)
Cardiovascular Medicine (3047)
Dentistry and Oral Medicine (345)
Dermatology (258)
Emergency Medicine (456)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1080)
Epidemiology (12965)
Forensic Medicine (12)
Gastroenterology (849)
Genetic and Genomic Medicine (4780)
Geriatric Medicine (441)
Health Economics (747)
Health Informatics (3032)
Health Policy (1097)
Health Systems and Quality Improvement (1117)
Hematology (405)
HIV/AIDS (953)
Infectious Diseases (except HIV/AIDS) (14281)
Intensive Care and Critical Care Medicine (873)
Medical Education (450)
Medical Ethics (119)
Nephrology (492)
Neurology (4553)
Nursing (243)
Nutrition (673)
Obstetrics and Gynecology (838)
Occupational and Environmental Health (760)
Oncology (2367)
Ophthalmology (667)
Orthopedics (264)
Otolaryngology (332)
Pain Medicine (297)
Palliative Medicine (86)
Pathology (512)
Pediatrics (1232)
Pharmacology and Therapeutics (516)
Primary Care Research (516)
Psychiatry and Clinical Psychology (3916)
Public and Global Health (7141)
Radiology and Imaging (1589)
Rehabilitation Medicine and Physical Therapy (945)
Respiratory Medicine (941)
Rheumatology (456)
Sexual and Reproductive Health (469)
Sports Medicine (396)
Surgery (506)
Toxicology (63)
Transplantation (218)
Urology (188)

Comments

medRxiv aims to provide a venue for anyone to comment on a medRxiv preprint. Comments are moderated for offensive or irrelevant content (this can take ~24 h). Please avoid duplicate submissions and read our Comment Policy before commenting. The content of a comment is not endorsed by medRxiv.

Community Reviews

medRxiv aims to inform readers about online discussion of this preprint occurring elsewhere. The content at the links below is not endorsed by either medRxiv or the preprint's authors.

Community reviews for this article:

There are no community reviews for this paper.

Automated Evaluations

Certain services provide automated analysis of preprints. Analyses invited by the authors are displayed at the top of this tab. Those done independently of authors are shown underneath . None of these analyses is endorsed by medRxiv.

Automated Evaluations:

There are no automated evaluations for this paper.

[1] 1.↵
Li Y, Chen M, Cao H, Zhu Y, Zheng J, Zhou H. Extraordinary GU-rich single-strand RNA identified from SARS coronavirus contributes an excessive innate immune response. Microbes and infection. 2013;15(2):88–95.
OpenUrl Google Scholar

[2] 2.↵
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506.
OpenUrl CrossRef PubMed Google Scholar

[3] 3.↵
Coronavirus Disease (COVID-19) Dashboard [Internet]. World Health Organization. 2020 [cited 2020/7/4]. Available from: https://covid19.who.int/.
Google Scholar

[4] 4.↵
La Marca A, Capuzzo M, Paglia T, Roli L, Trenti T, Nelson SM. Testing for SARS-CoV-2 (COVID-19): a systematic review and clinical guide to molecular and serological in-vitro diagnostic assays. Reproductive BioMedicine Online. 2020.
Google Scholar

[5] 5.↵
Döhla M, Boesecke C, Schulte B, Diegmann C, Sib E, Richter E, et al. Rapid point-of-care testing for SARS-CoV-2 in a community screening setting shows low sensitivity. Public health. 2020.
Google Scholar

[6] 6.↵
Li Y, Yao L, Li J, Chen L, Song Y, Cai Z, et al. Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19. Journal of medical virology. 2020.
Google Scholar

[7] 7.↵
Li H, Liu L, Zhang D, Xu J, Dai H, Tang N, et al. SARS-CoV-2 and viral sepsis: observations and hypotheses. The Lancet. 2020.
Google Scholar

[8] 8.↵
Zhang W, Zhao Y, Zhang F, Wang Q, Li T, Liu Z, et al. The use of anti-inflammatory drugs in the treatment of people with severe coronavirus disease 2019 (COVID-19): The experience of clinical immunologists from China. Clinical Immunology. 2020:108393.
Google Scholar

[9] 9.↵
Song J-W, Lam SM, Fan X, Cao W-J, Wang S-Y, Tian H, et al. Omics-driven systems interrogation of metabolic dysregulation in COVID-19 pathogenesis. Cell Metabolism. 2020.
Google Scholar

[10] 10.↵
Fan J, Wang H, Ye G, Cao X, Xu X, Tan W, et al. Low-density lipoprotein is a potential predictor of poor prognosis in patients with coronavirus disease 2019. Metabolism. 2020;In press:154243.
Google Scholar

[11] 11.↵
Shen B, Yi X, Sun Y, Bi X, Du J, Zhang C, et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell. 2020.
Google Scholar

[12] 12.↵
Ihling C, Tänzler D, Hagemann S, Kehlen A, Hüttelmaier S, Arlt C, et al. Mass spectrometric identification of SARS-CoV-2 proteins from gargle solution samples of COVID-19 patients. Journal of Proteome Research. 2020.
Google Scholar

[13] 13.↵
Liebal UW, Phan AN, Sudhakar M, Raman K, Blank LM. Machine Learning Applications for Mass Spectrometry-Based Metabolomics. Metabolites. 2020;10(6):243.
OpenUrl Google Scholar

[14] 14.↵
Naz S, Vallejo M, García A, Barbas C. Method validation strategies involved in non-targeted metabolomics. Journal of Chromatography A. 2014;1353:99–105.
OpenUrl PubMed Google Scholar

[15] 15.↵
Struwe W, Emmott E, Bailey M, Sharon M, Sinz A, Corrales FJ, et al. The COVID-19 MS Coalition—accelerating diagnostics, prognostics, and treatment. The Lancet. 2020.
Google Scholar

[16] 16.↵
Corman V, Bleicker T, Brünink S, Drosten C, Zambon M. Diagnostic detection of 2019-nCoV by real-time RT-PCR. World Health Organization, Jan. 2020;17.
Google Scholar

[17] 17.↵
Bishop CM. Pattern recognition and machine learning: springer; 2006.
Google Scholar

[18] 18.
Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems: O’Reilly Media; 2019.
Google Scholar

[19] 19.↵
Murphy K. Machine learning: a probabilistic perspective. Cambridge, Mass.[ua]. MIT Press; 2013.
Google Scholar

[20] 20.↵
Davenne T, Bridgeman A, Rigby RE, Rehwinkel J. Deoxyguanosine is a TLR7 agonist. European journal of immunology. 2020;50(1):56–62.
OpenUrl Google Scholar

[21] 21.↵
Ivanova PT, Milne SB, Brown HA. Identification of atypical ether-linked glycerophospholipid species in macrophages by mass spectrometry. Journal of Lipid Research. 2010;51(6):1581–90.
OpenUrl Abstract/FREE Full Text Google Scholar

[22] 22.↵
Ferrarini A, Righetti L, Martínez MP, Fernández-López M, Mastrangelo A, Horcajada JP, et al. Discriminant biomarkers of acute respiratory distress syndrome associated to H1N1 influenza identified by metabolomics HPLC-QTOF-MS/MS platform. Electrophoresis. 2017;38(18):2341–8.
OpenUrl Google Scholar

[23] 23.↵
Drobnik W, Liebisch G, Audebert F-X, Fröhlich D, Glück T, Vogel P, et al. Plasma ceramide and lysophosphatidylcholine inversely correlate with mortality in sepsis patients. Journal of lipid research. 2003;44(4):754–61.
OpenUrl Abstract/FREE Full Text Google Scholar

[24] 24.↵
Cañadas O, Olmeda B, Alonso A, Pérez-Gil J. Lipid–Protein and Protein–Protein Interactions in the Pulmonary Surfactant System and Their Role in Lung Homeostasis. International Journal of Molecular Sciences. 2020;21(10):3708.
OpenUrl Google Scholar

[25] 25.↵
Yan X, Hao Q, Mu Y, Timani KA, Ye L, Zhu Y, et al. Nucleocapsid protein of SARS-CoV activates the expression of cyclooxygenase-2 by binding directly to regulatory elements for nuclear factor-kappa B and CCAAT/enhancer binding protein. The international journal of biochemistry & cell biology. 2006;38(8):1417–28.
OpenUrl Google Scholar

[26] 26.↵
Maile MD, Standiford TJ, Engoren MC, Stringer KA, Jewell ES, Rajendiran TM, et al. Associations of the plasma lipidome with mortality in the acute respiratory distress syndrome: a longitudinal cohort study. Respiratory research. 2018;19(1):1–8.
OpenUrl Google Scholar

[27] 27.↵
Sarvin B, Lagziel S, Sarvin N, Mukha D, Kumar P, Aizenshtein E, et al. Fast and sensitive flow-injection mass spectrometry metabolomics by analyzing sample-specific ion distributions. Nature Communications. 2020;11(1):1–11.
OpenUrl Google Scholar

Covid-19 automated diagnosis and risk assessment through Metabolomics and Machine-Learning

ABSTRACT

INTRODUCTION