Abstract
Hematophagous arthropods serve as crucial vectors for numerous viruses, posing significant public health risks due to their potential for zoonotic spillover. Although advances in metagenomics have expanded our understanding of arbovirus diversity, traditional phylogenetic approaches often miss the pathogenic potential of viruses not yet identified in humans. Here, we curated two datasets: one with 294 viruses and 36 epidemiological characteristics (including virus properties, vector hosts, and non-vector hosts), and another with 71,622 viral sequences focusing on pathogenic traits. Using these datasets, we developed a regression model and a prediction model to assess and predict viral pathogenicity. Our regression model (R² = 90.6%) reveals a strong correlation between non-vector host diversity, especially within the orders Perissodactyla and Carnivora, and virus pathogenicity. The prediction model (F1 score = 96.79%) identifies key pathogenic functions such as “Viral adhesion” and “Host xenophagy” as enhancers of pathogenic potential, whereas the “Viral invasion” function was associated with the inverse effect. Validation against an independent external dataset confirmed the models’ ability to identify pathogenic viruses and revealed the potential threat posed by Palma and Zaliv Terpeniya viruses, which have not previously been detected in humans. These findings highlight the necessity of integrating predictive models with metagenomic data to provide early warnings of potential zoonotic viruses carried by hematophagous vectors at the strain level, enhancing public health responses and preparedness.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The laboratory conducting this research is funded by grants from the National Key Research and Development Program of China (2019YFC1200501).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
# These authors contributed equally to this work and are listed as co-first authors.
This revision adds another established prediction model, Zoonotic Rank, for comparison in the section validating model prediction accuracy, and updates the graphs and tables in the text.
Data Availability
The raw dataset supporting the findings of this study is available on Figshare at https://doi.org/10.6084/m9.figshare.22154573.v5. Additionally, the processed data used and/or analyzed during the current study can be obtained by contacting the corresponding author. Requests for access to the processed data will be promptly addressed.