Data Availability
All data used in the paper, including the rare disease knowledge graph, the simulated and MyGene2 cohorts, and the final and intermediate results of the analyses, are shared with the research community at https://zitniklab.hms.harvard.edu/projects/SHEPHERD. Although the Undiagnosed Diseases Network (UDN) dataset cannot be released in its entirety due to privacy concerns, anonymized UDN data have been deposited in dbGaP (accession phs001232) and PhenomeCentral. Phenotypes, causal variants, and genes related to UDN diagnoses are also shared publicly in ClinVar at https://www.ncbi.nlm.nih.gov/clinvar/submitters/505999. The UDN study is approved by the NIH IRB (Protocol 15HG0130). All patients accepted to the UDN provide written informed consent to share their data across the UDN.
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TZTPFL
https://github.com/mims-harvard/SHEPHERD