Downloads
Download MyTxGNN prediction data for your research.
Available Datasets
Knowledge Graph Predictions
| File | Description | Format | Size |
|---|---|---|---|
| repurposing_candidates.csv | KG-based predictions | CSV | ~4.7 MB |
Columns:
- license_id: NPRA registration number
- brand_name: Product name
- ingredient: Active ingredient
- drugbank_id: DrugBank identifier
- disease_name: Predicted indication
- disease_id: Disease identifier
- source: Prediction source
- is_new: New indication flag
Statistics:
- 41,560 predictions
- 508 unique drugs
- 940 unique diseases
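After downloading, it can be useful to check that your copy matches the statistics listed above. The following is a minimal sketch using only the Python standard library. The function name `summarize_predictions` is hypothetical, and counting unique drugs by `drugbank_id` is an assumption (the page does not say which column the 508 figure is based on). The sample rows are made-up placeholders, not real data.

```python
import csv
import io


def summarize_predictions(handle, drug_col="drugbank_id", disease_col="disease_name"):
    """Count rows, unique drugs, and unique diseases in a predictions CSV.

    drug_col is an assumption: pass "ingredient" instead if that is the
    column the published drug count is based on.
    """
    n_rows = 0
    drugs, diseases = set(), set()
    for row in csv.DictReader(handle):
        n_rows += 1
        drugs.add(row[drug_col])
        diseases.add(row[disease_col])
    return {
        "predictions": n_rows,
        "unique_drugs": len(drugs),
        "unique_diseases": len(diseases),
    }


# Made-up placeholder rows that only mimic the documented column layout.
sample_csv = (
    "license_id,brand_name,ingredient,drugbank_id,disease_name,disease_id,source,is_new\n"
    "L-001,Brand A,DRUG A,DB-A,disease x,D-X,KG,True\n"
    "L-002,Brand B,DRUG A,DB-A,disease y,D-Y,KG,False\n"
    "L-003,Brand C,DRUG B,DB-B,disease x,D-X,KG,True\n"
)
summary = summarize_predictions(io.StringIO(sample_csv))
print(summary)
```

For the real file, pass an open file handle instead of the in-memory sample, for example `open('repurposing_candidates.csv', newline='')`, and compare the result with the figures listed above.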
DrugBank Mapping
| File | Description | Format | Size |
|---|---|---|---|
| drugbank_mapping.csv | Drug ID mappings | CSV | ~1.8 MB |
Columns:
- ingredient: Normalized ingredient name
- drugbank_id: DrugBank identifier
- match_score: Mapping confidence
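The mapping file can be used to look up a DrugBank identifier from an ingredient name while discarding low-confidence matches. The sketch below is one possible approach, not the project's own code. The helper `load_mapping` and its `min_score` parameter are hypothetical, and the scale of `match_score` is not documented here, so the threshold of 0.9 in the example is purely illustrative. Rows with an empty score are skipped as an assumption.

```python
import csv
import io


def load_mapping(handle, min_score=0.0):
    """Return {ingredient: drugbank_id} for rows with match_score >= min_score."""
    mapping = {}
    for row in csv.DictReader(handle):
        score_text = (row["match_score"] or "").strip()
        if not score_text:
            continue  # assumption: rows without a score are treated as unmapped
        if float(score_text) >= min_score:
            mapping[row["ingredient"]] = row["drugbank_id"]
    return mapping


# Made-up placeholder rows that only mimic the documented column layout.
sample_csv = (
    "ingredient,drugbank_id,match_score\n"
    "DRUG A,DB-A,1.0\n"
    "DRUG B,DB-B,0.6\n"
    "DRUG C,DB-C,\n"
)
trusted = load_mapping(io.StringIO(sample_csv), min_score=0.9)
all_scored = load_mapping(io.StringIO(sample_csv))
print(trusted)
```

Check the actual range of `match_score` in your copy of the file before choosing a threshold.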
Deep Learning Predictions
| File | Description | Format | Size |
|---|---|---|---|
| txgnn_checkpoint.csv | DL model predictions | CSV | ~756 MB |
Columns:
- drugbank_id: DrugBank identifier
- drug_name: Drug name
- disease_name: Predicted indication
- txgnn_score: Confidence score (0-1)
Statistics:
- 9,968,985 total predictions
- 176,021 high-confidence (score ≥ 0.7)
- 585 unique drugs
- 17,041 unique diseases
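Because this file is large (~756 MB, close to ten million rows), you may prefer to stream it rather than load it all at once. The following is a minimal sketch that reads the file row by row with the standard library and yields only rows at or above a score threshold. The generator name `iter_high_confidence` is hypothetical, and the sample rows are made-up placeholders.

```python
import csv
import io


def iter_high_confidence(handle, threshold=0.7):
    """Yield rows whose txgnn_score is at or above the threshold.

    Rows are read one at a time, so memory use stays small regardless
    of the file size.
    """
    for row in csv.DictReader(handle):
        if float(row["txgnn_score"]) >= threshold:
            yield row


# Made-up placeholder rows that only mimic the documented column layout.
sample_csv = (
    "drugbank_id,drug_name,disease_name,txgnn_score\n"
    "DB-A,drug a,disease x,0.91\n"
    "DB-A,drug a,disease y,0.42\n"
    "DB-B,drug b,disease x,0.70\n"
)
hits = list(iter_high_confidence(io.StringIO(sample_csv)))
print(len(hits))
```

With the real file, open `txgnn_checkpoint.csv` with `newline=''` and iterate over the generator directly instead of collecting everything into a list.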
How to Download
Via GitHub
All data files are available in the project repository:
git clone https://github.com/yao-care/MyTxGNN.git
cd MyTxGNN/data/processed
Direct Links
FHIR Resources
FHIR R4 formatted resources are also available:
| Resource Type | Count | Location |
|---|---|---|
| MedicationKnowledge | 508 | /fhir/MedicationKnowledge/ |
| ClinicalUseDefinition | 41,560 | /fhir/ClinicalUseDefinition/ |
Access via:
GET /fhir/MedicationKnowledge/{drugbank_id}.json
GET /fhir/ClinicalUseDefinition/{drug}-{indication}.json
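The path templates above can be turned into full URLs with a small helper. This is a sketch under several assumptions: `BASE_URL` is a placeholder (the page does not state the host), the function names are hypothetical, and the way `{drug}` and `{indication}` are encoded in real file names is not documented here, so the percent-encoding below is only a guess.

```python
from urllib.parse import quote

# Placeholder host: replace with the site that actually serves the /fhir/ paths.
BASE_URL = "https://example.org"


def medication_knowledge_url(drugbank_id, base_url=BASE_URL):
    """Build the URL for a MedicationKnowledge resource."""
    root = base_url.rstrip("/")
    name = quote(drugbank_id, safe="")
    return f"{root}/fhir/MedicationKnowledge/{name}.json"


def clinical_use_url(drug, indication, base_url=BASE_URL):
    """Build the URL for a ClinicalUseDefinition resource.

    Assumption: drug and indication are percent-encoded and joined with "-".
    The real naming scheme may differ (for example, slugs or identifiers).
    """
    root = base_url.rstrip("/")
    name = f"{quote(drug, safe='')}-{quote(indication, safe='')}"
    return f"{root}/fhir/ClinicalUseDefinition/{name}.json"


print(medication_knowledge_url("DB-A"))
print(clinical_use_url("drug a", "disease x"))
```

The resulting URLs can be fetched with any HTTP client once the correct host and naming scheme are confirmed.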
Data Dictionary
Prediction Score Interpretation
| Score Range | Meaning |
|---|---|
| 0.95 - 1.00 | Very high confidence |
| 0.90 - 0.95 | High confidence |
| 0.70 - 0.90 | Moderate-high confidence |
| 0.50 - 0.70 | Moderate confidence |
| < 0.50 | Low confidence |
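The table above can be applied programmatically to label each prediction. The sketch below uses a hypothetical function, `confidence_label`. Since adjacent ranges in the table share their boundary values, it assumes each lower bound is inclusive (so 0.95 counts as "Very high confidence" and 0.90 as "High confidence").

```python
def confidence_label(score):
    """Map a txgnn_score in [0, 1] to the label from the interpretation table.

    Assumption: lower bounds are inclusive, upper bounds exclusive
    (except 1.00, which belongs to the top band).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score!r}")
    if score >= 0.95:
        return "Very high confidence"
    if score >= 0.90:
        return "High confidence"
    if score >= 0.70:
        return "Moderate-high confidence"
    if score >= 0.50:
        return "Moderate confidence"
    return "Low confidence"


for value in (0.97, 0.90, 0.75, 0.55, 0.10):
    print(value, confidence_label(value))
```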
Disease Naming Convention
Disease names follow the TxGNN knowledge graph ontology, which is derived from:
- MONDO Disease Ontology
- Disease Ontology (DO)
- Human Phenotype Ontology (HPO)
Usage Examples
Python
import pandas as pd
# Load KG predictions
kg = pd.read_csv('repurposing_candidates.csv')
# Filter for a specific drug
metformin = kg[kg['ingredient'] == 'METFORMIN']
# Get high-score DL predictions
dl = pd.read_csv('txgnn_checkpoint.csv')
high_conf = dl[dl['txgnn_score'] >= 0.7]
R
library(readr)
library(dplyr)  # provides %>%, filter(), group_by(), summarise(), arrange()
# Load KG predictions
kg <- read_csv('repurposing_candidates.csv')
# Filter by drug
prednisolone <- kg %>% filter(ingredient == 'PREDNISOLONE')
# Summary statistics
kg %>% group_by(disease_name) %>% summarise(n = n()) %>% arrange(desc(n))
Terms of Use
By downloading this data, you agree to:
- Attribution: Cite MyTxGNN and TxGNN (Huang et al., 2023) in publications
- Research Use: Data is provided for research purposes
- No Clinical Decisions: Do not use predictions for clinical decisions without validation
- Compliance: Follow applicable data protection regulations
Citation
When using this data, please cite:
@article{huang2023txgnn,
title={A foundation model for clinician-centered drug repurposing},
author={Huang, Kexin and others},
journal={Nature Medicine},
year={2023},
doi={10.1038/s41591-023-02233-x}
}
Disclaimer
Prediction data is provided for research purposes only. Predictions have not been clinically validated and should not be used for medical decisions.