Downloads

Download MyTxGNN prediction data for your research.

Available Datasets

Knowledge Graph Predictions

File	Description	Format	Size
`repurposing_candidates.csv`	KG-based predictions	CSV	~4.7 MB

Columns:

license_id: NPRA registration number
brand_name: Product name
ingredient: Active ingredient
drugbank_id: DrugBank identifier
disease_name: Predicted indication
disease_id: Disease identifier
source: Prediction source
is_new: New indication flag

Statistics:

41,560 predictions
508 unique drugs
940 unique diseases
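After downloading, you can recompute these counts from the CSV as a quick integrity check. This is a minimal sketch. It assumes that "unique drugs" counts distinct `drugbank_id` values and "unique diseases" counts distinct `disease_id` values; the page does not say which columns the figures are based on. `summarize_kg` is a hypothetical helper name.

```python
import pandas as pd


def summarize_kg(kg: pd.DataFrame) -> dict:
    """Return row and distinct-value counts for the KG prediction table.

    Assumption: drugs are counted by `drugbank_id` and diseases by
    `disease_id`. Swap in `ingredient` or `disease_name` if your
    counts disagree with the published statistics.
    """
    return {
        "predictions": len(kg),
        "unique_drugs": kg["drugbank_id"].nunique(),
        "unique_diseases": kg["disease_id"].nunique(),
    }
```

If the column assumption holds, `summarize_kg(pd.read_csv('repurposing_candidates.csv'))` should return the figures listed above.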

DrugBank Mapping

File	Description	Format	Size
`drugbank_mapping.csv`	Drug ID mappings	CSV	~1.8 MB

Columns:

ingredient: Normalized ingredient name
drugbank_id: DrugBank identifier
match_score: Mapping confidence
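One way to use this mapping is to attach DrugBank IDs to your own list of ingredient names, keeping only matches above a confidence cutoff. This is a minimal sketch with several assumptions. The cutoff `min_score=0.9` is hypothetical, because the scale of `match_score` is not documented here. Upper-casing the query assumes `ingredient` is normalized to upper case, as in the `'METFORMIN'` and `'PREDNISOLONE'` examples further down. `map_ingredients` is a hypothetical helper name.

```python
import pandas as pd


def map_ingredients(names: list[str], mapping: pd.DataFrame,
                    min_score: float = 0.9) -> pd.DataFrame:
    """Attach DrugBank IDs to free-text ingredient names.

    Assumptions (not documented on this page): `ingredient` values are
    upper-case, and a higher `match_score` means a more reliable match.
    `min_score=0.9` is an illustrative cutoff, not a recommended value.
    """
    confident = mapping[mapping["match_score"] >= min_score]
    query = pd.DataFrame({"query": names})
    # Normalize the user's names the same (assumed) way as `ingredient`
    query["ingredient"] = query["query"].str.strip().str.upper()
    # A left join keeps unmatched names, with NaN in drugbank_id
    return query.merge(confident[["ingredient", "drugbank_id", "match_score"]],
                       on="ingredient", how="left")
```

If an ingredient appears more than once in the mapping, the join returns one row per mapping entry.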

Deep Learning Predictions

File	Description	Format	Size
`txgnn_checkpoint.csv`	DL model predictions	CSV	~756 MB

Columns:

drugbank_id: DrugBank identifier
drug_name: Drug name
disease_name: Predicted indication
txgnn_score: Confidence score (0-1)

Statistics:

9,968,985 total predictions
176,021 high-confidence (score ≥ 0.7)
585 unique drugs
17,041 unique diseases
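At ~756 MB, `txgnn_checkpoint.csv` may not fit comfortably in memory on every machine. The sketch below computes the statistics above in a single streaming pass using `pandas.read_csv(..., chunksize=...)`. The chunk size is an arbitrary choice. Counting drugs by `drugbank_id` and diseases by `disease_name` is an assumption about how the published figures were derived. `summarize_dl` is a hypothetical helper name.

```python
import pandas as pd


def summarize_dl(path, threshold: float = 0.7,
                 chunksize: int = 500_000) -> dict:
    """Stream the DL prediction CSV and return summary counts.

    Assumption: drugs are identified by `drugbank_id` and diseases by
    `disease_name`. Only the three needed columns are read.
    """
    total = high = 0
    drugs: set = set()
    diseases: set = set()
    cols = ["drugbank_id", "disease_name", "txgnn_score"]
    for chunk in pd.read_csv(path, usecols=cols, chunksize=chunksize):
        total += len(chunk)
        high += int((chunk["txgnn_score"] >= threshold).sum())
        drugs.update(chunk["drugbank_id"].dropna().unique())
        diseases.update(chunk["disease_name"].dropna().unique())
    return {"total": total, "high_confidence": high,
            "unique_drugs": len(drugs), "unique_diseases": len(diseases)}
```

Memory use is bounded by one chunk plus the two sets of distinct IDs and names. The full table is never held at once.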

How to Download

Via GitHub

All data files are available in the project repository:

```bash
git clone https://github.com/yao-care/MyTxGNN.git
cd MyTxGNN/data/processed
```
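After cloning, it helps to confirm that the expected files are present and are not Git LFS pointer stubs. This is a minimal sketch. It assumes all three CSVs live in `data/processed`. Whether the repository stores large files such as `txgnn_checkpoint.csv` with Git LFS is also an assumption, so the pointer check is purely defensive. `check_files` is a hypothetical helper name.

```python
from pathlib import Path

EXPECTED = ["repurposing_candidates.csv", "drugbank_mapping.csv",
            "txgnn_checkpoint.csv"]
# First line of every Git LFS pointer file
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def check_files(directory: str = ".") -> dict:
    """Report each expected file as 'ok', 'missing', or 'lfs-pointer'."""
    status = {}
    for name in EXPECTED:
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_HEADER))
        # An un-fetched LFS file is a small text stub starting with this header
        status[name] = "lfs-pointer" if head == LFS_HEADER else "ok"
    return status
```

Run `check_files()` from `MyTxGNN/data/processed`. If any file reports `lfs-pointer`, running `git lfs pull` in the repository fetches the real content.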

Direct Links

FHIR Resources

FHIR R4-formatted resources are also available:

Resource Type	Count	Location
MedicationKnowledge	508	`/fhir/MedicationKnowledge/`
ClinicalUseDefinition	41,560	`/fhir/ClinicalUseDefinition/`

Access via:

```
GET /fhir/MedicationKnowledge/{drugbank_id}.json
GET /fhir/ClinicalUseDefinition/{drug}-{indication}.json
```
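The two URL patterns can be wrapped in a small standard-library client. This is a minimal sketch. `BASE_URL` is a placeholder, because this page does not name the host that serves `/fhir/`. The `{drug}-{indication}` identifier is passed through verbatim (only URL-escaped), since its exact format, such as case and separators, is not documented. `fhir_url` and `fetch_resource` are hypothetical names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder: replace with the actual host serving the /fhir/ directory
BASE_URL = "https://example.org"

RESOURCE_TYPES = {"MedicationKnowledge", "ClinicalUseDefinition"}


def fhir_url(resource_type: str, resource_id: str, base: str = BASE_URL) -> str:
    """Build a URL following GET /fhir/{type}/{id}.json."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    return f"{base.rstrip('/')}/fhir/{resource_type}/{quote(resource_id)}.json"


def fetch_resource(resource_type: str, resource_id: str,
                   base: str = BASE_URL) -> dict:
    """Download one resource and parse it as JSON."""
    with urlopen(fhir_url(resource_type, resource_id, base), timeout=30) as resp:
        return json.load(resp)
```

For example, `fhir_url("MedicationKnowledge", "DB00331")` builds the MedicationKnowledge URL for that DrugBank ID under the placeholder host. Every FHIR resource carries a `resourceType` field, so checking it after `fetch_resource` is a cheap sanity test.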

Data Dictionary

Prediction Score Interpretation

Score Range	Meaning
0.95 - 1.00	Very high confidence
0.90 - <0.95	High confidence
0.70 - <0.90	Moderate-high confidence
0.50 - <0.70	Moderate confidence
< 0.50	Low confidence
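A small helper can apply these bands consistently. This is a minimal sketch that assumes each band includes its lower bound and excludes its upper bound. Under that assumption, a score of exactly 0.90 counts as high confidence, and 0.70 falls on the same side as the ≥ 0.7 cutoff used in the statistics above. `confidence_band` is a hypothetical helper name.

```python
import bisect

# Lower bounds of each band in ascending order; labels line up with the gaps
_BOUNDS = [0.50, 0.70, 0.90, 0.95]
_LABELS = ["Low", "Moderate", "Moderate-high", "High", "Very high"]


def confidence_band(score: float) -> str:
    """Map a txgnn_score in [0, 1] to its interpretation band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right puts a score equal to a bound into the higher band
    return _LABELS[bisect.bisect_right(_BOUNDS, score)]
```

With pandas, `dl['band'] = dl['txgnn_score'].map(confidence_band)` labels every row.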

Disease Naming Convention

Disease names follow the TxGNN knowledge graph ontology, which is derived from:

MONDO Disease Ontology
Disease Ontology (DO)
Human Phenotype Ontology (HPO)
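Because both prediction files use TxGNN disease names, you can in principle join them to attach a deep-learning score to each knowledge-graph candidate. This is a minimal sketch. It assumes the names match exactly after trimming whitespace and lower-casing, which is not verified here. It joins on `drugbank_id` plus the normalized `disease_name`. `attach_scores` is a hypothetical helper name.

```python
import pandas as pd


def attach_scores(kg: pd.DataFrame, dl: pd.DataFrame) -> pd.DataFrame:
    """Left-join txgnn_score from DL predictions onto KG predictions.

    Assumption: disease names agree across files up to case and
    surrounding whitespace. KG rows without a DL match get NaN.
    """
    def key(s: pd.Series) -> pd.Series:
        return s.str.strip().str.lower()

    kg = kg.assign(_disease=key(kg["disease_name"]))
    dl = dl.assign(_disease=key(dl["disease_name"]))
    # Keep one score per (drug, disease) pair so the join cannot duplicate rows
    scores = (dl[["drugbank_id", "_disease", "txgnn_score"]]
              .drop_duplicates(subset=["drugbank_id", "_disease"]))
    merged = kg.merge(scores, on=["drugbank_id", "_disease"], how="left")
    return merged.drop(columns="_disease")
```

Given the size of the DL file, it is usually practical to filter or stream it first, for example to scores ≥ 0.7, before passing it in as `dl`.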

Usage Examples

Python

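```python
import pandas as pd

# Load KG predictions
kg = pd.read_csv('repurposing_candidates.csv')

# Filter for a specific drug by active ingredient
metformin = kg[kg['ingredient'] == 'METFORMIN']

# Get high-confidence DL predictions (score >= 0.7).
# The file is ~756 MB, so read it in chunks and keep only matching rows.
chunks = pd.read_csv('txgnn_checkpoint.csv', chunksize=1_000_000)
high_conf = pd.concat([chunk[chunk['txgnn_score'] >= 0.7] for chunk in chunks])
```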
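To rank one drug's deep-learning predictions instead of applying a fixed cutoff, `DataFrame.nlargest` selects the top-scoring rows. This is a minimal sketch. Matching `drug_name` case-insensitively and the default `k=10` are illustrative choices. `top_predictions` is a hypothetical helper name.

```python
import pandas as pd


def top_predictions(dl: pd.DataFrame, drug_name: str, k: int = 10) -> pd.DataFrame:
    """Return the k highest-scoring indications for one drug."""
    # Case-insensitive match on the drug name (an illustrative choice)
    rows = dl[dl["drug_name"].str.lower() == drug_name.lower()]
    return (rows.nlargest(k, "txgnn_score")
                [["drug_name", "disease_name", "txgnn_score"]]
                .reset_index(drop=True))
```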

R

```r
library(readr)
library(dplyr)  # provides %>%, filter(), group_by(), summarise(), arrange()

# Load KG predictions
kg <- read_csv('repurposing_candidates.csv')

# Filter by drug
prednisolone <- kg %>% filter(ingredient == 'PREDNISOLONE')

# Count predictions per disease, most frequent first
kg %>% group_by(disease_name) %>% summarise(n = n()) %>% arrange(desc(n))
```

Terms of Use

By downloading this data, you agree to:

Attribution: Cite MyTxGNN and TxGNN (Huang et al., 2024) in publications
Research Use: Data is provided for research purposes only
No Clinical Decisions: Do not use predictions for clinical decisions without validation
Compliance: Follow applicable data protection regulations

Citation

When using this data, please cite:

```bibtex
@article{huang2024txgnn,
  title={A foundation model for clinician-centered drug repurposing},
  author={Huang, Kexin and others},
  journal={Nature Medicine},
  year={2024},
  doi={10.1038/s41591-024-03233-x}
}
```

Disclaimer
Prediction data is provided for research purposes only. Predictions have not been clinically validated and should not be used for medical decisions.