Methodology
From AI prediction to evidence validation
Overall Pipeline
TxGNN Prediction → Data Collection (Bundle) → Evidence Grading → Report Generation
Step 1: TxGNN Prediction
Two Prediction Approaches
MyTxGNN uses two complementary approaches:
Knowledge Graph (KG) Prediction
Based on existing relationships in the TxGNN biomedical knowledge graph:
- Knowledge Graph Construction
  - 17,080 nodes (drugs, diseases, genes, proteins)
  - Complex inter-node relationships from multiple sources
- Relationship Inference
  - Identifies drugs with similar relationship patterns
  - Predicts potential new drug-disease associations
- Output
  - Drug-disease pairs based on knowledge graph patterns
  - 41,560 predictions for 508 drugs
Deep Learning (DL) Prediction
Using TxGNN’s neural network model:
- Graph Neural Network
  - Learns hidden relationships between nodes
  - Predicts new drug-disease associations
- Confidence Scoring
  - Each drug-disease pair receives a prediction score
  - Higher scores indicate higher confidence
- Output
  - 9.97 million total predictions
  - 176,021 high-confidence predictions (score ≥ 0.7)
Prediction Parameters
| Parameter | KG Method | DL Method |
|---|---|---|
| Score Threshold | Based on graph distance | ≥ 0.7 (high confidence) |
| Exclude Known Indications | Yes | Yes |
| DrugBank Mapping Required | Yes | Yes |
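The three DL-method parameters in the table can be read as a simple filter over scored pairs. The following is a minimal sketch under assumptions, not the project's actual code: the `Prediction` record, its field names, and the `filter_predictions` helper are hypothetical, and the sample rows are placeholders rather than real predictions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

HIGH_CONFIDENCE = 0.7  # DL score threshold from the parameter table


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one scored drug-disease pair."""
    drug: str
    disease: str
    score: float                 # DL prediction score
    drugbank_id: Optional[str]   # None when the drug has no DrugBank mapping


def filter_predictions(
    predictions: Iterable[Prediction],
    known_indications: Set[Tuple[str, str]],
    threshold: float = HIGH_CONFIDENCE,
) -> List[Prediction]:
    """Apply the three DL-method filters and rank survivors by score."""
    kept = [
        p for p in predictions
        if p.score >= threshold                             # score threshold
        and p.drugbank_id is not None                       # DrugBank mapping required
        and (p.drug, p.disease) not in known_indications    # exclude known indications
    ]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Placeholder data, not real predictions.
candidates = [
    Prediction("drug_a", "disease_x", 0.91, "DB-PLACEHOLDER-1"),
    Prediction("drug_a", "disease_y", 0.65, "DB-PLACEHOLDER-1"),  # below threshold
    Prediction("drug_b", "disease_x", 0.88, None),                # no DrugBank mapping
    Prediction("drug_c", "disease_z", 0.95, "DB-PLACEHOLDER-3"),  # known indication
    Prediction("drug_d", "disease_x", 0.70, "DB-PLACEHOLDER-4"),
]
known = {("drug_c", "disease_z")}
shortlist = filter_predictions(candidates, known)
```

Here `shortlist` keeps only the `drug_a`/`disease_x` and `drug_d`/`disease_x` pairs. The KG method's graph-distance threshold is not specified in enough detail to sketch, so it is left out.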
Step 2: Data Collection (Bundle)
For each predicted drug-disease pair, we automatically collect supporting evidence:
Clinical Trials
- Source: ClinicalTrials.gov
- Search Strategy: Drug name + Disease name
- Fields: Trial ID (NCT), Phase, Status, Enrollment
Academic Literature
- Source: PubMed
- Search Strategy: Drug name + Disease name
- Fields: PMID, Title, Year, Journal, Abstract
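The "Drug name + Disease name" search strategy used for both sources above could be expressed as query URLs. This sketch assumes the public ClinicalTrials.gov v2 REST endpoint and the NCBI E-utilities `esearch` endpoint; the source does not say which interfaces the pipeline actually calls, and the helper names and default page sizes are hypothetical. It only builds URLs and performs no network requests.

```python
from urllib.parse import urlencode

# Assumed public endpoints; the pipeline's real access method is not documented here.
CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinical_trials_url(drug: str, disease: str, page_size: int = 20) -> str:
    """Build a ClinicalTrials.gov query: drug as intervention, disease as condition."""
    params = {
        "query.intr": drug,
        "query.cond": disease,
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


def pubmed_search_url(drug: str, disease: str, retmax: int = 20) -> str:
    """Build a PubMed esearch query that requires both terms to match."""
    params = {
        "db": "pubmed",
        "term": f"{drug} AND {disease}",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"


trials_url = clinical_trials_url("drug a", "disease x")
pubmed_url = pubmed_search_url("drug a", "disease x")
```

Passing the drug as the intervention and the disease as the condition is one possible reading of the search strategy; a plain free-text query combining both names would be an equally valid interpretation.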
Drug Information
- Source: DrugBank
- Fields: Mechanism of action, Pharmacology, Indications
Malaysia Regulatory
- Source: NPRA (via data.gov.my)
- Fields: Registration number, Product name, Status, Active ingredients
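One way to picture the bundle is as a single record per drug-disease pair that groups the four sources and their listed fields. The class and attribute names below are hypothetical, chosen to mirror the field lists above; the actual bundle format is not described in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClinicalTrial:
    nct_id: str                       # Trial ID (NCT)
    phase: str
    status: str
    enrollment: Optional[int] = None


@dataclass
class Article:
    pmid: str
    title: str
    year: int
    journal: str
    abstract: str = ""


@dataclass
class DrugInfo:
    mechanism_of_action: str = ""
    pharmacology: str = ""
    indications: List[str] = field(default_factory=list)


@dataclass
class NpraRecord:
    registration_number: str
    product_name: str
    status: str
    active_ingredients: List[str] = field(default_factory=list)


@dataclass
class EvidenceBundle:
    """Everything collected for one predicted drug-disease pair."""
    drug: str
    disease: str
    trials: List[ClinicalTrial] = field(default_factory=list)
    articles: List[Article] = field(default_factory=list)
    drug_info: Optional[DrugInfo] = None
    npra_records: List[NpraRecord] = field(default_factory=list)

    def is_empty(self) -> bool:
        """True when neither trials nor literature were found for the pair."""
        return not self.trials and not self.articles


# Placeholder values for illustration only.
bundle = EvidenceBundle(drug="drug_a", disease="disease_x")
bundle.trials.append(
    ClinicalTrial(nct_id="NCT00000000", phase="Phase 2", status="Completed")
)
```

Keeping the four sources in one object means the grading step can work from a single input rather than querying each source again.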
Step 3: Evidence Level Assessment
Based on collected evidence, we assign evidence levels:
Level Definitions
| Level | Definition | Criteria |
|---|---|---|
| L1 | Multiple Phase 3 RCTs / Systematic Reviews | ≥2 Phase 3 trials or systematic reviews |
| L2 | Single RCT or Multiple Phase 2 Trials | 1 RCT or ≥2 Phase 2 trials |
| L3 | Observational Studies / Large Case Series | Observational studies or case series |
| L4 | Preclinical / Mechanistic / Case Reports | Basic research or limited cases |
| L5 | Model Prediction Only | No clinical evidence found |
Assessment Flow
- Phase 3 RCT or Systematic Review? → L1
- Phase 2 RCT? → L2
- Observational Study? → L3
- Preclinical Research? → L4
- Model Prediction Only → L5
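The level table and assessment flow above amount to a top-down cascade: the first criterion that matches determines the level. The sketch below transcribes the table's criteria directly. The `EvidenceCounts` structure and its field names are hypothetical, and it assumes that "≥2 Phase 3 trials or systematic reviews" means a combined count of at least two.

```python
from dataclasses import dataclass


@dataclass
class EvidenceCounts:
    """Hypothetical tally of study types found for one drug-disease pair."""
    phase3_or_reviews: int = 0   # Phase 3 trials plus systematic reviews
    rcts: int = 0                # randomized controlled trials of any phase
    phase2_trials: int = 0
    observational: int = 0       # observational studies or case series
    preclinical: int = 0         # basic research or case reports


def grade(c: EvidenceCounts) -> str:
    """Return the first evidence level whose criterion is met, checked L1 to L4."""
    if c.phase3_or_reviews >= 2:
        return "L1"
    if c.rcts >= 1 or c.phase2_trials >= 2:
        return "L2"
    if c.observational >= 1:
        return "L3"
    if c.preclinical >= 1:
        return "L4"
    return "L5"  # model prediction only


levels = [
    grade(EvidenceCounts(phase3_or_reviews=2, rcts=2)),
    grade(EvidenceCounts(rcts=1)),
    grade(EvidenceCounts(phase2_trials=2)),
    grade(EvidenceCounts(observational=3, preclinical=5)),
    grade(EvidenceCounts(preclinical=1)),
    grade(EvidenceCounts()),
]
```

Some edge cases are not settled by the table, for example a single systematic review with no RCT, or a single non-randomized Phase 2 trial. This sketch lets them fall through to the lower checks; a real implementation would need an explicit rule for them.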
Step 4: Decision Recommendations
Based on evidence level and other factors, we provide decision recommendations:
| Decision | Description | Recommended Action |
|---|---|---|
| Go | Strong evidence support | Proceed to evaluation or trial planning |
| Proceed | Sufficient evidence support | Further evaluate feasibility |
| Consider | Some evidence exists | Consider with caution |
| Explore | Worth exploring | Gather more data |
| Hold | Insufficient evidence | Not recommended to proceed |
Factors Affecting Decisions
- Evidence strength and consistency
- NPRA registration and approval status
- Relationship between predicted and original indications
- Safety considerations
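The document does not state how evidence levels and the factors above combine into a decision, so the following is purely illustrative. It assumes a one-to-one baseline from level to decision (L1 to Go through L5 to Hold) and a single hypothetical adjustment in which an unresolved safety concern moves the recommendation one step toward Hold. Both the mapping and the adjustment rule are assumptions, not the project's documented logic.

```python
# Decisions ordered from most to least favourable, as in the table above.
DECISIONS = ["Go", "Proceed", "Consider", "Explore", "Hold"]

# Assumed baseline: one decision per evidence level.
BASELINE = {"L1": "Go", "L2": "Proceed", "L3": "Consider", "L4": "Explore", "L5": "Hold"}


def recommend(level: str, safety_concern: bool = False) -> str:
    """Map an evidence level to a decision, downgrading one step on a safety concern."""
    if level not in BASELINE:
        raise ValueError(f"unknown evidence level: {level!r}")
    idx = DECISIONS.index(BASELINE[level])
    if safety_concern:
        # Hypothetical rule: step toward "Hold", never past it.
        idx = min(idx + 1, len(DECISIONS) - 1)
    return DECISIONS[idx]


plain = recommend("L2")
downgraded = recommend("L2", safety_concern=True)
```

The other listed factors, such as NPRA registration status and the relationship to the original indication, could be modelled as further adjustments in the same way, but their direction and weight are not specified here.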
Quality Control
Data Validation
- Clinical trial ID format verification
- PubMed ID validity check
- Registration status confirmation
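The first two checks can be done syntactically before any lookup. A minimal sketch, assuming that a ClinicalTrials.gov identifier is "NCT" followed by eight digits and that a PMID is a positive integer without leading zeros; the function names are hypothetical. These checks confirm format only, not that the record exists.

```python
import re

NCT_PATTERN = re.compile(r"NCT[0-9]{8}")   # e.g. NCT followed by eight digits
PMID_PATTERN = re.compile(r"[1-9][0-9]*")  # positive integer, no leading zero


def is_valid_nct_id(value: str) -> bool:
    """Check that the string has the ClinicalTrials.gov identifier format."""
    return NCT_PATTERN.fullmatch(value.strip()) is not None


def is_valid_pmid(value: str) -> bool:
    """Check that the string looks like a PubMed ID (format only)."""
    return PMID_PATTERN.fullmatch(value.strip()) is not None


checks = {
    "NCT01234567": is_valid_nct_id("NCT01234567"),
    "NCT1234": is_valid_nct_id("NCT1234"),
    "12345678": is_valid_pmid("12345678"),
    "0123": is_valid_pmid("0123"),
}
```

Confirming that an identifier resolves to a real record, or that a registration status is current, would still require a query against the source database.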
Manual Review
- Reports with high evidence levels (L1-L2) are manually confirmed
- Decision recommendations are checked for reasonableness
Limitations & Caveats
- Prediction ≠ Causation: TxGNN predictions are based on associations, not causal relationships
- Limited Evidence Collection: The pipeline searches only public databases and may miss some evidence
- Language Limitation: Searches cover primarily English-language sources
- Timeliness: Source data changes over time; each report reflects the data available when it was generated
This methodology is continuously improved. Feedback is welcome.