Analysis Methodology
Computational approaches for discovering rejuvenation mechanisms in human myoblasts
Overview
Comprehensive analysis of NANOG-mediated senescence reversal in human skeletal muscle myoblasts
Key Findings
NANOG treatment reverses key senescence hallmarks including genomic instability, loss of proteostasis, and mitochondrial dysfunction
Restoration of DNA damage response via upregulation of DNA repair proteins (BRCA2, SIRT6)
Recovery of heterochromatin marks via upregulation of histones
Reactivation of autophagy and mitochondrial energetics via AMPK upregulation
67.7% of genes are late responders, requiring sustained treatment for maximal effect
Reverses senescence without need for full reprogramming to pluripotent state
Experimental Design
Model System
Young Myoblasts
Early passage (baseline)
Senescent
Late passage
NANOG Treatment
Temporal series
Biological Context
Progenitor Type: Myogenic progenitors (satellite cell-derived myoblasts)
Regeneration Role: Critical for skeletal muscle regeneration
Senescence Impact: Senescence impedes muscle regeneration capacity
Key Finding: NANOG reverses senescence without full reprogramming
Intervention Details
Factor: NANOG
Method: Overexpression via lentiviral transduction
Goal: Reverse replicative senescence in muscle progenitors
Analysis Pipeline Overview
Data Processing & Quality Control
Loaded normalized counts, converted to log2 scale, filtered low-expression genes, calculated differential expression across age groups and treatment timepoints
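A minimal sketch of the transform-and-filter part of this step, assuming expression is held as a dict mapping each gene to a list of normalized counts per sample. The pseudocount of 1 and the mean log2 cutoff of 1.0 are illustrative assumptions, not the values used in this analysis, and the gene names are placeholders.

```python
import math

def log2_transform(counts, pseudocount=1.0):
    """Map every normalized count to log2(count + pseudocount)."""
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in counts.items()
    }

def filter_low_expression(log_expr, min_mean=1.0):
    """Keep genes whose mean log2 expression across samples is at least min_mean."""
    return {
        gene: values
        for gene, values in log_expr.items()
        if sum(values) / len(values) >= min_mean
    }

# Toy input: two placeholder genes, four samples each.
counts = {
    "GENE_HIGH": [120.0, 95.0, 300.0, 280.0],
    "GENE_LOW": [0.0, 1.0, 0.0, 0.5],
}
kept = filter_low_expression(log2_transform(counts))
print(sorted(kept))
```

Filtering after the log transform keeps the cutoff on the same scale as the downstream differential expression values.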
UMAP Dimensionality Reduction & Clustering
Applied UMAP (3D) to reduce expression space, then k-means clustering (k=20) to identify gene modules with coordinated expression patterns across senescence and treatment
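The clustering half of this step can be sketched with plain Lloyd's k-means on coordinates that are assumed to come from a 3D UMAP embedding (UMAP itself is not reproduced here). The toy call uses k=2 on six made-up points, whereas the analysis used k=20. Function names, the seed, and the iteration cap are illustrative assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical 3D embedding coordinates for six genes.
embedding = [
    (0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.1, 0.3, 0.2),
    (10.0, 10.0, 10.0), (10.2, 9.9, 10.1), (9.8, 10.1, 10.0),
]
labels, centroids = kmeans(embedding, k=2)
print(labels)
```

Cluster labels are arbitrary integers, so only co-membership is meaningful when comparing runs.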
Biological Characterization of Clusters
Characterized each cluster by: senescence enrichment (S vs Y), treatment response patterns, dose-response kinetics, and reversal patterns
Cluster Selection for Deep Analysis
Selected 8 clusters (14, 6, 1, 9, 15, 4, 11, 7) based on biological relevance: senescence markers, treatment response, and reversal patterns
Cluster Stability Validation
Validated selected clusters through bootstrap iterations with different random seeds. Calculated stability metrics and identified core vs variable genes
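One way to turn repeated clustering runs into per-gene stability is sketched below. It assumes each run is a dict mapping gene to cluster label, matches the reference cluster to the run's majority label (labels are not comparable across runs), and calls a gene "core" at 80% co-membership. The matching rule and the 0.8 cutoff are assumptions for illustration; the stability metrics actually used are not specified here.

```python
from collections import Counter

def gene_stability(reference_genes, runs):
    """Fraction of runs in which each reference gene stays with the cluster majority."""
    hits = Counter()
    for labels in runs:
        # Label carried by most reference genes in this run.
        label_counts = Counter(labels[g] for g in reference_genes if g in labels)
        if not label_counts:
            continue
        majority_label = label_counts.most_common(1)[0][0]
        for gene in reference_genes:
            if labels.get(gene) == majority_label:
                hits[gene] += 1
    return {gene: hits[gene] / len(runs) for gene in reference_genes}

def split_core_variable(stability, core_min=0.8):
    """Partition genes into core (stable) and variable members."""
    core = sorted(g for g, s in stability.items() if s >= core_min)
    variable = sorted(g for g, s in stability.items() if s < core_min)
    return core, variable

# Four hypothetical runs over one reference cluster of placeholder genes.
reference = ["G1", "G2", "G3", "G4"]
runs = [
    {"G1": 0, "G2": 0, "G3": 0, "G4": 1},
    {"G1": 2, "G2": 2, "G3": 2, "G4": 2},
    {"G1": 5, "G2": 5, "G3": 5, "G4": 7},
    {"G1": 1, "G2": 1, "G3": 3, "G4": 1},
]
stability = gene_stability(reference, runs)
core, variable = split_core_variable(stability)
print(stability, core, variable)
```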
Co-expression Network Analysis
Built co-expression networks (r > 0.6) within each cluster. Identified hub genes using degree, weighted degree, and betweenness centrality. Detected communities using Louvain algorithm
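The thresholding and the two degree-based hub measures can be sketched as follows, assuming signed Pearson correlation on per-gene expression vectors (whether the analysis used signed or absolute correlation is not stated). Betweenness centrality and Louvain community detection are omitted because they normally rely on a graph library. Gene names and values are placeholders.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def build_network(expr, threshold=0.6):
    """Return {(gene_a, gene_b): r} for every pair with r above the threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(expr), 2):
        r = pearson(expr[gene_a], expr[gene_b])
        if r > threshold:
            edges[(gene_a, gene_b)] = r
    return edges

def hub_scores(edges):
    """Degree (edge count) and weighted degree (sum of r) per gene."""
    degree, weighted = {}, {}
    for (gene_a, gene_b), r in edges.items():
        for gene in (gene_a, gene_b):
            degree[gene] = degree.get(gene, 0) + 1
            weighted[gene] = weighted.get(gene, 0.0) + r
    return degree, weighted

# HUB correlates with N1 and N2 (r ~ 0.71 each); N1, N2, and OTHER are mutually uncorrelated.
expr = {
    "HUB": [1.0, 1.0, -1.0, -1.0],
    "N1": [1.0, 0.0, -1.0, 0.0],
    "N2": [0.0, 1.0, 0.0, -1.0],
    "OTHER": [1.0, -1.0, 1.0, -1.0],
}
edges = build_network(expr)
degree, weighted = hub_scores(edges)
print(degree)
```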
Philosophy: This analysis emphasizes discovery over confirmation. Rather than testing known pathways, we use unsupervised methods to identify novel gene relationships that standard analyses miss.
Multi-Evidence Scoring System
Network
40% weight: Network centrality (hub score, connections, betweenness)
Reversal
30% weight: Shows senescence reversal pattern
Stability
20% weight: Cluster membership reliability across bootstrap runs
Statistical
10% weight: Statistical strength (effect sizes)
Interpretation
Priority Score: Higher scores suggest potential importance, not experimental success
Score Range: All scores normalized to 0-1 scale
Limitations:
- Weights are arbitrary
- No validation for predictive accuracy
- Experimental validation required
⚠️ Important: These weights are analytical choices, not validated standards. Different research priorities may justify different weight combinations.
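The composite score described above is a weighted sum. A minimal sketch, assuming each evidence component has already been normalized to the 0-1 range (the normalization method is not specified here) and using made-up component values:

```python
WEIGHTS = {"network": 0.40, "reversal": 0.30, "stability": 0.20, "statistical": 0.10}

def priority_score(evidence, weights=WEIGHTS):
    """Weighted sum of evidence components, each expected on a 0-1 scale."""
    for name in weights:
        value = evidence[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {value}")
    return sum(weights[name] * evidence[name] for name in weights)

# Hypothetical gene: strong hub, moderate reversal, stable membership, weak effect size.
example = {"network": 0.9, "reversal": 0.5, "stability": 0.8, "statistical": 0.2}
score = priority_score(example)
print(round(score, 2))  # 0.69
```

Passing a different `weights` dict is enough to re-rank genes under another set of research priorities, which makes weight sensitivity easy to check.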
Clustering Methodology
K-means Clustering
Why k-means? K-means partitions genes into a predefined number of clusters based on expression similarity. It's fast, interpretable, and works well when cluster count is known or can be estimated. For this analysis, 20 clusters were chosen to balance granularity and interpretability.
Machine Learning Classification
In addition to clustering, we apply supervised ML to classify genes based on their senescence marker strength and NANOG response characteristics.
Gene Type Classification
Type A (Reversing Markers - Therapeutic Success): Genes that are both strong senescence markers (top 10% by F-statistic) and strong NANOG responders (top 15%), with the NANOG response opposite in direction to the senescence change. These are the genes NANOG successfully reverses.
Type B (Off-Target Responders): Strong NANOG responders (top 15%) that are not strong senescence markers. NANOG changes their expression, but they are not core senescence genes, so they indicate potential off-target effects.
Type C (Resistant Markers - Therapeutic Targets): Strong senescence markers (top 10% by F-statistic) that either are not strong NANOG responders or respond in the same direction as the senescence change. These senescence genes resist NANOG treatment and are candidate targets for combination therapy.
Other: Does not meet Type A/B/C classification criteria.
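The A/B/C rules reduce to a small decision function. The sketch below assumes each gene carries a rank fraction for marker strength and responder strength (0.0 = strongest, so "top 10%" means a rank of at most 0.10) plus signed log fold changes for senescence and treatment. These input names and the rank convention are assumptions for illustration.

```python
def classify_gene(marker_rank, responder_rank, senescence_lfc, treatment_lfc,
                  marker_top=0.10, responder_top=0.15):
    """Assign Type A, B, C, or Other from rank fractions and fold-change signs."""
    strong_marker = marker_rank <= marker_top
    strong_responder = responder_rank <= responder_top
    opposite = senescence_lfc * treatment_lfc < 0  # treatment moves against senescence

    if strong_marker and strong_responder and opposite:
        return "A"  # reversing marker
    if strong_marker:
        return "C"  # marker that does not respond, or responds in the same direction
    if strong_responder:
        return "B"  # responder that is not a strong marker
    return "Other"

print(classify_gene(0.05, 0.10, senescence_lfc=2.0, treatment_lfc=-1.5))  # A
print(classify_gene(0.50, 0.10, senescence_lfc=0.1, treatment_lfc=-1.5))  # B
print(classify_gene(0.05, 0.10, senescence_lfc=2.0, treatment_lfc=1.0))   # C
```

Checking Type A first matters: a strong marker that responds strongly but in the same direction falls through to Type C, as the definition requires.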
Response Kinetics
Early: Significant response within 5 days (>15% reversal)
Late: Requires sustained exposure, peaks at 15 days (>20% reversal)
Plateau: Peak response at intermediate timepoint (10 days)
Non-responder: No significant temporal response pattern
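One possible reading of these kinetics rules is sketched below. It assumes "reversal" is the fraction of the young-to-senescent shift that treatment undoes, on log2 expression values. The rule precedence (Early checked first), the 15% minimum applied to the Plateau class, and the absence of any significance test are assumptions of this sketch, not documented behavior of the analysis.

```python
def reversal_fraction(young, senescent, treated):
    """Fraction of the senescence shift undone by treatment (1.0 = back to young)."""
    shift = senescent - young
    if shift == 0:
        return 0.0
    return (senescent - treated) / shift

def classify_kinetics(young, senescent, sn5, sn10, sn15,
                      early_min=0.15, late_min=0.20, plateau_min=0.15):
    """Label a gene Early, Late, Plateau, or Non-responder from three timepoints."""
    r5, r10, r15 = (reversal_fraction(young, senescent, t) for t in (sn5, sn10, sn15))
    if r5 > early_min:
        return "Early"
    if r15 > late_min and r15 >= max(r5, r10):
        return "Late"
    if r10 > plateau_min and r10 > max(r5, r15):
        return "Plateau"
    return "Non-responder"

# Hypothetical log2 expression: young = 2.0, senescent = 6.0.
print(classify_kinetics(2.0, 6.0, sn5=5.0, sn10=4.5, sn15=4.0))  # Early
print(classify_kinetics(2.0, 6.0, sn5=5.8, sn10=5.4, sn15=4.0))  # Late
print(classify_kinetics(2.0, 6.0, sn5=5.8, sn10=4.8, sn15=5.6))  # Plateau
```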
⚠️ Important: These ML classifications are exploratory and help prioritize genes for validation. They represent statistical patterns, not mechanistic proof. Wet-lab validation is required to confirm functional roles.
Co-Expression Network Analysis
Guilt by Association: Genes that co-express (correlate highly) often share biological functions. This "guilt by association" principle helps us infer functions for unknown genes based on their well-studied neighbors.
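As a small illustration of this principle, the sketch below tallies the annotations of a gene's network neighbors. It assumes edges are given as gene pairs and annotations as a dict of gene to term list. All gene names and terms are placeholders, not results from this analysis.

```python
from collections import Counter

def neighbor_annotations(gene, edges, annotations):
    """Count annotation terms among the direct network neighbors of a gene."""
    counts = Counter()
    for gene_a, gene_b in edges:
        if gene in (gene_a, gene_b):
            neighbor = gene_b if gene_a == gene else gene_a
            for term in annotations.get(neighbor, ()):
                counts[term] += 1
    return counts.most_common()

edges = [
    ("UNKNOWN1", "GENE_A"),
    ("UNKNOWN1", "GENE_B"),
    ("GENE_C", "UNKNOWN1"),
    ("GENE_A", "GENE_B"),
]
annotations = {
    "GENE_A": ["DNA repair"],
    "GENE_B": ["DNA repair", "autophagy"],
    "GENE_C": ["DNA repair"],
}
ranked_terms = neighbor_annotations("UNKNOWN1", edges, annotations)
print(ranked_terms)  # [('DNA repair', 3), ('autophagy', 1)]
```

The most frequent neighbor term is a hypothesis about the unannotated gene, not evidence of its function.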
Complete Analysis Pipeline
The first six steps, from Data Processing & Quality Control through Co-expression Network Analysis, are described under Analysis Pipeline Overview above. The remaining steps follow.
Statistical Validation of Reversal Patterns
Validated senescence reversal patterns using statistical tests. Genes showing opposite effects in senescence (S vs Y) and treatment (SN15 vs S) were identified
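The "opposite effects" criterion can be sketched as a sign check on two contrasts. The example assumes log2 expression replicates per condition and a minimum absolute log fold change of 0.5 in both contrasts. That cutoff is an illustrative assumption, and the statistical tests used in the analysis are not reproduced here.

```python
from statistics import mean

def reversal_call(young, senescent, treated, min_abs_lfc=0.5):
    """Return (is_reversed, senescence_lfc, treatment_lfc) from log2 replicates."""
    senescence_lfc = mean(senescent) - mean(young)   # S vs Y
    treatment_lfc = mean(treated) - mean(senescent)  # SN15 vs S
    opposite = senescence_lfc * treatment_lfc < 0
    strong = abs(senescence_lfc) >= min_abs_lfc and abs(treatment_lfc) >= min_abs_lfc
    return opposite and strong, senescence_lfc, treatment_lfc

# Hypothetical gene that rises in senescence and falls again after 15 days of treatment.
reversed_flag, sen_lfc, trt_lfc = reversal_call(
    young=[5.0, 5.2], senescent=[7.0, 7.2], treated=[5.8, 6.0]
)
print(reversed_flag, round(sen_lfc, 2), round(trt_lfc, 2))
```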
Pathway & GO Enrichment
Tested for over-representation of KEGG pathways and GO terms using Enrichr. Identified functional themes for each cluster
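Over-representation testing asks whether a cluster contains more pathway genes than chance would predict. The sketch below computes a generic one-sided hypergeometric p-value from four counts. It does not call Enrichr, whose own scoring and multiple-testing correction may differ, and the counts are made up.

```python
from math import comb

def hypergeom_pvalue(n_background, n_pathway, n_cluster, n_overlap):
    """P(overlap >= n_overlap) when n_cluster genes are drawn from n_background,
    of which n_pathway belong to the pathway."""
    total = comb(n_background, n_cluster)
    upper = min(n_pathway, n_cluster)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, 100 in the pathway,
# 200 in the cluster, 10 shared.
p_value = hypergeom_pvalue(20000, 100, 200, 10)
print(f"{p_value:.3g}")
```

With many pathways tested per cluster, raw p-values like this one would still need multiple-testing correction before interpretation.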
Hub Gene Deep Characterization
Analyzed top hub genes in each cluster. Examined expression patterns across conditions and used neighbor annotations for context
Multi-Evidence Gene Scoring
Integrated multiple evidence types (network topology, reversal patterns, stability, statistics) into composite priority scores
Cross-Cluster Integration
Identified genes that bridge multiple clusters. Ranked all genes across clusters by combined evidence
Temporal Dose-Response Analysis
Analyzed gene expression trajectories across treatment timepoints (SN5→SN10→SN15). Classified genes by response kinetics: early responder, late responder, progressive
Senescence Marker vs Responder Analysis
Distinguished senescence markers (S vs Y predictive) from NANOG responders (SN15 vs S predictive). Identified overlap genes that are both markers and responders
Gene Type Classification (A/B/C)
Classified genes by therapeutic response pattern: Type A (reversing markers - therapeutic successes where NANOG reverses senescence), Type B (off-target responders - NANOG-responsive but not senescence genes), Type C (resistant markers - senescence genes that resist NANOG treatment)
🎯 Analysis Philosophy
Discovery over confirmation: Rather than testing known pathways (GSEA), we use unsupervised learning to discover novel gene relationships in senescence reversal.
Multi-evidence integration: No single analysis is perfect. We combine network topology, biological patterns, statistical validation, ML classification, and temporal dynamics for robust gene prioritization.
Transparent about limitations: We document what worked and what failed. Computational predictions require wet-lab validation. Temporal patterns represent correlation, not causation.
Arbitrary choices acknowledged: Many bioinformatics "thresholds" are the researcher's choice (correlation r > 0.6, Type B top 15%, etc.). We document these transparently and encourage sensitivity analyses.
Temporal resolution matters: With 5/10/15 day timepoints, we capture dose-response dynamics but miss earlier events. Gene kinetics classifications are exploratory and should guide, not replace, time-course experiments.
