Analysis Methodology

Computational approaches for discovering rejuvenation mechanisms in human myoblasts

15 samples

4 timepoints

14,848 genes

Unsupervised ML approach

Overview

Comprehensive analysis of NANOG-mediated senescence reversal in human skeletal muscle myoblasts

Key Findings

NANOG treatment reverses key senescence hallmarks including genomic instability, loss of proteostasis, and mitochondrial dysfunction

Restoration of DNA damage response via upregulation of DNA repair proteins (BRCA2, SIRT6)

Recovery of heterochromatin marks via upregulation of histones

Reactivation of autophagy and mitochondrial energetics via AMPK upregulation

67.7% of genes are late responders, requiring sustained treatment for maximal effect

Reverses senescence without need for full reprogramming to pluripotent state

Experimental Design

Model System

SPECIES

Homo sapiens

CELL TYPE

Human skeletal muscle myoblasts

INTERVENTION

NANOG overexpression

Young Myoblasts

Early passage (baseline)

Y: 3 samples

Early passage myoblasts (young skeletal muscle progenitors)

Senescent

Late passage

S: 3 samples

Late passage myoblasts (replicative senescence)

NANOG Treatment

Temporal series

SN5: 3 samples • 5 days

Senescent myoblasts + NANOG overexpression for 5 days

SN10: 3 samples • 10 days

Senescent myoblasts + NANOG overexpression for 10 days

SN15: 3 samples • 15 days

Senescent myoblasts + NANOG overexpression for 15 days

Biological Context

Progenitor Type: Myogenic progenitors (satellite cell-derived myoblasts)

Regeneration Role: Critical for skeletal muscle regeneration

Senescence Impact: Senescence impedes muscle regeneration capacity

Key Finding: NANOG reverses senescence without full reprogramming

Intervention Details

Factor: NANOG

Method: Overexpression via lentiviral transduction

Goal: Reverse replicative senescence in muscle progenitors

Analysis Pipeline Overview

Data Processing & Quality Control

Loaded normalized counts, converted to log2 scale, filtered low-expression genes, calculated differential expression across age groups and treatment timepoints
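The log2 conversion and low-expression filter described in this step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the pseudocount of 1, the cutoff `min_log2`, the minimum sample count `min_samples`, and the gene names and count values are all assumptions made for the example.

```python
import math

def log2_transform(counts, pseudocount=1.0):
    """Convert normalized counts to log2 scale; the pseudocount avoids log2(0)."""
    return {gene: [math.log2(v + pseudocount) for v in values]
            for gene, values in counts.items()}

def filter_low_expression(log_expr, min_log2=1.0, min_samples=3):
    """Keep genes whose log2 expression reaches min_log2 in at least min_samples samples."""
    return {gene: values for gene, values in log_expr.items()
            if sum(v >= min_log2 for v in values) >= min_samples}

# Toy normalized counts for six samples (hypothetical values).
counts = {
    "GENE_A": [120.0, 98.0, 110.0, 15.0, 22.0, 18.0],
    "GENE_B": [0.0, 0.5, 0.0, 0.2, 0.0, 0.1],  # barely expressed, filtered out
    "GENE_C": [3.0, 7.0, 1.0, 0.0, 0.0, 3.0],
}

log_expr = log2_transform(counts)
filtered = filter_low_expression(log_expr)
print(sorted(filtered))
```

The cutoffs are the kind of parameter worth varying in a sensitivity analysis, since they decide how many genes enter the clustering step.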

UMAP Dimensionality Reduction & Clustering

Applied UMAP (3D) to reduce expression space, then k-means clustering (k=20) to identify gene modules with coordinated expression patterns across senescence and treatment

Biological Characterization of Clusters

Characterized each cluster by: senescence enrichment (S vs Y), treatment response patterns, dose-response kinetics, and reversal patterns

Cluster Selection for Deep Analysis

Selected 8 clusters (14, 6, 1, 9, 15, 4, 11, 7) based on biological relevance: senescence markers, treatment response, and reversal patterns

Cluster Stability Validation

Validated selected clusters through bootstrap iterations with different random seeds. Calculated stability metrics and identified core vs variable genes

Co-expression Network Analysis

Built co-expression networks (r > 0.6) within each cluster. Identified hub genes using degree, weighted degree, and betweenness centrality. Detected communities using Louvain algorithm

Philosophy: This analysis emphasizes discovery over confirmation. Rather than testing known pathways, we use unsupervised methods to identify novel gene relationships that standard analyses miss.

Multi-Evidence Scoring System

Combined Priority Score:

Priority = 0.4×Network + 0.3×Reversal + 0.2×Stability + 0.1×Statistical

Priority = weighted sum of evidence categories

🕸️

Network

40% weight

Network centrality (hub score, connections, betweenness)

Rationale: Highly connected genes are more likely functionally important

🔄

Reversal

30% weight

Shows senescence reversal pattern (opposite direction in S vs Y and in SN15 vs S)

Rationale: Genes whose senescence-associated changes are reversed by NANOG are the most relevant to rejuvenation

⚖️

Stability

20% weight

Cluster membership reliability across bootstrap runs

Rationale: Stable clustering indicates reliable co-expression patterns

📊

Statistical

10% weight

Statistical strength (effect sizes)

Rationale: Confidence in observed effects

Interpretation

Priority Score: Higher scores suggest potential importance, not experimental success

Score Range: All scores normalized to 0-1 scale

Limitations:

Weights are arbitrary
No validation for predictive accuracy
Experimental validation required

⚠️ Important: These weights are analytical choices, not validated standards. Different research priorities may justify different weight combinations.
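As a worked illustration of the combined priority score, the sketch below applies the stated weights (0.4, 0.3, 0.2, 0.1) to per-gene evidence values. Min-max scaling is an assumed way of bringing each category onto the 0-1 scale; the text does not say which normalization was used. The gene names and raw scores are hypothetical.

```python
WEIGHTS = {"network": 0.4, "reversal": 0.3, "stability": 0.2, "statistical": 0.1}

def min_max(values):
    """Scale a list of raw scores to the 0-1 range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def priority_scores(evidence):
    """evidence: {gene: {category: raw score}} -> {gene: weighted priority}."""
    genes = list(evidence)
    normalized = {gene: {} for gene in genes}
    for category in WEIGHTS:
        scaled = min_max([evidence[gene][category] for gene in genes])
        for gene, value in zip(genes, scaled):
            normalized[gene][category] = value
    return {gene: sum(WEIGHTS[c] * normalized[gene][c] for c in WEIGHTS)
            for gene in genes}

# Hypothetical raw evidence for three genes.
evidence = {
    "GENE_A": {"network": 10.0, "reversal": 1.0, "stability": 0.9, "statistical": 2.0},
    "GENE_B": {"network": 2.0, "reversal": 0.0, "stability": 0.5, "statistical": 0.5},
    "GENE_C": {"network": 6.0, "reversal": 0.5, "stability": 0.7, "statistical": 1.25},
}

scores = priority_scores(evidence)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Because the weights live in a single dictionary, re-ranking under a different weighting is a one-line change, which makes the sensitivity of the ranking to these analytical choices easy to check.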

Clustering Methodology

K-means Clustering

Algorithm

K-MEANS

Partitioning-based clustering

Distance Metric

Euclidean

On normalized expression values

Number of Clusters

k = 20

Pre-specified cluster count

Quality Metric

Silhouette Score

Measures cluster separation

Why k-means? K-means partitions genes into a predefined number of clusters based on expression similarity. It's fast, interpretable, and works well when cluster count is known or can be estimated. For this analysis, 20 clusters were chosen to balance granularity and interpretability.
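To make the partitioning and the silhouette metric concrete, here is a small pure-Python sketch of Lloyd's k-means with Euclidean distance, followed by a mean silhouette score. It is an illustration under assumptions, not the analysis code: the farthest-first seeding is chosen only to make the example deterministic, the six 3D points stand in for UMAP coordinates of genes, and k is 2 rather than 20 to keep the toy data readable.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (assumed choice): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns one label per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        def mean_dist(cluster):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == cluster and j != i]
            return sum(d) / len(d) if d else None
        a = mean_dist(labels[i])  # mean distance within own cluster
        if a is None:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 3D embedding coordinates for six genes, in two obvious groups.
points = [(0.0, 0.1, 0.2), (0.2, 0.0, 0.1), (0.1, 0.2, 0.0),
          (10.0, 10.1, 9.9), (9.9, 10.0, 10.2), (10.1, 9.8, 10.0)]

labels = kmeans(points, k=2)
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

A silhouette near 1 indicates well-separated clusters, while values near 0 indicate overlapping ones. Comparing the score across candidate values of k is one common way to sanity-check a pre-specified cluster count.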

Machine Learning Classification

In addition to clustering, we apply supervised ML to classify genes based on their senescence marker strength and NANOG response characteristics.

Gene Type Classification

Type A (Reversing Markers - Therapeutic Success): Strong senescence markers (top 10% F-statistic) + strong NANOG responders (top 15%) with opposite direction. These are genes NANOG successfully reverses.

Type B (Off-Target Responders): Strong NANOG responders (top 15%) but NOT strong senescence markers. NANOG changes these but they aren't core senescence genes - potential off-target effects.

Type C (Resistant Markers - Therapeutic Targets): Strong senescence markers (top 10% F-statistic) that are either not strong NANOG responders or respond in the wrong direction (the same direction as the senescence change). These senescence genes resist NANOG treatment and are candidate targets for combination therapy.

Other: Does not meet Type A/B/C classification criteria.
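The Type A/B/C rules above can be expressed as a short rule-based classifier. The sketch assumes each gene carries a senescence F-statistic, a senescence log2 fold change (S vs Y), and a NANOG log2 fold change (SN15 vs S), and that response strength is the absolute NANOG fold change; these input definitions, the function names, and all values are hypothetical.

```python
def top_fraction(scores, fraction):
    """Return the set of genes ranked in the top `fraction` by score."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])

def classify_gene_types(genes, marker_frac=0.10, responder_frac=0.15):
    """genes: {name: (f_stat, senescence_lfc, nanog_lfc)} -> {name: type}."""
    markers = top_fraction({g: v[0] for g, v in genes.items()}, marker_frac)
    responders = top_fraction({g: abs(v[2]) for g, v in genes.items()},
                              responder_frac)
    types = {}
    for gene, (_, sen_lfc, nanog_lfc) in genes.items():
        opposite = sen_lfc * nanog_lfc < 0  # NANOG moves the gene back
        if gene in markers and gene in responders and opposite:
            types[gene] = "A"      # reversing marker
        elif gene in markers:
            types[gene] = "C"      # resistant marker (weak or wrong direction)
        elif gene in responders:
            types[gene] = "B"      # responder that is not a strong marker
        else:
            types[gene] = "Other"
    return types

# 34 hypothetical background genes plus six illustrative cases (40 total,
# so the top 10% is 4 genes and the top 15% is 6 genes).
genes = {f"BG{i}": (1.0 + 0.1 * i, 0.2, -(0.05 + 0.01 * i)) for i in range(34)}
genes.update({
    "REV1": (50.0, 2.0, -1.8),   # marker, strong opposite response
    "REV2": (48.0, -1.5, 1.6),   # marker, strong opposite response
    "RES1": (45.0, 1.5, -0.01),  # marker, no real response
    "RES2": (44.0, 1.2, 1.4),    # marker, strong response in the same direction
    "OFF1": (2.0, 0.1, 1.5),     # strong response, weak marker
    "OFF2": (1.5, -0.1, -1.3),   # strong response, weak marker
})

types = classify_gene_types(genes)
for name in ("REV1", "REV2", "RES1", "RES2", "OFF1", "OFF2", "BG0"):
    print(name, types[name])
```

Because the cutoffs are percentile-based, the number of genes in each type depends on the size of the gene list, and some gene always fills the last responder slot even when its response is modest.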

Response Kinetics

Early: Significant response within 5 days (>15% reversal)

Late: Requires sustained exposure, peaks at 15 days (>20% reversal)

Plateau: Peak response at intermediate timepoint (10 days)

Non-responder: No significant temporal response pattern
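One way to turn the four kinetics categories into code is sketched below. Several pieces are assumptions rather than statements from the analysis: reversal is measured as the fraction of the senescence-associated change (S minus Y) undone at each timepoint, the category is decided by where that fraction peaks, and the Plateau class reuses a 15% minimum because no threshold is given for it. Expression values are hypothetical group means on a log2 scale.

```python
def reversal_fraction(young, senescent, treated):
    """Fraction of the senescence-associated change (S - Y) undone by treatment."""
    gap = senescent - young
    if gap == 0:
        return 0.0
    return (senescent - treated) / gap

def classify_kinetics(young, senescent, sn5, sn10, sn15,
                      early_min=0.15, late_min=0.20, plateau_min=0.15):
    """Assign a kinetics class from the timepoint with the largest reversal."""
    reversal = {day: reversal_fraction(young, senescent, value)
                for day, value in ((5, sn5), (10, sn10), (15, sn15))}
    peak_day = max(reversal, key=reversal.get)
    peak = reversal[peak_day]
    if peak_day == 5 and peak > early_min:
        return "Early"
    if peak_day == 10 and peak > plateau_min:  # threshold assumed, see lead-in
        return "Plateau"
    if peak_day == 15 and peak > late_min:
        return "Late"
    return "Non-responder"

# Hypothetical genes, all with Y = 2.0 and S = 6.0.
examples = {
    "early_gene": (2.0, 6.0, 4.0, 4.4, 4.8),
    "late_gene": (2.0, 6.0, 5.8, 5.2, 3.6),
    "plateau_gene": (2.0, 6.0, 5.6, 4.0, 4.8),
    "flat_gene": (2.0, 6.0, 6.0, 5.9, 5.8),
}

results = {name: classify_kinetics(*values) for name, values in examples.items()}
print(results)
```

With only three treatment timepoints, the peak location is sensitive to noise in any single condition, so replicate-level variation should be considered before reading much into a gene's class.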

⚠️ Important: These ML classifications are exploratory and help prioritize genes for validation. They represent statistical patterns, not mechanistic proof. Wet-lab validation is required to confirm functional roles.

Co-Expression Network Analysis

Correlation Threshold

|r| > 0.6

Pearson correlation

Network Metrics

Degree, Betweenness, Hub Score

Centrality measures

Community Detection

Louvain Algorithm

Sub-modules within clusters

Guilt by Association: Genes that co-express (correlate highly) often share biological functions. This "guilt by association" principle helps us infer functions for unknown genes based on their well-studied neighbors.
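The network construction can be illustrated with a small sketch: compute pairwise Pearson correlations, keep pairs with |r| > 0.6 as edges, and count each gene's degree and weighted degree. Gene names and expression values are hypothetical, and weighting edges by |r| is an assumption. Betweenness centrality and Louvain community detection are not shown; in practice they would come from a graph library.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as none
    return cov / (sd_x * sd_y)

def build_network(expr, threshold=0.6):
    """Return {(gene1, gene2): r} for every pair with |r| above the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) > threshold:
            edges[(g1, g2)] = r
    return edges

def degree_centrality(edges):
    """Degree = number of edges; weighted degree = sum of |r| (assumed weighting)."""
    degree, weighted = {}, {}
    for (g1, g2), r in edges.items():
        for gene in (g1, g2):
            degree[gene] = degree.get(gene, 0) + 1
            weighted[gene] = weighted.get(gene, 0.0) + abs(r)
    return degree, weighted

# Hypothetical profiles across five samples.
expr = {
    "GENE_A": [6, 4, 5, 5, 5],
    "GENE_B": [7, 7, 8, 6, 7],
    "GENE_HUB": [9, 7, 9, 7, 8],   # tracks both GENE_A and GENE_B
    "GENE_LONER": [5, 5, 5, 5, 9],  # uncorrelated with the rest
}

edges = build_network(expr)
degree, weighted = degree_centrality(edges)
print(sorted(edges))
print(degree)
```

In this toy network GENE_HUB connects two genes that do not correlate with each other, which is the pattern hub metrics are meant to surface. With 15 samples, individual correlations carry wide uncertainty, so the threshold is best treated as a tunable choice.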

Complete Analysis Pipeline

Data Processing & Quality Control

Loaded normalized counts, converted to log2 scale, filtered low-expression genes, calculated differential expression across age groups and treatment timepoints

✓ completed

UMAP Dimensionality Reduction & Clustering

Applied UMAP (3D) to reduce expression space, then k-means clustering (k=20) to identify gene modules with coordinated expression patterns across senescence and treatment

✓ completed

Biological Characterization of Clusters

Characterized each cluster by: senescence enrichment (S vs Y), treatment response patterns, dose-response kinetics, and reversal patterns

✓ completed

Cluster Selection for Deep Analysis

Selected 8 clusters (14, 6, 1, 9, 15, 4, 11, 7) based on biological relevance: senescence markers, treatment response, and reversal patterns

✓ completed

Cluster Stability Validation

Validated selected clusters through bootstrap iterations with different random seeds. Calculated stability metrics and identified core vs variable genes

✓ completed
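A possible way to compute per-gene stability from repeated clustering runs is sketched here. It assumes the cluster labels of each run are already available, matches the reference cluster to its best-overlapping cluster in every run by Jaccard index (since label numbers are arbitrary between runs), and calls a gene "core" if it stays with that cluster in at least 80% of runs. The matching rule, the 80% cutoff, and the data are assumptions for illustration.

```python
def best_match(reference, run_labels):
    """Return the gene set of the run's cluster that best overlaps the reference."""
    clusters = {}
    for gene, label in run_labels.items():
        clusters.setdefault(label, set()).add(gene)

    def jaccard(members):
        return len(reference & members) / len(reference | members)

    return max(clusters.values(), key=jaccard)

def membership_stability(reference, runs):
    """Fraction of runs in which each reference gene stays with its cluster."""
    hits = {gene: 0 for gene in reference}
    for run_labels in runs:
        matched = best_match(reference, run_labels)
        for gene in reference:
            if gene in matched:
                hits[gene] += 1
    return {gene: hits[gene] / len(runs) for gene in reference}

# Reference cluster and five hypothetical re-clustering runs (gene -> label).
reference = {"G1", "G2", "G3", "G4"}
runs = [
    {"G1": 0, "G2": 0, "G3": 0, "G4": 0, "G5": 1, "G6": 1},
    {"G1": 2, "G2": 2, "G3": 2, "G4": 0, "G5": 0, "G6": 0},
    {"G1": 1, "G2": 1, "G3": 1, "G4": 1, "G5": 0, "G6": 0},
    {"G1": 0, "G2": 0, "G3": 0, "G4": 1, "G5": 1, "G6": 1},
    {"G1": 0, "G2": 0, "G3": 0, "G4": 0, "G5": 1, "G6": 1},
]

stability = membership_stability(reference, runs)
core = sorted(g for g, s in stability.items() if s >= 0.8)      # assumed cutoff
variable = sorted(g for g, s in stability.items() if s < 0.8)
print(core, variable)
```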

Co-expression Network Analysis

Built co-expression networks (r > 0.6) within each cluster. Identified hub genes using degree, weighted degree, and betweenness centrality. Detected communities using Louvain algorithm

Key findings: Top hubs coordinate cluster-specific biological processes; Multiple sub-communities per cluster reveal functional organization; High external connectivity indicates inter-cluster coordination

✓ completed

Statistical Validation of Reversal Patterns

Validated senescence reversal patterns using statistical tests. Genes showing opposite effects in senescence (S vs Y) and treatment (SN15 vs S) were identified

✓ completed
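The reversal criterion (opposite effects in S vs Y and in SN15 vs S) can be sketched as a sign check on group-mean differences. The minimum effect size of 0.5 log2 units is an assumed stand-in for the statistical tests, which are not specified here, and the genes and values are hypothetical, with three replicates per group to mirror the design.

```python
from statistics import mean

def reversal_genes(expr, min_effect=0.5):
    """Return genes whose senescence effect and treatment effect oppose each other.

    expr: {gene: {"Y": [...], "S": [...], "SN15": [...]}} on a log2 scale.
    """
    result = {}
    for gene, groups in expr.items():
        senescence_effect = mean(groups["S"]) - mean(groups["Y"])
        treatment_effect = mean(groups["SN15"]) - mean(groups["S"])
        opposite = senescence_effect * treatment_effect < 0
        large_enough = (abs(senescence_effect) >= min_effect
                        and abs(treatment_effect) >= min_effect)
        if opposite and large_enough:
            result[gene] = (senescence_effect, treatment_effect)
    return result

# Hypothetical log2 expression values.
expr = {
    "UP_REVERSED": {"Y": [4.0, 4.2, 4.1], "S": [6.0, 6.1, 5.9],
                    "SN15": [4.8, 5.0, 4.9]},
    "DOWN_REVERSED": {"Y": [7.0, 7.2, 7.1], "S": [5.0, 5.1, 4.9],
                      "SN15": [6.2, 6.0, 6.1]},
    "NOT_REVERSED": {"Y": [3.0, 3.1, 2.9], "S": [5.0, 5.2, 5.1],
                     "SN15": [5.6, 5.8, 5.7]},
    "UNCHANGED": {"Y": [5.0, 5.0, 5.1], "S": [5.1, 5.0, 5.0],
                  "SN15": [5.0, 5.1, 5.0]},
}

reversed_genes = reversal_genes(expr)
print(sorted(reversed_genes))
```

A full analysis would replace the fixed cutoff with a proper test and multiple-testing correction; the sign logic stays the same.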

Pathway & GO Enrichment

Tested for over-representation of KEGG pathways and GO terms using Enrichr. Identified functional themes for each cluster

✓ completed
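Over-representation testing asks whether a cluster contains more genes from a pathway or GO term than expected by chance. The sketch below computes a one-sided hypergeometric p-value, a standard form of this test. It is a stand-in for illustration and does not reproduce Enrichr's API or scoring. The cluster size, term size, and overlap are hypothetical, and using all 14,848 genes as the background is an assumption.

```python
from math import comb

def hypergeom_pvalue(overlap, cluster_size, term_size, background_size):
    """P(X >= overlap) when cluster_size genes are drawn without replacement
    from a background that contains term_size annotated genes."""
    max_overlap = min(cluster_size, term_size)
    total = comb(background_size, cluster_size)
    tail = sum(comb(term_size, k)
               * comb(background_size - term_size, cluster_size - k)
               for k in range(overlap, max_overlap + 1))
    return tail / total

# Hypothetical case: a 200-gene cluster shares 10 genes with a 150-gene term.
background_size = 14848
p_value = hypergeom_pvalue(overlap=10, cluster_size=200, term_size=150,
                           background_size=background_size)
expected_overlap = 200 * 150 / background_size
print(f"expected overlap: {expected_overlap:.2f}, p-value: {p_value:.3g}")
```

Because many terms are tested per cluster, raw p-values from a test like this need multiple-testing correction before being interpreted.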

Hub Gene Deep Characterization

Analyzed top hub genes in each cluster. Examined expression patterns across conditions, neighbor annotations for context

✓ completed

Multi-Evidence Gene Scoring

Integrated multiple evidence types (network topology, reversal patterns, stability, statistics) into composite priority scores

✓ completed

Cross-Cluster Integration

Identified genes that bridge multiple clusters. Ranked all genes across clusters by combined evidence

✓ completed

Temporal Dose-Response Analysis

Analyzed gene expression trajectories across treatment timepoints (SN5→SN10→SN15). Classified genes by response kinetics: early responder, late responder, progressive

Key findings: Genes reaching maximum change at SN5; Genes requiring sustained treatment (peak at SN15); Monotonic improvement over time; 67.7% classified as late responders

✓ completed

Senescence Marker vs Responder Analysis

Distinguished senescence markers (S vs Y predictive) from NANOG responders (SN15 vs S predictive). Identified overlap genes that are both markers and responders

Key findings: Genes defining senescent state; Genes responding to treatment; Genes that are both markers and treatment targets

✓ completed

Gene Type Classification (A/B/C)

Classified genes by therapeutic response pattern: Type A (reversing markers - therapeutic successes where NANOG reverses senescence), Type B (off-target responders - NANOG-responsive but not senescence genes), Type C (resistant markers - senescence genes that resist NANOG treatment)

✓ completed

🎯 Analysis Philosophy

Discovery over confirmation: Rather than testing known pathways (GSEA), we use unsupervised learning to discover novel gene relationships in senescence reversal.

Multi-evidence integration: No single analysis is perfect. We combine network topology, biological patterns, statistical validation, ML classification, and temporal dynamics for robust gene prioritization.

Transparent about limitations: We document what worked and what failed. Computational predictions require wet-lab validation. Temporal patterns represent correlation, not causation.

Arbitrary choices acknowledged: Many bioinformatics "thresholds" are the researcher's choice (correlation r > 0.6, Type B top 15%, etc.). We document these transparently and encourage sensitivity analyses.

Temporal resolution matters: With 5/10/15 day timepoints, we capture dose-response dynamics but miss earlier events. Gene kinetics classifications are exploratory and should guide, not replace, time-course experiments.