Analysis Methodology

Computational approaches for discovering rejuvenation mechanisms in human myoblasts

Samples: 15
Timepoints: 4
Genes: 14,848
Approach: Unsupervised ML

Overview

Comprehensive analysis of NANOG-mediated senescence reversal in human skeletal muscle myoblasts

Key Findings

NANOG treatment reverses key senescence hallmarks, including genomic instability, loss of proteostasis, and mitochondrial dysfunction

Restoration of the DNA damage response via upregulation of DNA repair genes (BRCA2, SIRT6)

Recovery of heterochromatin marks via upregulation of histones

Reactivation of autophagy and mitochondrial energetics via AMPK upregulation

67.7% of genes are classified as late responders, requiring sustained treatment (peak response at 15 days) for maximal effect

NANOG reverses senescence without requiring full reprogramming to a pluripotent state

Experimental Design

Model System

Species: Homo sapiens
Cell type: Human skeletal muscle myoblasts
Intervention: NANOG overexpression

Young Myoblasts (early passage, baseline)
Y: 3 samples
Early passage myoblasts (young skeletal muscle progenitors)

Senescent Myoblasts (late passage)
S: 3 samples
Late passage myoblasts (replicative senescence)

NANOG Treatment (temporal series)
SN5: 3 samples • 5 days
Senescent myoblasts + NANOG overexpression for 5 days
SN10: 3 samples • 10 days
Senescent myoblasts + NANOG overexpression for 10 days
SN15: 3 samples • 15 days
Senescent myoblasts + NANOG overexpression for 15 days

Biological Context

Progenitor Type: Myogenic progenitors (satellite cell-derived myoblasts)

Regeneration Role: Critical for skeletal muscle regeneration

Senescence Impact: Senescence impedes muscle regeneration capacity

Key Finding: NANOG reverses senescence without full reprogramming

Intervention Details

Factor: NANOG

Method: Overexpression via lentiviral transduction

Goal: Reverse replicative senescence in muscle progenitors

Philosophy: This analysis emphasizes discovery over confirmation. Rather than testing known pathways, we use unsupervised methods to identify novel gene relationships that standard, hypothesis-driven analyses may miss.

Multi-Evidence Scoring System

Combined Priority Score:
Priority = 0.4×Network + 0.3×Reversal + 0.2×Stability + 0.1×Statistical
(a weighted sum of the four evidence categories below)

Network (40% weight)
Network centrality (hub score, connections, betweenness)
Rationale: Highly connected genes are more likely to be functionally important.
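
Reversal (30% weight)
Shows a senescence reversal pattern (opposite effects in senescence, S vs Y, and treatment, SN15 vs S)
Rationale: Genes whose senescence-associated changes are reversed by NANOG are candidate mediators of rejuvenation.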

Stability (20% weight)
Cluster membership reliability across bootstrap runs
Rationale: Stable clustering indicates reliable co-expression patterns.

Statistical (10% weight)
Statistical strength (effect sizes)
Rationale: Confidence in observed effects.

Interpretation

Priority Score: Higher scores suggest potential importance; they do not predict experimental success.

Score Range: All scores are normalized to a 0-1 scale.

Limitations:
  • Weights are arbitrary
  • Scores have not been validated for predictive accuracy
  • Experimental validation is required

⚠️ Important: These weights are analytical choices, not validated standards. Different research priorities may justify different weight combinations.
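
A minimal sketch of the combined priority score, assuming each evidence column is min-max normalized to 0-1 before weighting. The normalization method, the function names, and the toy values are assumptions for illustration; the text states only the weights and that scores lie on a 0-1 scale.

```python
import numpy as np

# Weights from the Combined Priority Score formula.
WEIGHTS = {"network": 0.4, "reversal": 0.3, "stability": 0.2, "statistical": 0.1}


def min_max(values):
    """Rescale a 1-D array to [0, 1]; a constant column maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def priority_scores(evidence):
    """Weighted sum of normalized evidence columns (dict: name -> per-gene array).

    Because the weights sum to 1, the result also lies in [0, 1].
    """
    return sum(w * min_max(evidence[name]) for name, w in WEIGHTS.items())


# Three hypothetical genes with made-up evidence values.
scores = priority_scores({
    "network": [10, 2, 5],            # e.g. hub degree
    "reversal": [0.9, 0.1, 0.5],      # e.g. fraction of senescence change reversed
    "stability": [1.0, 0.6, 0.8],     # e.g. bootstrap co-membership frequency
    "statistical": [4.0, 1.0, 2.5],   # e.g. |log2 fold change|
})
print(np.round(scores, 3))
```

Min-max scaling is only one option; rank-based scaling would be less sensitive to a single extreme hub, which is another analytical choice worth a sensitivity check.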

Clustering Methodology

K-means Clustering

Algorithm: k-means (partitioning-based clustering)
Distance metric: Euclidean (on normalized expression values)
Number of clusters: 20 (pre-specified cluster count)
Quality metric: Silhouette score (measures cluster separation)

Why k-means? K-means partitions genes into a predefined number of clusters based on expression similarity. It's fast, interpretable, and works well when cluster count is known or can be estimated. For this analysis, 20 clusters were chosen to balance granularity and interpretability.
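
A NumPy sketch of the clustering idea: Lloyd's k-means with Euclidean distance and k-means++ seeding, plus the mean silhouette score, s = (b - a) / max(a, b). It assumes a genes × features matrix as input (for example, the 3-D UMAP coordinates from pipeline step 2); it is not necessarily the implementation used in the analysis, and the function names are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick spread-out initial centroids by D^2 sampling."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, seed=0, n_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def mean_silhouette(X, labels):
    """Mean silhouette over all points (needs >= 2 clusters; O(n^2) memory)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Two well-separated toy blobs in 3-D.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 3)), data_rng.normal(5.0, 0.1, (20, 3))])
labels, _ = kmeans(X, k=2)
print(mean_silhouette(X, labels))  # close to 1 when the two blobs are recovered
```

For roughly 15,000 genes the full distance matrix becomes large, so computing the silhouette on a random subsample of genes is a common compromise.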

Machine Learning Classification

In addition to clustering, we apply supervised ML to classify genes based on their senescence marker strength and NANOG response characteristics.

Gene Type Classification

Type A (Reversing Markers - Therapeutic Success): Strong senescence markers (top 10% F-statistic) + strong NANOG responders (top 15%) with opposite direction. These are genes NANOG successfully reverses.

Type B (Off-Target Responders): Strong NANOG responders (top 15%) but NOT strong senescence markers. NANOG changes these but they aren't core senescence genes - potential off-target effects.

Type C (Resistant Markers - Therapeutic Targets): Strong senescence markers (top 10% F-statistic) but NOT strong NANOG responders OR wrong direction. Senescence genes that resist NANOG treatment - combination therapy targets.

Other: Does not meet Type A/B/C classification criteria.
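
The A/B/C rules can be expressed as a small function. This sketch assumes per-gene inputs the text does not fully specify: a senescence F-statistic (S vs Y), signed log2 fold changes for S vs Y and SN15 vs S, and the absolute SN15 vs S log2 fold change as the "NANOG response strength" (an assumption). The percentile cut-offs follow the text; the function name and toy data are hypothetical.

```python
import numpy as np


def classify_gene_types(f_senescence, lfc_sen, lfc_nanog):
    """Assign 'A', 'B', 'C' or 'Other' to each gene.

    f_senescence: F-statistic for S vs Y (senescence marker strength)
    lfc_sen:      signed log2 fold change, S vs Y
    lfc_nanog:    signed log2 fold change, SN15 vs S; |lfc_nanog| is used as
                  the NANOG response strength (assumed metric)
    """
    f_senescence = np.asarray(f_senescence, float)
    lfc_sen = np.asarray(lfc_sen, float)
    lfc_nanog = np.asarray(lfc_nanog, float)

    strong_marker = f_senescence >= np.percentile(f_senescence, 90)               # top 10%
    strong_responder = np.abs(lfc_nanog) >= np.percentile(np.abs(lfc_nanog), 85)  # top 15%
    opposite = lfc_sen * lfc_nanog < 0  # NANOG moves expression back toward young

    labels = np.full(len(f_senescence), "Other", dtype=object)
    labels[strong_responder & ~strong_marker] = "B"                # off-target responders
    labels[strong_marker & ~(strong_responder & opposite)] = "C"   # resistant markers
    labels[strong_marker & strong_responder & opposite] = "A"      # reversing markers
    return labels


# Toy data: genes 18 and 19 are top-10% markers; genes 0, 18, 19 are top-15% responders.
f = np.arange(20, dtype=float)
lfc_sen = np.ones(20)
lfc_nanog = np.full(20, 0.1)
lfc_nanog[[0, 18, 19]] = [2.0, 2.0, -2.0]
print(classify_gene_types(f, lfc_sen, lfc_nanog)[[0, 18, 19]])
```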

Response Kinetics

Early: Significant response within 5 days (>15% reversal)

Late: Requires sustained exposure, peaks at 15 days (>20% reversal)

Plateau: Peak response at intermediate timepoint (10 days)

Non-responder: No significant temporal response pattern
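
One way to operationalize these kinetics classes, sketched under stated assumptions: "reversal" at day t is taken as the fraction of the senescence shift that has been undone, (S - SN_t) / (S - Y), computed on mean log2 expression. The rule order and the reuse of the 15% threshold for the plateau class are hypothetical choices; the text gives thresholds only for the early and late classes.

```python
import numpy as np

DAYS = (5, 10, 15)


def reversal_fractions(y, s, sn):
    """Fraction of the senescence shift (S - Y) undone at SN5, SN10, SN15.

    y, s: mean log2 expression in young and senescent samples
    sn:   mean log2 expression at SN5, SN10, SN15
    Returns zeros when S equals Y (no shift to reverse).
    """
    gap = s - y
    if gap == 0:
        return np.zeros(len(sn))
    return (s - np.asarray(sn, float)) / gap


def classify_kinetics(y, s, sn, early_min=0.15, late_min=0.20):
    """Label a gene by the timepoint of peak reversal (hypothetical rule order)."""
    r = reversal_fractions(y, s, sn)
    peak_day = DAYS[int(np.argmax(r))]
    if peak_day == 5 and r[0] > early_min:
        return "early"
    if peak_day == 10 and r[1] > early_min:  # assumed threshold for plateau
        return "plateau"
    if peak_day == 15 and r[2] > late_min:
        return "late"
    return "non-responder"


# Senescence raises a gene from 5.0 to 7.0 (log2); NANOG lowers it steadily.
print(classify_kinetics(5.0, 7.0, [6.5, 6.0, 5.5]))  # -> late
```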

⚠️ Important: These ML classifications are exploratory and help prioritize genes for validation. They represent statistical patterns, not mechanistic proof. Wet-lab validation is required to confirm functional roles.

Co-Expression Network Analysis

Correlation threshold: |r| > 0.6 (Pearson correlation)
Network metrics: Degree, betweenness, hub score (centrality measures)
Community detection: Louvain algorithm (sub-modules within clusters)

Guilt by Association: Genes that co-express (correlate highly) often share biological functions. This "guilt by association" principle helps us infer functions for unknown genes based on their well-studied neighbors.
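
A toy illustration of guilt-by-association annotation: an unannotated gene inherits the most common label among annotated genes it correlates with above |r| > 0.6. The majority-vote rule, gene names, and labels are illustrative assumptions, not the analysis's actual procedure.

```python
from collections import Counter

import numpy as np


def infer_annotations(expr, genes, annotations, r_min=0.6):
    """Majority-vote annotation from co-expressed, annotated neighbours.

    expr:        genes x samples array of log2 expression
    genes:       gene names, one per row of expr
    annotations: dict gene -> functional label for already-annotated genes
    """
    corr = np.corrcoef(expr)  # Pearson correlation between rows (genes)
    predictions = {}
    for i, gene in enumerate(genes):
        if gene in annotations:
            continue
        neighbours = [genes[j] for j in range(len(genes))
                      if j != i and abs(corr[i, j]) > r_min and genes[j] in annotations]
        if neighbours:
            votes = Counter(annotations[n] for n in neighbours)
            predictions[gene] = votes.most_common(1)[0][0]
    return predictions


genes = ["G1", "G2", "G3", "X"]  # hypothetical genes; X is unannotated
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
])
known = {"G1": "autophagy", "G2": "autophagy", "G3": "cell cycle"}
print(infer_annotations(expr, genes, known))
```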

Complete Analysis Pipeline

1

Data Processing & Quality Control

Loaded normalized counts, converted to log2 scale, filtered low-expression genes, calculated differential expression across age groups and treatment timepoints

✓ completed
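
A minimal sketch of step 1, assuming a genes × samples matrix of normalized counts and sample labels such as Y, S, SN5, SN10, SN15. The pseudocount of 1 and the filter (mean log2 expression of at least 1 in some group) are hypothetical, since the text does not state the thresholds. Differential expression is shown as a group-mean log2 fold change with a Welch t-statistic, one of several possible tests.

```python
import numpy as np


def log2_transform(counts, pseudocount=1.0):
    """log2(normalized count + pseudocount); the pseudocount avoids log2(0)."""
    return np.log2(np.asarray(counts, float) + pseudocount)


def filter_low_expression(logx, groups, min_mean=1.0):
    """Boolean mask of genes whose mean log2 expression reaches min_mean in any group."""
    groups = np.asarray(groups)
    means = np.column_stack([logx[:, groups == g].mean(axis=1) for g in np.unique(groups)])
    return (means >= min_mean).any(axis=1)


def contrast(logx, groups, case, control):
    """Per-gene log2 fold change (case - control) and Welch t-statistic."""
    groups = np.asarray(groups)
    a, b = logx[:, groups == case], logx[:, groups == control]
    lfc = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    with np.errstate(divide="ignore", invalid="ignore"):
        t = lfc / se
    return lfc, t


groups = ["Y", "Y", "Y", "S", "S", "S"]
counts = np.array([[100, 110, 90, 400, 420, 380],   # roughly 4-fold up in S
                   [0, 0, 0, 0, 0, 1]])             # barely expressed
logx = log2_transform(counts)
keep = filter_low_expression(logx, groups)
lfc, t = contrast(logx[keep], groups, case="S", control="Y")
print(keep, lfc, t)
```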
2

UMAP Dimensionality Reduction & Clustering

Applied UMAP (3D) to reduce expression space, then k-means clustering (k=20) to identify gene modules with coordinated expression patterns across senescence and treatment

✓ completed
3

Biological Characterization of Clusters

Characterized each cluster by: senescence enrichment (S vs Y), treatment response patterns, dose-response kinetics, and reversal patterns

✓ completed
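
Cluster-level summaries like these can be sketched as mean log2 fold changes per cluster for the senescence contrast (S vs Y) and each treatment timepoint (SN5, SN10, SN15 vs S), plus the fraction of genes whose SN15 change opposes their senescence change. The inputs, the chosen statistics, and the function name are assumptions for illustration.

```python
import numpy as np


def characterize_clusters(labels, lfc_sen, lfc_treat):
    """Summarize each cluster.

    labels:    cluster id per gene
    lfc_sen:   log2 fold change S vs Y per gene
    lfc_treat: genes x 3 log2 fold changes for SN5, SN10, SN15 vs S
    """
    labels = np.asarray(labels)
    lfc_sen = np.asarray(lfc_sen, float)
    lfc_treat = np.asarray(lfc_treat, float)
    summary = {}
    for c in np.unique(labels):
        m = labels == c
        summary[int(c)] = {
            "n_genes": int(m.sum()),
            # Sign and size of the senescence shift (enrichment direction).
            "senescence_lfc": float(lfc_sen[m].mean()),
            # Mean treatment response over time: SN5, SN10, SN15.
            "treatment_lfc": lfc_treat[m].mean(axis=0).round(3).tolist(),
            # Share of genes whose SN15 change opposes the S vs Y change.
            "reversal_fraction": float((lfc_sen[m] * lfc_treat[m, 2] < 0).mean()),
        }
    return summary


summary = characterize_clusters(
    labels=[0, 0, 1, 1],
    lfc_sen=[1.0, 1.0, -0.5, 0.2],
    lfc_treat=[[-0.2, -0.5, -0.9], [-0.1, -0.4, -0.8], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
)
print(summary[0])
```
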
4

Cluster Selection for Deep Analysis

Selected 8 clusters (14, 6, 1, 9, 15, 4, 11, 7) based on biological relevance: senescence markers, treatment response, and reversal patterns

✓ completed
5

Cluster Stability Validation

Validated selected clusters through bootstrap iterations with different random seeds. Calculated stability metrics and identified core vs variable genes

✓ completed
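
A sketch of seed-based stability assessment, assuming stability is how often a gene stays in the cluster that best matches its reference cluster across re-runs. Because cluster ids are arbitrary between runs, clusters are matched by Jaccard overlap. The clustering routine is passed in; the 0.8 core-gene cut-off, the function names, and the toy clustering are hypothetical. A bootstrap variant would also resample before each re-run.

```python
import numpy as np


def _jaccard(a, b):
    """Jaccard overlap of two boolean membership masks."""
    return (a & b).sum() / (a | b).sum()


def gene_stability(cluster_fn, X, reference, target, seeds):
    """Fraction of re-runs in which each member of `target` stays in its matched cluster.

    cluster_fn(X, seed) -> labels; reference: labels from the original run.
    """
    members = np.asarray(reference) == target
    hits = np.zeros(int(members.sum()))
    for seed in seeds:
        labels = np.asarray(cluster_fn(X, seed))
        best = max(np.unique(labels), key=lambda c: _jaccard(members, labels == c))
        hits += labels[members] == best
    return hits / len(seeds)


def split_core_variable(stability, core_min=0.8):
    """Core genes keep their cluster in >= core_min of runs (hypothetical cut-off)."""
    return stability >= core_min, stability < core_min


def toy_clusters(X, seed):
    """Stand-in clustering: odd seeds shift the boundary so the gene at 4.8 flips."""
    cut = 4.5 if seed % 2 else 5.0
    return (X[:, 0] > cut).astype(int)


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.8], [8.0], [9.0], [10.0]])
stability = gene_stability(toy_clusters, X, toy_clusters(X, 0), target=0, seeds=range(4))
core, variable = split_core_variable(stability)
print(stability, core)
```
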
6

Co-expression Network Analysis

Built co-expression networks (r > 0.6) within each cluster. Identified hub genes using degree, weighted degree, and betweenness centrality. Detected communities using Louvain algorithm

Key findings: Top hubs coordinate cluster-specific biological processes; Multiple sub-communities per cluster reveal functional organization; High external connectivity indicates inter-cluster coordination
✓ completed
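
Within-cluster network construction can be sketched with NumPy: compute pairwise Pearson correlations, keep edges above 0.6, then rank hubs by degree with weighted degree as a tie-breaker. Whether negative correlations count is exposed as a parameter, because the text gives both r > 0.6 and |r| > 0.6. Betweenness centrality and Louvain communities are omitted here; networkx provides `betweenness_centrality` and, from version 2.8, `louvain_communities`. Function names and toy data are hypothetical.

```python
import numpy as np


def coexpression_network(expr, r_min=0.6, use_abs=True):
    """Weighted adjacency (edge weight = correlation strength) and Pearson r matrix."""
    r = np.corrcoef(expr)  # genes x genes
    strength = np.abs(r) if use_abs else r
    adj = np.where(strength > r_min, strength, 0.0)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj, r


def hub_ranking(adj, genes, top=3):
    """Rank genes by degree, breaking ties by weighted degree (sum of edge weights)."""
    degree = (adj > 0).sum(axis=1)
    weighted = adj.sum(axis=1)
    order = sorted(range(len(genes)), key=lambda i: (-degree[i], -weighted[i]))
    return [(genes[i], int(degree[i]), round(float(weighted[i]), 3)) for i in order[:top]]


genes = ["A", "B", "C", "D"]  # hypothetical genes in one cluster
expr = np.array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 7],      # tracks A
    [6, 5, 4, 3, 2, 1],      # anti-correlated with A
    [1, -1, 1, -1, 1, -1],   # unrelated
], dtype=float)
adj, r = coexpression_network(expr)
print(hub_ranking(adj, genes))
```
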
7

Statistical Validation of Reversal Patterns

Tested senescence reversal patterns statistically and identified genes showing opposite effects in senescence (S vs Y) and treatment (SN15 vs S).

✓ completed
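
The reversal criterion can be sketched as: both contrasts are significant after Benjamini-Hochberg correction, and their log2 fold changes have opposite signs. The per-gene p-values are assumed to come from whichever test was used (for example, a Welch t-test); the 0.05 FDR level and the function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    p = np.asarray(pvalues, float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def reversal_genes(lfc_sen, p_sen, lfc_treat, p_treat, fdr=0.05):
    """Mask of genes significant in S vs Y and SN15 vs S with opposite directions."""
    sig = (bh_adjust(p_sen) < fdr) & (bh_adjust(p_treat) < fdr)
    opposite = np.asarray(lfc_sen, float) * np.asarray(lfc_treat, float) < 0
    return sig & opposite


print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
print(reversal_genes([1, 1, -1, 1], [0.001, 0.001, 0.001, 0.9],
                     [-1, 1, 1, -1], [0.001] * 4))
```
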
8

Pathway & GO Enrichment

Tested KEGG pathways and GO terms for over-representation using Enrichr and identified functional themes for each cluster.

✓ completed
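
The over-representation idea can be illustrated with a one-sided hypergeometric (Fisher's exact) test, the basis of the p-value Enrichr reports; Enrichr's other scores are not reproduced here. This sketch uses only the Python standard library, and the numbers in the usage line are made up except for the 14,848-gene background taken from this dataset.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of at least k pathway genes in a cluster of n genes.

    k: cluster genes annotated to the pathway
    n: cluster size
    K: background genes annotated to the pathway
    N: background size (e.g. all genes that passed filtering)
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical cluster of 200 genes with 12 hits in a 150-gene pathway.
print(hypergeom_enrichment_p(12, 200, 150, 14848))
```

With one test per pathway and cluster, the resulting p-values would still need multiple-testing correction (for example, Benjamini-Hochberg).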
9

Hub Gene Deep Characterization

Analyzed the top hub genes in each cluster, examining their expression patterns across conditions and the annotations of their network neighbors for context.

✓ completed
10

Multi-Evidence Gene Scoring

Integrated multiple evidence types (network topology, reversal patterns, stability, statistics) into composite priority scores

✓ completed
11

Cross-Cluster Integration

Identified genes that bridge multiple clusters. Ranked all genes across clusters by combined evidence

✓ completed
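
Bridge genes can be sketched as genes with a large share of co-expression edges leading outside their own cluster. This assumes an adjacency matrix spanning several clusters (thresholded correlations as in step 6, but computed across clusters); the 0.5 cut-off and the function names are hypothetical.

```python
import numpy as np


def bridge_scores(adj, labels):
    """Fraction of each gene's edges that connect to genes in other clusters."""
    labels = np.asarray(labels)
    edges = adj > 0
    external = (edges & (labels[None, :] != labels[:, None])).sum(axis=1)
    degree = edges.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(degree > 0, external / degree, 0.0)


def bridge_genes(adj, labels, genes, min_fraction=0.5):
    """Genes whose external-edge fraction reaches min_fraction."""
    frac = bridge_scores(adj, labels)
    return [g for g, f in zip(genes, frac) if f >= min_fraction]


# Chain A-B-C-D: A, B in cluster 0; C, D in cluster 1.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = [0, 0, 1, 1]
print(bridge_genes(adj, labels, ["A", "B", "C", "D"]))
```
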
12

Temporal Dose-Response Analysis

Analyzed gene expression trajectories across treatment timepoints (SN5→SN10→SN15). Classified genes by response kinetics: early responders, late responders, and progressive responders.

Key findings: early responders reach maximum change at SN5; late responders require sustained treatment (peak at SN15); progressive responders improve monotonically over time; 67.7% of genes are classified as late responders.
✓ completed
13

Senescence Marker vs Responder Analysis

Distinguished senescence markers (genes predictive of S vs Y) from NANOG responders (genes predictive of SN15 vs S) and identified overlap genes that are both markers and responders.

Key findings: identified genes that define the senescent state, genes that respond to treatment, and genes that are both markers and treatment targets.
✓ completed
14

Gene Type Classification (A/B/C)

Classified genes by therapeutic response pattern: Type A (reversing markers - therapeutic successes where NANOG reverses senescence), Type B (off-target responders - NANOG-responsive but not senescence genes), Type C (resistant markers - senescence genes that resist NANOG treatment)

✓ completed

🎯 Analysis Philosophy

Discovery over confirmation: Rather than testing known pathways (GSEA), we use unsupervised learning to discover novel gene relationships in senescence reversal.

Multi-evidence integration: No single analysis is perfect. We combine network topology, biological patterns, statistical validation, ML classification, and temporal dynamics for robust gene prioritization.

Transparent about limitations: We document what worked and what failed. Computational predictions require wet-lab validation. Temporal patterns represent correlation, not causation.

Arbitrary choices acknowledged: Many bioinformatics "thresholds" are the researcher's choice (correlation r > 0.6, Type B top 15%, etc.). We document these transparently and encourage sensitivity analyses.

Temporal resolution matters: With 5-, 10-, and 15-day timepoints, we capture dose-response dynamics but miss events earlier than day 5. Gene kinetics classifications are exploratory and should guide, not replace, time-course experiments.