Bioinformatics Analysis Studio
WGCNA
Immune infiltration
LASSO/RF/SVM-RFE
biomarker
Osteoporosis

GSE56815 Monocyte Transcriptome in Practice (Part 2)

WGCNA co-expression, immune signature scoring, and machine-learning biomarker selection


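The previous post took the data to the pathway level: although BMD yields relatively few differentially expressed genes (DEGs), pathway enrichment points squarely at TNFα/NF-κB. This post pushes the analysis in three more advanced directions: (1) WGCNA, to see which genes cluster into modules and which module correlates directly with BMD; (2) signature scoring for immune cell types and monocyte subsets, to check whether immune status really differs between High and Low BMD; (3) machine learning to distill a minimal diagnostic panel from the DEGs, followed by ROC analysis to assess its discriminative power.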

Step 5 · WGCNA: co-expression modules + module-trait correlation

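WGCNA clusters genes into "modules" by expression pattern, then correlates each module's representative expression profile (the module eigengene) with clinical traits; the result is the module-trait relationship heatmap. We build the network on the top 5,000 most variable genes, using a signed network so that the direction of correlation is preserved.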
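The variance filter mentioned above can be sketched in base R. This is only an illustration on a toy matrix: the helper name `top_variance_genes` and the toy data are assumptions, not part of the original scripts, and the article uses n = 5000 rather than 3.

```r
# Toy matrix: 6 genes (rows) x 8 samples (columns), built deterministically.
# In the real pipeline this would be the normalised expression matrix from Part 1.
expr <- outer(1:6, 1:8, function(g, s) sin(g * s))
dimnames(expr) <- list(paste0("gene", 1:6), paste0("s", 1:8))
expr["gene3", ] <- expr["gene3", ] * 10        # make one gene clearly the most variable

# Hypothetical helper: keep the n genes with the largest variance across samples
top_variance_genes <- function(expr, n) {
  v <- apply(expr, 1, var)                     # per-gene variance
  keep <- names(sort(v, decreasing = TRUE))[seq_len(min(n, length(v)))]
  expr[keep, , drop = FALSE]
}

# WGCNA expects samples in rows and genes in columns, hence the transpose
datExpr <- t(top_variance_genes(expr, n = 3))
dim(datExpr)
```

Filtering on variance before network construction mainly keeps the computation tractable; genes that barely vary contribute little to co-expression structure anyway.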

R · 05_wgcna.R
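library(WGCNA)
# datExpr: samples x genes matrix of the top-5000 variance genes
sft <- pickSoftThreshold(datExpr, powerVector = 1:20, networkType = "signed")
# powerEstimate is NA when no power reaches the scale-free fit cutoff; check before use
net <- blockwiseModules(datExpr, power = sft$powerEstimate,
                        networkType = "signed", TOMType = "signed",
                        minModuleSize = 30, mergeCutHeight = 0.25)
# Module eigengenes, labelled by module colour
MEs   <- moduleEigengenes(datExpr, labels2colors(net$colors))$eigengenes
# Module-trait Pearson correlation ("p" = pairwise complete observations)
modTraitCor <- cor(MEs, traits, use = "p")
modTraitP   <- corPvalueStudent(modTraitCor, nrow(datExpr))   # p-values shown under each r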

5.1 Soft threshold and gene modules

Soft-threshold scan — scale-free topology R² and mean connectivity.
Gene dendrogram and module colour assignment (turquoise, blue, brown, yellow, green, red, black, pink, magenta + grey).

5.2 Module-trait correlation heatmap

Module-trait correlation. Number above = correlation r, number below = p-value. Colours: blue = negative, coral = positive.
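To make the two numbers in each heatmap cell concrete, here is a minimal base-R sketch of how a correlation r and its Student t-based p-value relate. The eigengene and trait vectors are simulated stand-ins, not values from GSE56815; only the sample count of 80 comes from the article.

```r
set.seed(1)
n     <- 80                          # sample count, as in the article's heatmaps
me    <- rnorm(n)                    # stand-in for one module eigengene
trait <- 0.4 * me + rnorm(n)         # stand-in for a numeric trait

r      <- cor(me, trait)                       # number shown on top of the cell
t_stat <- r * sqrt((n - 2) / (1 - r^2))        # Student t statistic for r
p      <- 2 * pt(-abs(t_stat), df = n - 2)     # two-sided p, number shown below
c(r = r, p = p)
```

With dozens of module-trait pairs on one heatmap, these are unadjusted p-values, so isolated borderline cells deserve some caution.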

5.3 Hub genes of the BMD-associated module

Top-20 hub genes ranked by |kME|. VPS35, DPP8, ADSL, PSMA7, LPIN1 and TRIM44 rank near the top.
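For readers unfamiliar with kME: it is the correlation between a gene's expression and its module eigengene, and hubs are the genes with the largest |kME|. The sketch below illustrates the idea in base R on simulated data, approximating the eigengene by the first principal component; it is not the WGCNA implementation, and all names and values are made up.

```r
set.seed(7)
n_samples <- 40
driver    <- rnorm(n_samples)                  # hidden signal shared by the module
loadings  <- c(0.9, 0.8, 0.7, 0.3, 0.1)        # how tightly each gene follows it
mod_expr  <- sapply(loadings, function(l)
  l * driver + sqrt(1 - l^2) * rnorm(n_samples))
colnames(mod_expr) <- paste0("gene", 1:5)      # samples x genes, like datExpr

# Eigengene approximated as the first principal component of the scaled module
pc <- prcomp(mod_expr, center = TRUE, scale. = TRUE)
ME <- pc$x[, 1]

# kME: correlation of every gene with the eigengene; rank by absolute value
kME  <- cor(mod_expr, ME)[, 1]
hubs <- names(sort(abs(kME), decreasing = TRUE))
head(hubs, 3)
```

The sign of a principal component is arbitrary, which is one reason for ranking on |kME| rather than on kME itself.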

Step 6 · Immune and monocyte-subset signature scoring

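The data come from sorted monocytes, so conventional CIBERSORT-style deconvolution of cell proportions is not appropriate. We take a different angle: ssGSEA scores each sample on 21 curated signatures, covering the three monocyte subsets (classical / intermediate / non-classical), osteoclast precursors, M1/M2 macrophages, T/B/NK/DC cells, and core pathways such as TNF, NF-κB, IFN and TGF-β.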

R · 06_immune.R
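# expr: genes x samples matrix; immune_signatures: named list of gene-symbol vectors
# (parameter-object API, GSVA >= 1.50); renamed from `par` to avoid masking graphics::par
ssgsea_par <- GSVA::ssgseaParam(as.matrix(expr), immune_signatures)
score <- GSVA::gsva(ssgsea_par)          # signatures x samples
# Per-signature Wilcoxon rank-sum test, High vs Low BMD (pheno rows must match score columns)
pvals <- apply(score, 1, function(x)
  wilcox.test(x[pheno$BMD=="High"], x[pheno$BMD=="Low"])$p.value)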
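Since 21 signatures are tested at once, it can help to look at adjusted p-values next to the raw ones. The sketch below simulates a score matrix (21 signatures x 80 samples, with one planted difference), runs the same Wilcoxon test, and adds a Benjamini-Hochberg column. The data and the 40/40 group split are assumptions for illustration; the article itself ranks by raw p.

```r
set.seed(3)
n_sig  <- 21
n_samp <- 80
score  <- matrix(rnorm(n_sig * n_samp), nrow = n_sig,
                 dimnames = list(paste0("sig", 1:n_sig), NULL))
bmd <- factor(rep(c("High", "Low"), each = n_samp / 2))   # toy group labels
score[1, bmd == "High"] <- score[1, bmd == "High"] + 1.5  # plant one real difference

# Same test as in 06_immune.R, then a BH-adjusted column
pvals <- apply(score, 1, function(x)
  wilcox.test(x[bmd == "High"], x[bmd == "Low"])$p.value)
res <- data.frame(signature = names(pvals), p = pvals,
                  fdr = p.adjust(pvals, method = "BH"))
res <- res[order(res$p), ]           # rank by raw p, as the figure does
head(res, 9)
```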

6.1 Overview heatmap

Sample-by-signature ssGSEA z-score heatmap with BMD / Menopause annotations.
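The heatmap shows z-scores rather than raw ssGSEA scores. A plausible way to get there, assuming the scaling is done per signature across samples, is a row-wise standardisation; the tiny matrix below is made up.

```r
# Toy score matrix: 2 signatures (rows) x 4 samples (columns)
score <- matrix(c( 1,  2,  3,  4,
                  10, 10, 20, 20), nrow = 2, byrow = TRUE,
                dimnames = list(c("sigA", "sigB"), paste0("s", 1:4)))

# scale() standardises columns, so transpose in and back out to scale rows
z <- t(scale(t(score)))
z
```

Row-wise scaling puts signatures with very different score ranges on a common colour scale, at the cost of hiding their absolute levels.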

6.2 Top 9 signatures with significant BMD differences

Top 9 BMD-differential signatures (Wilcoxon, ranked by raw p). NF-κB signalling, antigen presentation and neutrophil score are the strongest.

6.3 Correlation structure among signatures

Spearman correlation between the 21 signatures. NF-κB / Inflammation / IFN-γ form a tight pro-inflammatory cluster.
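A signature-by-signature correlation matrix like this one can be computed and clustered in a few lines of base R. The sketch uses four simulated signatures, three of which share a common driver; the names echo the figure but the numbers are invented, and the choice of average linkage is an assumption.

```r
set.seed(11)
n_samp <- 80
inflam <- rnorm(n_samp)                        # shared pro-inflammatory driver (toy)
score <- rbind(
  NFkB         = inflam + rnorm(n_samp, sd = 0.3),
  Inflammation = inflam + rnorm(n_samp, sd = 0.3),
  IFNg         = inflam + rnorm(n_samp, sd = 0.3),
  Bcell        = rnorm(n_samp)                 # unrelated signature
)

# cor() correlates columns, so transpose to correlate signatures
rho <- cor(t(score), method = "spearman")

# Cluster on 1 - rho so strongly correlated signatures end up together
hc  <- hclust(as.dist(1 - rho), method = "average")
grp <- cutree(hc, k = 2)
grp
```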

Step 7 · Machine-learning biomarker selection: three-algorithm consensus + ROC

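The BMD DEGs from Step 3 (falling back to the top 200 genes by raw P when there are too few) are fed to three very different feature-selection algorithms: LASSO (sparse linear), random forest (non-linear, tree-based) and SVM-RFE (recursive feature elimination). Their intersection / union then forms the consensus panel. Samples are split 70/30 into stratified training and test sets, and a simple logistic regression fitted on the panel is used for the ROC.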
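The fallback rule for candidate genes could look roughly like the sketch below. The function name, the `adj_p_cut` and `min_genes` thresholds, and the limma-style column names (`P.Value`, `adj.P.Val`) are all assumptions; only the "top 200 by raw P" fallback comes from the text.

```r
# Hypothetical helper: use significant DEGs if there are enough, else top-N by raw P
select_candidates <- function(deg, adj_p_cut = 0.05, min_genes = 10, fallback_n = 200) {
  sig <- deg[deg$adj.P.Val < adj_p_cut, , drop = FALSE]
  if (nrow(sig) >= min_genes) return(sig$gene)
  ranked <- deg[order(deg$P.Value), , drop = FALSE]
  head(ranked$gene, fallback_n)
}

# Toy DEG table: evenly spread raw p-values, so nothing survives BH adjustment
deg <- data.frame(gene = paste0("g", 1:500),
                  P.Value = rev((1:500) / 1000),
                  stringsAsFactors = FALSE)
deg$adj.P.Val <- p.adjust(deg$P.Value, method = "BH")

candidates <- select_candidates(deg)
length(candidates)
```

Falling back to raw p-values keeps the downstream feature selection fed, but the resulting panel should then be read as exploratory.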

R · 07_ml.R
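library(caret); library(glmnet); library(randomForest)
# X: samples x candidate-gene matrix, y: two-level BMD factor (names assumed here)
trainIdx <- createDataPartition(y, p = 0.7, list = FALSE)   # stratified 70/30 split
Xtr <- X[trainIdx, ]; ytr <- y[trainIdx]
cvfit  <- cv.glmnet(Xtr, ytr, family = "binomial", alpha = 1, nfolds = 5)   # alpha = 1: LASSO
rf     <- randomForest(Xtr, ytr, ntree = 1000, importance = TRUE)
svmrfe <- rfe(Xtr, ytr, sizes = c(5,10,15,20,30,50,80),
              method = "svmLinear",
              rfeControl = rfeControl(functions = caretFuncs, method = "cv", number = 5))
# lasso_feat, rf_top, svm_feat: gene names kept by each method (extraction not shown)
panel  <- Reduce(intersect, list(lasso_feat, rf_top, svm_feat))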
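The consensus step is easy to try out on toy feature lists. The sketch below contrasts the strict intersection with the union and with a majority vote; the gene names are placeholders and the "at least two of three" rule is an illustrative alternative, not something the article states.

```r
# Placeholder feature lists from the three selectors
sets <- list(
  LASSO   = c("geneA", "geneB", "geneC", "geneD"),
  RF      = c("geneB", "geneC", "geneD", "geneE"),
  SVM_RFE = c("geneC", "geneD", "geneF")
)

panel_strict <- Reduce(intersect, sets)   # kept by all three methods
panel_union  <- Reduce(union, sets)       # kept by at least one method

# Majority vote: count in how many lists each gene appears
votes <- table(unlist(sets))
panel_majority <- names(votes)[votes >= 2]

panel_strict
panel_majority
```

The intersection gives the smallest and most conservative panel; when it comes back empty, a vote threshold is a common middle ground before resorting to the full union.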

7.1 LASSO tuning and coefficient paths

LASSO 5-fold cross-validation; minimum-deviance λ marked.
LASSO coefficient paths across log(λ).

7.2 Random forest variable importance + SVM-RFE

Random forest mean-decrease-in-accuracy (top 20).
SVM-RFE feature-size scan — accuracy peaks at moderate panel size.

7.3 Three-algorithm consensus

Feature counts by method and the consensus intersection.

7.4 ROC: training + test

Biomarker panel ROC on the 70 % training set and the held-out 30 % test set (blue = train, coral = test).
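For a feel of what sits behind such a curve, here is a base-R sketch: fit a logistic regression on a training split, predict on the held-out samples, and compute the AUC with the rank (Mann-Whitney) formula. The two "genes" are simulated and `auc_rank` is a hypothetical helper; the article's own scripts may well use a dedicated ROC package instead.

```r
# Hypothetical helper: AUC from ranks (equivalent to the Mann-Whitney U statistic)
auc_rank <- function(score, is_pos) {
  r     <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(5)
n   <- 80
y   <- rep(c(1, 0), each = 40)                 # toy labels, 40 per class
dat <- data.frame(y  = y,
                  g1 = rnorm(n, mean = y),      # simulated panel genes
                  g2 = rnorm(n, mean = 0.5 * y))
train <- c(1:28, 41:68)                        # 70 % of each class

fit       <- glm(y ~ g1 + g2, data = dat[train, ], family = binomial)
p_train   <- predict(fit, type = "response")
p_test    <- predict(fit, newdata = dat[-train, ], type = "response")
auc_train <- auc_rank(p_train, dat$y[train] == 1)
auc_test  <- auc_rank(p_test,  dat$y[-train] == 1)
c(train = auc_train, test = auc_test)
```

A training AUC well above the test AUC is the usual sign of overfitting, which matters here because the panel genes were themselves selected from the data.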

7.5 Panel gene expression at a glance

Per-gene boxplots of the panel by BMD, with Wilcoxon p shown.
Panel expression z-score heatmap across all 80 samples.

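Two-part summary · a reproducible template

Putting the two posts together, the overall picture from GSE56815 is:

① The BMD transcriptomic signal in peripheral monocytes is weak but directional, and is markedly amplified after menopause;

② At the pathway level it focuses tightly on the TNFα / NF-κB axis, accompanied by concordant changes in complement, p53 and estrogen response;

③ At the immune-signature level, ssGSEA again supports higher NF-κB / antigen presentation / neutrophil scores in High BMD;

④ The LASSO + RF + SVM-RFE consensus yields a 7-gene panel: CDC42EP3, TRIM44, NCOA1, FOXO3, NBEAL2, ZEB2 and HIRA, of which TRIM44 is also a WGCNA hub.

The full set of pipeline scripts (config + utils + 01~07 + run_all + install) has been templated: swap in another GEO ID, adjust the regex that extracts the phenotype, and it runs on the next dataset. The same "matrix in, figures out" reproducible pipeline can also be customised to your own data, journal colour scheme or report style.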
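As an illustration of the "adjust the phenotype regex" step, the sketch below pulls group labels out of free-text sample annotations with base-R regular expressions. The annotation strings, field names and the `extract_field` helper are all made up for the example; the real GEO characteristics fields should be inspected before writing the pattern.

```r
# Hypothetical helper: return the first capture group of `pattern`, or NA if no match
extract_field <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x, ignore.case = TRUE))
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

# Made-up annotation strings in the style of GEO sample characteristics
chars <- c("bone mineral density: high BMD; state: postmenopausal",
           "bone mineral density: low BMD; state: premenopausal",
           "tissue: blood")

bmd       <- extract_field(chars, "bone mineral density:\\s*(high|low)")
menopause <- extract_field(chars, "state:\\s*(pre|post)menopausal")
data.frame(bmd, menopause)
```

Returning NA for non-matching samples, rather than dropping them silently, makes annotation gaps visible before they reach the statistics.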
