单细胞geneNMF分析流程学习
非负矩阵分解是一种用于分析高维数据的方法,它可以从一组非负数据向量中提取稀疏且有意义的特征。该方法非常适合分解单细胞RNA测序数据,有效地将大型的复杂矩阵(基因数量乘以细胞数量)分解为几个可解释的基因程序。
非负矩阵相关内容请参阅既往推文:
- 单细胞非负矩阵分解分析python版(cNMF)学习:
- 转录组非负矩阵分解(NMF)/一致性聚类(ConsensusClusterPlus):
本次内容就直接跟着官方流程进行实操。
geneNMF分析流程
1.导入
代码语言:javascript代码运行次数:0运行复制rm(list = ls())
library(GeneNMF)
library(Seurat)
library(ggplot2)
library(UCell)
library(patchwork)
library(Matrix)
library(RcppML)
library(viridis)
library(qs)
load("scRNA.Rdata")
DimPlot(scRNA)
2.数据预处理(NMF降维)
NMF可以将数据从成千上万的基因维度降低到几个维度
代码语言:javascript代码运行次数:0运行复制seu <- scRNA
ndim <- 15
seu <- FindVariableFeatures(seu, nfeatures = length(rownames(scRNA))) #官方是1000
seu <- runNMF(seu, k = ndim, assay="RNA") # 官方是SCT
seu@reductions$NMF
# A dimensional reduction object with key NMF_
# Number of dimensions: 15
# Number of cells: 4031
# Projected dimensional reduction calculated: FALSE
# Jackstraw run: FALSE
# Computed using assay: RNA
在二维空间展示NMF结果
代码语言:javascript代码运行次数:0运行复制seu <- RunUMAP(seu,
reduction = "NMF",
dims=1:ndim,
reduction.name = "NMF_UMAP",
reduction.key = "nmfUMAP_")
DimPlot(seu, reduction = "NMF_UMAP",
group.by = "celltype", label=T) +
theme(aspect.ratio = 1,axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
ggtitle("NMF UMAP") + NoLegend()
3.多样本间一致的NMF程序
识别稳健的基因programs需要跨样本检测它们,并考虑输入参数的变异性。对于NMF来说,最关键的参数可能是维度k,它对应于低维矩阵中的程序数量。为了确定稳健的程序,可以对多个k值运行NMF,并确定在这些运行中始终存在的程序。
代码语言:javascript代码运行次数:0运行复制seu.list <- SplitObject(seu, split.by = "orig.ident")
geneNMF.programs <- multiNMF(seu.list,
assay="RNA",
slot="data",
k=4:9,
nfeatures = length(rownames(scRNA))) # 官方是1000
# 可以将多个样本和数量k中识别出的基因程序组合成元程序(MPs),即在不同 NMF 运行中稳健识别的共识程序。在此,我们将定义10个 MPs
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
nMP=10,
weight.explained = 0.7,
max.genes=100)
4.可视化结果
可视化组成元程序的各个基因程序之间的成对相似度(以余弦相似度或Jaccard指数表示)可能很有用。可以看到对应于数据集和k值中高度相似的基因程序的“块”。然后,可以将相似度树在给定高度处切割,以找到相似程序块并为每个块推导出共识基因特征。例如,这里将树切割到对应于10个程序簇(即10个MP)。
代码语言:javascript代码运行次数:0运行复制ph <- plotMetaPrograms(geneNMF.metaprograms)
数据提取
还可以检查一些有用的统计数据,例如“样本覆盖率”(MP 在多少样本中被检测到);或者轮廓系数(一个MP中的个体程序相对于其他MP中的程序有多相似——数值越高越好)。基于这些指标,使用者可以去决定删除一些程序,例如,如果仅针对少数样本(样本覆盖率低),或者内部一致性差(轮廓系数和平均相似度低)。
代码语言:javascript代码运行次数:0运行复制geneNMF.metaprograms$metaprograms.metrics
# sampleCoverage silhouette meanSimilarity numberGenes numberPrograms
# MP1 0.8333333 0.694614167 0.760 100 27
# MP2 0.8333333 0.688409044 0.743 100 17
# MP3 0.8333333 0.547281711 0.618 100 20
# MP4 0.8333333 0.512216201 0.574 100 27
# MP5 0.8333333 0.467454460 0.577 100 24
# MP6 0.8333333 0.132801311 0.323 100 21
# MP7 0.8333333 -0.002536882 0.169 37 29
# MP8 0.6666667 0.607938842 0.647 100 20
# MP9 0.6666667 0.536177467 0.608 100 24
# MP10 0.6666667 0.113884487 0.275 100 25
查看MP中的驱动基因
代码语言:javascript代码运行次数:0运行复制lapply(geneNMF.metaprograms$metaprograms.genes, head)
# $MP1
# [1] "S100A8" "S100A9" "G0S2" "CXCL8" "FPR1" "CSF3R"
#
# $MP2
# [1] "MS4A7" "C1QA" "CD163" "C1QB" "C1QC" "F13A1"
#
# $MP3
# [1] "FOSB" "HSPA1B" "DNAJB1" "HSPA1A" "JUN" "SERTAD1"
#
# $MP4
# [1] "FBLN1" "DPT" "APOD" "C1S" "SFRP1" "C3"
#
# $MP5
# [1] "NKG7" "CCL5" "GZMA" "GZMH" "CTSW" "PRF1"
#
# $MP6
# [1] "LINC00861" "DGKA" "OXNAD1" "TRAF3IP3" "SLFN5" "GIMAP7"
#
# $MP7
# [1] "TAGLN" "SSR4" "TPM2" "MYL9" "MYDGF" "IGFBP5"
#
# $MP8
# [1] "BANK1" "MS4A1" "IGHM" "CD79A" "IGHD" "CD22"
#
# $MP9
# [1] "VWF" "CLDN5" "AQP1" "PLVAP" "CAVIN2" "ACKR1"
#
# $MP10
# [1] "PGAP1" "FAM177A1" "UBE2S" "CREM" "SC5D" "CRYBG1"
修改参数调整program
代码语言:javascript代码运行次数:0运行复制# 比如调整weight.explained,min.confidence,nMP等参数
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
nMP=8,
#min.confidence = 0.7,
weight.explained = 0.7,
max.genes=100)
5.对基因programs进行GSEA分析
代码语言:javascript代码运行次数:0运行复制library(msigdbr)
library(fgsea)
top_p <- lapply(geneNMF.metaprograms$metaprograms.genes, function(program) {
runGSEA(program, universe=rownames(seu), category = "H")
})
head(top_p$MP4)
# pathway pval padj overlap size
# <char> <num> <num> <int> <int>
# 1: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 7.297520e-43 3.648760e-41 32 197
# 2: HALLMARK_COAGULATION 4.519321e-08 1.129830e-06 8 124
# 3: HALLMARK_MYOGENESIS 9.133267e-08 1.522211e-06 9 190
# 4: HALLMARK_UV_RESPONSE_DN 1.296139e-07 1.620173e-06 8 142
# 5: HALLMARK_ANGIOGENESIS 1.269939e-05 1.269939e-04 4 35
# 6: HALLMARK_APOPTOSIS 4.801798e-04 4.001499e-03 5 157
# overlapGenes
# <list>
# 1: ABI3BP,CALD1,COL12A1,COL1A1,COL1A2,COL3A1,...
# 2: C1R,C1S,C3,CTSK,F10,FBN1,...
# 3: AEBP1,APOD,COL1A1,COL3A1,COL6A2,COL6A3,...
# 4: COL1A1,COL1A2,COL3A1,COL5A2,EFEMP1,FBLN5,...
# 5: COL3A1,COL5A2,FSTL1,LUM
# 6: DCN,GPX3,IGFBP6,LUM,MMP2
6.基因program的siganture分数
代码语言:javascript代码运行次数:0运行复制library(UCell)
mp.genes <- geneNMF.metaprograms$metaprograms.genes
seu <- AddModuleScore_UCell(seu,
features = mp.genes,
assay="RNA",
ncores=4,
name = "")
VlnPlot(seu, features=names(mp.genes),
group.by = "celltype",
pt.size = 0, ncol=5)
对应MP4的GSEA富集结果,MP4主要富集与EMT等通路,其高表达于成纤维细胞。
映射到UMAP
代码语言:javascript代码运行次数:0运行复制matrix <- seu@meta.data[,names(mp.genes)]
#dimred <- scale(matrix)
dimred <- as.matrix(matrix)
colnames(dimred) <- paste0("MP_",seq(1, ncol(dimred)))
#New dim reduction
seu@reductions[["MPsignatures"]] <- new("DimReduc",
cell.embeddings = dimred,
assay.used = "RNA",
key = "MP_",
global = FALSE)
set.seed(123)
seu <- RunUMAP(seu, reduction="MPsignatures", dims=1:length(seu@reductions[["MPsignatures"]]),
metric = "euclidean", reduction.name = "umap_MP")
library(viridis)
FeaturePlot(seu, features = names(mp.genes), reduction = "umap_MP", ncol=4) &
scale_color_viridis(option="B") &
theme(aspect.ratio = 1, axis.text=element_blank(), axis.ticks=element_blank())
a <- DimPlot(seu,
reduction = "umap_MP",
group.by = "celltype",
label=T) +
theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
ggtitle("Original cell types") + NoLegend()
b <- DimPlot(seu,
reduction = "umap_MP",
group.by = "orig.ident",
label=T) +
theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
ggtitle("Samples") + NoLegend()
a | b
参考资料:
- Wounding triggers invasive progression in human basal cell carcinoma. Laura Yerly, Massimo Andreatta, Josep Garnica, Jeremy Di Domizio, Michel Gilliet, Santiago J Carmona, Francois Kuonen. bioRxiv 2024 10.1101/2024.05.31.596823 IF: NA NA NA
- geneNMF: .demo/NMF_demo_PBMC.html
- 生信随笔:
- 生信益站:
注:若对内容有疑惑或者有发现明确错误的朋友,请联系后台(欢迎交流)。更多相关内容可关注公众号:生信方舟
- END -
发布评论