单细胞geneNMF分析流程学习

非负矩阵分解是一种用于分析高维数据的方法,它可以从一组非负数据向量中提取稀疏且有意义的特征。该方法非常适合分解单细胞RNA测序数据,有效地将大型的复杂矩阵(基因数量乘以细胞数量)分解为几个可解释的基因程序。

非负矩阵相关内容请参阅既往推文:

  1. 单细胞非负矩阵分解分析python版(cNMF)学习:
  2. 转录组非负矩阵分解(NMF)/一致性聚类(ConsensusClusterPlus):

本次内容就直接跟着官方流程进行实操。

geneNMF分析流程
1.导入
代码语言:javascript代码运行次数:0运行复制
rm(list = ls())
library(GeneNMF)
library(Seurat)
library(ggplot2)
library(UCell)
library(patchwork)
library(Matrix)
library(RcppML)
library(viridis)
library(qs)

load("scRNA.Rdata")
DimPlot(scRNA)
2.数据预处理(NMF降维)

NMF可以将数据从成千上万的基因维度降低到几个维度

代码语言:javascript代码运行次数:0运行复制
seu <- scRNA
ndim <- 15

seu <- FindVariableFeatures(seu, nfeatures = length(rownames(scRNA))) #官方是1000
seu <- runNMF(seu, k = ndim, assay="RNA") # 官方是SCT
seu@reductions$NMF
# A dimensional reduction object with key NMF_ 
#  Number of dimensions: 15 
#  Number of cells: 4031 
#  Projected dimensional reduction calculated:  FALSE 
#  Jackstraw run: FALSE 
#  Computed using assay: RNA 
在二维空间展示NMF结果
代码语言:javascript代码运行次数:0运行复制
seu <- RunUMAP(seu, 
               reduction = "NMF", 
               dims=1:ndim, 
               reduction.name = "NMF_UMAP", 
               reduction.key = "nmfUMAP_")
DimPlot(seu, reduction = "NMF_UMAP", 
        group.by = "celltype", label=T) + 
  theme(aspect.ratio = 1,axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank()) +
  ggtitle("NMF UMAP") + NoLegend()
3.多样本间一致的NMF程序

识别稳健的基因programs需要跨样本检测它们,并考虑输入参数的变异性。对于NMF来说,最关键的参数可能是维度k,它对应于低维矩阵中的程序数量。为了确定稳健的程序,可以对多个k值运行NMF,并确定在这些运行中始终存在的程序。

代码语言:javascript代码运行次数:0运行复制
seu.list <- SplitObject(seu, split.by = "orig.ident")
geneNMF.programs <- multiNMF(seu.list, 
                             assay="RNA", 
                             slot="data", 
                             k=4:9, 
                             nfeatures = length(rownames(scRNA))) # 官方是1000

# 可以将多个样本和数量k中识别出的基因程序组合成元程序(MPs),即在不同 NMF 运行中稳健识别的共识程序。在此,我们将定义10个 MPs
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
                                        nMP=10,
                                        weight.explained = 0.7,
                                        max.genes=100)
4.可视化结果

可视化组成元程序的各个基因程序之间的成对相似度(以余弦相似度或Jaccard指数表示)可能很有用。可以看到对应于数据集和k值中高度相似的基因程序的“块”。然后,可以将相似度树在给定高度处切割,以找到相似程序块并为每个块推导出共识基因特征。例如,这里将树切割到对应于10个程序簇(即10个MP)。

代码语言:javascript代码运行次数:0运行复制
ph <- plotMetaPrograms(geneNMF.metaprograms)
数据提取

还可以检查一些有用的统计数据,例如“样本覆盖率”(MP 在多少样本中被检测到);或者轮廓系数(一个MP中的个体程序相对于其他MP中的程序有多相似——数值越高越好)。基于这些指标,使用者可以去决定删除一些程序,例如,如果仅针对少数样本(样本覆盖率低),或者内部一致性差(轮廓系数和平均相似度低)。

代码语言:javascript代码运行次数:0运行复制
geneNMF.metaprograms$metaprograms.metrics
#      sampleCoverage   silhouette meanSimilarity numberGenes numberPrograms
# MP1       0.8333333  0.694614167          0.760         100             27
# MP2       0.8333333  0.688409044          0.743         100             17
# MP3       0.8333333  0.547281711          0.618         100             20
# MP4       0.8333333  0.512216201          0.574         100             27
# MP5       0.8333333  0.467454460          0.577         100             24
# MP6       0.8333333  0.132801311          0.323         100             21
# MP7       0.8333333 -0.002536882          0.169          37             29
# MP8       0.6666667  0.607938842          0.647         100             20
# MP9       0.6666667  0.536177467          0.608         100             24
# MP10      0.6666667  0.113884487          0.275         100             25
查看MP中的驱动基因
代码语言:javascript代码运行次数:0运行复制
lapply(geneNMF.metaprograms$metaprograms.genes, head)
# $MP1
# [1] "S100A8" "S100A9" "G0S2"   "CXCL8"  "FPR1"   "CSF3R" 
# 
# $MP2
# [1] "MS4A7" "C1QA"  "CD163" "C1QB"  "C1QC"  "F13A1"
# 
# $MP3
# [1] "FOSB"    "HSPA1B"  "DNAJB1"  "HSPA1A"  "JUN"     "SERTAD1"
# 
# $MP4
# [1] "FBLN1" "DPT"   "APOD"  "C1S"   "SFRP1" "C3"   
# 
# $MP5
# [1] "NKG7" "CCL5" "GZMA" "GZMH" "CTSW" "PRF1"
# 
# $MP6
# [1] "LINC00861" "DGKA"      "OXNAD1"    "TRAF3IP3"  "SLFN5"     "GIMAP7"   
# 
# $MP7
# [1] "TAGLN"  "SSR4"   "TPM2"   "MYL9"   "MYDGF"  "IGFBP5"
# 
# $MP8
# [1] "BANK1" "MS4A1" "IGHM"  "CD79A" "IGHD"  "CD22" 
# 
# $MP9
# [1] "VWF"    "CLDN5"  "AQP1"   "PLVAP"  "CAVIN2" "ACKR1" 
# 
# $MP10
# [1] "PGAP1"    "FAM177A1" "UBE2S"    "CREM"     "SC5D"     "CRYBG1" 
修改参数调整program
代码语言:javascript代码运行次数:0运行复制
# 比如调整weight.explained,min.confidence,nMP等参数
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
                                        nMP=8,
                                        #min.confidence = 0.7,
                                        weight.explained = 0.7,
                                        max.genes=100)
5.对基因programs进行GSEA分析
代码语言:javascript代码运行次数:0运行复制
library(msigdbr)
library(fgsea)

top_p <- lapply(geneNMF.metaprograms$metaprograms.genes, function(program) {
  runGSEA(program, universe=rownames(seu), category = "H")
})

head(top_p$MP4)
#                                      pathway         pval         padj overlap  size
#                                        <char>        <num>        <num>   <int> <int>
# 1: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 7.297520e-43 3.648760e-41      32   197
# 2:                       HALLMARK_COAGULATION 4.519321e-08 1.129830e-06       8   124
# 3:                        HALLMARK_MYOGENESIS 9.133267e-08 1.522211e-06       9   190
# 4:                    HALLMARK_UV_RESPONSE_DN 1.296139e-07 1.620173e-06       8   142
# 5:                      HALLMARK_ANGIOGENESIS 1.269939e-05 1.269939e-04       4    35
# 6:                         HALLMARK_APOPTOSIS 4.801798e-04 4.001499e-03       5   157
#                                     overlapGenes
#                                           <list>
# 1: ABI3BP,CALD1,COL12A1,COL1A1,COL1A2,COL3A1,...
# 2:                  C1R,C1S,C3,CTSK,F10,FBN1,...
# 3:    AEBP1,APOD,COL1A1,COL3A1,COL6A2,COL6A3,...
# 4:  COL1A1,COL1A2,COL3A1,COL5A2,EFEMP1,FBLN5,...
# 5:                       COL3A1,COL5A2,FSTL1,LUM
# 6:                      DCN,GPX3,IGFBP6,LUM,MMP2
6.基因program的siganture分数
代码语言:javascript代码运行次数:0运行复制
library(UCell)
mp.genes <- geneNMF.metaprograms$metaprograms.genes
seu <- AddModuleScore_UCell(seu, 
                            features = mp.genes, 
                            assay="RNA", 
                            ncores=4, 
                            name = "")

VlnPlot(seu, features=names(mp.genes), 
        group.by = "celltype",
        pt.size = 0, ncol=5)

对应MP4的GSEA富集结果,MP4主要富集与EMT等通路,其高表达于成纤维细胞。

映射到UMAP
代码语言:javascript代码运行次数:0运行复制
matrix <- seu@meta.data[,names(mp.genes)]

#dimred <- scale(matrix)
dimred <- as.matrix(matrix)

colnames(dimred) <- paste0("MP_",seq(1, ncol(dimred)))
#New dim reduction
seu@reductions[["MPsignatures"]] <- new("DimReduc",
                                         cell.embeddings = dimred,
                                         assay.used = "RNA",
                                         key = "MP_",
                                         global = FALSE)
set.seed(123)
seu <- RunUMAP(seu, reduction="MPsignatures", dims=1:length(seu@reductions[["MPsignatures"]]),
               metric = "euclidean", reduction.name = "umap_MP")

library(viridis)
FeaturePlot(seu, features = names(mp.genes), reduction = "umap_MP", ncol=4) &
  scale_color_viridis(option="B") &
   theme(aspect.ratio = 1, axis.text=element_blank(), axis.ticks=element_blank())
代码语言:javascript代码运行次数:0运行复制
a <- DimPlot(seu, 
             reduction = "umap_MP", 
             group.by = "celltype", 
             label=T) +
  theme(aspect.ratio = 1,
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank()) + 
  ggtitle("Original cell types") + NoLegend()

b <- DimPlot(seu, 
             reduction = "umap_MP", 
             group.by = "orig.ident", 
             label=T) + 
  theme(aspect.ratio = 1,
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank()) + 
  ggtitle("Samples") + NoLegend()
a | b
参考资料:
  1. Wounding triggers invasive progression in human basal cell carcinoma. Laura Yerly, Massimo Andreatta, Josep Garnica, Jeremy Di Domizio, Michel Gilliet, Santiago J Carmona, Francois Kuonen. bioRxiv 2024 10.1101/2024.05.31.596823 IF: NA NA NA
  2. geneNMF: .demo/NMF_demo_PBMC.html
  3. 生信随笔:
  4. 生信益站:

:若对内容有疑惑或者有发现明确错误的朋友,请联系后台(欢迎交流)。更多相关内容可关注公众号:生信方舟

- END -