AUCell

AUCell单细胞基因集分析

Posted by CHY on September 28, 2020

AUCell 主要用于识别单细胞 RNA 数据中具有活性基因集(gene signatures)的细胞。
AUCell 使用“曲线下面积”(Area Under the Curve,AUC)来计算输入基因集的一个关键子集是否在每个细胞的表达基因中富集。AUC 分数在所有细胞的分布允许探索 signatures 的相对表达。

AUCell 分析步骤

  1. Build the rankings
  2. Calculate the Area Under the Curve(AUC)
  3. Set the assignment thresholds
library(AUCell)
library(clusterProfiler)
library(ggplot2)
library(Seurat)
library(SeuratData)

cd8.seurat <-  pbmc3k.final
cells_rankings <- AUCell_buildRankings(cd8.seurat@assays$RNA@data)

c5 <- read.gmt("c5.cc.v7.1.symbols.gmt")     # load gene set
geneSets <- lapply(unique(c5$ont), function(x){print(x);c5$gene[c5$ont == x]})
names(geneSets) <- unique(c5$ont)

cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=nrow(cells_rankings)*0.1)

length(rownames(cells_AUC@assays@data$AUC))
grep("REG",rownames(cells_AUC@assays@data$AUC),value = T)  # 匹配筛选特定通路

geneSet <- "GO_PERINUCLEAR_REGION_OF_CYTOPLASM"     # 特定通路分析
aucs <- as.numeric(getAUC(cells_AUC)[geneSet, ])
cd8.seurat$AUC <- aucs
df<- data.frame(cd8.seurat@meta.data, cd8.seurat@reductions$umap@cell.embeddings)
class_avg <- df %>%
  group_by( seurat_annotations) %>%
  summarise(
    UMAP_1 = median(UMAP_1),
    UMAP_2 = median(UMAP_2)
  )
ggplot(df, aes(UMAP_1, UMAP_2))  +
  geom_point(aes(colour  = AUC)) + viridis::scale_color_viridis(option="A") +
  ggrepel::geom_label_repel(aes(label = seurat_annotations),
                            data = class_avg,
                            size = 6,
                            label.size = 0,
                            segment.color = NA
  )+   theme(legend.position = "none") + theme_bw()