24 Copy Number Variant analysis plots

Copy Number Variant (CNV) analysis is another common analysis that one can run on single-cell transcriptomics data. Given a set of reference cells, one can infer, for the rest of the cells, whether there are Copy Number Variations (CNVs) across the chromosomes. This comes in really handy to distinguish between tumor and healthy cells, provided that one has a reference CNV event to rely on. There are many tools to compute such an analysis, but one that is widely used, and the one covered in this section, is inferCNV.

InferCNV returns many outputs. The most interesting and straightforward one is the final image, such as the one below:

In it, we can observe that, for the different clusters, regions called as chromosome gains are colored in red and regions called as chromosome losses are colored in blue. This image is, in the end, a heatmap, which means that the output of inferCNV contains a matrix with the scores for each cell and gene that we can make use of. This object is called, by default, run.final.infercnv_obj, and it is the one that we will use.
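To make the idea of a score matrix more concrete, here is a minimal base R sketch on simulated data. It assumes a matrix with genes in rows and cells in columns. All object names (scores, gene_chr) and values are made up for illustration and do not come from a real inferCNV run.

```r
# Toy stand-in for an inferCNV score matrix: genes in rows, cells in columns.
# All names and values here are hypothetical, for illustration only.
set.seed(42)
genes <- paste0("gene_", 1:6)
cells <- paste0("cell_", 1:4)
scores <- matrix(data = rnorm(n = 24, mean = 1, sd = 0.05),
                 nrow = 6,
                 dimnames = list(genes, cells))

# Hypothetical gene-to-chromosome annotation.
gene_chr <- c(gene_1 = "11", gene_2 = "11", gene_3 = "11",
              gene_4 = "12", gene_5 = "12", gene_6 = "12")

# One score per cell for chromosome 11: average over the genes located there.
genes_chr11 <- names(gene_chr)[gene_chr == "11"]
chr11_score <- colMeans(scores[genes_chr11, , drop = FALSE])
chr11_score
```

Collapsing the matrix to one value per cell and chromosome is what makes it possible to show the scores on a UMAP.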

24.1 Transferring the scores to a FeaturePlot

One of the cool things we can do with this object is to transfer the inferCNV scores back to our Seurat object and then plot them as a FeaturePlot. This can be achieved with the function SCpubr::do_CopyNumberVariantPlot(). This function needs the Seurat object and the final inferCNV object, together with the chromosome locations. If metacells were computed (not necessary, but used in this example), the cell-to-metacell mapping has to be provided as well. Normally, you want to run it with chromosome_focus set to a given chromosome. If not, it computes the results in a chromosome-wise manner:

# This loads "human_chr_locations" into the environment.
utils::data("human_chr_locations", package = "SCpubr")

out <- SCpubr::do_CopyNumberVariantPlot(sample = sample,
                                        infercnv_object = infercnv_object,
                                        using_metacells = TRUE,
                                        metacell_mapping = sample$metacell_mapping,
                                        chromosome_locations = human_chr_locations,
                                        chromosome_focus = "11")
p <- out$`11_umap` | out$`11p_umap` | out$`11q_umap`
p
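When metacells are used, scores computed per metacell have to be brought back to the individual cells. The following base R sketch shows the general idea with a named lookup. It is an illustration under assumed data structures, not SCpubr's internal code, and all names and values are hypothetical.

```r
# Hypothetical scores computed at the metacell level.
metacell_scores <- c(metacell_1 = 0.95, metacell_2 = 1.10)

# Hypothetical mapping: names are cells, values are the metacell of each cell.
metacell_mapping <- c(cell_A = "metacell_1",
                      cell_B = "metacell_1",
                      cell_C = "metacell_2")

# Each cell inherits the score of the metacell it belongs to.
cell_scores <- metacell_scores[metacell_mapping]
names(cell_scores) <- names(metacell_mapping)
cell_scores
```

Cells that share a metacell end up with the same score, which is why such plots can look blocky compared to per-cell scores.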

Scores close to 1 mean no chromosome gain or loss. Scores higher than 1 mean a gain and scores lower than 1 mean a loss. By default, the function computes the plots for all chromosome regions. If we want to restrict the output to a single chromosome, we can do so by setting the chromosome_focus parameter:

out <- SCpubr::do_CopyNumberVariantPlot(sample = sample,
                                        infercnv_object = infercnv_object,
                                        using_metacells = TRUE,
                                        metacell_mapping = sample$metacell_mapping,
                                        chromosome_locations = human_chr_locations,
                                        chromosome_focus = "11")
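As a rough illustration of how to read these scores, the sketch below labels a few made-up values as loss, neutral or gain around 1. The tolerance of 0.05 is an arbitrary assumption chosen for this example. It is not a default of inferCNV or SCpubr.

```r
# Hypothetical per-cell scores for one chromosome arm.
scores <- c(cell_A = 0.91, cell_B = 1.00, cell_C = 1.02, cell_D = 1.12)

# Arbitrary tolerance around 1, chosen only for this illustration.
tolerance <- 0.05

# Bin the scores: below 1 - tolerance is a loss, above 1 + tolerance is a gain.
cnv_call <- cut(x = scores,
                breaks = c(-Inf, 1 - tolerance, 1 + tolerance, Inf),
                labels = c("loss", "neutral", "gain"))

data.frame(cell = names(scores), score = scores, call = cnv_call)
```

In practice, a sensible cutoff depends on the dataset and on the reference used, so it is better to look at the distribution of scores than to trust a fixed threshold.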

24.2 Plotting the scores grouped by a group of interest

Sometimes, however, the FeaturePlot is not sufficient to get the insights we want from these scores. For this, the output of SCpubr::do_CopyNumberVariantPlot() also contains a set of Geyser plots that showcase the distribution of scores per cell, grouped by a variable of interest.

out <- SCpubr::do_CopyNumberVariantPlot(sample = sample,
                                        infercnv_object = infercnv_object,
                                        using_metacells = TRUE,
                                        metacell_mapping = sample$metacell_mapping,
                                        chromosome_locations = human_chr_locations,
                                        chromosome_focus = "11",
                                        rotate_x_axis_labels = 45)
p <- out$`11p_geyser`
p

Here, we can observe the scores for the different groups, where each dot is a cell. Due to the overplotting, we also report the distribution of the data for each group in the center: the dot is the median of the distribution, the thicker line covers 66% of the data and the thinner line covers 95%. This way, one can also see where the majority of the cells reside in each group. We can also select other variables to group by.

Tip!

When using enforce_symmetry = TRUE, it can be the case that, for a given chromosome, the Y axis scale is completely driven by a single outlier. This can be fixed by adding your own set of limits using p + ggplot2::ylim(c(y_min, y_max)) and providing the values you want to set the Y axis to.
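The effect of setting the limits by hand can be tried on a toy ggplot2 plot first. The data frame below is simulated and the limits are arbitrary values picked for this example. The sketch also shows ggplot2::coord_cartesian() as an alternative, since ggplot2::ylim() drops the points that fall outside the limits, while coord_cartesian() only zooms in.

```r
library(ggplot2)

# Simulated scores around 1 with a single extreme outlier (hypothetical data).
set.seed(1)
df <- data.frame(group = rep(c("A", "B"), each = 50),
                 score = rnorm(n = 100, mean = 1, sd = 0.03))
df$score[1] <- 2

p <- ggplot(df, aes(x = group, y = score)) +
  geom_point()

# Arbitrary limits chosen for this example.
y_min <- 0.8
y_max <- 1.2

# Option 1: set the scale limits. Points outside the limits are removed.
p_limited <- p + ggplot2::ylim(c(y_min, y_max))

# Option 2: zoom without removing any data.
p_zoomed <- p + ggplot2::coord_cartesian(ylim = c(y_min, y_max))
```

Which option is preferable depends on whether the outlier should still count towards any summary drawn on the plot.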

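# Simulate a grouping variable. Here, base::sample() is the function and "sample" is the Seurat object.
sample$modified_orig.ident <- base::sample(x = c("Sample_A", "Sample_B", "Sample_C"),
                                           size = ncol(sample),
                                           replace = TRUE,
                                           prob = c(0.2, 0.7, 0.1))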

out <- SCpubr::do_CopyNumberVariantPlot(sample = sample,
                                        infercnv_object = infercnv_object,
                                        using_metacells = TRUE,
                                        group.by = "modified_orig.ident",
                                        metacell_mapping = sample$metacell_mapping,
                                        chromosome_locations = human_chr_locations,
                                        chromosome_focus = "11",
                                        rotate_x_axis_labels = 45)
p <- out$`11p_geyser`
p
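The summary drawn in the center of each group (median, 66% and 95% of the data) can also be computed as numbers. The sketch below does so in base R on simulated scores, assuming that the intervals are central quantile intervals. The data and the helper summarise_group() are hypothetical and not part of SCpubr.

```r
# Simulated scores for two groups (hypothetical data).
set.seed(123)
df <- data.frame(group = rep(c("Sample_A", "Sample_B"), each = 200),
                 score = c(rnorm(n = 200, mean = 1.0, sd = 0.02),
                           rnorm(n = 200, mean = 1.1, sd = 0.04)))

# Hypothetical helper: median plus central 66% and 95% quantile intervals.
summarise_group <- function(x) {
  c(lower_95 = unname(quantile(x, probs = 0.025)),
    lower_66 = unname(quantile(x, probs = 0.17)),
    median   = median(x),
    upper_66 = unname(quantile(x, probs = 0.83)),
    upper_95 = unname(quantile(x, probs = 0.975)))
}

# One row per group, one column per summary value.
summary_table <- t(sapply(split(df$score, df$group), summarise_group))
summary_table
```

A table like this is handy when the differences between groups are too small to judge by eye.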

24.3 Joint analysis

The ideal use case for these plots is to show them side by side with a regular UMAP:

p1 <- SCpubr::do_DimPlot(sample = sample,
                         plot_cell_borders = TRUE,
                         border.size = 1.5,
                         pt.size = 1)
out <- SCpubr::do_CopyNumberVariantPlot(sample = sample,
                                        infercnv_object = infercnv_object,
                                        using_metacells = TRUE,
                                        metacell_mapping = sample$metacell_mapping,
                                        chromosome_locations = human_chr_locations,
                                        chromosome_focus = "11",
                                        rotate_x_axis_labels = 45)
p2 <- out$`11p_umap`
p3 <- out$`11p_geyser`

p <- (p1 | p2) / p3
p

This way, we can see which clusters are in the UMAP, see their scores and also visualize the distributions!