8  Bar plots

Bar plots are another well-known data representation. They are a handy way to plot summary statistics during the QC part of any single-cell analysis, and SCpubr provides them through SCpubr::do_BarPlot(). Let’s say we are interested in plotting the number of cells in each cluster.

8.1 Basic usage

# Basic bar plot with the default orientation (vertical bars).
p1 <- SCpubr::do_BarPlot(sample = sample, 
                         group.by = "seurat_clusters", 
                         legend.position = "none", 
                         plot.title = "Number of cells per cluster")

# Same plot with flip = TRUE, which swaps the axes (horizontal bars).
p2 <- SCpubr::do_BarPlot(sample = sample, 
                         group.by = "seurat_clusters", 
                         legend.position = "none",
                         plot.title = "Number of cells per cluster", 
                         flip = TRUE)
# Arrange both plots side by side (the `|` operator is provided by patchwork).
p <- p1 | p2
p

Basic bar plot.

Calling SCpubr::do_BarPlot() with only group.by yields a simple bar plot in which the bars are ordered by descending value. The direction of the bars can be changed with flip = TRUE/FALSE; by default, the bars are vertical. These plots rely on one underlying assumption:

  • The value passed to group.by needs to be a metadata variable stored in object@meta.data, and it has to be either a character or a factor column.
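
The assumption above can be checked before plotting. The following is a minimal sketch in base R, not part of SCpubr: the data frame `meta` is a hypothetical stand-in for object@meta.data, and its values are made up for illustration. It converts a numeric cluster column into a factor and tabulates the number of cells per cluster in descending order, which is the quantity the basic bar plot displays.

```r
# Toy stand-in for object@meta.data (hypothetical values, for illustration only).
meta <- data.frame(seurat_clusters = c(0, 0, 0, 1, 1, 2, 2, 2, 2, 3))

# A numeric column does not satisfy the assumption: convert it to a factor first.
if (!is.character(meta$seurat_clusters) && !is.factor(meta$seurat_clusters)) {
  meta$seurat_clusters <- factor(meta$seurat_clusters)
}

# Number of cells per cluster, sorted in descending order.
cells_per_cluster <- sort(table(meta$seurat_clusters), decreasing = TRUE)
print(cells_per_cluster)
```

With a real Seurat object, the same check would be applied to the corresponding column of its metadata data frame.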

8.2 Grouping by a second variable

Let’s expand on the previous example of the number of cells per cluster. What if we were interested not only in that, but also in how many cells from each cluster come from each of the samples present in the Seurat object? For this, we need to provide SCpubr::do_BarPlot() with a second parameter, split.by, which controls how each bar is subdivided:

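# Simulate a sample-of-origin metadata column by randomly assigning each cell
# to one of three samples. Note that `sample` names both the Seurat object and
# the base R function; in a call, R resolves `sample(...)` to the function.
sample$modified_orig.ident <- sample(x = c("Sample_A", "Sample_B", "Sample_C"), 
                                     size = ncol(sample), 
                                     replace = TRUE, 
                                     prob = c(0.2, 0.7, 0.1))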

# Split by a second variable.
p1 <- SCpubr::do_BarPlot(sample,
                         group.by = "seurat_clusters",
                         split.by = "modified_orig.ident",
                         plot.title = "Number of cells per cluster in each sample",
                         position = "stack")

p2 <- SCpubr::do_BarPlot(sample, 
                         group.by = "modified_orig.ident", 
                         split.by = "seurat_clusters",
                         plot.title = "Number of cells per sample in each cluster",
                         position = "stack")
p <- p1 | p2
p

Split the bars by another variable.

As we can see, this yields as many bars as there are unique values in group.by, and each bar is segmented into as many parts as there are unique values in split.by. This can be hard to grasp at first, but it helps to think of these two parameters, when used together, as:

  • group.by: The variable I want to show as different bars; the height of each bar is the total number of cells.
  • split.by: The secondary variable by which the bars generated by group.by are further subdivided.
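
The relationship between the two parameters can be sketched as a cross-tabulation. This is a base R illustration under stated assumptions, not SCpubr's internal code: `meta` is a hypothetical toy metadata table with made-up values, where rows of the resulting table play the role of the bars and columns play the role of the segments within each bar.

```r
# Toy metadata (hypothetical values, for illustration only).
meta <- data.frame(
  seurat_clusters     = factor(c("0", "0", "0", "1", "1", "2")),
  modified_orig.ident = factor(c("Sample_A", "Sample_B", "Sample_B",
                                 "Sample_A", "Sample_B", "Sample_B"))
)

# Rows play the role of group.by (one bar each);
# columns play the role of split.by (one segment per bar).
counts <- table(meta$seurat_clusters, meta$modified_orig.ident)
print(counts)

# Total height of each bar: the sum of its segments.
bar_totals <- rowSums(counts)
print(bar_totals)
```

Swapping the two variables transposes the table, which matches the two plots above: the same counts are shown, but what was a segment becomes a bar and vice versa.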

Another interesting parameter introduced in the last example is position, which can be either stack or fill. With position = "stack", each bar shows the total number of cells for the corresponding unique value in group.by. With position = "fill", all bars are brought to the same height and each bar is split into the proportions of the different groups within it (only one group if split.by = NULL, and as many groups as unique values in split.by otherwise). It is therefore highly recommended to use position = "stack" when split.by is not used and position = "fill" otherwise. The package also issues a warning about this; to silence the warnings, use verbose = FALSE.
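
The difference between the two positions comes down to counts versus within-bar proportions. The sketch below uses base R on a small, made-up count matrix (the values and the names `stacked` and `filled` are hypothetical) and is not how SCpubr computes the plot internally; it only shows the arithmetic behind each option.

```r
# Made-up counts: rows are bars, columns are the segments within each bar.
counts <- matrix(c(10, 30,
                   20, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(cluster = c("0", "1"),
                                 sample  = c("Sample_A", "Sample_B")))

# "stack": segments keep their raw counts, so bar heights differ.
stacked <- counts
print(rowSums(stacked))

# "fill": each segment is divided by its bar total, so every bar sums to 1.
filled <- prop.table(counts, margin = 1)
print(filled)
print(rowSums(filled))
```

With a single segment per bar, the proportions are all equal to 1 and every bar looks identical, which is why the fill position is only informative when the bars are subdivided.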

# Position stack and fill with and without split.by.
p1 <- SCpubr::do_BarPlot(sample, 
                         group.by = "seurat_clusters",
                         plot.title = "Without split.by - position = stack",
                         position = "stack",
                         flip = FALSE)

p2 <- SCpubr::do_BarPlot(sample, 
                         group.by = "seurat_clusters",
                         plot.title = "Without split.by - position = fill",
                         position = "fill",
                         flip = FALSE)

p3 <- SCpubr::do_BarPlot(sample, 
                         group.by = "seurat_clusters",
                         split.by = "modified_orig.ident",
                         plot.title = "With split.by - position = stack",
                         position = "stack",
                         flip = FALSE)

p4 <- SCpubr::do_BarPlot(sample, 
                         group.by = "seurat_clusters",
                         split.by = "modified_orig.ident",
                         plot.title = "With split.by - position = fill",
                         position = "fill",
                         flip = FALSE)
p <- (p1 | p2) / (p3 | p4)
p

Use position stack or position fill.