Last updated: 2019-08-15

Checks: 7 0

Knit directory: BUS_notebooks_R/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20181214) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:

Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    BUS_notebooks_R.Rproj
    Ignored:    analysis/figure/
    Ignored:    data/whitelist_v3.txt.gz
    Ignored:    output/out_pbmc1k/

Untracked files:
    Untracked:  ensembl/
    Untracked:  output/neuron10k/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 1cafb2f Lambda Moses 2019-08-16 Rebuilt with Ensembl 97
Rmd 32adc25 Lior Pachter 2019-07-26 Update monocle3.Rmd
html 5ebb547 Lambda Moses 2019-07-23 Build site.
Rmd 5cb1030 Lambda Moses 2019-07-23 Don’t show progress bar of learn_graph
html 2e1a104 Lambda Moses 2019-07-23 Build site.
Rmd 846cd23 Lambda Moses 2019-07-23 Don’t show package loading messages
html 002b84d Lambda Moses 2019-07-23 Build site.
Rmd 84b7653 Lambda Moses 2019-07-23 Monocle 3 notebook


This notebook does pseudotime analysis of the 10x 10k neurons from an E18 mouse using Monocle 3, and starting with the kallisto | bustools workflow. Please refer to the Monocle 3 website for installation instruction. Gdal is required for Monocle 3 due to the sf dependency. The Python package louvain (which requires Python igraph, which requires compilation) is recommended, though not strictly required.

Monocle 3 has a few improvements compared to Monocle 2: Monocle 3 infers trajectory much faster, supports partitioning of the data into separate trajectories in case of different cell lineages, supports large datasets on disk, allows cycles in the trajectory, and supports 3D plotting. Monocle 3 has been used on over 2 million cells for the Mouse Organogenesis Cell Atlas (MOCA).

The gene count matrix of the 10k neuron dataset has already been generated with the kallisto | bustools pipeline and filtered for the Monocle 2 notebook. Cell types have also been annotated with SingleR in that notebook. Please refer to the first 3 main sections of that notebook for instructions on how to use kallisto | bustools, remove empty droplets, and annotate cell types. So this notebook will start with Monocle 3 analysis right away.

The package biomaRt is from Bioconductor. BUSpaRse and monocle3 are on GitHub. Packages tidyverse and ggsci (just for the D3 palette) are on CRAN.


Create a cell_data_set object

The filtered gene count matrix and the cell annotation were saved from the Monocle 2 notebook.

annot <- readRDS("./output/neuron10k/cell_type.rds")
mat_filtered <- readRDS("./output/neuron10k/mat_filtered.rds")

Just to show the structures of those 2 objects:

[1] 23516 11037
[1] "dgCMatrix"
[1] "Matrix"

Row names are Ensembl gene IDs.

[1] "ENSMUSG00000094619.2"  "ENSMUSG00000095646.1"  "ENSMUSG00000092401.3" 
[4] "ENSMUSG00000109389.1"  "ENSMUSG00000092212.3"  "ENSMUSG00000079293.11"
List of 10
 $ scores       : num [1:11037, 1:28] 0.188 0.194 0.182 0.267 0.187 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:28] "Adipocytes" "aNSCs" "Astrocytes" "Astrocytes activated" ...
 $ labels       : chr [1:11037, 1] "NPCs" "NPCs" "NPCs" "NPCs" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
 $ r            : num [1:11037, 1:358] 0.187 0.197 0.179 0.251 0.171 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:358] "ERR525589Aligned" "ERR525592Aligned" "SRR275532Aligned" "SRR275534Aligned" ...
 $ pval         : Named num [1:11037] 0.0428 0.058 0.0383 0.0111 0.0327 ...
 $ labels1      : chr [1:11037, 1] "NPCs" "NPCs" "NPCs" "NPCs" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
 $ labels1.thres: chr [1:11037] "NPCs" "X" "NPCs" "NPCs" ...
 $ quantile.use : num 0.8
 $ types        : chr [1:358] "Adipocytes" "Adipocytes" "Adipocytes" "Adipocytes" ...
 $ method       : chr "single"

Since we will do differential expression and gene symbols are more human readable than Ensembl gene IDs, we will get the corresponding gene symbols from Ensembl.

gns <- tr2g_ensembl(species = "Mus musculus", use_gene_name = TRUE, 
                    ensembl_version = 97)[,c("gene", "gene_name")] %>% 
Querying biomart for transcript and gene IDs of Mus musculus
Note: requested host was redirected from to
This often occurs when connecting to the archive URL for the current Ensembl release
You can check the current version number using listEnsemblArchives()
Cache found

Just like how the Seurat workflow is centered around the Seurat object, Monocle 3 workflow is centered around the cell_data_set object. To create a cell_data_set object, we at least need the gene count matrix, and optionally need cell and gene metadata. Note that this is different from the CellDataSet object for Monocle 2. Cell metadata must have the column cell, for cell names or barcodes. Gene metadata must have the id column for gene ID (here Ensembl gene ID), and a column gene_short_name, for gene symbols, is recommended and is required for some functions. In addition, cell metadata must have cell names as row names and gene metadata must have gene IDs as gene names when initializing the cell_data_set object.

cell_meta <- data.frame(cell = colnames(mat_filtered),
                        cell_type = annot$labels[match(colnames(mat_filtered), annot$cell.names)],
                        stringsAsFactors = FALSE)
rownames(cell_meta) <- colnames(mat_filtered)
gene_meta <- gns %>% 
  filter(gene %in% rownames(mat_filtered), ! %>% 
  rename(id = gene, gene_short_name = gene_name)
rownames(gene_meta) <- gene_meta$id
gene_meta <- gene_meta[rownames(mat_filtered),]
cds <- new_cell_data_set(mat_filtered, cell_metadata = cell_meta, gene_metadata = gene_meta)

Dimension reduction

Part of the analysis is quite similar to a basic Seurat analysis: beginning with data normalization, then PCA, and then clustering and non-linear dimension reduction. Here the preprocess_cds will log1p normalize (default, uses log2) the data and perform PCA (default) or LSI (latent semantic indexing).

# Run PCA
cds <- preprocess_cds(cds, method = "PCA", num_dim = 50)

Version Author Date
2e1a104 Lambda Moses 2019-07-23
plot_cells(cds, reduction_method = "PCA",
           color_cells_by = "cell_type", group_label_size = 3.5,
           label_groups_by_cluster = FALSE) +
  scale_color_d3(palette = "category20b")
No trajectory to plot. Has learn_graph() been called yet?

Version Author Date
2e1a104 Lambda Moses 2019-07-23
# Seed for random initiation of UMAP
cds <- reduce_dimension(cds, reduction_method = "UMAP", preprocess_method = "PCA", init = "random")
plot_cells(cds, color_cells_by = "cell_type", group_label_size = 3.5,
           label_groups_by_cluster = FALSE) +
  scale_color_d3(palette = "category20b")
No trajectory to plot. Has learn_graph() been called yet?

Version Author Date
2e1a104 Lambda Moses 2019-07-23

Monocle 3 does Louvain clustering and partitions the dataset for separate trajectories based on Louvain clustering.

cds <- cluster_cells(cds)
plot_cells(cds, color_cells_by = "partition", group_cells_by = "partition", 
           group_label_size = 4)
No trajectory to plot. Has learn_graph() been called yet?

Version Author Date
2e1a104 Lambda Moses 2019-07-23
plot_cells(cds, color_cells_by = "cluster", group_cells_by = "cluster", 
           group_label_size = 4)
No trajectory to plot. Has learn_graph() been called yet?

Version Author Date
002b84d Lambda Moses 2019-07-23

Trajectory inference

Monocle 3’s trajectory inference is inspired by PAGA.

cds <- learn_graph(cds, verbose = FALSE, 
                   learn_graph_control = list(minimal_branch_len = 7,
                                              geodesic_distance_ratio = 0.5))
Warning in louvain_clustering(data, pd[row.names(data), ], k = k, weight = weight, : RANN counts the point itself, k must be smaller than
the total number of points - 1 (all other points) - 1 (itself)!
plot_cells(cds, color_cells_by = "cell_type", label_groups_by_cluster = FALSE,
           group_label_size = 3.5, graph_label_size = 2) +
  scale_color_d3(palette = "category20b")

Version Author Date
2e1a104 Lambda Moses 2019-07-23

The cells are not ordered on the graph. To compute pseudotime, we should supply a root node or a vector of root cells that we know are the earliest stage of the trajectory. The function order_cells can also open a shiny app to choose root node interactively.

qnscs <- cell_meta$cell[cell_meta$cell_type == "qNSCs"]
cds <- order_cells(cds, root_cells = qnscs)
plot_cells(cds, color_cells_by = "pseudotime", label_branch_points = FALSE, label_leaves = FALSE, label_roots = FALSE)
Cells aren't colored in a way that allows them to be grouped.

Version Author Date
2e1a104 Lambda Moses 2019-07-23

For the trajectory for which pseudotime is computed, Monocle 3 can find genes differentially expressed along the trajectory.

de_res <- graph_test(cds, neighbor_graph = "principal_graph", cores = 3)

These are some of the most significant genes differentially expressed along the trajectory.

genes_plt <- c("Ube2c", "Mrps12", "Hspa12a", "Igfbp3")
plot_cells(cds, genes = genes_plt,
           show_trajectory_graph = FALSE,
           label_cell_groups = FALSE,
           label_leaves = FALSE)

Version Author Date
2e1a104 Lambda Moses 2019-07-23

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ggsci_2.9                   forcats_0.4.0              
 [3] stringr_1.4.0               dplyr_0.8.3                
 [5] purrr_0.3.2                 readr_1.3.1                
 [7] tidyr_0.8.3                 tibble_2.1.3               
 [9] ggplot2_3.2.1               tidyverse_1.2.1            
[11] BUSpaRse_0.99.18            biomaRt_2.41.7             
[13] monocle3_0.1.2              SingleCellExperiment_1.7.0 
[15] SummarizedExperiment_1.15.5 DelayedArray_0.11.4        
[17] BiocParallel_1.19.0         matrixStats_0.54.0         
[19] GenomicRanges_1.37.14       GenomeInfoDb_1.21.1        
[21] IRanges_2.19.10             S4Vectors_0.23.17          
[23] Biobase_2.45.0              BiocGenerics_0.31.5        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1         rprojroot_1.3-2         
  [3] XVector_0.25.0           fs_1.3.1                
  [5] proxy_0.4-23             rstudioapi_0.10         
  [7] ggrepel_0.8.1            bit64_0.9-7             
  [9] AnnotationDbi_1.47.0     lubridate_1.7.4         
 [11] xml2_1.2.2               codetools_0.2-16        
 [13] knitr_1.24               zeallot_0.1.0           
 [15] jsonlite_1.6             workflowr_1.4.0         
 [17] Rsamtools_2.1.3          broom_0.5.2             
 [19] dbplyr_1.4.2             uwot_0.1.3              
 [21] compiler_3.6.1           httr_1.4.1              
 [23] backports_1.1.4          assertthat_0.2.1        
 [25] Matrix_1.2-17            lazyeval_0.2.2          
 [27] cli_1.1.0                htmltools_0.3.6         
 [29] prettyunits_1.0.2        tools_3.6.1             
 [31] igraph_1.2.4.1           gtable_0.3.0            
 [33] glue_1.3.1               GenomeInfoDbData_1.2.1  
 [35] RANN_2.6.1               reshape2_1.4.3          
 [37] rappdirs_0.3.1           Rcpp_1.0.2              
 [39] cellranger_1.1.0         vctrs_0.2.0             
 [41] Biostrings_2.53.2        nlme_3.1-141            
 [43] rtracklayer_1.45.2       DelayedMatrixStats_1.7.1
 [45] xfun_0.8                 plyranges_1.5.12        
 [47] rvest_0.3.4              irlba_2.3.3             
 [49] ensembldb_2.9.2          XML_3.98-1.20           
 [51] zlibbioc_1.31.0          scales_1.0.0            
 [53] BSgenome_1.53.0          hms_0.5.0               
 [55] ProtGenerics_1.17.2      AnnotationFilter_1.9.0  
 [57] yaml_2.2.0               curl_4.0                
 [59] memoise_1.1.0            gridExtra_2.3           
 [61] stringi_1.4.3            RSQLite_2.1.2           
 [63] GenomicFeatures_1.37.4   rlang_0.4.0             
 [65] pkgconfig_2.0.2          bitops_1.0-6            
 [67] evaluate_0.14            lattice_0.20-38         
 [69] GenomicAlignments_1.21.4 labeling_0.3            
 [71] bit_1.1-14               tidyselect_0.2.5        
 [73] RcppAnnoy_0.0.12         plyr_1.8.4              
 [75] magrittr_1.5             R6_2.4.0                
 [77] generics_0.0.2           DBI_1.0.0               
 [79] pillar_1.4.2             haven_2.1.1             
 [81] whisker_0.3-2            withr_2.1.2             
 [83] RCurl_1.95-4.12          modelr_0.1.5            
 [85] crayon_1.3.4             BiocFileCache_1.9.1     
 [87] rmarkdown_1.14           viridis_0.5.1           
 [89] progress_1.2.2           grid_3.6.1              
 [91] readxl_1.3.1             data.table_1.12.2       
 [93] blob_1.2.0               git2r_0.26.1            
 [95] digest_0.6.20            openssl_1.4.1           
 [97] RcppParallel_4.4.3       munsell_0.5.0           
 [99] viridisLite_0.3.0        askpass_1.1