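Highlights of the paper:
The data are managed and analyzed by level. The paper covers different stages, and each stage has different subtypes, so the structure should be a true tree rather than just a dendrogram. The dendrogram that appears most often in the paper merges all the stages together.
Spatial assignment and taxonomy: mapped cell types spatially and derived a hierarchical, data-driven taxonomy
Neurons are grouped by anatomical unit and neurotransmitter type: Neurons were the most diverse and were grouped by developmental anatomical units and by the expression of neurotransmitters and neuropeptides.
Background: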
Neuronal diversity was driven by genes encoding cell identity, synaptic connectivity, neurotransmission, and membrane conductance.
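Watch a few introductory videos first; otherwise most of the terms in the paper will be unfamiliar.
The paper concentrates on the CNS and on development, so it spends a lot of space using a dendrogram to present the taxonomy. In my view this dendrogram is actually a failure: it puts too much weight on subtype and ignores stage. That is also a limitation of the algorithm.
Managing data at this scale is genuinely hard. If the management structure is poor, every later analysis becomes a struggle. The data management strategy in this paper is worth learning from.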
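To make the "true tree" idea concrete, here is a minimal sketch of a two-level layout, stage first and subtype second. It is not the paper's own data structure; the function names, stage labels, and cell records are hypothetical and only illustrate why keeping stage as an explicit level makes per-level summaries easy.

```python
from collections import defaultdict


def build_tree(cells):
    """Group cell IDs into a stage -> subtype -> [cell IDs] tree."""
    tree = defaultdict(lambda: defaultdict(list))
    for cell_id, stage, subtype in cells:
        tree[stage][subtype].append(cell_id)
    # Convert to plain dicts so that a missing key raises KeyError.
    return {stage: dict(subtypes) for stage, subtypes in tree.items()}


def count_by_level(tree):
    """Return cell counts per stage and per (stage, subtype) pair."""
    per_stage = {
        stage: sum(len(ids) for ids in subtypes.values())
        for stage, subtypes in tree.items()
    }
    per_subtype = {
        (stage, subtype): len(ids)
        for stage, subtypes in tree.items()
        for subtype, ids in subtypes.items()
    }
    return per_stage, per_subtype


# Hypothetical toy records: (cell ID, stage, subtype).
cells = [
    ("c1", "E13.5", "neuroblast"),
    ("c2", "E13.5", "glia"),
    ("c3", "P21", "neuron"),
    ("c4", "P21", "neuron"),
]
tree = build_tree(cells)
per_stage, per_subtype = count_by_level(tree)
```

With stage as the top level, a subtype is always read in the context of its stage, which is the information a single merged dendrogram loses.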
Molecular Architecture of the Mouse Nervous System
- Sten Linnarsson
Mouse brain atlas
- the corresponding database
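1. Which cells were sequenced, and how many?
The focus is clearly the mouse nervous system. At which stage? In summary, male and female mice were postnatal ages P12-30, as well as 6 and 8 weeks old. At these ages the CNS and PNS are already established; the paper calls this the adolescent mouse nervous system. About 500,000 cells were sequenced. The most recent dataset, spanning embryonic days E9.5-E13.5, profiled about 2 million.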
We performed a comprehensive survey of the adolescent mouse nervous system by scRNA-seq. We dissected the brain and spinal cord into contiguous anatomical regions and further included the peripheral sensory, enteric, and sympathetic nervous system. In total, we analyzed 19 regions (Figure 1A) but omitted at least the retina, the olfactory epithelium, the vomeronasal organ, the inner ear, and the parasympathetic ganglia.
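The tissue was not pooled for sequencing. It was dissected by anatomical region before library preparation, giving 19 regions, although some specific regions were left out.
Overall, the CNS, the PNS, and the ENS were sequenced.
The sequencing used commercial droplet microfluidics (10X Genomics Chromium), which presumably cost a lot. SPLiT-seq, a cheaper split-pool barcoding method, appeared around the same time, but it came from a different lab, not from this group.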
A Molecular Survey of the Mouse Nervous System
In addition, the dataset was affected by a number of technical artifacts, including low-quality cells, batch effects, sex-specific gene expression, neuronal-activity-dependent gene expression, and more.
To overcome these challenges, we developed a multistage analysis pipeline called “cytograph,” which progressively discovers cell types or states while mitigating the impact of technical artifacts
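I am very interested in cytograph, their integrated analysis pipeline, but the paper says little about it because it has not been published yet. In an era when R dominates bioinformatics, this group still sticks with Python.
The group also insists on loom, which is genuinely inconvenient: it means crossing platforms (Python versus R) and learning yet another tool.
Update: after looking into it, loom does have advantages. It is a database-like data structure designed for very large datasets. The data live on disk because they do not fit in memory, and they are read only when needed, chunk by chunk.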
Loom files are stored on disk and are never loaded entirely. They are more like databases: you connect, retrieve some subset of the data, maybe update some attributes.
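A loom file is best treated as a final, read-mostly artifact; editing one is very inconvenient.
The practical conflict is that all of our analyses run in R with R packages, while loom's reference library (loompy) is written in Python and R support is poor. Exporting everything out of a loom file takes real effort.
The code below runs on an ordinary computer without any risk of running out of memory.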
import loompy
import pandas as pd

# Open read-only; the expression matrix stays on disk.
with loompy.connect("l5_all.loom", "r") as ds:
    # ds.ca.keys() lists all column (per-cell) attributes.
    # A boolean mask such as ds.ca.Tissue == "ENS" selects one tissue.
    # Only the small per-cell annotation vectors are read into memory.
    data = {
        "CellID": ds.ca.CellID,
        "ClusterName": ds.ca.ClusterName,
        "Clusters": ds.ca.Clusters,
        "Tissue": ds.ca.Tissue,
    }

df = pd.DataFrame(data)
df.to_csv("cellID.clusterName.csv")
Postnatal Neurogenesis in the Central Nervous System
Astroependymal Cells Are Diverse and Spatially Patterned
Loss of Patterning in the Oligodendrocyte Lineage and Convergence to a Single Brain-wide Intermediate State
Vascular Cells and a Family of Broadly Distributed Mesothelial Fibroblasts
Neural-Crest-Derived Glia and Oligodendrocyte Progenitors
Peripheral Nervous System
Central Nervous Systems Neurons
Spatial Distributions Reflect Molecular Diversity
Drivers of Neuronal and Glial Diversity
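The postnatal neurogenesis data in this paper are very useful to us!
Reading and manipulating loom data
The R package loomR is awkward to use, and the format conventions are not consistent enough across tools.
Download the file for the ENS, the tissue I am interested in: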
http://mousebrain.org/tissues.html
The cell annotations are extracted from the full loom file (as above).
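Once the annotation table has been exported, the ENS subset can be summarized with the standard library alone. This is a minimal sketch that assumes a CSV with `Tissue` and `ClusterName` columns, as written by the export step above; the function name is hypothetical.

```python
import csv
from collections import Counter


def cluster_counts(csv_path, tissue):
    """Count cells per ClusterName for one tissue in the exported annotation table."""
    counts = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if row["Tissue"] == tissue:
                counts[row["ClusterName"]] += 1
    return counts
```

For example, `cluster_counts("cellID.clusterName.csv", "ENS").most_common(5)` would list the five largest ENS clusters. The same CSV can be read directly in R with `read.csv`, which sidesteps loomR entirely.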