Complex organisms need many different kinds of cells to carry out their functions. Different cell types express different genes; the genes that are active in a heart muscle cell, for example, differ from those expressed in a neuron in the brain. A cell's identity can thus be determined by analyzing which of its genes are active, measured at the single-cell level. Identifying which cell types are present in a tissue, and how they behave, is important for understanding health and disease. Existing methods for classifying cells work well for cell types that are already well characterized, but they are less effective at finding new ones.
Researchers reporting in Nature Methods have now created a tool called the Single Cell Clustering Assessment Framework (SCCAF) to overcome this problem. Gene expression patterns can be used to cluster cells of the same type together, and the researchers built their clustering algorithm on that principle. The computational technique can replicate the time-consuming manual work that has typically been used to characterize cell types and identify new ones.
The method groups cells into clusters, and each cluster is then divided into a training set and a testing set. A model is trained on the training set to recognize each cluster, and it then predicts which cluster each cell in the testing set belongs to.
"The model repeats the training and testing steps for each cell cluster, gradually merging indistinguishable clusters, until its accuracy reaches a good enough level. Finally, our Single Cell Clustering Assessment Framework lists a set of feature genes to characterize each annotated cluster," explained the first author of the study Dr. Zhichao Miao of the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and the Wellcome Sanger Institute.
The researchers have validated the method and say it runs quickly.
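The loop Miao describes (split each cluster, train, test, merge what cannot be told apart, repeat) can be sketched in a few lines of Python. The version below is only an illustration under stated assumptions, not the authors' implementation: it swaps in a simple nearest-centroid classifier as the model, and the function names, the `min_accuracy` threshold of 0.9, and the synthetic two-gene dataset are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_cluster(labels, rng, test_fraction=0.5):
    """Split the cell indices of every cluster into training and testing sets."""
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    train, test = [], []
    for idxs in by_cluster.values():
        rng.shuffle(idxs)
        cut = max(1, int(len(idxs) * (1 - test_fraction)))  # keep >= 1 training cell
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test


def fit_centroids(cells, labels, train):
    """Stand-in model (an assumption): the mean expression vector per cluster."""
    grouped = defaultdict(list)
    for i in train:
        grouped[labels[i]].append(cells[i])
    return {lab: [sum(col) / len(col) for col in zip(*rows)]
            for lab, rows in grouped.items()}


def predict(cell, centroids):
    """Assign a cell to the cluster whose centroid is nearest (squared distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(cell, centroids[lab])))


def assess_and_merge(cells, labels, min_accuracy=0.9, seed=0):
    """Repeat train/test, merging the most-confused cluster pair each round,
    until test accuracy reaches min_accuracy (a hypothetical threshold)."""
    rng = random.Random(seed)
    labels = list(labels)
    while True:
        train, test = split_by_cluster(labels, rng)
        if not test:
            raise ValueError("every cluster needs at least two cells")
        centroids = fit_centroids(cells, labels, train)
        confusion = defaultdict(int)  # unordered cluster pair -> number of mix-ups
        correct = 0
        for i in test:
            guess = predict(cells[i], centroids)
            if guess == labels[i]:
                correct += 1
            else:
                confusion[frozenset((labels[i], guess))] += 1
        accuracy = correct / len(test)
        if accuracy >= min_accuracy or not confusion:
            return labels, accuracy
        keep, drop = sorted(max(confusion, key=confusion.get))
        labels = [keep if lab == drop else lab for lab in labels]


def make_demo_data(seed=1):
    """Synthetic data: two real populations, one of them split arbitrarily in two."""
    rng = random.Random(seed)
    cells, labels = [], []
    for _ in range(60):
        cells.append([rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append("A")
    for i in range(120):
        cells.append([rng.gauss(10, 1), rng.gauss(10, 1)])
        labels.append("B1" if i % 2 == 0 else "B2")  # arbitrary over-clustering
    return cells, labels


if __name__ == "__main__":
    cells, labels = make_demo_data()
    merged, accuracy = assess_and_merge(cells, labels)
    print(sorted(set(merged)), round(accuracy, 3))
```

In this toy dataset the clusters "B1" and "B2" are drawn from the same population, so the model cannot tell them apart and the loop merges them, while the genuinely distinct cluster "A" is left alone. The final step Miao mentions, listing feature genes for each cluster, is not covered by this sketch.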
"We've tested the method on many existing large-scale datasets of human and mouse gene expression, treating human annotation as a gold standard. Our method can reproduce human annotation in an automated manner. By minimizing human involvement in data processing, we solve the most important bottleneck in high-throughput projects, such as the Human Cell Atlas," said the senior study author Dr. Alvis Brazma, a Functional Genomics Senior Team Leader at EMBL-EBI.
"The Human Cell Atlas initiative is a global consortium to map every cell type in the human body, to understand health and disease. The new automated cell-clustering method will enable us to identify cell types much more easily than before, helping us expand our understanding of cellular function and diversity," added Dr. Sarah Teichmann, a senior author from the Wellcome Sanger Institute, and co-chair of the Human Cell Atlas Organizing Committee.