(Correspondent Mingyue Cheng) April 26, 2022. Professor Kang Ning’s team at the Department of Systems Biology and Bioinformatics, College of Life Science and Technology, Huazhong University of Science and Technology, has published a paper entitled “Ontology-Aware Deep Learning Enables Ultrafast and Interpretable Source Tracking among Sub-Million Microbial Community Samples from Hundreds of Niches” in the international journal Genome Medicine, with Huazhong University of Science and Technology as the first institution. In this paper, the team proposes a deep learning method based on biome ontology that solves the problem of rapid and accurate microbial source tracking among sub-million microbial community samples from hundreds of niches.
The taxonomic structure of a microbial community sample is highly habitat-specific, which makes source tracking possible: the niches from which a sample originates can be identified from its composition. Microbial source tracking is an important problem in microbiome research and plays an important role in human health and environmental monitoring. However, current methods face challenges when source tracking is scaled up. At large scale, existing methods must trade accuracy against efficiency, which makes knowledge discovery in large-scale microbial source tracking particularly difficult. More effective microbial source tracking methods are therefore urgently needed.
The rapid accumulation of microbial community samples has provided the opportunity to investigate the interactions among microbes, human health and the environment. However, integrative, large-scale and scalable investigations remain understudied. Such investigation is challenging for several reasons. First, as the number of samples easily exceeds millions and the number of niches exceeds hundreds, microbial source tracking has already become a very complex task. Second, the noise present in data drawn from many sources might hide important patterns from traditional methods. Moreover, because many biomes depend on one another, previous models would be theoretically inapplicable.
In this work, the authors developed ONN4MST, an Ontology-aware Neural Network (ONN) deep learning model for microbial source tracking. The ONN model uses biome ontology information to model the dependencies between biomes and to estimate the proportion of each biome in a community sample. ONN4MST is trained on a large dataset (125,823 samples from 114 biomes), which makes it applicable to source tracking of samples from many biomes. ONN4MST provides an ultrafast and accurate solution for searching a sample against a dataset containing hundreds of potential biomes and millions of samples, and it outperforms state-of-the-art methods in scalability and stability. ONN4MST’s capacity for knowledge discovery is also demonstrated in various source tracking applications, including the detection of microbial contaminants and the investigation of the taxonomic structure of community samples from less-studied biomes.
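To illustrate what it means for biomes to depend on one another through an ontology, the following is a minimal Python sketch. It is not the ONN4MST implementation: the biome names, the `PARENT` mapping and the `roll_up` function are hypothetical and chosen only for illustration. It assumes that the biome ontology is a tree and shows that, once leaf-level source proportions are estimated, the contribution of every higher-level biome follows from the hierarchy.

```python
from collections import defaultdict

# Hypothetical toy biome ontology, stored as child -> parent.
# The names are illustrative and are not taken from the paper.
PARENT = {
    "Host-associated": "root",
    "Environmental": "root",
    "Human": "Host-associated",
    "Human gut": "Human",
    "Human skin": "Human",
    "Aquatic": "Environmental",
    "Marine": "Aquatic",
}


def roll_up(leaf_proportions):
    """Propagate leaf-level source proportions to every ancestor biome."""
    totals = defaultdict(float)
    for biome, share in leaf_proportions.items():
        node = biome
        # Walk from the leaf up to (but not including) the root,
        # adding this leaf's share to each biome on the path.
        while node != "root":
            totals[node] += share
            node = PARENT[node]
    return dict(totals)


# Assumed example estimate for a single community sample.
estimate = {"Human gut": 0.6, "Human skin": 0.1, "Marine": 0.3}
by_biome = roll_up(estimate)

for biome, share in sorted(by_biome.items()):
    print(f"{biome}: {share:.2f}")
```

In this toy setting, a sample cannot be attributed to "Human gut" without also being attributed to "Human" and "Host-associated"; this is the kind of dependency between biomes that an ontology makes explicit.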
Doctoral student Yuguo Zha and undergraduate student Hui Chong, both from the College of Life Science and Technology, Huazhong University of Science and Technology, are the co-first authors of this paper. Professor Kang Ning from Huazhong University of Science and Technology and Professor Xuefeng Cui from Shandong University are the co-corresponding authors. This work was supported by the National Natural Science Foundation of China (Grant Nos. 32071465, 31871334, 31671374, 81774008, 81573702, and 62072283) and the National Key R&D Program of China (Grant Nos. 2021YFA0910500 and 2018YFC0910502).
This work is an important research achievement of Professor Kang Ning’s team in the field of microbiome big data and artificial intelligence. Other work in this series includes “Microbial Dark Matter: From Discovery to Applications”, published in April 2022 in Genomics Proteomics & Bioinformatics, which proposed the concept of “Microbiome Dark Matter”. That paper indicates that microbiome studies have revealed a large number of new genes, species, and community spatial-temporal dynamics, which together constitute the dark matter of the microbiome. It points out that understanding this dark matter is not only a challenge but also an opportunity for computational microbiologists to explore large data sets, with the aim of better understanding the microbiome and finding better solutions to current global concerns for human health and the environment. In addition, another paper, entitled “Ontology-aware neural network: a general framework for pattern mining from microbiome data”, was published in January 2022 in Briefings in Bioinformatics. It reviews the advantages of the Ontology-aware Neural Network (ONN) as a general framework for mining microbiome dark matter data and lays a foundation for the next generation of general microbiome big data mining methods.
Professor Ning has been engaged in research on microbiome big data and artificial intelligence for many years and has published many papers in high-level academic journals such as PNAS, Gut, Genome Biology, Genome Medicine, and Nucleic Acids Research. He serves on the editorial boards of Microbiology Spectrum, Genomics Proteomics & Bioinformatics, Scientific Reports and other international journals. He is the deputy director of the Genome Informatics Branch of the Chinese Society of Bioinformatics (preparatory), a member of the Computational Biology and Bioinformatics Committee of the Chinese Society of Biotechnology, and a member of the Bioinformatics Committee of the China Computer Federation. Graduate students and postdoctoral researchers are welcome to join the team. Team website: http://www.microbioinformatics.org/.