① Tong Xiaolin, personal communication.
(3) Through the integration of disciplines, life science research has progressed from qualitative description toward dynamic, accurate and quantitative interpretation. The complexity of life processes arises from the genetic variation of living systems, the dynamic changes of metabolism and regulation in time and space, and the flexibility of living matter. Much of the accumulated knowledge in the life sciences is an integration of a large number of qualitative fragments. The creation and application of physical and chemical methods and technical platforms, such as super-resolution microscopy, cryo-electron microscopy, mass cytometry, mass spectrometry, magnetic resonance imaging, enhanced Raman spectroscopy, patch clamp, optical tweezers, nanopore sequencing, nanoscale and molecular biosensing, micro total analysis systems (µTAS), organ-on-a-chip, and 3D bioprinting, provide increasingly powerful tools for life science research, enabling single-cell, visualized, high-throughput analysis and manipulation with spatio-temporal resolution. High-resolution brain mapping [26, 27], single-cell transcriptomes [28, 29], single-cell proteomes [30], embryonic cell lineages [31], protein 3D structure determination in living cells [32], single-particle virus tracking in living cells [33, 34], and multi-organ interaction on chips and organoid creation [35, 36] have all been achieved. Living systems can now be accurately, quantitatively and visually characterized, and even successfully simulated, at the microscopic level.
(4) Scientific data sharing has become a general rule followed by the life science community. Life science databases of all kinds, with gene databases and protein structure databases at their core, play a major role in modern life science research. Database builders and the scientific community have established a principle: researchers who use a database also deposit their own findings (gene sequences or protein structures) in it, thereby becoming both users of and contributors to the database. Today, these databases have become the most reliable record of the history of life and a powerful data analysis platform on which the entire life science research community relies. For example, since the start of the COVID-19 pandemic, more than 10 million genome sequences of the novel coronavirus have been generated. The data are released in real time through the Global Initiative on Sharing All Influenza Data (GISAID), the National Genomics Data Center of the China National Center for Bioinformation (CNCB), the National Center for Biotechnology Information (NCBI), and the European Bioinformatics Institute (EBI), among others. They have provided a basis for research on pathogen biology and molecular epidemiology, for the establishment of detection technology, and for the research and development of drugs and vaccines, and have played a major role in the global scientific and technological response to the epidemic.
(5) The rise of synthetic biology and artificial intelligence (AI) has provided a new paradigm for life science research. ① Synthetic biology emerged at the start of the 21st century [37, 38]. It brings together life science, physics, chemistry, materials science, and computer and information science, and combines engineering concepts with automation technology to redesign and synthesize organisms [39]. Its "bottom-up" model proceeds from characterizing natural biological macromolecules as standardized "components", to creating biological "modules", "circuits" and cell "chassis", to building the intended artificial living system and studying the underlying laws of life. This concept has raised the research strategy we are accustomed to, "knowledge through investigating objects", to a new height: "knowledge through creating objects" [40]. However, given the complexity of biological systems, the rational design of biological systems currently relies on high-throughput "trial and error" experiments, which has led to the emergence of the "biofoundry", an automated facility for biological design and synthesis. On the same basis, another concept of synthetic biology, "creation for use", is giving birth to future biotechnology. ② Built on big data, algorithms and machine learning, the most typical example of AI applied to the life sciences is the prediction of the 3D structure of proteins. For a long time, progress in protein structure prediction was very slow. For a protein of unknown structure, if no structure of a homologous protein was available, its structure had to be determined by experiment. After AlphaFold, from Google's DeepMind team, came to prominence in the biennial Critical Assessment of protein Structure Prediction (CASP), the team published AlphaFold2 in Nature in 2021 and released its source code [41].
At the same time, a team led by the University of Washington in the United States published a new deep learning tool, RoseTTAFold, in Science [42]. AlphaFold2 was then used to predict 3D structures for 98.5% of human proteins, many of them with high confidence [43]. Further, the DeepMind team announced the AlphaFold Protein Structure Database, which expanded the structural coverage of the known protein sequence space to an unprecedented extent. The initial version of the database contains more than 360,000 predicted structures across the proteomes of 21 model organisms and will soon be expanded to cover the majority (more than 100 million) of the representative sequences of the UniRef90 dataset (protein sequences clustered at 90% identity) [44]. These advances are disruptive to structural biology, in two respects: (1) protein 3D structure data will grow exponentially, thereby providing a better data basis for machine learning and enabling the current quality defects of AI structure prediction to be solved one by one; (2) since protein structure and function are fundamental scientific issues in molecular cell biology, the relevant advances will certainly have a profound impact on the life sciences.