1 Overview of development trends in life science: three "50-year" stages
Over the past 150 years of life science, a landmark breakthrough has occurred about once every half century (Figure 1), dividing the history of the field into three stages. In the mid-19th century, Mendel discovered the basic laws of inheritance through his pea-breeding experiments and proposed the hypothesis of genetic factors (1865). Morgan later located these genetic factors on chromosomes through his studies of fruit flies (1910). Together, their work laid the foundation of classical genetics, which constitutes the first stage. In the mid-20th century, Watson and Crick discovered the double helix structure of DNA (1953), ushering in the era of molecular genetics and molecular biology, the second stage. During this period, the genetic code was deciphered, the central dogma (DNA-RNA-protein) was established, and genetic engineering gave rise to biotechnology for the benefit of mankind. At the turn of the 21st century, with the implementation and completion of the "Human Genome Project", life science entered the era of omics and systems biology, opening the third stage. A large number of complex life processes and disease mechanisms have been clarified. Genome sequencing, synthesis and editing, combined with artificial intelligence, are writing a new "read - edit - write" chapter for the genome, and scientists have begun to create synthetic life and precisely regulate life processes.
These three stages are also three interlinked rounds of revolution in life science, each marked by a profound change in the research paradigm: from the observation and description of phenotypic traits and their inheritance, to the molecular characterization of life processes and their correlations, and then to systems biology characterized by omics. These changes have influenced all fields of life science research, led to comprehensive progress in medicine, agricultural biology and related technologies, and contributed greatly to human health and to economic and social development.
2 Characteristics of contemporary life sciences
Driven by the new scientific and technological revolution, life science presents the following five characteristics.
(1) Original discoveries emerge in an endless stream, and underlying innovations are surging. Molecular cell biology has permeated the entire discipline system of life science and has become the foundation and pillar of all its basic and applied disciplines, driving a continuous flow of original discoveries and underlying innovations. For example, powerful gene editing techniques [1, 2] stem from the discovery of CRISPR, an adaptive immune mechanism that bacteria and archaea evolved to cope with viral (phage) infection [3, 4]. The polymerase chain reaction (PCR), which changed the face of molecular biology, built on the discovery of thermostable DNA polymerase in extremophiles [5]. The discovery of the immune checkpoints CTLA-4 and PD-1 [6, 7], together with research in cellular immunology, led to the rapid development of tumor immunotherapy [8-10], which is upending the traditional model of cancer treatment. The discovery of the RNA interference (RNAi) mechanism [11] opened the gene-silencing route to treating genetic diseases. Apoptosis [12], pyroptosis [13], programmed necroptosis [14], autophagy [15], ferroptosis [16], cellular component transformation (transition) [17] and other phenomena have been discovered. They describe the ingenious self-regulation mechanisms that cells use in physiological and pathological processes, and have given rise to new strategies for treating major diseases.
(2) Systems theory and reductionism together reveal complex life processes layer by layer. Molecular biology has successfully annotated a large number of functional genes and linked many life processes and disease mechanisms to the relevant functional genes and their transcription and expression products. If this is called "reductionism", then omics, comprising genomics, transcriptomics, proteomics and metabolomics, provides a systematic understanding of complex living networks. The combination of "bottom-up" and "top-down" approaches has greatly increased the opportunities for discovery in life science and has spawned one new research direction and frontier hot spot after another. For example, precision medicine and personalized therapy integrate genomics, the molecular biological basis of disease, and clinical data [18-20]. The human microbiome and metabolome have been found to be closely related to health and to many diseases [21, 22]. "Intestinal microbiome and metabolome also provide a new perspective to explain the principles of traditional Chinese medicine" ①. Genomic and transcriptomic studies and gene function annotation have shown that only about 2% of the human genome codes for proteins; the remaining 98%, whose function is largely unknown, has been likened to the "dark matter" of the genome [23]. Within it, a large number of non-coding RNAs have been found to play a key role in the spatiotemporal regulation of cellular networks [24], which "opens up a whole new field of biology... It has unlimited potential in future follow-up studies" [25].
① Tong Xiaolin, personal communication.
(3) With the integration of disciplines, life science research is moving from qualitative description toward dynamic, accurate and quantitative interpretation. The complexity of life processes is determined by the genetic variation of living systems, by the dynamic changes of metabolism and regulation in time and space, and by the flexibility of living matter. Most of the accumulated knowledge in life science is an assembly of a large number of qualitative fragments. A range of physical and chemical methods and technical platforms has been created and applied, including super-resolution microscopy, cryo-electron microscopy, mass cytometry, mass spectrometry, magnetic resonance imaging, enhanced Raman spectroscopy, patch clamp, optical tweezers, nanopore sequencing, nano and molecular biosensing, micro total analysis systems (µTAS), organ-on-a-chip, and 3D bioprinting. These provide increasingly powerful tools for life science research, enabling single-cell, visual, high-throughput analysis and manipulation with spatiotemporal resolution. High-resolution brain mapping [26, 27], single-cell transcriptomics [28, 29], single-cell proteomics [30], embryonic cell lineage tracing [31], determination of protein 3D structures in living cells [32], single-particle virus tracking in living cells [33, 34], and multi-organ interaction on a chip and organoid creation [35, 36] have all been achieved. Living systems can now be characterized accurately, quantitatively and visually at the microscopic level, and even successfully simulated.
(4) Scientific data sharing has become a general rule followed by the life science community. Life science databases of all kinds, with gene and protein structure databases at the core, play a major role in modern life science research. Database builders and the scientific community have settled on a principle: while using the databases, researchers also deposit the data from their own discoveries (gene sequences or protein structures), and thus become both users of and contributors to the databases. Today these databases are the most reliable repository of the record of life and a powerful data analysis platform on which the entire life science research community relies. For example, since the start of the COVID-19 pandemic, more than 10 million genome sequences of the novel coronavirus have been generated. The data are published in real time through the Global Initiative on Sharing All Influenza Data (GISAID), the China National Center for Bioinformation (CNCB), the National Center for Biotechnology Information (NCBI), and the European Bioinformatics Institute (EBI), among others. These data have provided a basis for research on pathogen biology and molecular epidemiology, for the establishment of detection technology, and for the research and development of drugs and vaccines, and have played a major role in the global scientific and technological response to the epidemic.
(5) The rise of synthetic biology and artificial intelligence (AI) has provided a new paradigm for life science research. ① Synthetic biology rose with the start of the 21st century [37, 38]. It brings together life science, physics, chemistry, materials science, and computer and information science, and combines engineering concepts with automation technology to redesign and synthesize organisms [39]. Its "bottom-up" model proceeds from characterizing natural biological macromolecules as standardized "parts", to creating "modules", "circuits" and cell "chassis", to building the intended artificial living system and studying the underlying laws of life. This concept has raised the research strategy we are accustomed to, "learning by investigating things", to a new height of "learning by building things" [40]. However, given the complexity of biological systems, their rational design currently relies on high-throughput "trial and error" experiments, which has led to the emergence of the "Biofoundry", an automated facility for biological design and synthesis. On the same basis, another concept of synthetic biology, "building for use", is giving birth to the biotechnology of the future. ② The most typical example of AI applied to life sciences, based on big data, algorithms and machine learning, is the prediction of protein 3D structures. For a long time, progress in protein structure prediction was very slow: for a protein of unknown structure with no structurally characterized homolog, the structure had to be determined by experiment. After AlphaFold, from Google's DeepMind team, stood out in the biennial Critical Assessment of protein Structure Prediction (CASP), the team published AlphaFold2 in Nature in 2021 and released its source code [41].
At the same time, a team led by the University of Washington in the United States published a new deep learning tool, RoseTTAFold, in Science [42]. AlphaFold2 was then used to predict structures for 98.5% of human proteins, a substantial fraction of them with high confidence [43]. Further, the DeepMind team announced the AlphaFold Protein Structure Database, which expanded the structural coverage of known protein sequence space to an unprecedented extent. The initial version of the database contained more than 360,000 predicted structures across the proteomes of 21 model organisms, with plans to expand it soon to cover the majority (more than 100 million) of the representative sequences in the UniRef90 dataset (protein sequences clustered at 90% identity) [44]. These advances are disruptive to structural biology in two respects: (1) protein 3D structure data will grow exponentially, providing a better data basis for machine learning and allowing the current quality defects of AI structure prediction to be resolved one by one; (2) since protein structure and function are fundamental scientific questions in molecular cell biology, these advances will certainly have a profound impact on the life sciences.
3 High-impact research contributions to life sciences in China
Analysis of high-impact papers in the past 10 years
Building a visual map from the data in a scientific literature database allows the contribution and development level of a country's scientific research to be analyzed at the macro level. For a country whose science and technology are developing rapidly, evaluating by the total number of papers or by citations per paper is clearly biased, whereas the total number of citations is a relatively reasonable measure [45]. However, with China's research and experimental development (R&D) workforce now exceeding 5 million full-time equivalents per year, far more than in the United States or Europe, it makes more sense to focus on high-impact research activities. High-impact papers include highly cited papers and hot papers. This paper uses Clarivate Analytics' InCites research evaluation and analysis platform and analyzes only highly cited papers within the top 1% of citations in their field (hereinafter "top 1% papers"). There may be a few exceptions (papers whose academic impact is not really high), and scientific papers do not represent the full spectrum of scientific and technological strength, but top 1% papers generally reflect outstanding research with substantial contributions from all parties. Figure 2 shows the composition of the top 15 countries (hereinafter the "Top 15") in the global top 1% of life science papers, including the distribution, relative rank and changes in high-impact research output among the major scientific and technological powers over the past 10 years. After 20 years of "follow-up development" from a very thin foundation, China's life science research began to gain visibility in the first decade of the 21st century and has since shown sustained and strong growth.
In 2012, 2016 and 2021, three time points spanning 10 years, the top 1% papers published by Chinese scholars accounted for 5.5%, 8.1% and 14.1% of the Top 15 total in biological sciences; 3.1%, 5.2% and 8.0% in medical sciences; and 9.4%, 15.2% and 24.8% in agricultural sciences (Figure 3). This analysis is based mainly on the major disciplinary directions of each field. Some sub-disciplines are not fully covered, and the three fields overlap to some extent, but this does not affect the overall trend shown by the statistics. In addition, the contributions of Chinese scholars in Hong Kong, Macao and Taiwan have not been counted.
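The growth implied by the shares quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the percentages are copied from the text, while the dictionary layout and the helper name `summarize` are assumptions introduced here and are not part of the original analysis.

```python
# Shares (%) of the Top 15's top 1% papers published by Chinese scholars,
# copied from the text for 2012, 2016 and 2021 (in that order).
shares = {
    "biological sciences": (5.5, 8.1, 14.1),
    "medical sciences": (3.1, 5.2, 8.0),
    "agricultural sciences": (9.4, 15.2, 24.8),
}


def summarize(values):
    """Return (percentage-point gain, fold increase) from the first to the last value.

    Hypothetical helper for illustration only.
    """
    first, last = values[0], values[-1]
    return round(last - first, 1), round(last / first, 2)


growth = {field: summarize(values) for field, values in shares.items()}

for field, (points, fold) in growth.items():
    print(f"{field}: +{points} percentage points, {fold}x the 2012 share")
```

Under these figures, the share in each of the three fields in 2021 is roughly 2.6 times its 2012 level, which is consistent with the sustained growth described in the text.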