Lab Introduction

Tianhe-2 Supercomputer

The group focuses on high-performance computing for biomedicine. By developing high-performance computing, big data, and artificial intelligence algorithms, it addresses major problems in biomedical science and supports the biomedical industry. The group develops accurate, ultra-fast algorithms for disease diagnosis and treatment and for new drug research and development, and is building a unified cloud supercomputing platform for big data analysis and computation on Tianhe-2, providing a one-stop service for industrial applications.

With the rapid growth of health big data, healthcare has become a hot spot for industrial AI applications. Tencent, Alibaba, Huawei, and other companies have all moved into the field. This year's fight against the epidemic has demonstrated the important role of big data analysis in human health. During the epidemic, the laboratory used the computing power of Tianhe-2 to carry out work on CT-based intelligent diagnosis, intelligent drug recommendation algorithms, and medical knowledge graphs, and achieved many important results. The projects span CV, NLP, ML, knowledge graphs, and other fields and make wide use of cutting-edge AI techniques. Students interested in any of these technologies can find a place here.

At present, the main research areas are intelligent drug design, protein structure and function prediction, knowledge-driven multi-omics big data analysis, and the development of a biomedical high-performance computing platform based on Tianhe-2.

AI Drug Design

The group has developed AI-based algorithms covering the whole drug design process, including drug screening, molecular optimization, ADMET property prediction, and chemical synthesis route prediction. Representative work includes:

1) Drug Virtual Screening

Drug-VQA: Protein-Drug Interaction Prediction

2) Property Prediction & GCN Algorithm

3) Molecular Optimization

Meta-MO: Meta Learning Model for Molecular Optimization

Protein Structure and Function Prediction

Proteins are among the most important macromolecules in organisms and take part in almost all life activities. Accurately predicting the three-dimensional structure and folding process of proteins is regarded as one of the major scientific challenges of the 21st century. Since 2013, the PI has developed the SPIDER series of protein secondary structure predictors using deep learning, one of the earliest efforts in the world to apply deep learning to protein secondary structure prediction. Since then, the series has successively introduced multi-task learning, iterative model training, and other strategies, and has moved structure prediction from the discrete secondary-structure states used in the past to continuous numerical prediction.
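To make the contrast between discrete states and continuous numerical targets concrete, here is a minimal sketch. The three-state alphabet (H, E, C) and the sine/cosine encoding of a backbone torsion angle are common conventions assumed here for illustration only; the function names are hypothetical and nothing below is taken from the SPIDER code itself.

```python
import math

# Discrete view: each residue is assigned one of three states
# (H = helix, E = strand, C = coil). Assumed convention for illustration.
STATES = ("H", "E", "C")


def one_hot(state):
    """Encode a discrete secondary-structure state as a one-hot vector."""
    return [1.0 if s == state else 0.0 for s in STATES]


# Continuous view: each residue is described by real-valued quantities,
# for example a backbone torsion angle. Encoding the angle as (sin, cos)
# avoids the artificial jump between -180 and 180 degrees.
def encode_angle(degrees):
    """Encode a torsion angle in degrees as a (sin, cos) pair."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def decode_angle(sin_value, cos_value):
    """Recover the angle in degrees from its (sin, cos) encoding."""
    return math.degrees(math.atan2(sin_value, cos_value))
```

A classifier would be trained against targets such as `one_hot("H")`, while a regressor would be trained against pairs such as `encode_angle(-60.0)` and its output mapped back to an angle with `decode_angle`.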

1) Protein function prediction

GraphSite: Binding Site Prediction based on Modelled Structures by AlphaFold2

2) Protein structure prediction

3) Protein tertiary structure prediction

EM-SEG: Automatic Protein Structure Constructing Method for Cryo-EM

Multi-omics Data Analysis

As omics data grow in diversity and scale, the state of a biological individual can be described comprehensively across multiple spatiotemporal scales and from different perspectives, so multi-omics data analysis plays an increasingly important role. However, multi-omics data are noisy and high-dimensional, with complex relationships between variables, so accurate analysis requires the help of prior knowledge.

1) Biomedical knowledge graph

PharmKG: Biomedical Knowledge Graph

2) scRNA-seq data analysis

GraphCS: Scalable Single Cell Classification to Huge Datasets

3) Multi-omics analysis

COVID19-net: COVID-19 Diagnosis Method based on CT images