AI-Based Drug Discovery DB: Details

[Paper] Deep Geometric Framework to Predict Antibody-Antigen Binding Affinity (2024-12-02)

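This DB provides information on the original data of this paper and on preprocessed data from other papers that used it.

Rights to the original work belong to the respective researchers and institutions.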


In drug development, the efficacy of an antibody depends on how the antibody interacts with the target antigen. The strength of these interactions gives an indication of how successful an antibody is in neutralizing an antigen. Therefore, the strength, measured by “binding affinity”, is a critical aspect of antibody engineering. In theory, the higher the binding affinity, the higher the chances are that the antibody is successful against the target antigen. Currently, techniques such as molecular docking and molecular dynamics are utilized in quantifying the binding affinity. However, owing to the computational complexity of the aforementioned techniques, running simulations for large antibodies/antigens remains a daunting task. Despite the commendable improvements in deep learning-based binding affinity prediction, such approaches are highly dependent on the quality of the antibody-antigen structures and they tend to overlook the importance of capturing the evolutionary details of proteins upon mutation. Further, most of the existing datasets for the task only include antibody-antigen pairs related to one antigen variant and, thus, are not suitable for developing comprehensive data-driven approaches. To circumvent the said complexities, we first curate the largest and most generalized datasets for antibody-antigen binding affinity prediction, consisting of both protein sequences and structures. Subsequently, we propose a deep geometric neural network comprising a structure-based model and a sequence-based model that considers both atomistic and evolutionary details when predicting the binding affinity. The proposed framework exhibited a 10% improvement in mean absolute error compared to the state-of-the-art models while showing a strong correlation between the predictions and target values. 
We release the datasets and code publicly (https://drug-discovery-entc.github.io/p2pxml/) to support the development of antibody-antigen binding affinity prediction frameworks for the benefit of science and society.


Dataset 

Dataset     Raw Datapoints   Mutations   Data Type     Numerical Value
AB-Bind     1 101            Available   Sequences     ∆∆G
AB-Cov      1 964            Available   Structures    IC50, EC50
CATNAP      129 686          Available   Names only    IC50, IC80, ID50
SAbDab      1 327            Available   Structures    ∆∆G, Affinity
SKEMPI      7 086            Available   Structures    Affinity
AlphaSeq    1 259 700        N/A         Sequences     Kd

Table 6. Summary of the publicly available datasets used. Here, ∆∆G, IC50, EC50, IC80, ID50 and Kd refer to the change in the change in Gibbs free energy, the half-maximal inhibitory concentration, the half-maximal effective concentration, the 80% inhibitory concentration, the 50% inhibitory dose and the protein-protein dissociation constant, respectively.


Dataset Curation. 

Since publicly available datasets are tabulated in different formats, the first task in curating a generalized dataset was to convert them to a common format. In addition, as the models require the 3D structure of the proteins, suitable measures had to be taken to generate the 3D structures that were unavailable in some public datasets. Homology modeling (29) and AlphaFoldV2 (30) were employed in this regard. Table 6 presents a summary of the publicly available datasets. Generally, experimental uncertainties can produce multiple IC50 values for a given antibody-antigen pair over several trials. To reduce the impact of outliers, we first took the median of the provided IC50 values for repeated entries of antibody-antigen pairs. Based on the availability of the template structure and the mutation profile, we then decided whether to use homology modeling or AlphaFoldV2 to generate the 3D structure. If the mutation profile, along with the template structure, was provided, then we utilized homology modeling. If either of the two requirements was not satisfied, then we used the AlphaFoldV2 pipeline. Figure 4 shows the processing steps associated with the dataset curation.
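
The two curation rules described here (taking the median over repeated IC50 entries, and choosing homology modeling only when both a template structure and a mutation profile are available, with AlphaFoldV2 otherwise) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the function names aggregate_ic50 and choose_structure_method, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_ic50(records):
    """Collapse repeated (antibody, antigen) entries to their median IC50.

    `records` is an iterable of (antibody, antigen, ic50) tuples; this
    layout is an assumption made for the example.
    """
    grouped = defaultdict(list)
    for antibody, antigen, ic50 in records:
        grouped[(antibody, antigen)].append(ic50)
    # The median is less sensitive to a single outlying trial than the mean.
    return {pair: median(values) for pair, values in grouped.items()}


def choose_structure_method(has_template, has_mutation_profile):
    """Pick a structure-generation route from the two availability flags."""
    if has_template and has_mutation_profile:
        return "homology_modeling"
    # If either requirement is missing, fall back to AlphaFoldV2.
    return "alphafold_v2"


# Hypothetical sample data: three trials for one pair, one of them an outlier.
records = [
    ("ab1", "agA", 0.8),
    ("ab1", "agA", 1.0),
    ("ab1", "agA", 50.0),
    ("ab2", "agA", 2.0),
]
medians = aggregate_ic50(records)
```

With the sample records above, the outlying 50.0 trial does not shift the aggregated value for the ("ab1", "agA") pair, whereas a mean would be pulled far above the other two measurements.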