AI-Based New Drug DB Detail
This DB provides the original data of the paper, along with information on preprocessed datasets derived from it in other publications.
Rights to the original works belong to the respective researchers and institutions.
Recent advances in artificial intelligence (AI) have led to the development of transformer-based models that have shown success in identifying potential drug molecules for therapeutic purposes. However, for a molecule to be considered a viable drug candidate, it must exhibit certain desirable properties, such as low toxicity, high druggability, and synthesizability. To address this, we propose an approach that incorporates prior knowledge about these properties into the model training process. In this study, we utilized the PubChem database, which contains 100 million molecules, to filter drug-like molecules based on the quantitative estimate of drug-likeness (QED) score and the Pfizer rule. We then used this filtered dataset of drug-like molecules to train both a molecular representation model (ChemBERTa) and a molecular generation model (MolGPT). To assess the performance of the molecular representation model, we fine-tuned it on the MoleculeNet benchmark datasets. Meanwhile, we evaluated the performance of the molecular generation model on generated samples comprising 10,000 molecules. Despite the limited diversity of the pretraining dataset, the molecular representation models retained at least 90% of their original performance on the benchmark datasets, with an additional improvement of 6% in predicting clinical toxicology. In the domain of molecular generation, the model pre-trained on drug-like molecules exhibited a high rate of desirable molecular properties in its unconditionally generated outputs. Additionally, the generated structures showed notable diversity compared to the conditional generation approach. Moreover, the drug-like molecule pre-training strategy is not limited to a specific model or training method, making it a flexible approach that can be easily adapted to different research interests and criteria.
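The filtering step described above (QED score plus the Pfizer 3/75 rule) can be sketched with RDKit. This is a minimal illustration, not the paper's exact pipeline: the `qed_cutoff` value is a hypothetical threshold, since the abstract does not state the cutoff used. The Pfizer 3/75 rule flags molecules with calculated logP > 3 and topological polar surface area (TPSA) < 75 as carrying elevated toxicity risk.

```python
# Sketch of drug-likeness filtering with RDKit (assumption: RDKit installed).
# The QED cutoff below is hypothetical; the paper's exact thresholds are not given.
from rdkit import Chem
from rdkit.Chem import QED, Descriptors


def is_drug_like(smiles: str, qed_cutoff: float = 0.5) -> bool:
    """Return True if the molecule passes the QED cutoff and the Pfizer 3/75 rule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparseable SMILES are discarded
        return False
    # Quantitative estimate of drug-likeness (Bickerton et al.), in [0, 1]
    if QED.qed(mol) < qed_cutoff:
        return False
    # Pfizer 3/75 rule: logP > 3 AND TPSA < 75 indicates higher toxicity risk
    logp = Descriptors.MolLogP(mol)
    tpsa = Descriptors.TPSA(mol)
    if logp > 3 and tpsa < 75:
        return False
    return True


# Example: aspirin passes, a long alkane (lipophilic, zero TPSA) does not
print(is_drug_like("CC(=O)Oc1ccccc1C(=O)O"))
print(is_drug_like("CCCCCCCCCCCCCCCC"))
```

A filter like this would be applied to each SMILES string in the PubChem dump before pretraining, which is what makes the strategy model-agnostic: the pretraining corpus changes, not the architecture.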