O03-03

A Foundation Model-Based Approach for Reaction Type-Specific Retrosynthesis Prediction

Kyohei MORIMOTO *1, Yoshihiro YAMANISHI1, Shinnosuke TAKADA2

1Department of Complex Systems Science, Graduate School of Informatics, Nagoya University
2Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology


Purpose
Designing efficient synthetic routes for target compounds by retrosynthetic analysis [1] is a central task in organic synthesis. Recently, deep learning-based retrosynthesis models have demonstrated high performance [2,3]. However, research in this field is still hampered by the limited availability of curated public reaction datasets. To the best of our knowledge, the Open Reaction Database (ORD) [4] is the only freely accessible large-scale resource. In this study, we propose a strategy, built on a foundation model, to maximize the utility of limited reaction data for reaction type-specific retrosynthesis prediction.

Methods
First, a foundation model was developed by learning the correlation between graphs and SMILES strings on a large reaction dataset derived from the United States Patent and Trademark Office (USPTO) and available through ORD. Subsequently, we constructed four small datasets, each consisting of a single reaction type: (1) Minisci reaction, (2) nickel-catalyzed cross-coupling reaction, (3) Buchwald–Hartwig amination, and (4) Suzuki–Miyaura cross-coupling reaction. We then fine-tuned the foundation model on each dataset, obtaining four fine-tuned models, each specialized in predicting efficient synthetic routes for its associated reaction type.
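As an illustration of the dataset construction step, the following minimal sketch groups reaction records by reaction type and converts each one into a (product, reactants) pair, the input/output direction used in retrosynthesis. The record layout, the field names `reaction_type` and `smiles`, and the type labels are hypothetical; the abstract does not describe how the reaction types were assigned or how the data were stored.

```python
from collections import defaultdict


def to_retro_pair(reaction_smiles):
    """Split 'reactants>agents>products' into (product, reactants).

    Retrosynthesis predicts reactants from a product, so the product
    becomes the model input and the reactants the prediction target.
    """
    reactants, _agents, products = reaction_smiles.split(">")
    return products, reactants


def group_by_reaction_type(records):
    """Build one list of retrosynthesis pairs per reaction-type label."""
    datasets = defaultdict(list)
    for record in records:
        pair = to_retro_pair(record["smiles"])
        datasets[record["reaction_type"]].append(pair)
    return dict(datasets)


# Hypothetical records; labels and field names are assumptions.
records = [
    {"reaction_type": "suzuki_miyaura",
     "smiles": "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1"},
    {"reaction_type": "buchwald_hartwig",
     "smiles": "Brc1ccccc1.C1COCCN1>>c1ccc(N2CCOCC2)cc1"},
]
datasets = group_by_reaction_type(records)
```

Each resulting list could then serve as the fine-tuning set for one reaction type-specific model.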

Results
We evaluated the proposed method on synthetic route prediction using a benchmark dataset, under the same experimental conditions as previous methods with the same objective, and confirmed that it outperformed them. We then compared the foundation model, the fine-tuned models, and models trained from scratch on each of the four reaction types. The proposed fine-tuned models outperformed the other two for all reaction types. These results suggest that fine-tuning a foundation model pre-trained on large and diverse reaction data is an effective strategy for developing high-performance, reaction type-specific retrosynthesis models.
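The abstract does not state which metric was used for the comparison. Top-k exact-match accuracy is a common choice for retrosynthesis models and is assumed in the sketch below, which counts a prediction as correct when any of the first k candidates contains the same reactants as the reference, regardless of their order. The function names are hypothetical, and the sketch compares raw strings; a real evaluation would first canonicalize the SMILES with a cheminformatics toolkit.

```python
def reactant_key(smiles):
    """Order-insensitive key for a dot-separated set of reactant SMILES."""
    return tuple(sorted(smiles.split(".")))


def top_k_accuracy(predictions, references, k):
    """Fraction of targets whose reference appears in the top-k candidates.

    predictions: one ranked candidate list per target product.
    references: one ground-truth reactant string per target product.
    """
    hits = 0
    for candidates, reference in zip(predictions, references):
        target = reactant_key(reference)
        if any(reactant_key(c) == target for c in candidates[:k]):
            hits += 1
    return hits / len(references)


# Toy ranked predictions for two target products (illustrative only).
predictions = [
    ["OB(O)c1ccccc1.Brc1ccccc1", "Clc1ccccc1.OB(O)c1ccccc1"],
    ["Clc1ccccc1.C1COCCN1", "C1COCCN1.Brc1ccccc1"],
]
references = [
    "Brc1ccccc1.OB(O)c1ccccc1",
    "Brc1ccccc1.C1COCCN1",
]
top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
```

Under this assumed metric, the same function would be applied to the foundation, fine-tuned, and from-scratch models on each reaction type's held-out set.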

[1] Corey, E. J. Chem. Soc. Rev. 1988, 17, 111–133.
[2] Tu, Z. & Coley, C. W. J. Chem. Inf. Model. 2022, 62 (15), 3503–3513.
[3] Han, Y. et al. Nat. Commun. 2024, 15 (1), 6404.
[4] Kearnes, S. M. et al. J. Am. Chem. Soc. 2021, 143, 18820–18826.