P10-02

Boltz2 benchmark on an in-house dataset: thinking about how to use it effectively in drug discovery campaigns

Akitoshi OKADA *, Kan SHIRAISHI, Ayako MORITOMO

Daiichi Sankyo Co., Ltd.


[Purpose]
The release of Boltz2 (ref1) has attracted tremendous attention in the industry. It predicts not only the protein-ligand complex structure but also the affinity value associated with the predicted model. Its authors claim it to be "the first AI model to approach the performance of free-energy perturbation (FEP) methods". In this poster, we therefore discuss the use of Boltz2 in real-life drug discovery campaigns by benchmarking its accuracy on our in-house dataset.

[Methods]
Protein-ligand complex structures with known activity values were extracted from our in-house database, and their structures and affinities were predicted by Boltz2 in the Tokyo-1 environment. The accuracy of the predicted affinity values was compared in three settings: (1) the oligomeric state is unknown (a throw-anything-in case that we do not actually intend to use in practice); (2) the number of oligomer subunits that bind the ligand is known; and (3) setting 2 plus knowledge of the binding site (our program for this setting contained a bug; it will be included if the bug is fixed by the annual meeting). We also ran predictions on a decoy dataset in which the ligands were randomly chosen from a different project. The dataset was prepared and the results were compared on the Molecular Operating Environment (ref2) platform.
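As an illustration of the decoy construction described above, the following is a minimal Python sketch, not the procedure actually used. It assumes that each project receives as many decoy ligands as it has true ligands, drawn at random from the pooled ligands of all other projects with a fixed seed; the function name `assign_decoys` and the dictionary layout are hypothetical.

```python
import random


def assign_decoys(ligands_by_project, seed=0):
    """Pick decoy ligands for each project from the other projects.

    ligands_by_project: dict mapping a project name to a list of ligand IDs.
    Returns a dict mapping each project to a list of decoy ligand IDs.
    Assumption: one decoy per true ligand, sampled without replacement
    from the pooled ligands of all other projects.
    """
    rng = random.Random(seed)  # fixed seed keeps the decoy set reproducible
    decoys = {}
    for project, ligands in ligands_by_project.items():
        # Pool every ligand that does not belong to the current project.
        pool = [
            lig
            for other, other_ligands in ligands_by_project.items()
            if other != project
            for lig in other_ligands
        ]
        # Never request more decoys than the pool can supply.
        k = min(len(ligands), len(pool))
        decoys[project] = rng.sample(pool, k)
    return decoys
```

Each decoy ligand would then be paired with the protein of the project it was assigned to and run through the same prediction settings as the true ligands, so that the two affinity distributions can be compared.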

[Results and Discussion]
We obtained an R2 value of 0.40 in setting 2. However, a closer look at the data raised several concerns. For example, this R2 value describes the correlation along a fitted line rather than agreement with the x=y line, and the confidence score of the complex model did not correlate well with the prediction error. Nevertheless, a very low confidence score can probably be used as a rough cutoff. The accuracy of the prediction also varied among projects: one project's R2 was as high as 0.63, whereas some showed no correlation at all. Further investigation is in progress, and we would like to discuss the findings at the poster session.
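The difference between an R2 along a fitted line and agreement with the x=y line can be made concrete with a short Python sketch. The abstract does not state how R2 was computed, so both definitions below are assumptions for illustration: `pearson_r2` is the squared Pearson correlation (insensitive to slope and offset), and `r2_identity` is the coefficient of determination measured against x=y. The helper `r2_by_project` and its record layout are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r2(observed, predicted):
    """Squared Pearson correlation; unchanged by any linear rescaling."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation is undefined without variance
    return sxy * sxy / (sxx * syy)


def r2_identity(observed, predicted):
    """Coefficient of determination against x=y: 1 - SS_res / SS_tot."""
    n = len(observed)
    mo = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    if ss_tot == 0:
        return math.nan
    return 1 - ss_res / ss_tot


def r2_by_project(records, min_points=3):
    """records: iterable of (project, observed, predicted) tuples.

    Returns {project: (pearson_r2, r2_identity)} for projects with at
    least min_points compounds.
    """
    grouped = defaultdict(lambda: ([], []))
    for project, obs, pred in records:
        grouped[project][0].append(obs)
        grouped[project][1].append(pred)
    return {
        project: (pearson_r2(obs, pred), r2_identity(obs, pred))
        for project, (obs, pred) in grouped.items()
        if len(obs) >= min_points
    }
```

For predictions that are perfectly correlated with the measurements but compressed, for example observed values 5, 6, 7, 8 against predicted values 6.0, 6.5, 7.0, 7.5, the first function returns 1.0 while the second returns 0.7, which is why the two numbers should be reported separately when comparing projects.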

[Conclusions]
So far, we do not intend to use it as a replacement for FEP, especially for the relative binding free energy method. However, we think a "just run it and see" step right before the absolute binding free energy (ABFE) method is worthwhile, as the calculation is an order of magnitude faster. A scoring function that sits between the docking score (or MM/G(P)BSA) and ABFE has always been needed in drug discovery campaigns, especially when virtually screening an (ultra-)large number of compounds. We speculate that the Boltz2 prediction may be useful in this domain. We are excited about the rapid growth of this technology and welcome discussion of AI usage in drug discovery at the poster session.

(ref1): Saro Passaro et al., Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction, bioRxiv 2025.06.14.659707; doi: https://doi.org/10.1101/2025.06.14.659707

(ref2): Molecular Operating Environment (MOE), 2024.06; Chemical Computing Group ULC, 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2024