P02-18 CBI2025


Development of an Integrated Machine Learning Model for the Design and Prediction of PPI Modulators

Tsubasa NAGAE*¹,², Kohei SODA¹,³, Kazuyoshi IKEDA⁴, Masashi TSUBAKI², Kentaro TOMII¹,²,³

¹Graduate School of Medical Life Science, Yokohama City University
²Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology
³Graduate School of Frontier Sciences, The University of Tokyo
⁴Center for Computational Science, RIKEN

Background
Protein–protein interactions (PPIs) have emerged as attractive targets in drug discovery, offering opportunities to tackle diseases that were previously considered “undruggable.” PPI modulators, including both inhibitors and stabilizers, often display physicochemical properties distinct from those of conventional drug-like molecules, which presents both challenges and opportunities for drug development. As a result, drug-likeness indicators and computational methods designed for traditional drug targets (e.g., Lipinski's rule of five) are often not applicable to PPI modulators. Although PPI-specific indicators have been proposed, they are derived largely from inhibitors, capture stabilizers poorly, and often do not incorporate detailed information about the target proteins. Novel computational strategies are therefore urgently needed to support the design and prediction of both PPI inhibitors and stabilizers.

Methods
We constructed a new dataset in which each entry links a compound, represented as a SMILES string, to the amino acid sequences of the pair of target proteins whose interaction it modulates. Entries were collected through database mining and literature review and include both inhibitors and stabilizers. Because stabilizers are underrepresented in existing databases, we supplemented the dataset by extracting and filtering protein–ligand–protein complexes from the Protein Data Bank (PDB). In addition, to compensate for missing structural information, we explored the use of structure prediction tools such as AlphaFold, incorporating predicted structures as auxiliary input to obtain more refined representations of PPI interfaces. For machine learning, we used representations from compound and protein language models, further adjusted to account for the unique properties of PPI modulators.
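Two of the steps above can be illustrated with a minimal plain-Python sketch. `contacted_chains` and `ternary_pairs` show one possible way to flag protein–ligand–protein complexes: a ligand is kept only if it has atoms within an assumed 4.0 Å of at least two distinct protein chains. `featurize` shows one possible way to join a compound embedding with embeddings of the two partner proteins so that swapping the partners leaves the features unchanged. All function names, the cutoff, and the sum/product pair encoding are hypothetical; the abstract does not specify the authors' filtering criteria or how the language-model representations were adjusted. Parsing PDB entries and running the language models are out of scope, so toy coordinates and vectors stand in for them.

```python
# Hypothetical sketch: names, the 4.0 Å cutoff, and the pair encoding are
# illustrative assumptions, not the authors' pipeline.
import math
from itertools import combinations

Coord = tuple[float, float, float]

CONTACT_CUTOFF = 4.0  # Å; assumed ligand-protein atom contact distance


def contacted_chains(ligand: list[Coord], chains: dict[str, list[Coord]],
                     cutoff: float = CONTACT_CUTOFF) -> list[str]:
    """IDs of protein chains with at least one atom within `cutoff` of the ligand."""
    return sorted(
        chain_id
        for chain_id, atoms in chains.items()
        if any(math.dist(p, q) <= cutoff for p in ligand for q in atoms)
    )


def ternary_pairs(ligand: list[Coord],
                  chains: dict[str, list[Coord]]) -> list[tuple[str, str]]:
    """Chain pairs bridged by the ligand (empty if it touches fewer than two)."""
    return list(combinations(contacted_chains(ligand, chains), 2))


def featurize(compound_emb: list[float], prot_a: list[float],
              prot_b: list[float]) -> list[float]:
    """Compound embedding plus an order-invariant encoding of the protein pair.

    Element-wise sum and product give identical features for (A, B) and (B, A),
    so the model cannot depend on which partner happens to be listed first.
    """
    pair_sum = [x + y for x, y in zip(prot_a, prot_b)]
    pair_prod = [x * y for x, y in zip(prot_a, prot_b)]
    return compound_emb + pair_sum + pair_prod


# Toy coordinates: the ligand sits between chains A and B; chain C is far away.
chains = {"A": [(0.0, 0.0, 0.0)], "B": [(6.0, 0.0, 0.0)], "C": [(50.0, 0.0, 0.0)]}
ligand = [(2.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
print(ternary_pairs(ligand, chains))  # [('A', 'B')]

# Toy embeddings stand in for compound and protein language-model outputs.
print(featurize([1.0], [1.0, 2.0], [3.0, 4.0]))  # [1.0, 4.0, 6.0, 3.0, 8.0]
```

A symmetric pair encoding is only one design choice; concatenating the two protein embeddings in a fixed order and augmenting the training data with both orderings would be another.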

Results
The resulting dataset contained more than 4,000 triplets (a compound and its two target proteins) and was slightly biased toward inhibitors; we corrected this imbalance to obtain a balanced training set. Our machine learning model successfully classified PPI stabilizers and inhibitors with satisfactory predictive performance. Moreover, integrating structure prediction approaches such as AlphaFold suggested that these tools could feasibly support the modeling process and may offer opportunities to further improve predictive performance.
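The abstract does not say how the inhibitor bias was corrected. Random undersampling of the majority class is one common option, and the sketch below assumes it purely for illustration: records are grouped by label, every class is randomly reduced to the size of the smallest class, and the result is shuffled. The function name `undersample`, the `(SMILES, label)` record layout, and the fixed seed are hypothetical choices.

```python
# Hypothetical sketch: the abstract does not say how the imbalance was corrected;
# random undersampling is one common option, assumed here for illustration.
import random
from collections import defaultdict


def undersample(records, label_of, seed=0):
    """Randomly keep the same number of records per class (the minority count).

    `records` is any sequence and `label_of` maps a record to its class label.
    Returns a new shuffled list; the input is not modified.
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    balanced = []
    for label in sorted(by_label):  # sorted for a deterministic class order
        balanced.extend(rng.sample(by_label[label], n_min))
    rng.shuffle(balanced)
    return balanced


# Toy (SMILES, label) pairs with more inhibitors than stabilizers.
data = [("CCO", "inhibitor")] * 6 + [("CCN", "stabilizer")] * 4
balanced = undersample(data, label_of=lambda rec: rec[1])
print(len(balanced))  # 8
```

Undersampling discards some majority-class entries; with only a slight bias the loss is modest. Weighting classes in the loss function or oversampling stabilizers are alternatives that keep every entry.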

Conclusion
This study demonstrates the feasibility of designing and predicting PPI modulators through an integrated machine learning model that incorporates both compound and protein information. By enabling the classification of stabilizers and inhibitors in a unified framework, our approach complements existing methodologies and may contribute to advancing the discovery of PPI-targeting therapeutics. Furthermore, the prospective use of structure prediction tools such as AlphaFold holds promise for expanding and refining future applications.