Interpretable Activity Prediction of SGLT2 Inhibitors using Dynamics- and Electronic-Structure-Augmented Graph Attention Networks
Yusuke TATEISHI *
Graduate School of Science and Technology, Kumamoto University
Introduction: Augmenting the molecular graph with a ligand's dynamics (e.g., conformational mobility, vibration) and electronic structure (e.g., charge shifts) should provide a richer chemical representation, improving predictive performance and yielding physicochemically meaningful structure-activity relationships. Such dynamic and electronic features should be especially valuable for target proteins whose binding sites are highly flexible. Sodium-glucose cotransporter 2 (SGLT2) is one such flexible membrane protein: its outward-open and occluded states allow ligands to adopt multiple poses [1], which makes activity prediction challenging. We therefore developed dynamics- and electronic-structure-augmented Graph Attention Networks (GAT) [2] for activity prediction of SGLT2 inhibitors.
Methods: A dataset of SGLT2 inhibitors with IC50 values was obtained from ChEMBL. Standard node and edge features (e.g., atom type, bond aromaticity) were embedded. Two dynamic ligand features were then added: Flex-Mean, the mean displacement of each atom over 100 randomly sampled conformations relative to the DFT-optimized geometry in the S0 state, and Vib-Disp, the vibrational displacements extracted from frequency analysis. These features relate to the entropic properties (ΔS) of the ligands, which contribute to the binding free energy (ΔG). Electronic features were also incorporated in the form of charge shifts: ΔQS0→Anion, ΔQCation→S0, and ΔQSolv(water), reflecting nucleophilicity, electrophilicity, and hydrogen-bonding ability, respectively. GAT models for IC50 prediction were then constructed with these features.
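As an illustration of the Flex-Mean and charge-shift definitions above, the following minimal sketch computes both with NumPy. It is not the author's implementation. The function names `flex_mean` and `charge_shift` are hypothetical, and the sketch assumes that the conformers are already aligned to the reference geometry and that a charge shift is the final-state charge minus the initial-state charge; the abstract specifies neither the alignment procedure nor the sign convention.

```python
import numpy as np


def flex_mean(conformers, reference):
    """Mean per-atom displacement of sampled conformers from a reference.

    conformers: array of shape (n_conformers, n_atoms, 3), assumed to be
                pre-aligned to the reference geometry (an assumption).
    reference:  array of shape (n_atoms, 3), e.g. an optimized geometry.
    Returns an array of shape (n_atoms,).
    """
    conformers = np.asarray(conformers, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Euclidean distance of every atom in every conformer from its
    # reference position: shape (n_conformers, n_atoms).
    displacement = np.linalg.norm(conformers - reference[None, :, :], axis=-1)
    # Average over conformers to get one value per atom.
    return displacement.mean(axis=0)


def charge_shift(q_final, q_initial):
    """Per-atom charge shift, assumed here to be q_final - q_initial."""
    return np.asarray(q_final, dtype=float) - np.asarray(q_initial, dtype=float)


# Toy data: 3 atoms, 2 conformers; only atom 0 moves (by 1 and 3 units).
reference = np.zeros((3, 3))
conformers = np.zeros((2, 3, 3))
conformers[0, 0, 0] = 1.0
conformers[1, 0, 0] = 3.0
flex = flex_mean(conformers, reference)

# Toy partial charges for two atoms in an initial and a final state.
dq = charge_shift([-0.4, 0.1], [-0.1, 0.1])
```

Each resulting per-atom value could then be appended to the corresponding node feature vector of the molecular graph.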
Results and Discussion: The augmented model modestly outperformed the structure-only model (test R² of 0.71 vs. 0.68). More importantly, the added features provided interpretability. Flex-Mean and Vib-Disp (750–1000 cm⁻¹) exhibited the highest saliency scores, suggesting that the high mobility of specific atoms and their entropy loss reduce activity. ΔQCation→S0 pinpointed several atoms whose electrophilicity raises or lowers activity, indicating that they interact electrostatically with specific amino acid residues. Visualizing the saliency scores, attention scores, and atom (node) contributions thus gives local insight into which atoms and which types of features influence activity, and provides feedback for molecular design in terms of ligand dynamics and electronic structure.
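The abstract does not state which saliency method was used. As a hedged sketch, the code below assumes saliency is the absolute gradient of the prediction with respect to each atom-level input feature, estimated here by central finite differences so that it runs with NumPy alone. The function `saliency`, the toy linear model `toy_predict`, and its weights `w` are all hypothetical stand-ins for a trained GAT.

```python
import numpy as np


def saliency(predict, x, eps=1e-4):
    """Absolute input gradient |d predict / d x| by central differences.

    predict: callable mapping an (n_atoms, n_features) array to a scalar.
    x:       node feature matrix of shape (n_atoms, n_features).
    Returns an array with the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (predict(x + step) - predict(x - step)) / (2 * eps)
    return np.abs(grad)


# Hypothetical toy model: 3 atoms x 2 features, prediction is a weighted sum.
w = np.array([[0.5, -2.0],
              [0.0, 1.0],
              [3.0, 0.1]])


def toy_predict(features):
    return float(np.sum(features * w))


x = np.ones((3, 2))
sal = saliency(toy_predict, x)

# Aggregate to the two views described in the text:
per_atom = sal.sum(axis=1)      # which atoms matter
per_feature = sal.mean(axis=0)  # which feature types matter
```

In practice, an automatic-differentiation framework would replace the finite-difference loop, but the per-atom and per-feature aggregation would follow the same pattern.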
Conclusion: This study reports an attempt to integrate ligand dynamics and electronic features into a chemical-structure-based graph, potentially advancing molecular activity prediction in terms of both performance and mechanistic interpretation.
Acknowledgments: The author thanks Prof. Manabu Sugimoto, Kumamoto University, for valuable comments and for permission to use his computers.
[1] M. Hiraizumi, et al., Nat. Struct. Mol. Biol. 31, 159-169 (2024).
[2] P. Veličković, et al., arXiv, 1710.10903v3 (2017).