P07-01

Predicting the biological pathways activated by cigarette or heated tobacco product use: a proof-of-concept study

Taro OSHIRO *, Hiromi OHARA, Shigeaki ITO, Hiroaki SUZUKI

Scientific Product Assessment Center, Japan Tobacco Inc.


【Objective】
To develop a method to predict the biological pathways activated following cigarette or heated tobacco product use.

【Methods】
A non-targeted metabolomics dataset was obtained from urine samples collected in an observational study involving nonsmokers (NS), conventional cigarette smokers (CS), and heated tobacco product users (HTP), with 50 participants per group. An endpoint-centered model was constructed to classify smoking status based on metabolites that correlated with the levels of Biomarkers of Potential Harm (BoPH), which are indicators of smoking-related biological perturbations.
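The endpoint-centered feature selection described above can be illustrated with a minimal sketch. It assumes Pearson correlation and a fixed cutoff, neither of which is stated in this abstract; the function name `select_positively_correlated` and the `threshold` value are hypothetical and chosen only for illustration.

```python
import numpy as np

def select_positively_correlated(X, endpoint, threshold=0.3):
    """Return indices of metabolite columns whose Pearson r with the
    endpoint (e.g., a BoPH level) exceeds `threshold`, plus all r values.

    Assumptions (not from the abstract): Pearson correlation, fixed cutoff.
    X has shape (n_individuals, n_metabolites).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(endpoint, dtype=float)
    Xc = X - X.mean(axis=0)          # center each metabolite
    yc = y - y.mean()                # center the endpoint
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; assign them r = 0.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.flatnonzero(r > threshold), r

# Toy data: column 0 rises with the endpoint, column 1 falls, column 2 is flat.
endpoint = np.arange(10.0)
X = np.column_stack([2.0 * endpoint, -endpoint, np.ones(10)])
selected, r = select_positively_correlated(X, endpoint)
```

In this toy example only the first column is retained, since it is the only one that correlates positively with the endpoint.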
The metabolites in the highest-accuracy model were categorized according to the ontology annotated by Human Metabolome Technologies Inc. Subsequently, models for the same classification task were constructed using these ontology-based metabolite sets (i.e., the ontology models). The cumulative effect size of the selected metabolites for each predicted class was defined as the Biological Probability Score (BPS). BPS was calculated for each individual by summing the Shapley values for classification, weighted by the predicted class probabilities. The calculation was applied to the test dataset, which had been held out during model training. The BPS for the true class prediction in each model was incorporated into a Bayesian network analysis to estimate the conditional probabilities for biological pathways across different smoking statuses.
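One possible reading of the BPS calculation is sketched below. It assumes that Shapley values are already available as an array of shape (individuals, metabolites, classes) and that the weighting is a simple product of the per-class Shapley sum and the predicted class probability. The exact formula is not given in this abstract, so the function names and the array layout are hypothetical.

```python
import numpy as np

def biological_probability_score(shap_values, proba):
    """Sketch of BPS under an assumed formula:
    BPS[i, c] = proba[i, c] * sum_j shap_values[i, j, c]

    shap_values: (n_individuals, n_metabolites, n_classes), precomputed
    proba:       (n_individuals, n_classes), predicted class probabilities
    """
    shap_values = np.asarray(shap_values, dtype=float)
    proba = np.asarray(proba, dtype=float)
    if shap_values.ndim != 3 or proba.ndim != 2:
        raise ValueError("expected a 3-D shap_values and a 2-D proba array")
    if (shap_values.shape[0], shap_values.shape[2]) != proba.shape:
        raise ValueError("shap_values and proba disagree on individuals/classes")
    per_class_sum = shap_values.sum(axis=1)   # sum over metabolites
    return per_class_sum * proba              # weight by predicted probability

def bps_for_true_class(bps, y_true):
    """Pick, for each individual, the BPS of its true class label."""
    y_true = np.asarray(y_true, dtype=int)
    return bps[np.arange(bps.shape[0]), y_true]

# Toy example: 2 individuals, 3 metabolites, 3 classes (e.g., NS, CS, HTP).
shap_values = np.ones((2, 3, 3))
proba = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.1, 0.8]])
bps = biological_probability_score(shap_values, proba)
true_bps = bps_for_true_class(bps, [0, 2])
```

Keeping the Shapley computation outside the function leaves the sketch independent of any particular model or explainer library.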

【Results and discussion】
Correlation analysis between BoPH and each metabolite was performed for endpoint-centered feature selection. As a result, 65 metabolites positively correlated with leukocyte counts were used to construct an endpoint-centered model. Metabolite-set-based feature reselection from the features used in the endpoint-centered model enabled the construction of 132 ontology models. Of these 132 ontology models, 15 showed improved accuracy compared with the endpoint-centered model.
BPS values calculated from the 15 ontology models with improved accuracy were subjected to structural learning, resulting in a sparse network. The conditional probability of the “presence” of activation of each biological pathway was estimated for each smoking status in the modeled Bayesian network. As a result, HTP and NS had lower conditional probabilities of the “presence” of activation than CS.
These findings suggest that BPS could be used as a training dataset for computational models that predict the presence and/or probability of biological pathway activation, which may represent the conditions of individuals with different smoking statuses. With more knowledge-rich omics data such as transcriptomics, our approach would be invaluable for consolidating complex, large-scale datasets into interpretable biological pathway perturbations caused by exposure to xenobiotics.