O04-01

Development of a Predictive Model for Type 2 Diabetes Onset Using Life Log and Health Checkup Data

Shuhei TANAKA *, Taisei TOSAKI, Nanae ARATAKE, Eiichiro UCHINO, Yasushi OKUNO

Graduate School of Medicine, Kyoto University


Background and Objective:
Lifestyle-related diseases are strongly influenced by daily habits such as diet and physical activity, which vary widely among individuals. These diseases remain a major cause of death in many countries, including Japan. Because they are often asymptomatic in the early stages and can progress unnoticed, individualized preventive strategies are critical. Recent studies have applied machine learning models to annual health checkup data to predict the risk of developing such diseases within a few years. However, annual checkups alone capture little information on the daily behaviors that strongly influence disease onset. This study aims to develop and validate a model that predicts the future onset of type 2 diabetes, a representative lifestyle-related disease, by integrating life log data derived from wearable devices (e.g., step count, weight, blood pressure, and blood glucose) with annual health checkup data.
Methods:
We used anonymized, processed data provided by DeSC Healthcare, Inc., comprising health insurance claims, annual health checkup records, life log data, and subscriber registries collected from April 2014 to December 2023. From a population of approximately 8.35 million individuals, we extracted a 1% sample (n = 83,596). We included only participants with available life log data and built a binary classification model, using the XGBoost algorithm, to predict whether each participant would develop type 2 diabetes within three years.
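The abstract does not specify how onset or the life log features were defined, so the following is only a minimal sketch of one plausible way to turn per-participant records into a labeled example. It assumes that onset is the first type 2 diabetes diagnosis date (for instance, taken from claims), that participants already diagnosed at the baseline checkup or followed for less than three years without onset are excluded, and that step count is summarized as the mean over the 365 days before the checkup. The `Participant` fields, both window lengths, and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from statistics import mean
from typing import Optional

# Hypothetical record layout; field names are illustrative, not the DeSC data schema.
@dataclass
class Participant:
    pid: str
    baseline_checkup: date         # checkup date used as the prediction origin
    t2d_diagnosis: Optional[date]  # first type 2 diabetes diagnosis date, if any
    follow_up_end: date            # last date on which the participant is observable
    daily_steps: dict[date, int] = field(default_factory=dict)  # life log: date -> steps


HORIZON = timedelta(days=3 * 365)  # "within three years", approximated as 1095 days (assumption)
LOOKBACK = timedelta(days=365)     # assumed life log window before the checkup


def label_3y_onset(p: Participant) -> Optional[int]:
    """Return 1 (onset within 3 years), 0 (no onset observed), or None (exclude)."""
    if p.t2d_diagnosis is not None and p.t2d_diagnosis <= p.baseline_checkup:
        return None  # already diagnosed at baseline, so not at risk of new onset
    horizon_end = p.baseline_checkup + HORIZON
    if p.t2d_diagnosis is not None and p.t2d_diagnosis <= horizon_end:
        return 1
    if p.follow_up_end < horizon_end:
        return None  # censored: left observation before 3 years without onset
    return 0


def mean_daily_steps(p: Participant) -> Optional[float]:
    """Mean daily step count in the LOOKBACK window ending the day before the checkup."""
    start = p.baseline_checkup - LOOKBACK
    window = [s for d, s in p.daily_steps.items() if start <= d < p.baseline_checkup]
    return mean(window) if window else None  # None: no life log data in the window


if __name__ == "__main__":
    p = Participant("A", date(2020, 4, 1), date(2022, 1, 15), date(2023, 12, 31),
                    {date(2020, 3, 1): 8000, date(2020, 3, 2): 6000})
    print(label_3y_onset(p), mean_daily_steps(p))  # 1 7000
```

Labels and aggregated features built this way could then be passed, together with checkup variables such as HbA1c, to a gradient-boosted classifier such as `xgboost.XGBClassifier`. Excluding censored participants, rather than labeling them 0, avoids counting unobserved onsets as negatives.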
Results:
The model for predicting 3-year onset of type 2 diabetes achieved an AUROC of 0.9131 on the test dataset. Feature importance analysis using SHAP (SHapley Additive exPlanations) showed that step count from the life log data ranked among the top contributing features, alongside clinically established variables such as HbA1c.
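To make the two reported quantities concrete, the sketch below computes AUROC with the rank-based (Mann-Whitney U) formula, where tied scores receive average ranks, and ranks features by mean absolute SHAP value, the ordering commonly used for SHAP summary plots. It is an illustration only: it assumes SHAP values have already been computed (for example, with the `shap` library's `TreeExplainer` for an XGBoost model), the function names are hypothetical, and the toy inputs are invented and unrelated to the study's data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; label 1 is positive, tied scores get average ranks."""
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    # U = number of (positive, negative) pairs where the positive scores higher, ties counting 1/2.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def rank_by_mean_abs_shap(shap_values: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> list[tuple[str, float]]:
    """Order features by mean |SHAP| across samples (rows = samples, columns = features)."""
    n_samples = len(shap_values)
    importance = [
        (name, sum(abs(row[j]) for row in shap_values) / n_samples)
        for j, name in enumerate(feature_names)
    ]
    return sorted(importance, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy inputs only; these are not the study's data or results.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    toy_shap = [[0.5, -0.2, 0.0], [-0.3, 0.4, 0.1]]
    print([name for name, _ in rank_by_mean_abs_shap(toy_shap, ["f1", "f2", "f3"])])
    # ['f1', 'f2', 'f3']
```

In practice a library routine such as `sklearn.metrics.roc_auc_score` computes the same quantity. The rank formulation is shown because it runs in O(n log n) and makes explicit that AUROC measures how well the model ranks future cases above non-cases, not how well its probabilities are calibrated.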
Discussion and Conclusion:
We developed a high-performance machine learning model that predicts the risk of type 2 diabetes onset by combining annual health checkup data with daily life log data. Step count, together with clinical markers such as HbA1c, ranked among the top contributors to the model's predictions, suggesting that life log data are important for accurately predicting the onset of type 2 diabetes.
In future work, the developed model should be applied to more granular life log data, collected more frequently than annual health checkups, to simulate realistic, individualized behavioral interventions.