Hybrid Prediction Model for Type 2 Diabetes and Hypertension Using DBSCAN-Based Outlier Detection, Synthetic Minority Over Sampling Technique (SMOTE), and Random Forest

Abstract

As the risk of diseases such as diabetes and hypertension increases, machine learning algorithms are being used to improve early-stage diagnosis. This study proposes a Hybrid Prediction Model (HPM) that provides early prediction of type 2 diabetes (T2D) and hypertension based on an individual's input risk factors. The proposed HPM consists of Density-Based Spatial Clustering of Applications with Noise (DBSCAN)-based outlier detection to remove outliers, the Synthetic Minority Over-sampling Technique (SMOTE) to balance the class distribution, and Random Forest (RF) to classify the diseases. Three benchmark datasets were used to predict the risk of diabetes and hypertension at the initial stage. The results showed that integrating DBSCAN-based outlier detection, SMOTE, and RF allows diabetes and hypertension to be predicted successfully. The proposed HPM achieved the best performance among the compared models for predicting both diabetes and hypertension. Furthermore, the study demonstrated that the proposed HPM can be applied in practice in an IoT-based healthcare monitoring system, in which the input risk factors from the end user's Android application are stored and analyzed on a secure remote server. Users can access the prediction result from the proposed HPM through the Android application; this is expected to provide an effective way to identify the risk of diabetes and hypertension at the initial stage.
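To make the preprocessing stages of such a pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes outliers are removed before oversampling, uses only the Python standard library, and implements a simplified SMOTE-style interpolation for a binary problem. The function names, the toy data, and the parameter values (`eps`, `min_samples`, `k`) are all illustrative assumptions. The Random Forest stage is omitted; the cleaned and balanced data would be passed to a classifier afterwards.

```python
import math
import random


def dbscan_noise_mask(X, eps, min_samples):
    """Return True for each point that DBSCAN would label as noise."""
    n = len(X)
    # Neighborhood of each point within radius eps (includes the point itself).
    neighbors = [[j for j in range(n) if math.dist(X[i], X[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbors]
    # Noise: neither a core point nor within eps of any core point.
    return [not core[i] and not any(core[j] for j in neighbors[i])
            for i in range(n)]


def smote_oversample(X_min, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority point and one of
    its k nearest minority neighbors (requires at least two points)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        others = [p for j, p in enumerate(X_min) if j != i]
        nearest = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbor)])
    return synthetic


def prepare_training_data(X, y, eps=1.0, min_samples=3, k=2, seed=0):
    """Drop DBSCAN noise points, then balance the two classes."""
    rng = random.Random(seed)
    noise = dbscan_noise_mask(X, eps, min_samples)
    X_clean = [x for x, is_noise in zip(X, noise) if not is_noise]
    y_clean = [c for c, is_noise in zip(y, noise) if not is_noise]

    counts = {c: y_clean.count(c) for c in set(y_clean)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    X_min = [x for x, c in zip(X_clean, y_clean) if c == minority]
    X_new = smote_oversample(X_min, deficit, k, rng)
    return X_clean + X_new, y_clean + [minority] * len(X_new)


# Toy data (hypothetical): six healthy samples, three positive samples,
# and one isolated measurement that should be treated as an outlier.
X = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.25, 0.25], [0.3, 0.1],
     [5.0, 5.0], [5.5, 5.0], [5.0, 5.5],
     [20.0, 20.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

X_bal, y_bal = prepare_training_data(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In this sketch the isolated point is discarded because it has no dense neighborhood, and the minority class is then padded with interpolated samples until both classes have the same size. In practice, library implementations such as scikit-learn's `DBSCAN` and `RandomForestClassifier` and imbalanced-learn's `SMOTE` would likely replace the hand-written functions.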

Published in: Applied Sciences
DOI: https://doi.org/10.3390/app8081325