Hybrid Machine Learning Methods and Ensemble Voting for Identification of Parkinson's Disease Subtypes

Parkinson's disease (PD) can be further subdivided into subtypes since homogeneous groups are more likely to share genetic and pathological features, enabling earlier disease detection and more targeted treatment.
Our goal is to identify PD subtypes using advanced hybrid machine learning (ML) methods followed by ensemble voting.
Methods
The timeless dataset consists of 885 studies from longitudinal datasets (years 0, 1, 2, and 4; Parkinson's Progression Markers Initiative). In this study, the dorsal striatum (DS) in DAT SPECT images was segmented using MRI.
Our standardized SERA software was used to extract the radiomic features of the DS. We constructed hybrid ML systems using 16 feature reduction algorithms (FRAs), 14 clustering algorithms (CAs), and 16 classifiers (Cs).
Each trajectory (hybrid system) was evaluated using the C-index method to optimize the number of derived clusters (between 2 and 10).
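As an illustration of how a C-index can rank candidate cluster counts, the sketch below assumes the Hubert-Levin definition, C = (S - S_min) / (S_max - S_min), where S is the sum of within-cluster pairwise distances and S_min and S_max are the sums of the same number of smallest and largest pairwise distances overall; lower values indicate tighter clusters. The abstract does not state which variant was used, and the toy points, the candidate labelings, and the function name c_index are hypothetical.

```python
from itertools import combinations
from math import dist


def c_index(points, labels):
    """Hubert-Levin C-index of a labeling; lower means more compact clusters."""
    pairs = list(combinations(range(len(points)), 2))
    dists = [dist(points[i], points[j]) for i, j in pairs]
    within = [d for (i, j), d in zip(pairs, dists) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("need at least one within-cluster and one between-cluster pair")
    ordered = sorted(dists)
    s_min = sum(ordered[:n_w])   # best case: the n_w smallest distances
    s_max = sum(ordered[-n_w:])  # worst case: the n_w largest distances
    if s_max == s_min:
        raise ValueError("all pairwise distances are equal; C-index is undefined")
    return (sum(within) - s_min) / (s_max - s_min)


# Toy data (assumption): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical labelings produced by some clustering algorithm for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

# Keep the cluster count whose labeling has the lowest C-index.
best_k = min(candidates, key=lambda k: c_index(points, candidates[k]))
```

In the study the candidate range was 2 to 10 clusters; here only two candidates are scored to keep the example short.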
We selected optimal subtypes for all trajectory types using the Average of Classifier Performance (AOCP) and Average of Correlation Factor (AOCF).
AOCF assessed the correlation between different clustering methods, while AOCP assessed the accuracy of the final classification.
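The abstract does not give formulas for AOCP or AOCF, so the following is only a rough sketch under stated assumptions: AOCP is taken as the mean accuracy over the classifiers of one trajectory, and AOCF is approximated by the mean pairwise agreement between the labelings produced by different clustering algorithms. As a stand-in for the paper's correlation factor, agreement is measured here with the Rand index, which does not depend on how clusters are numbered. All function names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def rand_index(a, b):
    """Fraction of patient pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def aocp(accuracies):
    """Assumed AOCP: mean classification accuracy across classifiers."""
    return mean(accuracies)


def aocf(labelings):
    """Assumed AOCF stand-in: mean pairwise Rand index across clustering methods."""
    return mean(rand_index(x, y) for x, y in combinations(labelings, 2))


# Hypothetical accuracies of three classifiers trained on one set of cluster labels.
classifier_accuracies = [0.8, 0.9, 1.0]

# Hypothetical labelings of four patients from three clustering algorithms.
# The first two are identical up to renumbering; the third disagrees.
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

performance = aocp(classifier_accuracies)
consistency = aocf(labelings)
```

A trajectory with both high performance and high consistency would be preferred under this reading of the two criteria.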
Finally, we employed ensemble voting to assign patients to different subtypes based on comprehensive voting by other hybrid systems.
To achieve this, we first applied t-distributed Stochastic Neighbor Embedding (t-SNE) analysis to the list of subtypes derived from the different trajectories. This transformed the high-dimensional dataset into two dimensions such that, with high probability, similar data points are placed close together and dissimilar data points far apart.
Then, hierarchical agglomerative clustering was used to identify three distinct subclusters.
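The two steps above can be sketched with scikit-learn, assuming the input is a patients-by-trajectories matrix of subtype labels. The synthetic vote matrix, the noise level, the perplexity value, and the assumption that labels are numbered consistently across trajectories are all illustrative choices, not details taken from the study.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical input: 60 patients x 20 trajectories, where each column holds the
# subtype label one hybrid system assigned to each patient.
true_group = np.repeat([0, 1, 2], 20)
votes = np.tile(true_group[:, None], (1, 20))

# Simulate disagreement between trajectories by re-drawing about 10% of the labels.
noise = rng.random(votes.shape) < 0.1
votes[noise] = rng.integers(0, 3, size=noise.sum())

# Step 1: embed each patient's vote vector in two dimensions with t-SNE.
embedding = TSNE(
    n_components=2, perplexity=10, init="random", random_state=0
).fit_transform(votes.astype(float))

# Step 2: hierarchical agglomerative clustering of the embedding into three groups.
final_subtype = AgglomerativeClustering(n_clusters=3).fit_predict(embedding)
```

Because t-SNE is stochastic, fixing random_state keeps the embedding, and therefore the final assignment, repeatable between runs.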
Results
Initially, disease subtypes selected via the C-index were inconsistent across hybrid ML methods. Through AOCP and AOCF, we could choose more consistent clusters across different hybrid methods.
Subtypes generated using only non-imaging clinical information were not reproducible, whereas adding SPECT information enabled consistent generation of subtypes. Overall, we identified three distinct subtypes.
Cluster I patients demonstrated milder scores in all domains, including motor and non-motor symptoms, as well as imaging, compared to other PD sub-clusters, and higher scores compared to healthy controls.
Patients in Cluster II had scores that were generally higher than those in Cluster I. Cluster III exhibited the most severe clinical manifestations and values compared to other sub-clusters.
Therefore, the three subtypes were classified as 1) mild, 2) intermediate, and 3) severe.
Conclusions
An appropriate hybrid ML framework identified three distinct subtypes in PD patients. The clinical information was combined with SPECT images segmented by MRI in the context of ensemble voting from various ML analysis trajectories.
With the help of t-SNE analysis, our ensemble voting framework could enable more comprehensive identification and analysis of disease subtypes.
Research article link: https://www.sciencedirect.com/science/article/pii/S0169260721002066