Hybrid Machine Learning Systems for Prediction of Parkinson’s Disease Pathogenic Variants using Clinical Information and Radiomics Features
Parkinson's disease (PD) is a complex neurodegenerative disorder that causes motor and non-motor symptoms.
Five to ten percent of cases are genetic, with mutations identified in several genes, including leucine-rich repeat kinase 2 (LRRK2) and glucocerebrosidase (GBA). The penetrance of these mutations is incomplete.
Using hybrid machine learning systems (HMLSs), we aim to identify pathogenic variants in the LRRK2 and GBA genes from imaging and non-imaging data, with the long-term goal of identifying conversion to active disease.
Radiomics features (RFs) can encode in-depth information from imaging data; we use them in addition to clinical characteristics (CFs) and conventional imaging features (CIFs).
While imaging studies have identified abnormalities associated with mutation status in several systems, no study has used radiomics features and hybrid approaches to predict LRRK2 and GBA mutations and their subsequent conversion to disease.
Methods
Data from the Parkinson's Progression Markers Initiative database were used to select 264 patients with known LRRK2 mutation status and 129 patients with known GBA mutation status. In the LRRK2 cohort, 120 patients carried pathogenic variants and 144 did not; in the GBA cohort, 59 carried pathogenic variants and 70 did not.
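As a reference point for the accuracies reported in the Results, the cohort counts imply that a trivial classifier always predicting "no pathogenic variant" would be correct about 55% of the time on either full cohort. The short check below computes this from the reported counts; it is an illustration added here, not part of the authors' analysis, and the held-out test subsets may have slightly different class balances.

```python
# Cohort counts as reported in the abstract (carriers = pathogenic variant present)
cohorts = {
    "LRRK2": {"carriers": 120, "non_carriers": 144},
    "GBA": {"carriers": 59, "non_carriers": 70},
}


def majority_baseline(counts):
    """Accuracy of a classifier that always predicts the larger class."""
    total = counts["carriers"] + counts["non_carriers"]
    return max(counts["carriers"], counts["non_carriers"]) / total


for gene, counts in cohorts.items():
    total = counts["carriers"] + counts["non_carriers"]
    print(
        f"{gene}: n={total}, "
        f"carrier fraction={counts['carriers'] / total:.3f}, "
        f"majority-class baseline={majority_baseline(counts):.3f}"
    )
# LRRK2: n=264, carrier fraction=0.455, majority-class baseline=0.545
# GBA: n=129, carrier fraction=0.457, majority-class baseline=0.543
```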
We treated the presence of pathogenic variants in the LRRK2 and GBA genes as the prediction outcomes.
Using the SERA software, we created two datasets (one for LRRK2 and one for GBA), each containing 514 features, comprising CFs, CIFs, and RFs derived from segmented SPECT images of the left and right striatum.
Features were normalized by z-scoring, and each feature was tested univariately for statistical significance using a t-test with Benjamini-Hochberg adjustment for multiple comparisons.
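The normalization and multiple-testing steps above can be sketched in plain Python. This is a minimal illustration, not the SERA software or the authors' code: `zscore` standardizes one feature column, and `benjamini_hochberg` turns per-feature p-values into FDR-adjusted values. The per-feature p-values themselves would come from a two-sample t-test (for example `scipy.stats.ttest_ind`); the feature names and p-values in the example are hypothetical, and the 0.05 threshold is an assumption, since the abstract does not state the significance level.

```python
from statistics import mean, stdev


def zscore(column):
    """Standardize one feature column to zero mean and unit sample standard deviation."""
    mu = mean(column)
    sd = stdev(column)
    if sd == 0:
        # A constant feature carries no information; map it to all zeros.
        return [0.0] * len(column)
    return [(x - mu) / sd for x in column]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of m * p_(j) / j over all ranks j >= its own rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_features(names, pvalues, alpha=0.05):
    """Names of features whose BH-adjusted p-value is below alpha (alpha is an assumption)."""
    return [n for n, q in zip(names, benjamini_hochberg(pvalues)) if q < alpha]


# Hypothetical example: three raw p-values are below 0.05, only one survives BH.
names = ["feature_a", "feature_b", "feature_c", "feature_d"]
pvals = [0.01, 0.04, 0.03, 0.5]
print(significant_features(names, pvals))  # ['feature_a']
```

For production use, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provides the same adjustment in a maintained library.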
Learner algorithms from various families were preselected to identify the optimal algorithms. As a first step, 21 classifiers were applied directly to predict mutation status.
Next, to enhance prediction performance, we used HMLSs that combined 11 feature extraction algorithms (FEAs) and ten feature selection algorithms (FSAs) with the 21 classifiers, optimized by 5-fold cross-validation and grid search.
The best HMLS was selected by 5-fold cross-validation on 80% of the patient data, and a portion of the remaining 20% was used to test the selected model externally.
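The model-selection protocol above (a hybrid system of feature selection plus classifier, tuned by grid search with 5-fold cross-validation on about 80% of the data, then tested once on the held-out remainder) can be sketched with the Python standard library alone. Everything here is a hypothetical stand-in: `select_features` is a toy class-mean-gap filter rather than any of the ten FSAs used in the study, k-nearest neighbors stands in for the 21 classifiers, and the grid over (number of features, k) and the synthetic data are illustrative. Feature selection is refit inside each training fold so that held-out samples never influence it, and the cross-validation score is returned as mean and SD over folds, assuming that is what the "±" in the Results denotes.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, n_folds, rng):
    """Split sample indices into n_folds folds that preserve class proportions."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def select_features(X, y, m):
    """Toy filter FSA (hypothetical): keep the m features with the largest class-mean gap."""
    gaps = []
    for f in range(len(X[0])):
        pos = [row[f] for row, lab in zip(X, y) if lab == 1]
        neg = [row[f] for row, lab in zip(X, y) if lab == 0]
        gaps.append((abs(mean(pos) - mean(neg)), f))
    return [f for _, f in sorted(gaps, reverse=True)[:m]]


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k nearest training samples (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)


def fit_predict(X_tr, y_tr, X_te, m, k):
    """One hybrid system: feature selection fit on training data only, then k-NN."""
    feats = select_features(X_tr, y_tr, m)

    def project(rows):
        return [[row[f] for f in feats] for row in rows]

    Xs_tr, Xs_te = project(X_tr), project(X_te)
    return [knn_predict(Xs_tr, y_tr, x, k) for x in Xs_te]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def run_hmls(X, y, grid, seed=0):
    """Grid search with 5-fold CV on ~80% of samples, then one test on the held-out ~20%."""
    rng = random.Random(seed)
    test_idx = sorted(stratified_folds(y, 5, rng)[0])  # one stratified fifth held out
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    cv_folds = stratified_folds(y_tr, 5, rng)  # same folds for every grid point

    best = None
    for m, k in grid:
        scores = []
        for fold in cv_folds:
            in_fold = set(fold)
            fit = [i for i in range(len(y_tr)) if i not in in_fold]
            preds = fit_predict([X_tr[i] for i in fit], [y_tr[i] for i in fit],
                                [X_tr[i] for i in fold], m, k)
            scores.append(accuracy([y_tr[i] for i in fold], preds))
        cv = (mean(scores), stdev(scores))  # mean and SD over the 5 folds
        if best is None or cv[0] > best[1][0]:
            best = ((m, k), cv)

    (m, k), cv = best
    preds = fit_predict(X_tr, y_tr, [X[i] for i in test_idx], m, k)
    return (m, k), cv, accuracy([y[i] for i in test_idx], preds)


# Synthetic demo data (hypothetical): feature 0 separates the classes, 4 noise features.
data_rng = random.Random(42)
labels = [i % 2 for i in range(60)]
features = [[6.0 * lab + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(4)]
            for lab in labels]
params, (cv_mean, cv_sd), test_acc = run_hmls(features, labels, grid=[(1, 3), (3, 5)])
print(params, f"CV {cv_mean:.2f}±{cv_sd:.2f}", f"test {test_acc:.2f}")
```

With scikit-learn, the same structure is usually expressed as a `Pipeline` (selector, then classifier) wrapped in `GridSearchCV` with `cv=5`, fit on the training split only.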
Results
Based on univariate analysis (adjusted for false discovery rate), intensity histogram (IH), neighborhood gray-tone difference (NGT), and neighboring gray level dependence (NGL) features in the right striatum were significantly associated with LRRK2 mutation status.
In addition, 17 CFs and 12 CIFs were highly predictive. In the GBA dataset, shape, IH, NGT, and NGL features in the right striatum and size zone and distance zone matrix features in the left striatum significantly predicted GBA mutation status.
In addition, 12 CFs and 1 CIF were highly predictive. For LRRK2 mutations, HMLSs combining classifiers such as Extra Trees, Gradient Boosting, Random Forest, and Ensemble Voting with feature selection and extraction algorithms such as Correlation-based Feature Selection and the Diffusion Map algorithm
achieved 0.98±0.02 in 5-fold cross-validation and 1.00 in external testing. The performance of other HMLSs was also satisfactory.
For GBA mutation status, multiple HMLSs linking FSAs and FEAs to classifiers, including Bagging Classifier, Decision Tree, K-Nearest Neighbor, Random Forest, and Ensemble Voting, significantly improved accuracy to 0.90±0.08 in 5-fold cross-validation and 0.96 in external testing. Other HMLSs also performed well.
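Both the LRRK2 and GBA results list an Ensemble Voting classifier. The abstract does not say whether votes were hard (majority of predicted labels) or soft (averaged class probabilities); the sketch below assumes hard voting, and the per-model prediction lists are made-up placeholders rather than outputs of the study's models. scikit-learn offers the same idea as `sklearn.ensemble.VotingClassifier(voting="hard")`.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote across models, sample by sample.

    predictions_per_model: one list of predicted labels per model, all the same length.
    Ties go to the label encountered first in model order, because
    Counter.most_common lists equal counts in first-encountered order.
    """
    return [
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*predictions_per_model)
    ]


# Hypothetical per-model predictions for four patients (1 = pathogenic variant).
model_a = [1, 0, 1, 1]
model_b = [1, 0, 0, 1]
model_c = [0, 0, 1, 1]
print(hard_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

Using an odd number of voters for a binary outcome, as here, avoids ties altogether.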
Conclusions
Combining SPECT-based radiomics features with conventional features and optimally applying HMLSs improved the prediction of PD mutation status.
Research is underway to determine whether conversion to active disease can also be predicted.
Source: https://jnm.snmjournals.org/content/63/supplement_2/2508