Project ID: P202205170005
Hybrid Machine Learning Systems for Prediction of Parkinson’s Disease Pathogenic Variants using Clinical Information and Radiomics Features
Deep Learning, Machine Learning, Image Processing
Applicants with the following expertise may apply for this project: 1- sufficient experience in medical image processing techniques; 2- sufficient experience in 3-D deep learning methods; 3- sufficient experience in fusing traditional and deep learning techniques. Both Matlab and Python are acceptable, but Python is the better fit for those who plan to work on Google Colab.
Objectives: Parkinson’s disease (PD) is a complex neurodegenerative disorder characterized by motor and non-motor symptoms. 5–10% of cases are of genetic origin, with mutations identified in several genes such as leucine-rich repeat kinase 2 (LRRK2) and glucocerebrosidase (GBA). These genetic mutations have incomplete penetrance. We aim to predict the presence of pathogenic variants in the LRRK2 and GBA genes using hybrid machine learning systems (HMLSs) applied to imaging and non-imaging data, with the long-term goal of predicting conversion to active disease. Radiomics features (RFs) have the potential to encode deep information within imaging data, and we employ them in addition to clinical features (CFs) and conventional imaging features (CIFs). While imaging studies have identified abnormalities associated with mutation status in several systems, to our knowledge no study has focused on predicting LRRK2 and GBA mutations, and subsequently conversion to disease, via radiomics features and hybrid systems.
Methods: We selected 264 patients with known LRRK2 mutation status and 129 patients with known GBA mutation status from the Parkinson's Progression Markers Initiative database. Of these, 120 (LRRK2) and 59 (GBA) patients carried pathogenic variants, while 144 and 70, respectively, were mutation negative. Identification of pathogenic variants in the LRRK2 and GBA genes was considered the outcome. We generated two datasets with 514 features, including CFs, CIFs, and RFs extracted from segmented regions of interest (left and right striatum) in SPECT images using the standardized SERA software. The datasets were normalized by z-score, and features were analyzed univariately for statistical significance using the t-test, with p-values adjusted by the Benjamini-Hochberg procedure. A range of optimal algorithms was pre-selected from various families of learner algorithms. First, we directly applied 21 classifiers to predict the mutation status.
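The normalization and univariate screening step described above can be illustrated with a minimal sketch. It is not the study's code: the function names (`zscore_columns`, `benjamini_hochberg`, `univariate_screen`), the 0.05 threshold, the choice of an equal-variance two-sample t-test, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def zscore_columns(X):
    """Z-score each feature column (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def univariate_screen(X, y, alpha=0.05):
    """Per-feature t-test between carriers (y == 1) and non-carriers (y == 0).

    alpha = 0.05 is an assumed threshold, not one stated in the abstract.
    """
    Xz = zscore_columns(X)
    _, pvals = stats.ttest_ind(Xz[y == 1], Xz[y == 0], axis=0)
    qvals = benjamini_hochberg(pvals)
    return qvals, qvals < alpha


# Synthetic placeholder data only: 120 positive and 144 negative cases,
# 5 made-up features, with the first feature shifted in the positive class.
rng = np.random.default_rng(0)
y = np.array([1] * 120 + [0] * 144)
X = rng.normal(size=(264, 5))
X[y == 1, 0] += 1.5

qvals, flagged = univariate_screen(X, y)
print(np.round(qvals, 4), flagged)
```

Z-scoring does not change the t statistic of a single feature; it is kept here only to mirror the order of operations in the text.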
Subsequently, we employed HMLSs, comprising 11 feature extraction algorithms (FEAs) and 10 feature selection algorithms (FSAs) linked with 21 classifiers optimized by 5-fold cross-validation and grid search, to enhance prediction performance. 80% of the patient data were used by the HMLSs to select the best model based on maximum performance in 5-fold cross-validation. The remaining 20% was used for external testing of the selected model.
Results: Univariate analysis (adjusted for false discovery rate) indicated that intensity histogram (IH), neighborhood grey tone difference (NGT), and neighboring grey level dependence (NGL) features in the right striatum were significantly predictive of LRRK2 mutation status. Moreover, 17 CFs and 12 CIFs were significantly predictive. In the GBA dataset, all shape, IH, NGT, and NGL features in the right striatum, and size zone and distance zone matrix features in the left striatum, were significantly predictive of GBA gene status. Furthermore, 12 CFs and 1 CIF were significantly predictive. For LRRK2 mutations, HMLSs combining the Extra Tree, Gradient Boosting, Random Forests, and Ensemble Voting classifiers with Correlation-based Feature Selection, the Diffusion Map algorithm, and others achieved high accuracies of 0.98±0.02 in 5-fold cross-validation and 1.00 in external testing. Other HMLSs also achieved adequate performance. For GBA gene status, multiple HMLSs, including the Bagging Classifier, Decision Tree, K-Nearest Neighbor, Random Forests, and Ensemble Voting classifiers linked with FSAs and FEAs, achieved high accuracies of 0.90±0.08 in 5-fold cross-validation and 0.96 in external testing. Other HMLSs also achieved high performance.
Conclusion: We demonstrated that combining SPECT-based radiomics features with conventional features, together with optimal use of HMLSs, produces good prediction of mutation status in PD patients. Predicting conversion to active disease is the next stage of this research.
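The model-selection protocol described in the Methods (80/20 split, feature selection linked with a classifier, grid search under 5-fold cross-validation, then testing on the held-out 20%) might look roughly like the following scikit-learn sketch. It is one possible arrangement under stated assumptions, not the study's implementation: `SelectKBest` with `f_classif` stands in for the unspecified FSAs, Random Forests is just one of the classifiers named in the abstract, the parameter grid is arbitrary, `fit_hybrid_system` is a hypothetical name, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_hybrid_system(X, y, seed=0):
    """Select a feature-selector + classifier pipeline on 80% of the data
    and score it once on the remaining 20% (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed
    )
    # Keeping scaling and selection inside the pipeline means they are
    # refit on each training fold, so no information leaks from the
    # validation fold into the selected features.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),  # stand-in FSA
        ("clf", RandomForestClassifier(random_state=seed)),
    ])
    param_grid = {  # assumed grid, chosen only to keep the example small
        "select__k": [10, 20],
        "clf__n_estimators": [50, 100],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    test_accuracy = search.score(X_test, y_test)
    return search, test_accuracy


# Synthetic placeholder data; sizes are illustrative only.
X, y = make_classification(
    n_samples=264, n_features=40, n_informative=8, random_state=0
)
search, test_accuracy = fit_hybrid_system(X, y)
print("best params:", search.best_params_)
print("mean CV accuracy:", search.best_score_)
print("held-out accuracy:", test_accuracy)
```

To compare several selector and classifier combinations, the same function could be called in a loop over candidate pipelines, keeping the one with the highest cross-validated score before touching the held-out split.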