Project ID: P202103100011
Hybrid Machine Learning Methods and Ensemble Voting for Identification of Parkinson’s Disease Subtypes
machine learning, image processing
Applicants with the following expertise may apply for this project: experience with machine learning algorithms, including dimensionality reduction algorithms, classifiers and clustering algorithms; experience extracting radiomics features from each region of interest (via the SERA package); experience segmenting different brain regions with FreeSurfer. Both MATLAB and Python are acceptable, but Python is more convenient for those who plan to work on Google Colab.
Objectives: It is important to subdivide Parkinson’s disease (PD) into specific subtypes, since homogeneous groups of patients are more likely to share genetic and pathological features, potentially enabling earlier disease recognition and more tailored treatment strategies. We aim to identify PD subtypes using advanced hybrid machine learning (ML) methods followed by ensemble voting.
Methods: A timeless dataset consisting of 885 studies was derived from longitudinal datasets (years 0, 1, 2 and 4; Parkinson’s Progression Markers Initiative). The dorsal striatum (DS) was segmented on DAT SPECT images with the aid of MRI. Radiomic features of the DS were extracted using our standardized SERA software. Hybrid ML systems were constructed from 16 feature reduction algorithms (FRAs), 14 clustering algorithms (CAs) and 16 classifiers (Cs). The C-index evaluation method was first applied to each trajectory (hybrid system) to optimize the number of derived clusters (over a range of 2 to 10 clusters). We then selected the optimal number of subtypes across all trajectories using both the Average of Classifier Performance (AOCP) and the Average of Correlation Factor (AOCF); AOCF assesses how well results from different clustering methods correlate with one another, while AOCP assesses accuracy in the final classification. Finally, ensemble voting enabled us to assign patients to subtypes based on comprehensive voting by the different hybrid systems. To do this, we first applied t-distributed Stochastic Neighbor Embedding (t-SNE) to the list of subtypes resulting from the different trajectories, transforming the high-dimensional dataset into 2 dimensions such that, with high probability, similar datapoints are mapped to nearby points and dissimilar datapoints to distant points. Subsequently, hierarchical agglomerative clustering of this embedding identified 3 distinct sub-clusters.
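The workflow described under Methods can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual implementation: the data are synthetic (make_blobs stands in for the SERA radiomic features), only two feature reducers, two clustering algorithms and one classifier are used instead of the 16/14/16 listed above, the number of clusters is fixed at 3 rather than tuned with the C-index, AOCP is approximated as the mean cross-validated accuracy, and the per-trajectory labels are one-hot encoded before t-SNE. The function names run_trajectories and ensemble_subtypes are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder


def run_trajectories(X, n_clusters=3, seed=0):
    """Run each reducer -> clusterer -> classifier trajectory (toy subset).

    Returns one column of cluster labels and one cross-validated
    accuracy per trajectory.
    """
    reducers = [PCA(n_components=2, random_state=seed),
                PCA(n_components=5, random_state=seed)]
    label_columns, accuracies = [], []
    for reducer in reducers:
        Z = reducer.fit_transform(X)
        clusterers = [KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
                      AgglomerativeClustering(n_clusters=n_clusters)]
        for clusterer in clusterers:
            labels = clusterer.fit_predict(Z)
            # How well can a classifier recover the cluster labels?
            clf = RandomForestClassifier(n_estimators=50, random_state=seed)
            accuracies.append(cross_val_score(clf, Z, labels, cv=3).mean())
            label_columns.append(labels)
    return np.column_stack(label_columns), np.array(accuracies)


def ensemble_subtypes(label_matrix, n_subtypes=3, seed=0):
    """Embed the per-trajectory labels with t-SNE, then cluster the embedding."""
    # Assumption: one-hot encoding, because cluster IDs are nominal and
    # not aligned across trajectories.
    onehot = OneHotEncoder().fit_transform(label_matrix).toarray()
    embedding = TSNE(n_components=2, perplexity=30, init="random",
                     random_state=seed).fit_transform(onehot)
    subtypes = AgglomerativeClustering(n_clusters=n_subtypes).fit_predict(embedding)
    return embedding, subtypes


# Synthetic stand-in for the radiomic feature matrix (patients x features).
X, _ = make_blobs(n_samples=150, n_features=8, centers=3,
                  cluster_std=3.0, random_state=0)
label_matrix, accuracies = run_trajectories(X)
aocp = accuracies.mean()  # assumed reading of AOCP: mean accuracy over trajectories
embedding, subtypes = ensemble_subtypes(label_matrix)
print(label_matrix.shape, embedding.shape, np.bincount(subtypes))
```

In a real study the lists of reducers, clustering algorithms and classifiers would be extended, and the candidate number of clusters would be scanned over 2 to 10 before the voting step.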