Project ID: P202103100012
Longitudinal Clustering Analysis and Prediction of Parkinson’s Disease Progression
Parkinson’s Disease, Image Processing, Clustering Analysis, Prediction
People with the following expertise may apply for this project: Individuals with sufficient experience in machine learning algorithms, including dimension reduction algorithms, classifiers, and clustering algorithms. Individuals with sufficient experience in extracting radiomics features from each region of interest (via the SERA package). Individuals with sufficient experience in segmenting different regions via FreeSurfer. Both MATLAB and Python are acceptable, but Python may be more convenient for those who plan to work on Google Colab.
Objectives: We aimed to identify distinct disease progression pathways in Parkinson’s disease (PD), making use of clinical and imaging features, towards improved understanding of the disease and powering of clinical trials. In addition, we studied machine learning approaches to predict progression pathways from early (year 0 and 1) data.
Methods: We studied 885 PD subjects derived from longitudinal datasets (years 0, 1, 2 and 4; Parkinson’s Progression Markers Initiative). We generated and analyzed 980 features, including Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) measures, a range of task/exam performances, socioeconomic/family histories, and radiomics features (RFs) extracted for each region of interest (ROI; left and right caudate as well as putamen) using our standardized SERA radiomics software. ROIs on the DAT SPECT images were segmented using MRI images. After performing cross-sectional clustering to identify disease subtypes (3 sub-clusters robustly identified in our prior work, namely i) mild, ii) intermediate, and iii) severe) for any given patient in any given year, we identified optimal longitudinal pathways by applying a hybrid system (HS), consisting of Principal Component Analysis (PCA) as the dimension reduction algorithm (DRA) and Hierarchical Agglomerative Clustering (HAC) as the clustering method, to the longitudinal dataset. To optimize the number of longitudinal trajectories (clusters), we applied the Elbow clustering evaluation method to the results (for a range of 2-9 longitudinal clusters/pathways) generated by HSs including PCA + K-Means Algorithm (KMA) as well as PCA + HAC. The optimized number of pathways was further confirmed by two other methods, the Bayesian Information Criterion (BIC) and the Calinski-Harabasz Criterion (CHC), applied to the clustering results provided by KMA.
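The cluster-number criteria named above can be illustrated with a small worked example. The sketch below is a toy under stated assumptions, not the study's code: the nine 2-D points and the hand-written partitions for k = 2 to 5 are hypothetical stand-ins for PCA-reduced features and for the labels that KMA or HAC would return. It computes the within-cluster sum of squares W (the quantity inspected by the Elbow method) and the Calinski-Harabasz index, CH = [B/(k-1)] / [W/(n-k)], where B is the between-cluster sum of squares. The elbow is taken as the k with the largest second difference of the W curve, which is one common heuristic among several.

```python
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between(points, labels):
    """Return (W, B): within- and between-cluster sums of squares."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    overall = centroid(points)
    W = B = 0.0
    for members in groups.values():
        c = centroid(members)
        W += sum(sq_dist(p, c) for p in members)
        B += len(members) * sq_dist(c, overall)
    return W, B


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; larger is better."""
    n, k = len(points), len(set(labels))
    W, B = within_between(points, labels)
    return (B / (k - 1)) / (W / (n - k))


def elbow(ks, wcss):
    """Pick the interior k with the largest second difference of W."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical toy data: three tight groups of three 2-D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (5, 9), (5, 10), (6, 9)]

# Hypothetical partitions standing in for KMA/HAC output at each k.
partitions = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
    5: [0, 0, 4, 1, 1, 1, 2, 2, 3],
}

ks = sorted(partitions)
wcss = [within_between(points, partitions[k])[0] for k in ks]
ch = {k: calinski_harabasz(points, partitions[k]) for k in ks}
elbow_k = elbow(ks, wcss)
best_ch_k = max(ch, key=ch.get)
print("elbow:", elbow_k, "| best CH:", best_ch_k)
```

On real data the labels would come from the clustering step itself, and a library routine such as scikit-learn's `calinski_harabasz_score` could replace the hand-written index.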
Subsequently, we predicted the identified trajectories from early data (years 0 and 1) using multiple HSs, coupling 16 DRAs to 10 classifiers.
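A grid of hybrid systems of this kind can be sketched with scikit-learn pipelines. This is a minimal illustration under stated assumptions: the text does not name the 16 DRAs or the 10 classifiers, so the two DRAs and three classifiers below are arbitrary examples, the data are synthetic, and the 5-fold cross-validated accuracy is an assumed evaluation scheme rather than the one used in the study.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for early (year 0 and 1) features; y plays the role
# of the longitudinal trajectory assigned to each subject.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# Illustrative choices only; the study's DRAs and classifiers are not listed.
dras = {
    "PCA": PCA(n_components=10, random_state=0),
    "KernelPCA": KernelPCA(n_components=10, kernel="rbf", random_state=0),
}
classifiers = {
    "LogReg": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Each hybrid system is scaler -> DRA -> classifier, scored by 5-fold CV.
# cross_val_score clones the pipeline, so the DRA is refit inside each
# training fold and never sees the held-out subjects.
results = {}
for (dra_name, dra), (clf_name, clf) in product(dras.items(),
                                                classifiers.items()):
    hs = make_pipeline(StandardScaler(), dra, clf)
    results[(dra_name, clf_name)] = cross_val_score(hs, X, y, cv=5).mean()

best = max(results, key=results.get)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(name, round(acc, 3))
```

Wrapping the DRA and the classifier in one pipeline keeps the dimension reduction inside the cross-validation loop, which avoids leaking information from test folds into the reduced features.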