HYBRID EVENT: You can participate in person in Rome, Italy, or virtually from your home or work.
Speaker at International Alzheimer’s Disease & Dementia Conference 2022 - Kesheng Wang
West Virginia University, United States
Title: Utilizing feature selection to identify proteomic biomarkers for support vector machine learning-based classification of Alzheimer’s disease


Background: Proteins play an important role in disease pathways and drug-target discovery. The most widely used biomarkers for Alzheimer’s disease (AD) include the apolipoprotein E (APOE) protein, the clinical cerebrospinal fluid (CSF) protein biomarker of Aβ plaques (Aβ42), the biomarkers of pathologic tau (total tau, tTau, and phosphorylated tau, pTau), and the biomarkers of neurodegeneration or neuronal injury (such as MRI). There is a critical need for machine learning (ML) approaches that translate univariate biomarker findings into clinically useful multivariate decision support systems. This study aimed to perform feature selection of proteomic biomarkers for support vector machine (SVM) learning-based classification of AD.

Methods: A total of 162 non-Hispanic White participants, including 109 with AD and 53 with cognitively normal functioning (CN), were selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The 146 proteomic biomarkers came from the ADNI subset “Biomarkers Consortium Plasma Proteomics Project RBM multiplex data.” A Z score was computed for each protein from its mean and standard deviation. An independent-samples t-test was used to compare the mean level of each protein between AD and CN. Variable cluster analysis using the first principal component (oblique principal component cluster analysis, OPCCA) was used to divide the proteomic biomarkers into disjoint clusters. Random forest (RF) was used to rank the relative predictive importance of the biomarkers. The SVM algorithm (with linear, radial, and polynomial kernels) was applied to develop a model to predict AD.
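The selection and classification steps described in the Methods can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, SciPy, and scikit-learn, not the authors' implementation: the abstract does not name the software, hyperparameters, significance threshold, or validation scheme, so the p < 0.05 cutoff, the 500-tree forest, the default SVM settings, and the 5-fold cross-validation are all assumptions. The data are synthetic stand-ins with the same dimensions as the study (162 subjects, 146 proteins), not ADNI data.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the study data (NOT ADNI): 162 subjects x 146 proteins.
n_ad, n_cn, n_proteins = 109, 53, 146
y = np.array([1] * n_ad + [0] * n_cn)      # 1 = AD, 0 = CN
X = rng.normal(size=(n_ad + n_cn, n_proteins))
X[y == 1, :10] += 0.8                       # assumption: first 10 proteins carry signal

# 1) Z score per protein: subtract the mean, divide by the standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2) Independent-samples t-test per protein, AD vs. CN.
t_stat, p_val = stats.ttest_ind(Z[y == 1], Z[y == 0], axis=0)
associated = np.flatnonzero(p_val < 0.05)   # assumed threshold

# 3) Random forest importance among the t-test survivors; keep the top 5.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(Z[:, associated], y)
order = np.argsort(rf.feature_importances_)[::-1]
top5 = associated[order[:5]]

# 4) SVM with three kernels on the top-5 proteins, 5-fold cross-validated accuracy.
cv_accuracy = {}
for kernel in ("linear", "rbf", "poly"):
    scores = cross_val_score(SVC(kernel=kernel), Z[:, top5], y,
                             cv=5, scoring="accuracy")
    cv_accuracy[kernel] = scores.mean()

print(top5, cv_accuracy)
```

One caveat on the design: selecting features on the full data set before cross-validating, as this sketch does for brevity, lets information from the held-out folds leak into the selection and tends to inflate accuracy. A stricter version would repeat steps 1 to 3 inside each training fold, for example with a scikit-learn `Pipeline`.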

Results: The independent-samples t-test revealed significant differences between AD and CN in 44 proteomic biomarkers. The top 9 of the 146 biomarkers ranked by RF feature importance (BNP, ApoAII, SGOT, A1Micro, ApoE, Calcitonin, Eotaxin3, Vitronectin, and IP10) yielded an optimal model with accuracy of 0.7897 and Kappa of 0.4065, while the top 5 of the 44 AD-associated biomarkers (BNP, ApoE, A1Micro, PLGF, and PAPPA) yielded an optimal model with accuracy of 0.8694 and Kappa of 0.6753. Furthermore, the radial kernel SVM model based on the top 5 of the 44 AD-associated features had accuracy of 0.8617, Kappa of 0.6658, ROC of 0.8791, sensitivity of 0.7967, and specificity of 0.8461, slightly better than the model built on the top 9 protein biomarkers selected by RF. After adding 3 clinical CSF biomarkers (Aβ42, tTau, and pTau) to the top 5 AD-associated biomarkers, the polynomial kernel SVM model was optimal, with accuracy of 0.9572, Kappa of 0.9088, ROC of 0.9871, sensitivity of 0.9383, and specificity of 0.9076. Finally, OPCCA grouped the 44 AD-associated biomarkers plus the 3 clinical CSF biomarkers into 13 clusters, where tTau and pTau were in one cluster, and Aβ42, ApoE, and ApoAII were in another.
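For readers less familiar with the metrics reported above, the sketch below shows how accuracy, Cohen's Kappa, sensitivity, and specificity follow from a 2×2 confusion matrix. The function name `binary_metrics` and the example counts are hypothetical, chosen only to match the cohort sizes (109 AD, 53 CN), and they are not results from the study. Treating AD as the positive class is also an assumption, since the abstract does not say which class was used.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, Cohen's kappa, sensitivity, and specificity from 2x2 counts.

    tp/fn: positive-class subjects classified correctly / incorrectly.
    tn/fp: negative-class subjects classified correctly / incorrectly.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg

    # Kappa rescales accuracy so that 0 is chance level and 1 is perfect.
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for 109 AD (positive) and 53 CN (negative) subjects.
m = binary_metrics(tp=90, fn=19, fp=8, tn=45)
print(m)
```

Because the cohort is imbalanced (about two thirds AD), a classifier that always predicts AD would already reach roughly 67% accuracy, which is why Kappa is a useful complement to accuracy here.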

Conclusions: A model using both RF-selected proteomic biomarkers and clinical CSF biomarkers has potential for predicting AD.

Keywords: Alzheimer’s disease; proteomics; biomarkers; machine learning; random forest; support vector machine

What will audience learn from your presentation?

  • This study aimed to identify proteomic biomarkers for support vector machine learning-based classification of Alzheimer’s disease (AD).
  • The audience will understand the application of machine learning techniques in predicting AD and in clinical diagnosis.
  • The audience can use these techniques in their research or teaching.
  • Feature selection will help improve the efficiency of classification of AD using machine learning.


Dr. Kesheng Wang is an Associate Professor in Biostatistics at the School of Nursing, West Virginia University. Dr. Wang received his PhD in 2001 from Georg-August-University of Goettingen, Germany. As a postdoctoral fellow, Dr. Wang obtained advanced training in Biostatistics (including Genetic Epidemiology and Statistical Genetics) at the Hospital for Sick Children and University of Toronto, Canada. His research interests focus on theory and application of biostatistics, behavioral epidemiology, and genetic epidemiology/statistical genetics. Dr. Wang has published more than 150 scientific papers.