Jing Zhang (she/her/hers)
Oklahoma State University
Stillwater, Oklahoma, United States
Francisco M. Ochoa Corona
Professor
Oklahoma State University
Stillwater, Oklahoma, United States
Maria Ma
Professor
Oklahoma State University
Stillwater, Oklahoma, United States
Sohrab Bodaghi
University of California, Riverside
Riverside, California, United States
Georgios Vidalakis
University of California, Riverside
Riverside, California, United States
Andres S. Espindola
Assistant Professor
Oklahoma State University
Stillwater, Oklahoma, United States
High-throughput sequencing (HTS) is increasingly used to screen for graft-transmissible diseases. However, most bioinformatic tools that detect pathogens in HTS data require expert interpretation. This study explores the application of machine learning (ML) to enhance HTS-based citrus diagnostics by integrating Kraken2 and Bowtie2 genomic analyses. ‘Candidatus Liberibacter asiaticus’, citrus leaf blotch virus (CLBV), and citrus exocortis viroid were used as a proof of concept for a citrus ML prototype (CMLP). Logistic regression (LR), random forest (RF), and support vector machine (SVM) classifiers were trained in CMLP on features derived from read counts and alignment-specific metrics. Principal coordinate analysis (PCoA) revealed two clusters corresponding to positive and negative samples. The first axis (PCo1) explained 78.22% of the variation in sample dissimilarity and reflects the primary factors driving this separation, while PCo2 accounted for 14.45%. For model evaluation, data were split into 70% training and 30% testing sets. For CLBV, LR achieved 87% accuracy and 100% specificity but failed to detect positives (0% sensitivity). RF reached 100% accuracy, specificity, and sensitivity but risked overfitting. SVM provided a balanced performance, with 100% sensitivity and 60% specificity, resulting in 95% accuracy. These preliminary CMLP findings show the strengths and limitations of Kraken2 and Bowtie2 and emphasize the need for multi-feature integration. RF and SVM demonstrated strong diagnostic potential: RF excelled in accuracy but was prone to overfitting, whereas SVM offered a more balanced approach to citrus virus detection.
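The abstract reports the share of dissimilarity explained by PCo1 and PCo2 but does not name the dissimilarity measure or the software used. The following is a minimal sketch of classical PCoA (Gower double-centring followed by an eigen-decomposition). It assumes Bray-Curtis dissimilarity, a common choice for count data, computed on a hypothetical per-sample feature matrix. It shows one usual way to obtain "percentage explained" per axis: each positive eigenvalue divided by the sum of the positive eigenvalues. Tools differ in how they treat negative eigenvalues. The names `bray_curtis` and `pcoa` and the toy matrix are illustrative and are not the CMLP implementation.

```python
import numpy as np


def bray_curtis(x: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between rows of a non-negative matrix."""
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = np.sum(x[i] + x[j])
            # Two all-zero samples are treated as identical (dissimilarity 0).
            d[i, j] = d[j, i] = np.sum(np.abs(x[i] - x[j])) / denom if denom > 0 else 0.0
    return d


def pcoa(d: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Classical PCoA: sample coordinates and proportion explained per axis."""
    n = d.shape[0]
    # Gower double-centring of -0.5 * squared dissimilarities.
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ a @ j
    eigvals, eigvecs = np.linalg.eigh(b)  # ascending order for symmetric b
    order = np.argsort(eigvals)[::-1]     # re-sort so PCo1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with positive eigenvalues (assumed convention).
    positive = eigvals > 1e-12
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    explained = eigvals[positive] / eigvals[positive].sum()
    return coords, explained


if __name__ == "__main__":
    # Hypothetical features per sample: [Kraken2 reads, Bowtie2 mapped reads].
    features = np.array([
        [0, 2], [1, 0], [3, 1],                  # negative-like samples
        [850, 910], [1200, 1105], [990, 1020],   # positive-like samples
    ], dtype=float)
    coords, explained = pcoa(bray_curtis(features))
    print("proportion explained (PCo1, PCo2):", np.round(explained[:2], 4))
```

Because read counts and alignment metrics sit on very different scales, a real pipeline may prefer to transform or standardize features, or to pick another dissimilarity, before running PCoA.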
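The training step can be sketched with scikit-learn under several assumptions: the three feature columns, the synthetic data, the stratified split, feature scaling for LR and SVM, and all hyperparameters are hypothetical choices that the abstract does not specify. The sketch trains LR, RF, and SVM on a 70%/30% split and reports training versus test accuracy. A large gap between these two numbers is one simple signal of the kind of overfitting the abstract attributes to RF. The helpers `make_hypothetical_features` and `evaluate_models` are illustrative names only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_hypothetical_features(n_neg=40, n_pos=10, seed=0):
    """Synthetic stand-in for per-sample features (NOT study data).

    Hypothetical columns: log10 Kraken2 read count, log10 Bowtie2 mapped
    reads, and Bowtie2 genome coverage fraction.
    """
    rng = np.random.default_rng(seed)
    neg = np.column_stack([
        rng.normal(0.5, 0.4, n_neg),
        rng.normal(0.3, 0.3, n_neg),
        rng.uniform(0.0, 0.05, n_neg),
    ])
    pos = np.column_stack([
        rng.normal(2.5, 0.6, n_pos),
        rng.normal(2.2, 0.6, n_pos),
        rng.uniform(0.3, 0.95, n_pos),
    ])
    X = np.vstack([neg, pos])
    y = np.array([0] * n_neg + [1] * n_pos)  # 1 = pathogen-positive
    return X, y


def evaluate_models(X, y, seed=0):
    # 70% training / 30% testing; stratify keeps both classes in each split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # Compare accuracy on seen vs unseen samples to gauge overfitting.
        results[name] = {
            "train_acc": accuracy_score(y_tr, model.predict(X_tr)),
            "test_acc": accuracy_score(y_te, model.predict(X_te)),
        }
    return results


if __name__ == "__main__":
    X, y = make_hypothetical_features()
    for name, scores in evaluate_models(X, y).items():
        print(name, scores)
```

With a small test set, a single split can give unstable or perfect scores by chance. Repeated or cross-validated splits would give a steadier picture than one 70/30 partition.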
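The abstract uses accuracy, sensitivity, and specificity without defining them. The following minimal sketch assumes binary labels where 1 means pathogen-positive. It derives all three metrics from confusion-matrix counts and shows why a classifier that never calls a positive, as reported for LR on CLBV, can still score high accuracy when negatives dominate the test set. The 13-negative, 2-positive label set is hypothetical and is not the study's data. `confusion_counts` and `diagnostic_metrics` are illustrative names.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = pathogen-positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of true positives the model detects.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of true negatives correctly called negative.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical, imbalanced test set: 13 negatives, 2 positives.
    y_true = [0] * 13 + [1] * 2
    # A model that calls every sample negative.
    all_negative = [0] * 15
    print(diagnostic_metrics(y_true, all_negative))
```

Here the all-negative model reaches an accuracy of 13/15 (about 0.87) with a sensitivity of 0.0 and a specificity of 1.0. This is why accuracy alone can be misleading for diagnostics on imbalanced data.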