AI-Driven Disease Prediction in Data-Scarce Environments

By Milthon Lujan

Scientists develop AI to predict disease outbreaks in river ecosystems. Illustration by Nanobanana.

In the aquaculture industry, delayed pathogen detection means more than a lost harvest; it can threaten the solvency of an entire enterprise. Disease spread in river systems is notoriously difficult to predict because genomic and laboratory data are scarce during the early stages of an outbreak.

“Our research facilitates the prediction of disease occurrence,” notes lead author Dr. Pouria Ramazi, an assistant professor at the University of Calgary. The model achieves an AUC (area under the ROC curve) of 0.7 by using environmental variables to fill critical gaps in biological information.

Key Highlights

  • Proactive Safeguarding: The model serves as an early warning system for fish diseases before they escalate throughout the watershed.
  • Performance with Minimal Data: It achieves an AUC of 0.7 without prior test results, requiring only a single baseline control point for effective estimation.
  • Variable Integration: It uses readily available environmental metrics, such as temperature, pH, and sedimentation, in place of missing biological data.
  • Industrial Scalability: Applicable to pathogens vital to global aquaculture, including Salmonella, Vibrio, and Edwardsiella ictaluri.

An Invisible Threat

The research focuses on whirling disease, caused by the parasite Myxobolus cerebralis, which targets salmonid species—such as trout and salmon—resulting in mortality rates as high as 90% among juveniles. For aquaculture stakeholders, this pathology is an economic crisis due to its devastating impact on the recreational and commercial viability of fish stocks.

The primary challenge lies in the detection window; by the time the parasite is identified through laboratory testing, it is often too late for effective mitigation.

Data Engineering: How AI “Forecasts” Infection

In contrast to traditional machine learning models that require thousands of labeled examples, this approach uses extended Hidden Markov Models (HMMs).

“What is remarkable is that, unlike most AI systems that require extensive confirmed cases, this model remained robust even when trained on a single confirmed instance of whirling disease,” Ramazi explains. The system extracts latent cues from environmental metrics across various samples to map potential infestation sites.

Model Architecture

The system discretizes the river into “pixels” and employs a Directed Acyclic Graph (DAG) to represent the directional flow of water. The mathematical framework is based on the joint probability of response variables—specifically, the presence of the pathogen—and “emissions,” which represent the observed environmental data.
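
To make the factorization described above concrete, here is a minimal Python sketch. It is not the authors' implementation: the four-pixel river, the probabilities, and the function names are all hypothetical, and it uses a single environmental variable with plain per-state emission tables rather than the tree-augmented structure of the published model. Each pixel's hidden state (infected or not) depends only on its immediate upstream neighbors, and each pixel emits one observed environmental level.

```python
from itertools import product

# Hypothetical 4-pixel river: each pixel lists its immediate upstream neighbours.
UPSTREAM = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
ORDER = ["a", "b", "c", "d"]  # topological order, headwaters first

# Made-up parameters, for illustration only (not taken from the paper).
P_BASE = 0.1    # P(infected) when no upstream neighbour is infected
P_SPREAD = 0.6  # P(infected) when at least one upstream neighbour is infected
# P(observed level | hidden state) for one environmental variable (e.g. pH).
P_EMIT = {
    0: {"low": 0.5, "medium": 0.3, "high": 0.2},
    1: {"low": 0.1, "medium": 0.3, "high": 0.6},
}

def transition_prob(state, parent_states):
    """P(state of a pixel | states of its upstream neighbours)."""
    p_infected = P_SPREAD if any(parent_states) else P_BASE
    return p_infected if state == 1 else 1.0 - p_infected

def joint_prob(states, emissions):
    """Joint probability of hidden states and emissions, factorised along the DAG."""
    p = 1.0
    for pixel in ORDER:
        parents = [states[u] for u in UPSTREAM[pixel]]
        p *= transition_prob(states[pixel], parents)
        p *= P_EMIT[states[pixel]][emissions[pixel]]
    return p

def posterior_infected(target, emissions, known=None):
    """P(target infected | emissions, known test results), by brute-force enumeration."""
    known = known or {}
    numerator = denominator = 0.0
    for combo in product([0, 1], repeat=len(ORDER)):
        states = dict(zip(ORDER, combo))
        if any(states[k] != v for k, v in known.items()):
            continue  # inconsistent with a known test result
        p = joint_prob(states, emissions)
        denominator += p
        if states[target] == 1:
            numerator += p
    return numerator / denominator

emissions = {"a": "high", "b": "low", "c": "medium", "d": "high"}
print("risk at d, no tests:      ", posterior_infected("d", emissions))
print("risk at d, a tested positive:", posterior_infected("d", emissions, known={"a": 1}))
```

Brute-force enumeration is only workable for a toy river like this one; a grid of the size reported in the study would need a more efficient inference scheme, which this sketch does not attempt to reproduce.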

Critical Results for Fisheries Management

The research team validated the model in the Oldman River basin using a grid of more than 164,000 points. With data from only 113 pixels (0.07% of the area), the TAN-HMM model (TAN stands for Tree-Augmented Naive Bayes) showed a clear advantage over traditional statistical frameworks:

  • Superiority in Data-Sparse Scenarios: With as few as 2 to 5 test samples, the AI significantly outperformed traditional mixed models.
  • AUC Scalability: The AUC rose from an initial 0.7 with negligible data to 0.9 once 100 test results were integrated.
  • Risk Factor Identification: The system determined that infection probability exceeds 93% in conditions of low water quality or high conductivity.

Environmental Variable: Correlation with Aquaculture Risk

  • Air Temperature: Risk increases at higher temperatures, which foster parasite proliferation.
  • Water Quality (Sedimentation): High-quality metrics (low sediment) are inversely correlated with parasite prevalence.
  • Water pH: A critical indicator; risk levels rise in tandem with elevated pH levels.
  • Human Disturbance: Anthropogenic impact is directly correlated with increased pathogen loads.
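
For readers less familiar with the AUC figures quoted in the results, the short Python sketch below shows one standard way to compute it: as the fraction of (infected, uninfected) site pairs in which the infected site received the higher predicted risk. The scores and labels are invented for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted risks and lab results (1 = pathogen detected).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the reported climb from 0.7 to 0.9 means the model ranks infected sites above uninfected ones in a growing share of pairs as test results accumulate.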

A Global Safeguard for Aquaculture

The significance of this breakthrough lies in its transferability to other pathogens that currently plague the global aquaculture industry. Beyond whirling disease, the model is adaptable for tracking Edwardsiella ictaluri—the agent behind enteric septicemia in catfish—and various Vibrio strains impacting shrimp farming. This framework empowers hatchery managers to optimize surveillance by identifying critical hotspots before committing to costly laboratory sampling, transitioning the industry from reactive crisis management to high-precision, science-based prevention.

Limitations and Future Directions

Despite its success, the model has inherent constraints: it restricts spatial dependence to immediate upstream neighbors to maintain computational efficiency. Furthermore, the current version uses discrete data (categorized as low, medium, or high); the next development phase will focus on handling continuous variables modeled with Gaussian distributions.
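
As an illustration of what the discrete low/medium/high categories involve, the Python sketch below bins continuous readings into three levels using tertile cutoffs. The binning rule, the function names, and the pH values are assumptions made for this example; the article does not state how the study chose its thresholds.

```python
def tertile_cutoffs(values):
    """Return two cutoffs that split the sorted values into three roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]

def discretize(value, cutoffs):
    """Map a continuous reading to 'low', 'medium', or 'high'."""
    low_cut, high_cut = cutoffs
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

# Hypothetical pH readings from six river pixels.
ph_readings = [6.8, 7.1, 7.4, 7.9, 8.2, 8.6]
cutoffs = tertile_cutoffs(ph_readings)
levels = [discretize(v, cutoffs) for v in ph_readings]
print(levels)
```

Binning of this kind discards the differences between readings that fall in the same category, which is one reason a move to continuous variables is a natural next step.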

Contact
Pouria Ramazi
Department of Biological Sciences and Department of Mathematics and Statistics, University of Calgary
Calgary, Alberta, Canada
Email: pouria.ramazi@ucalgary.ca

Reference (open access)
Ramazi, P., Bende, P., Haratian, A., Greiner, R., & Lewis, M. A. 2025. Early warning signal for river-borne diseases with almost no data. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210x.70199