Key Facts
- Data split strategy: GroupShuffleSplit grouped by Plot ID to avoid leakage across dates.
- Nitrogen prediction model: Gaussian Naive Bayes (65.5% accuracy), selected for stability and low overfitting risk.
- Irrigation prediction model: Decision Tree (49.1% accuracy), selected to capture non-linear threshold logic.
- Preprocessing: StandardScaler normalization; targets encoded via Label Encoding.
- Operationalization: models + scalers + label encoders persisted as .pkl; inference pipeline replays identical preprocessing on XML + flight dates.
- Known constraint: dataset scarcity limits generalization, especially for irrigation.
Overview
This report describes the training methodology, selection criteria, and operational processes for classification models trained on processed agricultural data (DAS, spectral statistics, plot IDs).
Model Training Strategy and Methodology
- Group-based splitting (GroupShuffleSplit): group by Plot ID to prevent memorization when the same plot appears across dates.
- Algorithm selection prioritizes robustness on limited data and alignment with agricultural domain logic over raw benchmark accuracy.
- Standardization and labeling: normalize inputs via StandardScaler; encode targets via Label Encoding.
Key idea
Evaluate generalization on plots the model has never seen before by preventing Plot ID leakage across train/test.
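The group-based split above can be sketched as follows. This is a minimal illustration with synthetic data; the feature names and plot counts are assumptions, not the report's actual dataset.

```python
# Sketch of Plot-ID-grouped splitting; data shapes and the 10-plot /
# 4-flight-date layout are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))          # e.g. DAS + spectral statistics
y = rng.integers(0, 2, size=40)       # encoded target labels
groups = np.repeat(np.arange(10), 4)  # 10 plots observed on 4 flight dates

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups))

# No Plot ID appears on both sides, so the test score measures
# generalization to unseen plots rather than memorization across dates.
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Because whole plots move to the test side together, a plot photographed on several dates can never leak its later observations into training.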
Selected Models and Rationale
Nitrogen Prediction
Gaussian Naive Bayes — 65.5% accuracy
Selected because it remains stable with few observations and exploits the strong correlation between NDVI/spectral indices and nitrogen status, keeping overfitting risk low.
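A minimal sketch of this setup, combining the report's stated preprocessing (StandardScaler, Label Encoding) with Gaussian Naive Bayes. The synthetic features and the "low"/"medium"/"high" class names are assumptions for illustration only.

```python
# Hypothetical nitrogen-classification sketch; features and class names
# stand in for the report's actual NDVI/spectral inputs.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 4))                       # spectral statistics
labels = rng.choice(["low", "medium", "high"], size=30)  # nitrogen levels

encoder = LabelEncoder()              # targets encoded via Label Encoding
y_train = encoder.fit_transform(labels)

scaler = StandardScaler()             # inputs normalized before fitting
X_scaled = scaler.fit_transform(X_train)

model = GaussianNB().fit(X_scaled, y_train)
preds = encoder.inverse_transform(model.predict(X_scaled))
```

GaussianNB estimates one mean and variance per feature per class, so it has very few parameters to fit, which is why it tends to stay stable on small datasets.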
Irrigation Prediction
Decision Tree — 49.1% accuracy
Selected to capture non-linear threshold-driven effects of irrigation on plant morphology.
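The threshold logic a decision tree captures can be shown with a toy example. The single feature, the cutoff, and the dry/irrigated classes below are hypothetical, chosen only to illustrate how a tree expresses "if value below X, then class A" rules.

```python
# Toy illustration of threshold-driven splits; feature and labels are
# invented, not taken from the report's data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

height = np.array([[4.0], [6.0], [8.0], [12.0], [14.0], [16.0]])
label = np.array([0, 0, 0, 1, 1, 1])  # 0 = dry, 1 = irrigated

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(height, label)
print(export_text(tree, feature_names=["plant_height"]))
```

The printed rules are explicit if/else thresholds, which is the non-linear, step-like response pattern irrigation tends to produce in plant morphology.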
Model Management
- Object persistence: serialize trained models, scalers, and label encoders to .pkl for reuse without retraining.
- Inference pipeline: accept raw XML data and flight dates; apply identical scaling/encoding steps used in training.
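The persist-and-replay cycle above can be sketched with a fitted scaler; the same pattern applies to the models and label encoders. The `.pkl` file name is an assumption, and the key point is that inference reloads and applies the already-fitted object rather than refitting it.

```python
# Sketch of .pkl persistence; "scaler.pkl" is a hypothetical file name.
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

# Training side: serialize the fitted object for reuse without retraining.
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Inference side: reload and apply the identical transform (never refit,
# or the new data would be scaled with different statistics).
with open("scaler.pkl", "rb") as f:
    loaded = pickle.load(f)
new_sample = loaded.transform(np.array([[2.5]]))
```

Reusing the training-time scaler and encoders guarantees that XML-derived features at inference pass through exactly the same preprocessing as the training data.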
General Evaluation and Constraints
The main bottleneck is the small size of the dataset; irrigation accuracy in particular is constrained by insufficient sample diversity for reliable generalization.
Conclusion
The pipeline is validated end to end; accuracy is expected to improve through retraining as data volume increases, with no structural code changes required.
Figures
Supplementary figures and visual materials
Cover figure: report title and project code. Smart Farming, Data Processing Model Training Report (23 January 2026).
How to Cite
Kuru, E., & Bulut, M. A. (2026, January 23). Smart Farming: Data Processing Model Training Report (Project 2023-1-DE01-KA220-HED-000166720). Preunec.