Key Facts
- Targets (classes): Low, Medium, Ideal.
- Nitrogen levels: 0 g/m² (Low), 10 g/m² (Medium), 20 g/m² (Ideal).
- Irrigation levels: 0% (Low), 50% (Medium), 100% (Ideal).
- Observed correlation: NDVI correlates more strongly with nitrogen level than with irrigation level.
- Processing: wide-to-long transformation; DAS integration; StandardScaler; LabelEncoder; heterogeneity range (Max–Min) feature.
- Decision: use Naive Bayes for nitrogen and Decision Tree for water stress, considering cost and stability.
Overview
This report presents a methodological framework for processing multispectral drone data from wheat plots and estimating nitrogen and water requirements using machine learning and LLM-based evaluation.
Data Preparation and Definition
The dataset combines XML statistics derived from multispectral drone imagery with plot-based nitrogen and irrigation variables.
- Nitrogen classes: 0 g/m² (Low), 10 g/m² (Medium), 20 g/m² (Ideal).
- Irrigation classes: 0% (Low), 50% (Medium), 100% (Ideal).
- Preliminary correlation insight: nitrogen has a stronger relationship with NDVI than irrigation does.
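A comparison like this can be checked with a plain Pearson coefficient. The sketch below is a minimal illustration, not the report's analysis. The treatment levels come from the class definitions above, but the NDVI values are invented toy numbers, and the report does not say which correlation measure it used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Toy rows (NOT report data): levels follow the report's classes, NDVI is invented.
nitrogen_g_m2 = [0, 0, 10, 10, 20, 20]
irrigation_pct = [0, 100, 50, 0, 100, 50]
ndvi = [0.41, 0.45, 0.58, 0.55, 0.72, 0.70]

r_nitrogen = pearson_r(nitrogen_g_m2, ndvi)
r_irrigation = pearson_r(irrigation_pct, ndvi)
print(f"nitrogen vs NDVI:   r = {r_nitrogen:.2f}")
print(f"irrigation vs NDVI: r = {r_irrigation:.2f}")
```

Because the treatment levels are ordered categories with only three values each, a rank-based measure would be an equally reasonable choice.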
Methodology and Data Processing
- Transform data from wide format (one column per date) to long/tidy format (one row per measurement occasion).
- Add DAS (Days After Sowing) to model biological age effects on NDVI dynamics.
- Normalize features with StandardScaler; encode categorical values with LabelEncoder.
- Add heterogeneity range (Max–Min) as a feature to capture within-plot variability, relevant for water stress.
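The four steps above can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the report's pipeline: the record layout, the column names, the sowing date and all NDVI values are hypothetical. `standardize` and `encode_labels` are hand-written stand-ins that mirror what StandardScaler and LabelEncoder compute, so the example needs no third-party packages.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical wide layout: one row per plot, one (mean, min, max) NDVI triple
# per flight date. Names and values are illustrative, not the report's schema.
SOWING_DATE = date(2025, 3, 1)  # assumed sowing date
ID_COLUMNS = {"plot", "nitrogen"}
wide_rows = [
    {"plot": "A1", "nitrogen": "Low",
     "2025-04-01": (0.35, 0.30, 0.42), "2025-05-01": (0.48, 0.40, 0.60)},
    {"plot": "B2", "nitrogen": "Ideal",
     "2025-04-01": (0.52, 0.49, 0.55), "2025-05-01": (0.78, 0.74, 0.81)},
]

def wide_to_long(rows, sowing_date):
    """Emit one record per (plot, flight date), adding DAS and the Max-Min range."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            ndvi_mean, ndvi_min, ndvi_max = value
            flight = date.fromisoformat(column)
            long_rows.append({
                "plot": row["plot"],
                "nitrogen": row["nitrogen"],
                "das": (flight - sowing_date).days,  # Days After Sowing
                "ndvi_mean": ndvi_mean,
                "ndvi_range": ndvi_max - ndvi_min,   # within-plot heterogeneity
            })
    return long_rows

def standardize(values):
    """Z-score using the population standard deviation, as StandardScaler does."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def encode_labels(labels):
    """Map labels to integers in sorted order, as LabelEncoder does."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes

long_rows = wide_to_long(wide_rows, SOWING_DATE)
das_scaled = standardize([r["das"] for r in long_rows])
y, classes = encode_labels([r["nitrogen"] for r in long_rows])
```

Sorted-order encoding maps Ideal, Low, Medium to 0, 1, 2, which is not the agronomic order. That is harmless for a classification target, but the integer codes should not be read as an ordinal scale.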
Machine Learning vs Large Language Models
- ML: low compute cost and deterministic, reproducible output, but requires domain expertise for feature engineering (DAS, scaling).
- LLM: high compute/API cost; can capture logical relationships but carries a risk of hallucination.
- Few-shot setup: the prompt provides 215 real training samples and asks the LLM for predictions on 55 test samples.
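A few-shot setup of this shape might be assembled as follows. The summary does not describe the actual prompt wording, the feature set or the reply format, so every name here (`format_sample`, `build_few_shot_prompt`, `parse_predictions`, the feature keys) is hypothetical. The samples are random placeholders that only reproduce the 215/55 counts. The parser rejects any reply that does not contain exactly one known label per expected line, which is one simple guard against hallucinated output.

```python
import random

LABELS = ("Low", "Medium", "Ideal")

def format_sample(sample, with_label=True):
    """Render one sample as a compact text line for the prompt."""
    line = (f"DAS={sample['das']}, NDVI_mean={sample['ndvi_mean']:.3f}, "
            f"NDVI_range={sample['ndvi_range']:.3f}")
    return f"{line} -> {sample['label']}" if with_label else line

def build_few_shot_prompt(train, test):
    """Labeled examples first, then numbered unlabeled queries."""
    parts = [
        "Classify the nitrogen level of each wheat plot as Low, Medium or Ideal.",
        "Labeled examples:",
        *(format_sample(s) for s in train),
        "Answer with one label per line, in order, for these samples:",
        *(f"{i}. {format_sample(s, with_label=False)}"
          for i, s in enumerate(test, 1)),
    ]
    return "\n".join(parts)

def parse_predictions(reply, expected_count):
    """Keep lines holding exactly one known label; reject a wrong-length reply."""
    predictions = []
    for line in reply.splitlines():
        words = line.replace(".", " ").replace(":", " ").split()
        found = [w for w in words if w in LABELS]
        if len(found) == 1:
            predictions.append(found[0])
    if len(predictions) != expected_count:
        raise ValueError(f"expected {expected_count} labels, got {len(predictions)}")
    return predictions

# Placeholder samples (random, NOT report data) matching the 215/55 split.
rng = random.Random(0)

def placeholder_sample():
    return {"das": rng.randint(20, 120), "ndvi_mean": rng.uniform(0.2, 0.9),
            "ndvi_range": rng.uniform(0.02, 0.3), "label": rng.choice(LABELS)}

train = [placeholder_sample() for _ in range(215)]
test = [placeholder_sample() for _ in range(55)]
prompt = build_few_shot_prompt(train, test)
```

Sending `prompt` to a model is deliberately left out, since the client library and model parameters used in the study are not given here.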
Analysis Results
Nitrogen prediction
Highlights
Gemini 3 (LLM) reached 72.73% accuracy; Naive Bayes led the ML models with 65.50%.
Water stress (irrigation) prediction
Highlights
Gemini 3 and DeepSeek-V3 both reached 54.55%; Decision Tree led the ML models with 49.10%.
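The reported percentages can be related to counts of correct predictions. The arithmetic below assumes that accuracy was measured on the 55 test predictions mentioned in the few-shot setup. That is an assumption, because this summary does not state the evaluation set size for each model.

```python
TEST_SIZE = 55  # assumed: the 55 test predictions from the few-shot setup

def accuracy_pct(correct, total=TEST_SIZE):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)

def correct_count(pct, total=TEST_SIZE):
    """Nearest whole number of correct predictions for a reported percentage."""
    return round(pct * total / 100)

reported = [
    ("Gemini 3, nitrogen", 72.73),
    ("Gemini 3 / DeepSeek-V3, water stress", 54.55),
    ("Naive Bayes, nitrogen", 65.50),
    ("Decision Tree, water stress", 49.10),
]
for name, pct in reported:
    n = correct_count(pct)
    print(f"{name}: reported {pct}%, nearest count {n}/{TEST_SIZE} = {accuracy_pct(n)}%")
```

Under this assumption the LLM figures match whole counts exactly (40 of 55 and 30 of 55). The ML figures do not round-trip at two decimals: 36 of 55 is 65.45% and 27 of 55 is 49.09%. They may have been rounded to one decimal or computed on a different split, and the source does not say which.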
Full comparison
The embedded results-table figure contains the complete set of evaluated models and accuracies.
General Evaluation and Roadmap
- Key learning: expert-driven processing (especially DAS) is critical.
- Decision: adopt Naive Bayes for nitrogen (cost-effective and stable among the ML models).
- Decision: adopt Decision Tree for water stress to capture non-linear branching.
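To make the nitrogen decision concrete, here is a minimal Gaussian Naive Bayes written from scratch. The summary does not say which Naive Bayes variant or library was used, so the Gaussian variant is an assumption made because the features (scaled DAS, NDVI) are continuous. The training rows are invented toy values, not report data.

```python
from collections import defaultdict
from math import log, pi
from statistics import mean, pvariance

class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes: class priors plus per-feature mean and variance."""

    def fit(self, X, y, var_floor=1e-9):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.params = {}
        for label, rows in by_class.items():
            columns = list(zip(*rows))  # transpose rows into feature columns
            self.params[label] = {
                "log_prior": log(len(rows) / len(X)),
                "means": [mean(c) for c in columns],
                # The floor avoids division by zero for constant features.
                "vars": [pvariance(c) + var_floor for c in columns],
            }
        return self

    def _log_posterior(self, features, p):
        """Unnormalized log posterior: log prior + sum of Gaussian log densities."""
        total = p["log_prior"]
        for x, mu, var in zip(features, p["means"], p["vars"]):
            total += -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [
            max(self.params, key=lambda c: self._log_posterior(f, self.params[c]))
            for f in X
        ]

# Toy training rows (NOT report data): [scaled DAS, mean NDVI].
X_train = [[-1.0, 0.35], [1.0, 0.45], [-1.0, 0.50],
           [1.0, 0.62], [-1.0, 0.55], [1.0, 0.80]]
y_train = ["Low", "Low", "Medium", "Medium", "Ideal", "Ideal"]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1.0, 0.79], [-1.0, 0.36]])
print(predictions)
```

Fitting only needs per-class means and variances, which is consistent with the low-cost argument for this model. The Decision Tree chosen for water stress is not sketched here. In practice, scikit-learn's `GaussianNB` and `DecisionTreeClassifier` would be the usual off-the-shelf equivalents.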
Figures
Supplementary figures and visual materials

Cover
Cover page with report title and project code: Input Data Infrastructure and Data Processing Model Selection Report (9 January 2026).

Model evaluation table
Comparison table listing the evaluated ML and LLM models with their nitrogen and irrigation accuracy values (embedded as figure).
How to Cite
Use the citation below to reference this report in your work
Kuru, E., & Bulut, M. A. (2026, January 9). Smart Farming: Input Data Infrastructure and Data Processing Model Selection Report (Project 2023-1-DE01-KA220-HED-000166720). Preunec.