Smart Farming: Input Data Infrastructure and Data Processing Model Selection Report
Methodological Overview: Data Processing & AI Evaluation

Key Facts

  • Targets (classes): Low, Medium, Ideal.
  • Nitrogen levels: 0 g/m² (Low), 10 g/m² (Medium), 20 g/m² (Ideal).
  • Irrigation levels: 0% (Low), 50% (Medium), 100% (Ideal).
  • Observed correlation: NDVI correlates more strongly with nitrogen than with irrigation.
  • Processing: wide-to-long transformation; DAS integration; StandardScaler; LabelEncoder; heterogeneity range (Max–Min) feature.
  • Decision: use Naive Bayes for nitrogen and Decision Tree for water stress, considering cost and stability.

Overview

This report presents a methodological framework for processing multispectral drone data from wheat plots and estimating nitrogen and water requirements using machine learning and LLM-based evaluation.

Data Preparation and Definition

The dataset combines XML statistics derived from multispectral drone imagery with plot-based nitrogen and irrigation variables.

  • Nitrogen classes: 0 g/m² (Low), 10 g/m² (Medium), 20 g/m² (Ideal).
  • Irrigation classes: 0% (Low), 50% (Medium), 100% (Ideal).
  • Preliminary correlation insight: nitrogen has a stronger relationship with NDVI than irrigation.
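The correlation comparison above can be illustrated with a minimal sketch. The report does not say which correlation coefficient was used, so Pearson's r is an assumption here, and every number in the sample lists is made up for illustration only, not taken from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative plots (NOT values from the report).
nitrogen_g_m2 = [0, 0, 10, 10, 20, 20]      # treatment levels in g/m²
irrigation_pct = [0, 100, 50, 0, 100, 50]   # treatment levels in %
ndvi = [0.42, 0.47, 0.58, 0.55, 0.74, 0.70]

r_nitrogen = pearson(nitrogen_g_m2, ndvi)
r_irrigation = pearson(irrigation_pct, ndvi)
print(f"nitrogen vs NDVI:   r = {r_nitrogen:.2f}")
print(f"irrigation vs NDVI: r = {r_irrigation:.2f}")
```

On real data the same two calls would be run over the per-plot treatment levels and the matching NDVI statistics.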

Methodology and Data Processing

  • Transform data from wide format (one column per date) to long/tidy format (one row per plot per measurement date).
  • Add DAS (Days After Sowing) to model biological age effects on NDVI dynamics.
  • Normalize features with StandardScaler; encode categorical values with LabelEncoder.
  • Add heterogeneity range (Max–Min) as a feature to capture within-plot variability, relevant for water stress.
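The processing steps listed above can be sketched as follows. This is a dependency-free illustration, not the project's actual pipeline: the column layout, the sowing date, and all values are hypothetical, and the `standardize` and `label_encode` helpers only reproduce the arithmetic that scikit-learn's StandardScaler (zero mean, population standard deviation) and LabelEncoder (sorted classes mapped to integers) perform.

```python
from datetime import date
from statistics import mean, pstdev

SOWING_DATE = date(2024, 11, 1)  # hypothetical sowing date
ID_COLUMNS = ("plot", "nitrogen")

# Hypothetical wide layout: one row per plot, one column per flight date.
# Each cell holds (ndvi_min, ndvi_mean, ndvi_max) for that plot and date.
wide_rows = [
    {"plot": "P1", "nitrogen": "Low",
     "2025-03-01": (0.30, 0.41, 0.52), "2025-04-01": (0.35, 0.50, 0.61)},
    {"plot": "P2", "nitrogen": "Ideal",
     "2025-03-01": (0.48, 0.60, 0.66), "2025-04-01": (0.62, 0.75, 0.80)},
]

def wide_to_long(rows):
    """Emit one record per plot and flight date, with DAS and Max-Min range."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            flight = date.fromisoformat(column)
            ndvi_min, ndvi_mean, ndvi_max = value
            long_rows.append({
                "plot": row["plot"],
                "nitrogen": row["nitrogen"],
                "das": (flight - SOWING_DATE).days,   # Days After Sowing
                "ndvi_mean": ndvi_mean,
                "ndvi_range": ndvi_max - ndvi_min,    # heterogeneity range
            })
    return long_rows

def standardize(values):
    """Zero-mean, unit-variance scaling using the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def label_encode(labels):
    """Map labels to integers by sorted class name; return codes and classes."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes

long_rows = wide_to_long(wide_rows)
das_scaled = standardize([r["das"] for r in long_rows])
y, classes = label_encode([r["nitrogen"] for r in long_rows])
```

One point worth checking in any real implementation: encoding by sorted class name orders the three targets alphabetically (Ideal, Low, Medium), which is not their agronomic order, so the integer codes should be treated as class identifiers rather than as an ordinal scale.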

Machine Learning vs Large Language Models

  • ML: low compute cost and deterministic, repeatable predictions, but requires domain expertise for feature engineering (DAS, scaling).
  • LLM: high compute/API cost; can capture logical relationships but carries a risk of hallucinated outputs.
  • Few-shot setup: provide 215 real training samples and ask for 55 test predictions.
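A few-shot prompt of the kind described above might be assembled as in the sketch below. The prompt wording, the feature names, and the placeholder samples are all assumptions made for illustration; the report does not publish the exact prompt sent to the LLMs. Only the sample counts (215 labelled, 55 to predict) and the three class names come from the report.

```python
CLASSES = ("Low", "Medium", "Ideal")

def format_sample(sample, with_label):
    """Render one plot measurement as a single prompt line."""
    features = (
        f"DAS={sample['das']}, "
        f"NDVI_mean={sample['ndvi_mean']:.2f}, "
        f"NDVI_range={sample['ndvi_range']:.2f}"
    )
    return f"{features} -> {sample['label']}" if with_label else f"{features} -> ?"

def build_few_shot_prompt(train, test, target):
    """Labelled examples first, then the unlabelled samples to classify."""
    lines = [
        f"Classify the {target} level of each wheat plot as one of: "
        f"{', '.join(CLASSES)}.",
        "Labelled examples:",
    ]
    lines += [format_sample(s, True) for s in train]
    lines.append(
        f"Predict a label for each of the following {len(test)} plots, one per line:"
    )
    lines += [format_sample(s, False) for s in test]
    return "\n".join(lines)

# Placeholder samples (hypothetical values) sized like the report's setup.
train = [
    {"das": 120 + i % 3, "ndvi_mean": 0.5, "ndvi_range": 0.2, "label": CLASSES[i % 3]}
    for i in range(215)
]
test = [{"das": 150, "ndvi_mean": 0.6, "ndvi_range": 0.15} for _ in range(55)]

prompt = build_few_shot_prompt(train, test, "nitrogen")
train_share = len(train) / (len(train) + len(test))  # roughly an 80/20 split
```

With 215 training and 55 test samples, the labelled share works out to about 80% of the 270 samples.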

Analysis Results

Nitrogen prediction

Highlights

Gemini 3 (LLM) reached 72.73% accuracy; Naive Bayes led ML with 65.50%.

Water stress (irrigation) prediction

Highlights

Gemini 3 and DeepSeek-V3 reached 54.55%; Decision Tree led ML with 49.10%.
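To put these percentages in perspective, the short sketch below converts each reported accuracy into an implied number of correct predictions. It assumes every model was scored on the same 55 test samples mentioned in the few-shot setup, which the report does not state explicitly for the ML models, and it assumes three balanced classes when computing the chance baseline.

```python
TEST_SIZE = 55  # assumption: the 55 test predictions apply to every model

# Accuracies (in %) as reported in the highlights above.
reported = {
    "Gemini 3 (nitrogen)": 72.73,
    "Naive Bayes (nitrogen)": 65.50,
    "Gemini 3 / DeepSeek-V3 (irrigation)": 54.55,
    "Decision Tree (irrigation)": 49.10,
}

def implied_correct(accuracy_pct, n=TEST_SIZE):
    """Nearest whole number of correct predictions for a given accuracy."""
    return round(accuracy_pct / 100 * n)

implied = {name: implied_correct(acc) for name, acc in reported.items()}

for name, k in implied.items():
    print(f"{name}: ~{k}/{TEST_SIZE} correct = {100 * k / TEST_SIZE:.2f}%")

# Chance level for a three-class problem, assuming balanced classes.
chance_pct = 100 / 3
```

Under that assumption the LLM figures correspond exactly to 40 and 30 correct predictions out of 55, while the ML figures are closest to 36 and 27, so the LLM lead amounts to roughly four test samples for nitrogen and three for irrigation.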

Full comparison

The embedded results-table figure contains the complete set of evaluated models and accuracies.

General Evaluation and Roadmap

  • Key learning: expert-driven processing (especially DAS) is critical.
  • Decision: adopt Naive Bayes for nitrogen (cost-effective and stable among the ML models evaluated).
  • Decision: adopt Decision Tree for water stress to capture non-linear branching.
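For readers unfamiliar with the nitrogen model, the sketch below shows how a Naive Bayes classifier makes a prediction. It assumes the Gaussian variant, which the report does not confirm, and uses made-up standardized features; the function names and the variance floor are illustrative choices, not the project's implementation.

```python
from collections import defaultdict
from math import log, pi
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "log_prior": log(len(rows) / len(X)),
            "means": [mean(c) for c in columns],
            # Small floor avoids division by zero for constant features.
            "vars": [pvariance(c) + var_floor for c in columns],
        }
    return model

def predict(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(params):
        total = params["log_prior"]
        for x, mu, var in zip(features, params["means"], params["vars"]):
            total += -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))

# Made-up standardized features per sample: [DAS, NDVI mean].
X = [[-1.0, -1.2], [1.0, -0.9], [-1.0, 0.1], [1.0, 0.0], [-1.0, 1.0], [1.0, 1.1]]
y = ["Low", "Low", "Medium", "Medium", "Ideal", "Ideal"]

model = fit_gaussian_nb(X, y)
high_ndvi_prediction = predict(model, [0.0, 1.05])
low_ndvi_prediction = predict(model, [0.0, -1.0])
```

Training reduces to computing a handful of means and variances per class, which is consistent with the low compute cost cited for the ML approach.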

Figures

Supplementary figures and visual materials

Cover: Cover page with report title and project code. Input Data Infrastructure and Data Processing Model Selection Report (9 January 2026).

Model evaluation table: Comparison table listing evaluated ML and LLM models with nitrogen and irrigation accuracy (embedded as figure).

How to Cite

Use the citation below to reference this report in your work

Kuru, E., & Bulut, M. A. (2026, January 9). Smart Farming: Input Data Infrastructure and Data Processing Model Selection Report (Project 2023-1-DE01-KA220-HED-000166720). Preunec.
