Agrolati – Plant Stalking Project
Solution Architecture Documentation
1. Overview of the Solution
Agrolati is an embedded AI-driven plant monitoring and conversational assistant designed for smallholder farmers. The system combines hardware sensors with custom machine-learning models and an LLM-powered response engine to diagnose crop conditions and deliver voice-based guidance in real time.
The initial version of Agrolati is optimized for maize crops, using synthetic, Nigeria-specific environmental data to ensure contextual relevance.
2. AI Models and Techniques Used
2.1 Custom Crop Condition Classification Model
- Algorithm: XGBoost (Extreme Gradient Boosting)
- Model Type: Proprietary custom model trained specifically for Nigerian maize environmental conditions
- Input Features:
- Soil moisture readings (low, medium, high)
- Light intensity readings from an LDR/photoresistor (low, medium, high)
- Output:
- 9 diagnostic states (3 moisture levels × 3 lighting levels) representing real-world maize growth conditions
- Purpose:
- Converts raw sensor data into high-level agronomic insights
- Rationale:
- XGBoost performs exceptionally well on tabular datasets
- Works efficiently on low-power hardware
- High accuracy even with relatively small synthetic datasets
- Fast inference, making it suitable for embedded systems
2.2 LLM-Based Insight Generator
- Model: Grok AI API (Proprietary)
- Purpose:
- Interprets the output of the XGBoost model
- Generates natural language explanations and recommendations (e.g., watering advice, lighting adjustments, risk alerts)
- Workflow:
- Takes the diagnostic classification text
- Produces a farmer-friendly explanation
- Output is converted to speech and sent back to the hardware device
2.3 Text-to-Speech (TTS) System
- Model / Framework: Proprietary or third-party TTS (e.g., Grok TTS or another cloud service)
- Purpose: Converts LLM-generated text into audio for the embedded hardware to play back.
3. Training Datasets and Sources
3.1 Dataset Used
- Name: sammaz.csv
- Type: Synthetic Dataset
- Size: Manually generated structured dataset
- Crop Supported: Maize
- Features:
- Light Level: {Low, Medium, High}
- Soil Moisture Level: {Low, Medium, High}
- Condition Label (agronomic interpretation)
3.2 Rationale for Synthetic Data
- Lack of publicly available Nigeria-specific maize sensor datasets
- Enables modeling region-specific environmental behaviors
- Ensures the model is tailored to smallholder farmer conditions
- Avoids licensing and copyright issues
- Enables rapid iteration
4. Solution Architecture Diagram (Text Description)
USER → Agrolati Hardware
- Microphone (captures speech)
- Sensors (Soil Moisture Sensor, Photoresistor/LDR)
Hardware → Software Backend
- Audio sent to speech-to-text engine
- Sensor data transmitted to classification pipeline
AI Core Layer
1. XGBoost Classification Model (Proprietary)
- Input: Light + Moisture readings
- Output: Diagnostic Condition (1 of 9 states)
2. LLM (Grok API)
- Input: Diagnostic text
- Output: Natural-language explanation & actionable advice
3. Text-to-Speech Engine
- Converts LLM output into audio
Software Backend → Hardware
- Audio response returned to device
- User hears Agrolati's voice feedback
5. Rationale for Architecture / Model Choices
| Component | Rationale | Licensing |
|---|---|---|
| XGBoost | High performance for structured data, fast computation, works well on embedded systems | Open Source (Apache 2.0) |
| Custom Crop Condition Model | Purpose-built for Nigerian maize conditions; ensures localized accuracy | Proprietary |
| Synthetic Dataset (sammaz.csv) | Eliminates data scarcity; enables domain-specific training | Proprietary |
| Grok AI LLM | High reasoning capability; suitable for generating detailed agronomic advice | Proprietary |
| TTS Engine | Provides natural audio output to the hardware device | Licensed / Proprietary |
| Embedded Hardware Sensors | Lightweight, low-power components suitable for rural environments | Licensed hardware components |
6. Resources Utilized
6.1 Open-Source Components
- XGBoost Framework (Apache 2.0)
6.2 Proprietary Components
- Custom maize diagnostic model
- Synthetic dataset (sammaz.csv)
- Entire embedded-software integration
- System logic for sensor interpretation and conditioning
6.3 Licensed or Third-Party Resources
- Grok AI API for LLM
- Text-to-Speech engine
- Hardware modules (microcontroller, sensors)
7. Risks & Mitigation Plans
| Risk | Impact | Mitigation |
|---|---|---|
| Slow response time | Poor user experience | Optimizing pipeline, exploring MCP for faster inference |
| Limited sensor accuracy | Misdiagnosis | Calibration routines + data smoothing |
| Overreliance on synthetic data | Potential performance gaps | Field-data expansion in future updates |
| Connectivity issues | LLM unavailability | Local fallback responses in future versions |
8. Future Roadmap
- Expand dataset using real field-collected data
- Add more crops beyond maize
- Implement on-device lightweight LLM for offline functionality
- Introduce MCP (Multi-Component Processing) to reduce latency
- Build an API layer for third-party integrations
- Extend from diagnostics to yield prediction and disease alerts