Plot Validation¶
The /validate_plots and /validate_plot API endpoints perform comprehensive integrity checks on agricultural plot geometries to ensure they meet quality standards for sustainability analysis. These endpoints validate geometric properties, detect common issues, and provide detailed feedback for data quality improvement.
Validation Checks¶
These checks confirm that plot geometries are suitable for:
- Sustainability Analysis: Accurate area calculations and boundary definitions
- Satellite Analysis: Proper geometric structure for Earth Engine processing
- Compliance Reporting: Valid geometries for regulatory submissions
- Data Quality: Clean, consistent data for downstream processing
Supported Commodities¶
The system has dedicated detection models for these commodities:
Supported: shrimp, rubber, palm, palm oil, oil palm, cocoa, coffee, cattle, beef, leather, soy, soybean, soya, soja
Unsupported Commodities: If you provide a commodity not in the list above, the system will:
- Use a plantation forest mask as an umbrella commodity detection model (appropriate for most agro-forestry commodities)
- Include a warning in the response indicating the fallback behavior
- Still perform all geometric and protected area checks normally
This allows validation to proceed for unlisted commodities while making it transparent that a generic plantation detection model is being used.
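As a client-side illustration, the sketch below checks a commodity name against the supported list before a request is sent, so the caller knows in advance whether the fallback model will apply. The helper name and the case-insensitive matching are assumptions; how the API normalizes commodity names is not documented here.

```python
# Commodity names listed as supported in this document.
SUPPORTED_COMMODITIES = {
    "shrimp", "rubber", "palm", "palm oil", "oil palm", "cocoa", "coffee",
    "cattle", "beef", "leather", "soy", "soybean", "soya", "soja",
}


def uses_dedicated_model(commodity_type: str) -> bool:
    """Return True if the commodity appears in the supported list.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return commodity_type.strip().lower() in SUPPORTED_COMMODITIES


for name in ("Rubber", "teak"):
    if uses_dedicated_model(name):
        print(f"{name}: dedicated detection model")
    else:
        print(f"{name}: plantation forest fallback (expect a warning in the response)")
```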
Endpoints¶
POST /validate_plots¶
Validates multiple plot geometries from an uploaded file. Supports GeoJSON, Shapefile, CSV, and Excel (with optional AI normalization).
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | GeoJSON file containing plot geometries |
| country | string | No | - | Country for commodity and tenure validation; used for commodity detection and Canada/US deforestation tenure exemptions |
| commodity_type | string | No | - | Commodity type for validation (optional) |
| include_deforestation | boolean | No | false | Include deforestation check (plot-level + vicinity); enables GEE deforestation mask and cherry-picking (vicinity) risk |
| production_volume | float | No | - | Claimed total annual production (tonnes) for circumvention pillar (production volume vs. yield) |
Response:
Returns a streaming GeoJSON FeatureCollection with:
- Collection Properties: Aggregate integrity statistics
- Feature Properties: Individual validation results for each plot
Example Request:
curl -X POST "https://api.epoch.eco/validate_plots" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@plots.geojson" \
-F "country=Thailand" \
-F "commodity_type=rubber" \
-F "include_deforestation=false" \
-F "min_area=0.5" \
-F "distance=1000" \
-F "angle=30"
Usage Examples¶
Python¶
import requests

# Set up the request
url = "https://api.epoch.eco/validate_plots"
headers = {
    "Authorization": "Bearer <your_firebase_token>"
}

# Prepare the parameters
data = {
    'country': 'Thailand',           # Optional: for commodity and tenure validation
    'commodity_type': 'rubber',      # Optional: for commodity validation
    'include_deforestation': False,  # Set True for deforestation check
    'min_area': 0.5,                 # Minimum area threshold (hectares)
    'distance': 1000,                # Maximum distance between vertices (m)
    'angle': 30,                     # Minimum angle threshold (degrees)
    'min_precision': 6,              # Minimum coordinate precision (decimal places)
    'min_length': 10,                # Minimum boundary segment length (m)
    'overlap_threshold': 0.5,        # Overlap threshold
    'max_distance': 250000           # Maximum distance to nearest neighbor (m)
}

# Open the file in a context manager so the handle is closed after the upload
with open('path/to/your/file.geojson', 'rb') as geojson_file:
    files = {
        'file': ('my_plots.geojson', geojson_file, 'application/geo+json')
    }
    # Make the request
    response = requests.post(url, headers=headers, files=files, data=data)

# Check if request was successful
if response.status_code == 200:
    result = response.json()

    # Access the summary statistics
    properties = result['properties']
    print(f"Total plots: {properties.get('total_plots', 0)}")
    print(f"Total area (ha): {properties.get('total_plot_area_ha', 0)}")
    print(f"Overall risk: {properties.get('overall', {}).get('risk', 'N/A')}")
    print(f"Investigate: {properties.get('overall', {}).get('investigate', [])}")

    # Access individual features with validation results
    for feature in result['features']:
        props = feature['properties']
        risk = props.get('overall', {}).get('risk', 'N/A')
        print(f"Plot {props.get('plot_id', 'unknown')}: risk={risk}")
else:
    print(f"Error: {response.status_code} - {response.text}")
JavaScript¶
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
async function validatePlots(filePath, token, options = {}) {
  try {
    const formData = new FormData();

    // Add the file
    formData.append('file', fs.createReadStream(filePath));

    // Add optional parameters
    if (options.country) formData.append('country', options.country);
    if (options.commodity_type) formData.append('commodity_type', options.commodity_type);
    if (options.min_area) formData.append('min_area', options.min_area);
    if (options.distance) formData.append('distance', options.distance);
    if (options.angle) formData.append('angle', options.angle);

    const response = await axios({
      method: 'post',
      url: 'https://api.epoch.eco/validate_plots',
      headers: {
        'Authorization': `Bearer ${token}`,
        ...formData.getHeaders()
      },
      data: formData
    });

    // Process the response
    const result = response.data;

    // Access the summary statistics
    const properties = result.properties;
    console.log(`Total plots: ${properties.total_plots || 0}`);
    console.log(`Total area (ha): ${properties.total_plot_area_ha || 0}`);
    console.log(`Overall risk: ${properties.overall?.risk || 'N/A'}`);
    console.log(`Investigate: ${JSON.stringify(properties.overall?.investigate || [])}`);

    // Access individual features with validation results
    result.features.forEach(feature => {
      const props = feature.properties;
      const risk = props.overall?.risk || 'N/A';
      console.log(`Plot ${props.plot_id || 'unknown'}: risk=${risk}`);
    });
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}

// Usage example
validatePlots('./my_plots.geojson', 'your_token', {
  country: 'Thailand',
  commodity_type: 'rubber',
  min_area: 0.5,
  distance: 1000,
  angle: 30
});
cURL¶
curl -X POST "https://api.epoch.eco/validate_plots" \
-H "Authorization: Bearer <your_firebase_token>" \
-H "Accept: application/json" \
-F "file=@path/to/your/file.geojson" \
-F "country=Thailand" \
-F "commodity_type=rubber" \
-F "min_area=0.5" \
-F "distance=1000" \
-F "angle=30" \
-F "min_precision=6" \
-F "min_length=10" \
-F "overlap_threshold=0.5" \
-F "max_distance=250000" \
--output validation_results.json
GET /validate_plot¶
Validates a single plot geometry using a WKT geometry string.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| geometry | string | Yes | - | WKT geometry (Polygon, MultiPolygon, etc.) |
| country | string | No | - | Country for commodity and tenure validation |
| commodity_type | string | No | - | Commodity type for validation (optional) |
| include_deforestation | boolean | No | false | Include deforestation check |
Response:
Returns a GeoJSON FeatureCollection with a single feature containing validation results.
Example Request:
curl -X GET "https://api.epoch.eco/validate_plot" \
-H "Authorization: Bearer YOUR_TOKEN" \
-G \
--data-urlencode "geometry=POLYGON((100.5018 13.7563, 100.5028 13.7563, 100.5028 13.7573, 100.5018 13.7573, 100.5018 13.7563))" \
-d "country=Thailand"
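Because the WKT string contains spaces, commas, and parentheses, it needs to be URL-encoded when sent as a query parameter. The sketch below builds the request URL with the standard library; the helper name is hypothetical, and sending the boolean as the string "true" is an assumption about how the API parses it.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.epoch.eco/validate_plot"


def build_validate_plot_url(wkt: str,
                            country: Optional[str] = None,
                            commodity_type: Optional[str] = None,
                            include_deforestation: bool = False) -> str:
    """Build a URL-encoded GET URL for a single WKT geometry."""
    params = {"geometry": wkt}
    if country:
        params["country"] = country
    if commodity_type:
        params["commodity_type"] = commodity_type
    if include_deforestation:
        params["include_deforestation"] = "true"  # assumed boolean encoding
    return f"{BASE_URL}?{urlencode(params)}"


wkt = ("POLYGON((100.5018 13.7563, 100.5028 13.7563, 100.5028 13.7573, "
       "100.5018 13.7573, 100.5018 13.7563))")
print(build_validate_plot_url(wkt, country="Thailand"))
```

The resulting URL can be passed to any HTTP client together with the Authorization header shown in the cURL example.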
POST /validate_locations¶
Universal validation endpoint that accepts any file format. Auto-detects whether input is facility or plot data and routes to the appropriate validation logic. Use normalize_input=true to enable AI schema detection and geocoding for CSV/Excel files.
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | - | Any input file (GeoJSON, SHP, CSV, Excel) |
| normalize_input | boolean | false | Enable AI/geocoding for non-GeoJSON formats |
| location_type | string | auto | facility or plot (auto-detected if omitted) |
| limit | int | 100 | Max ranked features to return |
| country | string | - | Country for commodity and tenure validation |
| commodity_type | string | - | Commodity type |
| include_deforestation | boolean | false | Include deforestation check (plot validation only) |
POST /normalize_locations¶
Normalizes geospatial data (clean/geocode) without running full validation. Useful for preparing CSV/Excel files before validation. Returns normalized GeoJSON.
| Parameter | Type | Description |
|---|---|---|
| file | File | CSV, Excel, or GeoJSON file |
| location_type | string | facility or plot |
| commodity_type | string | Commodity type for context |
Validation Checks¶
Geometric Integrity Checks¶
1. Area Validation¶
- Purpose: Ensures plots meet minimum size requirements
- Check: area >= min_area (default: 0.1 hectares)
- Issue: Plots too small for meaningful analysis
- Fix: Combine small adjacent plots or lower the minimum threshold
2. Self-Intersection Detection¶
- Purpose: Identifies invalid polygon geometries
- Check: No self-intersecting boundaries
- Issue: Invalid geometry that can cause analysis errors
- Fix: Use GIS tools to fix self-intersections
3. Duplicate Vertices¶
- Purpose: Removes redundant coordinate points
- Check: No consecutive identical coordinates
- Issue: Inefficient geometry storage and potential analysis issues
- Fix: Remove duplicate vertices using GIS tools
4. Angle Validation¶
- Purpose: Ensures reasonable boundary angles (spike detection)
- Check: All angles >= angle (default: 30°)
- Issue: Extremely sharp angles may indicate digitization errors
- Fix: Smooth boundaries or adjust digitization precision
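To make the spike check concrete, here is a minimal sketch that measures the angle between the two edges meeting at each vertex and flags any below the threshold. It treats coordinates as planar, which is only a rough approximation for lon/lat input; the function names are hypothetical and the service's actual angle computation may differ.

```python
import math


def vertex_angles(ring):
    """Angle in degrees between the two edges meeting at each vertex.

    `ring` is a closed list of (x, y) points whose last point repeats the first.
    """
    pts = ring[:-1]  # drop the closing point
    n = len(pts)
    angles = []
    for i in range(n):
        px, py = pts[i - 1]
        cx, cy = pts[i]
        nx, ny = pts[(i + 1) % n]
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue  # duplicate vertex: no angle is defined here
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos_a)))
    return angles


def has_spike(ring, angle=30.0):
    """True if any vertex angle is below the threshold."""
    return any(a < angle for a in vertex_angles(ring))


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
sliver = [(0, 0), (10, 0.5), (0, 1), (0, 0)]
print(has_spike(square), has_spike(sliver))
```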
5. Distance Validation¶
- Purpose: Ensures reasonable distances between vertices
- Check: All vertex distances <= distance (default: 1000m)
- Issue: Extremely long segments may indicate missing vertices
- Fix: Add intermediate vertices along long boundaries
6. Precision Validation¶
- Purpose: Ensures adequate coordinate precision
- Check: Coordinates have >= min_precision decimal places (default: 6)
- Issue: Low precision may cause analysis inaccuracies
- Fix: Increase coordinate precision in source data
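A precision check can be sketched by counting decimal places in each coordinate. The helpers below are hypothetical, and flagging a vertex when either coordinate falls short is an assumption. Trailing zeros are lost once a value is parsed as a float (13.750000 becomes 13.75), so counting from the source text is more reliable where that is available.

```python
from decimal import Decimal


def decimal_places(value) -> int:
    """Count the decimal places of a coordinate given as text or a number."""
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def low_precision_vertices(ring, min_precision=6):
    """Return vertices where any coordinate has fewer than min_precision decimals."""
    return [pt for pt in ring
            if min(decimal_places(c) for c in pt) < min_precision]


ring = [("100.501812", "13.756341"), ("100.5", "13.756341")]
print(low_precision_vertices(ring))
```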
7. Boundary Segment Length¶
- Purpose: Ensures reasonable boundary segment lengths
- Check: All segments >= min_length (default: 10m)
- Issue: Very short segments may indicate digitization noise
- Fix: Simplify geometry or adjust digitization settings
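Checks 5 and 7 both come down to measuring segment lengths in metres. The sketch below uses the haversine formula on (lon, lat) pairs and reports segments longer than distance or shorter than min_length. The function names and the choice of haversine are assumptions; the service may measure lengths differently.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_issues(ring, distance=1000.0, min_length=10.0):
    """Return indices of segments that are too long or too short."""
    lengths = [haversine_m(*a, *b) for a, b in zip(ring, ring[1:])]
    return {
        "too_long": [i for i, d in enumerate(lengths) if d > distance],
        "too_short": [i for i, d in enumerate(lengths) if d < min_length],
    }


plot = [(100.5018, 13.7563), (100.5028, 13.7563), (100.5028, 13.7573),
        (100.5018, 13.7573), (100.5018, 13.7563)]
print(segment_issues(plot))
```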
Spatial Relationship Checks¶
8. Overlap Detection¶
- Purpose: Identifies overlapping plots within the dataset
- Check: Overlap area <= overlap_threshold (default: 1% of plot area)
- Issue: Overlapping plots can cause double-counting in analysis
- Fix: Resolve overlaps by adjusting boundaries or removing duplicates
9. Nearest Neighbor Distance¶
- Purpose: Identifies isolated plots that may be errors
- Check: Distance to nearest neighbor <= max_distance (default: 250,000m)
- Issue: Isolated plots may be digitization errors or require separate handling
- Fix: Verify plot location or adjust analysis parameters
Geometry Type Handling¶
10. Geometry Collection Processing¶
- Purpose: Handles complex geometry collections
- Check: Separates and validates individual geometry components
- Issue: Mixed geometry types in single features
- Fix: Split into separate features by geometry type
11. MultiPolygon Flattening¶
- Purpose: Processes MultiPolygon geometries
- Check: Separates MultiPolygons into individual Polygons
- Issue: Complex MultiPolygon structures
- Fix: Flatten to individual Polygon features
Deforestation Checks (when include_deforestation=true)¶
The deforestation check is aligned with the icechunk zonal stats (Prefect) flow and uses the same exemptions so validation results match batch processing outputs.
12. Plot-Level Deforestation¶
- Purpose: Detects deforestation alerts (GLAD, TMF, Hansen) within each plot, restricted to natural forest
- Baseline: FDP natural forest + JRC TMF
- Exemptions (pixels excluded from deforestation count):
  - GFC subtype 20: Managed forest (plantation) from JRC GFC2020 subtypes
  - Canada: Managed forest (values 11, 12, 40, 50) from Canada_MFv2020.tif
  - United States: Forest ownership (values 1, 2, 3, 4) from US_forest_ownership.tif
- Country: Required for Canada/US tenure; TIFFs are loaded only when country matches (no overhead for other countries)
13. Cherry-Picking (Vicinity) Check¶
- Purpose: Assesses geographic cherry-picking risk — whether plots are purposefully selected to outline only low-risk areas (no/minimal deforestation, high commodity presence) while the surrounding landscape has much higher risk.
- What it detects: Plots drawn to avoid deforestation hotspots; the vicinity (extent of all plots) may have high deforestation or low commodity coverage compared to the broader region.
- Metrics: Vicinity area, commodity area, natural forest area, protected area, deforestation area within the extent (bounding box) of all plots.
- Risk: Low (≤5% deforestation in vicinity), medium (≤20%), or high (>20%).
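The risk thresholds above translate directly into a small classifier. This is a sketch only: the function name is hypothetical, and it assumes the input is already the vicinity deforestation percentage.

```python
def cherry_picking_risk(deforestation_pct: float) -> str:
    """Classify vicinity deforestation: low (<=5%), medium (<=20%), high (>20%)."""
    if deforestation_pct <= 5:
        return "low"
    if deforestation_pct <= 20:
        return "medium"
    return "high"


for pct in (1.4, 12.0, 35.0):
    print(pct, cherry_picking_risk(pct))
```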
Circumvention (Production Volume) Check¶
14. Circumvention — Production Volume vs. Yield¶
- Purpose: Assesses production volume circumvention risk — whether the claimed annual production volume is plausible given the total plot area and typical yields.
- What it detects: Suppliers declaring a much larger production volume than the plots can physically support (e.g., claiming 10,000 tonnes from 100 ha when typical yield would support ~2,000 tonnes).
- Requires: production_volume parameter (claimed total annual production in tonnes).
- Metrics: estimated_annual_production_tonnes (yield × area), production_volume_tonnes (claimed), deviation_pct (signed: positive = overstatement).
- Risk: Low (claimed ≤ estimated or small overstatement), medium (moderate overstatement), or high (large overstatement).
Cherry-picking vs circumvention: Cherry-picking is about where plots are located (spatial selection bias). Circumvention is about how much production is claimed relative to plot area (volume overstatement).
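The production volume comparison can be illustrated with the numbers from the example above (10,000 tonnes claimed from 100 ha, against an estimate of about 2,000 tonnes). The sketch assumes the deviation is expressed relative to the yield-based estimate; that formula, the function name, and the 20 t/ha yield implied by the example are illustrative assumptions, not documented behavior.

```python
def production_deviation_pct(claimed_tonnes, area_ha, yield_t_per_ha):
    """Return (estimated tonnes, signed deviation in percent).

    Assumption: deviation = (claimed - estimated) / estimated * 100,
    so a positive value means the claim exceeds the estimate.
    """
    estimated = yield_t_per_ha * area_ha
    if estimated <= 0:
        raise ValueError("estimated production must be positive")
    return estimated, (claimed_tonnes - estimated) / estimated * 100.0


estimated, deviation = production_deviation_pct(10_000, 100, 20)
print(f"estimated={estimated} t, deviation={deviation:+.0f}%")
```

The mapping from deviation to low, medium, or high risk is left out because the cut-off values are not stated in this document.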
Protected Area Checks¶
15. WDPA Overlap Detection¶
- Purpose: Identifies plots that overlap World Database of Protected Areas
- Check: Spatial intersection with WCMC/WDPA/current/polygons dataset
- Issue: Plots in protected areas may have restrictions on agricultural activities
- Information Extracted:
  - NAME: Protected area name (e.g., "Serengeti National Park")
  - DESIG: Designation type (e.g., "National Park", "Wildlife Reserve")
  - IUCN_CAT: IUCN management category (I, II, III, IV, V, VI)
- Fix: Verify plot legality and check local regulations for protected areas
Risk Assessment¶
The validation system uses a risk-based model (low, medium, high) instead of numeric confidence scores. Lower risk indicates better data quality.
Cherry-Picking vs Circumvention (Distinct Concepts)¶
| Concept | What it measures | Risk type |
|---|---|---|
| Cherry-picking | Plots purposefully outlining low-risk areas while the surrounding landscape has higher deforestation/risk | Geographic selection bias |
| Circumvention | Claimed production volume exceeds what the plot area can support based on typical yields | Production volume overstatement |
Cherry-picking asks: Are these plots drawn to avoid bad areas? Circumvention asks: Can this area actually produce the claimed volume?
Pillar Structure¶
| Pillar | Description |
|---|---|
| plot_validity | Shape validity, commodity presence, protected area overlap |
| deforestation | Plots with deforestation alerts (when include_deforestation=true) |
| cherry_picking | Geographic risk: Plots purposefully outlining low-risk areas while surrounding landscape has higher risk. Vicinity metrics (commodity, natural forest, deforestation in extent of all plots). |
| circumvention | Production volume risk: Claimed production vs. yield-based estimate. Detects overstatement of annual production relative to plot area. Requires production_volume. |
| overall | Weighted combination; investigate lists categories to review |
Shape Validity Issue Weights¶
Used to compute plot_validity.shape_validity.risk:
| Issue Type | Weight | Description |
|---|---|---|
| intersects_another_polygon_count | 0.30 | Overlaps - can cause double-counting |
| self_intersection_count | 0.25 | Invalid geometry |
| distance_nearest_neighbor_count | 0.20 | Isolated plots - may be errors |
| commodity_absence_count | 0.25 | Wrong commodity area |
| protected_area_overlap_count | 0.20 | Protected area overlap |
| area_too_small_count | 0.10 | Small areas |
| max_distance_gt_min_distance_count | 0.10 | Long edges |
| max_angle_lt_min_angle_count | 0.10 | Sharp angles |
| bad_precision_count | 0.05 | Precision issues |
| short_boundary_segments_count | 0.05 | Short segments |
| triangular_geometry_count | 0.05 | Triangular shapes |
| duplicate_vertices_count | 0.05 | Duplicate vertices |
| nested_geometry_collection_count | 0.05 | Nested collections |
| simplifiable_collections_count | 0.05 | Simplifiable collections |
Risk Thresholds¶
- low: Good data quality, minimal issues
- medium: Some issues requiring attention
- high: Significant issues requiring correction
Response Format¶
The response uses a pillar-based structure with nested risk assessments for plot validity, deforestation, circumvention, and cherry-picking (vicinity). Feature properties use flat top-level keys (no nested validation object); location_type identifies output as "plot".
Collection Properties (Aggregate Statistics)¶
{
  "type": "FeatureCollection",
  "properties": {
    "total_plots": 150,
    "total_plot_area_ha": 1250.5,
    "plot_validity": {
      "risk": "medium",
      "shape_validity": {
        "risk": "low",
        "issues": {
          "intersects_another_polygon_count": 4,
          "self_intersection_count": 2,
          "area_too_small_count": 3,
          "max_distance_gt_min_distance_count": 0,
          "max_angle_lt_min_angle_count": 1,
          "bad_precision_count": 2,
          "short_boundary_segments_count": 1,
          "distance_nearest_neighbor_count": 2,
          "triangular_geometry_count": 0,
          "duplicate_vertices_count": 5,
          "nested_geometry_collection_count": 0,
          "simplifiable_collections_count": 0
        }
      },
      "commodity_presence": {
        "risk": "low",
        "commodity_absence_count": 5,
        "description": "5 plots lack commodity presence"
      },
      "protected_area_overlap": {
        "risk": "medium",
        "protected_area_overlap_count": 12,
        "description": "12 plots overlap with protected areas"
      }
    },
    "deforestation": {
      "risk": "low",
      "plot_deforestation_pct": 2.5,
      "deforestation_area_ha": 31.26,
      "plots_with_deforestation": 3,
      "description": "3 plots have deforestation alerts (31.3 ha total)"
    },
    "circumvention": {
      "risk": "low",
      "estimated_annual_production_tonnes": 4500.0,
      "production_volume_tonnes": 4200.0,
      "deviation": 0.067
    },
    "cherry_picking": {
      "risk": "low",
      "vicinity_area_ha": 5000.0,
      "vicinity_commodity_area_ha": 3200.0,
      "vicinity_natural_forest_area_ha": 800.0,
      "vicinity_protected_area_ha": 150.0,
      "vicinity_deforestation_ha": 45.0,
      "deforestation_pct": 1.4
    },
    "overall": {
      "risk": "medium",
      "investigate": ["protected_area_overlap", "commodity_presence"]
    }
  },
  "features": [...]
}
Note: deforestation and cherry_picking are null/empty when include_deforestation=false or when no vicinity check is run. circumvention (production volume vs. yield) requires production_volume.
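Since some pillars can be null, client code reading the collection properties should tolerate missing sections. The following sketch lists pillars at or above a given risk level; the helper name and the trimmed sample are illustrative, with key names taken from the example response above.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}
PILLARS = ("plot_validity", "deforestation", "cherry_picking", "circumvention")


def pillars_at_or_above(properties: dict, level: str = "medium") -> dict:
    """Map pillar name to risk for pillars at or above `level`.

    Pillars that are null or absent are skipped.
    """
    flagged = {}
    for pillar in PILLARS:
        section = properties.get(pillar) or {}
        risk = section.get("risk")
        if risk in RISK_ORDER and RISK_ORDER[risk] >= RISK_ORDER[level]:
            flagged[pillar] = risk
    return flagged


# Trimmed sample: include_deforestation=false, so two pillars are null.
sample_properties = {
    "total_plots": 150,
    "plot_validity": {"risk": "medium"},
    "deforestation": None,
    "cherry_picking": None,
    "circumvention": {"risk": "low"},
    "overall": {"risk": "medium",
                "investigate": ["protected_area_overlap", "commodity_presence"]},
}

print(pillars_at_or_above(sample_properties))
print(sample_properties["overall"]["investigate"])
```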
Feature Properties (Individual Results)¶
Each feature includes user-provided properties plus validation results as flat top-level keys (no nested validation object):
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[100.5018, 13.7563], [100.5028, 13.7563], [100.5028, 13.7573], [100.5018, 13.7573], [100.5018, 13.7563]]]
  },
  "properties": {
    "location_type": "plot",
    "plot_id": "plot_001",
    "area": 1.25,
    "plot_validity": {
      "risk": "low",
      "shape_validity": {
        "risk": "low",
        "indicators": { "self_intersection_count": 0, "intersects_another_polygon_count": 0, ... }
      },
      "commodity_presence": {
        "risk": "low",
        "commodity_absence_count": 0
      },
      "protected_area_overlap": {
        "risk": "high",
        "protected_area_overlap_count": 1,
        "protected_area_metadata": [
          { "NAME": "Serengeti National Park", "DESIG": "National Park", "IUCN_CAT": "II" }
        ]
      }
    },
    "deforestation": {
      "risk": "low",
      "deforestation_present": 0,
      "deforestation_value": 0.0
    },
    "circumvention": {
      "risk": "low",
      "estimated_annual_production_tonnes": 45.2,
      "production_volume_tonnes": 42.0,
      "deviation": 0.07
    },
    "overall": {
      "risk": "low"
    },
    "primary_warning": null
  }
}
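The per-feature structure above can be walked to pull out protected area overlaps for review. This is a sketch with a hypothetical helper name; the key names follow the example feature, and the sample is trimmed to the fields used.

```python
def protected_area_hits(feature_collection: dict) -> list:
    """Return (plot_id, NAME, IUCN_CAT) for every protected area overlap."""
    hits = []
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        validity = props.get("plot_validity") or {}
        overlap = validity.get("protected_area_overlap") or {}
        for area in overlap.get("protected_area_metadata", []):
            hits.append((props.get("plot_id", "unknown"),
                         area.get("NAME"),
                         area.get("IUCN_CAT")))
    return hits


sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {
            "plot_id": "plot_001",
            "plot_validity": {
                "protected_area_overlap": {
                    "risk": "high",
                    "protected_area_overlap_count": 1,
                    "protected_area_metadata": [
                        {"NAME": "Serengeti National Park",
                         "DESIG": "National Park", "IUCN_CAT": "II"}
                    ],
                }
            },
        },
    }],
}

print(protected_area_hits(sample))
```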
- location_type: "plot"; identifies this as plot validation output
- primary_warning: Optional string when critical issues exist (e.g. self-intersection, overlaps)
- Risk values: "low" (good), "medium", "high" (bad). overall.investigate (in collection properties) lists categories to review, ordered by priority.
Validation Results¶
Low Risk¶
- Good data quality with minimal issues
- Critical checks pass
- Suitable for sustainability analysis
Medium Risk¶
- Some issues requiring attention
- Most checks pass
- Review overall.investigate for priority categories
High Risk¶
- Significant issues
- One or more critical checks fail
- Correction recommended before analysis
Common Issues and Solutions¶
1. Small Plot Areas¶
Issue: Plots smaller than minimum threshold
Solution:
- Lower the min_area parameter if appropriate
- Combine adjacent small plots
- Use a different analysis approach for small plots
2. Self-Intersections¶
Issue: Invalid polygon boundaries
Solution:
- Use GIS software to fix geometries
- Re-digitize problematic boundaries
- Use buffer operations to clean boundaries
3. Overlapping Plots¶
Issue: Multiple plots occupy same area
Solution:
- Adjust plot boundaries to eliminate overlaps
- Remove duplicate plots
- Use spatial analysis to resolve conflicts
4. Low Coordinate Precision¶
Issue: Insufficient decimal places in coordinates
Solution:
- Increase precision in source data
- Adjust min_precision parameter if appropriate
- Re-digitize with higher precision
5. Isolated Plots¶
Issue: Plots far from nearest neighbors
Solution:
- Verify plot location accuracy
- Adjust the max_distance parameter
- Handle isolated plots separately in analysis
6. Protected Area Overlaps (NEW)¶
Issue: Plots overlap with World Database of Protected Areas
Solution:
- Check local regulations for the specific protected area (IUCN category in response)
- Verify plot legality within the protected area
- Consider alternative plot locations outside protected areas
- Review the protected_area_metadata for the protected area name and designation
- IUCN Categories: I (Strict Protection), II (National Park), III (Monument), IV (Habitat Management), V (Protected Landscape), VI (Sustainable Use)
Execution Architecture¶
Validation runs three parallel streams for maximum throughput:
- GEE plot task: Commodity presence and deforestation per plot (Earth Engine)
- GEE cherry-picking task: Vicinity metrics (convex hull of all plots)
- Local geometric task: Angles, WDPA, distances, overlaps, area, precision, etc. (CPU-bound)
The shared deforestation mask is built once and reused by both GEE tasks (instruction caching). Geometric checks run in a thread pool while the network requests are in flight, so CPU and I/O overlap.
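The three-stream layout can be sketched with a thread pool: the shared mask is built once, passed to both Earth Engine tasks, and the geometric work runs while those calls are waiting on the network. Every function below is a hypothetical stand-in (the sleeps simulate network latency); this shows the pattern, not the service's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_deforestation_mask():
    """Hypothetical stand-in for building the shared mask."""
    return {"mask": "deforestation"}


def gee_plot_task(plots, mask):
    time.sleep(0.05)  # simulates a network-bound Earth Engine call
    return {"plots_checked": len(plots)}


def gee_cherry_picking_task(plots, mask):
    time.sleep(0.05)  # simulates a second network-bound call
    return {"vicinity_checked": True}


def local_geometric_task(plots):
    # CPU-bound checks (angles, overlaps, precision, ...) would run here.
    return {"geometries_checked": len(plots)}


def run_validation(plots):
    mask = build_deforestation_mask()  # built once, reused by both GEE tasks
    with ThreadPoolExecutor(max_workers=3) as pool:
        plot_future = pool.submit(gee_plot_task, plots, mask)
        vicinity_future = pool.submit(gee_cherry_picking_task, plots, mask)
        geometry_future = pool.submit(local_geometric_task, plots)
        return {
            "gee_plot": plot_future.result(),
            "gee_cherry_picking": vicinity_future.result(),
            "geometric": geometry_future.result(),
        }


print(run_validation(["plot_001", "plot_002"]))
```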
Performance Considerations¶
- Large Files: Use streaming for files with >1000 plots; 24k+ plots may use 2–4GB RAM during serialization
- Complex Geometries: MultiPolygons and GeometryCollections require more processing time
- Precision: Higher precision requirements increase processing time
- Spatial Checks: Overlap and neighbor distance checks are computationally intensive
- Deforestation: Set include_deforestation=true only when needed; it adds GEE calls and uses smaller batch sizes
Best Practices¶
1. Parameter Tuning¶
- Start with default values and adjust based on your data characteristics
- Lower angle thresholds (0.5-1.0°) flag only the sharpest spikes; because the check flags angles below the threshold, raising it makes spike detection more sensitive
- Lower distance thresholds (500-800m) for more sensitive edge detection
- Adjust area thresholds based on your expected plot sizes
- Increase precision requirements for high-accuracy applications
2. Data Preparation¶
- Clean geometries before uploading using GIS software like QGIS
- Fix self-intersections and topological issues
- Ensure adequate coordinate precision (6+ decimal places)
- Remove duplicate vertices and simplify complex geometries
- Validate coordinate systems and ensure proper projection
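For simple cases, duplicate vertices can be removed in a few lines before upload. The sketch below drops consecutive identical points and closes the ring if needed; the function name is hypothetical, and it does not repair self-intersections, which still call for a GIS tool such as QGIS.

```python
def remove_duplicate_vertices(ring):
    """Drop consecutive duplicate points and make sure the ring is closed."""
    cleaned = []
    for point in ring:
        point = tuple(point)
        if not cleaned or point != cleaned[-1]:
            cleaned.append(point)
    if cleaned and cleaned[0] != cleaned[-1]:
        cleaned.append(cleaned[0])  # close the ring
    return cleaned


raw = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 0)]
print(remove_duplicate_vertices(raw))
```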