Plot Validation

The /validate_plots and /validate_plot API endpoints perform comprehensive integrity checks on agricultural plot geometries to ensure they meet quality standards for sustainability analysis. These endpoints validate geometric properties, detect common issues, and provide detailed feedback for data quality improvement.

Validation Checks

The system performs comprehensive integrity checks to ensure plot geometries are suitable for:

  • Sustainability Analysis: Accurate area calculations and boundary definitions
  • Satellite Analysis: Proper geometric structure for Earth Engine processing
  • Compliance Reporting: Valid geometries for regulatory submissions
  • Data Quality: Clean, consistent data for downstream processing

Supported Commodities

The system has dedicated detection models for these commodities:

Supported: shrimp, rubber, palm, palm oil, oil palm, cocoa, coffee, cattle, beef, leather, soy, soybean, soya, soja

Unsupported Commodities: If you provide a commodity not in the list above, the system will:

  • Use a plantation forest mask as an umbrella commodity detection model (appropriate for most agro-forestry commodities)
  • Include a warning in the response indicating the fallback behavior
  • Still perform all geometric and protected area checks normally

This allows validation to proceed for unlisted commodities while making it transparent that a generic plantation detection model is being used.
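
A client can anticipate the fallback before uploading. The sketch below hard-codes the names from the supported list above; the helper name `uses_fallback_model` and the lowercase/whitespace normalisation are assumptions for illustration, and the service may match names differently.

```python
# Commodity names copied from the "Supported" list above.
SUPPORTED_COMMODITIES = frozenset({
    "shrimp", "rubber", "palm", "palm oil", "oil palm", "cocoa", "coffee",
    "cattle", "beef", "leather", "soy", "soybean", "soya", "soja",
})


def uses_fallback_model(commodity_type: str) -> bool:
    """Return True when the name is not in the supported list.

    Hypothetical helper: lowercasing and collapsing whitespace is an
    assumption about how names are compared.
    """
    normalized = " ".join(commodity_type.lower().split())
    return normalized not in SUPPORTED_COMMODITIES


print(uses_fallback_model("Rubber"))  # listed commodity
print(uses_fallback_model("tea"))     # unlisted commodity
```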

Endpoints

POST /validate_plots

Validates multiple plot geometries from an uploaded file. Supports GeoJSON, Shapefile, CSV, and Excel (with optional AI normalization).

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | File | Yes | - | GeoJSON file containing plot geometries |
| country | string | No | - | Country for commodity and tenure validation; used for commodity detection and Canada/US deforestation tenure exemptions |
| commodity_type | string | No | - | Commodity type for validation (optional) |
| include_deforestation | boolean | No | false | Include deforestation check (plot-level + vicinity); enables GEE deforestation mask and cherry-picking (vicinity) risk |
| production_volume | float | No | - | Claimed total annual production (tonnes) for circumvention pillar (production volume vs. yield) |

Response:

Returns a streaming GeoJSON FeatureCollection with:

  • Collection Properties: Aggregate integrity statistics
  • Feature Properties: Individual validation results for each plot

Example Request:

curl -X POST "https://api.epoch.eco/validate_plots" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@plots.geojson" \
  -F "country=Thailand" \
  -F "commodity_type=rubber" \
  -F "include_deforestation=false" \
  -F "min_area=0.5" \
  -F "distance=1000" \
  -F "angle=30"

Usage Examples

Python

import requests

# Set up the request
url = "https://api.epoch.eco/validate_plots"
headers = {
    "Authorization": "Bearer <your_firebase_token>"
}

# Optional form parameters
data = {
    'country': 'Thailand',  # Optional: for commodity and tenure validation
    'commodity_type': 'rubber',  # Optional: for commodity validation
    'include_deforestation': False,  # Set True for deforestation check
    'min_area': 0.5,  # Minimum area threshold
    'distance': 1000,  # Maximum distance between vertices
    'angle': 30,  # Minimum angle threshold
    'min_precision': 6,  # Minimum coordinate precision
    'min_length': 10,  # Minimum boundary segment length
    'overlap_threshold': 0.5,  # Overlap threshold
    'max_distance': 250000  # Maximum distance to nearest neighbor
}

# Open the file in a context manager so the handle is closed after the upload
with open('path/to/your/file.geojson', 'rb') as geojson_file:
    files = {
        'file': ('my_plots.geojson', geojson_file, 'application/geo+json')
    }
    # Make the request
    response = requests.post(url, headers=headers, files=files, data=data)

# Check if request was successful
if response.status_code == 200:
    result = response.json()

    # Access the summary statistics
    properties = result['properties']
    print(f"Total plots: {properties.get('total_plots', 0)}")
    print(f"Total area (ha): {properties.get('total_plot_area_ha', 0)}")
    print(f"Overall risk: {properties.get('overall', {}).get('risk', 'N/A')}")
    print(f"Investigate: {properties.get('overall', {}).get('investigate', [])}")

    # Access individual features with validation results
    for feature in result['features']:
        props = feature['properties']
        risk = props.get('overall', {}).get('risk', 'N/A')
        print(f"Plot {props.get('plot_id', 'unknown')}: risk={risk}")
else:
    print(f"Error: {response.status_code} - {response.text}")

JavaScript

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function validatePlots(filePath, token, options = {}) {
    try {
        const formData = new FormData();

        // Add the file
        formData.append('file', fs.createReadStream(filePath));

        // Add optional parameters
        if (options.country) formData.append('country', options.country);
        if (options.commodity_type) formData.append('commodity_type', options.commodity_type);
        if (options.min_area) formData.append('min_area', options.min_area);
        if (options.distance) formData.append('distance', options.distance);
        if (options.angle) formData.append('angle', options.angle);

        const response = await axios({
            method: 'post',
            url: 'https://api.epoch.eco/validate_plots',
            headers: {
                'Authorization': `Bearer ${token}`,
                ...formData.getHeaders()
            },
            data: formData
        });

        // Process the response
        const result = response.data;

        // Access the summary statistics
        const properties = result.properties;
        console.log(`Total plots: ${properties.total_plots || 0}`);
        console.log(`Total area (ha): ${properties.total_plot_area_ha || 0}`);
        console.log(`Overall risk: ${properties.overall?.risk || 'N/A'}`);
        console.log(`Investigate: ${JSON.stringify(properties.overall?.investigate || [])}`);

        // Access individual features with validation results
        result.features.forEach(feature => {
            const props = feature.properties;
            const risk = props.overall?.risk || 'N/A';
            console.log(`Plot ${props.plot_id || 'unknown'}: risk=${risk}`);
        });

    } catch (error) {
        console.error('Error:', error.response?.data || error.message);
    }
}

// Usage example
validatePlots('./my_plots.geojson', 'your_token', {
    country: 'Thailand',
    commodity_type: 'rubber',
    min_area: 0.5,
    distance: 1000,
    angle: 30
});

cURL

curl -X POST "https://api.epoch.eco/validate_plots" \
  -H "Authorization: Bearer <your_firebase_token>" \
  -H "Accept: application/json" \
  -F "file=@path/to/your/file.geojson" \
  -F "country=Thailand" \
  -F "commodity_type=rubber" \
  -F "min_area=0.5" \
  -F "distance=1000" \
  -F "angle=30" \
  -F "min_precision=6" \
  -F "min_length=10" \
  -F "overlap_threshold=0.5" \
  -F "max_distance=250000" \
  --output validation_results.json

GET /validate_plot

Validates a single plot geometry using a WKT geometry string.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| geometry | string | Yes | - | WKT geometry (Polygon, MultiPolygon, etc.) |
| country | string | No | - | Country for commodity and tenure validation |
| commodity_type | string | No | - | Commodity type for validation (optional) |
| include_deforestation | boolean | No | false | Include deforestation check |

Response:

Returns a GeoJSON FeatureCollection with a single feature containing validation results.

Example Request:

curl -G "https://api.epoch.eco/validate_plot" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  --data-urlencode "geometry=POLYGON((100.5018 13.7563, 100.5028 13.7563, 100.5028 13.7573, 100.5018 13.7573, 100.5018 13.7563))" \
  --data-urlencode "country=Thailand"
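
Because WKT contains spaces, commas, and parentheses, the geometry has to be percent-encoded in the query string. The sketch below builds the request URL with the standard library only; the helper names `ring_to_wkt` and `build_validate_plot_url` are hypothetical, and the coordinate order (longitude, latitude) follows the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.epoch.eco/validate_plot"


def ring_to_wkt(ring):
    """Format a ring of (lon, lat) pairs as a WKT POLYGON string."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings repeat the first vertex at the end
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def build_validate_plot_url(ring, country=None):
    """Build the GET URL with every query value percent-encoded."""
    params = {"geometry": ring_to_wkt(ring)}
    if country:
        params["country"] = country
    return f"{BASE_URL}?{urlencode(params)}"


ring = [(100.5018, 13.7563), (100.5028, 13.7563),
        (100.5028, 13.7573), (100.5018, 13.7573)]
print(build_validate_plot_url(ring, country="Thailand"))
```

The resulting URL can be passed to any HTTP client together with the Authorization header shown in the cURL example.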

POST /validate_locations

Universal validation endpoint that accepts any file format. Auto-detects whether input is facility or plot data and routes to the appropriate validation logic. Use normalize_input=true to enable AI schema detection and geocoding for CSV/Excel files.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file | File | - | Any input file (GeoJSON, SHP, CSV, Excel) |
| normalize_input | boolean | false | Enable AI/geocoding for non-GeoJSON formats |
| location_type | string | auto | facility or plot (auto-detected if omitted) |
| limit | int | 100 | Max ranked features to return |
| country | string | - | Country for commodity and tenure validation |
| commodity_type | string | - | Commodity type |
| include_deforestation | boolean | false | Include deforestation check (plot validation only) |

POST /normalize_locations

Normalizes geospatial data (clean/geocode) without running full validation. Useful for preparing CSV/Excel files before validation. Returns normalized GeoJSON.

| Parameter | Type | Description |
| --- | --- | --- |
| file | File | CSV, Excel, or GeoJSON file |
| location_type | string | facility or plot |
| commodity_type | string | Commodity type for context |

Validation Checks

Geometric Integrity Checks

1. Area Validation

  • Purpose: Ensures plots meet minimum size requirements
  • Check: area >= min_area (default: 0.1 hectares)
  • Issue: Plots too small for meaningful analysis
  • Fix: Combine small adjacent plots or lower the min_area threshold

2. Self-Intersection Detection

  • Purpose: Identifies invalid polygon geometries
  • Check: No self-intersecting boundaries
  • Issue: Invalid geometry that can cause analysis errors
  • Fix: Use GIS tools to fix self-intersections
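
To pre-screen data locally, a self-intersection test can be sketched in pure Python. This is not the service's implementation: it only detects edges that properly cross each other (touching or collinear overlaps are ignored) and treats coordinates as planar. The function names are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0, or 1."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _segments_cross(p1, p2, q1, q2):
    """True when the two segments cross at a single interior point."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def has_self_intersection(ring):
    """Check a closed ring (first vertex repeated last) for crossing edges."""
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):  # skip the edge adjacent to edge i
            if i == 0 and j == n - 1:
                continue  # first and last edge share the closing vertex
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
print(has_self_intersection(square), has_self_intersection(bowtie))
```

The pairwise comparison is quadratic in the number of edges, which is acceptable for a quick check on individual plots but not for very dense boundaries.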

3. Duplicate Vertices

  • Purpose: Removes redundant coordinate points
  • Check: No consecutive identical coordinates
  • Issue: Inefficient geometry storage and potential analysis issues
  • Fix: Remove duplicate vertices using GIS tools
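
Consecutive duplicates can also be removed before upload. A minimal sketch, assuming exact coordinate equality (no tolerance) and a hypothetical helper name:

```python
def drop_consecutive_duplicates(ring):
    """Return the ring without consecutive repeated coordinates."""
    cleaned = []
    for vertex in ring:
        point = tuple(vertex)
        if not cleaned or cleaned[-1] != point:
            cleaned.append(point)
    return cleaned


ring = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 1), (0, 0)]
print(drop_consecutive_duplicates(ring))
```

The closing vertex is kept because it repeats the first vertex, not its immediate predecessor.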

4. Angle Validation

  • Purpose: Ensures reasonable boundary angles (spike detection)
  • Check: All angles >= angle (default: 30°)
  • Issue: Extremely sharp angles may indicate digitization errors
  • Fix: Smooth boundaries or adjust digitization precision
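
The angle check can be approximated locally as follows. The sketch measures the angle between the two edges meeting at each vertex and treats longitude/latitude as planar coordinates, which is an assumption; the service may compute angles differently. Function names are hypothetical.

```python
import math


def vertex_angles(ring):
    """Angle in degrees between the two edges meeting at each vertex."""
    pts = list(ring[:-1]) if ring[0] == ring[-1] else list(ring)
    angles = []
    for i, (x, y) in enumerate(pts):
        px, py = pts[i - 1]                 # previous vertex (wraps around)
        nx, ny = pts[(i + 1) % len(pts)]    # next vertex (wraps around)
        ax, ay, bx, by = px - x, py - y, nx - x, ny - y
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0:
            continue  # duplicate vertex: angle is undefined
        cos_theta = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
        angles.append(math.degrees(math.acos(cos_theta)))
    return angles


def has_sharp_angle(ring, min_angle=30.0):
    """True when any vertex angle falls below min_angle degrees."""
    return any(angle < min_angle for angle in vertex_angles(ring))


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
spike = [(0, 0), (10, 0.5), (0, 1), (0, 0)]
print(has_sharp_angle(square), has_sharp_angle(spike))
```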

5. Distance Validation

  • Purpose: Ensures reasonable distances between vertices
  • Check: All vertex distances <= distance (default: 1000m)
  • Issue: Extremely long segments may indicate missing vertices
  • Fix: Add intermediate vertices along long boundaries

6. Precision Validation

  • Purpose: Ensures adequate coordinate precision
  • Check: Coordinates have >= min_precision decimal places (default: 6)
  • Issue: Low precision may cause analysis inaccuracies
  • Fix: Increase coordinate precision in source data
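
A local precision check might look like the sketch below. It counts decimal places in Python's shortest string representation of each coordinate, which is an assumption about how the service counts them; note that trailing zeros are lost once a value is parsed as a float, so 100.501800 counts as four decimal places here.

```python
from decimal import Decimal


def decimal_places(value):
    """Count decimal places in the shortest repr of a finite coordinate."""
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def has_low_precision(ring, min_precision=6):
    """True when any coordinate has fewer than min_precision decimal places."""
    return any(decimal_places(c) < min_precision
               for vertex in ring for c in vertex)


print(decimal_places(100.5018))
print(has_low_precision([(100.501812, 13.756334)]))
```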

7. Boundary Segment Length

  • Purpose: Ensures reasonable boundary segment lengths
  • Check: All segments >= min_length (default: 10m)
  • Issue: Very short segments may indicate digitization noise
  • Fix: Simplify geometry or adjust digitization settings
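
Checks 5 and 7 both depend on segment lengths in metres. The sketch below uses the haversine great-circle formula on (lon, lat) pairs; the choice of haversine and the function names are assumptions, since the distance method used by the service is not stated.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_issues(ring, distance=1000.0, min_length=10.0):
    """Count segments longer than distance or shorter than min_length."""
    lengths = [haversine_m(a, b) for a, b in zip(ring[:-1], ring[1:])]
    return {
        "too_long": sum(length > distance for length in lengths),
        "too_short": sum(length < min_length for length in lengths),
    }


plot = [(100.5018, 13.7563), (100.5028, 13.7563), (100.5028, 13.7573),
        (100.5018, 13.7573), (100.5018, 13.7563)]
print(segment_issues(plot))
```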

Spatial Relationship Checks

8. Overlap Detection

  • Purpose: Identifies overlapping plots within the dataset
  • Check: Overlap area <= overlap_threshold (default: 1% of plot area)
  • Issue: Overlapping plots can cause double-counting in analysis
  • Fix: Resolve overlaps by adjusting boundaries or removing duplicates

9. Nearest Neighbor Distance

  • Purpose: Identifies isolated plots that may be errors
  • Check: Distance to nearest neighbor <= max_distance (default: 250,000m)
  • Issue: Isolated plots may be digitization errors or require separate handling
  • Fix: Verify plot location or adjust analysis parameters

Geometry Type Handling

10. Geometry Collection Processing

  • Purpose: Handles complex geometry collections
  • Check: Separates and validates individual geometry components
  • Issue: Mixed geometry types in single features
  • Fix: Split into separate features by geometry type

11. MultiPolygon Flattening

  • Purpose: Processes MultiPolygon geometries
  • Check: Separates MultiPolygons into individual Polygons
  • Issue: Complex MultiPolygon structures
  • Fix: Flatten to individual Polygon features
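
The suggested fix can be applied to a GeoJSON FeatureCollection before upload. A minimal sketch, assuming each Polygon part should inherit the properties of its parent feature (the function name is hypothetical):

```python
import copy


def flatten_multipolygons(collection):
    """Split MultiPolygon features into one Polygon feature per part."""
    features = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "MultiPolygon":
            features.append(feature)  # leave other geometry types untouched
            continue
        for part in geometry["coordinates"]:
            piece = copy.deepcopy(feature)  # keep properties on every part
            piece["geometry"] = {"type": "Polygon", "coordinates": part}
            features.append(piece)
    return {**collection, "features": features}


square = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
shifted = [[[5, 5], [6, 5], [6, 6], [5, 6], [5, 5]]]
collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"plot_id": "plot_001"},
        "geometry": {"type": "MultiPolygon", "coordinates": [square, shifted]},
    }],
}
print(len(flatten_multipolygons(collection)["features"]))
```

If plot identifiers must stay unique, the copied `plot_id` would need a suffix per part; that is left out here.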

Deforestation Checks (when include_deforestation=true)

The deforestation check is aligned with the icechunk zonal stats (Prefect) flow and uses the same exemptions so validation results match batch processing outputs.

12. Plot-Level Deforestation

  • Purpose: Detects deforestation alerts (GLAD, TMF, Hansen) within each plot, restricted to natural forest
  • Baseline: FDP natural forest + JRC TMF
  • Exemptions (pixels excluded from deforestation count):
      • GFC subtype 20: Managed forest (plantation) from JRC GFC2020 subtypes
      • Canada: Managed forest (values 11, 12, 40, 50) from Canada_MFv2020.tif
      • United States: Forest ownership (values 1, 2, 3, 4) from US_forest_ownership.tif
  • Country: Required for Canada/US tenure; TIFFs are loaded only when country matches (no overhead for other countries)

13. Cherry-Picking (Vicinity) Check

  • Purpose: Assesses geographic cherry-picking risk — whether plots are purposefully selected to outline only low-risk areas (no/minimal deforestation, high commodity presence) while the surrounding landscape has much higher risk.
  • What it detects: Plots drawn to avoid deforestation hotspots; the vicinity (extent of all plots) may have high deforestation or low commodity coverage compared to the broader region.
  • Metrics: Vicinity area, commodity area, natural forest area, protected area, deforestation area within the extent (bounding box) of all plots.
  • Risk: Low (≤5% deforestation in vicinity), medium (≤20%), or high (>20%).
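
The risk bands above translate directly into a small function. The thresholds come from the list above; the function name and the assumption that the input is already a percentage are illustrative.

```python
def vicinity_risk(deforestation_pct: float) -> str:
    """Map the vicinity deforestation percentage to a risk label."""
    if deforestation_pct <= 5:
        return "low"
    if deforestation_pct <= 20:
        return "medium"
    return "high"


for pct in (1.4, 12.0, 35.0):
    print(pct, vicinity_risk(pct))
```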

Circumvention (Production Volume) Check

14. Circumvention — Production Volume vs. Yield

  • Purpose: Assesses production volume circumvention risk — whether the claimed annual production volume is plausible given the total plot area and typical yields.
  • What it detects: Suppliers declaring a much larger production volume than the plots can physically support (e.g., claiming 10,000 tonnes from 100 ha when typical yield would support ~2,000 tonnes).
  • Requires: production_volume parameter (claimed total annual production in tonnes).
  • Metrics: estimated_annual_production_tonnes (yield × area), production_volume_tonnes (claimed), deviation_pct (signed: positive = overstatement).
  • Risk: Low (claimed ≤ estimated or small overstatement), medium (moderate overstatement), or high (large overstatement).
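
The deviation metric can be illustrated with the example figures above (10,000 tonnes claimed from 100 ha). The formula below, (claimed - estimated) / estimated, is an assumption chosen to match the stated sign convention (positive = overstatement); the yield of 20 t/ha is only the value implied by "100 ha supports ~2,000 tonnes", and the cut-offs between low, medium, and high risk are not documented, so none are shown.

```python
def production_deviation_pct(claimed_tonnes, area_ha, yield_t_per_ha):
    """Signed deviation of claimed production from the yield-based estimate.

    Assumed formula: (claimed - estimated) / estimated * 100.
    Positive values mean the claim exceeds the estimate.
    """
    estimated = yield_t_per_ha * area_ha
    if estimated <= 0:
        raise ValueError("estimated production must be positive")
    return (claimed_tonnes - estimated) / estimated * 100.0


# 100 ha at an illustrative 20 t/ha supports about 2,000 tonnes.
print(production_deviation_pct(10_000, area_ha=100, yield_t_per_ha=20))
```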

Cherry-picking vs circumvention: Cherry-picking is about where plots are located (spatial selection bias). Circumvention is about how much production is claimed relative to plot area (volume overstatement).

Protected Area Checks

15. WDPA Overlap Detection

  • Purpose: Identifies plots that overlap World Database of Protected Areas
  • Check: Spatial intersection with WCMC/WDPA/current/polygons dataset
  • Issue: Plots in protected areas may have restrictions on agricultural activities
  • Information Extracted:
      • NAME: Protected area name (e.g., "Serengeti National Park")
      • DESIG: Designation type (e.g., "National Park", "Wildlife Reserve")
      • IUCN_CAT: IUCN management category (I, II, III, IV, V, VI)
  • Fix: Verify plot legality and check local regulations for protected areas

Risk Assessment

The validation system uses a risk-based model (low, medium, high) instead of numeric confidence scores. Lower risk indicates better data quality.

Cherry-Picking vs Circumvention (Distinct Concepts)

| Concept | What it measures | Risk type |
| --- | --- | --- |
| Cherry-picking | Plots purposefully outlining low-risk areas while the surrounding landscape has higher deforestation/risk | Geographic selection bias |
| Circumvention | Claimed production volume exceeds what the plot area can support based on typical yields | Production volume overstatement |

Cherry-picking asks: Are these plots drawn to avoid bad areas? Circumvention asks: Can this area actually produce the claimed volume?

Pillar Structure

| Pillar | Description |
| --- | --- |
| plot_validity | Shape validity, commodity presence, protected area overlap |
| deforestation | Plots with deforestation alerts (when include_deforestation=true) |
| cherry_picking | Geographic risk: Plots purposefully outlining low-risk areas while surrounding landscape has higher risk. Vicinity metrics (commodity, natural forest, deforestation in extent of all plots). |
| circumvention | Production volume risk: Claimed production vs. yield-based estimate. Detects overstatement of annual production relative to plot area. Requires production_volume. |
| overall | Weighted combination; investigate lists categories to review |

Shape Validity Issue Weights

Used to compute plot_validity.shape_validity.risk:

| Issue Type | Weight | Description |
| --- | --- | --- |
| intersects_another_polygon_count | 0.30 | Overlaps - can cause double-counting |
| self_intersection_count | 0.25 | Invalid geometry |
| distance_nearest_neighbor_count | 0.20 | Isolated plots - may be errors |
| commodity_absence_count | 0.25 | Wrong commodity area |
| protected_area_overlap_count | 0.20 | Protected area overlap |
| area_too_small_count | 0.10 | Small areas |
| max_distance_gt_min_distance_count | 0.10 | Long edges |
| max_angle_lt_min_angle_count | 0.10 | Sharp angles |
| bad_precision_count | 0.05 | Precision issues |
| short_boundary_segments_count | 0.05 | Short segments |
| triangular_geometry_count | 0.05 | Triangular shapes |
| duplicate_vertices_count | 0.05 | Duplicate vertices |
| nested_geometry_collection_count | 0.05 | Nested collections |
| simplifiable_collections_count | 0.05 | Simplifiable collections |

Risk Thresholds

  • low: Good data quality, minimal issues
  • medium: Some issues requiring attention
  • high: Significant issues requiring correction

Response Format

The response uses a pillar-based structure with nested risk assessments for plot validity, deforestation, circumvention, and cherry-picking (vicinity). Feature properties use flat top-level keys (no nested validation object); location_type identifies output as "plot".

Collection Properties (Aggregate Statistics)

{
  "type": "FeatureCollection",
  "properties": {
    "total_plots": 150,
    "total_plot_area_ha": 1250.5,
    "plot_validity": {
      "risk": "medium",
      "shape_validity": {
        "risk": "low",
        "issues": {
          "intersects_another_polygon_count": 4,
          "self_intersection_count": 2,
          "area_too_small_count": 3,
          "max_distance_gt_min_distance_count": 0,
          "max_angle_lt_min_angle_count": 1,
          "bad_precision_count": 2,
          "short_boundary_segments_count": 1,
          "distance_nearest_neighbor_count": 2,
          "triangular_geometry_count": 0,
          "duplicate_vertices_count": 5,
          "nested_geometry_collection_count": 0,
          "simplifiable_collections_count": 0
        }
      },
      "commodity_presence": {
        "risk": "low",
        "commodity_absence_count": 5,
        "description": "5 plots lack commodity presence"
      },
      "protected_area_overlap": {
        "risk": "medium",
        "protected_area_overlap_count": 12,
        "description": "12 plots overlap with protected areas"
      }
    },
    "deforestation": {
      "risk": "low",
      "plot_deforestation_pct": 2.5,
      "deforestation_area_ha": 31.26,
      "plots_with_deforestation": 3,
      "description": "3 plots have deforestation alerts (31.3 ha total)"
    },
    "circumvention": {
      "risk": "low",
      "estimated_annual_production_tonnes": 4500.0,
      "production_volume_tonnes": 4200.0,
      "deviation": 0.067
    },
    "cherry_picking": {
      "risk": "low",
      "vicinity_area_ha": 5000.0,
      "vicinity_commodity_area_ha": 3200.0,
      "vicinity_natural_forest_area_ha": 800.0,
      "vicinity_protected_area_ha": 150.0,
      "vicinity_deforestation_ha": 45.0,
      "deforestation_pct": 1.4
    },
    "overall": {
      "risk": "medium",
      "investigate": ["protected_area_overlap", "commodity_presence"]
    }
  },
  "features": [...]
}

Note: deforestation and cherry_picking are null/empty when include_deforestation=false or when no vicinity check is run. circumvention (production volume vs. yield) requires production_volume.

Feature Properties (Individual Results)

Each feature includes user-provided properties plus validation results as flat top-level keys (no nested validation object):

{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[100.5018, 13.7563], [100.5028, 13.7563], [100.5028, 13.7573], [100.5018, 13.7573], [100.5018, 13.7563]]]
  },
  "properties": {
    "location_type": "plot",
    "plot_id": "plot_001",
    "area": 1.25,
    "plot_validity": {
      "risk": "low",
      "shape_validity": {
        "risk": "low",
        "indicators": { "self_intersection_count": 0, "intersects_another_polygon_count": 0, ... }
      },
      "commodity_presence": {
        "risk": "low",
        "commodity_absence_count": 0
      },
      "protected_area_overlap": {
        "risk": "high",
        "protected_area_overlap_count": 1,
        "protected_area_metadata": [
          { "NAME": "Serengeti National Park", "DESIG": "National Park", "IUCN_CAT": "II" }
        ]
      }
    },
    "deforestation": {
      "risk": "low",
      "deforestation_present": 0,
      "deforestation_value": 0.0
    },
    "circumvention": {
      "risk": "low",
      "estimated_annual_production_tonnes": 45.2,
      "production_volume_tonnes": 42.0,
      "deviation": 0.07
    },
    "overall": {
      "risk": "low"
    },
    "primary_warning": null
  }
}
  • location_type: "plot" — identifies this as plot validation output
  • primary_warning: Optional string when critical issues exist (e.g. self-intersection, overlaps)
  • Risk values: "low" (good), "medium", "high" (bad). overall.investigate (in collection properties) lists categories to review, ordered by priority.
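
The flat pillar keys make it straightforward to triage a parsed response. The sketch below works on the response as a Python dict and uses only the keys shown in the examples above; the helper names are hypothetical, and pillars that are null or absent (for example deforestation when include_deforestation=false) are skipped.

```python
PILLARS = ("plot_validity", "deforestation", "circumvention", "cherry_picking")


def pillar_risks(properties):
    """Map each pillar present in a properties dict to its risk label."""
    risks = {}
    for pillar in PILLARS:
        value = properties.get(pillar)
        if isinstance(value, dict) and "risk" in value:
            risks[pillar] = value["risk"]
    return risks


def plots_needing_review(collection, levels=("medium", "high")):
    """Return (plot_id, pillar risks) for features whose overall risk is in levels."""
    flagged = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        if (props.get("overall") or {}).get("risk") in levels:
            flagged.append((props.get("plot_id"), pillar_risks(props)))
    return flagged


sample = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"plot_id": "plot_001", "plot_validity": {"risk": "low"},
                        "deforestation": None, "overall": {"risk": "low"}}},
        {"properties": {"plot_id": "plot_002", "plot_validity": {"risk": "high"},
                        "deforestation": {"risk": "low"}, "overall": {"risk": "high"}}},
    ],
}
print(plots_needing_review(sample))
```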

Validation Results

Low Risk

  • Good data quality with minimal issues
  • Critical checks pass
  • Suitable for sustainability analysis

Medium Risk

  • Some issues requiring attention
  • Most checks pass
  • Review overall.investigate for priority categories

High Risk

  • Significant issues
  • One or more critical checks fail
  • Correction recommended before analysis

Common Issues and Solutions

1. Small Plot Areas

Issue: Plots smaller than minimum threshold

Solution:

  • Lower the min_area parameter if appropriate
  • Combine adjacent small plots
  • Use different analysis approach for small plots

2. Self-Intersections

Issue: Invalid polygon boundaries

Solution:

  • Use GIS software to fix geometries
  • Re-digitize problematic boundaries
  • Use buffer operations to clean boundaries

3. Overlapping Plots

Issue: Multiple plots occupy same area

Solution:

  • Adjust plot boundaries to eliminate overlaps
  • Remove duplicate plots
  • Use spatial analysis to resolve conflicts

4. Low Coordinate Precision

Issue: Insufficient decimal places in coordinates

Solution:

  • Increase precision in source data
  • Adjust min_precision parameter if appropriate
  • Re-digitize with higher precision

5. Isolated Plots

Issue: Plots far from nearest neighbors

Solution:

  • Verify plot location accuracy
  • Adjust max_distance parameter
  • Handle isolated plots separately in analysis

6. Protected Area Overlaps (NEW)

Issue: Plots overlap with World Database of Protected Areas

Solution:

  • Check local regulations for the specific protected area (IUCN category in response)
  • Verify plot legality within the protected area
  • Consider alternative plot locations outside protected areas
  • Review the protected_area_metadata for the protected area name and designation
  • IUCN Categories: I (Strict Protection), II (National Park), III (Monument), IV (Habitat Management), V (Protected Landscape), VI (Sustainable Use)

Execution Architecture

Validation runs three parallel streams for maximum throughput:

  1. GEE plot task: Commodity presence and deforestation per plot (Earth Engine)
  2. GEE cherry-picking task: Vicinity metrics (convex hull of all plots)
  3. Local geometric task: Angles, WDPA, distances, overlaps, area, precision, etc. (CPU-bound)

The shared deforestation mask is built once and reused by both GEE tasks (instruction caching). Geometric checks run in a thread pool while the network requests are in flight, so CPU and I/O overlap.
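
The overlap of network-bound and local work can be illustrated with a thread pool. This is a toy sketch, not the service's code: the three task functions are hypothetical stand-ins (the sleeps imitate Earth Engine latency), and only the pattern of submitting all streams at once and collecting results by name is the point.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def gee_plot_task(plots):
    """Stand-in for per-plot commodity and deforestation requests."""
    time.sleep(0.05)  # imitates network latency
    return {"plots_checked": len(plots)}


def gee_cherry_picking_task(plots):
    """Stand-in for the vicinity metrics request."""
    time.sleep(0.05)  # imitates network latency
    return {"vicinity_checked": True}


def local_geometric_task(plots):
    """Stand-in for local geometric checks; here it only counts vertices."""
    return {"vertex_total": sum(len(ring) for ring in plots)}


def run_validation(plots):
    """Run the three streams concurrently and collect results by name."""
    tasks = {
        "gee_plot": gee_plot_task,
        "gee_cherry_picking": gee_cherry_picking_task,
        "geometry": local_geometric_task,
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(task, plots) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


plots = [[(0, 0), (1, 0), (1, 1), (0, 0)]]
print(run_validation(plots))
```

Because the simulated network calls wait concurrently, total wall time is close to the slowest single stream instead of the sum of all three.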

Performance Considerations

  • Large Files: Use streaming for files with >1000 plots; 24k+ plots may use 2–4GB RAM during serialization
  • Complex Geometries: MultiPolygons and GeometryCollections require more processing time
  • Precision: Higher precision requirements increase processing time
  • Spatial Checks: Overlap and neighbor distance checks are computationally intensive
  • Deforestation: Set include_deforestation=true only when needed; adds GEE calls and smaller batch sizes

Best Practices

1. Parameter Tuning

  • Start with default values and adjust based on your data characteristics
  • Raise the angle threshold for more sensitive spike detection; low values (0.5-1.0°) flag only the sharpest spikes
  • Lower distance thresholds (500-800m) for more sensitive edge detection
  • Adjust area thresholds based on your expected plot sizes
  • Increase precision requirements for high-accuracy applications

2. Data Preparation

  • Clean geometries before uploading using GIS software like QGIS
  • Fix self-intersections and topological issues
  • Ensure adequate coordinate precision (6+ decimal places)
  • Remove duplicate vertices and simplify complex geometries
  • Validate coordinate systems and ensure proper projection