Facility Validation

The /validate_facilities and /validate_facility API endpoints assess whether locations are suitable for commodity aggregation facilities. These endpoints analyze geographic, infrastructure, and business factors to determine the likelihood that a location serves as a first-mile aggregation point for commodities.

Validation Metrics

The system combines multiple data sources and AI analysis to evaluate:

  • Commodity Presence: Whether the target commodity is present in the area using commodity maps
  • Infrastructure Access: Road, water, and port accessibility using Overture Maps and Google Places API
  • Building/Industrial Presence: Industrial facility and building analysis within 100m using Overture Places dataset with graduated scoring (large = +0.23, medium = +0.12, none = +0.0). The AI receives binary values (0/1) but scoring uses graduated values.
  • Business Context: Nearby POIs using multiple data sources
  • Geographic Suitability: Building density and latitude constraints for commodity production
  • Producer Verification: Cross-referencing facility information with external business databases

Supported Commodities

The system has dedicated detection models for these commodities:

Supported: rubber, palm, cocoa, coffee, soy, timber, cattle

Unsupported Commodities: If you provide a commodity not in the list above, the system will:

  • Use a plantation forest mask as an umbrella commodity detection model (appropriate for most agro-forestry commodities)
  • Include a warning in the response indicating the fallback behavior
  • Still perform all infrastructure and location checks normally

This allows validation to proceed for unlisted commodities while making it transparent that a generic plantation detection model is being used.
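
A client can anticipate the fallback before sending a request. The following is a minimal sketch, not part of the API: the function name is hypothetical, and the case-insensitive matching is an assumption about how commodity names are compared.

```python
# Commodities with dedicated detection models, as listed above.
SUPPORTED_COMMODITIES = {"rubber", "palm", "cocoa", "coffee", "soy", "timber", "cattle"}


def uses_fallback_model(commodity_type: str) -> bool:
    """Return True when the generic plantation forest mask would be used.

    Assumption: names are compared case-insensitively after trimming whitespace.
    """
    return commodity_type.strip().lower() not in SUPPORTED_COMMODITIES


for commodity in ["rubber", "Coffee", "tea"]:
    model = "plantation forest mask (fallback)" if uses_fallback_model(commodity) else "dedicated model"
    print(f"{commodity}: {model}")
```

A request for an unlisted commodity such as tea should therefore be expected to carry the fallback warning in its response.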

Endpoints

POST /validate_facilities

Validates multiple facility locations from an uploaded GeoJSON file.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | GeoJSON file containing facility geometries |
| commodity_type | string | No | - | Commodity type for validation (rubber, palm, cocoa, coffee, soy, timber, cattle) |
| commodity_radius | float | No | 5000 | Radius in meters to check for commodity presence |
| check_water_access | boolean | No | False | Whether to check water access proximity |
| check_port_proximity | boolean | No | False | Whether to check port proximity |
| check_road_access | boolean | No | True | Whether to check road access (fast Overture Places query) |
| include_deforestation | boolean | No | False | Run 30 m deforestation check in a 50 km radius around each facility, masked to natural forest only (post-EUDR cutoff 2021+) |
| include_legality_check | boolean | No | False | Run AI-powered web search legality risk analysis using Gemini with Google Search grounding |

Response:

Returns a streaming GeoJSON FeatureCollection with:

  • Collection Properties: Aggregate statistics across all facilities
  • Feature Properties: Individual validation results for each facility

Example Request:

curl -X POST "https://api.epoch.eco/validate_facilities" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@facilities.geojson" \
  -F "commodity_type=rubber" \
  -F "commodity_radius=3000" \
  -F "check_water_access=true"
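
The upload in this request is a plain GeoJSON FeatureCollection. The sketch below shows one way to produce facilities.geojson with the Python standard library. The helper name is hypothetical, and using facility_name as an input property is an assumption borrowed from the response format, since the accepted input properties are not listed here.

```python
import json


def build_facility_collection(facilities):
    """Build a GeoJSON FeatureCollection of Point features.

    `facilities` is an iterable of (name, longitude, latitude) tuples.
    """
    features = []
    for name, lon, lat in facilities:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Assumed property name; adjust to whatever your data carries.
            "properties": {"facility_name": name},
        })
    return {"type": "FeatureCollection", "features": features}


collection = build_facility_collection([
    ("Rubber Processing Mill", 100.5018, 13.7563),
])

# Write the file referenced by the curl example above.
with open("facilities.geojson", "w", encoding="utf-8") as fh:
    json.dump(collection, fh, indent=2)
```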

GET /validate_facility

Validates a single facility location using a WKT geometry string.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| geometry | string | Yes | - | WKT geometry (Point, Polygon, etc.) |
| commodity_type | string | No | - | Commodity type for validation |
| commodity_radius | float | No | 5000 | Radius in meters to check for commodity presence |
| check_water_access | boolean | No | False | Whether to check water access proximity |
| check_port_proximity | boolean | No | False | Whether to check port proximity |
| check_road_access | boolean | No | True | Whether to check road access |

Response:

Returns a GeoJSON FeatureCollection with a single feature containing validation results.

Example Request:

curl -X GET "https://api.epoch.eco/validate_facility" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -G \
  --data-urlencode "geometry=POINT(100.5018 13.7563)" \
  -d "commodity_type=rubber" \
  -d "check_road_access=true"
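
Because WKT contains spaces and parentheses, the geometry value has to be URL-encoded in the query string. The sketch below builds the request URL with the standard library only. The helper name is hypothetical, and sending booleans as lowercase strings is an assumption based on the curl example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.epoch.eco/validate_facility"


def build_validate_facility_url(lon, lat, commodity_type=None, **flags):
    """Return the GET URL for a single point, with the WKT geometry URL-encoded."""
    # WKT points are written as "POINT(x y)", i.e. longitude first, then latitude.
    params = {"geometry": f"POINT({lon} {lat})"}
    if commodity_type is not None:
        params["commodity_type"] = commodity_type
    for key, value in flags.items():
        # Assumption: booleans are passed as "true"/"false", as in the curl example.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}?{urlencode(params)}"


url = build_validate_facility_url(100.5018, 13.7563, "rubber", check_road_access=True)
print(url)
```

The resulting URL can be passed to any HTTP client together with the Authorization header shown above.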

Related Endpoints

  • POST /validate_locations - Universal validation; auto-detects facility vs plot and routes accordingly. Use for mixed or unknown input types.
  • POST /normalize_locations - Normalize/geocode data without full validation (e.g., prepare CSV/Excel before validation).

Validation Criteria

Confidence Score Calculation (Updated)

The system uses a penalty-based scoring model that starts at perfect confidence (1.0) and deducts points for missing critical attributes. This ensures that facilities with missing infrastructure or commodity presence are appropriately penalized.

Scoring Algorithm:

confidence_score = 1.0  # Start at maximum

# Penalties for missing attributes:
- 0.50 if commodity_presence = 0 (major - outside production region)
- 0.25 if commodity_presence < 1.0 (moderate - low commodity presence)
- 0.20 if poi_presence = 0 (no business activity nearby)
- 0.15 if building_presence = 0 (no industrial presence)
- 0.15 if road_access = 0 (no infrastructure)
- 0.10 if high_density = 1 (urban area - less suitable)
- 0.25 if deforestation_area_ha >= 1000 (significant deforestation nearby, when enabled)
- 0.15 if deforestation_area_ha >= 100 (moderate deforestation nearby, when enabled)
- 0.10 if deforestation_area_ha > 0 (minor deforestation nearby, when enabled)

# NOT SCORED: water_access, port_proximity (optional checks, ignored)
# SEPARATE SCORE: legality_risk_score (0-1, independent of confidence_score)

# Result: confidence_score = max(0.0, min(1.0, score))

Confidence Level Thresholds (implementation uses 0.7 / 0.45):

  • High (≥ 0.7): Strong commodity and infrastructure indicators, minimal gaps
  • Medium (0.45 to < 0.7): Good core indicators but some important gaps
  • Low (< 0.45): Poor core indicators or critical gaps
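
The penalties and thresholds above can be sketched as runnable Python. This illustrates the documented rules and is not the service's implementation. It assumes the commodity and deforestation penalties are tiered, so that only the largest applicable one is deducted. Rounding to two decimals is added here only to avoid floating-point noise.

```python
from typing import Optional


def confidence_score(
    commodity_presence: float,
    poi_presence: float,
    building_presence: float,
    road_access: float,
    high_density: float,
    deforestation_area_ha: Optional[float] = None,  # None = check not enabled
) -> float:
    """Start at 1.0 and deduct a penalty for each missing attribute."""
    score = 1.0
    # Assumption: commodity penalties are mutually exclusive tiers.
    if commodity_presence == 0:
        score -= 0.50
    elif commodity_presence < 1.0:
        score -= 0.25
    if poi_presence == 0:
        score -= 0.20
    if building_presence == 0:
        score -= 0.15
    if road_access == 0:
        score -= 0.15
    if high_density == 1:
        score -= 0.10
    # Assumption: deforestation penalties are also tiered, largest tier only.
    if deforestation_area_ha is not None:
        if deforestation_area_ha >= 1000:
            score -= 0.25
        elif deforestation_area_ha >= 100:
            score -= 0.15
        elif deforestation_area_ha > 0:
            score -= 0.10
    # water_access and port_proximity are intentionally not scored.
    return round(max(0.0, min(1.0, score)), 2)


def confidence_level(score: float) -> str:
    """Map a score to a level using the 0.7 / 0.45 thresholds."""
    if score >= 0.7:
        return "high"
    if score >= 0.45:
        return "medium"
    return "low"


# A facility in a commodity area with buildings, but no POIs and no road access.
score = confidence_score(
    commodity_presence=1.0, poi_presence=0.0, building_presence=1.0,
    road_access=0.0, high_density=0.0,
)
print(score, confidence_level(score))  # 0.65 medium
```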

Commodity Presence Scoring

Commodity presence is a critical factor. The system checks:

  • Present in area: Facility is in a commodity-producing region
  • Country-level check: Facility is in a country that produces the target commodity
  • Latitude-level check: Facility is within viable latitude range for the commodity (if applicable)

Infrastructure & Location Factors

The system evaluates these factors to determine facility suitability. Absence of critical infrastructure reduces confidence:

  • Commodity Presence: Critical factor in scoring
  • Building/Industrial Presence: Evidence of operational facility
  • POI Nearby: Relevant businesses (warehouses, processors, agricultural companies)
  • Road Access: Essential for logistics
  • High Building Density: Urban areas are less suitable for commodity facilities

Scoring Impact:

  • Water access and port proximity: Not scored (optional, ignored)
  • All others: Penalize if absent (see Confidence Score Calculation above)

Latitude Constraints by Commodity

| Commodity | Optimal Range | Viable Range | Unlikely Beyond |
|---|---|---|---|
| Rubber | 10°N-10°S | 20°N/S | 30°N/S |
| Palm Oil | 10°N-10°S | 15°N/S | 20°N/S |
| Cocoa | 20°N-20°S | 25°N/S | 30°N/S |
| Coffee | 25°N-25°S | 35°N/S | 40°N/S |
| Soy | 30°N-30°S | 45°N/S | 50°N/S |
| Timber | 30°N-60°N, 30°S-60°S | 20°N-70°N, 20°S-70°S | Extreme polar regions |
| Cattle | Global | Global | None |
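
One way to read the table is as thresholds on absolute latitude. The sketch below classifies a latitude for the equator-centred commodities. It leaves out timber (two bands per hemisphere) and cattle (global). The band between "viable" and "unlikely beyond" is labelled "marginal" here, which is a hypothetical name and not an API value.

```python
# (optimal, viable, unlikely_beyond) in degrees of absolute latitude, from the table above.
LATITUDE_LIMITS = {
    "rubber": (10, 20, 30),
    "palm": (10, 15, 20),
    "cocoa": (20, 25, 30),
    "coffee": (25, 35, 40),
    "soy": (30, 45, 50),
}


def latitude_band(commodity: str, latitude: float):
    """Return 'optimal', 'viable', 'marginal' or 'unlikely'; None if not modelled here."""
    limits = LATITUDE_LIMITS.get(commodity)
    if limits is None:
        return None  # timber, cattle, or an unlisted commodity
    optimal, viable, unlikely_beyond = limits
    abs_lat = abs(latitude)
    if abs_lat <= optimal:
        return "optimal"
    if abs_lat <= viable:
        return "viable"
    if abs_lat <= unlikely_beyond:
        return "marginal"  # hypothetical label for the in-between band
    return "unlikely"


print(latitude_band("rubber", 13.7563))  # viable
print(latitude_band("coffee", 52.0))     # unlikely
```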

Commodity-Producing Countries

  • Rubber: Thailand, Vietnam, Indonesia, Malaysia, Ivory Coast, China, Ghana, Philippines, Laos, Cambodia, Myanmar, India, Bangladesh, Nepal, Bhutan, Liberia
  • Palm Oil: Thailand, Indonesia, Malaysia, Ivory Coast, Ghana, Nigeria, Ecuador, Honduras, Brazil, Colombia, Venezuela, Peru, Bolivia, Panama, Nicaragua, Guatemala, Belize, Liberia, Togo, Benin, Cameroon
  • Cocoa: Ghana, Ivory Coast, Ecuador, Colombia, Peru, Togo, Benin, Nigeria, Cameroon, Venezuela, Panama, Brazil, Bolivia, Dominican Republic, Liberia
  • Coffee: Peru, Brazil, Ecuador, Colombia, Nicaragua, Honduras, El Salvador, Uganda, Ethiopia, Indonesia, Vietnam, Argentina, Bolivia, Venezuela, Panama, Costa Rica, Guatemala, Belize, India, Sri Lanka, Malaysia, Myanmar, Cambodia, Thailand, Laos, Papua New Guinea
  • Soy: Argentina, Brazil, Paraguay, Uruguay, Bolivia
  • Timber: Global (all countries with forest cover)
  • Cattle: Global (all countries)

Response Format

Feature properties use flat top-level keys (no nested validation object); location_type identifies output as "facility". All metrics (commodity_presence, road_access, etc.) and scores are at the top level.

Collection Properties (Aggregate Statistics)

{
  "type": "FeatureCollection",
  "properties": {
    "total_facilities": 382,
    "commodity_presence": 315,
    "poi_presence": 298,
    "road_access": 245,
    "water_access": 89,
    "port_proximity": 23,
    "high_density": 67,
    "confidence_score": 0.456,
    "confidence_level": {
      "high": 89,
      "medium": 149,
      "low": 144
    },
    "producer_confirmed": {
      "true": 156,
      "false": 226
    }
  },
  "features": [...]
}

Feature Properties (Individual Results)

Feature properties use flat top-level keys (no nested validation object):

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [100.5018, 13.7563]
  },
  "properties": {
    "location_type": "facility",
    "facility_name": "Rubber Processing Mill",
    "country": "Thailand",
    "commodity_type": "rubber",
    "commodity_presence": 1.0,
    "poi_presence": 1.0,
    "high_density": 0.0,
    "building_presence": 1.0,
    "road_access": 1.0,
    "water_access": 0.0,
    "port_proximity": 0.0,
    "deforestation_area_ha": 1250.5,
    "legality_risk_score": 0.45,
    "legality_risk_level": "medium",
    "legality_sources": [
      {
        "title": "NGO report on deforestation in rubber supply chains",
        "url": "https://example.com/report",
        "date": "2024-03-15",
        "snippet": "Investigation found ongoing forest clearing in Chanthaburi province",
        "relevance": "regional",
        "risk_signal": "deforestation"
      }
    ],
    "legality_summary": "Regional deforestation concerns in rubber sector; no direct link to this facility",
    "producer_confirmation_score": 0.85,
    "producer_confirmed": true,
    "producer_confirmation_comment": "Rule-based analysis: 0.85 score",
    "aggregation_comment": "Rule-based analysis: 0.85 score",
    "confidence_score": 0.85,
    "confidence_level": "high",
    "primary_warning": null
  }
}
  • location_type: Always "facility"; identifies this as facility validation output
  • primary_warning: String describing the issue when one exists (e.g. outside commodity zone, no infrastructure); null otherwise

Confidence Levels

The confidence score reflects the facility's suitability as a commodity aggregation point based on critical infrastructure and commodity presence:

  • High (≥ 0.7): Strong evidence of suitable commodity facility with good infrastructure
  • Medium (0.45 to < 0.7): Moderate evidence with some infrastructure gaps
  • Low (< 0.45): Weak evidence or significant infrastructure/commodity gaps
  • Invalid: Geometry missing or invalid; no scoring performed

Interpretation Guide:

  • 0.9-1.0: Exceptional facility characteristics, very strong commodity aggregation point
  • 0.7-0.9: High quality facility characteristics, strong commodity aggregation point
  • 0.45-0.7: Moderate facility characteristics, requires verification
  • 0.3-0.45: Poor facility characteristics, significant issues
  • 0.0-0.3: Very poor facility characteristics, unlikely commodity aggregation point
  • Invalid geometry: Feature skipped; confidence_level is "invalid" and confidence_comment explains the error


Possible confidence_level values (per facility)

These are the only values returned in each feature’s properties.confidence_level by validate_facilities / validate_facility / validate_locations (facility path):

| Value | Meaning |
|---|---|
| high | Confidence score ≥ 0.7. Strong commodity and infrastructure indicators. |
| medium | Confidence score ≥ 0.45 and < 0.7. Some important gaps. |
| low | Confidence score < 0.45, or facility failed country/latitude/commodity checks. |
| invalid | Geometry is null or empty. No scoring; confidence_comment describes the error. |

Aggregate collection properties.confidence_level is a counts object (e.g. {"high": 89, "medium": 149, "low": 144}). When this result is used in batch supply shed, the collection-level value is a single string: "low" if any facility is low, otherwise the most common level (so "high", "medium", "low", or "invalid" can appear).
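
The two aggregation forms described above can be sketched as follows. The function names are hypothetical. Tie-breaking between equally common levels is not specified in this document, so the sketch falls back to Python's Counter ordering (first encountered wins), and rejecting an empty input is an assumption.

```python
from collections import Counter


def count_levels(levels):
    """Counts object, as used for the collection-level confidence_level."""
    return dict(Counter(levels))


def supply_shed_level(levels):
    """Single collection-level string, as described for batch supply shed."""
    if not levels:
        raise ValueError("at least one facility level is required")  # assumption
    if "low" in levels:
        return "low"  # any low facility makes the whole collection low
    # Otherwise take the most common level; ties resolve to the first seen.
    return Counter(levels).most_common(1)[0][0]


levels = ["high", "medium", "high", "invalid"]
print(count_levels(levels))       # {'high': 2, 'medium': 1, 'invalid': 1}
print(supply_shed_level(levels))  # high
```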


How validation is used by batch supply shed

Facility validation produces the assessment (per-facility and aggregated confidence_level). That assessment is what batch supply shed uses; it is not generated by the batch supply shed endpoint itself.

  1. validate_facilities / validate_facility / validate_locations (facility path) compute per-facility confidence_level and optional confidence_comment.
  2. When batch supply shed stages a facility collection with validate_locations=true, it runs the same validation logic and aggregates to one collection-level confidence_level.
  3. fetch_deforestation_check exposes the stored assessment (or a skip reason) as locations_confidence. Allowed values are: high, medium, low, invalid, or a skip reason when processing did not run (e.g. not processed: validation confidence too low, not processed: invalid address, not processed: address too approximate, not processed: invalid file, not processed: no address or geolocation provided, not processed: commodity not in scope). No custom messages are returned. See fetch_deforestation_check for the full list and meanings.

Deforestation Check (Optional)

When include_deforestation=true, the system runs a 30 m resolution deforestation analysis within a 50 km radius of each facility. The check:

  • Uses the same deforestation mask as the plot validation (GLAD, GFC, TMF alerts)
  • Only counts deforestation on natural forest (FDP + TMF baseline as of EUDR cutoff 2020-12-31)
  • Excludes managed forests (GFC subtype 20), water, built-up areas, and baseline commodity
  • Only counts alerts post-EUDR cutoff (2021+)
  • Returns deforestation_area_ha per facility

Deforestation findings are factored into the confidence score with graduated penalties (see Scoring Algorithm above).

Legality Risk Check (Optional)

When include_legality_check=true, the system uses Gemini AI with Google Search grounding to find recent (post-2021) reports about legality risks for each facility. This covers EUDR Article 10.2 risk dimensions:

  • Deforestation & forest degradation
  • Land rights & indigenous peoples
  • Environmental law violations
  • Labour & human rights
  • Corruption & governance
  • Sanctions & enforcement
  • Supply chain integrity

Per-facility output:

| Field | Type | Description |
|---|---|---|
| legality_risk_score | float | 0.0-1.0 (independent of confidence_score) |
| legality_risk_level | string | "low" / "medium" / "high" |
| legality_sources | array | Up to 5 ranked sources with title, URL, date, snippet, relevance tier, risk signal |
| legality_summary | string | One-sentence synthesis |

Relevance tiers (ranked): facility (names this company) > local (same district) > regional (same province) > national (country-level only)

Score adjustment: Gemini's raw score is adjusted based on evidence quality:

  • Facility-level sources: +0.1 bonus
  • Local sources: +0.05 bonus
  • National-only sources: -0.1 penalty
  • Fresh sources (current year): +0.05 bonus
  • Stale sources (>3 years): -0.15 penalty
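
How these adjustments combine is not spelled out here, so the following is only one plausible reading, written as a sketch. It assumes the tier adjustment is taken from the most specific tier among the sources, freshness is judged from the most recent source date, and the result is clamped to 0-1. The function name is hypothetical. The relevance and date fields follow the legality_sources example in the response format.

```python
from datetime import date

# Ordered from most to least specific; regional sources carry no adjustment.
TIER_ORDER = ["facility", "local", "regional", "national"]
TIER_ADJUSTMENT = {"facility": 0.10, "local": 0.05, "regional": 0.0, "national": -0.10}


def adjust_legality_score(raw_score, sources, today):
    """Apply evidence-quality adjustments to a raw score (assumed combination rules)."""
    adjusted = raw_score
    if sources:
        # Assumption: use the most specific tier present among the sources.
        best_tier = min((s["relevance"] for s in sources), key=TIER_ORDER.index)
        adjusted += TIER_ADJUSTMENT[best_tier]
        # Assumption: freshness is judged from the most recent source.
        newest = max(date.fromisoformat(s["date"]) for s in sources)
        if newest.year == today.year:
            adjusted += 0.05
        elif (today - newest).days > 3 * 365:
            adjusted -= 0.15
    return round(max(0.0, min(1.0, adjusted)), 2)


sources = [{"relevance": "regional", "date": "2024-03-15"}]
print(adjust_legality_score(0.40, sources, today=date(2024, 6, 1)))  # 0.45
```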

Results are cached for 24 hours. Repeated requests for the same commodity + location return instantly from cache.

Aggregate output (in collection properties):

{
  "legality": {
    "risk": "medium",
    "avg_legality_risk_score": 0.42,
    "locations_flagged": 8,
    "total_assessed": 24,
    "top_sources": [...],
    "description": "8 of 24 facilities flagged (avg score 0.42)"
  }
}

Data Sources

  • Earth Engine: Commodity presence detection, building density analysis, and deforestation detection using satellite imagery
  • Overture Maps: Road access, water features, building footprints, and POI data
  • Google Places API: Business listings, POI verification, and company name enrichment for legality checks
  • Google Maps: Reverse geocoding for legality search context
  • Gemini AI: Combined analysis and scoring, legality risk assessment with Google Search grounding

Performance Notes

  • Road Access: Uses fast Overture Places queries (default enabled)
  • Water Access: Uses Overture Places for water features (optional)
  • POI Detection: Combines Overture Maps and Google Places API
  • Commodity Detection: Earth Engine satellite analysis
  • Parallel Processing: All validation checks run concurrently for optimal performance

Error Handling

The API handles various error conditions:

  • Invalid geometries: Feature is skipped without scoring; confidence_level is "invalid" and confidence_comment explains the error
  • API timeouts: Graceful fallback with partial results
  • Missing data: Default to 0 values with explanatory comments
  • Rate limiting: Automatic retry with exponential backoff

Use Cases

  • Supply Chain Verification: Validate claimed aggregation facilities
  • Due Diligence: Assess facility suitability for commodity sourcing
  • Risk Assessment: Identify potentially fraudulent facility claims
  • Compliance: Support EUDR and other deforestation regulations
  • Site Selection: Evaluate potential locations for new facilities