
Data Normalization

The /normalize_locations and /validate_locations API endpoints provide intelligent data cleaning and standardization for geospatial data in any format. These endpoints use AI-powered (Gemini) analysis combined with geocoding to transform messy, unstructured data into standardized GeoJSON with complete EUDR compliance payloads.

Overview

Data normalization is essential for supply chain analysis because real-world datasets often arrive in diverse formats with inconsistent schemas. The normalization service automatically handles this complexity by:

  • Auto-detecting schemas from Excel, CSV, and other unstructured formats using Gemini AI
  • Mapping columns to standard fields (supplier name, address, location, commodity, etc.)
  • Geocoding addresses when geometry is missing
  • Detecting commodities intelligently using AI and heuristic fallbacks
  • Generating EUDR payloads with detected/inferred data
  • Smart routing to facility or plot validation based on geometry type and detected data

Supported Input Formats

The system accepts multiple geospatial and tabular formats:

Format | Type | Notes
GeoJSON | Native Geospatial | Direct parsing, no normalization needed
Shapefile | Native Geospatial | Automatic unpacking and parsing
GeoPackage | Native Geospatial | OGR/GDAL compatible
GeoParquet | Native Geospatial | Apache Parquet-based format
KML/KMZ | Native Geospatial | Google Earth compatible
CSV | Tabular | With or without geometry columns
XLSX | Tabular | With or without geometry columns
PDF | Unstructured | Parsed by Gemini AI
TXT | Unstructured | Parsed by Gemini AI

Endpoints

POST /normalize_locations

Cleans and normalizes geospatial data without running full validation checks.

Use case: Standardize raw data before batch processing, detailed analysis, or export to downstream systems.

Parameters:

Parameter | Type | Required | Default | Description
file | File | Yes | - | Input file in any supported format
location_type | string | No | None | 'facility' or 'plot'. If None, auto-detected from geometry
commodity_type | string | No | None | Commodity type (e.g., 'palm', 'soy', 'cocoa'). If not provided, inferred from data

Response:

Returns a GeoJSON FeatureCollection with:

  • Normalized features with standardized properties
  • EUDR payloads including DDS references and verification numbers
  • Metadata with detection results and processing details

Example Request:

curl -X POST "https://api.epoch.eco/normalize_locations" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@suppliers.xlsx" \
  -F "location_type=facility" \
  -F "commodity_type=palm"

POST /validate_locations

Universal location validator that combines normalization with comprehensive validation checks.

Use case: Accept any file format, auto-detect location type (facility or plot), and return validation results with full analysis.

Parameters:

Parameter | Type | Required | Default | Description
file | File | Yes | - | Input file in any supported format
normalize_input | boolean | No | False | Enable AI-powered normalization before validation
location_type | string | No | None | 'facility' or 'plot'. If None, auto-detected
commodity_type | string | No | None | Commodity type for context and validation
limit | integer | No | 100 | Max ranked features to return (top N by confidence)
country | string | No | None | Country context for validation

Plot-Specific Parameters:

Parameter | Type | Default | Description
angle | float | 30 | Minimum angle threshold (degrees)
distance | float | 1000 | Maximum distance between vertices (meters)
overlap_threshold | float | 0.5 | Overlap detection threshold (0-1)
max_distance | float | 250000 | Maximum distance to nearest neighbor (meters)
min_area | float | 0.1 | Minimum plot area (hectares)
min_length | float | 10 | Minimum boundary segment length (meters)
min_precision | integer | 6 | Minimum coordinate decimal places

Facility-Specific Parameters:

Parameter | Type | Default | Description
commodity_radius | float | 5000 | Radius in meters to check for commodity presence
building_density_threshold | float | 2e-10 | Building density threshold
check_water_access | boolean | False | Whether to check water access
check_port_proximity | boolean | False | Whether to check port proximity
check_road_access | boolean | False | Whether to check road access

Response:

Routes intelligently and returns either facility or plot validation results (see Facility Validation or Plot Validation for response formats).

Example Request:

curl -X POST "https://api.epoch.eco/validate_locations" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@locations.xlsx" \
  -F "normalize_input=true" \
  -F "location_type=facility" \
  -F "commodity_type=palm" \
  -F "commodity_radius=3000" \
  -F "check_water_access=true"

How It Works

When you submit data to the normalization service, the following process occurs:

1. File Format Recognition

The system automatically detects and parses your file format:

  • Native geospatial (GeoJSON, Shapefile, GeoPackage, KML): Automatically extracted and validated
  • Tabular data (CSV, XLSX): Analyzed for geographic columns (address, coordinates, WKT, lat/lon)
  • Unstructured formats (PDF, TXT): AI analysis to extract location and commodity information
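
As an illustration of how extension-based routing might work, here is a minimal Python sketch. The mapping and the `route_file` helper are hypothetical; the real service likely inspects file contents as well as the extension.

```python
from pathlib import Path

# Hypothetical mapping from file extension to parsing strategy.
FORMAT_ROUTES = {
    ".geojson": "native", ".shp": "native", ".gpkg": "native",
    ".parquet": "native", ".kml": "native", ".kmz": "native",
    ".csv": "tabular", ".xlsx": "tabular",
    ".pdf": "unstructured", ".txt": "unstructured",
}

def route_file(filename: str) -> str:
    """Return the parsing strategy for a filename, defaulting to 'unstructured'."""
    return FORMAT_ROUTES.get(Path(filename).suffix.lower(), "unstructured")

print(route_file("suppliers.xlsx"))  # tabular
print(route_file("plots.geojson"))   # native
```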

2. Column Mapping

Your data structure is analyzed to identify:

  • Location data: Addresses, coordinates, geometries
  • Business information: Supplier/facility names, identifiers
  • Product information: Commodity type and quantity
  • Compliance data: DDS references, HS codes, production dates
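
A simplified, keyword-based version of this column mapping can be sketched in Python. The actual service uses Gemini AI for schema detection, so the patterns and field names below are illustrative assumptions only:

```python
import re

# Illustrative keyword heuristics for mapping raw column headers to
# standard fields; not the service's actual detection logic.
FIELD_PATTERNS = {
    "supplier_name": re.compile(r"supplier|facility|company|name", re.I),
    "supplier_address": re.compile(r"address|location|city", re.I),
    "commodity_type": re.compile(r"commodity|product|crop", re.I),
    "dds_reference": re.compile(r"dds|due.?diligence", re.I),
}

def map_columns(headers):
    """Assign each standard field the first header whose name matches it."""
    mapping = {}
    for header in headers:
        for field, pattern in FIELD_PATTERNS.items():
            if field not in mapping and pattern.search(header):
                mapping[field] = header
                break
    return mapping

print(map_columns(["Supplier Name", "Address", "Commodity", "DDS Ref"]))
```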

3. Geocoding (if needed)

If your data contains addresses but no coordinates:

  • Addresses are geocoded to precise Point geometries
  • Processing status is reported (e.g., "completed (45 rows)")
  • Geocoding errors are handled gracefully with fallback options
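
The geocode-when-missing step can be sketched as follows. Here `geocode` is a stand-in for a real geocoding call, and the status string mirrors the format the service reports:

```python
def geocode(address):
    # Placeholder: a real implementation would call a geocoding API here.
    fake_index = {"Accra, Ghana": (-0.1870, 5.6037)}
    return fake_index.get(address)

def fill_missing_geometries(rows):
    """Geocode rows that have an address but no geometry; keep failures."""
    completed, failed = 0, 0
    for row in rows:
        if row.get("geometry") is None and row.get("address"):
            coords = geocode(row["address"])
            if coords:
                row["geometry"] = {"type": "Point", "coordinates": list(coords)}
                completed += 1
            else:
                failed += 1  # handled gracefully: row kept, flagged for review
    return f"completed ({completed} rows), failed ({failed} rows)"

rows = [
    {"address": "Accra, Ghana", "geometry": None},
    {"address": "Unknown Place", "geometry": None},
]
print(fill_missing_geometries(rows))  # completed (1 rows), failed (1 rows)
```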

4. Commodity Detection

The system identifies the commodity type with intelligent fallback:

  1. Uses your commodity_type parameter if provided
  2. AI analysis of column headers and sample values
  3. Keyword search if AI detection misses the commodity
  4. Automatic normalization (e.g., "Palm Oil" → "palm", "Natural Rubber" → "rubber")

Recognized commodities: rubber, palm, cocoa, coffee, soy, timber, cattle
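
The alias normalization in step 4 can be sketched as a lookup table. The aliases below are illustrative examples, not the service's full mapping:

```python
# Illustrative alias table for commodity normalization.
COMMODITY_ALIASES = {
    "palm oil": "palm", "oil palm": "palm",
    "natural rubber": "rubber",
    "cocoa beans": "cocoa",
    "soya": "soy", "soybean": "soy",
}
RECOGNIZED = {"rubber", "palm", "cocoa", "coffee", "soy", "timber", "cattle"}

def normalize_commodity(raw: str):
    """Lowercase, resolve aliases, and return None if unrecognized."""
    value = COMMODITY_ALIASES.get(raw.strip().lower(), raw.strip().lower())
    return value if value in RECOGNIZED else None

print(normalize_commodity("Palm Oil"))        # palm
print(normalize_commodity("Natural Rubber"))  # rubber
```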

5. Country Enrichment

Country information is determined and validated:

  • From existing country columns in your data
  • From reverse geocoding using geometry
  • Used as context for validation and compliance checks

6. EUDR Payload Generation

Each location receives a standardized compliance payload:

{
  "hs_code": "1801",
  "product_trade_name": "Cocoa Beans",
  "product_quantity": 1000.0,
  "product_type": "cocoa",
  "production_country": "Ghana",
  "production_year": 2024,
  "supplier_name": "Cocoa Cooperative Ltd",
  "supplier_address": "Accra, Ghana",
  "referenced_dds": [
    {
      "referenceNumber": "DDS-2024-001",
      "verificationNumber": "VER-2024-001"
    }
  ]
}

This payload is ready for EUDR compliance reporting and data export.
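
Before submitting payloads for compliance reporting, a client-side completeness check can catch gaps early. The required-field list here is an assumption for illustration, not the official schema:

```python
# Hypothetical required fields; check the actual EUDR schema for the
# authoritative list.
REQUIRED_FIELDS = ["hs_code", "product_type", "production_country", "supplier_name"]

def missing_fields(payload: dict):
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not payload.get(f)]

payload = {
    "hs_code": "1801",
    "product_type": "cocoa",
    "production_country": "Ghana",
    "supplier_name": "Cocoa Cooperative Ltd",
}
print(missing_fields(payload))  # []
```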

7. Quality Metadata

The response includes processing metrics:

{
  "detected_type": "facility",
  "feature_count": 150,
  "original_filename": "suppliers.xlsx",
  "geocoding_status": "completed (45 rows)",
  "commodity_type_detected": "palm"
}

This helps you understand what was detected and how many records were processed successfully.

Smart Routing Logic

When using /validate_locations with normalize_input=true, the system intelligently routes to appropriate validation:

Routing Decision Tree

  1. User explicit input (highest priority)
     • Use the location_type parameter if provided
  2. Geometry analysis
     • Polygons/MultiPolygons → plot validation
     • Points/MultiPoints → continue to the next layer
  3. AI classification
     • If strongly contradicted by heuristics → use the heuristic
     • Otherwise → trust the AI classification
  4. Heuristic fallback (lowest priority)
     • Keyword analysis (mill, factory → facility)
     • Column patterns (survey, plot → plot)
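
The decision tree above can be sketched in Python. Function and parameter names are illustrative, and the "strongly contradicted" rule is simplified here to any disagreement:

```python
def route(location_type=None, geometry_type=None, ai_guess=None, heuristic_guess=None):
    """Illustrative routing: returns 'facility' or 'plot'."""
    # 1. User explicit input wins.
    if location_type in ("facility", "plot"):
        return location_type
    # 2. Geometry analysis: polygons always route to plot validation.
    if geometry_type in ("Polygon", "MultiPolygon"):
        return "plot"
    # 3. AI classification, unless contradicted by the heuristic.
    if ai_guess:
        if heuristic_guess and heuristic_guess != ai_guess:
            return heuristic_guess
        return ai_guess
    # 4. Heuristic fallback (default to facility for Points).
    return heuristic_guess or "facility"

print(route(geometry_type="Polygon"))                      # plot
print(route(geometry_type="Point", ai_guess="facility"))   # facility
print(route(location_type="plot", geometry_type="Point"))  # plot
```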

Routing Examples

Scenario 1: Auto-detect from geometry

# File contains Points → Routes to facility validation
curl -X POST "https://api.epoch.eco/validate_locations" \
  -F "file=@facilities.geojson" \
  -F "normalize_input=true"
# Result: Facility validation logic applied

Scenario 2: Auto-detect from schema

# CSV detected as "Survey Data" by AI → Routes to plot validation
curl -X POST "https://api.epoch.eco/validate_locations" \
  -F "file=@plot_survey.xlsx" \
  -F "normalize_input=true"
# Result: Plot validation logic applied

Scenario 3: User override

# User specifies facility type → Always routes to facility validation
curl -X POST "https://api.epoch.eco/validate_locations" \
  -F "file=@locations.csv" \
  -F "location_type=facility" \
  -F "normalize_input=true"
# Result: Facility validation logic applied

Response Format

Normalization Response

{
  "normalized_geojson": {
    "type": "FeatureCollection",
    "properties": {
      "detected_type": "facility",
      "feature_count": 150,
      "geocoding_status": "completed (45 rows)",
      "commodity_type_detected": "palm"
    },
    "features": [
      {
        "type": "Feature",
        "geometry": {
          "type": "Point",
          "coordinates": [100.5018, 13.7563]
        },
        "properties": {
          "supplier_name": "Thailand Palm Cooperative",
          "supplier_address": "Bangkok, Thailand",
          "commodity_type": "palm",
          "country": "Thailand",
          "eudr_payload": {
            "hs_code": "1201",
            "product_type": "palm",
            "product_quantity": 5000.0,
            "production_country": "Thailand",
            "production_year": 2024
          }
        }
      }
    ]
  },
  "metadata": {
    "detected_type": "facility",
    "feature_count": 150,
    "geocoding_status": "completed (45 rows)",
    "commodity_type_detected": "palm"
  }
}

Usage Examples

Python

import requests
import json

url = "https://api.epoch.eco/normalize_locations"
headers = {
    "Authorization": "Bearer <your_firebase_token>"
}

# Upload supplier data for normalization
files = {
    'file': ('suppliers.csv', open('path/to/suppliers.csv', 'rb'), 'text/csv')
}

data = {
    'location_type': 'facility',
    'commodity_type': 'rubber'
}

response = requests.post(url, headers=headers, files=files, data=data)

if response.status_code == 200:
    result = response.json()
    normalized = result['normalized_geojson']

    print(f"Processed {len(normalized['features'])} features")
    print(f"Detected type: {result['metadata']['detected_type']}")
    print(f"Commodity: {result['metadata']['commodity_type_detected']}")
    print(f"Geocoding: {result['metadata']['geocoding_status']}")

    # Access individual features
    for feature in normalized['features']:
        props = feature['properties']
        print(f"  - {props.get('supplier_name', 'Unknown')}")
        print(f"    Location: {feature['geometry']['coordinates']}")
        if 'eudr_payload' in props:
            print(f"    EUDR Ready: Yes")
else:
    print(f"Error: {response.status_code} - {response.text}")

JavaScript

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function normalizeLocations(filePath, token, options = {}) {
    try {
        const formData = new FormData();

        // Add the file
        formData.append('file', fs.createReadStream(filePath));

        // Add optional parameters
        if (options.locationType) formData.append('location_type', options.locationType);
        if (options.commodityType) formData.append('commodity_type', options.commodityType);

        const response = await axios({
            method: 'post',
            url: 'https://api.epoch.eco/normalize_locations',
            headers: {
                'Authorization': `Bearer ${token}`,
                ...formData.getHeaders()
            },
            data: formData
        });

        const result = response.data;
        console.log(`Processed ${result.normalized_geojson.features.length} features`);
        console.log(`Commodity detected: ${result.metadata.commodity_type_detected}`);

        // Access normalized features
        result.normalized_geojson.features.forEach(feature => {
            const props = feature.properties;
            console.log(`  - ${props.supplier_name}`);
            if (props.eudr_payload) {
                console.log(`    EUDR Ready: Yes`);
            }
        });

    } catch (error) {
        console.error('Error:', error.response?.data || error.message);
    }
}

// Usage
normalizeLocations('./suppliers.csv', 'your_token', {
    locationType: 'facility',
    commodityType: 'rubber'
});

cURL

curl -X POST "https://api.epoch.eco/normalize_locations" \
  -H "Authorization: Bearer <your_firebase_token>" \
  -F "file=@suppliers.csv" \
  -F "location_type=facility" \
  -F "commodity_type=rubber" \
  --output normalized_result.json

Common Workflows

Workflow 1: Clean & Export

Normalize messy data, then export for downstream processing:

  1. Upload file with /normalize_locations
  2. Receive standardized GeoJSON with EUDR payloads
  3. Export result to your system

Workflow 2: Validate & Route

Accept any format, auto-detect type, and validate appropriately:

  1. Upload file with /validate_locations and normalize_input=true
  2. System auto-detects location type (facility or plot)
  3. Receive full validation results

Workflow 3: Batch Processing

Normalize multiple files, combine results, then analyze:

  1. Call /normalize_locations for each file
  2. Combine normalized GeoJSON files
  3. Submit combined result to batch analysis
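
Step 2 of this workflow (combining the normalized GeoJSON files) can be sketched as a simple merge of FeatureCollections:

```python
def combine_feature_collections(collections):
    """Merge several GeoJSON FeatureCollections into one."""
    features = []
    for fc in collections:
        features.extend(fc.get("features", []))
    return {"type": "FeatureCollection", "features": features}

# Example: two normalized results (one per input file) merged for batch analysis.
fc_a = {"type": "FeatureCollection", "features": [{"type": "Feature", "id": 1}]}
fc_b = {"type": "FeatureCollection", "features": [{"type": "Feature", "id": 2}]}
combined = combine_feature_collections([fc_a, fc_b])
print(len(combined["features"]))  # 2
```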

Performance Characteristics

Operation | Typical Time | Notes
File load & parsing | 1-5 s | Varies by format and size
Schema detection (Gemini) | 3-10 s | One API call per file
Geocoding (per address) | 200-500 ms | Parallel batch processing
Country lookup | 100-200 ms | GIS reverse geocoding
EUDR payload generation | 10-50 ms | Per feature

Optimization Tips:

  • Pre-clean data when possible (remove invalid rows)
  • Use more specific commodity types (helps skip heuristic search)
  • Batch multiple files for parallel processing
  • Geocode separately if address quality is low

Error Handling

Common Errors

Error | Cause | Solution
422: No valid geometries | File has no location data | Ensure the file includes WKT, coordinates, or addresses
400: Invalid file format | Unsupported format | Use a supported format (GeoJSON, Shapefile, CSV, XLSX, etc.)
401: Unauthorized | Invalid or missing token | Send a valid Firebase JWT in the Authorization header
413: Payload too large | File exceeds size limit | Split into smaller files or use the batch API
503: Service unavailable | Geocoding or AI service down | Retry after 30 seconds with exponential backoff
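
For 503 errors, the suggested retry strategy (exponential backoff) might look like this. `send_request` is a stand-in for your actual HTTP call:

```python
import random
import time

def with_backoff(send_request, max_attempts=5, base_delay=1.0):
    """Retry on 503 with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            return status, body
        # Delay doubles each attempt: base, 2*base, 4*base, ...
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return status, body

# Simulated call that fails twice before succeeding.
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    return (503, None) if attempts["n"] < 3 else (200, "ok")

print(with_backoff(fake_request, base_delay=0.01))  # (200, 'ok')
```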

Best Practices

Data Preparation

  • Clean column names: Use descriptive, consistent naming
  • Standardize formats: Use consistent date/number formats
  • Include geometry: Provide WKT, coordinates, or address data
  • Add context: Include country, commodity, or business type when known

Parameter Configuration

  • Specify commodity_type: Speeds up detection and improves accuracy
  • Use location_type when known: Specify 'facility' or 'plot' to skip detection
  • Set appropriate limits: Use limit parameter to reduce response size
  • Enable normalization when needed: Set normalize_input=true for non-standard formats

Integration Patterns

  • Single-file normalization: Use /normalize_locations for quick standardization
  • Multi-format validation: Use /validate_locations with normalize_input=true for comprehensive analysis
  • Batch workflows: Process multiple files with /normalize_locations, combine results, then analyze

See Also