Data Normalization¶
The /normalize_locations and /validate_locations API endpoints provide intelligent data cleaning and standardization for geospatial data in any format. These endpoints use AI-powered (Gemini) analysis combined with geocoding to transform messy, unstructured data into standardized GeoJSON with complete EUDR compliance payloads.
Overview¶
Data normalization is essential for supply chain analysis because real-world datasets often arrive in diverse formats with inconsistent schemas. The normalization service automatically handles this complexity by:
- Auto-detecting schemas from Excel, CSV, and other unstructured formats using Gemini AI
- Mapping columns to standard fields (supplier name, address, location, commodity, etc.)
- Geocoding addresses when geometry is missing
- Detecting commodities intelligently using AI and heuristic fallbacks
- Generating EUDR payloads with detected/inferred data
- Smart routing to facility or plot validation based on geometry type and detected data
Supported Input Formats¶
The system accepts multiple geospatial and tabular formats:
| Format | Type | Notes |
|---|---|---|
| GeoJSON | Native Geospatial | Direct parsing, no normalization needed |
| Shapefile | Native Geospatial | Automatic unpacking and parsing |
| GeoPackage | Native Geospatial | OGR/GDAL compatible |
| GeoParquet | Native Geospatial | Apache Parquet-based format |
| KML/KMZ | Native Geospatial | Google Earth compatible |
| CSV | Tabular | With or without geometry columns |
| XLSX | Tabular | With or without geometry columns |
| PDF | Unstructured | Parsed by Gemini AI |
| TXT | Unstructured | Parsed by Gemini AI |
Endpoints¶
POST /normalize_locations¶
Cleans and normalizes geospatial data without running full validation checks.
Use case: Standardize raw data before batch processing, detailed analysis, or export to downstream systems.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Input file in any supported format |
| `location_type` | string | No | None | Specify 'facility' or 'plot'. If None, auto-detect from geometry |
| `commodity_type` | string | No | None | Commodity type (e.g., 'palm', 'soy', 'cocoa'). If not provided, infer from data |
Response:
Returns a GeoJSON FeatureCollection with:
- Normalized features with standardized properties
- EUDR payloads including DDS references and verification numbers
- Metadata with detection results and processing details
Example Request:
curl -X POST "https://api.epoch.eco/normalize_locations" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@suppliers.xlsx" \
-F "location_type=facility" \
-F "commodity_type=palm"
POST /validate_locations¶
Universal location validator that combines normalization with comprehensive validation checks.
Use case: Accept any file format, auto-detect location type (facility or plot), and return validation results with full analysis.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Input file in any supported format |
| `normalize_input` | boolean | No | False | Enable AI-powered normalization before validation |
| `location_type` | string | No | None | Specify 'facility' or 'plot'. If None, auto-detect |
| `commodity_type` | string | No | None | Commodity type for context and validation |
| `limit` | integer | No | 100 | Max ranked features to return (top N by confidence) |
| `country` | string | No | None | Country context for validation |
Plot-Specific Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `angle` | float | 30 | Minimum angle threshold (degrees) |
| `distance` | float | 1000 | Maximum distance between vertices (meters) |
| `overlap_threshold` | float | 0.5 | Overlap detection threshold (0-1) |
| `max_distance` | float | 250000 | Maximum distance to nearest neighbor (meters) |
| `min_area` | float | 0.1 | Minimum plot area (hectares) |
| `min_length` | float | 10 | Minimum boundary segment length (meters) |
| `min_precision` | integer | 6 | Minimum coordinate decimal places |
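As an illustration, the plot-specific thresholds above map directly onto multipart form fields. Here is a sketch of the `data` dict for a Python `requests` call (field names come from the table; the values shown are the defaults):

```python
# Sketch: plot-validation form fields using the defaults from the table above.
# Multipart form values are sent as strings.
plot_params = {
    "normalize_input": "true",
    "location_type": "plot",
    "angle": "30",               # minimum angle threshold (degrees)
    "distance": "1000",          # max distance between vertices (meters)
    "overlap_threshold": "0.5",  # overlap detection threshold (0-1)
    "max_distance": "250000",    # max distance to nearest neighbor (meters)
    "min_area": "0.1",           # minimum plot area (hectares)
    "min_length": "10",          # minimum boundary segment length (meters)
    "min_precision": "6",        # minimum coordinate decimal places
}

# Then pass it alongside the file, e.g.:
# requests.post("https://api.epoch.eco/validate_locations",
#               headers={"Authorization": "Bearer YOUR_TOKEN"},
#               files={"file": open("plots.geojson", "rb")},
#               data=plot_params)
```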
Facility-Specific Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `commodity_radius` | float | 5000 | Radius in meters to check for commodity presence |
| `building_density_threshold` | float | 2e-10 | Building density threshold |
| `check_water_access` | boolean | False | Whether to check water access |
| `check_port_proximity` | boolean | False | Whether to check port proximity |
| `check_road_access` | boolean | False | Whether to check road access |
Response:
Routes intelligently and returns either facility or plot validation results (see Facility Validation or Plot Validation for response formats).
Example Request:
curl -X POST "https://api.epoch.eco/validate_locations" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@locations.xlsx" \
-F "normalize_input=true" \
-F "location_type=facility" \
-F "commodity_type=palm" \
-F "commodity_radius=3000" \
-F "check_water_access=true"
How It Works¶
When you submit data to the normalization service, the following process occurs:
1. File Format Recognition¶
The system automatically detects and parses your file format:
- Native geospatial (GeoJSON, Shapefile, GeoPackage, KML): Automatically extracted and validated
- Tabular data (CSV, XLSX): Analyzed for geographic columns (address, coordinates, WKT, lat/lon)
- Unstructured formats (PDF, TXT): Analyzed by Gemini AI to extract location and commodity information
2. Column Mapping¶
Your data structure is analyzed to identify:
- Location data: Addresses, coordinates, geometries
- Business information: Supplier/facility names, identifiers
- Product information: Commodity type and quantity
- Compliance data: DDS references, HS codes, production dates
3. Geocoding (if needed)¶
If your data contains addresses but no coordinates:
- Addresses are geocoded to precise Point geometries
- Processing status is reported, e.g. "completed (45 rows)"
- Geocoding errors are handled gracefully with fallback options
4. Commodity Detection¶
The system identifies the commodity type with intelligent fallback:
- Uses your `commodity_type` parameter if provided
- AI analysis of column headers and sample values
- Keyword search if AI detection misses the commodity
- Automatic normalization (e.g., "Palm Oil" → "palm", "Natural Rubber" → "rubber")

Recognized commodities: rubber, palm, cocoa, coffee, soy, timber, cattle
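As a sketch of the normalization step (a hypothetical helper, not the service's actual code), a keyword lookup over the recognized commodities might look like:

```python
from typing import Optional

RECOGNIZED = {"rubber", "palm", "cocoa", "coffee", "soy", "timber", "cattle"}

def normalize_commodity(label: str) -> Optional[str]:
    """Map a free-text label like 'Palm Oil' onto a canonical commodity name."""
    text = label.lower()
    for commodity in RECOGNIZED:
        if commodity in text:
            return commodity
    # A few illustrative synonyms that don't contain the canonical word
    synonyms = {"wood": "timber", "beef": "cattle"}
    for word, commodity in synonyms.items():
        if word in text:
            return commodity
    return None  # left for the AI / keyword-search layers to resolve

print(normalize_commodity("Palm Oil"))        # → palm
print(normalize_commodity("Natural Rubber"))  # → rubber
```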
5. Country Enrichment¶
Country information is determined and validated:
- From existing country columns in your data
- From reverse geocoding using geometry
- Used as context for validation and compliance checks
6. EUDR Payload Generation¶
Each location receives a standardized compliance payload:
{
"hs_code": "1801",
"product_trade_name": "Cocoa Beans",
"product_quantity": 1000.0,
"product_type": "cocoa",
"production_country": "Ghana",
"production_year": 2024,
"supplier_name": "Cocoa Cooperative Ltd",
"supplier_address": "Accra, Ghana",
"referenced_dds": [
{
"referenceNumber": "DDS-2024-001",
"verificationNumber": "VER-2024-001"
}
]
}
This payload is ready for EUDR compliance reporting and data export.
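A quick client-side completeness check on a returned payload can be sketched as follows, assuming the field names shown above (`REQUIRED_FIELDS` is an illustrative choice, not an API contract):

```python
# Fields taken from the example payload above; adjust to your reporting needs.
REQUIRED_FIELDS = [
    "hs_code", "product_type", "product_quantity",
    "production_country", "production_year",
    "supplier_name", "supplier_address", "referenced_dds",
]

def missing_eudr_fields(payload: dict) -> list:
    """Return the required fields that are absent or empty in an EUDR payload."""
    return [f for f in REQUIRED_FIELDS if not payload.get(f)]

payload = {
    "hs_code": "1801",
    "product_trade_name": "Cocoa Beans",
    "product_quantity": 1000.0,
    "product_type": "cocoa",
    "production_country": "Ghana",
    "production_year": 2024,
    "supplier_name": "Cocoa Cooperative Ltd",
    "supplier_address": "Accra, Ghana",
    "referenced_dds": [{"referenceNumber": "DDS-2024-001",
                        "verificationNumber": "VER-2024-001"}],
}
print(missing_eudr_fields(payload))  # → []
```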
7. Quality Metadata¶
The response includes processing metrics:
{
"detected_type": "facility",
"feature_count": 150,
"original_filename": "suppliers.xlsx",
"geocoding_status": "completed (45 rows)",
"commodity_type_detected": "palm"
}
This helps you understand what was detected and how many records were processed successfully.
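For example, the geocoded row count can be pulled out of the `geocoding_status` string with a small regex (assuming the status formats shown on this page):

```python
import re

def geocoded_rows(status: str) -> int:
    """Extract the row count from a status like 'completed (45 rows)'."""
    match = re.search(r"(\d+)\s+rows?", status)
    return int(match.group(1)) if match else 0

metadata = {"geocoding_status": "completed (45 rows)"}
print(geocoded_rows(metadata["geocoding_status"]))  # → 45
```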
Smart Routing Logic¶
When using `/validate_locations` with `normalize_input=true`, the system routes each dataset to the appropriate validation:
Routing Decision Tree¶
1. User Explicit Input (highest priority)
    - Use `location_type` parameter if provided
2. Geometry Analysis
    - Polygons/MultiPolygons → Plot validation
    - Points/MultiPoints → Continue to the next layer
3. AI Classification
    - If strongly contradicted by heuristics → Use the heuristic
    - Otherwise → Trust the AI classification
4. Heuristic Fallback (lowest priority)
    - Keyword analysis (mill, factory → facility)
    - Column patterns (survey, plot → plot)
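The priority order above can be sketched as a plain function (hypothetical and simplified: it trusts the AI guess directly rather than cross-checking it against heuristics):

```python
from typing import Optional

def route_location_type(
    location_type: Optional[str],  # user explicit input (highest priority)
    geometry_types: set,           # geometry types found in the file
    ai_guess: Optional[str],       # AI classification, may be None
    column_names: list,            # used by the heuristic fallback
) -> str:
    if location_type in ("facility", "plot"):
        return location_type               # 1. user explicit input
    if geometry_types & {"Polygon", "MultiPolygon"}:
        return "plot"                      # 2. geometry analysis
    if ai_guess in ("facility", "plot"):
        return ai_guess                    # 3. AI classification
    # 4. heuristic fallback on column keywords
    text = " ".join(column_names).lower()
    if any(k in text for k in ("survey", "plot")):
        return "plot"
    return "facility"  # mill/factory keywords and Point data default here

print(route_location_type(None, {"Point"}, None, ["mill_name"]))  # → facility
```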
Routing Examples¶
Scenario 1: Auto-detect from geometry
# File contains Points → Routes to facility validation
curl -X POST "https://api.epoch.eco/validate_locations" \
-F "file=@facilities.geojson" \
-F "normalize_input=true"
# Result: Facility validation logic applied
Scenario 2: Auto-detect from schema
# CSV detected as "Survey Data" by AI → Routes to plot validation
curl -X POST "https://api.epoch.eco/validate_locations" \
-F "file=@plot_survey.xlsx" \
-F "normalize_input=true"
# Result: Plot validation logic applied
Scenario 3: User override
# User specifies facility type → Always routes to facility validation
curl -X POST "https://api.epoch.eco/validate_locations" \
-F "file=@locations.csv" \
-F "location_type=facility" \
-F "normalize_input=true"
# Result: Facility validation logic applied
Response Format¶
Normalization Response¶
{
"normalized_geojson": {
"type": "FeatureCollection",
"properties": {
"detected_type": "facility",
"feature_count": 150,
"geocoding_status": "completed (45 rows)",
"commodity_type_detected": "palm"
},
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [100.5018, 13.7563]
},
"properties": {
"supplier_name": "Thailand Palm Cooperative",
"supplier_address": "Bangkok, Thailand",
"commodity_type": "palm",
"country": "Thailand",
"eudr_payload": {
"hs_code": "1511",
"product_type": "palm",
"product_quantity": 5000.0,
"production_country": "Thailand",
"production_year": 2024
}
}
}
]
},
"metadata": {
"detected_type": "facility",
"feature_count": 150,
"geocoding_status": "completed (45 rows)",
"commodity_type_detected": "palm"
}
}
Usage Examples¶
Python¶
import requests

url = "https://api.epoch.eco/normalize_locations"
headers = {
    "Authorization": "Bearer <your_firebase_token>"
}

data = {
    'location_type': 'facility',
    'commodity_type': 'rubber'
}

# Upload supplier data for normalization; the context manager closes the file
with open('path/to/suppliers.csv', 'rb') as f:
    files = {
        'file': ('suppliers.csv', f, 'text/csv')
    }
    response = requests.post(url, headers=headers, files=files, data=data)
if response.status_code == 200:
result = response.json()
normalized = result['normalized_geojson']
print(f"Processed {len(normalized['features'])} features")
print(f"Detected type: {result['metadata']['detected_type']}")
print(f"Commodity: {result['metadata']['commodity_type_detected']}")
print(f"Geocoding: {result['metadata']['geocoding_status']}")
# Access individual features
for feature in normalized['features']:
props = feature['properties']
print(f" - {props.get('supplier_name', 'Unknown')}")
print(f" Location: {feature['geometry']['coordinates']}")
if 'eudr_payload' in props:
print(f" EUDR Ready: Yes")
else:
print(f"Error: {response.status_code} - {response.text}")
JavaScript¶
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
async function normalizeLocations(filePath, token, options = {}) {
try {
const formData = new FormData();
// Add the file
formData.append('file', fs.createReadStream(filePath));
// Add optional parameters
if (options.locationType) formData.append('location_type', options.locationType);
if (options.commodityType) formData.append('commodity_type', options.commodityType);
const response = await axios({
method: 'post',
url: 'https://api.epoch.eco/normalize_locations',
headers: {
'Authorization': `Bearer ${token}`,
...formData.getHeaders()
},
data: formData
});
const result = response.data;
console.log(`Processed ${result.normalized_geojson.features.length} features`);
console.log(`Commodity detected: ${result.metadata.commodity_type_detected}`);
// Access normalized features
result.normalized_geojson.features.forEach(feature => {
const props = feature.properties;
console.log(` - ${props.supplier_name}`);
if (props.eudr_payload) {
console.log(` EUDR Ready: Yes`);
}
});
} catch (error) {
console.error('Error:', error.response?.data || error.message);
}
}
// Usage
normalizeLocations('./suppliers.csv', 'your_token', {
locationType: 'facility',
commodityType: 'rubber'
});
cURL¶
curl -X POST "https://api.epoch.eco/normalize_locations" \
-H "Authorization: Bearer <your_firebase_token>" \
-F "file=@suppliers.csv" \
-F "location_type=facility" \
-F "commodity_type=rubber" \
--output normalized_result.json
Common Workflows¶
Workflow 1: Clean & Export¶
Normalize messy data, then export for downstream processing:
1. Upload file with `/normalize_locations`
2. Receive standardized GeoJSON with EUDR payloads
3. Export result to your system
Workflow 2: Validate & Route¶
Accept any format, auto-detect type, and validate appropriately:
1. Upload file with `/validate_locations` and `normalize_input=true`
2. System auto-detects location type (facility or plot)
3. Receive full validation results
Workflow 3: Batch Processing¶
Normalize multiple files, combine results, then analyze:
1. Call `/normalize_locations` for each file
2. Combine normalized GeoJSON files
3. Submit combined result to batch analysis
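The combining step of this workflow amounts to concatenating the `features` arrays of the normalized FeatureCollections; a minimal sketch:

```python
def combine_feature_collections(collections: list) -> dict:
    """Merge normalized GeoJSON FeatureCollections into one for batch analysis."""
    combined = {"type": "FeatureCollection", "features": []}
    for fc in collections:
        combined["features"].extend(fc.get("features", []))
    return combined

a = {"type": "FeatureCollection",
     "features": [{"type": "Feature", "geometry": None,
                   "properties": {"supplier_name": "A"}}]}
b = {"type": "FeatureCollection",
     "features": [{"type": "Feature", "geometry": None,
                   "properties": {"supplier_name": "B"}}]}
print(len(combine_feature_collections([a, b])["features"]))  # → 2
```

Note this discards the per-file `metadata` blocks; keep them separately if you need geocoding or detection details per source file.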
Performance Characteristics¶
| Operation | Typical Time | Notes |
|---|---|---|
| File load & parsing | 1-5s | Varies by format and size |
| Schema detection (Gemini) | 3-10s | One API call per file |
| Geocoding (per address) | 200-500ms | Parallel batch processing |
| Country lookup | 100-200ms | GIS reverse geocoding |
| EUDR payload generation | 10-50ms | Per feature |
Optimization Tips:

- Pre-clean data when possible (remove invalid rows)
- Use more specific commodity types (helps skip heuristic search)
- Batch multiple files for parallel processing
- Geocode separately if address quality is low
Error Handling¶
Common Errors¶
| Error | Cause | Solution |
|---|---|---|
| 422: No valid geometries | File has no location data | Ensure file includes WKT, coordinates, or address |
| 400: Invalid file format | Unsupported format | Use supported formats (GeoJSON, Shapefile, CSV, XLSX, etc.) |
| 401: Unauthorized | Invalid or missing token | Verify valid Firebase JWT in Authorization header |
| 413: Payload too large | File exceeds size limit | Split into smaller files or use batch API |
| 503: Service unavailable | Geocoding or AI service down | Retry after 30 seconds with exponential backoff |
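For 503 retries, an exponential backoff schedule starting at 30 seconds can be generated like this (sketch; tune the base delay and retry count to your rate limits):

```python
def backoff_delays(base: float = 30.0, retries: int = 4) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ... seconds."""
    return [base * (2 ** i) for i in range(retries)]

# e.g. retry after 30s, 60s, 120s, 240s, calling time.sleep(delay)
# before re-sending the request on each attempt
for attempt, delay in enumerate(backoff_delays(), start=1):
    print(f"attempt {attempt}: wait {delay:.0f}s")
```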
Best Practices¶
Data Preparation¶
- Clean column names: Use descriptive, consistent naming
- Standardize formats: Use consistent date/number formats
- Include geometry: Provide WKT, coordinates, or address data
- Add context: Include country, commodity, or business type when known
Parameter Configuration¶
- Specify commodity_type: Speeds up detection and improves accuracy
- Use location_type when known: Specify 'facility' or 'plot' to skip detection
- Set appropriate limits: Use the `limit` parameter to reduce response size
- Enable normalization when needed: Set `normalize_input=true` for non-standard formats
Integration Patterns¶
- Single-file normalization: Use `/normalize_locations` for quick standardization
- Multi-format validation: Use `/validate_locations` with `normalize_input=true` for comprehensive analysis
- Batch workflows: Process multiple files with `/normalize_locations`, combine results, then analyze
See Also¶
- Facility Validation - Detailed facility validation documentation
- Plot Validation - Detailed plot validation documentation
- Core API Reference - Overview of all Core API endpoints
- Authentication Guide - API authentication details