Plot Validation¶
The /validate_plots and /validate_plot API endpoints perform comprehensive integrity checks on agricultural plot geometries to ensure they meet quality standards for sustainability analysis. These endpoints validate geometric properties, detect common issues, and provide detailed feedback for data quality improvement.
Validation Checks¶
The system performs comprehensive integrity checks to ensure plot geometries are suitable for:
- Sustainability Analysis: Accurate area calculations and boundary definitions
- Satellite Analysis: Proper geometric structure for Earth Engine processing
- Compliance Reporting: Valid geometries for regulatory submissions
- Data Quality: Clean, consistent data for downstream processing
Endpoints¶
POST /validate_plots¶
Validates multiple plot geometries from an uploaded GeoJSON file.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | GeoJSON file containing plot geometries |
| country | string | No | - | Country for commodity validation (optional) |
| commodity_type | string | No | - | Commodity type for validation (optional) |
Response:
Returns a streaming GeoJSON FeatureCollection with:
- Collection Properties: Aggregate integrity statistics
- Feature Properties: Individual validation results for each plot
Example Request:
curl -X POST "https://api.epoch.eco/validate_plots" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@plots.geojson" \
  -F "country=Thailand" \
  -F "commodity_type=rubber" \
  -F "min_area=0.5" \
  -F "distance=1000" \
  -F "angle=30"
Usage Examples¶
Python¶
import requests

# Set up the request
url = "https://api.epoch.eco/validate_plots"
headers = {
    "Authorization": "Bearer <your_firebase_token>"
}

# Validation parameters, sent as form fields alongside the file
data = {
    'country': 'Thailand',        # Optional: for commodity validation
    'commodity_type': 'rubber',   # Optional: for commodity validation
    'min_area': 0.5,              # Minimum area threshold
    'distance': 1000,             # Maximum distance between vertices
    'angle': 30,                  # Minimum angle threshold
    'min_precision': 6,           # Minimum coordinate precision
    'min_length': 10,             # Minimum boundary segment length
    'overlap_threshold': 0.5,     # Overlap threshold
    'max_distance': 250000        # Maximum distance to nearest neighbor
}

# Open the file in a context manager so the handle is closed after the upload
with open('path/to/your/file.geojson', 'rb') as geojson_file:
    files = {
        'file': ('my_plots.geojson', geojson_file, 'application/geo+json')
    }
    # Make the request
    response = requests.post(url, headers=headers, files=files, data=data)

# Check if request was successful
if response.status_code == 200:
    result = response.json()

    # Access the summary statistics
    properties = result['properties']
    print(f"Total plots: {properties.get('total_plots', 0)}")
    print(f"Valid plots: {properties.get('valid_plots', 0)}")
    print(f"Invalid plots: {properties.get('invalid_plots', 0)}")
    print(f"Area issues: {properties.get('area_issues', 0)}")
    print(f"Self-intersection issues: {properties.get('self_intersection_issues', 0)}")

    # Access individual features with validation results
    for feature in result['features']:
        props = feature['properties']
        print(f"Plot {props.get('plot_id', 'unknown')}: Valid={props.get('overall_valid', False)}")
else:
    print(f"Error: {response.status_code} - {response.text}")
JavaScript¶
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function validatePlots(filePath, token, options = {}) {
  try {
    const formData = new FormData();

    // Add the file
    formData.append('file', fs.createReadStream(filePath));

    // Add optional parameters (compared against undefined so a value of 0 is still sent)
    for (const key of ['country', 'commodity_type', 'min_area', 'distance', 'angle']) {
      if (options[key] !== undefined) formData.append(key, String(options[key]));
    }

    const response = await axios({
      method: 'post',
      url: 'https://api.epoch.eco/validate_plots',
      headers: {
        'Authorization': `Bearer ${token}`,
        ...formData.getHeaders()
      },
      data: formData
    });

    // Process the response
    const result = response.data;

    // Access the summary statistics
    const properties = result.properties;
    console.log(`Total plots: ${properties.total_plots || 0}`);
    console.log(`Valid plots: ${properties.valid_plots || 0}`);
    console.log(`Invalid plots: ${properties.invalid_plots || 0}`);
    console.log(`Area issues: ${properties.area_issues || 0}`);
    console.log(`Self-intersection issues: ${properties.self_intersection_issues || 0}`);

    // Access individual features with validation results
    result.features.forEach(feature => {
      const props = feature.properties;
      console.log(`Plot ${props.plot_id || 'unknown'}: Valid=${props.overall_valid || false}`);
    });

    return result;
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}

// Usage example
validatePlots('./my_plots.geojson', 'your_token', {
  country: 'Thailand',
  commodity_type: 'rubber',
  min_area: 0.5,
  distance: 1000,
  angle: 30
});
cURL¶
curl -X POST "https://api.epoch.eco/validate_plots" \
  -H "Authorization: Bearer <your_firebase_token>" \
  -H "Accept: application/json" \
  -F "file=@path/to/your/file.geojson" \
  -F "country=Thailand" \
  -F "commodity_type=rubber" \
  -F "min_area=0.5" \
  -F "distance=1000" \
  -F "angle=30" \
  -F "min_precision=6" \
  -F "min_length=10" \
  -F "overlap_threshold=0.5" \
  -F "max_distance=250000" \
  --output validation_results.json
GET /validate_plot¶
Validates a single plot geometry using a WKT geometry string.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| geometry | string | Yes | - | WKT geometry (Polygon, MultiPolygon, etc.) |
| country | string | No | - | Country for commodity validation (optional) |
| commodity_type | string | No | - | Commodity type for validation (optional) |
Response:
Returns a GeoJSON FeatureCollection with a single feature containing validation results.
Example Request:
curl -X GET "https://api.epoch.eco/validate_plot" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -G \
  --data-urlencode "geometry=POLYGON((100.5018 13.7563, 100.5028 13.7563, 100.5028 13.7573, 100.5018 13.7573, 100.5018 13.7563))" \
  --data-urlencode "country=Thailand"
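The WKT string contains spaces, commas, and parentheses, so it has to be URL-encoded before it goes into the query string. The following is a minimal Python sketch of that step using only the standard library; the helper name `build_validate_plot_url` is hypothetical and not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.epoch.eco/validate_plot"


def build_validate_plot_url(geometry_wkt, country=None, commodity_type=None):
    """Build a GET URL for /validate_plot with a URL-encoded WKT geometry.

    Hypothetical helper for illustration; only the parameter names come
    from the table above.
    """
    params = {"geometry": geometry_wkt}
    if country is not None:
        params["country"] = country
    if commodity_type is not None:
        params["commodity_type"] = commodity_type
    # urlencode percent-encodes parentheses and commas and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


wkt = ("POLYGON((100.5018 13.7563, 100.5028 13.7563, 100.5028 13.7573, "
       "100.5018 13.7573, 100.5018 13.7563))")
url = build_validate_plot_url(wkt, country="Thailand")
```

The resulting URL can be passed to any HTTP client together with the `Authorization` header shown in the cURL example.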
Validation Checks¶
Geometric Integrity Checks¶
1. Area Validation¶
- Purpose: Ensures plots meet minimum size requirements
- Check: area >= min_area (default: 0.1 hectares)
- Issue: Plots too small for meaningful analysis
- Fix: Combine small adjacent plots or lower the minimum threshold
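To pre-check plot areas locally before uploading, something like the sketch below could be used. It assumes WGS84 longitude/latitude coordinates and approximates the area with a local equirectangular projection plus the shoelace formula; the projection, the Earth radius constant, and the function names are assumptions of this sketch, not the service's documented method.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumption of this sketch)


def ring_area_hectares(ring):
    """Approximate area in hectares of a closed lon/lat ring (first vertex == last)."""
    lon0, lat0 = ring[0]
    cos_lat0 = math.cos(math.radians(lat0))
    # Project to local metres relative to the first vertex (equirectangular approximation).
    pts = [
        (math.radians(lon - lon0) * EARTH_RADIUS_M * cos_lat0,
         math.radians(lat - lat0) * EARTH_RADIUS_M)
        for lon, lat in ring
    ]
    # Shoelace formula over consecutive vertex pairs.
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice_area) / 2.0 / 10_000.0  # m^2 -> hectares


def area_is_valid(ring, min_area=0.1):
    """Mirror of the documented check: area >= min_area (hectares)."""
    return ring_area_hectares(ring) >= min_area


ring = [(100.5018, 13.7563), (100.5028, 13.7563), (100.5028, 13.7573),
        (100.5018, 13.7573), (100.5018, 13.7563)]
area = ring_area_hectares(ring)
```

The approximation is only reasonable for small plots; a geodesic or equal-area calculation would be more accurate for large geometries.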
2. Self-Intersection Detection¶
- Purpose: Identifies invalid polygon geometries
- Check: No self-intersecting boundaries
- Issue: Invalid geometry that can cause analysis errors
- Fix: Use GIS tools to fix self-intersections
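As an illustration of what a self-intersection is, the sketch below tests every pair of non-adjacent edges of a ring for a proper crossing. It is a simple O(n²) check written for clarity and only detects edges that cross at interior points (not touching or overlapping edges); the function names are hypothetical, and the service may use a different algorithm.

```python
def _orient(a, b, c):
    """Signed area test: > 0 if c is left of a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 cross at a single interior point."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def ring_self_intersects(ring):
    """Check a closed ring (first vertex == last) for crossing edges."""
    edges = list(zip(ring, ring[1:]))
    n = len(edges)
    for i in range(n):
        # Start at i + 2: adjacent edges share a vertex and cannot properly cross.
        for j in range(i + 2, n):
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
```

Here `ring_self_intersects(square)` is `False` and `ring_self_intersects(bowtie)` is `True`, because the two diagonals of the bowtie cross.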
3. Duplicate Vertices¶
- Purpose: Removes redundant coordinate points
- Check: No consecutive identical coordinates
- Issue: Inefficient geometry storage and potential analysis issues
- Fix: Remove duplicate vertices using GIS tools
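If no GIS tool is at hand, consecutive duplicates can also be removed with a few lines of Python. This is a minimal sketch with hypothetical helper names; it compares coordinates exactly and does not merge vertices that are merely very close.

```python
def has_consecutive_duplicates(ring):
    """True if any vertex is immediately followed by an identical vertex."""
    return any(list(a) == list(b) for a, b in zip(ring, ring[1:]))


def remove_consecutive_duplicates(ring):
    """Return a copy of the ring without consecutive identical vertices."""
    cleaned = [ring[0]]
    for point in ring[1:]:
        if list(point) != list(cleaned[-1]):
            cleaned.append(point)
    return cleaned


ring = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
cleaned = remove_consecutive_duplicates(ring)
```

The closing vertex is kept because it repeats the first vertex, not the one directly before it.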
4. Angle Validation¶
- Purpose: Ensures reasonable boundary angles
- Check: All angles >= min_angle (default: 1.0 degrees)
- Issue: Extremely sharp angles may indicate digitization errors
- Fix: Smooth boundaries or adjust digitization precision
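The angle check can be sketched as follows. The code measures the angle between the two edges meeting at each vertex and treats the coordinates as planar, which is an assumption of this sketch (the service may project coordinates first); the function names are hypothetical.

```python
import math


def vertex_angles(ring):
    """Angle in degrees (0-180) between the two edges at each vertex of a closed ring."""
    pts = ring[:-1]  # drop the repeated closing vertex
    n = len(pts)
    angles = []
    for i in range(n):
        px, py = pts[i - 1]          # previous vertex (wraps around for i == 0)
        cx, cy = pts[i]              # current vertex
        nx, ny = pts[(i + 1) % n]    # next vertex
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return angles


def angles_are_valid(ring, min_angle=1.0):
    """Mirror of the documented check: all angles >= min_angle (degrees)."""
    return all(angle >= min_angle for angle in vertex_angles(ring))


square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
spike = [(0, 0), (10, 0), (10, 0.1), (0, 0)]
```

The square has four 90° corners and passes; the thin triangle has an angle of roughly 0.57° at its first vertex and fails with the default threshold.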
5. Distance Validation¶
- Purpose: Ensures reasonable distances between vertices
- Check: All vertex distances <= max_distance (default: 1000m)
- Issue: Extremely long segments may indicate missing vertices
- Fix: Add intermediate vertices along long boundaries
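A rough way to find overly long edges before uploading is to measure each segment with the haversine formula. The sketch below assumes WGS84 longitude/latitude input; the function names and the Earth radius constant are assumptions, and the service may measure distances differently.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_lengths_m(ring):
    """Length in metres of each edge of a closed ring."""
    return [haversine_m(a, b) for a, b in zip(ring, ring[1:])]


def distances_are_valid(ring, max_distance=1000.0):
    """Mirror of the documented check: all vertex distances <= max_distance (metres)."""
    return all(length <= max_distance for length in segment_lengths_m(ring))


ring = [(100.5018, 13.7563), (100.5028, 13.7563), (100.5028, 13.7573),
        (100.5018, 13.7573), (100.5018, 13.7563)]
lengths = segment_lengths_m(ring)
```

For this ring every edge is a little over 100 m long, so the default 1000 m limit is satisfied.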
6. Precision Validation¶
- Purpose: Ensures adequate coordinate precision
- Check: Coordinates have >= min_precision decimal places (default: 6)
- Issue: Low precision may cause analysis inaccuracies
- Fix: Increase coordinate precision in source data
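Counting decimal places can be sketched with the standard `decimal` module. The helper names are hypothetical, and how the service itself counts precision is not documented here. Note that a Python float drops trailing zeros, so coordinates should be passed as strings if those zeros matter.

```python
from decimal import Decimal


def decimal_places(value):
    """Number of digits after the decimal point in a coordinate (str, int, or float)."""
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def precision_is_valid(ring, min_precision=6):
    """Mirror of the documented check: every coordinate has >= min_precision decimals."""
    return all(decimal_places(c) >= min_precision for point in ring for c in point)


low_precision = [(100.5018, 13.7563), (100.5028, 13.7563), (100.5018, 13.7563)]
high_precision = [("100.501812", "13.756345"), ("100.502800", "13.756300"),
                  ("100.501812", "13.756345")]
```

With the default `min_precision` of 6, the first ring fails (4 decimal places) and the second passes.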
7. Boundary Segment Length¶
- Purpose: Ensures reasonable boundary segment lengths
- Check: All segments >= min_length (default: 10m)
- Issue: Very short segments may indicate digitization noise
- Fix: Simplify geometry or adjust digitization settings
Spatial Relationship Checks¶
8. Overlap Detection¶
- Purpose: Identifies overlapping plots within the dataset
- Check: Overlap area <= overlap_threshold (default: 1% of plot area)
- Issue: Overlapping plots can cause double-counting in analysis
- Fix: Resolve overlaps by adjusting boundaries or removing duplicates
9. Nearest Neighbor Distance¶
- Purpose: Identifies isolated plots that may be errors
- Check: Distance to nearest neighbor <= max_neighbor_distance (default: 1000m)
- Issue: Isolated plots may be digitization errors or require separate handling
- Fix: Verify plot location or adjust analysis parameters
Geometry Type Handling¶
10. Geometry Collection Processing¶
- Purpose: Handles complex geometry collections
- Check: Separates and validates individual geometry components
- Issue: Mixed geometry types in single features
- Fix: Split into separate features by geometry type
11. MultiPolygon Flattening¶
- Purpose: Processes MultiPolygon geometries
- Check: Separates MultiPolygons into individual Polygons
- Issue: Complex MultiPolygon structures
- Fix: Flatten to individual Polygon features
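Flattening can be done on the GeoJSON itself before upload. The sketch below is one possible approach with a hypothetical function name: each MultiPolygon feature is split into one Polygon feature per part, and the original properties are copied to every part.

```python
import copy


def flatten_multipolygons(feature_collection):
    """Return a new FeatureCollection with MultiPolygons split into Polygon features."""
    flattened = []
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "MultiPolygon":
            # Each entry of a MultiPolygon's coordinates is one Polygon's coordinates.
            for polygon_coords in geometry["coordinates"]:
                flattened.append({
                    "type": "Feature",
                    "properties": copy.deepcopy(feature.get("properties") or {}),
                    "geometry": {"type": "Polygon", "coordinates": polygon_coords},
                })
        else:
            flattened.append(feature)
    return {"type": "FeatureCollection", "features": flattened}


collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"plot_id": "plot_001"},
        "geometry": {
            "type": "MultiPolygon",
            "coordinates": [
                [[[0, 0], [1, 0], [1, 1], [0, 0]]],
                [[[5, 5], [6, 5], [6, 6], [5, 5]]],
            ],
        },
    }],
}
flat = flatten_multipolygons(collection)
```

Because both parts inherit the same `plot_id`, it may be worth adding a suffix per part if identifiers need to stay unique.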
Confidence Scoring¶
The validation system calculates confidence scores for both individual plots and the overall dataset based on the severity and frequency of detected issues.
Confidence Score Calculation¶
The system uses a weighted penalty system that starts with a perfect score (1.0) and deducts points based on detected issues:
Issue Weights (Penalty System)¶
| Issue Type | Weight | Description |
|---|---|---|
| intersects_another_polygon_count | 0.30 | Overlaps are serious - can cause double-counting |
| self_intersection_count | 0.25 | Self-intersections are serious - invalid geometry |
| distance_nearest_neighbor_count | 0.20 | Isolated plots are concerning - may be errors |
| commodity_absence_count | 0.15 | Wrong commodity area is concerning |
| area_too_small_count | 0.10 | Small areas are less serious |
| max_distance_gt_min_distance_count | 0.10 | Long edges are less serious |
| max_angle_lt_min_angle_count | 0.10 | Sharp angles are less serious |
| bad_precision_count | 0.05 | Precision issues are minor |
| short_boundary_segments_count | 0.05 | Short segments are minor |
| triangular_geometry_count | 0.05 | Triangular shapes are minor |
| duplicate_vertices_count | 0.05 | Duplicate vertices are minor |
| nested_geometry_collection_count | 0.05 | Nested collections are minor |
| simplifiable_collections_count | 0.05 | Simplifiable collections are minor |
Confidence Level Thresholds¶
- High (≥0.8): Excellent data quality with minimal issues
- Medium (≥0.5 and <0.8): Good data quality with some issues requiring attention
- Low (<0.5): Poor data quality with significant issues requiring correction
Calculation Process¶
- Start with perfect score: 1.0
- Calculate penalty ratio: min(issue_count / total_plots, 1.0)
- Apply weighted penalty: penalty_ratio × issue_weight
- Final score: max(0.0, 1.0 - total_penalty)
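The calculation process above can be sketched in Python as follows. The weights are copied from the issue weight table; the assumption that the total penalty is the sum of the weighted penalties over all issue types, as well as the function names, are part of this sketch and may differ from the actual implementation.

```python
ISSUE_WEIGHTS = {
    "intersects_another_polygon_count": 0.30,
    "self_intersection_count": 0.25,
    "distance_nearest_neighbor_count": 0.20,
    "commodity_absence_count": 0.15,
    "area_too_small_count": 0.10,
    "max_distance_gt_min_distance_count": 0.10,
    "max_angle_lt_min_angle_count": 0.10,
    "bad_precision_count": 0.05,
    "short_boundary_segments_count": 0.05,
    "triangular_geometry_count": 0.05,
    "duplicate_vertices_count": 0.05,
    "nested_geometry_collection_count": 0.05,
    "simplifiable_collections_count": 0.05,
}


def confidence_score(issue_counts, total_plots):
    """Start at 1.0 and subtract a weighted penalty per issue type (assumed to be summed)."""
    if total_plots <= 0:
        raise ValueError("total_plots must be positive")
    total_penalty = 0.0
    for issue, weight in ISSUE_WEIGHTS.items():
        penalty_ratio = min(issue_counts.get(issue, 0) / total_plots, 1.0)
        total_penalty += penalty_ratio * weight
    return max(0.0, 1.0 - total_penalty)


def confidence_level(score):
    """Map a score to the documented levels: high (>=0.8), medium (>=0.5), low (<0.5)."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


score = confidence_score({"self_intersection_count": 5}, total_plots=10)
```

In this example half of the plots self-intersect, so the penalty is 0.5 × 0.25 = 0.125 and the score is 0.875, which falls in the high band.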
Individual Plot Confidence¶
Each plot receives its own confidence score based on the specific issues detected for that plot:
- Plot-specific penalties: Applied based on individual plot issues
- Same thresholds: High (≥0.8), Medium (≥0.5), Low (<0.5)
- Individual scoring: Each plot's confidence is independent of others
Response Format¶
Collection Properties (Aggregate Statistics)¶
{
"type": "FeatureCollection",
"properties": {
"total_plots": 150,
"valid_plots": 142,
"invalid_plots": 8,
"area_issues": 3,
"self_intersection_issues": 2,
"duplicate_vertex_issues": 5,
"angle_issues": 1,
"distance_issues": 0,
"precision_issues": 2,
"segment_length_issues": 1,
"overlap_issues": 4,
"neighbor_distance_issues": 2,
"total_area_hectares": 1250.5,
"average_plot_area": 8.34,
"confidence_score": 0.756,
"confidence_level": "medium"
},
"features": [...]
}
Feature Properties (Individual Results)¶
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [[[100.5018, 13.7563], [100.5028, 13.7563], [100.5028, 13.7573], [100.5018, 13.7573], [100.5018, 13.7563]]]
},
"properties": {
"plot_id": "plot_001",
"area_hectares": 1.25,
"area_valid": true,
"self_intersection": false,
"duplicate_vertices": false,
"angle_valid": true,
"distance_valid": true,
"precision_valid": true,
"segment_length_valid": true,
"overlap_detected": false,
"neighbor_distance_valid": true,
"overall_valid": true,
"validation_issues": [],
"validation_warnings": [],
"confidence_score": 0.850,
"confidence_level": "high"
}
}
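Once a response has been parsed, the plots that need attention can be pulled out with a short helper like the one below. This is a sketch with a hypothetical function name; it relies only on the `overall_valid`, `plot_id`, `validation_issues`, and `confidence_level` properties shown in the example above.

```python
def summarize_invalid_plots(feature_collection):
    """Return (plot_id, validation_issues, confidence_level) for each invalid plot."""
    rows = []
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        if not props.get("overall_valid", False):
            rows.append((
                props.get("plot_id", "unknown"),
                props.get("validation_issues", []),
                props.get("confidence_level"),
            ))
    return rows


# Sample data for illustration only; the issue text is made up.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"plot_id": "plot_001", "overall_valid": True,
                        "validation_issues": [], "confidence_level": "high"}},
        {"type": "Feature", "geometry": None,
         "properties": {"plot_id": "plot_002", "overall_valid": False,
                        "validation_issues": ["example issue"],
                        "confidence_level": "low"}},
    ],
}
invalid = summarize_invalid_plots(sample)
```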
Validation Results¶
High Confidence Plot (0.8-1.0)¶
- Excellent data quality with minimal issues
- All critical geometric checks pass
- Suitable for sustainability analysis
- No significant issues requiring attention
Medium Confidence Plot (0.5-0.8)¶
- Good data quality with some issues
- Most geometric checks pass
- Suitable for analysis with minor caveats
- Some issues may require attention for optimal results
Low Confidence Plot (0.0-0.5)¶
- Poor data quality with significant issues
- One or more critical geometric checks fail
- Requires correction before analysis
- Specific issues identified in response
Confidence Score Interpretation¶
The confidence score provides a quantitative measure of data quality:
- 0.9-1.0: Exceptional quality, ready for analysis
- 0.8-0.9: High quality, minor issues only
- 0.7-0.8: Good quality, some attention needed
- 0.5-0.7: Moderate quality, several issues present
- 0.3-0.5: Poor quality, significant issues
- 0.0-0.3: Very poor quality, major corrections needed
Common Issues and Solutions¶
1. Small Plot Areas¶
Issue: Plots smaller than minimum threshold
Solution:
- Lower the min_area parameter if appropriate
- Combine adjacent small plots
- Use different analysis approach for small plots
2. Self-Intersections¶
Issue: Invalid polygon boundaries
Solution:
- Use GIS software to fix geometries
- Re-digitize problematic boundaries
- Use buffer operations to clean boundaries
3. Overlapping Plots¶
Issue: Multiple plots occupy same area
Solution:
- Adjust plot boundaries to eliminate overlaps
- Remove duplicate plots
- Use spatial analysis to resolve conflicts
4. Low Coordinate Precision¶
Issue: Insufficient decimal places in coordinates
Solution:
- Increase precision in source data
- Adjust min_precision parameter if appropriate
- Re-digitize with higher precision
5. Isolated Plots¶
Issue: Plots far from nearest neighbors
Solution:
- Verify plot location accuracy
- Adjust max_neighbor_distance parameter
- Handle isolated plots separately in analysis
Performance Considerations¶
- Large Files: Use streaming for files with >1000 plots
- Complex Geometries: MultiPolygons and GeometryCollections require more processing time
- Precision: Higher precision requirements increase processing time
- Spatial Checks: Overlap and neighbor distance checks are computationally intensive
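For large uploads, the streamed response can be written to disk chunk by chunk instead of being held in memory. The sketch below assumes the `requests` library on the client side; `save_streamed_response` is a hypothetical helper that works with any object exposing `iter_content`, such as a `requests.Response` obtained with `stream=True`.

```python
def save_streamed_response(response, path, chunk_size=65536):
    """Write a streamed HTTP response body to a file and return the bytes written."""
    bytes_written = 0
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                out.write(chunk)
                bytes_written += len(chunk)
    return bytes_written


# Usage sketch (not executed here; requires network access and a valid token):
#
#   import requests
#   with open("plots.geojson", "rb") as f:
#       with requests.post(
#           "https://api.epoch.eco/validate_plots",
#           headers={"Authorization": "Bearer <your_firebase_token>"},
#           files={"file": ("plots.geojson", f, "application/geo+json")},
#           stream=True,
#       ) as response:
#           response.raise_for_status()
#           save_streamed_response(response, "validation_results.json")
```

The saved file can then be parsed in a separate step, or processed with an incremental JSON parser if it is too large to load at once.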
Best Practices¶
1. Parameter Tuning¶
- Start with default values and adjust based on your data characteristics
- Raise the angle threshold above the 1.0° default for more sensitive spike detection; low thresholds (0.5-1.0°) flag only the sharpest spikes
- Lower distance thresholds (500-800m) for more sensitive edge detection
- Adjust area thresholds based on your expected plot sizes
- Increase precision requirements for high-accuracy applications
2. Data Preparation¶
- Clean geometries before uploading using GIS software like QGIS
- Fix self-intersections and topological issues
- Ensure adequate coordinate precision (6+ decimal places)
- Remove duplicate vertices and simplify complex geometries
- Validate coordinate systems and ensure proper projection