AGB Calibration¶
The /calibrate_agb endpoint allows users to calibrate the API's above-ground biomass (AGB) model using field-collected tree inventory data. This produces calibrated biomass estimates that account for site-specific allometric relationships, improving accuracy for carbon accounting use cases that require field-validated results.
Overview¶
Epoch's default AGB model is trained on GEDI LiDAR L4A footprints and produces globally consistent predictions. However, for high-confidence carbon accounting (e.g. insetting, voluntary carbon credits, verified removals), field-collected data is needed to calibrate the model to local conditions — species composition, wood density, allometric equations, and stand structure.
The /calibrate_agb endpoint accepts tree inventory data (big trees and small trees CSVs) alongside plot coordinates, and triggers a calibration pipeline that:
- Uploads field data to cloud storage
- Runs the AGB calibration notebook (Vertex AI) which fits a site-specific allometric model
- Produces calibrated biomass estimates for each plot
- Stores results in `*_calibrated` tables for retrieval via the Fetch API
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `filename` | string | Yes | - | Collection name to assign to the calibrated results |
| `big_trees` | File | Yes | - | CSV of large tree measurements (DBH, height, species) |
| `small_trees` | File | Yes | - | CSV of small tree/sapling measurements |
| `plot_coordinates` | File | Yes | - | CSV or GeoJSON of field plot coordinates |
| `table_type` | string | No | `batch_calibrated` | Output table type(s). Options: batch_calibrated, stat_calibrated, or both (comma-separated) |
| `start_date` | string | No | `2017-01-01` | Start date for zonal statistics (YYYY-MM-DD) |
| `end_date` | string | No | today | End date for zonal statistics (YYYY-MM-DD) |
| `wood_density` | float | No | `0.55` | Wood density ratio (g/cm³) for AGB allometry |
| `buffer_size` | int | No | `100` | Buffer size (meters) for batch_buffer table type |
| `stat_region` | string | No | `watershed` | Region for statistical comparison: watershed, admin, or supply_shed |
| `ignore_strata` | string | No | `[]` | Strata numbers to exclude from removals accounting |
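As a rough illustration of how the optional parameters combine with their defaults, the sketch below assembles the non-file form fields on the client side. The helper name `build_form_fields` is hypothetical, and the commented-out upload at the end assumes a POST with multipart form data, a base URL, and an auth scheme that this page does not specify.

```python
from datetime import date

# Defaults copied from the parameter table above.
DEFAULTS = {
    "table_type": "batch_calibrated",
    "start_date": "2017-01-01",
    "wood_density": 0.55,
    "buffer_size": 100,
    "stat_region": "watershed",
    "ignore_strata": "[]",
}


def build_form_fields(filename, **overrides):
    """Return the non-file form fields for a /calibrate_agb request (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS) - {"end_date"}
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")

    fields = {"filename": filename, **DEFAULTS}
    fields["end_date"] = date.today().isoformat()  # documented default: today
    fields.update(overrides)

    # Fail early on malformed dates; fromisoformat raises ValueError.
    for key in ("start_date", "end_date"):
        date.fromisoformat(fields[key])

    # Form values are sent as strings.
    return {key: str(value) for key, value in fields.items()}


fields = build_form_fields("my_collection", wood_density=0.6)

# Hypothetical upload: API_BASE, the HTTP method and the auth header are
# assumptions, not taken from this page.
# import requests
# with open("big_trees.csv", "rb") as big, \
#         open("small_trees.csv", "rb") as small, \
#         open("plot_coordinates.csv", "rb") as coords:
#     requests.post(
#         f"{API_BASE}/calibrate_agb",
#         data=fields,
#         files={"big_trees": big, "small_trees": small, "plot_coordinates": coords},
#         headers={"Authorization": f"Bearer {TOKEN}"},
#     )
```

Rejecting unknown parameter names on the client is a design choice of this sketch; whether the endpoint itself rejects or ignores them is not documented here.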
Table Types¶
| Table Type | Description |
|---|---|
| `batch_calibrated` | Plot-level calibrated results (default) |
| `stat_calibrated` | Statistical region-level calibrated results |
| `batch` | Standard (non-calibrated) batch results |
| `stat` | Standard statistical region results |
| `batch_buffer` | Buffered plot results (uses buffer_size) |
Use table_type=batch_calibrated,stat_calibrated to generate both plot-level and regional calibrated results.
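A comma-separated value is easy to mistype, so a client-side check before submitting can save a failed run. The following is a minimal sketch; `parse_table_types` is a hypothetical helper, and how the server treats whitespace, duplicates, or unknown names is not documented here.

```python
# Table types listed in the table above.
KNOWN_TABLE_TYPES = {
    "batch_calibrated",
    "stat_calibrated",
    "batch",
    "stat",
    "batch_buffer",
}


def parse_table_types(value="batch_calibrated"):
    """Split a comma-separated table_type string and validate each name."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    if not names:
        raise ValueError("table_type is empty")
    unknown = [name for name in names if name not in KNOWN_TABLE_TYPES]
    if unknown:
        raise ValueError(f"unknown table type(s): {unknown}")
    # Drop duplicates while keeping the order the caller wrote.
    return list(dict.fromkeys(names))


requested = parse_table_types("batch_calibrated, stat_calibrated")
table_type_param = ",".join(requested)  # value to send as the table_type field
```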
Input File Formats¶
All three CSVs join on plot_id. The calibration notebook accepts
either the canonical column names below or the ODK-export-style aliases
(in brackets) so field crews can deliver their tablet export unchanged.
Big Trees CSV¶
One row per individually-measured tree above the small-tree threshold (typically > 10 cm DBH).
| Column | Required | Description |
|---|---|---|
| `plot_id` | Yes | Plot identifier. Must equal the plot_uid of a sample point returned by /sampling_design (see the samples GeoJSON) so the calibration joins back to the sampled plot. unique_plot_id is accepted as a legacy alias. |
| `dbh_cm` (or `data:circumference (in centimeters)`) | Yes | Diameter at breast height in cm. The ODK alias holds a circumference, so when it is used the notebook divides the value by π to obtain DBH. |
| `height_m` (or `data:tree height (in meters)`) | Yes | Total tree height in m. Zeros are treated as missing. |
| `subplot_location` | No | Sub-plot label inside the main plot; included in the row id when present. |
| `count_extent` | No | full_plot (default) or sub-plot. Tells the notebook whether the count applies to the whole plot or a sub-plot. |
| `subplot_size_m` | No | Side length of the sub-plot the row was counted in (e.g. 5 for a 5 × 5 m sub-plot). Read from the concatenated trees DataFrame, so the column lives on the tree rows. Used to extrapolate sub-plot tree counts to the main plot area; without it the notebook uses the hardcoded 9.0 fallback extrapolation factor. |
| `species` | No | Free-text species name. |
| `wood_density` | No | Species-specific wood density (g/cm³). When present, overrides the global wood_density form parameter for that row. |
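To make the alias handling concrete, here is a minimal sketch of how a big-trees row could be normalised to the canonical columns. It is an illustration of the rules in the table (legacy id alias, circumference divided by π, zero height treated as missing), not the notebook's actual code; `normalise_big_tree_row` is a hypothetical name.

```python
import math

# ODK-export-style aliases from the table above.
CIRCUMFERENCE_ALIAS = "data:circumference (in centimeters)"
HEIGHT_ALIAS = "data:tree height (in meters)"


def normalise_big_tree_row(row):
    """Map one CSV row (a dict of strings) to canonical plot_id / dbh_cm / height_m."""
    plot_id = row.get("plot_id") or row.get("unique_plot_id")
    if not plot_id:
        raise ValueError("row has neither plot_id nor unique_plot_id")

    if row.get("dbh_cm"):
        dbh_cm = float(row["dbh_cm"])
    elif row.get(CIRCUMFERENCE_ALIAS):
        # The alias is a circumference: diameter = circumference / pi.
        dbh_cm = float(row[CIRCUMFERENCE_ALIAS]) / math.pi
    else:
        raise ValueError("row has neither dbh_cm nor the circumference alias")

    raw_height = row.get("height_m") or row.get(HEIGHT_ALIAS)
    height_m = float(raw_height) if raw_height else None
    if height_m == 0:
        height_m = None  # zeros are treated as missing

    return {"plot_id": plot_id, "dbh_cm": dbh_cm, "height_m": height_m}


odk_row = {
    "unique_plot_id": "P1",
    CIRCUMFERENCE_ALIAS: "62.8",
    HEIGHT_ALIAS: "0",
}
tree = normalise_big_tree_row(odk_row)
```

Rows read with `csv.DictReader` arrive as strings, which is why the sketch converts with `float` and treats empty strings as absent.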
Small Trees CSV¶
One row per sub-plot count of trees below the big-tree threshold (typically < 10 cm DBH). Measurements are usually counted per sub-plot rather than per tree.
| Column | Required | Description |
|---|---|---|
| `plot_id` | Yes | Plot identifier (same join key as big trees). unique_plot_id accepted as legacy alias. |
| `avg_dbh_cm` (or `data:average circumference (in centimeters)`) | Yes | Average DBH of small trees in the sub-plot (cm). |
| `avg_height_m` (or `data:average tree height (in meters)`) | Yes | Average height of small trees in the sub-plot (m). |
| `tree_count` (or `tree_count_lt_10cm` or `data:Tree Count (< 10 cm DBH)`) | Yes | Count of small trees in the sub-plot. |
| `count_extent` (or `data:Count Extent`) | No | Defaults to sub-plot. |
| `subplot_size_m` | No | Side length of the sub-plot the count was taken in (e.g. 5 for a 5 × 5 m sub-plot). Read from the concatenated trees DataFrame; needed for accurate per-ha extrapolation. Without it the notebook falls back to a hardcoded 9.0 extrapolation factor. |
| `species` | No | Free-text species name. |
Plot Coordinates CSV¶
One row per sample plot. The canonical, minimal-required schema is just the identifier + coordinates:
| Column | Required | Description |
|---|---|---|
| `plot_id` | Yes | Plot identifier. Must match the plot_uid returned by /sampling_design in the samples GeoJSON. unique_plot_id accepted as legacy alias. |
| `latitude` / `longitude` | Yes | Plot centre coordinates in WGS84 (EPSG:4326). |
Optional columns the notebook will pick up if present:
| Column | Required | Description |
|---|---|---|
| `plot_size_m` | No | Side length of the main square measurement plot in metres (e.g. 35 for a 35 × 35 m plot). Today's deliveries typically omit this and the notebook falls back to DEFAULT_PLOT_SIZE_M = 30.0 m, which makes per-hectare extrapolation less accurate. Populate this column if your plots aren't 30 × 30 m. |
Note: subplot_size_m goes on the tree CSVs (big trees / small trees),
not on plot coordinates. The notebook reads it from the concatenated
trees DataFrame, so a subplot_size_m column added to
plot_coordinates.csv is silently ignored.
Known gap: there is no endpoint-level form parameter for plot size
today, even though plot size is typically a per-deployment constant
set by the methodology (e.g. 35 × 35 m for cashew per TSC-BioCR), not
something that varies between plots. Until a default_plot_size_m
form parameter lands, set the size per row via plot_size_m (every
row to the same value) whenever the methodology size differs from the
30 m default.
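Since the size has to be set on every row for now, a small pre-processing step can do it before upload. The sketch below uses only the standard library; `with_plot_size` is a hypothetical helper, and it deliberately keeps any value a crew already recorded.

```python
import csv
import io


def with_plot_size(csv_text, plot_size_m):
    """Return plot-coordinates CSV text with plot_size_m filled in on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "plot_size_m" not in fieldnames:
        fieldnames.append("plot_size_m")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep a size that was already recorded (e.g. a smaller plot forced
        # by terrain); fill only the blanks with the methodology value.
        if not row.get("plot_size_m"):
            row["plot_size_m"] = plot_size_m
        writer.writerow(row)
    return out.getvalue()


original = "plot_id,latitude,longitude\nP1,7.5,-5.0\nP2,7.6,-5.1\n"
updated = with_plot_size(original, 35)
```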
Joining to a sampling design¶
The plot_id column is the same identifier as the plot_uid column
emitted by /sampling_design in the samples table / samples GeoJSON
deliverable. Field crews visit the points listed in that file, run
their inventory protocol per the TSC-BioCR white paper, and return
the three CSVs with plot_id equal to the plot_uid of the sample
they measured. The calibration automatically joins the field
measurements back to the stratified sampling design.
For backward compatibility, the notebook still accepts unique_plot_id
when plot_id is absent.
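Because the join depends on exact identifier matches, it can be worth checking the field CSVs against the samples file before submitting. This sketch assumes the samples GeoJSON is a FeatureCollection whose features carry `plot_uid` under `properties`; that layout and the helper name `unmatched_plot_ids` are assumptions.

```python
import json


def unmatched_plot_ids(samples_geojson_text, field_plot_ids):
    """Compare field plot_id values with plot_uid values in the samples GeoJSON."""
    collection = json.loads(samples_geojson_text)
    # Assumed layout: FeatureCollection with properties.plot_uid on each feature.
    sampled = {str(f["properties"]["plot_uid"]) for f in collection["features"]}
    measured = {str(plot_id) for plot_id in field_plot_ids}
    return {
        "not_in_design": sorted(measured - sampled),  # typos or stray plots
        "not_measured": sorted(sampled - measured),   # sample points not visited
    }


samples = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"plot_uid": "P1"},
         "geometry": {"type": "Point", "coordinates": [-5.0, 7.5]}},
        {"type": "Feature", "properties": {"plot_uid": "P2"},
         "geometry": {"type": "Point", "coordinates": [-5.1, 7.6]}},
    ],
})
report = unmatched_plot_ids(samples, ["P1", "P3"])
```

An identifier listed under `not_in_design` would have nothing to join to in the sampling design, so it is the more urgent of the two lists to resolve.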
"Plot" means two different things in this pipeline¶
Two distinct plot concepts coexist and they are not interchangeable:
| Concept | What it is | Where it lives |
|---|---|---|
| Sampling-design plot | The land-tenure or intervention-area polygon that the sampling design picked a sample point inside (a farmer's parcel, a cashew block, an intervention sub-area). Variable size, often less than 1 ha. | fenapab_samples.geojson -> plot_uid field. Joins to the polygon in the _plots table for the collection. |
| Field-measurement plot | The square measurement plot the field crew lays out at the sample point (typically 30 x 30 m or 35 x 35 m per TSC-BioCR). Fixed size by methodology, not by parcel boundary. | plot_coordinates.csv -> plot_size_m column. |
The crew does not measure the entire sampling-design polygon —
that would defeat the point of stratified sampling. They lay a fixed
square at each sample point, measure every big tree inside it, count
small trees in a nested sub-plot (subplot_size_m), and the
calibration extrapolates the measurements to t/ha using the
recorded plot_size_m. Use whatever main-plot dimension the
TSC-BioCR white paper specifies for the commodity (it spells this
out: typically 35 x 35 m main with a 5 x 5 m sub-plot for small trees),
applied consistently across every sample site. Only deviate when
terrain forces a smaller main plot, and record the actual side length
on that row.
When plot_size_m is omitted entirely, the notebook falls back to
30 x 30 m (DEFAULT_PLOT_SIZE_M = 30.0) and emits a warning. The
fallback exists for legacy callers; new deliveries should always
populate the column explicitly.
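The effect of the plot and sub-plot sizes on extrapolation can be sketched with simple area ratios. The two documented constants below come from this page; the area-ratio formulas themselves are an assumption about what the notebook does, and the function names are hypothetical.

```python
DEFAULT_PLOT_SIZE_M = 30.0     # documented fallback side length of the main plot
FALLBACK_SUBPLOT_FACTOR = 9.0  # documented hardcoded fallback factor
SQUARE_METRES_PER_HECTARE = 10_000.0


def subplot_to_plot_factor(plot_size_m=None, subplot_size_m=None):
    """Factor that scales a sub-plot count up to the main plot (assumed area ratio)."""
    if subplot_size_m is None:
        return FALLBACK_SUBPLOT_FACTOR
    side = DEFAULT_PLOT_SIZE_M if plot_size_m is None else plot_size_m
    return (side / subplot_size_m) ** 2


def per_hectare_factor(plot_size_m=None):
    """Factor that scales a main-plot total up to one hectare."""
    side = DEFAULT_PLOT_SIZE_M if plot_size_m is None else plot_size_m
    return SQUARE_METRES_PER_HECTARE / side ** 2


# 35 x 35 m main plot with a 5 x 5 m sub-plot.
count_factor = subplot_to_plot_factor(plot_size_m=35, subplot_size_m=5)
hectare_factor = per_hectare_factor(plot_size_m=35)

# Ratio by which per-hectare values are overstated if a 35 m plot is
# processed with the 30 m default.
overstatement = per_hectare_factor() / per_hectare_factor(35)
```

Under these assumptions a 35 m plot processed with the 30 m default would have its per-hectare values inflated by (35 / 30)², roughly 1.36, which illustrates why the column should be populated explicitly.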
How It Works¶
- Field data files are uploaded to Google Cloud Storage under `field_data/{user_id}/{collection_id}/`
- A Vertex AI notebook execution is triggered with the calibration parameters
- The notebook fits an allometric model using the field measurements and the satellite-derived covariates
- Calibrated AGB estimates are produced for each plot and (optionally) aggregated to statistical regions
- Results are stored in BigQuery `*_calibrated` tables
The calibration uses the same satellite covariates as the default model (Dynamic World, Sentinel-2, Sentinel-1) but fits site-specific coefficients to the field data, accounting for local wood density, species composition, and stand structure.
Retrieving Results¶
Calibrated results are available through the same Fetch API endpoints as standard results, using the same collection name:
- `/fetch_biomass_emissions`: returns calibrated AGB and emissions when calibrated tables exist
- `/fetch_plots`: returns plot-level calibrated data
- `/fetch_stat_regions`: returns statistical region calibrated data (when `stat_calibrated` was computed)
The response format is identical to non-calibrated results, with the calibrated values replacing the default model predictions.
Key Considerations¶
- Field data quality is critical — the calibration is only as good as the input measurements. Ensure DBH and height measurements follow standardised forestry inventory protocols.
- Minimum sample size — a sufficient number of field plots (typically 20+) is needed for meaningful calibration. Small sample sizes may produce overfitted models.
- Species-specific wood density — providing species-level wood density values (rather than using the global default) significantly improves calibration accuracy.
- Temporal alignment — field measurements should be temporally aligned with the satellite imagery used for prediction. Calibrating with 2019 field data against 2024 satellite imagery introduces mismatch.
- Processing time — AGB calibration is computationally intensive (Vertex AI notebook execution). Expect processing times of 30 minutes to several hours depending on the number of plots and monitoring period.
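Given the long processing time, a quick local check of the sample size before submitting can avoid a wasted run. The sketch below counts distinct plots in the plot-coordinates CSV; the threshold of 20 is the rule of thumb from the list above rather than a limit enforced by the API, and the helper names are hypothetical.

```python
import csv
import io

MIN_PLOTS = 20  # rule of thumb from the considerations above, not an API limit


def count_plots(plot_coordinates_csv_text):
    """Count distinct plot identifiers, accepting the legacy unique_plot_id alias."""
    reader = csv.DictReader(io.StringIO(plot_coordinates_csv_text))
    plot_ids = set()
    for row in reader:
        plot_id = row.get("plot_id") or row.get("unique_plot_id")
        if plot_id:
            plot_ids.add(plot_id)
    return len(plot_ids)


def sample_size_warning(plot_coordinates_csv_text, minimum=MIN_PLOTS):
    """Return a warning string when there are too few plots, else None."""
    n_plots = count_plots(plot_coordinates_csv_text)
    if n_plots < minimum:
        return f"only {n_plots} plot(s); calibration may overfit below {minimum}"
    return None


small_delivery = "plot_id,latitude,longitude\nP1,7.5,-5.0\nP2,7.6,-5.1\n"
warning = sample_size_warning(small_delivery)
```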