AGB Calibration

The /calibrate_agb endpoint allows users to calibrate the API's above-ground biomass (AGB) model using field-collected tree inventory data. This produces calibrated biomass estimates that account for site-specific allometric relationships, improving accuracy for carbon accounting use cases that require field-validated results.

Overview

Epoch's default AGB model is trained on GEDI LiDAR L4A footprints and produces globally consistent predictions. However, for high-confidence carbon accounting (e.g. insetting, voluntary carbon credits, verified removals), field-collected data is needed to calibrate the model to local conditions — species composition, wood density, allometric equations, and stand structure.

The /calibrate_agb endpoint accepts tree inventory data (big trees and small trees CSVs) alongside plot coordinates, and triggers a calibration pipeline that:

  1. Uploads field data to cloud storage
  2. Runs the AGB calibration notebook (Vertex AI) which fits a site-specific allometric model
  3. Produces calibrated biomass estimates for each plot
  4. Stores results in *_calibrated tables for retrieval via the Fetch API

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| filename | string | Yes | - | Collection name to assign to the calibrated results |
| big_trees | File | Yes | - | CSV of large tree measurements (DBH, height, species) |
| small_trees | File | Yes | - | CSV of small tree/sapling measurements |
| plot_coordinates | File | Yes | - | CSV or GeoJSON of field plot coordinates |
| table_type | string | No | batch_calibrated | Output table type(s). Options: batch_calibrated, stat_calibrated, or both (comma-separated) |
| start_date | string | No | 2017-01-01 | Start date for zonal statistics (YYYY-MM-DD) |
| end_date | string | No | today | End date for zonal statistics (YYYY-MM-DD) |
| wood_density | float | No | 0.55 | Wood density (g/cm³) for AGB allometry |
| buffer_size | int | No | 100 | Buffer size (metres) for the batch_buffer table type |
| stat_region | string | No | watershed | Region for statistical comparison: watershed, admin, or supply_shed |
| ignore_strata | string | No | [] | Strata numbers to exclude from removals accounting |

Table Types

| Table Type | Description |
| --- | --- |
| batch_calibrated | Plot-level calibrated results (default) |
| stat_calibrated | Statistical region-level calibrated results |
| batch | Standard (non-calibrated) batch results |
| stat | Standard statistical region results |
| batch_buffer | Buffered plot results (uses buffer_size) |

Use table_type=batch_calibrated,stat_calibrated to generate both plot-level and regional calibrated results.
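
A request using the parameters above might look like the following sketch. The parameter names and defaults come from the table; the base URL, the bearer-token header, the use of POST with multipart/form-data, the JSON response, and the collection name my_collection are assumptions made for illustration, since this page does not specify them.

```python
from datetime import date
from pathlib import Path

# Hypothetical values: neither the host nor the auth scheme is given on this page.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_calibration_form(filename, table_types=("batch_calibrated",),
                           wood_density=0.55, start_date="2017-01-01",
                           end_date=None, stat_region="watershed"):
    """Assemble the non-file form fields documented in the Parameters table."""
    return {
        "filename": filename,
        "table_type": ",".join(table_types),  # comma-separated when several are requested
        "start_date": start_date,
        "end_date": end_date or date.today().isoformat(),
        "wood_density": str(wood_density),
        "stat_region": stat_region,
    }


def submit_calibration(form, big_trees, small_trees, plot_coordinates):
    """Send the form and the three field-data files (assumed: multipart POST)."""
    import requests  # third-party; imported lazily so the helper above stays dependency-free

    paths = {"big_trees": big_trees, "small_trees": small_trees,
             "plot_coordinates": plot_coordinates}
    files = {name: open(Path(p), "rb") for name, p in paths.items()}
    try:
        resp = requests.post(f"{BASE_URL}/calibrate_agb", data=form, files=files,
                             headers={"Authorization": f"Bearer {API_KEY}"},
                             timeout=60)
        resp.raise_for_status()
        return resp.json()  # assumed: the endpoint answers with JSON
    finally:
        for handle in files.values():
            handle.close()


form = build_calibration_form("my_collection",
                              table_types=("batch_calibrated", "stat_calibrated"))
print(form["table_type"])
```

Calling submit_calibration(form, "big_trees.csv", "small_trees.csv", "plot_coordinates.csv") would then send the request; the file names here are placeholders.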

Input File Formats

All three CSVs join on plot_id. The calibration notebook accepts either the canonical column names below or the ODK-export-style aliases (in brackets) so field crews can deliver their tablet export unchanged.

Big Trees CSV

One row per individually-measured tree above the small-tree threshold (typically > 10 cm DBH).

| Column | Required | Description |
| --- | --- | --- |
| plot_id | Yes | Plot identifier. Must equal the plot_uid of a sample point returned by /sampling_design (see the samples GeoJSON) so the calibration joins back to the sampled plot. unique_plot_id is accepted as a legacy alias. |
| dbh_cm (or data:circumference (in centimeters)) | Yes | Diameter at breast height (DBH) in cm. The ODK alias holds a circumference, which the notebook converts to a diameter by dividing by π. |
| height_m (or data:tree height (in meters)) | Yes | Total tree height in m. Zeros are treated as missing. |
| subplot_location | No | Sub-plot label inside the main plot; included in the row id when present. |
| count_extent | No | full_plot (default) or sub-plot. Tells the notebook whether the count applies to the whole plot or a sub-plot. |
| subplot_size_m | No | Side length of the sub-plot the row was counted in (e.g. 5 for a 5 × 5 m sub-plot). The notebook reads it from the concatenated trees DataFrame, so the column must be on the tree rows. Used to extrapolate sub-plot tree counts to the main plot area; without it the notebook falls back to a hardcoded extrapolation factor of 9.0. |
| species | No | Free-text species name. |
| wood_density | No | Species-specific wood density (g/cm³). When present, overrides the global wood_density form parameter for that row. |
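
The column rules above can be illustrated with a small normalisation sketch. It is not the notebook's code; it only mirrors the documented behaviour (circumference divided by π, zero heights treated as missing, per-row wood density overriding the global value). The function name and the sample row are hypothetical.

```python
import math

GLOBAL_WOOD_DENSITY = 0.55  # default of the wood_density form parameter

# ODK-export-style aliases listed in the column table
CIRC_ALIAS = "data:circumference (in centimeters)"
HEIGHT_ALIAS = "data:tree height (in meters)"


def normalise_big_tree(row, global_wd=GLOBAL_WOOD_DENSITY):
    """Map one big-tree CSV row (a dict of strings) onto canonical fields."""
    if row.get("dbh_cm"):
        dbh_cm = float(row["dbh_cm"])
    else:
        # The alias holds a circumference: diameter = circumference / pi
        dbh_cm = float(row[CIRC_ALIAS]) / math.pi

    height = float(row.get("height_m") or row.get(HEIGHT_ALIAS) or 0)
    height_m = height if height > 0 else None  # zeros are treated as missing

    # A per-row wood density overrides the global form parameter
    wood_density = float(row["wood_density"]) if row.get("wood_density") else global_wd

    return {
        "plot_id": row.get("plot_id") or row.get("unique_plot_id"),
        "dbh_cm": dbh_cm,
        "height_m": height_m,
        "wood_density": wood_density,
    }


tree = normalise_big_tree({"unique_plot_id": "P-001", CIRC_ALIAS: "62.8", HEIGHT_ALIAS: "0"})
print(tree)
```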

Small Trees CSV

One row per sub-plot count of trees below the big-tree threshold (typically < 10 cm DBH). Small trees are usually tallied per sub-plot, with average dimensions, rather than measured individually.

| Column | Required | Description |
| --- | --- | --- |
| plot_id | Yes | Plot identifier (same join key as big trees). unique_plot_id accepted as legacy alias. |
| avg_dbh_cm (or data:average circumference (in centimeters)) | Yes | Average DBH of small trees in the sub-plot (cm). |
| avg_height_m (or data:average tree height (in meters)) | Yes | Average height of small trees in the sub-plot (m). |
| tree_count (or tree_count_lt_10cm or data:Tree Count (< 10 cm DBH)) | Yes | Count of small trees in the sub-plot. |
| count_extent (or data:Count Extent) | No | Defaults to sub-plot. |
| subplot_size_m | No | Side length of the sub-plot the count was taken in (e.g. 5 for a 5 × 5 m sub-plot). Read from the concatenated trees DataFrame; needed for accurate per-ha extrapolation. Without it the notebook falls back to a hardcoded 9.0 extrapolation factor. |
| species | No | Free-text species name. |
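
To see why subplot_size_m matters, the sketch below scales a sub-plot count to the main plot. It assumes the extrapolation factor is the ratio of main-plot area to sub-plot area; this page does not state the notebook's exact formula, so treat the function as an illustration only. The 9.0 fallback is the value named in the column descriptions.

```python
FALLBACK_FACTOR = 9.0  # hardcoded fallback named in the column descriptions


def extrapolation_factor(plot_size_m, subplot_size_m=None):
    """Assumed formula: main-plot area divided by sub-plot area (both square)."""
    if not subplot_size_m:
        return FALLBACK_FACTOR
    return (plot_size_m / subplot_size_m) ** 2


def small_trees_in_main_plot(tree_count, plot_size_m, subplot_size_m=None):
    """Scale a sub-plot tree count up to the whole main plot."""
    return tree_count * extrapolation_factor(plot_size_m, subplot_size_m)


# 12 small trees counted in a 5 x 5 m sub-plot nested in a 35 x 35 m main plot
with_size = small_trees_in_main_plot(12, plot_size_m=35, subplot_size_m=5)
# Same count without subplot_size_m: the fallback factor is used instead
without_size = small_trees_in_main_plot(12, plot_size_m=35)
print(with_size, without_size)
```

Under this assumption a 5 m sub-plot in a 35 m plot gives a factor of 49, so omitting the column and falling back to 9.0 would understate the small-tree count by more than a factor of five.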

Plot Coordinates CSV

One row per sample plot. The minimal required schema is the identifier plus the coordinates:

| Column | Required | Description |
| --- | --- | --- |
| plot_id | Yes | Plot identifier. Must match the plot_uid returned by /sampling_design in the samples GeoJSON. unique_plot_id accepted as legacy alias. |
| latitude / longitude | Yes | Plot centre coordinates in WGS84 (EPSG:4326). |

Optional columns the notebook will pick up if present:

| Column | Required | Description |
| --- | --- | --- |
| plot_size_m | No | Side length of the main square measurement plot in metres (e.g. 35 for a 35 × 35 m plot). Today's deliveries typically omit this and the notebook falls back to DEFAULT_PLOT_SIZE_M = 30.0 m, which makes per-hectare extrapolation less accurate. Populate this column if your plots aren't 30 × 30 m. |

Note: subplot_size_m goes on the tree CSVs (big trees / small trees), not on plot coordinates. The notebook reads it from the concatenated trees DataFrame, so a subplot_size_m column added to plot_coordinates.csv is silently ignored.

Known gap: there is no endpoint-level form parameter for plot size today, even though plot size is typically a per-deployment constant set by the methodology (e.g. 35 × 35 m for cashew per TSC-BioCR), not something that varies between plots. Until a default_plot_size_m form parameter lands, set the size per row via plot_size_m (every row to the same value) whenever the methodology size differs from the 30 m default.
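
Until such a parameter exists, the workaround can be applied to the coordinates file before upload. The following is a minimal sketch using only the standard library; the function name and the sample coordinates are hypothetical, and it overwrites any plot_size_m values already present.

```python
import csv
import io


def add_plot_size(csv_text, plot_size_m):
    """Return the plot-coordinates CSV with plot_size_m set to one value on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    if "plot_size_m" not in fieldnames:
        fieldnames.append("plot_size_m")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["plot_size_m"] = plot_size_m  # same methodology size for every plot
        writer.writerow(row)
    return out.getvalue()


original = "plot_id,latitude,longitude\nP-001,7.5,-5.5\nP-002,7.6,-5.4\n"
updated = add_plot_size(original, 35)
print(updated)
```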

Joining to a sampling design

The plot_id column is the same identifier as the plot_uid column emitted by /sampling_design in the samples table / samples GeoJSON deliverable. Field crews visit the points listed in that file, run their inventory protocol per the TSC-BioCR white paper, and return the three CSVs with plot_id equal to the plot_uid of the sample they measured. The calibration automatically joins the field measurements back to the stratified sampling design.

For backward compatibility, the notebook still accepts unique_plot_id when plot_id is absent.
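
Because the join depends entirely on plot_id matching plot_uid, it can be worth checking the field files against the samples GeoJSON before upload. The sketch below assumes plot_uid sits in each feature's properties object, which is the usual GeoJSON layout but is not confirmed by this page; the function name and sample data are hypothetical.

```python
def unmatched_plot_ids(field_rows, samples_geojson):
    """Return field plot ids that have no matching plot_uid in the samples GeoJSON."""
    # Assumption: plot_uid is stored under each feature's "properties"
    sampled = {feature["properties"]["plot_uid"]
               for feature in samples_geojson["features"]}
    # plot_id is preferred; unique_plot_id is the legacy alias
    field_ids = {row.get("plot_id") or row.get("unique_plot_id") for row in field_rows}
    field_ids.discard(None)  # rows with no identifier at all are not reported here
    return sorted(field_ids - sampled)


samples = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"plot_uid": "P-001"}, "geometry": None},
        {"type": "Feature", "properties": {"plot_uid": "P-002"}, "geometry": None},
    ],
}
rows = [{"plot_id": "P-001"}, {"unique_plot_id": "P-002"}, {"plot_id": "P-999"}]
missing = unmatched_plot_ids(rows, samples)
print(missing)
```

Running the same check on all three files catches typos in plot identifiers before the calibration is triggered.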

"Plot" means two different things in this pipeline

Two distinct plot concepts coexist and they are not interchangeable:

| Concept | What it is | Where it lives |
| --- | --- | --- |
| Sampling-design plot | The land-tenure or intervention-area polygon that the sampling design picked a sample point inside (a farmer's parcel, a cashew block, an intervention sub-area). Variable size, often less than 1 ha. | fenapab_samples.geojson -> plot_uid field. Joins to the polygon in the _plots table for the collection. |
| Field-measurement plot | The square measurement plot the field crew lays out at the sample point (typically 30 × 30 m or 35 × 35 m per TSC-BioCR). Fixed size by methodology, not by parcel boundary. | plot_coordinates.csv -> plot_size_m column. |

The crew does not measure the entire sampling-design polygon, since that would defeat the point of stratified sampling. They lay a fixed square at each sample point, measure every big tree inside it, and count small trees in a nested sub-plot (subplot_size_m). The calibration then extrapolates the measurements to t/ha using the recorded plot_size_m. Use the main-plot dimension the TSC-BioCR white paper specifies for the commodity (typically a 35 × 35 m main plot with a 5 × 5 m sub-plot for small trees), applied consistently across every sample site. Only deviate when terrain forces a smaller main plot, and record the actual side length on that row.

When plot_size_m is omitted entirely, the notebook falls back to 30 × 30 m (DEFAULT_PLOT_SIZE_M = 30.0) and emits a warning. The fallback exists for legacy callers; new deliveries should always populate the column explicitly.
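
The effect of the fallback on per-hectare values can be shown with simple area arithmetic. This is a sketch of the unit conversion only, not the notebook's implementation; the function name and the 4500 kg figure are made up for the example.

```python
import warnings

DEFAULT_PLOT_SIZE_M = 30.0  # fallback side length named on this page


def agb_t_per_ha(plot_agb_kg, plot_size_m=None):
    """Scale the AGB summed over one square field plot (kg) to tonnes per hectare."""
    if plot_size_m is None:
        warnings.warn("plot_size_m missing; assuming a 30 x 30 m plot")
        plot_size_m = DEFAULT_PLOT_SIZE_M
    plot_area_ha = plot_size_m ** 2 / 10_000  # 1 ha = 10,000 square metres
    return plot_agb_kg / 1000 / plot_area_ha  # kg -> t, then divide by area in ha


measured = agb_t_per_ha(4500, plot_size_m=35)  # plot was really 35 x 35 m
assumed = agb_t_per_ha(4500)                   # column omitted, 30 x 30 m assumed
print(round(measured, 1), round(assumed, 1))   # 36.7 50.0
```

A 35 m plot processed with the 30 m fallback is inflated by (35/30)², roughly 36%, which is why the column should be populated whenever the methodology size differs from the default.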

How It Works

  1. Field data files are uploaded to Google Cloud Storage under field_data/{user_id}/{collection_id}/
  2. A Vertex AI notebook execution is triggered with the calibration parameters
  3. The notebook fits an allometric model using the field measurements and the satellite-derived covariates
  4. Calibrated AGB estimates are produced for each plot and (optionally) aggregated to statistical regions
  5. Results are stored in BigQuery *_calibrated tables

The calibration uses the same satellite covariates as the default model (Dynamic World, Sentinel-2, Sentinel-1) but fits site-specific coefficients to the field data, accounting for local wood density, species composition, and stand structure.

Retrieving Results

Calibrated results are available through the same Fetch API endpoints as standard results, using the same collection name:

  • /fetch_biomass_emissions — returns calibrated AGB and emissions when calibrated tables exist
  • /fetch_plots — returns plot-level calibrated data
  • /fetch_stat_regions — returns statistical region calibrated data (when stat_calibrated was computed)

The response format is identical to non-calibrated results, with the calibrated values replacing the default model predictions.
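
As a rough illustration, the fetch URLs for a calibrated collection could be built as below. The endpoint paths come from the list above; the base URL and the query parameter name used to pass the collection (filename here) are assumptions, because this page does not document the Fetch API's parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host
FETCH_ENDPOINTS = ("/fetch_biomass_emissions", "/fetch_plots", "/fetch_stat_regions")


def fetch_url(endpoint, collection, param_name="filename"):
    """Build a fetch URL; param_name is an assumed query parameter, not a documented one."""
    if endpoint not in FETCH_ENDPOINTS:
        raise ValueError(f"unknown fetch endpoint: {endpoint}")
    return f"{BASE_URL}{endpoint}?{urlencode({param_name: collection})}"


# The collection name is the same one passed as filename to /calibrate_agb
for endpoint in FETCH_ENDPOINTS:
    print(fetch_url(endpoint, "my_collection"))
```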

Key Considerations

  1. Field data quality is critical — the calibration is only as good as the input measurements. Ensure DBH and height measurements follow standardised forestry inventory protocols.
  2. Minimum sample size — a sufficient number of field plots (typically 20+) is needed for meaningful calibration. Small sample sizes may produce overfitted models.
  3. Species-specific wood density — providing species-level wood density values (rather than using the global default) significantly improves calibration accuracy.
  4. Temporal alignment — field measurements should be temporally aligned with the satellite imagery used for prediction. Calibrating with 2019 field data against 2024 satellite imagery introduces mismatch.
  5. Processing time — AGB calibration is computationally intensive (Vertex AI notebook execution). Expect processing times of 30 minutes to several hours depending on the number of plots and monitoring period.
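
Several of these considerations can be checked locally before a long-running calibration is started. The sketch below is a hypothetical pre-flight helper, not part of the API: it flags a low plot count (using the "typically 20+" guidance above as a soft threshold), zero or missing heights, and rows that will fall back to the global wood density.

```python
MIN_PLOTS = 20  # soft threshold taken from the guidance above, not an API limit


def preflight(big_tree_rows, min_plots=MIN_PLOTS):
    """Return human-readable warnings for a list of big-tree rows (dicts of strings)."""
    issues = []

    plots = {row["plot_id"] for row in big_tree_rows}
    if len(plots) < min_plots:
        issues.append(f"only {len(plots)} plots; {min_plots}+ recommended")

    # Zero heights are treated as missing by the notebook
    zero_height = sum(1 for row in big_tree_rows
                      if float(row.get("height_m") or 0) <= 0)
    if zero_height:
        issues.append(f"{zero_height} rows have a zero or missing height")

    # Rows without their own wood_density use the global form parameter
    no_density = sum(1 for row in big_tree_rows if not row.get("wood_density"))
    if no_density:
        issues.append(f"{no_density} rows will use the global wood_density")

    return issues


rows = [
    {"plot_id": "P-001", "dbh_cm": "24.5", "height_m": "0", "wood_density": ""},
    {"plot_id": "P-002", "dbh_cm": "31.0", "height_m": "14.2", "wood_density": "0.61"},
]
issues = preflight(rows)
for issue in issues:
    print(issue)
```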