AGB Calibration

The `/calibrate_agb` endpoint lets users calibrate the API's above-ground biomass (AGB) model using field-collected tree inventory data. This produces calibrated biomass estimates that account for site-specific allometric relationships, improving accuracy for carbon accounting use cases that require field-validated results.

Overview

Epoch's default AGB model is trained on GEDI LiDAR L4A footprints and produces globally consistent predictions. However, for high-confidence carbon accounting (e.g. insetting, voluntary carbon credits, verified removals), field-collected data is needed to calibrate the model to local conditions — species composition, wood density, allometric equations, and stand structure.

The `/calibrate_agb` endpoint accepts tree inventory data (a big trees CSV and a small trees CSV) alongside plot coordinates, and triggers a calibration pipeline that:

1. Uploads the field data to cloud storage
2. Runs the AGB calibration notebook on Vertex AI, which fits a site-specific allometric model
3. Produces calibrated biomass estimates for each plot
4. Stores the results in `*_calibrated` tables for retrieval via the Fetch API

Parameters

Parameter	Type	Required	Default	Description
`filename`	string	Yes	-	Collection name to assign to the calibrated results
`big_trees`	File	Yes	-	CSV of large tree measurements (DBH, height, species)
`small_trees`	File	Yes	-	CSV of small tree/sapling measurements
`plot_coordinates`	File	Yes	-	CSV or GeoJSON of field plot coordinates
`table_type`	string	No	`batch_calibrated`	Output table type(s). Options: `batch_calibrated`, `stat_calibrated`, or both (comma-separated)
`start_date`	string	No	`2017-01-01`	Start date for zonal statistics (YYYY-MM-DD)
`end_date`	string	No	today	End date for zonal statistics (YYYY-MM-DD)
`wood_density`	float	No	`0.55`	Wood density (g/cm³) used in the AGB allometry. A `wood_density` column in the big trees CSV overrides it for that row.
`buffer_size`	int	No	`100`	Buffer size (meters) for `batch_buffer` table type
`stat_region`	string	No	`watershed`	Region for statistical comparison: `watershed`, `admin`, or `supply_shed`
`ignore_strata`	string	No	`[]`	Strata numbers to exclude from removals accounting

Table Types

Table Type	Description
`batch_calibrated`	Plot-level calibrated results (default)
`stat_calibrated`	Statistical region-level calibrated results
`batch`	Standard (non-calibrated) batch results
`stat`	Standard statistical region results
`batch_buffer`	Buffered plot results (uses `buffer_size`)

Use `table_type=batch_calibrated,stat_calibrated` to generate both plot-level and regional calibrated results.
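As a rough illustration of how the parameters above fit together, the sketch below assembles the non-file form fields and checks them before anything is sent. The helper names (`build_calibration_form`, `post_calibration`), the `base_url` argument, and the choice of the third-party `requests` package are assumptions made for this example, not part of the API. Authentication is omitted because this section does not describe it.

```python
from contextlib import ExitStack
from datetime import date, datetime

# Values taken from the Parameters and Table Types tables in this section.
KNOWN_TABLE_TYPES = {"batch_calibrated", "stat_calibrated", "batch", "stat", "batch_buffer"}
KNOWN_STAT_REGIONS = {"watershed", "admin", "supply_shed"}


def _parse_day(text):
    # strptime raises ValueError when the text is not a year-month-day date.
    return datetime.strptime(text, "%Y-%m-%d").date()


def build_calibration_form(filename, table_types=("batch_calibrated",),
                           start_date="2017-01-01", end_date=None,
                           wood_density=0.55, buffer_size=100,
                           stat_region="watershed"):
    """Return the non-file form fields for a /calibrate_agb request (hypothetical helper)."""
    unknown = set(table_types) - KNOWN_TABLE_TYPES
    if unknown:
        raise ValueError(f"unknown table_type(s): {sorted(unknown)}")
    if stat_region not in KNOWN_STAT_REGIONS:
        raise ValueError(f"unknown stat_region: {stat_region}")
    start = _parse_day(start_date)
    end = _parse_day(end_date) if end_date else date.today()
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return {
        "filename": filename,
        "table_type": ",".join(table_types),  # comma-separated, as documented
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "wood_density": str(wood_density),
        "buffer_size": str(buffer_size),
        "stat_region": stat_region,
    }


def post_calibration(base_url, form, big_trees, small_trees, plot_coordinates):
    """Send the multipart request. base_url is a placeholder; add auth as your deployment requires."""
    import requests  # third-party, imported lazily so the builder works without it

    paths = {"big_trees": big_trees, "small_trees": small_trees,
             "plot_coordinates": plot_coordinates}
    with ExitStack() as stack:
        files = {name: stack.enter_context(open(path, "rb"))
                 for name, path in paths.items()}
        return requests.post(f"{base_url}/calibrate_agb", data=form, files=files)
```

Validating locally is a design choice for the sketch: a calibration run can take a long time, so catching a misspelled table type before the upload is cheaper than discovering it afterwards.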

Input File Formats

All three input files join on `plot_id`. The calibration notebook accepts either the canonical column names below or the ODK-export-style aliases (shown in parentheses), so field crews can deliver their tablet export unchanged.

Big Trees CSV

One row per individually measured tree at or above the small-tree threshold (typically ≥ 10 cm DBH).

Column	Required	Description
`plot_id`	Yes	Plot identifier. Must equal the `plot_uid` of a sample point returned by `/sampling_design` (see the samples GeoJSON) so the calibration joins back to the sampled plot. `unique_plot_id` is accepted as a legacy alias.
`dbh_cm` (or `data:circumference (in centimeters)`)	Yes	Diameter at breast height in cm. The ODK alias records circumference, so when it is used the notebook divides the value by π to obtain DBH.
`height_m` (or `data:tree height (in meters)`)	Yes	Total tree height in m. Zeros are treated as missing.
`subplot_location`	No	Sub-plot label inside the main plot; included in the row id when present.
`count_extent`	No	`full_plot` (default) or `sub-plot`. Tells the notebook whether the count applies to the whole plot or a sub-plot.
`subplot_size_m`	No	Side length of the sub-plot the row was counted in (e.g. `5` for a 5 × 5 m sub-plot). Read from the concatenated trees DataFrame, so the column lives on the tree rows. Used to extrapolate sub-plot tree counts to the main plot area; without it the notebook uses the hardcoded 9.0 fallback extrapolation factor.
`species`	No	Free-text species name.
`wood_density`	No	Species-specific wood density (g/cm³). When present, overrides the global `wood_density` form parameter for that row.
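The column rules above can be sketched as a small row normaliser. This is an illustration of the documented behaviour (alias handling, circumference divided by π, zero heights treated as missing), not the notebook's actual code; the function name `normalise_big_tree` is hypothetical.

```python
import math

CIRCUMFERENCE_ALIAS = "data:circumference (in centimeters)"
HEIGHT_ALIAS = "data:tree height (in meters)"


def normalise_big_tree(row):
    """Map one big-trees CSV row (a dict of strings) to canonical fields."""
    plot_id = row.get("plot_id") or row.get("unique_plot_id")  # legacy alias
    if not plot_id:
        raise ValueError("row has neither plot_id nor unique_plot_id")

    if row.get("dbh_cm"):
        dbh_cm = float(row["dbh_cm"])
    else:
        # The ODK alias holds circumference: diameter = circumference / pi.
        dbh_cm = float(row[CIRCUMFERENCE_ALIAS]) / math.pi

    height_m = float(row.get("height_m") or row.get(HEIGHT_ALIAS) or 0)
    return {
        "plot_id": plot_id,
        "dbh_cm": dbh_cm,
        "height_m": height_m if height_m > 0 else None,  # zeros count as missing
    }
```

For example, an ODK row with a circumference of 94.2478 cm normalises to a DBH of about 30 cm.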

Small Trees CSV

One row per sub-plot count of trees below the big-tree threshold (typically < 10 cm DBH). Small trees are usually counted per sub-plot rather than measured individually.

Column	Required	Description
`plot_id`	Yes	Plot identifier (same join key as big trees). `unique_plot_id` accepted as legacy alias.
`avg_dbh_cm` (or `data:average circumference (in centimeters)`)	Yes	Average DBH of small trees in the sub-plot (cm).
`avg_height_m` (or `data:average tree height (in meters)`)	Yes	Average height of small trees in the sub-plot (m).
`tree_count` (or `tree_count_lt_10cm` or `data:Tree Count (< 10 cm DBH)`)	Yes	Count of small trees in the sub-plot.
`count_extent` (or `data:Count Extent`)	No	Defaults to `sub-plot`.
`subplot_size_m`	No	Side length of the sub-plot the count was taken in (e.g. `5` for a 5 × 5 m sub-plot). Read from the concatenated trees DataFrame; needed for accurate per-ha extrapolation. Without it the notebook falls back to a hardcoded 9.0 extrapolation factor.
`species`	No	Free-text species name.
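To make the role of `subplot_size_m` concrete, here is a minimal sketch of scaling a sub-plot count up to the main plot. It assumes the scaling is a plain area ratio between two square plots, which this section does not state explicitly; only the 9.0 fallback comes from the column descriptions. The function name is hypothetical.

```python
FALLBACK_FACTOR = 9.0  # hardcoded fallback named in the column descriptions


def subplot_scale_factor(plot_size_m, subplot_size_m=None):
    """Factor that scales a sub-plot tree count to the main plot area.

    Assumption: both plots are squares, so the factor is the area ratio.
    """
    if subplot_size_m is None:
        return FALLBACK_FACTOR
    if subplot_size_m <= 0 or subplot_size_m > plot_size_m:
        raise ValueError("subplot_size_m must be positive and no larger than the main plot")
    return (plot_size_m / subplot_size_m) ** 2


# 12 small trees counted in a 5 x 5 m sub-plot of a 35 x 35 m main plot.
trees_in_main_plot = 12 * subplot_scale_factor(35.0, 5.0)
```

Under that assumption a 5 m sub-plot in a 35 m plot gives a factor of 49, far from the 9.0 fallback, which is why omitting the column can distort small-tree counts substantially.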

Plot Coordinates CSV

One row per sample plot. The minimal required schema is the plot identifier plus its coordinates:

Column	Required	Description
`plot_id`	Yes	Plot identifier. Must match the `plot_uid` returned by `/sampling_design` in the samples GeoJSON. `unique_plot_id` accepted as legacy alias.
`latitude` / `longitude`	Yes	Plot centre coordinates in WGS84 (EPSG:4326).

Optional columns the notebook will pick up if present:

Column	Required	Description
`plot_size_m`	No	Side length of the main square measurement plot in metres (e.g. `35` for a 35 × 35 m plot). Today's deliveries typically omit this and the notebook falls back to `DEFAULT_PLOT_SIZE_M = 30.0` m, which makes per-hectare extrapolation less accurate. Populate this column if your plots aren't 30 × 30 m.

Note: `subplot_size_m` goes on the tree CSVs (big trees / small trees), not on plot coordinates. The notebook reads it from the concatenated trees DataFrame, so a `subplot_size_m` column added to `plot_coordinates.csv` is silently ignored.
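Because a misplaced `subplot_size_m` column is ignored without any error, a local check before upload can help. The following is a sketch of such a check using only the standard library; the function name and the wording of the messages are made up for this example, and it covers the CSV form of the file only.

```python
import csv
import io


def check_plot_coordinates(csv_text):
    """Return a list of human-readable problems found in a plot coordinates CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    problems = []

    if "subplot_size_m" in columns:
        problems.append("subplot_size_m belongs on the tree CSVs; it is ignored here")
    if not ({"plot_id", "unique_plot_id"} & columns):
        problems.append("missing plot_id column")
    if "latitude" not in columns or "longitude" not in columns:
        problems.append("missing latitude/longitude columns")
        return problems  # cannot check coordinate values without the columns

    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: coordinates are not numeric")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: coordinates outside WGS84 range")
    return problems
```

An empty list means none of these particular issues were found; it does not guarantee that the notebook will accept the file.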

Known gap: there is no endpoint-level form parameter for plot size today, even though plot size is typically a per-deployment constant set by the methodology (e.g. 35 × 35 m for cashew per TSC-BioCR), not something that varies between plots. Until a `default_plot_size_m` form parameter lands, set the size per row via `plot_size_m` (every row to the same value) whenever the methodology size differs from the 30 m default.

Joining to a sampling design

The `plot_id` column is the same identifier as the `plot_uid` column emitted by `/sampling_design` in the samples table / samples GeoJSON deliverable. Field crews visit the points listed in that file, run their inventory protocol per the TSC-BioCR white paper, and return the three CSVs with `plot_id` equal to the `plot_uid` of the sample they measured. The calibration then joins the field measurements back to the stratified sampling design automatically.

For backward compatibility, the notebook still accepts `unique_plot_id` when `plot_id` is absent.
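Since the join depends on identifiers matching exactly, it can be worth confirming before upload that every delivered `plot_id` exists in the sampling design. The sketch below assumes that `plot_uid` is stored under each feature's `properties` in the samples GeoJSON, which is the usual GeoJSON layout but is not confirmed by this section; the function name is hypothetical.

```python
def unmatched_plot_ids(samples_geojson, tree_rows):
    """Return delivered plot ids that are absent from the sampling design."""
    # Assumption: plot_uid lives in each feature's "properties" object.
    sampled = {str(feature["properties"]["plot_uid"])
               for feature in samples_geojson["features"]}

    delivered = set()
    for row in tree_rows:
        plot_id = row.get("plot_id") or row.get("unique_plot_id")  # legacy alias
        if not plot_id:
            raise ValueError(f"row without a plot identifier: {row}")
        delivered.add(str(plot_id))

    return sorted(delivered - sampled)


samples = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"plot_uid": "A-001"}},
        {"type": "Feature", "geometry": None, "properties": {"plot_uid": "A-002"}},
    ],
}
rows = [{"plot_id": "A-001"}, {"unique_plot_id": "A-002"}, {"plot_id": "A-099"}]
missing = unmatched_plot_ids(samples, rows)
```

In this made-up data, `A-099` is the only identifier that would fail to join.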

"Plot" means two different things in this pipeline

Two distinct plot concepts coexist and they are not interchangeable:

Concept	What it is	Where it lives
Sampling-design plot	The land-tenure or intervention-area polygon that the sampling design picked a sample point inside (a farmer's parcel, a cashew block, an intervention sub-area). Variable size, often less than 1 ha.	`fenapab_samples.geojson` -> `plot_uid` field. Joins to the polygon in the `_plots` table for the collection.
Field-measurement plot	The square measurement plot the field crew lays out at the sample point (typically 30 × 30 m or 35 × 35 m per TSC-BioCR). Fixed size by methodology, not by parcel boundary.	`plot_coordinates.csv` -> `plot_size_m` column.

The crew does not measure the entire sampling-design polygon, because that would defeat the point of stratified sampling. Instead, they lay a fixed square at each sample point, measure every big tree inside it, and count small trees in a nested sub-plot (`subplot_size_m`). The calibration then extrapolates the measurements to t/ha using the recorded `plot_size_m`. Use the main-plot dimension that the TSC-BioCR white paper specifies for the commodity (typically a 35 × 35 m main plot with a 5 × 5 m sub-plot for small trees) and apply it consistently across every sample site. Deviate only when terrain forces a smaller main plot, and record the actual side length on that row.

When `plot_size_m` is omitted entirely, the notebook falls back to 30 × 30 m (`DEFAULT_PLOT_SIZE_M = 30.0`) and emits a warning. The fallback exists for legacy callers; new deliveries should always populate the column explicitly.
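The effect of the fallback can be worked through with simple area arithmetic. The sketch assumes the per-hectare expansion is 10,000 m² divided by the area of a square plot, which is the standard conversion but is not spelled out in this section; the function names are hypothetical, and only `DEFAULT_PLOT_SIZE_M` comes from the text.

```python
DEFAULT_PLOT_SIZE_M = 30.0   # fallback side length named in this section
M2_PER_HA = 10_000.0


def per_hectare_factor(plot_size_m=None):
    """Expansion factor from one square plot to one hectare."""
    side = DEFAULT_PLOT_SIZE_M if plot_size_m is None else float(plot_size_m)
    if side <= 0:
        raise ValueError("plot_size_m must be positive")
    return M2_PER_HA / side ** 2


def agb_t_per_ha(plot_agb_kg, plot_size_m=None):
    """Convert the summed AGB of a plot (kg) to tonnes per hectare."""
    return plot_agb_kg / 1000.0 * per_hectare_factor(plot_size_m)


# A 35 x 35 m plot processed with the 30 m fallback is scaled by the wrong area.
overestimate = per_hectare_factor() / per_hectare_factor(35.0)  # (35 / 30) ** 2
```

Under this assumption, a 35 × 35 m plot delivered without `plot_size_m` would have its per-hectare biomass overstated by a factor of (35/30)², roughly 1.36.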

How It Works

1. Field data files are uploaded to Google Cloud Storage under `field_data/{user_id}/{collection_id}/`
2. A Vertex AI notebook execution is triggered with the calibration parameters
3. The notebook fits an allometric model using the field measurements and the satellite-derived covariates
4. Calibrated AGB estimates are produced for each plot and (optionally) aggregated to statistical regions
5. Results are stored in BigQuery `*_calibrated` tables

The calibration uses the same satellite covariates as the default model (Dynamic World, Sentinel-2, Sentinel-1) but fits site-specific coefficients to the field data, accounting for local wood density, species composition, and stand structure.

Retrieving Results

Calibrated results are available through the same Fetch API endpoints as standard results, using the same collection name:

- `/fetch_biomass_emissions`: returns calibrated AGB and emissions when calibrated tables exist
- `/fetch_plots`: returns plot-level calibrated data
- `/fetch_stat_regions`: returns statistical region calibrated data (when `stat_calibrated` was computed)

The response format is identical to non-calibrated results, with the calibrated values replacing the default model predictions.

Key Considerations

- Field data quality is critical: the calibration is only as good as the input measurements. Ensure DBH and height measurements follow standardised forest inventory protocols.
- Minimum sample size: a sufficient number of field plots (typically 20+) is needed for meaningful calibration. Small sample sizes may produce overfitted models.
- Species-specific wood density: providing species-level wood density values (rather than using the global default) significantly improves calibration accuracy.
- Temporal alignment: field measurements should be temporally aligned with the satellite imagery used for prediction. Calibrating with 2019 field data against 2024 satellite imagery introduces a temporal mismatch.
- Processing time: AGB calibration is computationally intensive (Vertex AI notebook execution). Expect processing times of 30 minutes to several hours, depending on the number of plots and the monitoring period.
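To give a sense of why the wood density consideration above matters, the sketch below uses the pantropical allometric equation of Chave et al. (2014), AGB = 0.0673 × (ρ D² H)^0.976, with ρ in g/cm³, D in cm, H in m and AGB in kg. This equation is brought in purely for illustration; this section does not say which allometry the calibration notebook applies, and the species value of 0.70 g/cm³ is a made-up example.

```python
def chave2014_agb_kg(dbh_cm, height_m, wood_density=0.55):
    """Pantropical AGB estimate in kg (Chave et al. 2014); illustrative only."""
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


# Same tree (30 cm DBH, 20 m tall) under two wood density assumptions.
default_agb = chave2014_agb_kg(30.0, 20.0)        # global default of 0.55
dense_agb = chave2014_agb_kg(30.0, 20.0, 0.70)    # hypothetical species value
relative_change = dense_agb / default_agb - 1.0

print(f"default: {default_agb:.0f} kg, species-specific: {dense_agb:.0f} kg")
print(f"relative change: {relative_change:.1%}")
```

With this equation the estimate scales almost linearly with wood density, so a dense-wooded species entered at the 0.55 default is underestimated by roughly a quarter in this example.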