End-to-End GeoAI Workflow for Agricultural Field Boundary Delineation
Delineating agricultural field boundaries from satellite imagery is a common task in remote sensing, but adjacent fields belonging to the same crop type make it a challenging one. Semantic segmentation merges touching fields into a single polygon, which is rarely what you want. Instance segmentation solves this by assigning a unique ID to each individual field, even when they share a boundary.
In this tutorial, I walk through a complete workflow for field boundary delineation using the Fields of the World dataset and the GeoAI package. The workflow covers everything from downloading training data to batch inference, and it can be adapted to detect other object types (buildings, trees, etc.) as long as training data is available.
Video tutorial: End-to-End GeoAI Workflow for Field Boundary Delineation
Resources:
Fields of the World Web App (PMTiles interactive map)
Why Instance Segmentation?
Agricultural fields are often adjacent to each other and belong to the same class (e.g., “cropland”). If you use semantic segmentation, all touching fields merge into one large polygon. Instance segmentation distinguishes individual objects within the same class, producing separate polygons for each field even when they share a border.
This distinction matters for downstream analysis: calculating per-field statistics, tracking crop rotation, or estimating yields all require individual field polygons rather than a single merged region.
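As a toy illustration (plain Python, not part of the GeoAI API), compare the two mask types for two same-class fields that share a border:

```python
# Semantic mask: touching fields share class 1, so they read as one region.
semantic = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

# Instance mask: each field keeps a unique ID across the shared border.
instance = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
]

def count_objects(mask):
    """Count distinct nonzero labels in a mask."""
    return len({v for row in mask for v in row if v != 0})

count_objects(semantic)  # 1 merged polygon
count_objects(instance)  # 2 separate fields
```

Vectorizing the semantic mask yields a single polygon; vectorizing the instance mask yields one polygon per field, which is what per-field analysis needs.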
Exploring the Fields of the World Dataset
The Fields of the World dataset is an open-access benchmark covering 24 countries with over 70,000 samples. It provides Sentinel-2 imagery (two time steps to capture different phenological stages) along with instance segmentation masks. The dataset is hosted on Source Cooperative and organized by country.
To get a feel for the data, you can explore the PMTiles web app, which lets you swipe between imagery and field boundary polygons, click individual polygons to inspect their properties, and pan across all 24 countries.
Downloading and Preparing Training Data
The GeoAI package provides a single function to download data for any country in the dataset. For this tutorial, we use Luxembourg as a lightweight example:
```python
import geoai

geoai.download_ftw(country="luxembourg", output_dir="ftw_dataset")
```

The downloaded data includes Sentinel-2 imagery windows, instance segmentation masks, and a GeoParquet index file. You can visualize the spatial distribution of the training, validation, and test splits on an interactive map:
```python
geoai.view_vector_interactive(gdf)
```

To convert the raw data into image chips suitable for model training:
```python
geoai.prepare_ftw(input_dir="ftw_dataset/luxembourg", output_dir="field_boundary")
```

This produces a directory of image chips organized into training and testing sets, ready for the next step.
Training the Instance Segmentation Model
Training requires a single function call. You specify the input directories, number of classes (two: background and field), number of channels (four: RGB plus near-infrared), batch size, and number of epochs:
```python
geoai.train_instance_segmentation(
    images_dir="field_boundary/train/images",
    labels_dir="field_boundary/train/labels",
    output_dir="models",
    num_classes=2,
    num_channels=4,
    batch_size=4,
    num_epochs=20,
    val_ratio=0.2,
    instance_labels=True,
)
```

Training for 20 epochs on a single country takes around 10 to 15 minutes, depending on your GPU. The function saves two models: the best model (highest validation accuracy) and the final model (last epoch). You can inspect training performance with:
```python
geoai.plot_performance_metrics("models")
```

Look for decreasing training and validation loss and increasing IoU (Intersection over Union). With 20 epochs on the Luxembourg data, you can expect around 75% IoU.
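For reference, IoU is simply the overlap between the predicted and true masks divided by their combined coverage. A minimal plain-Python sketch for binary masks (illustrative only, not the metric implementation GeoAI uses):

```python
def iou(pred, target):
    """Intersection over Union for two same-shaped binary masks
    given as nested lists of 0/1 values."""
    inter = sum(p and t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p or t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return inter / union if union else 1.0

pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
target = [[1, 1, 1, 0],
          [1, 1, 0, 0]]
iou(pred, target)  # 4 overlapping pixels / 5 covered pixels = 0.8
```

An IoU of 0.75 therefore means three quarters of the combined predicted-plus-true field area is correctly overlapping.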
Running Inference
Apply the trained model to new imagery using the instance segmentation function:
```python
result = geoai.instance_segmentation(
    input_path="field_boundary/test/images/image_001.tif",
    output_path="prediction.tif",
    model_path="models/best_model.pth",
    confidence_threshold=0.5,
    class_names=["background", "field"],
    vectorize=True,
)
```

The confidence_threshold parameter controls how strict detection is: increase it to reduce false positives, decrease it to capture more objects. The output includes the instance mask (each field gets a unique ID), a confidence score raster, and vectorized polygons.
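To make that trade-off concrete, thresholding amounts to filtering per-instance confidence scores. The detection structure below is hypothetical, purely to illustrate the parameter's effect (it is not the function's actual return format):

```python
# Hypothetical per-instance scores, as produced alongside the instance mask.
detections = [
    {"id": 1, "score": 0.92},
    {"id": 2, "score": 0.61},
    {"id": 3, "score": 0.43},
]

def filter_detections(detections, confidence_threshold):
    """Keep only instances at or above the confidence threshold."""
    return [d for d in detections if d["score"] >= confidence_threshold]

len(filter_detections(detections, 0.5))  # 2 fields kept
len(filter_detections(detections, 0.8))  # 1 field kept: stricter, fewer false positives
```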
Post-Processing and Cleanup
The raw prediction may contain small artifacts and holes. The GeoAI package includes a cleanup function that removes small objects and fills holes below a specified pixel threshold:
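Conceptually, removing small objects is a connected-component size filter. Here is a plain-Python sketch over a nested-list mask (illustrative only; GeoAI's implementation operates on GeoTIFF rasters and may differ):

```python
def remove_small_objects(mask, min_object_size):
    """Zero out connected components (4-connectivity) smaller than
    min_object_size pixels. `mask` is a nested list of instance IDs."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill the component containing (r, c).
            label = mask[r][c]
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) < min_object_size:
                for y, x in component:
                    mask[y][x] = 0
    return mask

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]
cleaned = remove_small_objects(mask, min_object_size=2)
# The single-pixel object (ID 2) is removed; the 4-pixel field survives.
```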
```python
geoai.clean_instance_segmentation(
    input_path="prediction.tif",
    output_path="prediction_clean.tif",
    min_object_size=100,
    max_hole_area=100,
)
```

After cleanup, you can convert the raster to vector polygons and add geometry properties (area, perimeter, elongation) for further analysis:
```python
geoai.add_geometry_properties(gdf)
```

This lets you quickly summarize field statistics: total count, median area, size distribution, and shape characteristics. You can also visualize fields colored by any property (area, elongation, confidence score) on an interactive map.
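These are standard polygon measures. A self-contained sketch for a single ring (the elongation here is a simple bounding-box ratio, which need not match GeoAI's definition):

```python
import math

def polygon_properties(coords):
    """Shoelace area, perimeter, and a bounding-box elongation ratio
    for a closed ring given as [(x, y), ...] (first point not repeated)."""
    n = len(coords)
    area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        area += x1 * y2 - x2 * y1       # shoelace cross-product term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    elongation = max(w, h) / min(w, h) if min(w, h) else float("inf")
    return {"area": abs(area) / 2, "perimeter": perimeter, "elongation": elongation}

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
polygon_properties(square)  # {'area': 16.0, 'perimeter': 16.0, 'elongation': 1.0}
```

In practice the coordinates should be in a projected CRS so that areas and lengths come out in meters rather than degrees.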
Batch Processing
For larger areas with multiple image chips, use the batch function instead of processing files one by one:
```python
geoai.instance_segmentation_batch(
    input_dir="field_boundary/test/images",
    output_dir="predictions",
    model_path="models/best_model.pth",
    confidence_threshold=0.5,
    class_names=["background", "field"],
    vectorize=True,
)
```

This processes every image in the input directory and writes the corresponding outputs, making it straightforward to scale the workflow to large study areas.
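The batch call is equivalent to looping over the chips yourself. A sketch of the path bookkeeping involved (the helper name is hypothetical; the batch function handles this internally):

```python
from pathlib import Path

def plan_batch_outputs(image_paths, output_dir):
    """Pair each input chip with an output path of the same name
    under output_dir (illustrative helper, not part of GeoAI)."""
    out = Path(output_dir)
    return [(Path(p), out / Path(p).name) for p in sorted(image_paths)]

chips = [
    "field_boundary/test/images/image_002.tif",
    "field_boundary/test/images/image_001.tif",
]
pairs = plan_batch_outputs(chips, "predictions")
# pairs[0] maps image_001.tif to predictions/image_001.tif
```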
Adapting to Other Object Types
This workflow is not limited to field boundaries. The same pipeline works for any object type where you have labeled training data: buildings, trees, solar panels, water bodies, and more. The most time-consuming part is typically creating the benchmark dataset. In this case, the Fields of the World dataset provides that foundation, so you can focus on the modeling rather than the labeling.
To get started, check out the full notebook or run it directly in Google Colab. If you run into issues, feel free to report them on the GeoAI GitHub repository.