torchgeo.datasets<a class="headerlink" href="#module-torchgeo.datasets" title="Permalink to this heading">¶

AgriFieldNet¶

class torchgeo.datasets.AgriFieldNet(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 36], bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, cache=True, download=False)[source]¶

Bases: RasterDataset

AgriFieldNet India Challenge dataset.

The AgriFieldNet India Challenge dataset includes satellite imagery from Sentinel-2 cloud free composites (single snapshot) and labels for crop type that were collected by ground survey. The Sentinel-2 data are then matched with corresponding labels. The dataset contains 7081 fields, which have been split into training and test sets (5551 fields in the train and 1530 fields in the test). Satellite imagery and labels are tiled into 256x256 chips adding up to 1217 tiles. The fields are distributed across all chips, some chips may only have train or test fields and some may have both. Since the labels are derived from data collected on the ground, not all the pixels are labeled in each chip. If the field ID for a pixel is set to 0 it means that pixel is not included in either of the train or test set (and correspondingly the crop label will be 0 as well). For this challenge train and test sets have slightly different crop type distributions. The train set follows the distribution of ground reference data which is a skewed distribution with a few dominant crops being over represented. The test set was drawn randomly from an area weighted field list that ensured that fields with less common crop types were better represented in the test set. The original dataset can be downloaded from Source Cooperative.

Dataset format:

images are 12-band Sentinel-2 data
masks are tiff images with unique values representing the class and field id

Dataset classes:

1. No-Data
1. Wheat
1. Mustard
1. Lentil
1. No Crop/Fallow
1. Green pea
1. Sugarcane
1. Garlic
1. Maize
1. Gram
1. Coriander
1. Potato
1. Berseem
1. Rice

If you use this dataset in your research, please cite the following dataset:

https://doi.org/10.34911/rdnt.wu92p1

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.6.

filename_regex = '\n ^ref_agrifieldnet_competition_v1_source_\n (?P<unique_folder_id>[a-z0-9]{5})\n _(?P<band>B[0-9A-Z]{2})_10m\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

rgb_bands: tuple[str, ...] = ('B04', 'B03', 'B02')¶: Names of RGB bands in the dataset, used for plotting

all_bands: tuple[str, ...] = ('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12')¶: Names of all available bands in the dataset

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 8: (111, 166, 0, 255), 9: (0, 175, 73, 255), 13: (222, 166, 9, 255), 14: (222, 166, 9, 255), 15: (124, 211, 255, 255), 16: (226, 0, 124, 255), 36: (137, 96, 83, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 36], bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, cache=True, download=False)[source]¶

Initialize a new AgriFieldNet dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search for files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
cache (bool) – if True, cache the dataset in memory
download (bool) – if True, download dataset and store it in the root directory

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

filename_glob = 'ref_agrifieldnet_competition_v1_source_*_{}_10m.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Airphen¶

class torchgeo.datasets.Airphen(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

Airphen dataset.

Airphen is a multispectral scientific camera developed by agronomists and photonics engineers at Hiphen to match plant measurements needs and constraints.

Main characteristics:

6 Synchronized global shutter sensors
Sensor resolution 1280 x 960 pixels
Data format (.tiff, 12 bit)
SD card storage
Metadata information: Exif and XMP
Internal or external GPS
Synchronization with different sensors (TIR, RGB, others)

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.34133/2021/9892647

New in version 0.6.

all_bands: tuple[str, ...] = ('B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8')¶: Names of all available bands in the dataset

rgb_bands: tuple[str, ...] = ('B4', 'B3', 'B1')¶: Names of RGB bands in the dataset, used for plotting

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Aster Global DEM¶

class torchgeo.datasets.AsterGDEM(paths='data', crs=None, res=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

Aster Global Digital Elevation Model Dataset.

The Aster Global Digital Elevation Model dataset is a Digital Elevation Model (DEM) on a global scale. The dataset can be downloaded from the Earth Data website after making an account.

Dataset features:

DEMs at 30 m per pixel spatial resolution (3601x3601 px)
data collected from the Aster instrument

Dataset format:

DEMs are single-channel tif files

New in version 0.3.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

filename_glob = 'ASTGTMV003_*_dem*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n (?P<name>[ASTGTMV003]{10})\n _(?P<id>[A-Z0-9]{7})\n _(?P<data>[a-z]{3})*\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

__init__(paths='data', crs=None, res=None, transforms=None, cache=True)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | list[str | os.PathLike[str]]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Canadian Building Footprints¶

class torchgeo.datasets.CanadianBuildingFootprints(paths='data', crs=None, res=(1e-05, 1e-05), transforms=None, download=False, checksum=False)[source]¶

Bases: VectorDataset

Canadian Building Footprints dataset.

The Canadian Building Footprints dataset contains 11,842,186 computer generated building footprints in all Canadian provinces and territories in GeoJSON format. This data is freely available for download and use.

__init__(paths='data', crs=None, res=(1e-05, 1e-05), transforms=None, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution.
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by VectorDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Chesapeake Bay Land Use and Land Cover (LULC) Database 2022 Edition

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.

Chesapeake Land Cover¶

class torchgeo.datasets.Chesapeake(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset, ABC

Abstract base class for all Chesapeake datasets.

The Chesapeake Bay Land Use and Land Cover Database (LULC) facilitates characterization of the landscape and land change for and between discrete time periods. The database was developed by the University of Vermont’s Spatial Analysis Laboratory in cooperation with Chesapeake Conservancy (CC) and U.S. Geological Survey (USGS) as part of a 6-year Cooperative Agreement between Chesapeake Conservancy and the U.S. Environmental Protection Agency (EPA) and a separate Interagency Agreement between the USGS and EPA to provide geospatial support to the Chesapeake Bay Program Office.

The database contains one-meter 13-class Land Cover (LC) and 54-class Land Use/Land Cover (LULC) for all counties within or adjacent to the Chesapeake Bay watershed for 2013/14 and 2017/18, depending on availability of National Agricultural Imagery Program (NAIP) imagery for each state. Additionally, 54 LULC classes are generalized into 18 LULC classes for ease of visualization and communication of LULC trends. LC change between discrete time periods, detected by spectral changes in NAIP imagery and LiDAR, represents changes between the 12 land cover classes. LULC change uses LC change to identify where changes are happening and then LC is translated to LULC to represent transitions between the 54 LULC classes. The LULCC data is represented as a LULC class change transition matrix which provides users acres of change between multiple classes. It is organized by 18x18 and 54x54 LULC classes. The Chesapeake Bay Water (CBW) indicates raster tabulations were performed for only areas that fall inside the CBW boundary e.g., if user is interested in CBW portion of a county then they will use LULC Matrix CBW. Conversely, if they are interested change transitions across the entire county, they will use LULC Matrix.

If you use this dataset in your research, please cite the following:

https://doi.org/10.5066/P981GV1L

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

abstract property md5s: dict[int, str]¶: Mapping between data year and zip file MD5.

property state: str¶: State abbreviation.

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {11: (0, 92, 230, 255), 12: (0, 92, 230, 255), 13: (0, 92, 230, 255), 14: (0, 92, 230, 255), 15: (0, 92, 230, 255), 21: (0, 0, 0, 255), 22: (235, 6, 2, 255), 23: (89, 89, 89, 255), 24: (138, 138, 136, 255), 25: (138, 138, 136, 255), 26: (138, 138, 136, 255), 27: (115, 115, 0, 255), 28: (233, 255, 190, 255), 29: (255, 255, 115, 255), 41: (38, 115, 0, 255), 42: (56, 168, 0, 255), 51: (255, 255, 115, 255), 52: (255, 255, 115, 255), 53: (255, 255, 115, 255), 54: (170, 255, 0, 255), 55: (170, 255, 0, 255), 56: (170, 255, 0, 255), 62: (77, 209, 148, 255), 63: (77, 209, 148, 255), 64: (56, 168, 0, 255), 65: (38, 115, 0, 255), 72: (186, 245, 217, 255), 73: (186, 245, 217, 255), 74: (56, 168, 0, 255), 75: (38, 115, 0, 255), 83: (255, 211, 127, 255), 84: (255, 211, 127, 255), 85: (255, 211, 127, 255), 91: (0, 168, 132, 255), 92: (0, 168, 132, 255), 93: (0, 168, 132, 255), 94: (56, 168, 0, 255), 95: (38, 115, 0, 255), 127: (255, 255, 255, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Chesapeake instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

filename_glob = '{state}_lulc_*_2022-Edition.tif'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '^{state}_lulc_(?P<date>\\d{{4}})_2022-Edition\\.tif$'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

class torchgeo.datasets.ChesapeakeDC(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for Washington, D.C.

class torchgeo.datasets.ChesapeakeDE(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for Delaware.

class torchgeo.datasets.ChesapeakeMD(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for Maryland.

class torchgeo.datasets.ChesapeakeNY(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for New York.

class torchgeo.datasets.ChesapeakePA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for Pennsylvania.

class torchgeo.datasets.ChesapeakeVA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for Virginia.

class torchgeo.datasets.ChesapeakeWV(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: Chesapeake

This subset of the dataset contains data only for West Virginia.

class torchgeo.datasets.ChesapeakeCVPR(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: GeoDataset

CVPR 2019 Chesapeake Land Cover dataset.

The CVPR 2019 Chesapeake Land Cover dataset contains two layers of NAIP aerial imagery, Landsat 8 leaf-on and leaf-off imagery, Chesapeake Bay land cover labels, NLCD land cover labels, and Microsoft building footprint labels.

This dataset was organized to accompany the 2019 CVPR paper, “Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data”.

The paper “Resolving label uncertainty with implicit generative models” added an additional layer of data to this dataset containing a prior over the Chesapeake Bay land cover classes generated from the NLCD land cover labels. For more information about this layer see the dataset documentation.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/cvpr.2019.01301

__init__(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (Sequence[str]) – a list containing a subset of “naip-new”, “naip-old”, “lc”, “nlcd”, “landsat-leaf-on”, “landsat-leaf-off”, “buildings”, or “prior_from_cooccurrences_101_31_no_osm_no_buildings” indicating which layers to load
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if splits or layers are not valid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.4.

GlobalBuildingMap¶

class torchgeo.datasets.GlobalBuildingMap(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

GlobalBuildingMap dataset.

The GlobalBuildingMap (GBM) dataset provides the highest resolution and highest accuracy building footprint map on a global scale ever created. GBM was generated by training and applying modern deep neural networks on nearly 800,000 satellite images. The dataset is stored in 5 by 5 degree tiles in geotiff format.

The GlobalBuildingMap is generated by applying an ensemble of deep neural networks on nearly 800,000 satellite images of about 3m resolution. The deep neural networks were trained with manually inspected training samples generated from OpenStreetMap.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2404.13911

New in version 0.7.

filename_glob = 'GBM_v1_*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – A sample returned by RasterDataset.__getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Return type:

Global Mangrove Distribution¶

class torchgeo.datasets.CMSGlobalMangroveCanopy(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]¶

Bases: RasterDataset

CMS Global Mangrove Canopy dataset.

The CMS Global Mangrove Canopy dataset consists of a single band map at 30m resolution of either aboveground biomass (agb), basal area weighted height (hba95), or maximum canopy height (hmax95).

The dataset needs to be manually downloaded from the above link, where you can make an account and subsequently download the dataset.

New in version 0.3.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

filename_regex = '^\n (?P<mangrove>[A-Za-z]{8})\n _(?P<variable>[a-z0-9]*)\n _(?P<country>[A-Za-z][^.]*)\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

__init__(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | list[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
measurement (str) – which of the three measurements, ‘agb’, ‘hba95’, or ‘hmax95’
country (str) – country for which to retrieve data
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if country or measurement arg are not str or invalid
DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Cropland Data Layer¶

class torchgeo.datasets.CDL(paths='data', crs=None, res=None, years=[2023], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset

Cropland Data Layer (CDL) dataset.

The Cropland Data Layer, hosted on CropScape, provides a raster, geo-referenced, crop-specific land cover map for the continental United States. The CDL also includes a crop mask layer and planting frequency layers, as well as boundary, water and road layers. The Boundary Layer options provided are County, Agricultural Statistics Districts (ASD), State, and Region. The data is created annually using moderate resolution satellite imagery and extensive agricultural ground truth.

The dataset contains 134 classes, for a description of the classes see the xls file at the top of this page.

If you use this dataset in your research, please cite it using the following format:

https://www.nass.usda.gov/Research_and_Science/Cropland/sarsfaqs2.php#what.1

filename_glob = '*_30m_cdls.tif'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^(?P<date>\\d+)\n _30m_cdls\\..*$\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 10: (111, 166, 0, 255), 11: (0, 175, 73, 255), 12: (222, 166, 9, 255), 13: (222, 166, 9, 255), 14: (124, 211, 255, 255), 21: (226, 0, 124, 255), 22: (137, 96, 83, 255), 23: (217, 181, 107, 255), 24: (166, 111, 0, 255), 25: (213, 158, 188, 255), 26: (111, 111, 0, 255), 27: (171, 0, 124, 255), 28: (160, 88, 137, 255), 29: (111, 0, 73, 255), 30: (213, 158, 188, 255), 31: (209, 255, 0, 255), 32: (124, 153, 255, 255), 33: (213, 213, 0, 255), 34: (209, 255, 0, 255), 35: (0, 175, 73, 255), 36: (255, 166, 226, 255), 37: (166, 241, 139, 255), 38: (0, 175, 73, 255), 39: (213, 158, 188, 255), 41: (168, 0, 226, 255), 42: (166, 0, 0, 255), 43: (111, 37, 0, 255), 44: (0, 175, 73, 255), 45: (175, 124, 255, 255), 46: (111, 37, 0, 255), 47: (255, 102, 102, 255), 48: (255, 102, 102, 255), 49: (255, 204, 102, 255), 50: (255, 102, 102, 255), 51: (0, 175, 73, 255), 52: (0, 222, 175, 255), 53: (83, 255, 0, 255), 54: (241, 162, 120, 255), 55: (255, 102, 102, 255), 56: (0, 175, 73, 255), 57: (124, 211, 255, 255), 58: (232, 190, 255, 255), 59: (175, 255, 222, 255), 60: (0, 175, 73, 255), 61: (190, 190, 120, 255), 63: (147, 204, 147, 255), 64: (198, 213, 158, 255), 65: (204, 190, 162, 255), 66: (255, 0, 255, 255), 67: (255, 143, 171, 255), 68: (185, 0, 79, 255), 69: (111, 69, 137, 255), 70: (0, 120, 120, 255), 71: (175, 153, 111, 255), 72: (255, 255, 124, 255), 74: (181, 111, 92, 255), 75: (0, 166, 130, 255), 76: (232, 213, 175, 255), 77: (175, 153, 111, 255), 81: (241, 241, 241, 255), 82: (153, 153, 153, 255), 83: (73, 111, 162, 255), 87: (124, 175, 175, 255), 88: (232, 255, 190, 255), 92: (0, 255, 255, 255), 111: (73, 111, 162, 255), 112: (211, 226, 249, 255), 121: (153, 153, 153, 255), 122: (153, 153, 153, 255), 123: (153, 153, 153, 255), 124: (153, 153, 153, 255), 131: (204, 190, 162, 255), 141: (147, 204, 147, 255), 142: (147, 204, 147, 255), 143: (147, 204, 147, 255), 152: (198, 213, 158, 255), 176: (232, 255, 190, 255), 190: (124, 175, 175, 255), 195: (124, 175, 175, 255), 204: (0, 255, 139, 255), 205: (213, 158, 188, 255), 206: (255, 102, 102, 255), 207: (255, 102, 102, 255), 208: (255, 102, 102, 255), 209: (255, 102, 102, 255), 210: (255, 143, 171, 255), 211: (51, 73, 51, 255), 212: (226, 111, 37, 255), 213: (255, 102, 102, 255), 214: (255, 102, 102, 255), 215: (102, 153, 77, 255), 216: (255, 102, 102, 255), 217: (175, 153, 111, 255), 218: (255, 143, 171, 255), 219: (255, 102, 102, 255), 220: (255, 143, 171, 255), 221: (255, 102, 102, 255), 222: (255, 102, 102, 255), 223: (255, 143, 171, 255), 224: (0, 175, 73, 255), 225: (255, 211, 0, 255), 226: (255, 211, 0, 255), 227: (255, 102, 102, 255), 228: (255, 211, 0, 255), 229: (255, 102, 102, 255), 230: (137, 96, 83, 255), 231: (255, 102, 102, 255), 232: (255, 37, 37, 255), 233: (226, 0, 124, 255), 234: (255, 158, 9, 255), 235: (255, 158, 9, 255), 236: (166, 111, 0, 255), 237: (255, 211, 0, 255), 238: (166, 111, 0, 255), 239: (37, 111, 0, 255), 240: (37, 111, 0, 255), 241: (255, 211, 0, 255), 242: (0, 0, 153, 255), 243: (255, 102, 102, 255), 244: (255, 102, 102, 255), 245: (255, 102, 102, 255), 246: (255, 102, 102, 255), 247: (255, 102, 102, 255), 248: (255, 102, 102, 255), 249: (255, 102, 102, 255), 250: (255, 102, 102, 255), 254: (37, 111, 0, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, years=[2023], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use cdl layer
classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if years or classes are invalid
DatasetNotFoundError – If dataset is not found and download is False.

New in version 0.5: The years and classes parameters.

Changed in version 0.5: root was renamed to paths.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

EDDMapS¶

class torchgeo.datasets.EDDMapS(root='data')[source]¶

Bases: GeoDataset

Dataset for EDDMapS.

EDDMapS, Early Detection and Distribution Mapping System, is a web-based mapping system for documenting invasive species and pest distribution. Launched in 2005 by the Center for Invasive Species and Ecosystem Health at the University of Georgia, it was originally designed as a tool for state Exotic Pest Plant Councils to develop more complete distribution data of invasive species. Since then, the program has expanded to include the entire US and Canada as well as to document certain native pest species.

EDDMapS query results can be downloaded in CSV, KML, or Shapefile format. This dataset currently only supports CSV files.

If you use an EDDMapS dataset in your research, please cite it like so:

EDDMapS. YEAR. Early Detection & Distribution Mapping System. The University of Georgia - Center for Invasive Species and Ecosystem Health. Available online at https://www.eddmaps.org/; last accessed DATE.

New in version 0.3.

__init__(root='data')[source]¶

Initialize a new Dataset instance.

Parameters:: root (str | os.PathLike[str]) – root directory where dataset can be found
Raises:: DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for Figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.8.

EnMAP¶

class torchgeo.datasets.EnMAP(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

EnMAP dataset.

The Environmental Mapping and Analysis Program (EnMAP) is a German hyperspectral satellite mission that monitors and characterizes Earth’s environment on a global scale. EnMAP measures geochemical, biochemical and biophysical variables providing information on the status and evolution of terrestrial and aquatic ecosystems.

Mission Outline:

Dedicated pushbroom hyperspectral imager mainly based on modified existing or pre-developed technology
Broad spectral range from 420 nm to 1000 nm (VNIR) and from 900 nm to 2450 nm (SWIR) with high radiometric resolution and stability in both spectral ranges
30 km swath width at a spatial resolution of 30 x 30 m, nadir revisit time of 27 days and off-nadir (30°) pointing feature for fast target revisit (4 days)
Sufficient on-board memory to acquire 1,000 km swath length per orbit and a total of 5,000 km per day.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

filename_glob = 'ENMAP*SPECTRAL_IMAGE*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^ENMAP\n (?P<satellite>\\d{2})-\n (?P<product_type>____L[12][ABC])-\n (?P<datatake_id>DT\\d{10})_\n (?P<date>\\d{8}T\\d{6})Z_\n (?P<tile_id>\\d{3})_\n (?P<version>V\\d{6})_\n (?P<processing_date>\\d{8}T\\d{6})Z-\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y%m%dT%H%M%S'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

all_bands: tuple[str, ...] = ('B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16', 'B17', 'B18', 'B19', 'B20', 'B21', 'B22', 'B23', 'B24', 'B25', 'B26', 'B27', 'B28', 'B29', 'B30', 'B31', 'B32', 'B33', 'B34', 'B35', 'B36', 'B37', 'B38', 'B39', 'B40', 'B41', 'B42', 'B43', 'B44', 'B45', 'B46', 'B47', 'B48', 'B49', 'B50', 'B51', 'B52', 'B53', 'B54', 'B55', 'B56', 'B57', 'B58', 'B59', 'B60', 'B61', 'B62', 'B63', 'B64', 'B65', 'B66', 'B67', 'B68', 'B69', 'B70', 'B71', 'B72', 'B73', 'B74', 'B75', 'B76', 'B77', 'B78', 'B79', 'B80', 'B81', 'B82', 'B83', 'B84', 'B85', 'B86', 'B87', 'B88', 'B89', 'B90', 'B91', 'B92', 'B93', 'B94', 'B95', 'B96', 'B97', 'B98', 'B99', 'B100', 'B101', 'B102', 'B103', 'B104', 'B105', 'B106', 'B107', 'B108', 'B109', 'B110', 'B111', 'B112', 'B113', 'B114', 'B115', 'B116', 'B117', 'B118', 'B119', 'B120', 'B121', 'B122', 'B123', 'B124', 'B125', 'B126', 'B127', 'B128', 'B129', 'B130', 'B131', 'B132', 'B133', 'B134', 'B135', 'B136', 'B137', 'B138', 'B139', 'B140', 'B141', 'B142', 'B143', 'B144', 'B145', 'B146', 'B147', 'B148', 'B149', 'B150', 'B151', 'B152', 'B153', 'B154', 'B155', 'B156', 'B157', 'B158', 'B159', 'B160', 'B161', 'B162', 'B163', 'B164', 'B165', 'B166', 'B167', 'B168', 'B169', 'B170', 'B171', 'B172', 'B173', 'B174', 'B175', 'B176', 'B177', 'B178', 'B179', 'B180', 'B181', 'B182', 'B183', 'B184', 'B185', 'B186', 'B187', 'B188', 'B189', 'B190', 'B191', 'B192', 'B193', 'B194', 'B195', 'B196', 'B197', 'B198', 'B199', 'B200', 'B201', 'B202', 'B203', 'B204', 'B205', 'B206', 'B207', 'B208', 'B209', 'B210', 'B211', 'B212', 'B213', 'B214', 'B215', 'B216', 'B217', 'B218', 'B219', 'B220', 'B221', 'B222', 'B223', 'B224')¶: Names of all available bands in the dataset

rgb_bands: tuple[str, ...] = ('B48', 'B30', 'B16')¶: Names of RGB bands in the dataset, used for plotting

__init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Initialize a new EnMAP instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (collections.abc.Sequence[str] | None) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

plot(sample, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – A sample returned by RasterDataset.__getitem__().
suptitle (str | None) – optional string to use as a suptitle

Returns:

A matplotlib Figure with the rendered sample.

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

EnviroAtlas¶

class torchgeo.datasets.EnviroAtlas(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]¶

Bases: GeoDataset

EnviroAtlas dataset covering four cities with prior and weak input data layers.

The EnviroAtlas dataset contains NAIP aerial imagery, NLCD land cover labels, OpenStreetMap roads, water, waterways, and waterbodies, Microsoft building footprint labels, high-resolution land cover labels from the EPA EnviroAtlas dataset, and high-resolution land cover prior layers.

This dataset was organized to accompany the 2022 paper, “Resolving label uncertainty with implicit generative models”. More details can be found at https://github.com/estherrolf/implicit-posterior.

If you use this dataset in your research, please cite the following paper:

https://openreview.net/forum?id=AEa_UepnMDX

New in version 0.3.

__init__(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (Sequence[str]) – a list containing a subset of valid_layers indicating which layers to load
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
prior_as_input (bool) – bool describing whether the prior is used as an input (True) or as supervision (False)
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if splits or layers are not valid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Note: only plots the “naip” and “lc” layers.

Parameters:

sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

ValueError – if the NAIP layer isn’t included in self.layers

Return type:

Esri2020¶

class torchgeo.datasets.Esri2020(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset

Esri 2020 Land Cover Dataset.

The Esri 2020 Land Cover dataset consists of a global single band land use/land cover map derived from ESA Sentinel-2 imagery at 10m resolution with a total of 10 classes. It was published in July 2021 and used the Universal Transverse Mercator (UTM) projection. This dataset only contains labels, no raw satellite imagery.

The 10 classes are:

No Data
Water
Trees
Grass
Flooded Vegetation
Crops
Scrub/Shrub
Built Area
Bare Ground
Snow/Ice
Clouds

A more detailed explanation of the individual classes can be found here.

If you use this dataset please cite the following paper:

https://ieeexplore.ieee.org/document/9553499

New in version 0.3.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

filename_glob = '*_20200101-20210101.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '^\n (?P<id>[0-9][0-9][A-Z])\n _(?P<date>\\d{8})\n -(?P<processing_date>\\d{8})\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

EU-DEM¶

class torchgeo.datasets.EUDEM(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]¶

Bases: RasterDataset

European Digital Elevation Model (EU-DEM) Dataset.

EU-DEM is a Digital Elevation Model of reference for the entire European region.

Dataset features:

DEMs at 25 m per pixel spatial resolution (~40,000x40,0000 px)
vertical accuracy of +/- 7 m RMSE
data fused from ASTER GDEM, SRTM and Russian topomaps

Dataset format:

DEMs are single-channel tif files

New in version 0.3.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

filename_glob = 'eu_dem_v11_*.TIF'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '(?P<name>[eudem_v11]{10})_(?P<id>[A-Z0-9]{6})'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

EuroCrops¶

class torchgeo.datasets.EuroCrops(paths='data', crs=<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=(1e-05, 1e-05), classes=None, transforms=None, download=False, checksum=False)[source]¶

Bases: VectorDataset

EuroCrops Dataset (Version 9).

The EuroCrops dataset combines “all publicly available self-declared crop reporting datasets from countries of the European Union” into a unified format. The dataset is released under CC BY 4.0 Deed.

The dataset consists of shapefiles containing a total of 22M polygons. Each polygon is tagged with a “EC_hcat_n” attribute indicating the harmonized crop name grown within the polygon in the year associated with the shapefile.

If you use this dataset in your research, please follow the citation guidelines at:

https://github.com/maja601/EuroCrops#reference.

New in version 0.6.

filename_glob = '*_EC*.shp'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^(?P<country>[A-Z]{2})\n (_(?P<region>[A-Z]+))?\n _\n (?P<date>\\d{4})\n _\n (?P<suffix>EC(?:21)?)\n \\.shp$\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group.

__init__(paths='data', crs=<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=(1e-05, 1e-05), classes=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new EuroCrops instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search for files to load
crs (CRS) – coordinate reference system (CRS) to warp to (defaults to WGS-84)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution.
classes (list[str] | None) – list of classes to include (specified by their HCAT code), the rest will be mapped to 0 (defaults to all classes)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

get_label(feature)[source]¶

Get label value to use for rendering a feature.

Parameters:: feature (Feature) – the fiona.model.Feature from which to extract the label.
Returns:: the integer label, or 0 if the feature should not be rendered.
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by VectorDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

GBIF¶

class torchgeo.datasets.GBIF(root='data')[source]¶

Bases: GeoDataset

Dataset for the Global Biodiversity Information Facility.

GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.

This dataset is intended for use with GBIF’s occurrence records. It may or may not work for other GBIF datasets. Data for a particular species or region of interest can be downloaded from the above link.

If you use a GBIF dataset in your research, please cite it according to:

https://www.gbif.org/citation-guidelines

New in version 0.3.

__init__(root='data')[source]¶

Initialize a new Dataset instance.

Parameters:: root (str | os.PathLike[str]) – root directory where dataset can be found
Raises:: DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for Figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.8.

GlobBiomass¶

class torchgeo.datasets.GlobBiomass(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]¶

Bases: RasterDataset

GlobBiomass dataset.

The GlobBiomass dataset consists of global pixelwise aboveground biomass (AGB) and growth stock volume (GSV) maps.

Definitions:

AGB: the mass, expressed as oven-dry weight of the woody parts (stem, bark, branches and twigs) of all living trees excluding stump and roots.
GSV: volume of all living trees more than 10 cm in diameter at breast height measured over bark from ground or stump height to a top stem diameter of 0 cm.

Units:

AGB: m3/ha
GSV: tons/ha (i.e., Mg/ha)

Dataset features:

Global estimates of AGB and GSV at ~100 m per pixel resolution (45,000 x 45,000 px)
Per-pixel uncertainty expressed as standard error

Dataset format:

Estimate maps are single-channel
Uncertainty maps are single-channel

The data can be manually downloaded from this website.

If you use this dataset in your research, please cite the following dataset:

https://doi.org/10.1594/PANGAEA.894711

New in version 0.3.

filename_regex = '\n ^(?P<tile>[NS][\\d]{2}[EW][\\d]{3})\n _(?P<measurement>(agb|gsv))\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

mint: datetime = datetime.datetime(2010, 1, 1, 0, 0)¶: Minimum timestamp if not in filename

maxt: datetime = datetime.datetime(2010, 12, 31, 23, 59, 59, 999999)¶: Maximum timestamp if not in filename

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

dtype = torch.float32¶

__init__(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]¶

Initialize a new GlobBiomass instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
measurement (str) – use data from ‘agb’ or ‘gsv’ measurement
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If measurement is not valid.
DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

filename_glob = '*_{}.tif'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

iNaturalist¶

class torchgeo.datasets.INaturalist(root='data')[source]¶

Bases: GeoDataset

Dataset for iNaturalist.

iNaturalist is a joint initiative of the California Academy of Sciences and the National Geographic Society. It allows citizen scientists to upload observations of organisms that can be downloaded by scientists and researchers.

If you use an iNaturalist dataset in your research, please cite it according to:

https://help.inaturalist.org/en/support/solutions/articles/151000170344-how-should-i-cite-inaturalist-

New in version 0.3.

__init__(root='data')[source]¶

Initialize a new Dataset instance.

Parameters:: root (str | os.PathLike[str]) – root directory where dataset can be found
Raises:: DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for Figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.8.

I/O Bench¶

class torchgeo.datasets.IOBench(root='data', split='preprocessed', crs=None, res=None, bands=['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7', 'SR_QA_AEROSOL'], classes=[0], transforms=None, cache=True, download=False, checksum=False)[source]¶

I/O Bench dataset.

I/O Bench is a dataset designed to benchmark the I/O performance of TorchGeo. It contains a single Landsat 9 scene and CDL file from 2023, and consists of the following splits

original: the original files as downloaded from USGS Earth Explorer and USDA CropScape
raw: the same files with compression and with CDL clipped to the bounds of the Landsat scene
preprocessed: the same files with compression, reprojected to the same CRS, as COGs, with TAP

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1145/3557915.3560953

New in version 0.6.

__init__(root='data', split='preprocessed', crs=None, res=None, bands=['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7', 'SR_QA_AEROSOL'], classes=[0], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new IOBench instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (str) – One of ‘original’, ‘raw’, or ‘preprocessed’.
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – Resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (collections.abc.Sequence[str] | None) – Bands to return (defaults to all bands).
classes (list[int]) – List of classes to include, the rest will be mapped to 0.
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and returns a transformed version.
cache (bool) – If True, cache file handle to speed up repeated sampling.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

AssertionError – If split argument is invalid.
DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – A sample returned by IntersectionDataset.__getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

L7 Irish¶

class torchgeo.datasets.L7Irish(paths='data', crs=<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=None, bands=('B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'), transforms=None, cache=True, download=False, checksum=False)[source]¶

L7 Irish dataset.

The L7 Irish dataset is based on Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1G scenes. Manually generated cloud masks are used to train and validate cloud cover assessment algorithms, which in turn are intended to compute the percentage of cloud cover in each scene.

Dataset features:

Images divided between 9 unique biomes
206 scenes from Landsat 7 ETM+ sensor
Imagery from global tiles between June 2000–December 2001
9 Level-1 spectral bands with 30 m per pixel resolution

Dataset format:

Images are composed of single multiband geotiffs
Labels are multiclass, stored in single geotiffs
Level-1 metadata (MTL.txt file)
Landsat 7 ETM+ bands: (B10, B20, B30, B40, B50, B61, B62, B70, B80)

Dataset classes:

Fill
Cloud Shadow
Clear
Thin Cloud
Cloud

If you use this dataset in your research, please cite the following:

New in version 0.5.

__init__(paths='data', crs=<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=None, bands=('B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'), transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new L7Irish instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

L8 Biome¶

class torchgeo.datasets.L8Biome(paths, crs=<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=None, bands=('B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'), transforms=None, cache=True, download=False, checksum=False)[source]¶

L8 Biome dataset.

The L8 Biome dataset is a validation dataset for cloud cover assessment algorithms, consisting of Pre-Collection Landsat 8 Operational Land Imager (OLI) Thermal Infrared Sensor (TIRS) terrain-corrected (Level-1T) scenes.

Dataset features:

Images evenly divided between 8 unique biomes
96 scenes from Landsat 8 OLI/TIRS sensors
Imagery from global tiles between April 2013–October 2014
11 Level-1 spectral bands with 30 m per pixel resolution

Dataset format:

Images are composed of single multiband geotiffs
Labels are multiclass, stored in single geotiffs
Quality assurance bands, stored in single geotiffs
Level-1 metadata (MTL.txt file)
Landsat 8 OLI/TIRS bands: (B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11)

Dataset classes:

Fill
Cloud Shadow
Clear
Thin Cloud
Cloud

If you use this dataset in your research, please cite the following:

New in version 0.5.

__init__(paths, crs=<Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich, res=None, bands=('B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'), transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new L8Biome instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

LandCover.ai Geo¶

class torchgeo.datasets.LandCoverAIBase(root='data', download=False, checksum=False)[source]¶

Bases: Dataset[dict[str, Any]], ABC

Abstract base class for LandCover.ai Geo and NonGeo datasets.

The LandCover.ai (Land Cover from Aerial Imagery) dataset is a dataset for automatic mapping of buildings, woodlands, water and roads from aerial images. This implementation is specifically for Version 1 of LandCover.ai.

Dataset features:

land cover from Poland, Central Europe
three spectral bands - RGB
33 orthophotos with 25 cm per pixel resolution (~9000x9500 px)
8 orthophotos with 50 cm per pixel resolution (~4200x4700 px)
total area of 216.27 km²

Dataset format:

rasters are three-channel GeoTiffs with EPSG:2180 spatial reference system
masks are single-channel GeoTiffs with EPSG:2180 spatial reference system

Dataset classes:

building (1.85 km²)
woodland (72.02 km²)
water (13.15 km²)
road (3.5 km²)

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2005.02264v4

New in version 0.5.

__init__(root='data', download=False, checksum=False)[source]¶

Initialize a new LandCover.ai dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms – a function/transform that takes input sample and its target as entry and returns a transformed version
cache – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

abstract __getitem__(query)[source]¶

Retrieve image, mask and metadata indexed by index.

Parameters:: query (Any) – coordinates or an index
Returns:: sample of image, mask and metadata at that index
Raises:: IndexError – if query is not found in the index
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

class torchgeo.datasets.LandCoverAIGeo(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: LandCoverAIBase, RasterDataset

LandCover.ai Geo dataset.

See the abstract LandCoverAIBase class to find out more.

New in version 0.5.

filename_glob = 'images/*.tif'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '.*tif'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

__init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new LandCover.ai NonGeo dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

Landsat¶

class torchgeo.datasets.Landsat(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset, ABC

Abstract base class for all Landsat datasets.

Landsat is a joint NASA/USGS program, providing the longest continuous space-based record of Earth’s land in existence.

If you use this dataset in your research, please cite it using the following format:

https://www.usgs.gov/centers/eros/data-citation

If you use any of the following Level-2 products, there may be additional citation requirements, including papers you can cite. See the “Citation Information” section of the following pages:

filename_regex = '\n ^L\n (?P<sensor>[COTEM])\n (?P<satellite>\\d{2})\n _(?P<processing_correction_level>[A-Z0-9]{4})\n _(?P<wrs_path>\\d{3})\n (?P<wrs_row>\\d{3})\n _(?P<date>\\d{8})\n _(?P<processing_date>\\d{8})\n _(?P<collection_number>\\d{2})\n _(?P<collection_category>[A-Z0-9]{2})\n _(?P<band>[A-Z0-9_]+)\n \\.\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

separate_files = True¶: True if data is stored in a separate file for each band, else False.

abstract property default_bands: tuple[str, ...]¶: Bands to load by default.

__init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (collections.abc.Sequence[str] | None) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

class torchgeo.datasets.Landsat9(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat8

Landsat 9 Operational Land Imager (OLI-2) and Thermal Infrared Sensor (TIRS-2).

filename_glob = 'LC09_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat8(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat

Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).

filename_glob = 'LC08_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: tuple[str, ...] = ('SR_B4', 'SR_B3', 'SR_B2')¶: Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat7(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat

Landsat 7 Enhanced Thematic Mapper Plus (ETM+).

filename_glob = 'LE07_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: tuple[str, ...] = ('SR_B3', 'SR_B2', 'SR_B1')¶: Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat5TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat4TM

Landsat 5 Thematic Mapper (TM).

filename_glob = 'LT05_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat5MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat4MSS

Landsat 4 Multispectral Scanner (MSS).

filename_glob = 'LM04_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat4TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat

Landsat 4 Thematic Mapper (TM).

filename_glob = 'LT04_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: tuple[str, ...] = ('SR_B3', 'SR_B2', 'SR_B1')¶: Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat4MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat

Landsat 4 Multispectral Scanner (MSS).

filename_glob = 'LM04_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: tuple[str, ...] = ('B3', 'B2', 'B1')¶: Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat3(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat1

Landsat 3 Multispectral Scanner (MSS).

filename_glob = 'LM03_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat2(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat1

Landsat 2 Multispectral Scanner (MSS).

filename_glob = 'LM02_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat1(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: Landsat

Landsat 1 Multispectral Scanner (MSS).

filename_glob = 'LM01_*_{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: tuple[str, ...] = ('B6', 'B5', 'B4')¶: Names of RGB bands in the dataset, used for plotting

MMFlood¶

class torchgeo.datasets.MMFlood(root='data', crs=None, res=None, split='train', include_dem=False, include_hydro=False, transforms=None, download=False, checksum=False, cache=False)[source]¶

MMFlood dataset.

MMFlood dataset is a multimodal flood delineation dataset. Sentinel-1 data is matched with masks and DEM data for all available tiles. If hydrography maps are loaded, only a subset of the dataset is loaded, since only 1,012 Sentinel-1 tiles have a corresponding hydrography map. Some Sentinel-1 tiles have missing data, which are automatically set to 0. Corresponding pixels in masks are set to 255 and should be ignored in performance computation.

Dataset features:

1,748 Sentinel-1 tiles of varying pixel dimensions
multimodal dataset
95 flood events from 42 different countries
includes DEMs
includes hydrography maps (available for 1,012 tiles out of 1,748)
flood delineation maps (ground truth) is obtained from Copernicus EMS

Dataset classes:

no flood
flood

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/ACCESS.2022.3205419

New in version 0.7.

__init__(root='data', crs=None, res=None, split='train', include_dem=False, include_hydro=False, transforms=None, download=False, checksum=False, cache=False)[source]¶

Initialize a new MMFlood dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
split (str) – train/val/test split to load
include_dem (bool) – If True, DEM data is concatenated after Sentinel-1 bands.
include_hydro (bool) – If True, hydrography data is concatenated as last channel. Only a smaller subset of the original dataset is loaded in this case.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
AssertionError – If split is invalid.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

NAIP¶

class torchgeo.datasets.NAIP(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

National Agriculture Imagery Program (NAIP) dataset.

The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. A primary goal of the NAIP program is to make digital ortho photography available to governmental agencies and the public within a year of acquisition.

NAIP is administered by the USDA’s Farm Service Agency (FSA) through the Aerial Photography Field Office in Salt Lake City. This “leaf-on” imagery is used as a base layer for GIS programs in FSA’s County Service Centers, and is used to maintain the Common Land Unit (CLU) boundaries.

If you use this dataset in your research, please cite it using the following format:

https://www.fisheries.noaa.gov/inport/item/49508/citation

filename_glob = 'm_*.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^m\n _(?P<quadrangle>\\d+)\n _(?P<quarter_quad>[a-z]+)\n _(?P<utm_zone>\\d+)\n _(?P<resolution>\\d+)\n _(?P<date>\\d+)\n (?:_(?P<processing_date>\\d+))?\n \\..*$\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

all_bands: tuple[str, ...] = ('R', 'G', 'B', 'NIR')¶: Names of all available bands in the dataset

rgb_bands: tuple[str, ...] = ('R', 'G', 'B')¶: Names of RGB bands in the dataset, used for plotting

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

NCCM¶

class torchgeo.datasets.NCCM(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset

The Northeastern China Crop Map Dataset.

Link: https://www.nature.com/articles/s41597-021-00827-9

This dataset produced annual 10-m crop maps of the major crops (maize, soybean, and rice) in Northeast China from 2017 to 2019, using hierarchial mapping strategies, random forest classifiers, interpolated and smoothed 10-day Sentinel-2 time series data and optimized features from spectral, temporal and textural characteristics of the land surface. The resultant maps have high overall accuracies (OA) based on ground truth data. The dataset contains information specific to three years: 2017, 2018, 2019.

The dataset contains 5 classes:

paddy rice
maize
soybean
others crops and lands
nodata

Dataset format:

Three .TIF files containing the labels
JavaScript code to download images from the dataset.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1038/s41597-021-00827-9

New in version 0.6.

filename_regex = 'CDL(?P<date>\\d{4})_clip'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

filename_glob = 'CDL*.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 255, 0, 255), 1: (255, 0, 0, 255), 2: (255, 255, 0, 255), 3: (128, 128, 128, 255), 15: (255, 255, 255, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new dataset.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use nccm layers
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by NCCM.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

NLCD¶

class torchgeo.datasets.NLCD(paths='data', crs=None, res=None, years=[2023], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset

Annual National Land Cover Database (NLCD) dataset.

The Annual NLCD products is an annual land cover product for the conterminous U.S. initially covering the period from 1985 to 2023. The product is a joint effort between the United States Geological Survey (USGS) and the Multi-Resolution Land Characteristics Consortium (MRLC).

The dataset contains the following 17 classes:

Background
Open Water
Perennial Ice/Snow
Developed, Open Space
Developed, Low Intensity
Developed, Medium Intensity
Developed, High Intensity
Barren Land (Rock/Sand/Clay)
Deciduous Forest
Evergreen Forest
Mixed Forest
Shrub/Scrub
Grassland/Herbaceous
Pasture/Hay
Cultivated Crops
Woody Wetlands
Emergent Herbaceous Wetlands

Detailed descriptions of the classes can be found here.

Dataset format:

single channel .img file with integer class labels

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.5066/P94UXNTS

New in version 0.5.

filename_glob = 'Annual_NLCD_LndCov_*_CU_C1V0.tif'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = 'Annual_NLCD_LndCov_(?P<date>\\d{4})_CU_C1V0\\.tif'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 0, 0, 0), 11: (70, 107, 159, 255), 12: (209, 222, 248, 255), 21: (222, 197, 197, 255), 22: (217, 146, 130, 255), 23: (235, 0, 0, 255), 24: (171, 0, 0, 255), 31: (179, 172, 159, 255), 41: (104, 171, 95, 255), 42: (28, 95, 44, 255), 43: (181, 197, 143, 255), 52: (204, 184, 121, 255), 71: (223, 223, 194, 255), 81: (220, 217, 57, 255), 82: (171, 108, 40, 255), 90: (184, 217, 235, 255), 95: (108, 159, 184, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, years=[2023], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use nlcd layer
classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

AssertionError – if years or classes are invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Open Buildings¶

class torchgeo.datasets.OpenBuildings(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]¶

Bases: VectorDataset

Open Buildings dataset.

The Open Buildings dataset consists of computer generated building detections across the African continent.

Dataset features:

516M building detections as polygons with centroid lat/long
covering area of 19.4M km² (64% of the African continent)
confidence score and Plus Code

Dataset format:

csv files containing building detections compressed as csv.gz
meta data geojson file

The data can be downloaded from here. Additionally, the meta data geometry file also needs to be placed in root as tiles.geojson.

If you use this dataset in your research, please cite the following technical report:

https://arxiv.org/abs/2107.12283

New in version 0.3.

filename_glob = '*_buildings.csv'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__init__(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution.
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

PRISMA¶

class torchgeo.datasets.PRISMA(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

PRISMA dataset.

Hyperspectral Precursor and Application Mission PRISMA (PRecursore IperSpettrale della Missione Applicativa) is a medium-resolution hyperspectral imaging satellite, developed, owned, and operated by the Italian Space Agency ASI (Agenzia Spaziale Italiana). It is the successor to the discontinued HypSEO (Hyperspectral Satellite for Earth Observation) mission.

PRISMA carries two sensor instruments, the HYC (Hyperspectral Camera) module and the PAN (Panchromatic Camera) module. The HYC sensor is a prism spectrometer for two bands, VIS/NIR (Visible/Near Infrared) and NIR/SWIR (Near Infrared/Shortwave Infrared), with a total of 237 channels across both bands. Its primary mission objective is the high resolution hyperspectral imaging of land, vegetation, inner waters and coastal zones. The second sensor module, PAN, is a high resolution optical imager, and is co-registered with HYC data to allow testing of image fusion techniques.

The HYC module has a spatial resolution of 30 m and operates in two bands, a 66 channel VIS/NIR band with a spectral interval of 400-1010 nm, and a 171 channel NIR/SWIR band with a spectral interval of 920-2505 nm. It uses a pushbroom scanning technique with a swath width of 30 km, and a field of regard of 1000 km either side. The PAN module also uses a pushbroom scanning technique, with identical swath width and field of regard but spatial resolution of 5 m.

PRISMA is in a sun-synchronous orbit, with an altitude of 614 km, an inclination of 98.19° and its LTDN (Local Time on Descending Node) is at 1030 hours.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/IGARSS.2018.8517785

Note

PRISMA imagery is distributed as HDF5 files. However, TorchGeo does not yet have support for reprojecting and windowed reading of HDF5 files. This data loader requires you to first convert all files from HDF5 to GeoTIFF using something like this script.

New in version 0.6.

filename_glob = 'PRS_*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^PRS\n _(?P<level>[A-Z\\d]+)\n _(?P<product>[A-Z]+)\n (_(?P<order>[A-Z_]+))?\n _(?P<start>\\d{14})\n _(?P<stop>\\d{14})\n _(?P<version>\\d{4})\n (_(?P<valid>\\d))?\n \\.\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y%m%d%H%M%S'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Sentinel¶

class torchgeo.datasets.Sentinel(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: RasterDataset

Abstract base class for all Sentinel datasets.

Sentinel is a family of satellites launched by the European Space Agency (ESA) under the Copernicus Programme.

If you use this dataset in your research, please cite it using the following format:

https://asf.alaska.edu/datasets/daac/sentinel-1/

class torchgeo.datasets.Sentinel1(paths='data', crs=None, res=(10, 10), bands=['VV', 'VH'], transforms=None, cache=True)[source]¶

Bases: Sentinel

Sentinel-1 dataset.

The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites, operating day and night performing C-band synthetic aperture radar imaging, enabling them to acquire imagery regardless of the weather.

Data can be downloaded from:

Product Types:

Level-0: Raw (RAW)
Level-1: Single Look Complex (SLC)
Level-1: Ground Range Detected (GRD)
Level-2: Ocean (OCN)

Polarizations:

HH: horizontal transmit, horizontal receive
HV: horizontal transmit, vertical receive
VV: vertical transmit, vertical receive
VH: vertical transmit, horizontal receive

Acquisition Modes:

Note

At the moment, this dataset only supports the GRD product type. Data must be radiometrically terrain corrected (RTC). This can be done manually using a DEM, or you can download an On Demand RTC product from ASF DAAC.

Note

Mixing $\gamma_0$ and $\sigma_0$ backscatter coefficient data is not recommended. Similarly, power, decibel, and amplitude scale data should not be mixed, and TorchGeo does not attempt to convert all data to a common scale.

New in version 0.4.

filename_regex = '\n ^S1(?P<mission>[AB])\n _(?P<mode>SM|IW|EW|WV)\n _(?P<date>\\d{8}T\\d{6})\n _(?P<polarization>[DS][HV])\n (?P<orbit>[PRO])\n _RTC(?P<spacing>\\d{2})\n _(?P<package>G)\n _(?P<backscatter>[gs])\n (?P<scale>[pda])\n (?P<mask>[uw])\n (?P<filter>[nf])\n (?P<area>[ec])\n (?P<matching>[dm])\n _(?P<product>[0-9A-Z]{4})\n _(?P<band>[VH]{2})\n \\.\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y%m%dT%H%M%S'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

all_bands: tuple[str, ...] = ('HH', 'HV', 'VV', 'VH')¶: Names of all available bands in the dataset

separate_files = True¶: True if data is stored in a separate file for each band, else False.

__init__(paths='data', crs=None, res=(10, 10), bands=['VV', 'VH'], transforms=None, cache=True)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | list[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to [“VV”, “VH”])
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

AssertionError – if bands is invalid
DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

filename_glob = 'S1*{}.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

class torchgeo.datasets.Sentinel2(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]¶

Bases: Sentinel

Sentinel-2 dataset.

The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. It aims at monitoring variability in land surface conditions, and its wide swath width (290 km) and high revisit time (10 days at the equator with one satellite, and 5 days with 2 satellites under cloud-free conditions which results in 2-3 days at mid-latitudes) will support monitoring of Earth’s surface changes.

date_format = '%Y%m%dT%H%M%S'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

all_bands: tuple[str, ...] = ('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12')¶: Names of all available bands in the dataset

rgb_bands: tuple[str, ...] = ('B04', 'B03', 'B02')¶: Names of RGB bands in the dataset, used for plotting

separate_files = True¶: True if data is stored in a separate file for each band, else False.

__init__(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
bands (collections.abc.Sequence[str] | None) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths

filename_glob = 'T*_*_{}*.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n ^T(?P<tile>\\d{{2}}[A-Z]{{3}})\n _(?P<date>\\d{{8}}T\\d{{6}})\n _(?P<band>B[018][\\dA])\n (?:_(?P<resolution>{}m))?\n \\..*$\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

South Africa Crop Type¶

class torchgeo.datasets.SouthAfricaCropType(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False)[source]¶

Bases: RasterDataset

South Africa Crop Type Challenge dataset.

The South Africa Crop Type Challenge dataset includes satellite imagery from Sentinel-1 and Sentinel-2 and labels for crop type that were collected by aerial and vehicle survey from May 2017 to March 2018. Data was provided by the Western Cape Department of Agriculture and is available via the Radiant Earth Foundation. For each field id the dataset contains time series imagery and a single label mask. Since TorchGeo does not yet support timeseries datasets, the first available imagery in July will be returned for each field. Note that the dates for S1 and S2 imagery for a given field are not guaranteed to be the same. Due to this date mismatch only S1 or S2 bands may be queried at a time, a mix of both is not supported. Each pixel in the label contains an integer field number and crop type class.

Dataset format:

images are 2-band Sentinel 1 and 12-band Sentinel-2 data with a cloud mask
masks are tiff images with unique values representing the class and field id.

Dataset classes:

No Data
Lucerne/Medics
Planted pastures (perennial)
Fallow
Wine grapes
Weeds
Small grain grazing
Wheat
Canola
Rooibos

If you use this dataset in your research, please cite the following dataset:

Western Cape Department of Agriculture, Radiant Earth Foundation (2021) “Crop Type Classification Dataset for Western Cape, South Africa”, Version 1.0, Radiant MLHub, https://doi.org/10.34911/rdnt.j0co8q

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.6.

filename_regex = '\n ^(?P<field_id>\\d+)\n _(?P<date>\\d{4}_07_\\d{2})\n _(?P<band>[BHV\\d]+)\n _10m\n '¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y_%m_%d'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

rgb_bands: tuple[str, ...] = ('B04', 'B03', 'B02')¶: Names of RGB bands in the dataset, used for plotting

all_bands: tuple[str, ...] = ('VH', 'VV', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12')¶: Names of all available bands in the dataset

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 7: (222, 166, 9, 255), 8: (111, 166, 0, 255), 9: (0, 175, 73, 255)}¶: Color map for the dataset, used for plotting

__init__(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False)[source]¶

Initialize a new South Africa Crop Type dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – paths directory where dataset can be found
crs (pyproj.crs.crs.CRS | None) – coordinate reference system to be used
classes (Sequence[int]) – crop type classes to be included
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

filename_glob = '*_07_*_{}_10m.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

South America Soybean¶

class torchgeo.datasets.SouthAmericaSoybean(paths='data', crs=None, res=None, years=[2021], transforms=None, cache=True, download=False, checksum=False)[source]¶

Bases: RasterDataset

South America Soybean Dataset.

This dataset produced annual 30-m soybean maps of South America from 2001 to 2021.

Link: https://www.nature.com/articles/s41893-021-00729-z

Dataset contains 2 classes:

other
soybean

Dataset Format:

21 .tif files

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1038/s41893-021-00729-z

New in version 0.6.

filename_glob = 'SouthAmerica_Soybean_*.*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = 'SouthAmerica_Soybean_(?P<year>\\d{4})'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

__init__(paths='data', crs=None, res=None, years=[2021], transforms=None, cache=True, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use the South America Soybean layer
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Non-geospatial Datasets¶

NonGeoDataset is designed for datasets that lack geospatial information. These datasets can still be combined using ConcatDataset.

C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning¶
Dataset	Task	Source	License	# Samples	# Classes	Size (px)	Resolution (m)	Bands
ADVANCE	C	Google Earth, Freesound	CC-BY-4.0	5,075	13	512x512	0.5	RGB
Benin Cashew Plantations	S	Airbus Pléiades	CC-BY-4.0	70	6	1,122x1,186	10	MSI
BigEarthNet	C	Sentinel-1/2	CDLA-Permissive-1.0	590,326	19–43	120x120	10	SAR, MSI
BioMassters	R	Sentinel-1/2 and Lidar	CC-BY-4.0			256x256	10	SAR, MSI
BRIGHT	CD	MAXAR, NAIP, Capella, Umbra	CC-BY-4.0 AND CC-BY-NC-4.0	3239	4	1024x1024	0.1–1	RGB, SAR
CaBuAr	CD	Sentinel-2	OpenRAIL	424	2	512x512	20	MSI
CaFFe	S	Sentinel-1, TerraSAR-X, TanDEM-X, ENVISAT, ERS-1/2, ALOS PALSAR, and RADARSAT-1	CC-BY-4.0	19092	2 or 4	512x512	6-20	SAR
ChaBuD	CD	Sentinel-2	OpenRAIL	356	2	512x512	10	MSI
Cloud Cover Detection	S	Sentinel-2	CC-BY-4.0	22,728	2	512x512	10	MSI
Copernicus-Pretrain	T	Sentinel-1/2/3/5P, DEM	CC-BY-4.0	18.7M		264x264 or 96x96 or 28x28 or 960x960	10–1000	SAR, MSI, Air Pollutants, DEM
COWC	C, R	CSUAV AFRL, ISPRS, LINZ, AGRC	AGPL-3.0-only	388,435	2	256x256	0.15	RGB
CropHarvest	C	Sentinel-1/2, SRTM, ERA5	CC-BY-SA-4.0	70,213	351	1x1	10	SAR, MSI, SRTM
Kenya Crop Type	S	Sentinel-2	CC-BY-SA-4.0	4,688	7	3,035x2,016	10	MSI
DeepGlobe Land Cover	S	DigitalGlobe +Vivid		803	7	2,448x2,448	0.5	RGB
DFC2022	S	Aerial	CC-BY-4.0	3,981	15	2,000x2,000	0.5	RGB
DIOR	OD	Aerial	CC-BY-NC-4.0	23,463	20	800x800	0.5	RGB
Digital Typhoon	C, R	Himawari	CC-BY-4.0	189,364	8	512	5000	Infrared
DL4GAM	S	Sentinel-2	CC-BY-4.0	2,251 or 11,440	2	256x256	10	MSI
DOTA	OD	Google Earth, Gaofen-2, Jilin-1, CycloMedia B.V.	non-commercial	5,229	15	800–4000		RGB
ETCI2021 Flood Detection	S	Sentinel-1		66,810	2	256x256	5–20	SAR
EuroSAT	C	Sentinel-2	MIT	27,000	10	64x64	10	MSI
EverWatch	OD	Aerial	CC0-1.0	5,325	8	1,500x1500p	0.01	RGB
FAIR1M	OD	Gaofen/Google Earth	CC-BY-NC-SA-3.0	15,000	37	1,024x1,024	0.3–0.8	RGB
Fields Of The World	S,I	Sentinel-2	Various	70795	2,3	256x256	10	MSI
FireRisk	C	NAIP Aerial	CC-BY-NC-4.0	91,872	7	320x320	1	RGB
Forest Damage	OD	Drone imagery	CDLA-Permissive-1.0	1,543	4	1,500x1,500		RGB
GeoNRW	S	Aerial	CC-BY-4.0	7,783	11	1,000x1,000	1	RGB, DEM
GID-15	S	Gaofen-2		150	15	6,800x7,200	3	RGB
HySpecNet-11k		EnMAP	CC0-1.0	11k		128	30	HSI
IDTReeS	OD,C	Aerial	CC-BY-4.0	591	33	200x200	0.1–1	RGB
Inria Aerial Image Labeling	S	Aerial		360	2	5,000x5,000	0.3	RGB
LandCover.ai	S	Aerial	CC-BY-NC-SA-4.0	10,674	5	512x512	0.25–0.5	RGB
LEVIR-CD	CD	Google Earth		637	2	1,024x1,024	0.5	RGB
LEVIR-CD+	CD	Google Earth		985	2	1,024x1,024	0.5	RGB
LoveDA	S	Google Earth	CC-BY-NC-SA-4.0	5,987	7	1,024x1,024	0.3	RGB
MapInWild	S	Sentinel-1/2, ESA WorldCover, NOAA VIIRS DNB	CC-BY-4.0	1018	1	1920x1920	10–463.83	SAR, MSI, 2020_Map, avg_rad
MDAS	S	Sentinel-1/2,EnMAP,HySpex	CC-BY-SA-4.0	3	20	100x120, 300x360, 1364x1636, 10000x12000, 15000x18000	0.3–30	HSI
Million-AID	C	Google Earth		1M	51–73		0.5–153	RGB
MMEarth	C, S	Aster, Sentinel, ERA5	CC-BY-4.0	100K–1M		128x128 or 64x64	10	MSI
NASA Marine Debris	OD	PlanetScope	Apache-2.0	707	1	256x256	3	RGB
OSCD	CD	Sentinel-2	CC-BY-4.0	24	2	40–1,180	60	MSI
PASTIS	I	Sentinel-1/2	CC-BY-4.0	2,433	19	128x128xT	10	MSI
PatternNet	C	Google Earth	CC-BY-4.0	30,400	38	256x256	0.06–5	RGB
Potsdam	S	Aerial		38	6	6,000x6,000	0.05	MSI
QuakeSet	C, R	Sentinel-1	OpenRAIL	3,327	2	512x512	10	SAR
ReforesTree	OD, R	Aerial	CC-BY-4.0	100	6	4,000x4,000	0.02	RGB
RESISC45	C	Google Earth	CC-BY-NC-4.0	31,500	45	256x256	0.2–30	RGB
Rwanda Field Boundary	S	Planetscope	NICFI AND CC-BY-4.0	70	2	256x256	4.7	RGB + NIR
SatlasPretrain	C, R, S, I, OD	NAIP, Landsat, Sentinel	ESA AND CC0-1.0 AND ODbL-1.0 AND CC-BY-4.0	302M	137	512	0.6–30	SAR, MSI
Seasonal Contrast	T	Sentinel-2	CC-BY-4.0	100K–1M		264x264	10	MSI
SeasoNet	S	Sentinel-2	CC-BY-4.0	1,759,830	33	120x120	10	MSI
SEN12MS	S	Sentinel-1/2, MODIS	CC-BY-4.0	180,662	33	256x256	10	SAR, MSI
SKIPP’D	R	Fish-eye	CC-BY-4.0	363,375		64x64		RGB
SkyScript	IC	NAIP, orthophotos, Planet SkySat, Sentinel-2, Landsat 8–9	MIT	5.2M		100–1000	0.1–30	RGB
So2Sat	C	Sentinel-1/2	CC-BY-4.0	400,673	17	32x32	10	SAR, MSI
SODA	OD	Aerial	CC-BY-NC-4.0	2513	9	~2700x~4800	RGB
Solar Plants Brazil	S	Aerial	CC-BY-NC-4.0	272	2	256x256	10	RGB + NIR
SSL4EO-L	T	Landsat	CC0-1.0	1M		264x264	30	MSI
SSL4EO-S12	T	Sentinel-1/2	CC-BY-4.0	1M		264x264	10	SAR, MSI
SSL4EO-L Benchmark	S	Lansat & CDL	CC0-1.0	25K	134	264x264	30	MSI
SSL4EO-L Benchmark	S	Lansat & NLCD	CC0-1.0	25K	17	264x264	30	MSI
Substation	S	OpenStreetMap & Sentinel-2	CC-BY-4.0	27K	2	228x228	10	MSI
SustainBench Crop Yield	R	MODIS	CC-BY-SA-4.0	11k		32x32		MSI
TreeSatAI	C, R, S	Aerial, Sentinel-1/2	CC-BY-4.0	50K	12, 15, 20	6, 20, 304	0.2, 10	CIR, MSI, SAR
Tropical Cyclone	R	GOES 8–16	CC-BY-4.0	108,110		256x256	4K–8K	MSI
UC Merced	C	USGS National Map	public domain	2,100	21	256x256	0.3	RGB
USAVars	R	NAIP Aerial	CC-BY-4.0	100K			4	RGB, NIR
Vaihingen	S	Aerial		33	6	1,281–3,816	0.09	RGB
VHR-10	I	Google Earth, Vaihingen	CC-BY-NC-4.0	800	10	358–1,728	0.08–2	RGB
Western USA Live Fuel Moisture	R	Landsat8, Sentinel-1	CC-BY-NC-ND-4.0	2615
xView2	CD	Maxar	CC-BY-NC-SA-4.0	3,732	4	1,024x1,024	0.8	RGB
ZueriCrop	I, T	Sentinel-2	CC-BY-NC-4.0	116K	48	24x24	10	MSI

ADVANCE¶

class torchgeo.datasets.ADVANCE(root='data', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

ADVANCE dataset.

The ADVANCE dataset is a dataset for audio visual scene recognition.

Dataset features:

5,075 pairs of geotagged audio recordings and images
three spectral bands - RGB (512x512 px)
10-second audio recordings

Dataset format:

images are three-channel jpgs
audio files are in wav format

Dataset classes:

airport
beach
bridge
farmland
forest
grassland
harbour
lake
orchard
residential
sparse shrub land
sports land
train station

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1007/978-3-030-58586-0_5

Note

This dataset requires the following additional library to be installed:

scipy to load the audio files to tensors

__init__(root='data', transforms=None, download=False, checksum=False)[source]¶

Initialize a new ADVANCE dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If scipy is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

Benin Cashew Plantations¶

class torchgeo.datasets.BeninSmallHolderCashews(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False)[source]¶

Bases: NonGeoDataset

Smallholder Cashew Plantations in Benin dataset.

This dataset contains labels for cashew plantations in a 120 km² area in the center of Benin. Each pixel is classified for Well-managed plantation, Poorly-managed plantation, No plantation and other classes. The labels are generated using a combination of ground data collection with a handheld GPS device, and final corrections based on Airbus Pléiades imagery. See this website for dataset details.

Specifically, the data consists of Sentinel 2 imagery from a 120 km² area in the center of Benin over 71 points in time from 11/05/2019 to 10/30/2020 and polygon labels for 6 classes:

No data
Well-managed plantation
Poorly-managed planatation
Non-plantation
Residential
Background
Uncertain

If you use this dataset in your research, please cite the following:

https://beta.source.coop/technoserve/cashews-benin/

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

__init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False)[source]¶

Initialize a new Benin Smallholder Cashew Plantations Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – If bands is invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: a dict containing image, mask, transform, crs, and metadata at index.
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of chips in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

BigEarthNet¶

class torchgeo.datasets.BigEarthNet(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

BigEarthNet dataset.

The BigEarthNet dataset is a dataset for multilabel remote sensing image scene classification.

Dataset features:

590,326 patches from 125 Sentinel-1 and Sentinel-2 tiles
Imagery from tiles in Europe between Jun 2017 - May 2018
12 spectral bands with 10-60 m per pixel resolution (base 120x120 px)
2 synthetic aperture radar bands (120x120 px)
43 or 19 scene classes from the 2018 CORINE Land Cover database (CLC 2018)

Dataset format:

images are composed of multiple single channel geotiffs
labels are multiclass, stored in a single json file per image
mapping of Sentinel-1 to Sentinel-2 patches are within Sentinel-1 json files
Sentinel-1 bands: (VV, VH)
Sentinel-2 bands: (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
All bands: (VV, VH, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
Sentinel-2 bands are of different spatial resolutions and upsampled to 10m

Dataset classes (43):

Continuous urban fabric
Discontinuous urban fabric
Industrial or commercial units
Road and rail networks and associated land
Port areas
Airports
Mineral extraction sites
Dump sites
Construction sites
Green urban areas
Sport and leisure facilities
Non-irrigated arable land
Permanently irrigated land
Rice fields
Vineyards
Fruit trees and berry plantations
Olive groves
Pastures
Annual crops associated with permanent crops
Complex cultivation patterns
Land principally occupied by agriculture, with significant areas of natural vegetation
Agro-forestry areas
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grassland
Moors and heathland
Sclerophyllous vegetation
Transitional woodland/shrub
Beaches, dunes, sands
Bare rock
Sparsely vegetated areas
Burnt areas
Inland marshes
Peatbogs
Salt marshes
Salines
Intertidal flats
Water courses
Water bodies
Coastal lagoons
Estuaries
Sea and ocean

Dataset classes (19):

Urban fabric
Industrial or commercial units
Arable land
Permanent crops
Pastures
Complex cultivation patterns
Land principally occupied by agriculture, with significant areas of natural vegetation
Agro-forestry areas
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grassland and sparsely vegetated areas
Moors, heathland and sclerophyllous vegetation
Transitional woodland, shrub
Beaches, dunes, sands
Inland wetlands
Coastal wetlands
Inland waters
Marine waters

The source for the above dataset classes, their respective ordering, and 43-to-19-class mappings can be found here:

https://git.tu-berlin.de/rsim/BigEarthNet-S2_19-classes_models/-/blob/master/label_indices.json

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/IGARSS.2019.8900532

__init__(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]¶

Initialize a new BigEarthNet dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train/val/test split to load
bands (str) – load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}
num_classes (int) – number of classes to load in target. one of {19, 43}
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

class torchgeo.datasets.BigEarthNetV2(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

BigEarthNetV2 dataset.

The BigEarthNet V2 dataset contains improved labels, improved geospatial data splits and additionally pixel-level labels from CORINE Land Cover (CLC) map of 2018. Additionally, some problematic patches from V1 have been removed.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2407.03653

New in version 0.7.

__init__(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)[source]¶

Initialize a new BigEarthNet V2 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train/val/test split to load
bands (str) – load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
AssertionError – If split, or bands, are not valid.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

BioMassters¶

class torchgeo.datasets.BioMassters(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]¶

Bases: NonGeoDataset

BioMassters Dataset for Aboveground Biomass prediction.

Dataset intended for Aboveground Biomass (AGB) prediction over Finnish forests based on Sentinel 1 and 2 data with corresponding target AGB mask values generated by Light Detection and Ranging (LiDAR).

Dataset Format:

.tif files for Sentinel 1 and 2 data
.tif file for pixel wise AGB target mask
.csv files for metadata regarding features and targets

Dataset Features:

13,000 target AGB masks of size (256x256px)
12 months of data per target mask
Sentinel 1 and Sentinel 2 data for each location
Sentinel 1 available for every month
Sentinel 2 available for almost every month (not available for every month due to ESA acquisition halt over the region during particular periods)

If you use this dataset in your research, please cite the following paper:

https://nascetti-a.github.io/BioMasster/

Note

This dataset can be downloaded from Torchgeo Hugging Face Hub.

New in version 0.5.

__init__(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]¶

Initialize a new instance of BioMassters dataset.

If as_time_series=False (the default), each time step becomes its own sample with the target being shared across multiple samples.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train or test split
sensors (Sequence[str]) – which sensors to consider for the sample, Sentinel 1 and/or Sentinel 2 (‘S1’, ‘S2’)
as_time_series (bool) – whether or not to return all available time-steps or just a single one for a given target location

Raises:

AssertionError – if split or sensors is invalid
DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and labels at that index
Raises:: IndexError – if index is out of range of the dataset
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the length of the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

BRIGHT¶

class torchgeo.datasets.BRIGHTDFC2025(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

BRIGHT DFC2025 dataset.

The BRIGHT dataset consists of bi-temporal high-resolution multimodal images for building damage assessment. The dataset is part of the 2025 IEEE GRSS Data Fusion Contest. The pre-disaster images are optical images and the post-disaster images are SAR images, and targets were manually annotated. The dataset is split into train, val, and test splits, but the test split does not contain targets in this version.

More information can be found at the Challenge website.

Dataset Features:

Pre-disaster optical images from MAXAR, NAIP, NOAA Digital Coast Raster Datasets, and the National Plan for Aerial Orthophotography Spain
Post-disaster SAR images from Capella Space and Umbra
high image resolution of 0.3-1m

Dataset Format:

Images are in GeoTIFF format with pixel dimensions of 1024x1024
Pre-disaster are three channel images
Post-disaster SAR images are single channel but repeated to have 3 channels

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2501.06019

New in version 0.7.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new BRIGHT DFC2025 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train/val/test split to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
AssertionError – If split is not one of ‘train’, ‘val’, or ‘test.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and target at that index, pre and post image are returned under separate image keys
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: number of samples in the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

CaBuAr¶

class torchgeo.datasets.CaBuAr(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

CaBuAr dataset.

CaBuAr is a dataset for Change detection for Burned area Delineation and part of the splits are used for the ChaBuD ECML-PKDD 2023 Discovery Challenge.

Dataset features:

Sentinel-2 multispectral imagery
binary masks of burned areas
12 multispectral bands
424 pairs of pre and post images with 20 m per pixel resolution (512x512 px)

Dataset format:

single hdf5 dataset containing images and masks

Dataset classes:

no change
burned area

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/MGRS.2023.3292467

Note

This dataset requires the following additional library to be installed:

h5py to load the dataset

New in version 0.6.

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new CaBuAr dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, “test”
bands (tuple[str, ...]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split or bands arguments are invalid.
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample containing image and mask
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

CaFFe¶

class torchgeo.datasets.CaFFe(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

CaFFe (CAlving Fronts and where to Find thEm) dataset.

The CaFFe dataset is a semantic segmentation dataset of marine-terminating glaciers.

Dataset features:

13,090 train, 2,241 validation, and 3,761 test images
varying spatial resolution of 6-20m
paired binary calving front segmentation masks
paired multi-class land cover segmentation masks

Dataset format:

images are single-channel pngs with dimension 512x512
segmentation masks are single-channel pngs

Dataset classes:

N/A
rock
glacier
ocean/ice melange

If you use this dataset in your research, please cite the following paper:

https://essd.copernicus.org/articles/14/4287/2022/

New in version 0.7.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new instance of CaFFe dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of images in the dataset.

__getitem__(idx)[source]¶

Return the image and mask at the given index.

Parameters:: idx (int) – index of the image and mask to return
Returns:: a dict containing the image and mask
Return type:: dict

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by CaFFe.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

ChaBuD¶

class torchgeo.datasets.ChaBuD(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

ChaBuD dataset.

ChaBuD is a dataset for Change detection for Burned area Delineation and is used for the ChaBuD ECML-PKDD 2023 Discovery Challenge.

Dataset features:

Sentinel-2 multispectral imagery
binary masks of burned areas
12 multispectral bands
356 pairs of pre and post images with 10 m per pixel resolution (512x512 px)

Dataset format:

single hdf5 dataset containing images and masks

Dataset classes:

no change
burned area

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.rse.2021.112603

Note

This dataset requires the following additional library to be installed:

h5py to load the dataset

New in version 0.6.

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new ChaBuD dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “val”
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split or bands arguments are invalid.
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample containing image and mask
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Cloud Cover Detection¶

class torchgeo.datasets.CloudCoverDetection(root='data', split='train', bands=('B02', 'B03', 'B04', 'B08'), transforms=None, download=False)[source]¶

Bases: NonGeoDataset

Sentinel-2 Cloud Cover Segmentation Dataset.

This training dataset was generated as part of a crowdsourcing competition on DrivenData.org, and later on was validated using a team of expert annotators. See this website for dataset details.

The dataset consists of Sentinel-2 satellite imagery and corresponding cloudy labels stored as GeoTiffs. There are 22,728 chips in the training data, collected between 2018 and 2020.

Each chip has:

4 multi-spectral bands from Sentinel-2 L2A product. The four bands are [B02, B03, B04, B08] (refer to Sentinel-2 documentation for more information about the bands).
Label raster for the corresponding source tile representing a binary classification for if the pixel is a cloud or not.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.34911/RDNT.HFQ6M7

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.4.

__init__(root='data', split='train', bands=('B02', 'B03', 'B04', 'B08'), transforms=None, download=False)[source]¶

Initiatlize a CloudCoverDetection instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – ‘train’ or ‘test’
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – If split or bands are invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of items in the dataset.

Returns:: length of dataset in integer
Return type:: int

__getitem__(index)[source]¶

Returns a sample from dataset.

Parameters:: index (int) – index to return
Returns:: data and label at given index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step – time step at which to access image, beginning with 0
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Copernicus-Pretrain¶

class torchgeo.datasets.CopernicusPretrain(*args, **kwargs)[source]¶

Bases: IterableDataset[dict[str, Any]]

Copernicus-Pretrain dataset.

Copernicus-Pretrain is an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P). The images are organized into ~310K regional grids (0.25°x0.25°, consistent with ERA5), densely covering the whole land surface and near-land ocean with time series from eight distinct Sentinel modalities.

This dataset class uses WebDataset for efficient data loading in distributed environments, which returns a PyTorch IterableDataset that is compatible with Pytorch DataLoader. Note: it is recommended to further use webdataset.WebLoader (a wrapper on DataLoader) for more features in data loading.

The full dataset has varying number of modalities, S1/2 local patches, and timestamps for different grids. For simplicity, the current dataset class provides a minimum example:

only use grids with all modalities (220k)
sample one local patch for S1 and S2
sample one timestamp for each modality

Therefore, each sample contains 8 tensors (S1, S2, S3, S5P NO2/CO/SO2/O3, DEM) and a JSON metadata.

Example:

dataset = CopernicusPretrain(
    urls='data/example-{000000..000009}.tar', shardshuffle=True, resampled=True
)

# Check the first sample
sample = next(iter(dataset))
s1 = sample['s1_grd.pth']
s2 = sample['s2_toa.pth']
s3 = sample['s3_olci.pth']
s5p_co = sample['s5p_co.pth']
s5p_no2 = sample['s5p_no2.pth']
s5p_o3 = sample['s5p_o3.pth']
s5p_so2 = sample['s5p_so2.pth']
dem = sample['dem.pth']

# Create a DataLoader for distributed training on 2 GPUs
dataset = dataset.dataset.batched(10) # batch size
dataloader = webdataset.WebLoader(
    dataset, batch_size=None, num_workers=2
)
# Unbatch, shuffle, and rebatch to mix samples from different workers
dataloader = dataloader.unbatched().shuffle(100).batched(10)
# A resampled dataset is infinite size, but we can recreate a fixed epoch length
# Total number of samples / (batch size * world size)
number_of_batches = 1000 // (10 * 2)
dataloader = dataloader.with_epoch(number_of_batches)

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2503.11849

Note

This dataset requires the following additional library to be installed:

https://pypi.org/project/webdataset/ to load the dataset.

New in version 0.7.

__init__(*args, **kwargs)[source]¶

Initialize a new CopernicusPretrain instance.

Parameters:

*args (Any) – Arguments passed to WebDataset base class.
**kwargs (Any) – Keyword arguments passed to WebDataset base class.

__iter__()[source]¶

Iterate over images and metadata in the dataset.

Yields:: sample of images and metadata

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – A sample returned by __iter__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Return type:

COWC¶

class torchgeo.datasets.COWC(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset, ABC

Abstract base class for the COWC dataset.

The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.

The dataset has the following attributes:

Data from overhead at 15 cm per pixel resolution at ground (all data is EO).
Data from six distinct locations: Toronto, Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; Columbus, Ohio and Utah, United States.
32,716 unique annotated cars. 58,247 unique negative examples.
Intentional selection of hard negative examples.
Established baseline for detection and counting tasks.
Extra testing scenes for use after validation.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1007/978-3-319-46487-9_48

abstract property base_url: str¶: Base URL to download dataset from.

abstract property filenames: tuple[str, ...]¶: List of files to download.

abstract property md5s: tuple[str, ...]¶: List of MD5 checksums of files to download.

abstract property filename: str¶: Filename containing train/test split and target labels.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new COWC dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

class torchgeo.datasets.COWCCounting(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: COWC

COWC Dataset for car counting.

class torchgeo.datasets.COWCDetection(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: COWC

COWC Dataset for car detection.

CropHarvest¶

class torchgeo.datasets.CropHarvest(root='data', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

CropHarvest dataset.

CropHarvest is a crop classification dataset.

Dataset features:

single pixel time series with crop-type labels
18 bands per image over 12 months

Dataset format:

arrays are 12x18 with 18 bands over 12 months

Dataset properties:

is_crop - whether or not a single pixel contains cropland
classification_label - optional field identifying a specific crop type
dataset - source dataset for the imagery
lat - latitude
lon - longitude

If you use this dataset in your research, please cite the following paper:

https://openreview.net/forum?id=JtjzUXPEaCu

This dataset requires the following additional library to be installed:

h5py to load the dataset

New in version 0.6.

__init__(root='data', transforms=None, download=False, checksum=False)[source]¶

Initialize a new CropHarvest dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: single pixel time-series array and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, suptitle=None)[source]¶

Plot a sample from the dataset using bands for Agriculture RGB composite.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Kenya Crop Type¶

class torchgeo.datasets.CV4AKenyaCropType(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False)[source]¶

Bases: NonGeoDataset

CV4A Kenya Crop Type Competition dataset.

The CV4A Kenya Crop Type Competition dataset was produced as part of the Crop Type Detection competition at the Computer Vision for Agriculture (CV4A) Workshop at the ICLR 2020 conference. The objective of the competition was to create a machine learning model to classify fields by crop type from images collected during the growing season by the Sentinel-2 satellites.

See the dataset documentation for details.

Consists of 4 tiles of Sentinel 2 imagery from 13 different points in time.

Each tile has:

13 multi-band observations throughout the growing season. Each observation includes 12 bands from Sentinel-2 L2A product, and a cloud probability layer. The twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12] (refer to Sentinel-2 documentation for more information about the bands). The cloud probability layer is a product of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) and provides an estimated cloud probability (0-100%) per pixel. All of the bands are mapped to a common 10 m spatial resolution grid.
A raster layer indicating the crop ID for the fields in the training set.
A raster layer indicating field IDs for the fields (both training and test sets). Fields with a crop ID 0 are the test fields.

There are 3,286 fields in the train set and 1,402 fields in the test set.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.34911/RDNT.DW605X

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

__init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False)[source]¶

Initialize a new CV4A Kenya Crop Type Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – If bands are invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data, labels, field ids, and metadata at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of chips in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

DeepGlobe Land Cover¶

class torchgeo.datasets.DeepGlobeLandCover(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

DeepGlobe Land Cover Classification Challenge dataset.

The DeepGlobe Land Cover Classification Challenge dataset offers high-resolution sub-meter satellite imagery focusing for the task of semantic segmentation to detect areas of urban, agriculture, rangeland, forest, water, barren, and unknown. It contains 1,146 satellite images of size 2448 x 2448 pixels in total, split into training/validation/test sets, the original dataset can be downloaded from Kaggle. However, we only use the training dataset with 803 images since the original test and valid dataset are not accompanied by labels. The dataset that we use with a custom train/test split can be downloaded from Kaggle (created as a part of Computer Vision by Deep Learning (CS4245) course offered at TU Delft).

Dataset format:

images are RGB data
masks are RGB image with with unique RGB values representing the class

Dataset classes:

Urban land
Agriculture land
Rangeland
Forest land
Water
Barren land
Unknown

File names for satellite images and the corresponding mask image are id_sat.jpg and id_mask.png, where id is an integer assigned to every image.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/pdf/1805.06561

Note

This dataset can be downloaded using:

$ pip install kaggle  # place api key at ~/.kaggle/kaggle.json
$ kaggle datasets download -d geoap96/deepglobe2018-landcover-segmentation-traindataset
$ unzip deepglobe2018-landcover-segmentation-traindataset.zip

New in version 0.3.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new DeepGlobeLandCover dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Return type:

DFC2022¶

class torchgeo.datasets.DFC2022(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

DFC2022 dataset.

The DFC2022 dataset is used as a benchmark dataset for the 2022 IEEE GRSS Data Fusion Contest and extends the MiniFrance dataset for semi-supervised semantic segmentation. The dataset consists of a train set containing labeled and unlabeled imagery and an unlabeled validation set. The dataset can be downloaded from the IEEEDataPort DFC2022 website.

Dataset features:

RGB aerial images at 0.5 m per pixel spatial resolution (~2,000x2,0000 px)
DEMs at 1 m per pixel spatial resolution (~1,000x1,0000 px)
Masks at 0.5 m per pixel spatial resolution (~2,000x2,0000 px)
16 land use/land cover categories
Images collected from the IGN BD ORTHO database
DEMs collected from the IGN RGE ALTI database
Labels collected from the UrbanAtlas 2012 database
Data collected from 19 regions in France

Dataset format:

images are three-channel geotiffs
DEMS are single-channel geotiffs
masks are single-channel geotiffs with the pixel values represent the class

Dataset classes:

No information
Urban fabric
Industrial, commercial, public, military, private and transport units
Mine, dump and construction sites
Artificial non-agricultural vegetated areas
Arable land (annual crops)
Permanent crops
Pastures
Complex and mixed cultivation patterns
Orchards at the fringe of urban classes
Forests
Herbaceous vegetation associations
Open spaces with little or no vegetation
Wetlands
Water
Clouds and Shadows

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1007/s10994-020-05943-y

New in version 0.3.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new DFC2022 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split is invalid
DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

DIOR¶

class torchgeo.datasets.DIOR(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

DIOR dataset.

DIOR dataset contains horizontal bounding box annotations of Google Earth Aerial RGB imagery. The test split does not contain bounding box annotations and labels.

Dataset features:

20 classes
192,472 manually annotated bounding box instances

Dataset format:

Images are three channel .jpg files.
Annotations are in Pascal VOC XML format

Classes:

Airplane
Airport
Baseball Field
Basketball Court
Bridge
Chimney
Dam
Expressway Service Area
Expressway Toll Station
Golf Field
Ground Track Field
Harbor
Overpass
Ship
Stadium
Storage Tank
Tennis Court
Train Station
Vehicle
Windmill

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1909.00133

New in version 0.7.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new DIOR dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['train', 'val', 'test']) – split of the dataset to use, one of ‘train’, ‘val’, ‘test’
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found or corrupted and download is False.
AssertionError – If split argument is invalid.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(idx)[source]¶

Return an index within the dataset.

Parameters:: idx (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None, box_alpha=0.7)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
box_alpha (float) – alpha value for boxes

Returns:

a matplotlib Figure with the rendered sample

Return type:

Digital Typhoon¶

class torchgeo.datasets.DigitalTyphoon(root='data', task='regression', features=['wind'], targets=['wind'], sequence_length=3, min_feature_value=None, max_feature_value=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Digital Typhoon Dataset for Analysis Task.

This dataset contains typhoon-centered images, derived from hourly infrared channel images captured by meteorological satellites. It incorporates data from multiple generations of the Himawari weather satellite, dating back to 1978. These images have been transformed into brightness temperatures and adjusted for varying satellite sensor readings, yielding a consistent spatio-temporal dataset that covers over four decades.

See the Digital Typhoon website for more information about the dataset.

Dataset features:

infrared channel images from the Himawari weather satellite (512x512 px) at 5km spatial resolution
auxiliary features such as wind speed, pressure, and more that can be used for regression or classification tasks
1,099 typhoons and 189,364 images

Dataset format:

hdf5 files containing the infrared channel images
.csv files containing the metadata for each image

If you use this dataset in your research, please cite the following papers:

https://doi.org/10.20783/DIAS.664

New in version 0.6.

__init__(root='data', task='regression', features=['wind'], targets=['wind'], sequence_length=3, min_feature_value=None, max_feature_value=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new Digital Typhoon dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
task (str) – whether to load ‘regression’ or ‘classification’ labels
features (Sequence[str]) – which auxiliary features to return
targets (Sequence[str]) – which auxiliary features to use as targets
sequence_length (int) – length of the sequence to return
min_feature_value (dict[str, float] | None) – minimum value for each feature
max_feature_value (dict[str, float] | None) – maximum value for each feature
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If any arguments are invalid.
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data, labels, and metadata at that index
Return type:: dict[str, Any]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

DL4GAM¶

class torchgeo.datasets.DL4GAMAlps(root='data', split='train', cv_iter=1, version='small', bands=('B4', 'B3', 'B2', 'B8', 'B11'), extra_features=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

A Multi-modal Dataset for Glacier Mapping (Segmentation) in the European Alps.

The dataset consists of Sentinel-2 images from 2015 (mainly), 2016 and 2017, and binary segmentation masks for glaciers, based on an inventory built by glaciology experts (Paul et al. 2020).

Given that glacier ice is not always visible in the images, due to seasonal snow, shadow/cloud cover and, most importantly, debris cover, the dataset also includes additional features that can help in the segmentation task.

Dataset features:

Sentinel-2 images (all bands, including cloud and shadow masks which can be used for loss masking)
glacier mask (0: no glacier, 1: glacier)
debris mask (0: no debris, 1: debris) based on a mix of three publications (Scherler et al. 2018, Herreid & Pellicciotti 2020, Linsbauer et al. 2021)
DEM (Copernicus GLO-30) + five derived features (using xDEM): slope, aspect, terrain ruggedness index, planform and profile curvatures
dh/dt (surface elevation change) map over 2010-2015 (Hugonnet et al. 2021)
v (surface velocity) map over 2015 (ITS_LIVE)

Other specifications:

temporal coverage: one acquisition per glacier, from either 2015 (mainly), 2016, or 2017
spatial coverage: only glaciers larger than 0.1 km² are considered (n=1593, after manual QC), totalling ~1685 km² which represents ~93% of the total inventory area for this region
2251 patches sampled with overlap from the 1593 glaciers; or 11440 for the large version, obtained with an increased sampling overlap
the dataset download size is 5.8 GB (11 GB when unarchived); or 29.5 GB (52 GB when unarchived) for the large version
the dataset is provided at 10m GSD (after bilinearly resampling some of the Sentinel-2 bands and the additional features which come at a lower resolution)
the dataset provides fixed training, validation, and test geographical splits (70-10-20, by glacier area)
five different splits are provided, according to a five-fold cross-validation scheme
all the features/masks are stacked and provided as NetCDF files (one or more per glacier), structured as data/{glacier_id}/{glacier_id}_{patch_number}_{center_x}_{center_y}.nc
data is projected and geocoded in local UTM zones

For more details check also: https://huggingface.co/datasets/dcodrut/dl4gam_alps

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.22541/essoar.173557607.70204641/v1

Note

This dataset requires the following additional libraries to be installed:

xarray
netcdf4 or h5netcdf

New in version 0.7.

__init__(root='data', split='train', cv_iter=1, version='small', bands=('B4', 'B3', 'B2', 'B8', 'B11'), extra_features=None, transforms=None, download=False, checksum=False)[source]¶

Initialize the dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
cv_iter (int) – one of 1, 2, 3, 4, 5 (for the five-fold geographical cross-validation scheme)
version (str) – one of “small” or “large” (controls the sampling overlap)
bands (Sequence[str]) – the Sentinel-2 bands to use as input (default: RGB + NIR + SWIR)
extra_features (collections.abc.Sequence[str] | None) – additional features to include (default: None; see the class attribute for the available)
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if any parameters are invalid.
DatasetNotFoundError – if dataset is not found and download is False.
DependencyNotFoundError – if xarray is not installed.

__len__()[source]¶

The length of the dataset.

Returns:: the number of patches in the dataset
Return type:: int

__getitem__(index)[source]¶

Load the NetCDF file for the given index and return the sample as a dict.

Parameters:

index (int) – index of the sample to return

Returns:

a dictionary containing the sample with the following:

the Sentinel-2 image (selected bands)

the glacier mask (binary mask with all the glaciers in the current patch)

the debris mask

the cloud and shadow mask

the additional features (DEM, derived features, etc.) if required

Return type:

dict

plot(sample, show_titles=True, suptitle=None, clip_extrema=True)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by DL4GAMAlps.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
clip_extrema (bool) – flag indicating whether to clip the lowest/highest 2.5% of the values for contrast enhancement

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

DOTA¶

class torchgeo.datasets.DOTA(root='data', split='train', version='2.0', bbox_orientation='oriented', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

DOTA dataset.

DOTA is a large-scale object detection dataset for aerial imagery containing RGB and gray-scale imagery from Google Earth, GF-2 and JL-1 satellites as well as additional aerial imagery from CycloMedia. There are three versions of the dataset: v1.0, v1.5, and v2.0, where, v1.0 and v1.5 have the same images but different annotations, and v2.0 extends both the images and annotations with more samples

Dataset features:

1869 samples in v1.0 and v1.5 and 2423 samples in v2.0
multi-class object detection (15 classes in v1.0 and v1.5 and 18 classes in v2.0)
horizontal and oriented bounding boxes

Dataset format:

images are three channel PNGs with various pixel sizes
annotations are text files with one line per bounding box

Classes:

plane
ship
storage-tank
baseball-diamond
tennis-court
basketball-court
ground-track-field
harbor
bridge
large-vehicle
small-vehicle
helicopter
roundabout
soccer-ball-field
swimming-pool
container-crane (v1.5+)
airport (v2.0+)
helipad (v2.0+)

If you use this work in your research, please cite the following papers:

New in version 0.7.

__init__(root='data', split='train', version='2.0', bbox_orientation='oriented', transforms=None, download=False, checksum=False)[source]¶

Initialize a new DOTA dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['train', 'val']) – split of the dataset to use, one of [‘train’, ‘val’]
version (Literal['1.0', '1.5', '2.0']) – version of the dataset to use, one of [‘1.0’, ‘1.5’, ‘2.0’]
bbox_orientation (Literal['horizontal', 'oriented']) – bounding box orientation, one of [‘horizontal’, ‘oriented’], where horizontal returnx xyxy format and oriented returns x1y1x2y2x3y3x4y4 format
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split, version, or bbox_orientation argument are not valid
DatasetNotFoundError – if dataset is not found or corrupted, and download is False

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None, box_alpha=0.7)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__
show_titles (bool) – flag indicating whether to show titles
suptitle (str | None) – optional string to use as a suptitle
box_alpha (float) – alpha value for boxes

Returns:

a matplotlib Figure with the rendered sample

Return type:

ETCI2021 Flood Detection¶

class torchgeo.datasets.ETCI2021(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

ETCI 2021 Flood Detection dataset.

The ETCI2021 dataset is a dataset for flood detection

Dataset features:

33,405 VV & VH Sentinel-1 Synthetic Aperture Radar (SAR) images
2 binary masks per image representing water body & flood, respectively
2 polarization band images (VV, VH) of 3 RGB channels per band
3 RGB channels per band generated by the Hybrid Pluggable Processing Pipeline (hyp3)
Images with 5x20m per pixel resolution (256x256) px) taken in Interferometric Wide Swath acquisition mode
Flood events from 5 different regions

Dataset format:

VV band three-channel png
VH band three-channel png
water body mask single-channel png where no water body = 0, water body = 255
flood mask single-channel png where no flood = 0, flood = 255

Dataset classes:

no flood/water
flood/water

If you use this dataset in your research, please add the following to your acknowledgements section:

The authors would like to thank the NASA Earth Science Data Systems Program,
NASA Digital Transformation AI/ML thrust, and IEEE GRSS for organizing
the ETCI competition.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new ETCI 2021 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

EuroSAT¶

class torchgeo.datasets.EuroSAT(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B8A'), transforms=None, download=False, checksum=False)[source]¶

EuroSAT dataset.

The EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consists of 10 target classes with a total of 27,000 labeled and geo-referenced images.

Dataset format:

rasters are 13-channel GeoTiffs
labels are values in the range [0,9]

Dataset classes:

Annual Crop
Forest
Herbaceous Vegetation
Highway
Industrial Buildings
Pasture
Permanent Crop
Residential Buildings
River
Sea & Lake

This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:

https://arxiv.org/abs/1911.06721

If you use this dataset in your research, please cite the following papers:

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B8A'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new EuroSAT dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
bands (Sequence[str]) – a sequence of band names to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

New in version 0.3: The bands parameter.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

class torchgeo.datasets.EuroSATSpatial(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B8A'), transforms=None, download=False, checksum=False)[source]¶

Bases: EuroSAT

Overrides the default EuroSAT dataset splits.

Splits the data into training, validation, and test sets based on longitude. The splits are distributed as 60%, 20%, and 20% respectively.

New in version 0.6.

class torchgeo.datasets.EuroSAT100(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B8A'), transforms=None, download=False, checksum=False)[source]¶

Bases: EuroSAT

Subset of EuroSAT containing only 100 images.

Intended for tutorials and demonstrations, not for benchmarking.

Maintains the same file structure, classes, and train-val-test split. Each class has 10 images (6 train, 2 val, 2 test), for a total of 100 images.

New in version 0.5.

EverWatch¶

class torchgeo.datasets.EverWatch(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

EverWatch Bird Detection dataset.

The EverWatch Bird Detection dataset contains high-resolution aerial images of birds in the Everglades National Park. Seven bird species haven been annotated and classified.

Dataset features:

5128 training images with 50491 annotations
197 test images with 4113 annotations
seven different bird species

Dataset format:

images are three-channel pngs
annotations are csv file

Dataset Classes:

White Ibis (Eudocimus albus)
Great Egret (Ardea alba)
Great Blue Heron (Ardea herodias)
Snowy Egret (Egretta thula)
Wood Stork (Mycteria americana)
Roseate Spoonbill (Platalea ajaja)
Anhinga (Anhinga anhinga)
Unknown White (only present in test split)

If you use this dataset in your research, please cite the following source:

https://doi.org/10.5281/zenodo.11165946

New in version 0.7.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new EverWatch dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of {‘train’, ‘val’, ‘test’} to specify the dataset split
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
AssertionError – If split argument is invalid.

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, suptitle=None, box_alpha=0.7)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
suptitle (str | None) – optional string to use as a suptitle
box_alpha (float) – alpha value for boxes

Returns:

a matplotlib Figure with the rendered sample

Return type:

FAIR1M¶

class torchgeo.datasets.FAIR1M(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

FAIR1M dataset.

The FAIR1M dataset is a dataset for remote sensing fine-grained oriented object detection.

Dataset features:

15,000+ images with 0.3-0.8 m per pixel resolution (1,000-10,000 px)
1 million object instances
5 object categories, 37 object sub-categories
three spectral bands - RGB
images taken by Gaofen satellites and Google Earth

Dataset format:

images are three-channel tiffs
labels are xml files with PASCAL VOC like annotations

Dataset classes:

Passenger Ship
Motorboat
Fishing Boat
Tugboat
other-ship
Engineering Ship
Liquid Cargo Ship
Dry Cargo Ship
Warship
Small Car
Bus
Cargo Truck
Dump Truck
other-vehicle
Van
Trailer
Tractor
Excavator
Truck Tractor
Boeing737
Boeing747
Boeing777
Boeing787
ARJ21
C919
A220
A321
A330
A350
other-airplane
Baseball Field
Basketball Court
Football Field
Tennis Court
Roundabout
Intersection
Bridge

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.isprsjprs.2021.12.004

New in version 0.2.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new FAIR1M dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: Added split and download parameters.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Fields Of The World¶

class torchgeo.datasets.FieldsOfTheWorld(root='data', split='train', target='2-class', countries=['austria'], transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Fields Of The World dataset.

The Fields Of The World datataset is a semantic and instance segmentation dataset for delineating field boundaries.

Dataset features:

70462 patches across 24 countries
Each country has a train, val, and test split
Semantic segmentations masks with and without the field boundary class
Instance segmentation masks

Dataset format:

images are four-channel GeoTIFFs with dimension 256x256
segmentation masks (both two and three class) are single-channel GeoTIFFs
instance masks are single-channel GeoTIFFs

Dataset classes:

background
field
field-boundary (three-class only)
unlabeled (kenya, rwanda, brazil and india have presence only labels)

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.48550/arXiv.2409.16252

New in version 0.7.

__init__(root='data', split='train', target='2-class', countries=['austria'], transforms=None, download=False, checksum=False)[source]¶

Initialize a new Fields Of The World dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
target (str) – one of “2-class”, “3-class”, or “instance” specifying which kind of target mask to load
countries (str | collections.abc.Sequence[str]) – which set of countries to load data from
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If any arguments are invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: image and mask at that index with image of dimension 3x1024x1024 and mask of dimension 1024x1024
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of datapoints in the dataset.

Returns:: length of dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

FireRisk¶

class torchgeo.datasets.FireRisk(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

FireRisk dataset.

The FireRisk dataset is a dataset for remote sensing fire risk classification.

Dataset features:

91,872 images with 1 m per pixel resolution (320x320 px)
70,331 and 21,541 train and val images, respectively
three spectral bands - RGB
7 fire risk classes
images extracted from NAIP tiles

Dataset format:

images are three-channel pngs

Dataset classes:

high
low
moderate
non-burnable
very_high
very_low
water

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2303.07035

New in version 0.5.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new FireRisk dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “val”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Forest Damage¶

class torchgeo.datasets.ForestDamage(root='data', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Forest Damage dataset.

The ForestDamage dataset contains drone imagery that can be used for tree identification, as well as tree damage classification for larch trees.

Dataset features:

1543 images
101,878 tree annotations
subset of 840 images contain 44,522 annotations about tree health (Healthy (H), Light Damage (LD), High Damage (HD)), all other images have “other” as damage level

Dataset format:

images are three-channel jpgs
annotations are in Pascal VOC XML format

Dataset Classes:

other
healthy
light damage
high damage

If the download fails or stalls, it is recommended to try azcopy as suggested here. It is expected that the downloaded data file with name Data_Set_Larch_Casebearer can be found in root.

If you use this dataset in your research, please use the following citation:

Swedish Forest Agency (2021): Forest Damages - Larch Casebearer 1.0. National Forest Data Lab. Dataset.

New in version 0.3.

__init__(root='data', transforms=None, download=False, checksum=False)[source]¶

Initialize a new ForestDamage dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

GeoNRW¶

class torchgeo.datasets.GeoNRW(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

GeoNRW dataset.

This datasets contains RGB, DEM and segmentation label data from North Rhine-Westphalia, Germany.

Dataset features:

7298 training and 485 test samples
RGB images, 1000x1000px normalized to [0, 1]
DEM images, unnormalized
segmentation labels

Dataset format:

RGB images are three-channel jp2
DEM images are single-channel tif
segmentation labels are single-channel tif

Dataset classes:

background
forest
water
agricultural
residential,commercial,industrial
grassland,swamp,shrubbery
railway,trainstation
highway,squares
airport,shipyard
roads
buildings

Additional information about the dataset can be found on this site.

If you use this dataset in your research, please cite the following paper:

https://ieeexplore.ieee.org/document/9406194

New in version 0.6.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize the GeoNRW dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

GID-15¶

class torchgeo.datasets.GID15(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

GID-15 dataset.

The GID-15 dataset is a dataset for semantic segmentation.

Dataset features:

images taken by the Gaofen-2 (GF-2) satellite over 60 cities in China
masks representing 15 semantic categories
three spectral bands - RGB
150 with 3 m per pixel resolution (6800x7200 px)

Dataset format:

images are three-channel pngs
masks are single-channel pngs
colormapped masks are 3 channel tifs

Dataset classes:

background
industrial_land
urban_residential
rural_residential
traffic_land
paddy_field
irrigated_land
dry_cropland
garden_plot
arbor_woodland
shrub_land
natural_grassland
artificial_grassland
river
lake
pond

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.rse.2019.111322

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new GID-15 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample return by __getitem__()
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

HySpecNet-11k¶

class torchgeo.datasets.HySpecNet11k(root='data', split='train', strategy='easy', bands=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

HySpecNet-11k dataset.

HySpecNet-11k is a large-scale benchmark dataset for hyperspectral image compression and self-supervised learning. It is made up of 11,483 nonoverlapping image patches acquired by the EnMAP satellite. Each patch is a portion of 128 x 128 pixels with 224 spectral bands and with a ground sample distance of 30 m.

To construct HySpecNet-11k, a total of 250 EnMAP tiles acquired during the routine operation phase between 2 November 2022 and 9 November 2022 were considered. The considered tiles are associated with less than 10% cloud and snow cover. The tiles were radiometrically, geometrically and atmospherically corrected (L2A water & land product). Then, the tiles were divided into nonoverlapping image patches. The cropped patches at the borders of the tiles were eliminated. As a result, more than 45 patches per tile are obtained, resulting in 11,483 patches for the full dataset.

We provide predefined splits obtained by randomly dividing HySpecNet into:

a training set that includes 70% of the patches,
a validation set that includes 20% of the patches, and
a test set that includes 10% of the patches.

Depending on the way that we used for splitting the dataset, we define two different splits:

an easy split, where patches from the same tile can be present in different sets (patchwise splitting); and
a hard split, where all patches from one tile belong to the same set (tilewise splitting).

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2306.00385

New in version 0.7.

__init__(root='data', split='train', strategy='easy', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new HySpecNet11k instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (str) – One of ‘train’, ‘val’, or ‘test’.
strategy (str) – Either ‘easy’ for patchwise splitting or ‘hard’ for tilewise splitting.
bands (collections.abc.Sequence[str] | None) – Bands to return.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: Length of the dataset.
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and label at that index.
Return type:: dict[str, torch.Tensor]

plot(sample, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A sample returned by __getitem__().
suptitle (str | None) – optional string to use as a suptitle

Returns:

A matplotlib Figure with the rendered sample.

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

IDTReeS¶

class torchgeo.datasets.IDTReeS(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

IDTReeS dataset.

The IDTReeS dataset is a dataset for tree crown detection.

Dataset features:

RGB Image, Canopy Height Model (CHM), Hyperspectral Image (HSI), LiDAR Point Cloud
Remote sensing and field data generated by the National Ecological Observatory Network (NEON)
0.1 - 1m resolution imagery
Task 1 - object detection (tree crown delination)
Task 2 - object classification (species classification)
Train set contains 85 images
Test set (task 1) contains 153 images
Test set (task 2) contains 353 images and tree crown polygons

Dataset format:

optical - three-channel RGB 200x200 geotiff
canopy height model - one-channel 20x20 geotiff
hyperspectral - 369-channel 20x20 geotiff
point cloud - Nx3 LAS file (.las), some files contain RGB colors per point
shapely files (.shp) containing polygons
csv file containing species labels and other metadata for each polygon

Dataset classes:

ACPE
ACRU
ACSA3
AMLA
BETUL
CAGL8
CATO6
FAGR
GOLA
LITU
LYLU3
MAGNO
NYBI
NYSY
OXYDE
PEPA37
PIEL
PIPA2
PINUS
PITA
PRSE2
QUAL
QUCO2
QUGE2
QUHE2
QULA2
QULA3
QUMO4
QUNI
QURU
QUERC
ROPS
TSCA

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1101/2021.08.06.453503

This dataset requires the following additional library to be installed:

laspy to read lidar point clouds

New in version 0.2.

__init__(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]¶

Initialize a new IDTReeS dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
task (str) – ‘task1’ for detection, ‘task2’ for detection + classification (only relevant for split=’test’)
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If laspy is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, hsi_indices=(0, 1, 2))[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
hsi_indices (tuple[int, int, int]) – tuple of indices to create HSI false color image

Returns:

a matplotlib Figure with the rendered sample

Return type:

Inria Aerial Image Labeling¶

class torchgeo.datasets.InriaAerialImageLabeling(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

Inria Aerial Image Labeling Dataset.

The Inria Aerial Image Labeling dataset is a building detection dataset over dissimilar settlements ranging from densely populated areas to alpine towns. Refer to the dataset homepage to download the dataset.

Dataset features:

Coverage of 810 km² (405 km² for training and 405 km² for testing)
Aerial orthorectified color imagery with a spatial resolution of 0.3 m
Number of images: 360 (train: 180, test: 180)
Train cities: Austin, Chicago, Kitsap, West Tyrol, Vienna
Test cities: Bellingham, Bloomington, Innsbruck, San Francisco, East Tyrol

Dataset format:

Imagery - RGB aerial GeoTIFFs of shape 5000 x 5000
Labels - RGB aerial GeoTIFFs of shape 5000 x 5000

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/IGARSS.2017.8127684

New in version 0.3.

Changed in version 0.5: Added support for a val split.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new InriaAerialImageLabeling Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train/val/test split
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version.
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split is invalid
DatasetNotFoundError – If dataset is not found.

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

LandCover.ai¶

class torchgeo.datasets.LandCoverAI(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: LandCoverAIBase, NonGeoDataset

LandCover.ai dataset.

See the abstract LandCoverAIBase class to find out more.

Note

This dataset requires the following additional library to be installed:

opencv-python to generate the train/val/test split

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new LandCover.ai dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

class torchgeo.datasets.LandCoverAI100(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: LandCoverAI

Subset of LandCoverAI containing only 100 images.

Intended for tutorials and demonstrations, not for benchmarking.

Maintains the same file structure, classes, and train-val-test split.

New in version 0.7.

LEVIR-CD¶

class torchgeo.datasets.LEVIRCDBase(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset, ABC

Abstract base class for the LEVIRCD datasets.

New in version 0.6.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new LEVIR-CD base dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

class torchgeo.datasets.LEVIRCD(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: LEVIRCDBase

LEVIR-CD dataset.

The LEVIR-CD dataset is a dataset for building change detection.

Dataset features:

image pairs of 20 different urban regions across Texas between 2002-2018
binary change masks representing building change
three spectral bands - RGB
637 image pairs with 50 cm per pixel resolution (~1024x1024 px)

Dataset format:

images are three-channel pngs
masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

no change
change

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.3390/rs12101662

New in version 0.6.

LEVIR-CD+¶

class torchgeo.datasets.LEVIRCDPlus(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: LEVIRCDBase

LEVIR-CD+ dataset.

The LEVIR-CD+ dataset is a dataset for building change detection.

Dataset features:

image pairs of 20 different urban regions across Texas between 2002-2020
binary change masks representing building change
three spectral bands - RGB
985 image pairs with 50 cm per pixel resolution (~1024x1024 px)

Dataset format:

images are three-channel pngs
masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

no change
change

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2107.09244

LoveDA¶

class torchgeo.datasets.LoveDA(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

LoveDA dataset.

The LoveDA datataset is a semantic segmentation dataset.

Dataset features:

2713 urban scene and 3274 rural scene HSR images, spatial resolution of 0.3m
image source is Google Earth platform
total of 166768 annotated objects from Nanjing, Changzhou and Wuhan cities
dataset comes with predefined train, validation, and test set
dataset differentiates between ‘rural’ and ‘urban’ images

Dataset format:

images are three-channel pngs with dimension 1024x1024
segmentation masks are single-channel pngs

Dataset classes:

background
building
road
water
barren
forest
agriculture

No-data regions assigned with 0 and should be ignored.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2110.08733

New in version 0.2.

__init__(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]¶

Initialize a new LoveDA dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
scene (Sequence[str]) – specify whether to load only ‘urban’, only ‘rural’ or both
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split or scene arguments are invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: image and mask at that index with image of dimension 3x1024x1024 and mask of dimension 1024x1024
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of datapoints in the dataset.

Returns:: length of dataset
Return type:: int

plot(sample, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample return by __getitem__()
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

MapInWild¶

class torchgeo.datasets.MapInWild(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

MapInWild dataset.

The MapInWild dataset is curated for the task of wilderness mapping on a pixel-level. MapInWild is a multi-modal dataset and comprises various geodata acquired and formed from different RS sensors over 1018 locations: dual-pol Sentinel-1, four-season Sentinel-2 with 10 bands, ESA WorldCover map, and Visible Infrared Imaging Radiometer Suite NightTime Day/Night band. The dataset consists of 8144 images with the shape of 1920 x 1920 pixels. The images are weakly annotated from the World Database of Protected Areas (WDPA).

Dataset features:

1018 areas globally sampled from the WDPA
10-Band Sentinel-2
Dual-pol Sentinel-1
ESA WorldCover Land Cover
Visible Infrared Imaging Radiometer Suite NightTime Day/Night Band

If you use this dataset in your research, please cite the following paper:

https://ieeexplore.ieee.org/document/10089830

New in version 0.5.

__init__(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new MapInWild dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
modality (list[str]) – the modality to download. Choose from: “mask”, “esa_wc”, “viirs”, “s1”, “s2_temporal_subset”, “s2_[season]”.
split (str) – one of “train”, “validation”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample image-mask pair returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

MDAS¶

class torchgeo.datasets.MDAS(root='data', subareas=['sub_area_1'], modalities=['3K_RGB', 'HySpex', 'Sentinel_2'], transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

MDAS dataset.

The MDAS multimodal dataset is a comprehensive dataset for the city of Augsburg, Germany, collected on 7th May 2018. It includes SAR, multispectral, hyperspectral, DSM, and GIS data, providing comprehensive options for data fusion research. MDAS supports applications like resolution enhancement, spectral unmixing, and land cover classification.

Dataset features:

3K DSM data
3K high resolution RGB images
Original very high resolution HySpex airborne imagery
EeteS simulated imagery with 10m GSD and EnMAP spectral bands
EeteS simulated imagery with 30m GSD and EnMAP spectral bands
EeteS simulated imagery with 10m GSD and Sentinel-2 spectral bands
Sentinel-2 L2A product
Sentinel-1 GRD product
Open Street Map (OSM) labels, see this table for a table of the label distribution

Dataset format:

3K_RGB.tif (Shape: (4, 15000, 18000)px, Data Type: uint8)
3K_dsm.tif (Shape: (1, 10000, 12000)px, Data Type: float32)
HySpex.tif (Shape: (368, 1364, 1636)px, Data Type: int16)
EeteS_EnMAP_2dot2m.tif (Shape: (242, 1364, 1636)px, Data Type: float32)
EeteS_EnMAP_10m.tif (Shape: (242, 300, 360)px, Data Type: uint16)
EeteS_EnMAP_30m.tif (Shape: (242, 100, 120)px, Data Type: uint16)
EeteS_Sentinel_2_10m.tif (Shape: (4, 300, 360)px, Data Type: uint16)
Sentinel_2.tif (Shape: (12, 300, 360)px, Data Type: uint16)
Sentinel_1.tif (Shape: (2, 300, 360)px, Data Type: float32)
osm_buildings.tif (Shape: (1, 1364, 1636)px, Data Type: uint8)
osm_landuse.tif (Shape: (1, 1364, 1636)px, Data Type: float64)
osm_water.tif (Shape: (1, 1364, 1636)px, Data Type: float64)

If you use this dataset in your research, please cite the following paper:

https://essd.copernicus.org/articles/15/113/2023/

New in version 0.7.

__init__(root='data', subareas=['sub_area_1'], modalities=['3K_RGB', 'HySpex', 'Sentinel_2'], transforms=None, download=False, checksum=False)[source]¶

Initialize a new MDAS dataset instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where the dataset should be stored.
subareas (list[str]) – The subareas to load. Options are ‘sub_area_1’, ‘sub_area_2’, ‘sub_area_3’.
modalities (list[str]) – The modalities to load. Options are ‘3K_DSM’, ‘3K_RGB’, ‘HySpex’, ‘EeteS_EnMAP_10m’, ‘EeteS_EnMAP_30m’, ‘EeteS_Sentinel_2_10m’, ‘Sentinel-2’, ‘Sentinel-1’, ‘OSM_label’.
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes in a dictionary and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – If True, check the integrity of the dataset after download.

Raises:

AssertionError – If the subareas or modalities are not valid.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: the length of the dataset
Return type:: int

__getitem__(idx)[source]¶

Return the dataset sample at the given index.

Parameters:: idx (int) – The index of the sample to return
Returns:: a dictionary containing the data of chosen modalities
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A sample returned by __getitem__.
show_titles (bool) – Whether to display titles on the subplots.
suptitle (str | None) – An optional super title for the plot.

Returns:

a matplotlib Figure with the rendered sample

Return type:

Million-AID¶

class torchgeo.datasets.MillionAID(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

Million-AID Dataset.

The MillionAID dataset consists of one million aerial images from Google Earth Engine that offers either a multi-class learning task with 51 classes or a multi-label learning task with 73 different possible labels. For more details please consult the accompanying paper.

Dataset features:

RGB aerial images with varying resolutions from 0.5 m to 153 m per pixel
images within classes can have different pixel dimension

Dataset format:

images are three-channel jpg

If you use this dataset in your research, please cite the following paper:

https://ieeexplore.ieee.org/document/9393553

New in version 0.3.

__init__(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]¶

Initialize a new MillionAID dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
task (str) – type of task, either “multi-class” or “multi-label”
split (str) – train or test split
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

MMEarth¶

class torchgeo.datasets.MMEarth(root='data', subset='MMEarth', modalities=('aster', 'biome', 'canopy_height_eth', 'dynamic_world', 'eco_region', 'era5', 'esa_worldcover', 'sentinel1_asc', 'sentinel1_desc', 'sentinel2', 'sentinel2_cloudmask', 'sentinel2_cloudprod', 'sentinel2_scl'), modality_bands=None, normalization_mode='z-score', transforms=None)[source]¶

Bases: NonGeoDataset

MMEarth dataset.

There are three different versions of the dataset, that vary in image size and the number of tiles:

MMEarth: 128x128 px, 1.2M tiles, 579 GB
MMEarth64: 64x64 px, 1.2M tiles, 162 GB
MMEarth100k: 128x128 px, 100K tiles, 48 GB

The dataset consists of 12 modalities:

Aster: elevation and slope
Biome: 14 terrestrial ecosystem categories
ETH Canopy Height: Canopy height and standard deviation
Dynamic World: 9 landcover categories
Ecoregion: 846 ecoregion categories
ERA5: Climate reanalysis data for temperature mean, min, and max of [year, month, previous month] and precipitation total of [year, month, previous month] (counted as separate modalities)
ESA World Cover: 11 landcover categories
Sentinel-1: VV, VH, HV, HH for ascending/descending orbit
Sentinel-2: multi-spectral B1-B12 for L1C/L2A products
Geolocation: cyclic encoding of latitude and longitude
Date: cyclic encoding of month

Additionally, there are three masks available as modalities:

Sentinel-2 Cloudmask: Sentinel-2 cloud mask
Sentinel-2 Cloud probability: Sentinel-2 cloud probability
Sentinel-2 SCL: Sentinel-2 scene classification

that are synchronized across tiles.

Dataset format:

Dataset in single HDF5 file
JSON files for band statistics, splits, and tile information

For additional information, as well as bash scripts to download the data, please refer to the official repository.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2405.02771

Note

This dataset requires the following additional library to be installed:

h5py to load the dataset

New in version 0.7.

__init__(root='data', subset='MMEarth', modalities=('aster', 'biome', 'canopy_height_eth', 'dynamic_world', 'eco_region', 'era5', 'esa_worldcover', 'sentinel1_asc', 'sentinel1_desc', 'sentinel2', 'sentinel2_cloudmask', 'sentinel2_cloudprod', 'sentinel2_scl'), modality_bands=None, normalization_mode='z-score', transforms=None)[source]¶

Initialize the MMEarth dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
subset (str) – one of “MMEarth”, “MMEarth64”, or “MMEarth100k”
modalities (Sequence[str]) – list of modalities to load
modality_bands (dict[str, list[str]] | None) – dictionary of modality bands, see
normalization_mode (str) – one of “z-score” or “min-max”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample dictionary and returns a transformed version

Raises:

AssertionError – if normalization_mode or subset
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return a sample from the dataset.

Normalization is applied to the data with chosen normalization_mode. In addition to the modalities, the sample contains the following raw metadata:

lat: latitude
lon: longitude
date: date
crs: coordinate reference system
tile_id: tile identifier

Parameters:: index (int) – index to return
Returns:: dictionary containing the modalities and metadata of the sample
Return type:: dict[str, Any]

get_sample_specific_band_names(tile_info)[source]¶

Retrieve the sample specific band names.

Parameters:: tile_info (dict[str, Any]) – tile information for a sample
Returns:: dictionary containing the specific band names for each modality
Return type:: dict[str, list[str]]

get_intersection_dict(tile_info)[source]¶

Get intersection of requested and available bands.

Parameters:: tile_info (dict[str, Any]) – tile information for a sample
Returns:: Dictionary with intersected keys and lists.
Return type:: dict[str, list[str]]

__len__()[source]¶

Return the length of the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset as shown in fig. 2 from https://arxiv.org/pdf/2405.02771.

Parameters:

sample (dict[str, Any]) – A sample returned by __getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Return type:

NASA Marine Debris¶

class torchgeo.datasets.NASAMarineDebris(root='data', transforms=None, download=False)[source]¶

Bases: NonGeoDataset

NASA Marine Debris dataset.

The NASA Marine Debris dataset is a dataset for detection of floating marine debris in satellite imagery.

Dataset features:

707 patches with 3 m per pixel resolution (256x256 px)
three spectral bands - RGB
1 object class: marine_debris
images taken by Planet Labs PlanetScope satellites
imagery taken from 2016-2019 from coasts of Greece, Honduras, and Ghana

Dataset format:

images are three-channel geotiffs in uint8 format
labels are numpy files (.npy) containing bounding box (xyxy) coordinates
additional: images in jpg format and labels in geojson format

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.34911/rdnt.9r6ekg

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.2.

__init__(root='data', transforms=None, download=False)[source]¶

Initialize a new NASA Marine Debris Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and labels at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

OSCD¶

class torchgeo.datasets.OSCD(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

OSCD dataset.

The Onera Satellite Change Detection dataset addresses the issue of detecting changes between satellite images from different dates. Imagery comes from Sentinel-2 which contains varying resolutions per band.

Dataset format:

images are 13-channel tifs
masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

no change
change

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/IGARSS.2018.8518015

New in version 0.2.

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new OSCD dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

PASTIS¶

class torchgeo.datasets.PASTIS(root='data', folds=(1, 2, 3, 4, 5), bands='s2', mode='semantic', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

PASTIS dataset.

The PASTIS dataset is a dataset for time-series panoptic segmentation of agricultural parcels.

Dataset features:

support for the original PASTIS and PASTIS-R versions of the dataset
2,433 time-series with 10 m per pixel resolution (128x128 px)
18 crop categories, 1 background category, 1 void category
semantic and instance annotations
3 Sentinel-1 Ascending bands
3 Sentinel-1 Descending bands
10 Sentinel-2 L2A multispectral bands

Dataset format:

time-series and annotations are in numpy format (.npy)

Dataset classes:

Background
Meadow
Soft Winter Wheat
Corn
Winter Barley
Winter Rapeseed
Spring Barley
Sunflower
Grapevine
Beet
Winter Triticale
Winter Durum Wheat
Fruits Vegetables Flowers
Potatoes
Leguminous Fodder
Soybeans
Orchard
Mixed Cereal
Sorghum
Void Label

If you use this dataset in your research, please cite the following papers:

New in version 0.5.

__init__(root='data', folds=(1, 2, 3, 4, 5), bands='s2', mode='semantic', transforms=None, download=False, checksum=False)[source]¶

Initialize a new PASTIS dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
folds (Sequence[int]) – a sequence of integers from 0 to 4 specifying which of the five dataset folds to include
bands (str) – load Sentinel-1 ascending path data (s1a), Sentinel-1 descending path data (s1d), or Sentinel-2 data (s2)
mode (str) – load semantic (semantic) or instance (instance) annotations
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

PatternNet¶

class torchgeo.datasets.PatternNet(root='data', transforms=None, download=False, checksum=False)[source]¶

PatternNet dataset.

The PatternNet dataset is a dataset for remote sensing scene classification and image retrieval.

Dataset features:

30,400 images with 6-50 cm per pixel resolution (256x256 px)
three spectral bands - RGB
38 scene classes, 800 images per class

Dataset format:

images are three-channel jpgs

Dataset classes:

airplane
baseball_field
basketball_court
beach
bridge
cemetery
chaparral
christmas_tree_farm
closed_road
coastal_mansion
crosswalk
dense_residential
ferry_terminal
football_field
forest
freeway
golf_course
harbor
intersection
mobile_home_park
nursing_home
oil_gas_field
oil_well
overpass
parking_lot
parking_space
railway
river
runway
runway_marking
shipping_yard
solar_panel
sparse_residential
storage_tank
swimming_pool
tennis_court
transformer_station
wastewater_treatment_plant

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.isprsjprs.2018.01.004

__init__(root='data', transforms=None, download=False, checksum=False)[source]¶

Initialize a new PatternNet dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

Potsdam¶

class torchgeo.datasets.Potsdam2D(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

Potsdam 2D Semantic Segmentation dataset.

The Potsdam dataset is a dataset for urban semantic segmentation used in the 2D Semantic Labeling Contest - Potsdam. This dataset uses the “4_Ortho_RGBIR.zip” and “5_Labels_all.zip” files to create the train/test sets used in the challenge. The dataset can be requested at the challenge homepage. Note, the server contains additional data for 3D Semantic Labeling which are currently not supported.

Dataset format:

images are 4-channel geotiffs
masks are 3-channel geotiffs with unique RGB values representing the class

Dataset classes:

Clutter/background
Impervious surfaces
Building
Low Vegetation
Tree
Car

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.5194/isprsannals-I-3-293-2012

New in version 0.2.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new Potsdam dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split is invalid.
DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Return type:

QuakeSet¶

class torchgeo.datasets.QuakeSet(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

QuakeSet dataset.

QuakeSet is a dataset for Earthquake Change Detection and Magnitude Estimation and is used for the Seismic Monitoring and Analysis (SMAC) ECML-PKDD 2024 Discovery Challenge.

Dataset features:

Sentinel-1 SAR imagery
before/pre/post imagery of areas affected by earthquakes
2 SAR bands (VV/VH)
3,327 pairs of pre and post images with 5 m per pixel resolution (512x512 px)
2 classification labels (unaffected / affected by earthquake)
pre/post image pairs represent earthquake affected areas
before/pre image pairs represent hard negative unaffected areas
earthquake magnitudes for each sample

Dataset format:

single hdf5 dataset containing images, magnitudes, hypercenters, and splits

Dataset classes:

unaffected area
earthquake affected area

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2403.18116

Note

This dataset requires the following additional library to be installed:

h5py to load the dataset

New in version 0.6.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new QuakeSet dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split argument is invalid.
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample containing image and mask
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

ReforesTree¶

class torchgeo.datasets.ReforesTree(root='data', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

ReforesTree dataset.

The ReforesTree dataset contains drone imagery that can be used for tree crown detection, tree species classification and Aboveground Biomass (AGB) estimation.

Dataset features:

100 high resolution RGB drone images at 2 cm/pixel of size 4,000 x 4,000 px
more than 4,600 tree crown box annotations
tree crown matched with field measurements of diameter at breast height (DBH), and computed AGB and carbon values

Dataset format:

images are three-channel pngs
annotations are csv file

Dataset Classes:

other
banana
cacao
citrus
fruit
timber

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2201.11192

New in version 0.3.

__init__(root='data', transforms=None, download=False, checksum=False)[source]¶

Initialize a new ReforesTree dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

RESISC45¶

class torchgeo.datasets.RESISC45(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

NWPU-RESISC45 dataset.

The RESISC45 dataset is a dataset for remote sensing image scene classification.

Dataset features:

31,500 images with 0.2-30 m per pixel resolution (256x256 px)
three spectral bands - RGB
45 scene classes, 700 images per class
images extracted from Google Earth from over 100 countries
images conditions with high variability (resolution, weather, illumination)

Dataset format:

images are three-channel jpgs

Dataset classes:

airplane
airport
baseball_diamond
basketball_court
beach
bridge
chaparral
church
circular_farmland
cloud
commercial_area
dense_residential
desert
forest
freeway
golf_course
ground_track_field
harbor
industrial_area
intersection
island
lake
meadow
medium_residential
mobile_home_park
mountain
overpass
palace
parking_lot
railway
railway_station
rectangular_farmland
river
roundabout
runway
sea_ice
ship
snowberg
sparse_residential
stadium
storage_tank
tennis_court
terrace
thermal_power_station
wetland

This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:

https://arxiv.org/abs/1911.06721

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/jproc.2017.2675998

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new RESISC45 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

Rwanda Field Boundary¶

class torchgeo.datasets.RwandaFieldBoundary(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04'), transforms=None, download=False)[source]¶

Bases: NonGeoDataset

Rwanda Field Boundary Competition dataset.

This dataset contains field boundaries for smallholder farms in eastern Rwanda. The Nasa Harvest program funded a team of annotators from TaQadam to label Planet imagery for the 2021 growing season for the purpose of conducting the Rwanda Field boundary detection Challenge. The dataset includes rasterized labeled field boundaries and time series satellite imagery from Planet’s NICFI program. Planet’s basemap imagery is provided for six months (March, April, August, October, November and December). Note: only fields that were big enough to be differentiated on the Planetscope imagery were labeled, only fields that were fully contained within the chips were labeled. The paired dataset is provided in 256x256 chips for a total of 70 tiles covering 1532 individual fields.

The labels are provided as binary semantic segmentation labels:

No field-boundary
Field-boundary

If you use this dataset in your research, please cite the following:

https://doi.org/10.34911/RDNT.G580WW

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.5.

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04'), transforms=None, download=False)[source]¶

Initialize a new RwandaFieldBoundary instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – If split or bands are invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of chips in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: a dict containing image and mask at index.
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

SatlasPretrain¶

class torchgeo.datasets.SatlasPretrain(root='data', split='train_lowres', good_images='good_images_lowres_all', image_times='image_times', images=('sentinel1', 'sentinel2', 'landsat'), labels=('land_cover',), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SatlasPretrain dataset.

SatlasPretrain is a large-scale pre-training dataset for tasks that involve understanding satellite images. Regularly-updated satellite data is publicly available for much of the Earth through sources such as Sentinel-2 and NAIP, and can inform numerous applications from tackling illegal deforestation to monitoring marine infrastructure. However, developing automatic computer vision systems to parse these images requires a huge amount of manual labeling of training data. By combining over 30 TB of satellite images with 137 label categories, SatlasPretrain serves as an effective pre-training dataset that greatly reduces the effort needed to develop robust models for downstream satellite image applications.

Reference implementation:

https://github.com/allenai/satlas/blob/main/satlas/model/dataset.py

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.48550/arXiv.2211.15660

New in version 0.7.

Note

This dataset requires the following additional library to be installed:

AWS CLI: to download the dataset from AWS.

__init__(root='data', split='train_lowres', good_images='good_images_lowres_all', image_times='image_times', images=('sentinel1', 'sentinel2', 'landsat'), labels=('land_cover',), transforms=None, download=False, checksum=False)[source]¶

Initialize a new SatlasPretrain instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (str) – Metadata split to load.
good_images (str) – Metadata mapping between col/row and directory.
image_times (str) – Metadata mapping between directory and ISO time.
images (Iterable[str]) – List of image products.
labels (Iterable[str]) – List of label products.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

AssertionError – If images is invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of locations in the dataset.

Returns:: Length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and label at that index.
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A sample returned by __getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Return type:

Seasonal Contrast¶

class torchgeo.datasets.SeasonalContrastS2(root='data', version='100k', seasons=1, bands=('B4', 'B3', 'B2'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Sentinel 2 imagery from the Seasonal Contrast paper.

The Seasonal Contrast imagery dataset contains Sentinel 2 imagery patches sampled from different points in time around the 10k most populated cities on Earth.

Dataset features:

Two versions: 100K and 1M patches
12 band Sentinel 2 imagery from 5 points in time at each location

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/pdf/2103.16607

__init__(root='data', version='100k', seasons=1, bands=('B4', 'B3', 'B2'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new SeasonalContrastS2 instance.

New in version 0.5: The seasons parameter.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
version (str) – one of “100k” or “1m” for the version of the dataset to use
seasons (int) – number of seasonal patches to sample per location, 1–5
bands (Sequence[str]) – list of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if version argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample with an “image” in SCxHxW format where S is the number of seasons
Return type:: dict[str, torch.Tensor]

Changed in version 0.5: Image shape changed from 5xCxHxW to SCxHxW

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.
ValueError – if sample contains a “prediction” key

Return type:

New in version 0.2.

SeasoNet¶

class torchgeo.datasets.SeasoNet(root='data', split='train', seasons=frozenset({'Fall', 'Snow', 'Spring', 'Summer', 'Winter'}), bands=('10m_RGB', '10m_IR', '20m', '60m'), grids=[1, 2], concat_seasons=1, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SeasoNet Semantic Segmentation dataset.

The SeasoNet dataset consists of 1,759,830 multi-spectral Sentinel-2 image patches, taken from 519,547 unique locations, covering the whole surface area of Germany. Annotations are provided in the form of pixel-level land cover and land usage segmentation masks from the German land cover model LBM-DE2018 with land cover classes based on the CORINE Land Cover database (CLC) 2018. The set is split into two overlapping grids, consisting of roughly 880,000 samples each, which are shifted by half the patch size in both dimensions. The images in each of the both grids themselves do not overlap.

Dataset format:

images are 16-bit GeoTiffs, split into separate files based on resolution
images include 12 spectral bands with 10, 20 and 60 m per pixel resolutions
masks are single-channel 8-bit GeoTiffs

Dataset classes:

Continuous urban fabric
Discontinuous urban fabric
Industrial or commercial units
Road and rail networks and associated land
Port areas
Airports
Mineral extraction sites
Dump sites
Construction sites
Green urban areas
Sport and leisure facilities
Non-irrigated arable land
Vineyards
Fruit trees and berry plantations
Pastures
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grasslands
Moors and heathland
Transitional woodland/shrub
Beaches, dunes, sands
Bare rock
Sparsely vegetated areas
Inland marshes
Peat bogs
Salt marshes
Intertidal flats
Water courses
Water bodies
Coastal lagoons
Estuaries
Sea and ocean

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/IGARSS46834.2022.9884079

New in version 0.5.

__init__(root='data', split='train', seasons=frozenset({'Fall', 'Snow', 'Spring', 'Summer', 'Winter'}), bands=('10m_RGB', '10m_IR', '20m', '60m'), grids=[1, 2], concat_seasons=1, transforms=None, download=False, checksum=False)[source]¶

Initialize a new SeasoNet dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val” or “test”
seasons (Collection[str]) – list of seasons to load
bands (Iterable[str]) – list of bands to load
grids (Iterable[int]) – which of the overlapping grids to load
concat_seasons (int) – number of seasonal images to return per sample. if 1, each seasonal image is returned as its own sample, otherwise seasonal images are randomly picked from the seasons specified in seasons and returned as stacked tensors
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample at that index containing the image with shape SCxHxW and the mask with shape HxW, where S = self.concat_seasons
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, show_legend=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
show_legend (bool) – flag indicating whether to show a legend for the segmentation masks
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

SEN12MS¶

class torchgeo.datasets.SEN12MS(root='data', split='train', bands=('VV', 'VH', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

SEN12MS dataset.

The SEN12MS dataset contains 180,662 patch triplets of corresponding Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral images, and MODIS-derived land cover maps. The patches are distributed across the land masses of the Earth and spread over all four meteorological seasons. This is reflected by the dataset structure. All patches are provided in the form of 16-bit GeoTiffs containing the following specific information:

Sentinel-1 SAR: 2 channels corresponding to sigma nought backscatter values in dB scale for VV and VH polarization.
Sentinel-2 Multi-Spectral: 13 channels corresponding to the 13 spectral bands (B1, B2, B3, B4, B5, B6, B7, B8, B8a, B9, B10, B11, B12).
MODIS Land Cover: 4 channels corresponding to IGBP, LCCS Land Cover, LCCS Land Use, and LCCS Surface Hydrology layers.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.5194/isprs-annals-IV-2-W7-153-2019

Note

This dataset can be automatically downloaded using the following bash script:

for season in 1158_spring 1868_summer 1970_fall 2017_winter
do
    for source in lc s1 s2
    do
        wget "ftp://m1474000:m1474000@dataserv.ub.tum.de/ROIs${season}_${source}.tar.gz"
        tar xvzf "ROIs${season}_${source}.tar.gz"
    done
done

for split in train test
do
    wget "https://raw.githubusercontent.com/schmitt-muc/SEN12MS/3a41236a28d08d253ebe2fa1a081e5e32aa7eab4/splits/${split}_list.txt"
done

or manually downloaded from https://dataserv.ub.tum.de/s/m1474000 and https://github.com/schmitt-muc/SEN12MS/tree/master/splits. This download will likely take several hours.

__init__(root='data', split='train', bands=('VV', 'VH', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, checksum=False)[source]¶

Initialize a new SEN12MS dataset instance.

The bands argument allows for the subsetting of bands returned by the dataset. Integers in bands index into a stack of Sentinel 1 and Sentinel 2 imagery. Indices 0 and 1 correspond to the Sentinel 1 imagery where indices 2 through 14 correspond to the Sentinel 2 imagery.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
bands (Sequence[str]) – a sequence of band indices to use where the indices correspond to the array index of combined Sentinel 1 and Sentinel 2
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

SKIPP’D¶

class torchgeo.datasets.SKIPPD(root='data', split='trainval', task='nowcast', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SKy Images and Photovoltaic Power Dataset (SKIPP’D).

The SKIPP’D dataset contains ground-based fish-eye photos of the sky for solar forecasting tasks.

Dataset Format:

.hdf5 file containing images and labels
.npy files with corresponding datetime timestamps

Dataset Features:

fish-eye RGB images (64x64px)
power output measurements from 30-kW rooftop PV array
1-min interval across 3 years (2017-2019)

Nowcast task:

349,372 images under the split key trainval
14,003 images under the split key test

Forecast task:

130,412 images under the split key trainval
2,462 images under the split key test
consists of a concatenated RGB time-series of 16 time-steps

If you use this dataset in your research, please cite:

https://doi.org/10.48550/arXiv.2207.00913

Note

This dataset requires the following additional library to be installed:

https://pypi.org/project/h5py/ to load the dataset

New in version 0.5.

__init__(root='data', split='trainval', task='nowcast', transforms=None, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “trainval”, or “test”
task (str) – one of “nowcast”, or “forecast”
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

AssertionError – if task or split is invalid
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, str | torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

In the forecast task the latest image is plotted.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

SkyScript¶

class torchgeo.datasets.SkyScript(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SkyScript dataset.

SkyScript is a large and semantically diverse image-text dataset for remote sensing images. It contains 5.2 million remote sensing image-text pairs in total, covering more than 29K distinct semantic tags.

If you use this dataset in your research, please cite it using the following format:

https://arxiv.org/abs/2312.12856

New in version 0.6.

caption_files: ClassVar[dict[str, str]] = {'test': 'SkyScript_test_30K_filtered_by_CLIP_openai.csv', 'train': 'SkyScript_train_top30pct_filtered_by_CLIP_openai.csv', 'val': 'SkyScript_val_5K_filtered_by_CLIP_openai.csv'}¶: Can be modified in subclasses to change train/val/test split

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new SkyScript instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (str) – One of ‘train’, ‘val’, ‘test’.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

AssertionError – If split is invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of images in the dataset.

Returns:: Length of the dataset.
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: A dict containing image and caption at index.
Return type:: dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

So2Sat¶

class torchgeo.datasets.So2Sat(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

So2Sat dataset.

The So2Sat dataset consists of corresponding synthetic aperture radar and multispectral optical image data acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and a corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world, and comes with a variety of different splits.

This implementation covers the 2nd and 3rd versions of the dataset as described in the author’s github repository: https://github.com/zhu-xlab/So2Sat-LCZ42.

The different versions are as follows:

Version 2: This version contains imagery from 52 cities and is split into train/val/test as follows:

Training: 42 cities around the world
Validation: western half of 10 other cities covering 10 cultural zones
Testing: eastern half of the 10 other cities

Version 3: A version of the dataset with 3 different train/test splits, as follows:

Random split: every city 80% training / 20% testing (randomly sampled)
Block split: every city is split in a geospatial 80%/20%-manner
Cultural 10: 10 cities from different cultural zones are held back for testing purposes

Dataset classes:

Compact high rise
Compact middle rise
Compact low rise
Open high rise
Open mid rise
Open low rise
Lightweight low rise
Large low rise
Sparsely built
Heavy industry
Dense trees
Scattered trees
Bush, scrub
Low plants
Bare rock or paved
Bare soil or sand
Water

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/MGRS.2020.2964708

Note

The version 2 dataset can be automatically downloaded using the following bash script:

for split in training validation testing
do
    wget ftp://m1483140:m1483140@dataserv.ub.tum.de/$split.h5
done

or manually downloaded from https://dataserv.ub.tum.de/index.php/s/m1483140 This download will likely take several hours.

The version 3 datasets can be downloaded using the following bash script:

for version in random block culture_10
do
  for split in training testing
  do
    wget -P $version/ ftp://m1613658:m1613658@dataserv.ub.tum.de/$version/$split.h5
  done
done

or manually downloaded from https://mediatum.ub.tum.de/1613658

Note

This dataset requires the following additional library to be installed:

https://pypi.org/project/h5py/ to load the dataset

__init__(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)[source]¶

Initialize a new So2Sat dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
version (str) – one of “2”, “3_random”, “3_block”, or “3_culture_10”
split (str) – one of “train”, “validation”, or “test”
bands (Sequence[str]) – a sequence of band names to use where the indices correspond to the array index of combined Sentinel 1 and Sentinel 2
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found.
DependencyNotFoundError – If h5py is not installed.

New in version 0.3: The bands parameter.

New in version 0.5: The version parameter.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

Solar Plants Brazil¶

class torchgeo.datasets.SolarPlantsBrazil(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Solar Plants Brazil dataset (semantic segmentation for photovoltaic detection).

The Solar Plants Brazil dataset provides satellite imagery and pixel-level annotations for detecting photovoltaic solar power stations.

Dataset features:

272 RGB+NIR GeoTIFF images (256x256 pixels)
Binary masks indicating presence of solar panels (1 = panel, 0 = background)
Organized into train, val, and test splits
Float32 GeoTIFF files for both input and mask images
Spatial metadata included (CRS, bounding box), but not used directly for training

Folder structure:

root/train/input/img(123).tif
root/train/labels/target(123).tif

Access:

Dataset is hosted on Hugging Face: https://huggingface.co/datasets/FederCO23/solar-plants-brazil
Code and preprocessing steps available at: https://github.com/FederCO23/UCSD_MLBootcamp_Capstone

New in version 0.8.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a SolarPlantsBrazil dataset split.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['train', 'val', 'test']) – dataset split to use, one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
ValueError – If split is invalid.

__getitem__(index)[source]¶

Return the image and mask at the given index.

Parameters:: index (int) – index of the image and mask to return
Returns:: image and mask at given index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: The number of image-mask pairs in the dataset.
Return type:: int

plot(sample, suptitle=None)[source]¶

Plot a sample from the SolarPlantsBrazil dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A dictionary with ‘image’ and ‘mask’ tensors.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered image and mask.

Return type:

SODA¶

class torchgeo.datasets.SODAA(root='data', split='train', bbox_orientation='horizontal', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SODA-A dataset.

The SODA-A dataset is a high resolution aerial imagery dataset for small object detection.

Dataset features:

2513 images
872,069 annotations with oriented bounding boxes
9 classes

Dataset format:

Images are three channel .jpg files.
Annotations are in json files

Classes:

Airplane
Helicopter
Small vehicle
Large vehicle
Ship
Container
Storage tank
Swimming-pool
Windmill
Other

If you use this dataset in your research, please cite the following paper:

https://ieeexplore.ieee.org/document/10168277

New in version 0.7.

__init__(root='data', split='train', bbox_orientation='horizontal', transforms=None, download=False, checksum=False)[source]¶

Initialize a new instance of SODA-A dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['train', 'val', 'test']) – one of “train”, “val”, or “test”
bbox_orientation (Literal['oriented', 'horizontal']) – one of “oriented” or “horizontal”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split or bbox_orientation argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of samples in the dataset.

__getitem__(idx)[source]¶

Return the sample at the given index.

Parameters:: idx (int) – index of the sample to return
Returns:: the sample at the given index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None, box_alpha=0.7)[source]¶

Plot a sample from the dataset with legend.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
box_alpha (float) – alpha value for boxes

Returns:

a matplotlib Figure with the rendered sample

Return type:

SSL4EO¶

class torchgeo.datasets.SSL4EO[source]¶

Bases: NonGeoDataset

Base class for all SSL4EO datasets.

Self-Supervised Learning for Earth Observation (SSL4EO) is a collection of large-scale multimodal multitemporal datasets for unsupervised/self-supervised pre-training in Earth observation.

New in version 0.5.

class torchgeo.datasets.SSL4EOL(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)[source]¶

Bases: SSL4EO

SSL4EO-L dataset.

Landsat version of SSL4EO.

The dataset consists of a parallel corpus (same locations and dates for SR/TOA) for the following sensors:

Split	Satellites	Sensors	Level	# Bands	Link
tm_toa	Landsat 4–5	TM	TOA	7	GEE
etm_sr	Landsat 7	ETM+	SR	6	GEE
etm_toa	Landsat 7	ETM+	TOA	9	GEE
oli_tirs_toa	Landsat 8–9	OLI+TIRS	TOA	11	GEE
oli_sr	Landsat 8–9	OLI	SR	7	GEE

Each patch has the following properties:

264 x 264 pixels
Resampled to 30 m resolution (7920 x 7920 m)
4 seasonal timestamps
Single multispectral GeoTIFF file

Note

Each split is 300–400 GB and requires 3x that to concatenate and extract tarballs. Tarballs can be safely deleted after extraction to save space. The dataset takes about 1.5 hrs to download and checksum and another 3 hrs to extract.

If you use this dataset in your research, please cite the following paper:

https://proceedings.neurips.cc/paper_files/paper/2023/hash/bbf7ee04e2aefec136ecf60e346c2e61-Abstract-Datasets_and_Benchmarks.html

New in version 0.5.

__init__(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)[source]¶

Initialize a new SSL4EOL instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['tm_toa', 'etm_toa', 'etm_sr', 'oli_tirs_toa', 'oli_sr']) – one of [‘tm_toa’, ‘etm_toa’, ‘etm_sr’, ‘oli_tirs_toa’, ‘oli_sr’]
seasons (Literal[1, 2, 3, 4]) – number of seasonal patches to sample per location, 1–4
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: image sample
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

class torchgeo.datasets.SSL4EOS12(root='data', split='s2c', seasons=1, transforms=None, download=False, checksum=False)[source]¶

Bases: SSL4EO

SSL4EO-S12 dataset.

Sentinel-1/2 version of SSL4EO.

The dataset consists of a parallel corpus (same locations and dates) for the following satellites:

Split	Satellite	Level	# Bands	Link
s1	Sentinel-1	GRD	2	GEE
s2c	Sentinel-2	TOA	12	GEE
s2a	Sentinel-2	SR	13	GEE

Each patch has the following properties:

264 x 264 pixels
Resampled to 10 m resolution (2640 x 2640 m)
4 seasonal timestamps

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2211.07044

Note

The dataset is about 1.5 TB when compressed and 3.7 TB when uncompressed.

New in version 0.5.

__init__(root='data', split='s2c', seasons=1, transforms=None, download=False, checksum=False)[source]¶

Initialize a new SSL4EOS12 instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (Literal['s1', 's2c', 's2a']) – one of “s1” (Sentinel-1 GRD dual-pol SAR), “s2c” (Sentinel-2 Level-1C top-of-atmosphere reflectance), or “s2a” (Sentinel-2 Level-2A surface reflectance)
seasons (Literal[1, 2, 3, 4]) – number of seasonal patches to sample per location, 1–4
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

New in version 0.7: The download parameter.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: image sample
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

SSL4EO-L Benchmark¶

class torchgeo.datasets.SSL4EOLBenchmark(root='data', sensor='oli_sr', product='cdl', split='train', classes=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SSL4EO Landsat Benchmark Evaluation Dataset.

Dataset is intended to be used for evaluation of SSL techniques. Each benchmark dataset consists of 25,000 images with corresponding land cover classification masks.

Dataset format:

Input landsat image and single channel mask
25,000 total samples split into train, val, test (70%, 15%, 15%)
NLCD dataset version has 17 classes
CDL dataset version has 134 classes

Each patch has the following properties:

264 x 264 pixels
Resampled to 30 m resolution (7920 x 7920 m)
Single multispectral GeoTIFF file

If you use this dataset in your research, please cite the following paper:

https://proceedings.neurips.cc/paper_files/paper/2023/hash/bbf7ee04e2aefec136ecf60e346c2e61-Abstract-Datasets_and_Benchmarks.html

New in version 0.5.

__init__(root='data', sensor='oli_sr', product='cdl', split='train', classes=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new SSL4EO Landsat Benchmark instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
sensor (str) – one of [‘etm_toa’, ‘etm_sr’, ‘oli_tirs_toa, ‘oli_sr’]
product (str) – mask target, one of [‘cdl’, ‘nlcd’]
split (str) – dataset split, one of [‘train’, ‘val’, ‘test’]
classes (list[int] | None) – list of classes to include, the rest will be mapped to 0 (defaults to all classes for the chosen product)
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

AssertionError – if any arguments are invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: image and sample
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

retrieve_sample_collection()[source]¶

Retrieve paths to samples in data directory.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Substation¶

class torchgeo.datasets.Substation(root='data', bands=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), mask_2d=True, num_of_timepoints=4, timepoint_aggregation='concat', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

Substation dataset.

The Substation dataset is curated by TransitionZero and sourced from publicly available data repositories, including OpenSreetMap (OSM) and Copernicus Sentinel data. The dataset consists of Sentinel-2 images from 27k+ locations; the task is to segment power-substations, which appear in the majority of locations in the dataset. Most locations have 4-5 images taken at different timepoints (i.e., revisits).

Dataset Format:

.npz file for each datapoint

Dataset Features:

26,522 image-mask pairs stored as numpy files.
Data from 5 revisits for most locations.
Multi-temporal, multi-spectral images (13 channels) paired with masks, with a spatial resolution of 228x228 pixels

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.48550/arXiv.2409.17363

__init__(root='data', bands=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), mask_2d=True, num_of_timepoints=4, timepoint_aggregation='concat', transforms=None, download=False, checksum=False)[source]¶

Initialize the Substation.

Parameters:

root (str | os.PathLike[str]) – Path to the directory containing the dataset.
bands (Sequence[int]) – Channels to use from the image.
mask_2d (bool) – Whether to use a 2D mask.
num_of_timepoints (int) – Number of timepoints to use for each image.
timepoint_aggregation (Optional[Literal['concat', 'median', 'first', 'random']]) – How to aggregate multiple timepoints.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A transform takes input sample and returns a transformed version.
download (bool) – Whether to download the dataset if it is not found.
checksum (bool) – Whether to verify the dataset after downloading.

__getitem__(index)[source]¶

Get an item from the dataset by index.

Parameters:: index (int) – Index of the item to retrieve.
Returns:: A dictionary containing the image and corresponding mask.
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Returns the number of items in the dataset.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

A matplotlib Figure containing the rendered sample.

Return type:

SustainBench Crop Yield¶

class torchgeo.datasets.SustainBenchCropYield(root='data', split='train', countries=['usa'], transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

SustainBench Crop Yield Dataset.

This dataset contains MODIS band histograms and soybean yield estimates for selected counties in the USA, Argentina and Brazil. The dataset is part of the SustainBench datasets for tackling the UN Sustainable Development Goals (SDGs).

Dataset Format:

.npz files of stacked samples

Dataset Features:

input histogram of 7 surface reflectance and 2 surface temperature bands from MODIS pixel values in 32 ranges across 32 timesteps resulting in 32x32x9 input images
regression target value of soybean yield in metric tonnes per harvested hectare

If you use this dataset in your research, please cite:

New in version 0.5.

__init__(root='data', split='train', countries=['usa'], transforms=None, download=False, checksum=False)[source]¶

Initialize a new Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “dev”, or “test”
countries (list[str]) – which countries to include in the dataset
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

AssertionError – if countries contains invalid countries or if split is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, band_idx=0, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
band_idx (int) – which of the nine histograms to index
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

TreeSatAI¶

class torchgeo.datasets.TreeSatAI(root='data', split='train', sensors=('aerial', 's1', 's2'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

TreeSatAI Benchmark Archive.

TreeSatAI Benchmark Archive is a multi-sensor, multi-label dataset for tree species classification in remote sensing. It was created by combining labels from the federal forest inventory of Lower Saxony, Germany with 20 cm Color-Infrared (CIR) and 10 m Sentinel imagery.

The TreeSatAI Benchmark Archive contains:

50,381 image triplets (aerial, Sentinel-1, Sentinel-2)
synchronized time steps and locations
all original spectral bands/polarizations from the sensors
20 species classes (single labels)
12 age classes (single labels)
15 genus classes (multi labels)
60 m and 200 m patches
fixed split for train (90%) and test (10%) data
additional single labels such as English species name, genus, forest stand type, foliage type, land cover

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.5194/essd-15-681-2023

New in version 0.7.

__init__(root='data', split='train', sensors=('aerial', 's1', 's2'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new TreeSatAI instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (str) – Either ‘train’ or ‘test’.
sensors (Sequence[str]) – One or more of ‘aerial’, ‘s1’, and/or ‘s2’.
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

AssertionError – If invalid sensors are chosen.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: Length of the dataset.
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and label at that index.
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A sample returned by __getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.

Returns:

A matplotlib Figure with the rendered sample.

Return type:

Tropical Cyclone¶

class torchgeo.datasets.TropicalCyclone(root='data', split='train', transforms=None, download=False)[source]¶

Bases: NonGeoDataset

Tropical Cyclone Wind Estimation Competition dataset.

A collection of tropical storms in the Atlantic and East Pacific Oceans from 2000 to 2019 with corresponding maximum sustained surface wind speed. This dataset is split into training and test categories for the purpose of a competition. Read more about the competition here: https://www.drivendata.org/competitions/72/predict-wind-speeds/.

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1109/JSTARS.2020.3011907

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

Changed in version 0.4: Class name changed from TropicalCycloneWindEstimation to TropicalCyclone to be consistent with TropicalCycloneDataModule.

__init__(root='data', split='train', transforms=None, download=False)[source]¶

Initialize a new TropicalCyclone instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data, labels, field ids, and metadata at that index
Return type:: dict[str, Any]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, Any]) – a sample return by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

UC Merced¶

class torchgeo.datasets.UCMerced(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

UC Merced Land Use dataset.

The UC Merced Land Use dataset is a land use classification dataset of 2.1k 256x256 1ft resolution RGB images of urban locations around the U.S. extracted from the USGS National Map Urban Area Imagery collection with 21 land use classes (100 images per class).

Dataset features:

land use class labels from around the U.S.
three spectral bands - RGB
21 classes

Dataset classes:

agricultural
airplane
baseballdiamond
beach
buildings
chaparral
denseresidential
forest
freeway
golfcourse
harbor
intersection
mediumresidential
mobilehomepark
overpass
parkinglot
river
runway
sparseresidential
storagetanks
tenniscourt

This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:

https://arxiv.org/abs/1911.06721

If you use this dataset in your research, please cite the following paper:

https://dl.acm.org/doi/10.1145/1869790.1869829

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶

Initialize a new UC Merced dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

USAVars¶

class torchgeo.datasets.USAVars(root='data', split='train', labels=('treecover', 'elevation', 'population'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

USAVars dataset.

The USAVars dataset is reproduction of the dataset used in the paper “A generalizable and accessible approach to machine learning with global satellite imagery”. Specifically, this dataset includes 1 sq km. crops of NAIP imagery resampled to 4m/px cenetered on ~100k points that are sampled randomly from the contiguous states in the USA. Each point contains three continuous valued labels (taken from the dataset released in the paper): tree cover percentage, elevation, and population density.

Dataset format:

images are 4-channel GeoTIFFs
labels are singular float values

Dataset labels:

tree cover
elevation
population density

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1038/s41467-021-24638-z

New in version 0.3.

__init__(root='data', split='train', labels=('treecover', 'elevation', 'population'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new USAVars dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – train/val/test split to load
labels (Sequence[str]) – list of labels to include
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if invalid labels are provided
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_labels=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_labels (bool) – flag indicating whether to show labels above panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Vaihingen¶

class torchgeo.datasets.Vaihingen2D(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

Vaihingen 2D Semantic Segmentation dataset.

The Vaihingen dataset is a dataset for urban semantic segmentation used in the 2D Semantic Labeling Contest - Vaihingen. This dataset uses the “ISPRS_semantic_labeling_Vaihingen.zip” and “ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip” files to create the train/test sets used in the challenge. The dataset can be downloaded from here. Note, the server contains additional data for 3D Semantic Labeling which are currently not supported.

Dataset format:

images are 3-channel RGB geotiffs
masks are 3-channel geotiffs with unique RGB values representing the class

Dataset classes:

Clutter/background
Impervious surfaces
Building
Low Vegetation
Tree
Car

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.5194/isprsannals-I-3-293-2012

New in version 0.2.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new Vaihingen2D dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split is invalid.
DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Return type:

VHR-10¶

class torchgeo.datasets.VHR10(root='data', split='positive', transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

NWPU VHR-10 dataset.

Northwestern Polytechnical University (NWPU) very-high-resolution ten-class (VHR-10) remote sensing image dataset.

Consists of 800 VHR optical remote sensing images, where 715 color images were acquired from Google Earth with the spatial resolution ranging from 0.5 to 2 m, and 85 pansharpened color infrared (CIR) images were acquired from Vaihingen data with a spatial resolution of 0.08 m.

The data set is divided into two sets:

Positive image set (650 images) which contains at least one target in an image
Negative image set (150 images) does not contain any targets

The positive image set consists of objects from ten classes:

Airplanes (757)
Ships (302)
Storage tanks (655)
Baseball diamonds (390)
Tennis courts (524)
Basketball courts (159)
Ground track fields (163)
Harbors (224)
Bridges (124)
Vehicles (477)

Includes object detection bounding boxes from original paper and instance segmentation masks from follow-up publications. If you use this dataset in your research, please cite the following papers:

Note

This dataset requires the following additional library to be installed:

pycocotools to load the annotations.json file for the “positive” image set

__init__(root='data', split='positive', transforms=None, download=False, checksum=False)[source]¶

Initialize a new VHR-10 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “positive” or “negative”
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – if split argument is invalid
DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – if split="positive" and pycocotools is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, Any]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, show_feats='both', box_alpha=0.7, mask_alpha=0.7)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
suptitle (str | None) – optional string to use as a suptitle
show_titles (bool) – flag indicating whether to show titles above each panel
show_feats (str | None) – optional string to pick features to be shown: boxes, masks, both
box_alpha (float) – alpha value of box
mask_alpha (float) – alpha value of mask

Returns:

a matplotlib Figure with the rendered sample

Raises:

AssertionError – if show_feats argument is invalid
DependencyNotFoundError – If plotting masks and scikit-image is not installed.

Return type:

New in version 0.4.

Western USA Live Fuel Moisture¶

class torchgeo.datasets.WesternUSALiveFuelMoisture(root='data', input_features=('slope(t)', 'elevation(t)', 'canopy_height(t)', 'forest_cover(t)', 'silt(t)', 'sand(t)', 'clay(t)', 'vv(t)', 'vh(t)', 'red(t)', 'green(t)', 'blue(t)', 'swir(t)', 'nir(t)', 'ndvi(t)', 'ndwi(t)', 'nirv(t)', 'vv_red(t)', 'vv_green(t)', 'vv_blue(t)', 'vv_swir(t)', 'vv_nir(t)', 'vv_ndvi(t)', 'vv_ndwi(t)', 'vv_nirv(t)', 'vh_red(t)', 'vh_green(t)', 'vh_blue(t)', 'vh_swir(t)', 'vh_nir(t)', 'vh_ndvi(t)', 'vh_ndwi(t)', 'vh_nirv(t)', 'vh_vv(t)', 'slope(t-1)', 'elevation(t-1)', 'canopy_height(t-1)', 'forest_cover(t-1)', 'silt(t-1)', 'sand(t-1)', 'clay(t-1)', 'vv(t-1)', 'vh(t-1)', 'red(t-1)', 'green(t-1)', 'blue(t-1)', 'swir(t-1)', 'nir(t-1)', 'ndvi(t-1)', 'ndwi(t-1)', 'nirv(t-1)', 'vv_red(t-1)', 'vv_green(t-1)', 'vv_blue(t-1)', 'vv_swir(t-1)', 'vv_nir(t-1)', 'vv_ndvi(t-1)', 'vv_ndwi(t-1)', 'vv_nirv(t-1)', 'vh_red(t-1)', 'vh_green(t-1)', 'vh_blue(t-1)', 'vh_swir(t-1)', 'vh_nir(t-1)', 'vh_ndvi(t-1)', 'vh_ndwi(t-1)', 'vh_nirv(t-1)', 'vh_vv(t-1)', 'slope(t-2)', 'elevation(t-2)', 'canopy_height(t-2)', 'forest_cover(t-2)', 'silt(t-2)', 'sand(t-2)', 'clay(t-2)', 'vv(t-2)', 'vh(t-2)', 'red(t-2)', 'green(t-2)', 'blue(t-2)', 'swir(t-2)', 'nir(t-2)', 'ndvi(t-2)', 'ndwi(t-2)', 'nirv(t-2)', 'vv_red(t-2)', 'vv_green(t-2)', 'vv_blue(t-2)', 'vv_swir(t-2)', 'vv_nir(t-2)', 'vv_ndvi(t-2)', 'vv_ndwi(t-2)', 'vv_nirv(t-2)', 'vh_red(t-2)', 'vh_green(t-2)', 'vh_blue(t-2)', 'vh_swir(t-2)', 'vh_nir(t-2)', 'vh_ndvi(t-2)', 'vh_ndwi(t-2)', 'vh_nirv(t-2)', 'vh_vv(t-2)', 'slope(t-3)', 'elevation(t-3)', 'canopy_height(t-3)', 'forest_cover(t-3)', 'silt(t-3)', 'sand(t-3)', 'clay(t-3)', 'vv(t-3)', 'vh(t-3)', 'red(t-3)', 'green(t-3)', 'blue(t-3)', 'swir(t-3)', 'nir(t-3)', 'ndvi(t-3)', 'ndwi(t-3)', 'nirv(t-3)', 'vv_red(t-3)', 'vv_green(t-3)', 'vv_blue(t-3)', 'vv_swir(t-3)', 'vv_nir(t-3)', 'vv_ndvi(t-3)', 'vv_ndwi(t-3)', 'vv_nirv(t-3)', 'vh_red(t-3)', 'vh_green(t-3)', 'vh_blue(t-3)', 'vh_swir(t-3)', 'vh_nir(t-3)', 'vh_ndvi(t-3)', 'vh_ndwi(t-3)', 'vh_nirv(t-3)', 'vh_vv(t-3)', 'lat', 'lon'), transforms=None, download=False)[source]¶

Bases: NonGeoDataset

Western USA Live Fuel Moisture Dataset.

This tabular style dataset contains fuel moisture (mass of water in vegetation) and remotely sensed variables in the western United States. It contains 2615 datapoints and 138 variables. For more details see the dataset page.

Dataset Format:

.geojson file for each datapoint

Dataset Features:

138 remote sensing derived variables, some with a time dependency
2615 datapoints with regression target of predicting fuel moisture

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.rse.2020.111797

Note

This dataset requires the following additional library to be installed:

azcopy: to download the dataset from Source Cooperative.

New in version 0.5.

__init__(root='data', input_features=('slope(t)', 'elevation(t)', 'canopy_height(t)', 'forest_cover(t)', 'silt(t)', 'sand(t)', 'clay(t)', 'vv(t)', 'vh(t)', 'red(t)', 'green(t)', 'blue(t)', 'swir(t)', 'nir(t)', 'ndvi(t)', 'ndwi(t)', 'nirv(t)', 'vv_red(t)', 'vv_green(t)', 'vv_blue(t)', 'vv_swir(t)', 'vv_nir(t)', 'vv_ndvi(t)', 'vv_ndwi(t)', 'vv_nirv(t)', 'vh_red(t)', 'vh_green(t)', 'vh_blue(t)', 'vh_swir(t)', 'vh_nir(t)', 'vh_ndvi(t)', 'vh_ndwi(t)', 'vh_nirv(t)', 'vh_vv(t)', 'slope(t-1)', 'elevation(t-1)', 'canopy_height(t-1)', 'forest_cover(t-1)', 'silt(t-1)', 'sand(t-1)', 'clay(t-1)', 'vv(t-1)', 'vh(t-1)', 'red(t-1)', 'green(t-1)', 'blue(t-1)', 'swir(t-1)', 'nir(t-1)', 'ndvi(t-1)', 'ndwi(t-1)', 'nirv(t-1)', 'vv_red(t-1)', 'vv_green(t-1)', 'vv_blue(t-1)', 'vv_swir(t-1)', 'vv_nir(t-1)', 'vv_ndvi(t-1)', 'vv_ndwi(t-1)', 'vv_nirv(t-1)', 'vh_red(t-1)', 'vh_green(t-1)', 'vh_blue(t-1)', 'vh_swir(t-1)', 'vh_nir(t-1)', 'vh_ndvi(t-1)', 'vh_ndwi(t-1)', 'vh_nirv(t-1)', 'vh_vv(t-1)', 'slope(t-2)', 'elevation(t-2)', 'canopy_height(t-2)', 'forest_cover(t-2)', 'silt(t-2)', 'sand(t-2)', 'clay(t-2)', 'vv(t-2)', 'vh(t-2)', 'red(t-2)', 'green(t-2)', 'blue(t-2)', 'swir(t-2)', 'nir(t-2)', 'ndvi(t-2)', 'ndwi(t-2)', 'nirv(t-2)', 'vv_red(t-2)', 'vv_green(t-2)', 'vv_blue(t-2)', 'vv_swir(t-2)', 'vv_nir(t-2)', 'vv_ndvi(t-2)', 'vv_ndwi(t-2)', 'vv_nirv(t-2)', 'vh_red(t-2)', 'vh_green(t-2)', 'vh_blue(t-2)', 'vh_swir(t-2)', 'vh_nir(t-2)', 'vh_ndvi(t-2)', 'vh_ndwi(t-2)', 'vh_nirv(t-2)', 'vh_vv(t-2)', 'slope(t-3)', 'elevation(t-3)', 'canopy_height(t-3)', 'forest_cover(t-3)', 'silt(t-3)', 'sand(t-3)', 'clay(t-3)', 'vv(t-3)', 'vh(t-3)', 'red(t-3)', 'green(t-3)', 'blue(t-3)', 'swir(t-3)', 'nir(t-3)', 'ndvi(t-3)', 'ndwi(t-3)', 'nirv(t-3)', 'vv_red(t-3)', 'vv_green(t-3)', 'vv_blue(t-3)', 'vv_swir(t-3)', 'vv_nir(t-3)', 'vv_ndvi(t-3)', 'vv_ndwi(t-3)', 'vv_nirv(t-3)', 'vh_red(t-3)', 'vh_green(t-3)', 'vh_blue(t-3)', 'vh_swir(t-3)', 'vh_nir(t-3)', 'vh_ndvi(t-3)', 'vh_ndwi(t-3)', 'vh_nirv(t-3)', 'vh_vv(t-3)', 'lat', 'lon'), transforms=None, download=False)[source]¶

Initialize a new Western USA Live Fuel Moisture Dataset.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
input_features (Iterable[str]) – which input features to include
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory

Raises:

AssertionError – if input_features contains invalid variable names
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: input features and target at that index
Return type:: dict[str, Any]

xView2¶

class torchgeo.datasets.XView2(root='data', split='train', transforms=None, checksum=False)[source]¶

Bases: NonGeoDataset

xView2 dataset.

The xView2 dataset is a dataset for building disaster change detection. This dataset object uses the “Challenge training set (~7.8 GB)” and “Challenge test set (~2.6 GB)” data from the xView2 website as the train and test splits. Note, the xView2 website contains other data under the xView2 umbrella that are _not_ included here. E.g. the “Tier3 training data”, the “Challenge holdout set”, and the “full data”.

Dataset format:

images are three-channel pngs
masks are single-channel pngs where the pixel values represent the class

Dataset classes:

background
no damage
minor damage
major damage
destroyed

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1911.09296

New in version 0.2.

__init__(root='data', split='train', transforms=None, checksum=False)[source]¶

Initialize a new xView2 dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If split is invalid.
DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Return type:

ZueriCrop¶

class torchgeo.datasets.ZueriCrop(root='data', bands=('NIR', 'B03', 'B02', 'B04', 'B05', 'B06', 'B07', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset

ZueriCrop dataset.

The ZueriCrop dataset is a dataset for time-series instance segmentation of crops.

Dataset features:

Sentinel-2 multispectral imagery
instance masks of 48 crop categories
nine multispectral bands
116k images with 10 m per pixel resolution (24x24 px)
~28k time-series containing 142 images each

Dataset format:

single hdf5 dataset containing images, semantic masks, and instance masks
data is parsed into images and instance masks, boxes, and labels
one mask per time-series

Dataset classes:

48 fine-grained hierarchical crop categories

If you use this dataset in your research, please cite the following paper:

https://doi.org/10.1016/j.rse.2021.112603

Note

This dataset requires the following additional library to be installed:

h5py to load the dataset

__init__(root='data', bands=('NIR', 'B03', 'B02', 'B04', 'B05', 'B06', 'B07', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶

Initialize a new ZueriCrop dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
bands (Sequence[str]) – the subset of bands to load
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
DependencyNotFoundError – If h5py is not installed.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: sample containing image, mask, bounding boxes, and target label
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

plot(sample, time_step=0, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
time_step (int) – time step at which to access image, beginning with 0
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

New in version 0.2.

Copernicus-Bench¶

Copernicus-Bench is a comprehensive evaluation benchmark with 15 downstream tasks hierarchically organized across preprocessing (e.g., cloud removal), base applications (e.g., land cover classification), and specialized applications (e.g., air quality estimation). This benchmark enables systematic assessment of foundation model performances across various Sentinel missions on different levels of practical applications.

C = classification, R = regression, S = semantic segmentation, T = time series, CD = change detection¶
Level	Dataset	Task	Source	License	# Samples	# Classes	Size (px)	Resolution (m)	Bands
L1	Cloud-S2	S	Sentinel-2	CC0-1.0	2,817	4	512x512	10	MSI
L1	Cloud-S3	S	Sentinel-3	CC-BY-4.0	1,995	5	256x256	300	MSI
L2	EuroSAT-S1	C	Sentinel-1	CC-BY-4.0	27,000	10	64x64	10	SAR
L2	EuroSAT-S2	C	Sentinel-2	MIT	27,000	10	64x64	10	SAR
L2	BigEarthNet-S1	C	Sentinel-1	CDLA-Permissive-1.0	24,002	19	120x120	10	SAR
L2	BigEarthNet-S2	C	Sentinel-2	CDLA-Permissive-1.0	24,002	19	120x120	10	MSI
L2	LC100Cls-S3	C	Sentinel-3	CC-BY-4.0	8,635	23	96x96	300	MSI
L2	LC100Seg-S3	S	Sentinel-3	CC-BY-4.0	8,635	23	96x96	300	MSI
L2	DFC2020-S1	S	Sentinel-1	CC-BY-4.0	5,128	10	256x256	10	SAR
L2	DFC2020-S2	S	Sentinel-2	CC-BY-4.0	5,128	10	256x256	10	MSI
L3	Flood-S1	CD	Sentinel-1	MIT	5,000	3	224x224	10	SAR
L3	LCZ-S2	C	Sentinel-2	CC-BY-4.0	25,000	17	32x32	10	MSI
L3	Biomass-S3	R	Sentinel-3	CC-BY-4.0	5,000		96x96	300	MSI
L3	AQ-NO2-S5P	R	Sentinel-5P	CC-BY-4.0	2,467		56x56	1,000
L3	AQ-O3-S5P	R	Sentinel-5P	CC-BY-4.0	2,467		56x56	1,000

class torchgeo.datasets.CopernicusBench(name, *args, **kwargs)[source]¶

Bases: NonGeoDataset

Copernicus-Bench datasets.

This wrapper supports dynamically loading datasets in Copernicus-Bench.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2503.11849

New in version 0.7.

__init__(name, *args, **kwargs)[source]¶

Initialize a new CopernicusBench instance.

Parameters:

name (Literal['cloud_s2', 'cloud_s3', 'eurosat_s1', 'eurosat_s2', 'bigearthnet_s1', 'bigearthnet_s2', 'lc100cls_s3', 'lc100seg_s3', 'dfc2020_s1', 'dfc2020_s2', 'flood_s1', 'lcz_s2', 'biomass_s3', 'aq_no2_s5p', 'aq_o3_s5p']) – Name of the dataset to load.
*args (Any) – Arguments to pass to dataset class.
**kwargs (Any) – Keyword arguments to pass to dataset class.

__len__()[source]¶

Return the length of the dataset.

Returns:: Length of the dataset.
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

__getattr__(name)[source]¶

Wrapper around dataset object.

class torchgeo.datasets.CopernicusBenchBase(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset, ABC

Abstract base class for all Copernicus-Bench datasets.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2503.11849

New in version 0.7.

abstract property url: str¶: Download URL.

md5: str¶: MD5 checksum.

zipfile: str¶: Zip file name.

directory: str¶: Subdirectory containing split files.

filename = '{}.csv'¶: Filename format of split files.

dtype: dtype = torch.int64¶: Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '.*'¶: Regular expression used to extract date from filename.

date_format = '%Y%m%dT%H%M%S'¶: Date format string used to parse date from filename.

abstract property all_bands: tuple[str, ...]¶: All spectral channels.

abstract property rgb_bands: tuple[str, ...]¶: Spectral channels used to make RGB plots.

cmap: str | matplotlib.colors.Colormap¶: Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...]¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBase instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the length of the dataset.

Returns:: Length of the dataset.
Return type:: int

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – A sample returned by NonGeoDataset.__getitem__().
show_titles (bool) – Flag indicating whether to show titles above each panel.
suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

class torchgeo.datasets.CopernicusBenchCloudS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench Cloud-S2 dataset.

Cloud-S2 is a multi-class cloud segmentation dataset derived from CloudSEN12+, one of the largest Sentinel-2 cloud and cloud shadow detection datasets with expert-labeled pixels. We take 25% samples with high-quality labels, and split them into 1699/567/551 train/val/test subsets.

Classes¶
Code	Class	Description
0	Clear	Pixels without cloud and cloud shadow contamination.
1	Thick Cloud	Opaque clouds that block all the reflected light from the Earth’s surface.
2	Thin Cloud	Semitransparent clouds that alter the surface spectral signal but still allow to recognize the background. This is the hardest class to identify.
3	Cloud Shadow	Dark pixels where light is occluded by thick or thin clouds.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '39a1f966e76455549a3e6c209ba751c1'¶: MD5 checksum.

zipfile: str = 'cloud_s2.zip'¶: Zip file name.

directory: str = 'cloud_s2'¶: Subdirectory containing split files.

filename_regex = 'ROI_\\d{5}__(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('Clear', 'Thick Cloud', 'Thin Cloud', 'Cloud Shadow')¶: List of classes for classification, semantic segmentation, and change detection.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchCloudS3(root='data', split='train', mode='multi', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench Cloud-S3 dataset.

Cloud-S3 is a cloud segmentation dataset with raw images from Sentinel-3 OLCI and labels from the IdePix classification algorithm.

This dataset has two modes:

Multiclass Classification¶
Code	Class	Description
0	Invalid	Invalid pixels, should be ignored during training.
1	Clear	Land, coastline, or water pixels.
2	Cloud-Ambiguous	Semi-transparent clouds, or clouds where the detection level is uncertain.
3	Cloud-Sure	Fully-opaque clouds with full confidence of their detection.
4	Cloud Shadow	Pixels are affected by a cloud shadow.
5	Snow/Ice	Clear snow/ice pixels.

Binary Classification¶
Code	Class	Description
0	Invalid	Invalid pixels, should be ignored during training.
1	Clear	Land, coastline, water, snow, or ice pixels.
2	Cloud	Pixels which are either cloud-sure or cloud-ambiguous.

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2503.11849

New in version 0.7.

md5: str = '1f82a8ccf16a0c44f0b1729e523e343a'¶: MD5 checksum.

zipfile: str = 'cloud_s3.zip'¶: Zip file name.

directory: str = 'cloud_s3'¶: Subdirectory containing split files.

filename_regex = 'S3[AB]_OL_1_EFR____(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

__init__(root='data', split='train', mode='multi', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBase instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['binary', 'multi']) – One of ‘binary’ or ‘multi’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

classes: tuple[str, ...] = ('Invalid', 'Clear', 'Cloud-Ambiguous', 'Cloud-Sure', 'Cloud Shadow', 'Snow/Ice')¶: List of classes for classification, semantic segmentation, and change detection.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchEuroSATS1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench EuroSAT-S1 dataset.

EuroSAT-S1 is a multi-class land use/land cover classification dataset, and is functionally identical to EuroSAT-SAR.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = 'e7e7f8fc68fc55a7a689cb654912ff3f'¶: MD5 checksum.

zipfile: str = 'eurosat_s1.zip'¶: Zip file name.

directory: str = 'eurosat_s1'¶: Subdirectory containing split files.

filename = 'eurosat-{}.txt'¶: Filename format of split files.

classes: tuple[str, ...] = ('AnnualCrop', 'HerbaceousVegetation', 'Industrial', 'PermanentCrop', 'River', 'Forest', 'Highway', 'Pasture', 'Residential', 'SeaLake')¶: List of classes for classification, semantic segmentation, and change detection.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchEuroSATS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench EuroSAT-S2 dataset.

EuroSAT-S2 is a multi-class land use/land cover classification dataset, and is functionally identical to EuroSAT-MS.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = 'b2be02ca9767554c717f2e9bd15bbd23'¶: MD5 checksum.

zipfile: str = 'eurosat_s2.zip'¶: Zip file name.

directory: str = 'eurosat_s2'¶: Subdirectory containing split files.

filename = 'eurosat-{}.txt'¶: Filename format of split files.

classes: tuple[str, ...] = ('AnnualCrop', 'HerbaceousVegetation', 'Industrial', 'PermanentCrop', 'River', 'Forest', 'Highway', 'Pasture', 'Residential', 'SeaLake')¶: List of classes for classification, semantic segmentation, and change detection.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchBigEarthNetS1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench BigEarthNet-S1 dataset.

BigEarthNet-S1 is a multilabel land use/land cover classification dataset composed of 5% of the Sentinel-1 data of BigEarthNet-v2.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '269355db0449e0da7213c95f30c346d4'¶: MD5 checksum.

zipfile: str = 'bigearthnetv2.zip'¶: Zip file name.

directory: str = 'bigearthnet_s1s2'¶: Subdirectory containing split files.

filename = 'multilabel-{}.csv'¶: Filename format of split files.

filename_regex = '.{16}_(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Urban fabric', 'Industrial or commercial units', 'Arable land', 'Permanent crops', 'Pastures', 'Complex cultivation patterns', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Agro-forestry areas', 'Broad-leaved forest', 'Coniferous forest', 'Mixed forest', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Transitional woodland, shrub', 'Beaches, dunes, sands', 'Inland wetlands', 'Coastal wetlands', 'Inland waters', 'Marine waters')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBigEarthNetS1 instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchBigEarthNetS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench BigEarthNet-S2 dataset.

BigEarthNet-S2 is a multilabel land use/land cover classification dataset composed of 5% of the Sentinel-2 data of BigEarthNet-v2.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '269355db0449e0da7213c95f30c346d4'¶: MD5 checksum.

zipfile: str = 'bigearthnetv2.zip'¶: Zip file name.

directory: str = 'bigearthnet_s1s2'¶: Subdirectory containing split files.

filename = 'multilabel-{}.csv'¶: Filename format of split files.

filename_regex = '.{10}_(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Urban fabric', 'Industrial or commercial units', 'Arable land', 'Permanent crops', 'Pastures', 'Complex cultivation patterns', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Agro-forestry areas', 'Broad-leaved forest', 'Coniferous forest', 'Mixed forest', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Transitional woodland, shrub', 'Beaches, dunes, sands', 'Inland wetlands', 'Coastal wetlands', 'Inland waters', 'Marine waters')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBigEarthNetS2 instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchLC100ClsS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench LC100Cls-S3 dataset.

LC100Cls-S3 is a multilabel land use/land cover classification dataset based on Sentinel-3 OLCI images and CGLS-LC100 land cover maps. CGLS-LC100 is a product in the Copernicus Global Land Service (CGLS) portfolio and delivers a global 23-class land cover map at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time series (1-4 images/location) modes, the former is used in the original benchmark.

Classes¶
Value	Description
0	Unknown. No or not enough satellite data available.
20	Shrubs. Woody perennial plants with persistent and woody stems and without any defined main stem being less than 5 m tall. The shrub foliage can be either evergreen or deciduous.
30	Herbaceous vegetation. Plants without persistent stem or shoots above ground and lacking definite firm structure. Tree and shrub cover is less than 10 %.
40	Cultivated and managed vegetation / agriculture. Lands covered with temporary crops followed by harvest and a bare soil period (e.g., single and multiple cropping systems). Note that perennial woody crops will be classified as the appropriate forest or shrub land cover type.
50	Urban / built up. Land covered by buildings and other man-made structures.
60	Bare / sparse vegetation. Lands with exposed soil, sand, or rocks and never has more than 10 % vegetated cover during any time of the year.
70	Snow and ice. Lands under snow or ice cover throughout the year.
80	Permanent water bodies. Lakes, reservoirs, and rivers. Can be either fresh or salt-water bodies.
90	Herbaceous wetland. Lands with a permanent mixture of water and herbaceous or woody vegetation. The vegetation can be present in either salt, brackish, or fresh water.
100	Moss and lichen.
111	Closed forest, evergreen needle leaf. Tree canopy >70 %, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
112	Closed forest, evergreen broad leaf. Tree canopy >70 %, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
113	Closed forest, deciduous needle leaf. Tree canopy >70 %, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
114	Closed forest, deciduous broad leaf. Tree canopy >70 %, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
115	Closed forest, mixed.
116	Closed forest, not matching any of the other definitions.
121	Open forest, evergreen needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
122	Open forest, evergreen broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
123	Open forest, deciduous needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
124	Open forest, deciduous broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
125	Open forest, mixed.
126	Open forest, not matching any of the other definitions.
200	Oceans, seas. Can be either fresh or salt-water bodies.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '967d1da6286e0d0e346e425a8f3800e9'¶: MD5 checksum.

zipfile: str = 'lc100_s3.zip'¶: Zip file name.

filename = 'multilabel-{}.csv'¶: Filename format of split files.

directory: str = 'lc100_s3'¶: Subdirectory containing split files.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Unknown', 'Shrubs', 'Herbaceous vegetation', 'Cultivated and managed vegetation / agriculture', 'Urban / built up', 'Bare / sparse vegetation', 'Snow and ice', 'Permanent water bodies', 'Herbaceous wetland', 'Moss and lichen', 'Closed forest, evergreen needle leaf', 'Closed forest, evergreen broad leaf', 'Closed forest, deciduous needle leaf', 'Closed forest, deciduous broad leaf', 'Closed forest, mixed', 'Closed forest, not matching any of the other definitions', 'Open forest, evergreen needle leaf', 'Open forest, evergreen broad leaf', 'Open forest, deciduous needle leaf', 'Open forest, deciduous broad leaf', 'Open forest, mixed', 'Open forest, not matching any of the other definitions', 'Oceans, seas')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchLC100ClsS3 instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchLC100SegS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench LC100Seg-S3 dataset.

LC100Seg-S3 is a multilabel land use/land cover segmentation dataset based on Sentinel-3 OLCI images and CGLS-LC100 land cover maps. CGLS-LC100 is a product in the Copernicus Global Land Service (CGLS) portfolio and delivers a global 23-class land cover map at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time series (1-4 images/location) modes, the former is used in the original benchmark.

Classes¶
Value	Description
0	Unknown. No or not enough satellite data available.
20	Shrubs. Woody perennial plants with persistent and woody stems and without any defined main stem being less than 5 m tall. The shrub foliage can be either evergreen or deciduous.
30	Herbaceous vegetation. Plants without persistent stem or shoots above ground and lacking definite firm structure. Tree and shrub cover is less than 10 %.
40	Cultivated and managed vegetation / agriculture. Lands covered with temporary crops followed by harvest and a bare soil period (e.g., single and multiple cropping systems). Note that perennial woody crops will be classified as the appropriate forest or shrub land cover type.
50	Urban / built up. Land covered by buildings and other man-made structures.
60	Bare / sparse vegetation. Lands with exposed soil, sand, or rocks and never has more than 10 % vegetated cover during any time of the year.
70	Snow and ice. Lands under snow or ice cover throughout the year.
80	Permanent water bodies. Lakes, reservoirs, and rivers. Can be either fresh or salt-water bodies.
90	Herbaceous wetland. Lands with a permanent mixture of water and herbaceous or woody vegetation. The vegetation can be present in either salt, brackish, or fresh water.
100	Moss and lichen.
111	Closed forest, evergreen needle leaf. Tree canopy >70 %, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
112	Closed forest, evergreen broad leaf. Tree canopy >70 %, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
113	Closed forest, deciduous needle leaf. Tree canopy >70 %, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
114	Closed forest, deciduous broad leaf. Tree canopy >70 %, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
115	Closed forest, mixed.
116	Closed forest, not matching any of the other definitions.
121	Open forest, evergreen needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
122	Open forest, evergreen broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
123	Open forest, deciduous needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
124	Open forest, deciduous broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
125	Open forest, mixed.
126	Open forest, not matching any of the other definitions.
200	Oceans, seas. Can be either fresh or salt-water bodies.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '967d1da6286e0d0e346e425a8f3800e9'¶: MD5 checksum.

zipfile: str = 'lc100_s3.zip'¶: Zip file name.

filename = 'multilabel-{}.csv'¶: Filename format of split files.

directory: str = 'lc100_s3'¶: Subdirectory containing split files.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('Unknown', 'Shrubs', 'Herbaceous vegetation', 'Cultivated and managed vegetation / agriculture', 'Urban / built up', 'Bare / sparse vegetation', 'Snow and ice', 'Permanent water bodies', 'Herbaceous wetland', 'Moss and lichen', 'Closed forest, evergreen needle leaf', 'Closed forest, evergreen broad leaf', 'Closed forest, deciduous needle leaf', 'Closed forest, deciduous broad leaf', 'Closed forest, mixed', 'Closed forest, not matching any of the other definitions', 'Open forest, evergreen needle leaf', 'Open forest, evergreen broad leaf', 'Open forest, deciduous needle leaf', 'Open forest, deciduous broad leaf', 'Open forest, mixed', 'Open forest, not matching any of the other definitions', 'Oceans, seas')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchLC100SegS3 instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchDFC2020S1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench DFC2020-S1 dataset.

DFC2020-S1 is a land use/land cover segmentation datasets derived from the IEEE GRSS Data Fusion Contest 2020 (DFC2020).

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = 'f10ba017dab6f38b7a6857b169ea924b'¶: MD5 checksum.

zipfile: str = 'dfc2020.zip'¶: Zip file name.

directory: str = 'dfc2020_s1s2'¶: Subdirectory containing split files.

filename = 'dfc-{}-new.csv'¶: Filename format of split files.

classes: tuple[str, ...] = ('Background', 'Forest', 'Shrubland', 'Savanna', 'Grassland', 'Wetlands', 'Croplands', 'Urban/Built-up', 'Snow/Ice', 'Barren', 'Water')¶: List of classes for classification, semantic segmentation, and change detection.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchDFC2020S2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench DFC2020-S2 dataset.

DFC2020-S2 is a land use/land cover segmentation datasets derived from the IEEE GRSS Data Fusion Contest 2020 (DFC2020).

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = 'f10ba017dab6f38b7a6857b169ea924b'¶: MD5 checksum.

zipfile: str = 'dfc2020.zip'¶: Zip file name.

directory: str = 'dfc2020_s1s2'¶: Subdirectory containing split files.

filename = 'dfc-{}-new.csv'¶: Filename format of split files.

classes: tuple[str, ...] = ('Background', 'Forest', 'Shrubland', 'Savanna', 'Grassland', 'Wetlands', 'Croplands', 'Urban/Built-up', 'Snow/Ice', 'Barren', 'Water')¶: List of classes for classification, semantic segmentation, and change detection.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchFloodS1(root='data', split='train', mode=1, bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench Flood-S1 dataset.

Flood-S1 is a flood segmentation dataset extracted from a large flood mapping dataset Kuro Siwo.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = 'f4337fee5e90203c6d0c3efeb0b97b8a'¶: MD5 checksum.

zipfile: str = 'flood_s1.zip'¶: Zip file name.

directory: str = 'flood_s1'¶: Subdirectory containing split files.

filename = 'grid_dict_{}.json'¶: Filename format of split files.

filename_regex = '.{18}_(?P<date>\\d{8})'¶: Regular expression used to extract date from filename.

date_format = '%Y%m%d'¶: Date format string used to parse date from filename.

cmap: str | matplotlib.colors.Colormap = <matplotlib.colors.ListedColormap object>¶: Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('No Water', 'Permanent Waters', 'Floods')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode=1, bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBase instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal[1, 2]) – Number of pre-flood images, 1 or 2.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchLCZS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench LCZ-S2 dataset.

LCZ-S2 is a multi-class scene classification dataset derived from So2Sat-LCZ42, a large-scale local climate zone classification dataset.

If you use this dataset in your research, please cite the following papers:

Note

This dataset requires the following additional library to be installed:

https://pypi.org/project/h5py/ to load the dataset.

New in version 0.7.

filename = 'lcz_{}.h5'¶: Filename format of split files.

classes: tuple[str, ...] = ('Compact high rise', 'Compact mid rise', 'Compact low rise', 'Open high rise', 'Open mid rise', 'Open low rise', 'Lightweight low rise', 'Large low rise', 'Sparsely built', 'Heavy industry', 'Dense trees', 'Scattered trees', 'Bush, scrub', 'Low plants', 'Bare rock or paved', 'Bare soil or sand', 'Water')¶: List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBase instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the length of the dataset.

Returns:: Length of the dataset.
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchBiomassS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench Biomass-S3 dataset.

Biomass-S3 is a regression dataset based on Sentinel-3 OLCI images and CCI biomass. The biomass product is part of the European Space Agency’s Climate Change Initiative (CCI) program and delivers global forest above-ground biomass at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time series (1-4 images/location) modes, the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '4769ab8c2c23cd8957b99e15e071931c'¶: MD5 checksum.

zipfile: str = 'biomass_s3.zip'¶: Zip file name.

directory: str = 'biomass_s3'¶: Subdirectory containing split files.

filename = 'static_fnames-{}.csv'¶: Filename format of split files.

dtype: torch.dtype = torch.float32¶: Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'¶: Regular expression used to extract date from filename.

cmap: str | matplotlib.colors.Colormap = 'YlGn'¶: Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchBiomassS3 instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchAQNO2S5P(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench AQ-NO2-S5P dataset.

AQ-NO2-S5P is a regression dataset based on Sentinel-5P NO2 images and EEA air quality data products. Specifically, this dataset combines 2021 measurements of NO2 (annual average concentration) from EEA with S5P NO2 (“tropospheric NO2 column number density”) from GEE.

This benchmark supports both annual (1 image/location) and seasonal (4 images/location) modes, the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '92081c7437c5c1daf783868ad7669877'¶: MD5 checksum.

zipfile: str = 'airquality_s5p.zip'¶: Zip file name.

directory: str = 'airquality_s5p/no2'¶: Subdirectory containing split files.

filename = '{}.csv'¶: Filename format of split files.

dtype: torch.dtype = torch.float32¶: Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '(?P<start>\\d{4}-\\d{2}-\\d{2})_(?P<stop>\\d{4}-\\d{2}-\\d{2})'¶: Regular expression used to extract date from filename.

date_format = '%Y-%m-%d'¶: Date format string used to parse date from filename.

cmap: str | matplotlib.colors.Colormap = 'Wistia'¶: Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchAQNO2S5P instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['annual', 'seasonal']) – One of ‘annual’ or ‘seasonal’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

class torchgeo.datasets.CopernicusBenchAQO3S5P(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]¶

Copernicus-Bench AQ-O3-S5P dataset.

AQ-O3-S5P is a regression dataset based on Sentinel-5P O3 images and EEA air quality data products. Specifically, this dataset combines 2021 measurements of O3 (93.2 percentile of maximum daily 8-hour means, SOMO35) from EEA with S5P O3 (“O3 column number density”) from GEE.

This benchmark supports both annual (1 image/location) and seasonal (4 images/location) modes, the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

New in version 0.7.

md5: str = '92081c7437c5c1daf783868ad7669877'¶: MD5 checksum.

zipfile: str = 'airquality_s5p.zip'¶: Zip file name.

directory: str = 'airquality_s5p/o3'¶: Subdirectory containing split files.

filename = '{}.csv'¶: Filename format of split files.

dtype: torch.dtype = torch.float32¶: Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '(?P<start>\\d{4}-\\d{2}-\\d{2})_(?P<stop>\\d{4}-\\d{2}-\\d{2})'¶: Regular expression used to extract date from filename.

date_format = '%Y-%m-%d'¶: Date format string used to parse date from filename.

cmap: str | matplotlib.colors.Colormap = 'Wistia'¶: Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new CopernicusBenchAQO3S5P instance.

Parameters:

root (str | os.PathLike[str]) – Root directory where dataset can be found.
split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.
mode (Literal['annual', 'seasonal']) – One of ‘annual’ or ‘seasonal’.
bands (collections.abc.Sequence[str] | None) – Sequence of band names to load (defaults to all bands).
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – A function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – If True, download dataset and store it in the root directory.
checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – Index to return.
Returns:: Data and labels at that index.
Return type:: dict[str, torch.Tensor]

SpaceNet¶

The SpaceNet Dataset is hosted as an Amazon Web Services (AWS) Public Dataset. It contains ~67,000 square km of very high-resolution imagery, >11M building footprints, and ~20,000 km of road labels to ensure that there is adequate open source data available for geospatial machine learning research. SpaceNet Challenge Dataset’s have a combination of very high resolution satellite imagery and high quality corresponding labels for foundational mapping features such as building footprints or road networks.

I = instance segmentation¶
Dataset	Task	Source	License	# Samples	# Classes	Size (px)	Resolution (m)	Bands
SpaceNet 1	I	WorldView-2	CC-BY-SA-4.0	9,735		406x439, 102x110	0.5–1	RGB, MSI
SpaceNet 2	I	WorldView-3	CC-BY-SA-4.0	14,119		650x650, 163x163	0.3–1.24	RGB, MSI
SpaceNet 3	I	WorldView-3	CC-BY-SA-4.0	3,477	7	1,300x1,300, 325x325	0.3–1.24	RGB, MSI
SpaceNet 4	I	WorldView-2	CC-BY-SA-4.0	1,991		900x900, 225x225	0.46–1.67	RGB, MSI
SpaceNet 5	I	WorldView-3	CC-BY-SA-4.0	2,588		1,300x1,300, 325x325	0.3–1.24	RGB, MSI
SpaceNet 6	I	WorldView-2	CC-BY-SA-4.0	5,462		900x900, 450x450	0.5–2	SAR, RGB, MSI
SpaceNet 7	I	Dove	CC-BY-SA-4.0	1,889		1,024x1,024	4	RGB
SpaceNet 8	I	Maxar	CC-BY-SA-4.0	1,289	8	1,300x1,300	0.3–0.8	RGB

class torchgeo.datasets.SpaceNet(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: NonGeoDataset, ABC

Abstract base class for the SpaceNet datasets.

The SpaceNet datasets are a set of datasets that all together contain >11M building footprints and ~20,000 km of road labels mapped over high-resolution satellite imagery obtained from a variety of sensors such as Worldview-2, Worldview-3 and Dove.

Note

The SpaceNet datasets require the following additional library to be installed:

AWS CLI: to download the dataset from AWS.

abstract property dataset_id: str¶: Dataset ID.

abstract property tarballs: dict[str, dict[int, list[str]]]¶: Mapping of tarballs[split][aoi] = [tarballs].

abstract property md5s: dict[str, dict[int, list[str]]]¶: Mapping of md5s[split][aoi] = [md5s].

abstract property valid_aois: dict[str, list[int]]¶: Mapping of valid_aois[split] = [aois].

abstract property valid_images: dict[str, list[str]]¶: Mapping of valid_images[split] = [images].

abstract property valid_masks: tuple[str, ...]¶: List of valid masks.

__init__(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Initialize a new SpaceNet Dataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
split (str) – ‘train’ or ‘test’ split
aois (list[int]) – areas of interest
image (str | None) – image selection
mask (str | None) – mask selection
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory.
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

AssertionError – If any invalid arguments are passed.
DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]¶

Return the number of samples in the dataset.

Returns:: length of the dataset
Return type:: int

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]¶

Plot a sample from the dataset.

Parameters:

sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

New in version 0.2.

class torchgeo.datasets.SpaceNet1(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 1: Building Detection v1 Dataset.

SpaceNet 1 is a dataset of building footprints over the city of Rio de Janeiro.

Dataset features:

No. of images: 6940 (8 Band) + 6940 (RGB)
No. of polygons: 382,534 building labels
Area Coverage: 2544 sq km
GSD: 1 m (8 band), 50 cm (rgb)
Chip size: 102 x 110 (8 band), 406 x 439 (rgb)

Dataset format:

Imagery - Worldview-2 GeoTIFFs
- 8Band.tif (Multispectral)
- RGB.tif (Pansharpened RGB)
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1807.01232

class torchgeo.datasets.SpaceNet2(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 2: Building Detection v2 Dataset.

SpaceNet 2 is a dataset of building footprints over the cities of Las Vegas, Paris, Shanghai and Khartoum.

Collection features:

AOI	Area (km²)	# Images	# Buildings
Las Vegas	216	3850	151,367
Paris	1030	1148	23,816
Shanghai	1000	4582	92,015
Khartoum	765	1012	35,503

Imagery features:

	PAN	MS	PS-MS	PS-RGB
GSD (m)	0.31	1.24	0.30	0.30
Chip size (px)	650 x 650	163 x 163	650 x 650	650 x 650

Dataset format:

Imagery - Worldview-3 GeoTIFFs
- PAN.tif (Panchromatic)
- MS.tif (Multispectral)
- PS-MS (Pansharpened Multispectral)
- PS-RGB (Pansharpened RGB)
Labels - GeoJSON
- label.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1807.01232

class torchgeo.datasets.SpaceNet3(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 3: Road Network Detection.

SpaceNet 3 is a dataset of road networks over the cities of Las Vegas, Paris, Shanghai, and Khartoum.

Collection features:

AOI	Area (km²)	# Images	# Road Network Labels (km)
Vegas	216	854	3685
Paris	1030	257	425
Shanghai	1000	1028	3537
Khartoum	765	283	1030

Imagery features:

	PAN	MS	PS-MS	PS-RGB
GSD (m)	0.31	1.24	0.30	0.30
Chip size (px)	1300 x 1300	325 x 325	1300 x 1300	1300 x 1300

Dataset format:

Imagery - Worldview-3 GeoTIFFs
- PAN.tif (Panchromatic)
- MS.tif (Multispectral)
- PS-MS (Pansharpened Multispectral)
- PS-RGB (Pansharpened RGB)
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1807.01232

New in version 0.3.

class torchgeo.datasets.SpaceNet4(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 4: Off-Nadir Buildings Dataset.

SpaceNet 4 is a dataset of 27 WV-2 imagery captured at varying off-nadir angles and associated building footprints over the city of Atlanta. The off-nadir angle ranges from 7 degrees to 54 degrees.

Dataset features:

No. of chipped images: 28,728 (PAN/MS/PS-RGBNIR)
No. of label files: 1064
No. of building footprints: >120,000
Area Coverage: 665 sq km
Chip size: 225 x 225 (MS), 900 x 900 (PAN/PS-RGBNIR)

Dataset format:

Imagery - Worldview-2 GeoTIFFs
- PAN.tif (Panchromatic)
- MS.tif (Multispectral)
- PS-RGBNIR (Pansharpened RGBNIR)
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/1903.12239

class torchgeo.datasets.SpaceNet5(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet3

SpaceNet 5: Automated Road Network Extraction and Route Travel Time Estimation.

SpaceNet 5 is a dataset of road networks over the cities of Moscow, Mumbai and San Juan (unavailable).

Collection features:

AOI	Area (km²)	# Images	# Road Network Labels (km)
Moscow	1353	1353	3066
Mumbai	1021	1016	1951

Imagery features:

	PAN	MS	PS-MS	PS-RGB
GSD (m)	0.31	1.24	0.30	0.30
Chip size (px)	1300 x 1300	325 x 325	1300 x 1300	1300 x 1300

Dataset format:

Imagery - Worldview-3 GeoTIFFs
- PAN.tif (Panchromatic)
- MS.tif (Multispectral)
- PS-MS (Pansharpened Multispectral)
- PS-RGB (Pansharpened RGB)
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please use the following citation:

The SpaceNet Partners, “SpaceNet5: Automated Road Network Extraction and Route Travel Time Estimation from Satellite Imagery”, https://spacenet.ai/sn5-challenge/

New in version 0.2.

class torchgeo.datasets.SpaceNet6(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 6: Multi-Sensor All-Weather Mapping.

SpaceNet 6 is a dataset of optical and SAR imagery over the city of Rotterdam.

Collection features:

AOI	Area (km²)	# Images	# Building Footprint Labels
Rotterdam	120	3401	48000

Imagery features:

	PAN	RGBNIR	PS-RGB	PS-RGBNIR	SAR-Intensity
GSD (m)	0.5	2.0	0.5	0.5	0.5
Chip size (px)	900 x 900	450 x 450	900 x 900	900 x 900	900 x 900

Dataset format:

Imagery - GeoTIFFs from Worldview-2 (optical) and Capella Space (SAR)
- PAN.tif (Panchromatic)
- RGBNIR.tif (Multispectral)
- PS-RGB (Pansharpened RGB)
- PS-RGBNIR (Pansharpened RGBNIR)
- SAR-Intensity (SAR Intensity)
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2004.06500

New in version 0.4.

class torchgeo.datasets.SpaceNet7(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet 7: Multi-Temporal Urban Development Challenge.

SpaceNet 7 is a dataset which consist of medium resolution (4.0m) satellite imagery mosaics acquired from Planet Labs’ Dove constellation between 2017 and 2020. It includes ≈ 24 images (one per month) covering > 100 unique geographies, and comprises > 40,000 km2 of imagery and exhaustive polygon labels of building footprints therein, totaling over 11M individual annotations.

Dataset features:

No. of train samples: 1423
No. of test samples: 466
No. of building footprints: 11,080,000
Area Coverage: 41,000 sq km
Chip size: 1024 x 1024
GSD: ~4m

Dataset format:

Imagery - Planet Dove GeoTIFF
- mosaic.tif
Labels - GeoJSON
- labels.geojson

If you use this dataset in your research, please cite the following paper:

https://arxiv.org/abs/2102.04420

New in version 0.2.

class torchgeo.datasets.SpaceNet8(root='data', split='train', aois=[], image=None, mask=None, transforms=None, download=False, checksum=False)[source]¶

Bases: SpaceNet

SpaceNet8: Flood Detection Challenge Using Multiclass Segmentation.

SpaceNet 8 is a dataset focusing on infrastructure and flood mapping related to hurricanes and heavy rains that cause route obstructions and significant damage.

If you use this dataset in your research, please cite the following paper:

https://openaccess.thecvf.com/content/CVPR2022W/EarthVision/html/Hansch_SpaceNet_8_-_The_Detection_of_Flooded_Roads_and_Buildings_CVPRW_2022_paper.html

New in version 0.6.

Base Classes¶

If you want to write your own custom dataset, you can extend one of these abstract base classes.

GeoDataset¶

class torchgeo.datasets.GeoDataset[source]¶

Bases: Dataset[dict[str, Any]], ABC

Abstract base class for datasets containing geospatial information.

Geospatial information includes things like:

coordinates (latitude, longitude)
coordinate reference system (CRS)
resolution

GeoDataset is a special class of datasets. Unlike NonGeoDataset, the presence of geospatial information allows two or more datasets to be combined based on latitude/longitude. This allows users to do things like:

Combine image and target labels and sample from both simultaneously (e.g., Landsat and CDL)
Combine datasets for multiple image sources for multimodal learning or data fusion (e.g., Landsat and Sentinel)
Combine image and other raster data (e.g., elevation, temperature, pressure) and sample from both simultaneously (e.g., Landsat and Aster Global DEM)

These combinations require that all queries are present in both datasets, and can be combined using an IntersectionDataset:

dataset = landsat & cdl

Users may also want to:

Combine datasets for multiple image sources and treat them as equivalent (e.g., Landsat 7 and Landsat 8)
Combine datasets for disparate geospatial locations (e.g., Chesapeake NY and PA)

These combinations require that all queries are present in at least one dataset, and can be combined using a UnionDataset:

dataset = landsat7 | landsat8

filename_glob = '*'¶

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__add__ = None¶: GeoDataset addition can be ambiguous and is no longer supported. Users should instead use the intersection or union operator.

abstract __getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

__and__(other)[source]¶

Take the intersection of two GeoDataset.

Parameters:: other (GeoDataset) – another dataset
Returns:: a single dataset
Raises:: ValueError – if other is not a GeoDataset
Return type:: IntersectionDataset

New in version 0.2.

__or__(other)[source]¶

Take the union of two GeoDatasets.

Parameters:: other (GeoDataset) – another dataset
Returns:: a single dataset
Raises:: ValueError – if other is not a GeoDataset
Return type:: UnionDataset

New in version 0.2.

__len__()[source]¶

Return the number of files in the dataset.

Returns:: length of the dataset
Return type:: int

__str__()[source]¶

Return the informal string representation of the object.

Returns:: informal string representation
Return type:: str

property bounds: tuple[slice, slice, slice]¶

Bounds of the index.

Returns:: Bounding x, y, and t slices.

property crs: CRS¶

coordinate reference system (CRS) of the dataset.

Returns:: The coordinate reference system (CRS).

property res: tuple[float, float]¶

Resolution of the dataset in units of CRS.

Returns:: The resolution of the dataset.

property files: list[str]¶

A list of all files in the dataset.

Returns:: All files in the dataset.

New in version 0.5.

RasterDataset¶

class torchgeo.datasets.RasterDataset(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Bases: GeoDataset

Abstract base class for GeoDataset stored as raster files.

filename_regex = '.*'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name

date_format = '%Y%m%d'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

mint: datetime = Timestamp('1677-09-21 00:12:43.145224193')¶: Minimum timestamp if not in filename

maxt: datetime = Timestamp('2262-04-11 23:47:16.854775807')¶: Maximum timestamp if not in filename

is_image = True¶

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.

separate_files = False¶: True if data is stored in a separate file for each band, else False.

all_bands: tuple[str, ...] = ()¶: Names of all available bands in the dataset

rgb_bands: tuple[str, ...] = ()¶: Names of RGB bands in the dataset, used for plotting

property dtype: dtype¶

The dtype of the dataset (overrides the dtype of the data file via a cast).

Defaults to float32 if is_image is True, else long. Can be overridden for tasks like pixel-wise regression where the mask should be float32 instead of long.

Returns:: the dtype of the dataset

New in version 0.5.

property resampling: Resampling¶

Resampling algorithm used when reading input files.

Defaults to bilinear for float dtypes and nearest for int dtypes.

Returns:: The resampling method to use.

New in version 0.6.

__init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶

Initialize a new RasterDataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (collections.abc.Sequence[str] | None) – bands to return (defaults to all bands)
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

AssertionError – If bands are invalid.
DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {}¶: Color map for the dataset, used for plotting

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

VectorDataset¶

class torchgeo.datasets.VectorDataset(paths='data', crs=None, res=(0.0001, 0.0001), transforms=None, label_name=None, task='semantic_segmentation', layer=None)[source]¶

Bases: GeoDataset

Abstract base class for GeoDataset stored as vector files.

filename_regex = '.*'¶

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

date: used to calculate mint and maxt for index insertion

date_format = '%Y%m%d'¶

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group.

property dtype: dtype¶

The dtype of the dataset (overrides the dtype of the data file via a cast).

Defaults to long.

Returns:: the dtype of the dataset

New in version 0.6.

__init__(paths='data', crs=None, res=(0.0001, 0.0001), transforms=None, label_name=None, task='semantic_segmentation', layer=None)[source]¶

Initialize a new VectorDataset instance.

Parameters:

paths (str | os.PathLike[str] | collections.abc.Iterable[str | os.PathLike[str]]) – one or more root directories to search or files to load
crs (pyproj.crs.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float]) – resolution of the dataset in units of CRS
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
label_name (str | None) – name of the dataset property that has the label to be rasterized into the mask
task (Literal['object_detection', 'semantic_segmentation', 'instance_segmentation']) – computer vision task the dataset is used for. Supported output types object_detection, semantic_segmentation, instance_segmentation
layer (str | int | None) – if the input is a multilayer vector dataset, such as a geopackage, specify which layer to use. Can be int to specify the index of the layer, str to select the layer with that name or None which opens the first layer

Raises:

DatasetNotFoundError – If dataset is not found.
ValueError – If task is not one of allowed tasks

New in version 0.4: The label_name parameter.

Changed in version 0.5: root was renamed to paths.

New in version 0.8: The task and layer parameters

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

get_label(feature)[source]¶

Get label value to use for rendering a feature.

Parameters:: feature (Feature) – the fiona.model.Feature from which to extract the label.
Returns:: the integer label, or 0 if the feature should not be rendered.
Return type:: int

New in version 0.6.

NonGeoDataset¶

class torchgeo.datasets.NonGeoDataset[source]¶

Bases: Dataset[dict[str, Any]], ABC

Abstract base class for datasets lacking geospatial information.

This base class is designed for datasets with pre-defined image chips.

abstract __getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and labels at that index
Raises:: IndexError – if index is out of range of the dataset
Return type:: dict[str, Any]

abstract __len__()[source]¶

Return the length of the dataset.

Returns:: length of the dataset
Return type:: int

__str__()[source]¶

Return the informal string representation of the object.

Returns:: informal string representation
Return type:: str

NonGeoClassificationDataset¶

class torchgeo.datasets.NonGeoClassificationDataset(root='data', transforms=None, loader=<function default_loader>, is_valid_file=None)[source]¶

Bases: NonGeoDataset, ImageFolder

Abstract base class for classification datasets lacking geospatial information.

This base class is designed for datasets with pre-defined image chips which are separated into separate folders per class.

__init__(root='data', transforms=None, loader=<function default_loader>, is_valid_file=None)[source]¶

Initialize a new NonGeoClassificationDataset instance.

Parameters:

root (str | os.PathLike[str]) – root directory where dataset can be found
transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version
loader (collections.abc.Callable[[str | os.PathLike[str]], Any] | None) – a callable function which takes as input a path to an image and returns a PIL Image or numpy array
is_valid_file (collections.abc.Callable[[str | os.PathLike[str]], bool] | None) – A function that takes the path of an Image file and checks if the file is a valid file

__getitem__(index)[source]¶

Return an index within the dataset.

Parameters:: index (int) – index to return
Returns:: data and label at that index
Return type:: dict[str, torch.Tensor]

__len__()[source]¶

Return the number of data points in the dataset.

Returns:: length of the dataset
Return type:: int

IntersectionDataset¶

class torchgeo.datasets.IntersectionDataset(dataset1, dataset2, spatial_only=False, collate_fn=<function concat_samples>, transforms=None)[source]¶

Bases: GeoDataset

Dataset representing the intersection of two GeoDatasets.

This allows users to do things like:

Combine image and target labels and sample from both simultaneously (e.g., Landsat and CDL)
Combine datasets for multiple image sources for multimodal learning or data fusion (e.g., Landsat and Sentinel)
Combine image and other raster data (e.g., elevation, temperature, pressure) and sample from both simultaneously (e.g., Landsat and Aster Global DEM)

These combinations require that all queries are present in both datasets, and can be combined using an IntersectionDataset:

dataset = landsat & cdl

New in version 0.2.

__init__(dataset1, dataset2, spatial_only=False, collate_fn=<function concat_samples>, transforms=None)[source]¶

Initialize a new IntersectionDataset instance.

When computing the intersection between two datasets that both contain model inputs (such as images) or model outputs (such as masks), the default behavior is to stack the data along the channel dimension. The collate_fn parameter can be used to change this behavior.

Parameters:

dataset1 (GeoDataset) – the first dataset
dataset2 (GeoDataset) – the second dataset
spatial_only (bool) – if True, ignore temporal dimension when computing intersection
collate_fn (Callable[[Sequence[dict[str, Any]]], dict[str, Any]]) – function used to collate samples
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

Raises:

RuntimeError – if datasets have no spatiotemporal intersection
ValueError – if either dataset is not a GeoDataset

New in version 0.8: The spatial_only parameter.

New in version 0.4: The transforms parameter.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

__str__()[source]¶

Return the informal string representation of the object.

Returns:: informal string representation
Return type:: str

property crs: CRS¶

coordinate reference system (CRS) of both datasets.

Returns:: The coordinate reference system (CRS).

property res: tuple[float, float]¶

Resolution of both datasets in units of CRS.

Returns:: Resolution of both datasets.

UnionDataset¶

class torchgeo.datasets.UnionDataset(dataset1, dataset2, collate_fn=<function merge_samples>, transforms=None)[source]¶

Bases: GeoDataset

Dataset representing the union of two GeoDatasets.

This allows users to do things like:

Combine datasets for multiple image sources and treat them as equivalent (e.g., Landsat 7 and Landsat 8)
Combine datasets for disparate geospatial locations (e.g., Chesapeake NY and PA)

These combinations require that all queries are present in at least one dataset, and can be combined using a UnionDataset:

dataset = landsat7 | landsat8

New in version 0.2.

__init__(dataset1, dataset2, collate_fn=<function merge_samples>, transforms=None)[source]¶

Initialize a new UnionDataset instance.

When computing the union between two datasets that both contain model inputs (such as images) or model outputs (such as masks), the default behavior is to merge the data to create a single image/mask. The collate_fn parameter can be used to change this behavior.

Parameters:

dataset1 (GeoDataset) – the first dataset
dataset2 (GeoDataset) – the second dataset
collate_fn (Callable[[Sequence[dict[str, Any]]], dict[str, Any]]) – function used to collate samples
transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

Raises:

ValueError – if either dataset is not a GeoDataset

New in version 0.4: The transforms parameter.

__getitem__(query)[source]¶

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:: query (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
Returns:: Sample of input, target, and/or metadata at that index.
Raises:: IndexError – If query is not found in the index.
Return type:: dict[str, Any]

__str__()[source]¶

Return the informal string representation of the object.

Returns:: informal string representation
Return type:: str

property crs: CRS¶

coordinate reference system (CRS) of both datasets.

Returns:: The coordinate reference system (CRS).

property res: tuple[float, float]¶

Resolution of both datasets in units of CRS.

Returns:: The resolution of both datasets.

Utilities¶

Collation Functions¶

torchgeo.datasets.stack_samples(samples)[source]¶

Stack a list of samples along a new axis.

Useful for forming a mini-batch of samples to pass to torch.utils.data.DataLoader.

Parameters:: samples (Iterable[Mapping[Any, Any]]) – list of samples
Returns:: a single sample
Return type:: dict[Any, Any]

New in version 0.2.

torchgeo.datasets.concat_samples(samples)[source]¶

Concatenate a list of samples along an existing axis.

Useful for joining samples in a torchgeo.datasets.IntersectionDataset.

Parameters:: samples (Iterable[Mapping[Any, Any]]) – list of samples
Returns:: a single sample
Return type:: dict[Any, Any]

New in version 0.2.

torchgeo.datasets.merge_samples(samples)[source]¶

Merge a list of samples.

Useful for joining samples in a torchgeo.datasets.UnionDataset.

Parameters:: samples (Iterable[Mapping[Any, Any]]) – list of samples
Returns:: a single sample
Return type:: dict[Any, Any]

New in version 0.2.

torchgeo.datasets.unbind_samples(sample)[source]¶

Reverse of stack_samples().

Useful for turning a mini-batch of samples into a list of samples. These individual samples can then be plotted using a dataset’s plot method.

Parameters:: sample (MutableMapping[Any, Any]) – a mini-batch of samples
Returns:: list of samples
Return type:: list[dict[Any, Any]]

New in version 0.2.

Splitting Functions¶

torchgeo.datasets.random_bbox_assignment(dataset, lengths, generator=<torch._C.Generator object>)[source]¶

Split a GeoDataset randomly assigning its index’s objects.

This function will go through each object in the GeoDataset’s index and randomly assign it to new GeoDatasets.

Parameters:

dataset (GeoDataset) – dataset to be split
lengths (Sequence[float]) – lengths or fractions of splits to be produced
generator (torch._C.Generator | None) – (optional) generator used for the random permutation

Returns:

A list of the subset datasets.

Return type:

New in version 0.5.

torchgeo.datasets.random_bbox_splitting(dataset, fractions, generator=<torch._C.Generator object>)[source]¶

Split a GeoDataset randomly splitting its index’s objects.

This function will go through each object in the GeoDataset’s index, split it in a random direction and assign the resulting objects to new GeoDatasets.

Parameters:

dataset (GeoDataset) – dataset to be split
fractions (Sequence[float]) – fractions of splits to be produced
generator (torch._C.Generator | None) – generator used for the random permutation

Returns:

A list of the subset datasets.

Return type:

New in version 0.5.

torchgeo.datasets.random_grid_cell_assignment(dataset, fractions, grid_size=6, generator=<torch._C.Generator object>)[source]¶

Overlays a grid over a GeoDataset and randomly assigns cells to new GeoDatasets.

This function will go through each object in the GeoDataset’s index, overlay a grid over it, and randomly assign each cell to new GeoDatasets.

Parameters:

dataset (GeoDataset) – dataset to be split
fractions (Sequence[float]) – fractions of splits to be produced
grid_size (int) – number of rows and columns for the grid
generator (torch._C.Generator | None) – generator used for the random permutation

Returns:

A list of the subset datasets.

Return type:

New in version 0.5.

torchgeo.datasets.roi_split(dataset, rois)[source]¶

Split a GeoDataset intersecting it with a ROI for each desired new GeoDataset.

Parameters:

dataset (GeoDataset) – dataset to be split
rois (Sequence[Polygon]) – regions of interest of splits to be produced

Returns:

A list of the subset datasets.

Return type:

New in version 0.5.

torchgeo.datasets.time_series_split(dataset, lengths)[source]¶

Split a GeoDataset on its time dimension to create non-overlapping GeoDatasets.

Parameters:

dataset (GeoDataset) – dataset to be split
lengths (Sequence[float | pandas._libs.tslibs.timedeltas.Timedelta | pandas._libs.interval.Interval]) – lengths, fractions or pairs of timestamps (start, end) of splits to be produced

Returns:

A list of the subset datasets.

Return type: