
torchgeo.models

Aurora

torchgeo.models.aurora_swin_unet(weights=None, *args, **kwargs)[source]

Aurora model.

If you use this model in your research, please cite the following paper:

This model requires the following additional library to be installed:

New in version 0.8.

Parameters:
  • weights (torchgeo.models.aurora.Aurora_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to aurora.Aurora

  • **kwargs (Any) – Additional keyword arguments to pass to aurora.Aurora

Returns:

An Aurora model.

Return type:

Module
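
A minimal construction sketch (the weight name comes from the Aurora_Weights rows in the Atmospheric table below; building the input batches expected by the underlying aurora package is not shown here):

>>> from torchgeo.models import Aurora_Weights, aurora_swin_unet
>>> # assumes the additional Aurora library noted above is installed
>>> model = aurora_swin_unet(weights=Aurora_Weights.HRES_T0_PRETRAINED_AURORA)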

class torchgeo.models.Aurora_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Aurora weights.

If you use this model in your research, please cite the following paper:

New in version 0.8.

ChangeStar

class torchgeo.models.ChangeStar(dense_feature_extractor, seg_classifier, changemixin, inference_mode='t1t2')[source]

Bases: Module

The base class of the network architecture of ChangeStar.

ChangeStar is composed of an arbitrary segmentation model and a ChangeMixin module. It is mainly used for binary or multi-class change detection under bitemporal or single-temporal supervision. Because it reuses the segmentation architecture, advanced dense prediction networks (e.g., semantic segmentation architectures) can be integrated into change detection with little modification.

For multi-class change detection, semantic change prediction can be inferred by a binary change prediction from the ChangeMixin module and two semantic predictions from the Segmentation model.

If you use this model in your research, please cite the following paper:

__init__(dense_feature_extractor, seg_classifier, changemixin, inference_mode='t1t2')[source]

Initializes a new ChangeStar model.

Parameters:
  • dense_feature_extractor (Module) – module for dense feature extraction, typically a semantic segmentation model without semantic segmentation head.

  • seg_classifier (Module) – semantic segmentation head, typically a convolutional layer followed by an upsampling layer.

  • changemixin (ChangeMixin) – torchgeo.models.ChangeMixin module

  • inference_mode (str) – name of inference mode: 't1t2' | 't2t1' | 'mean'. 't1t2': concatenate bitemporal features in the order t1->t2; 't2t1': concatenate bitemporal features in the order t2->t1; 'mean': the weighted mean of the outputs of 't1t2' and 't2t1'

forward(x)[source]

Forward pass of the model.

Parameters:

x (Tensor) – a bitemporal input tensor of shape [B, T, C, H, W]

Returns:

a dictionary containing bitemporal semantic segmentation logit and binary change detection logit/probability

Return type:

dict[str, torch.Tensor]

class torchgeo.models.ChangeStarFarSeg(backbone='resnet50', classes=1, backbone_pretrained=True)[source]

Bases: ChangeStar

The network architecture of ChangeStar(FarSeg).

ChangeStar(FarSeg) is composed of a FarSeg model and a ChangeMixin module.

If you use this model in your research, please cite the following paper:

__init__(backbone='resnet50', classes=1, backbone_pretrained=True)[source]

Initializes a new ChangeStarFarSeg model.

Parameters:
  • backbone (str) – name of ResNet backbone

  • classes (int) – number of output segmentation classes

  • backbone_pretrained (bool) – whether to use pretrained weight for backbone
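
A minimal usage sketch; the backbone choice and tensor sizes below are illustrative, and the returned dictionary follows ChangeStar.forward() above:

>>> import torch
>>> from torchgeo.models import ChangeStarFarSeg
>>> model = ChangeStarFarSeg(backbone='resnet18', classes=2, backbone_pretrained=False)
>>> x = torch.randn(1, 2, 3, 256, 256)  # bitemporal input of shape [B, T, C, H, W]
>>> out = model(x)  # dict of bitemporal segmentation and binary change outputs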

class torchgeo.models.ChangeMixin(in_channels=256, inner_channels=16, num_convs=4, scale_factor=4.0)[source]

Bases: Module

This module enables any segmentation model to detect binary change.

The common usage is to attach this module on a segmentation model without the classification head.

If you use this model in your research, please cite the following paper:

__init__(in_channels=256, inner_channels=16, num_convs=4, scale_factor=4.0)[source]

Initializes a new ChangeMixin module.

Parameters:
  • in_channels (int) – sum of channels of bitemporal feature maps

  • inner_channels (int) – number of channels of inner feature maps

  • num_convs (int) – number of convolution blocks

  • scale_factor (float) – upsampling factor applied to the output

forward(bi_feature)[source]

Forward pass of the model.

Parameters:

bi_feature (Tensor) – input bitemporal feature maps of shape [b, t, c, h, w]

Returns:

a list of bidirectional output predictions (t1->t2 and t2->t1)

Return type:

list[torch.Tensor]
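
A minimal sketch, assuming two 128-channel temporal feature maps so that their concatenation matches the default in_channels=256:

>>> import torch
>>> from torchgeo.models import ChangeMixin
>>> module = ChangeMixin(in_channels=256, inner_channels=16, num_convs=4, scale_factor=4.0)
>>> bi_feature = torch.randn(1, 2, 128, 32, 32)  # [b, t, c, h, w]; 2 * 128 = in_channels
>>> t1t2, t2t1 = module(bi_feature)  # bidirectional change predictions, upsampled by scale_factor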

Copernicus-FM

class torchgeo.models.CopernicusFM(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, hyper_dim=128, num_classes=0, global_pool=True, mlp_ratio=4.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]

Bases: Module

CopernicusFM: VisionTransformer backbone.

Example

1. Spectral Mode (Using Wavelength and Bandwidth):

>>> model = CopernicusFM()
>>> x = torch.randn(1, 4, 224, 224) # input image
>>> metadata = torch.full((1, 4), float('nan')) # [lon (degree), lat (degree), delta_time (days since 1970/1/1), patch_token_area (km^2)], assume unknown
>>> wavelengths = [490, 560, 665, 842] # wavelength (nm): B,G,R,NIR (Sentinel 2)
>>> bandwidths = [65, 35, 30, 115] # bandwidth (nm): B,G,R,NIR (Sentinel 2)
>>> kernel_size = 16 # expected patch size
>>> input_mode = 'spectral'
>>> logit = model(x, metadata, wavelengths=wavelengths, bandwidths=bandwidths, input_mode=input_mode, kernel_size=kernel_size)
>>> print(logit.shape)

2. Variable Mode (Using language embedding):

>>> model = CopernicusFM()
>>> varname = 'Sentinel 5P Nitrogen Dioxide' # variable name (as input to a LLM for language embed)
>>> x = torch.randn(1, 1, 56, 56) # input image
>>> metadata = torch.full((1, 4), float('nan')) # [lon (degree), lat (degree), delta_time (days since 1970/1/1), patch_token_area (km^2)], assume unknown
>>> language_embed = torch.randn(2048) # language embedding: encode varname with a LLM (e.g. Llama)
>>> kernel_size = 4 # expected patch size
>>> input_mode = 'variable'
>>> logit = model(x, metadata, language_embed=language_embed, input_mode=input_mode, kernel_size=kernel_size)
>>> print(logit.shape)
__init__(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, hyper_dim=128, num_classes=0, global_pool=True, mlp_ratio=4.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)[source]

Initialize a new CopernicusFM instance.

Parameters:
  • img_size (int) – Input image size.

  • patch_size (int) – Patch size.

  • drop_rate (float) – Head dropout rate.

  • embed_dim (int) – Transformer embedding dimension.

  • depth (int) – Depth of transformer.

  • num_heads (int) – Number of attention heads.

  • hyper_dim (int) – Dimensions of dynamic weight generator.

  • num_classes (int) – Number of classes for classification head.

  • global_pool (bool) – Whether or not to perform global pooling.

  • mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim.

  • norm_layer (type[torch.nn.modules.module.Module]) – Normalization layer.

get_coord_pos_embed(lons, lats, embed_dim)[source]

Geospatial coordinate position embedding.

Parameters:
  • lons (Tensor) – Longitudes (x).

  • lats (Tensor) – Latitudes (y).

  • embed_dim (int) – Embedding dimension.

Returns:

Coordinate position embedding.

Return type:

Tensor

get_area_pos_embed(areas, embed_dim)[source]

Geospatial area position embedding.

Parameters:
  • areas (Tensor) – Spatial areas.

  • embed_dim (int) – Embedding dimension.

Returns:

Area position embedding.

Return type:

Tensor

get_time_pos_embed(times, embed_dim)[source]

Geotemporal position embedding.

Parameters:
  • times (Tensor) – Timestamps.

  • embed_dim (int) – Embedding dimension.

Returns:

Temporal position embedding.

Return type:

Tensor

forward_features(x, metadata, wavelengths=None, bandwidths=None, language_embed=None, input_mode='spectral', kernel_size=None)[source]

Forward pass of the feature embedding layer.

Parameters:
  • x (Tensor) – Input mini-batch.

  • metadata (Tensor) – Longitudes (degree), latitudes (degree), times (days since 1970/1/1), and areas (km^2) of each patch. Use NaN for unknown metadata.

  • wavelengths (collections.abc.Sequence[float] | None) – Wavelengths of each spectral band (nm). Only used if input_mode==’spectral’.

  • bandwidths (collections.abc.Sequence[float] | None) – Bandwidths in nm. Only used if input_mode==’spectral’.

  • language_embed (torch.Tensor | None) – Language embedding tensor from Llama 3.2 1B (length 2048). Only used if input_mode==’variable’.

  • input_mode (Literal['spectral', 'variable']) – One of ‘spectral’ or ‘variable’.

  • kernel_size (int | None) – If provided and differs from the initialized kernel size, the generated patch embed kernel weights are resized accordingly.

Returns:

Output mini-batch.

Return type:

Tensor

forward_head(x, pre_logits=False)[source]

Forward pass of the attention head.

Parameters:
  • x (Tensor) – Input mini-batch.

  • pre_logits (bool) – Whether or not to return the layer before logits are computed.

Returns:

Output mini-batch.

Return type:

Tensor

forward(x, metadata, wavelengths=None, bandwidths=None, language_embed=None, input_mode='spectral', kernel_size=None)[source]

Forward pass of the model.

Parameters:
  • x (Tensor) – Input mini-batch.

  • metadata (Tensor) – Longitudes (degree), latitudes (degree), times (days since 1970/1/1), and areas (km^2) of each patch. Use NaN for unknown metadata.

  • wavelengths (collections.abc.Sequence[float] | None) – Wavelengths of each spectral band (nm). Only used if input_mode==’spectral’.

  • bandwidths (collections.abc.Sequence[float] | None) – Bandwidths in nm. Only used if input_mode==’spectral’.

  • language_embed (torch.Tensor | None) – Language embedding tensor from Llama 3.2 1B (length 2048). Only used if input_mode==’variable’.

  • input_mode (Literal['spectral', 'variable']) – One of ‘spectral’ or ‘variable’.

  • kernel_size (int | None) – If provided and differs from the initialized kernel size, the generated patch embed kernel weights are resized accordingly.

Returns:

Output mini-batch.

Return type:

Tensor

torchgeo.models.copernicusfm_base(weights=None, *args, **kwargs)[source]

CopernicusFM vit-base model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (CopernicusFM_Base_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to CopernicusFM.

  • **kwargs (Any) – Additional keyword arguments to pass to CopernicusFM.

Returns:

A CopernicusFM base model.

Return type:

CopernicusFM
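
A minimal construction sketch; the spectral and variable forward modes shown in the CopernicusFM class example above apply unchanged:

>>> from torchgeo.models import CopernicusFM_Base_Weights, copernicusfm_base
>>> model = copernicusfm_base(weights=CopernicusFM_Base_Weights.CopernicusFM_ViT)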

class torchgeo.models.CopernicusFM_Base_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Copernicus-FM-base weights.

CROMA

class torchgeo.models.CROMA(modalities=['sar', 'optical'], encoder_dim=768, encoder_depth=12, num_heads=16, patch_size=8, image_size=120)[source]

Bases: Module

Pretrained CROMA model.

Corresponds to the pretrained CROMA model found in the CROMA repository:

If you use this model in your research, please cite the following paper:

__init__(modalities=['sar', 'optical'], encoder_dim=768, encoder_depth=12, num_heads=16, patch_size=8, image_size=120)[source]

Initialize the CROMA model.

Parameters:
  • modalities (Sequence[str]) – List of modalities used during forward pass, list can contain ‘sar’, ‘optical’, or both.

  • encoder_dim (int) – Dimension of the encoder.

  • encoder_depth (int) – Depth of the encoder.

  • num_heads (int) – Number of heads for the multi-head attention, should be power of 2.

  • patch_size (int) – Size of the patches.

  • image_size (int) – Size of the input images, CROMA was trained on 120x120 images, must be a multiple of 8.

Raises:

AssertionError – If any arguments are not valid.

forward(x_sar=None, x_optical=None)[source]

Forward pass of the CROMA model.

Parameters:
  • x_sar (torch.Tensor | None) – Input mini-batch of SAR images [B, 2, H, W].

  • x_optical (torch.Tensor | None) – Input mini-batch of optical images [B, 12, H, W].

torchgeo.models.croma_base(weights=None, *args, **kwargs)[source]

CROMA base model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (torchgeo.models.croma.CROMABase_Weights | None) – Pretrained weights to load.

  • *args (Any) – Additional arguments to pass to CROMA.

  • **kwargs (Any) – Additional keyword arguments to pass to CROMA.

Returns:

CROMA base model.

Return type:

CROMA
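
A minimal usage sketch with random inputs; the input shapes follow CROMA.forward() above, and how you unpack the returned encodings depends on the modalities you pass:

>>> import torch
>>> from torchgeo.models import CROMABase_Weights, croma_base
>>> model = croma_base(weights=CROMABase_Weights.CROMA_VIT)
>>> x_sar = torch.randn(1, 2, 120, 120)       # [B, 2, H, W] SAR
>>> x_optical = torch.randn(1, 12, 120, 120)  # [B, 12, H, W] optical
>>> out = model(x_sar=x_sar, x_optical=x_optical)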

torchgeo.models.croma_large(weights=None, *args, **kwargs)[source]

CROMA large model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (CROMALarge_Weights | None) – Pretrained weights to load.

  • *args (Any) – Additional arguments to pass to CROMA.

  • **kwargs (Any) – Additional keyword arguments to pass to CROMA.

Returns:

CROMA large model.

Return type:

CROMA

class torchgeo.models.CROMABase_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

CROMA base model weights.

New in version 0.7.

class torchgeo.models.CROMALarge_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

CROMA large model weights.

New in version 0.7.

DOFA

class torchgeo.models.DOFA(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, dynamic_embed_dim=128, num_classes=45, global_pool=True, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]

Bases: Module

Dynamic One-For-All (DOFA) model.

Reference implementation:

If you use this model in your research, please cite the following paper:

New in version 0.6.

__init__(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, dynamic_embed_dim=128, num_classes=45, global_pool=True, mlp_ratio=4.0, norm_layer=functools.partial(<class 'torch.nn.modules.normalization.LayerNorm'>, eps=1e-06))[source]

Initialize a new DOFA instance.

Parameters:
  • img_size (int) – Input image size.

  • patch_size (int) – Patch size.

  • drop_rate (float) – Head dropout rate.

  • embed_dim (int) – Transformer embedding dimension.

  • depth (int) – Depth of transformer.

  • num_heads (int) – Number of attention heads.

  • dynamic_embed_dim (int) – Dimensions of dynamic weight generator.

  • num_classes (int) – Number of classes for classification head.

  • global_pool (bool) – Whether or not to perform global pooling.

  • mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim.

  • norm_layer (type[torch.nn.modules.module.Module]) – Normalization layer.

forward_features(x, wavelengths)[source]

Forward pass of the feature embedding layer.

Parameters:
  • x (Tensor) – Input mini-batch.

  • wavelengths (list[float]) – Wavelengths of each spectral band (μm).

Returns:

Output mini-batch.

Return type:

Tensor

forward_head(x, pre_logits=False)[source]

Forward pass of the attention head.

Parameters:
  • x (Tensor) – Input mini-batch.

  • pre_logits (bool) – Whether or not to return the layer before logits are computed.

Returns:

Output mini-batch.

Return type:

Tensor

forward(x, wavelengths)[source]

Forward pass of the model.

Parameters:
  • x (Tensor) – Input mini-batch.

  • wavelengths (list[float]) – Wavelengths of each spectral band (μm).

Returns:

Output mini-batch.

Return type:

Tensor

torchgeo.models.dofa_small_patch16_224(*args, **kwargs)[source]

Dynamic One-For-All (DOFA) small patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • *args (Any) – Additional arguments to pass to DOFA.

  • **kwargs (Any) – Additional keyword arguments to pass to DOFA.

Returns:

A DOFA small 16 model.

Return type:

DOFA

torchgeo.models.dofa_base_patch16_224(weights=None, *args, **kwargs)[source]

Dynamic One-For-All (DOFA) base patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • weights (DOFABase16_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to DOFA.

  • **kwargs (Any) – Additional keyword arguments to pass to DOFA.

Returns:

A DOFA base 16 model.

Return type:

DOFA
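
A minimal usage sketch; the four bands and their wavelengths (in μm) are illustrative and correspond to Sentinel-2 B/G/R/NIR:

>>> import torch
>>> from torchgeo.models import DOFABase16_Weights, dofa_base_patch16_224
>>> model = dofa_base_patch16_224(weights=DOFABase16_Weights.DOFA_MAE)
>>> x = torch.randn(1, 4, 224, 224)           # B, G, R, NIR image
>>> wavelengths = [0.49, 0.56, 0.665, 0.842]  # wavelength of each band (μm)
>>> out = model(x, wavelengths)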

torchgeo.models.dofa_large_patch16_224(weights=None, *args, **kwargs)[source]

Dynamic One-For-All (DOFA) large patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • weights (DOFALarge16_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to DOFA.

  • **kwargs (Any) – Additional keyword arguments to pass to DOFA.

Returns:

A DOFA large 16 model.

Return type:

DOFA

torchgeo.models.dofa_huge_patch14_224(*args, **kwargs)[source]

Dynamic One-For-All (DOFA) huge patch size 14 model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • *args (Any) – Additional arguments to pass to DOFA.

  • **kwargs (Any) – Additional keyword arguments to pass to DOFA.

Returns:

A DOFA huge 14 model.

Return type:

DOFA

class torchgeo.models.DOFABase16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Dynamic One-For-All (DOFA) base patch size 16 weights.

New in version 0.6.

class torchgeo.models.DOFALarge16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Dynamic One-For-All (DOFA) large patch size 16 weights.

New in version 0.6.

EarthLoc

class torchgeo.models.EarthLoc(in_channels=3, image_size=320, desc_dim=4096, backbone='resnet50', pretrained=True)[source]

Bases: Module

EarthLoc model for generating feature descriptors from satellite imagery.

Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton

If you use this model in your research, please cite the following paper:

New in version 0.8.

__init__(in_channels=3, image_size=320, desc_dim=4096, backbone='resnet50', pretrained=True)[source]

Initialize the EarthLoc model.

Parameters:
  • in_channels (int) – Number of input channels in the images (default: 3 for RGB).

  • image_size (int) – Size of the input images (assumed square).

  • desc_dim (int) – Dimension of the final output feature descriptor.

  • backbone (str) – Backbone model to use for feature extraction (default: “resnet50”).

  • pretrained (bool) – Whether to use pre-trained weights for the backbone model.

forward(x)[source]

Forward pass of the EarthLoc model.

Parameters:

x (Tensor) – Input tensor of shape (b, c, h, w).

Returns:

Output feature descriptor tensor of shape (b, desc_dim).

Return type:

Tensor

torchgeo.models.earthloc(weights=None, *args, **kwargs)[source]

EarthLoc model.

If you use this model in your research, please cite the following paper:

New in version 0.8.

Parameters:
  • weights (EarthLoc_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to EarthLoc.

  • **kwargs (Any) – Additional keyword arguments to pass to EarthLoc.

Returns:

An EarthLoc model.

Return type:

EarthLoc
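
A minimal usage sketch with a random RGB input at the default image size:

>>> import torch
>>> from torchgeo.models import EarthLoc_Weights, earthloc
>>> model = earthloc(weights=EarthLoc_Weights.SENTINEL2_RESNET50)
>>> x = torch.randn(1, 3, 320, 320)
>>> desc = model(x)  # feature descriptor of shape (1, desc_dim)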

class torchgeo.models.EarthLoc_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

EarthLoc weights.

FarSeg

class torchgeo.models.FarSeg(backbone='resnet50', classes=16, backbone_pretrained=True)[source]

Bases: Module

Foreground-Aware Relation Network (FarSeg).

This model can be used for binary- or multi-class object segmentation, such as building, road, ship, and airplane segmentation. It can be also extended as a change detection model. It features a foreground-scene relation module to model the relation between scene embedding, object context, and object feature, thus improving the discrimination of object feature representation.

If you use this model in your research, please cite the following paper:

__init__(backbone='resnet50', classes=16, backbone_pretrained=True)[source]

Initialize a new FarSeg model.

Parameters:
  • backbone (str) – name of ResNet backbone, one of [“resnet18”, “resnet34”, “resnet50”, “resnet101”]

  • classes (int) – number of output segmentation classes

  • backbone_pretrained (bool) – whether to use pretrained weight for backbone

forward(x)[source]

Forward pass of the model.

Parameters:

x (Tensor) – input image

Returns:

output prediction

Return type:

Tensor
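
A minimal usage sketch; the backbone and input size below are illustrative:

>>> import torch
>>> from torchgeo.models import FarSeg
>>> model = FarSeg(backbone='resnet18', classes=16, backbone_pretrained=False)
>>> x = torch.randn(1, 3, 512, 512)
>>> y = model(x)  # segmentation logits with one channel per class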

Fully-convolutional Network

class torchgeo.models.FCN(in_channels, classes, num_filters=64)[source]

Bases: Module

A simple 5-layer FCN with leaky ReLUs and ‘same’ padding.

__init__(in_channels, classes, num_filters=64)[source]

Initializes the 5 layer FCN model.

Parameters:
  • in_channels (int) – Number of input channels that the model will expect

  • classes (int) – Number of filters in the final layer

  • num_filters (int) – Number of filters in each convolutional layer

forward(x)[source]

Forward pass of the model.
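
A minimal usage sketch; because the convolutions use ‘same’ padding, the output is expected to keep the input height and width:

>>> import torch
>>> from torchgeo.models import FCN
>>> model = FCN(in_channels=3, classes=5, num_filters=64)
>>> x = torch.randn(1, 3, 128, 128)
>>> y = model(x)  # (1, 5, 128, 128)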

FC Siamese Networks

class torchgeo.models.FCSiamConc(*args, **kwargs)[source]

Bases: SegmentationModel

Fully-convolutional Siamese Concatenation (FC-Siam-conc).

If you use this model in your research, please cite the following paper:

__init__(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm='batchnorm', decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None)[source]

Initialize a new FCSiamConc model.

Parameters:
  • encoder_name (str) – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth (int) – A number of stages used in encoder in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H // 2, W // 2)] and so on). Default is 5

  • encoder_weights (str | None) – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels (Sequence[int]) – List of integers which specify in_channels parameter for convolutions used in decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm (bool | str | dict[str, Any]) –

    Specifies normalization between Conv2D and activation. Accepts the following types:

    • True: Defaults to “batchnorm”.

    • False: No normalization (nn.Identity).

    • str: Specifies normalization type using default parameters. Available values: “batchnorm”, “identity”, “layernorm”, “instancenorm”, “inplace”.

    • dict: Fully customizable normalization settings. Structure: {"type": <norm_type>, **kwargs}, where "type" corresponds to one of the normalization types listed above, and the remaining kwargs are passed directly to the normalization layer as defined in the PyTorch documentation.

      Example: decoder_use_batchnorm={"type": "layernorm", "eps": 1e-2}

  • decoder_attention_type (str | None) – Attention module used in decoder of the model. Available options are None and scse. SCSE paper https://arxiv.org/abs/1808.08127

  • in_channels (int) – A number of input channels for the model, default is 3 (RGB images)

  • classes (int) – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation (str | collections.abc.Callable[[torch.Tensor], torch.Tensor] | None) – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

forward(x)[source]

Forward pass of the model.

Parameters:

x (Tensor) – input images of shape (b, t, c, h, w)

Returns:

predicted change masks of size (b, classes, h, w)

Return type:

Tensor

class torchgeo.models.FCSiamDiff(*args, **kwargs)[source]

Bases: Unet

Fully-convolutional Siamese Difference (FC-Siam-diff).

If you use this model in your research, please cite the following paper:

__init__(*args, **kwargs)[source]

Initialize a new FCSiamDiff model.

Parameters:
  • *args (Any) – Additional arguments passed to Unet

  • **kwargs (Any) – Additional keyword arguments passed to Unet

forward(x)[source]

Forward pass of the model.

Parameters:

x (Tensor) – input images of shape (b, t, c, h, w)

Returns:

predicted change masks of size (b, classes, h, w)

Return type:

Tensor
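
A minimal usage sketch; the encoder choice and tensor sizes are illustrative, and the keyword arguments are forwarded to the underlying Unet:

>>> import torch
>>> from torchgeo.models import FCSiamDiff
>>> model = FCSiamDiff(encoder_name='resnet18', encoder_weights=None, in_channels=3, classes=2)
>>> x = torch.randn(1, 2, 3, 256, 256)  # (b, t, c, h, w) bitemporal pair
>>> masks = model(x)                    # (b, classes, h, w) change logits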

L-TAE

class torchgeo.models.LTAE(in_channels=128, n_head=16, d_k=8, n_neurons=(256, 128), dropout=0.2, d_model=256, T=1000, len_max_seq=24, positions=None)[source]

Bases: Module

Lightweight Temporal Attention Encoder (L-TAE).

This model implements a lightweight temporal attention encoder that processes time series data using a multi-head attention mechanism. It is designed to efficiently encode temporal sequences into fixed-length embeddings.

If you use this model in your research, please cite the following paper:

New in version 0.8.

__init__(in_channels=128, n_head=16, d_k=8, n_neurons=(256, 128), dropout=0.2, d_model=256, T=1000, len_max_seq=24, positions=None)[source]

Sequence-to-embedding encoder.

Parameters:
  • in_channels (int) – Number of channels of the input embeddings

  • n_head (int) – Number of attention heads

  • d_k (int) – Dimension of the key and query vectors

  • n_neurons (Sequence[int]) – Defines the dimensions of the successive feature spaces of the MLP that processes the concatenated outputs of the attention heads

  • dropout (float) – dropout

  • T (int) – Period to use for the positional encoding

  • len_max_seq (int) – Maximum sequence length, used to pre-compute the positional encoding table

  • positions (collections.abc.Sequence[int] | None) – List of temporal positions to use instead of position in the sequence

  • d_model (int | None) – If specified, the input tensors will first be processed by a fully connected layer to project them into a feature space of dimension d_model

forward(x)[source]

Forward pass of the model.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, seq_len, in_channels)

Returns:

Output tensor of shape (batch_size, n_neurons[-1])

Return type:

Tensor
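
A minimal usage sketch with the default architecture; the batch size and sequence length are illustrative (the sequence length must not exceed len_max_seq):

>>> import torch
>>> from torchgeo.models import LTAE
>>> model = LTAE(in_channels=128, len_max_seq=24)
>>> x = torch.randn(2, 24, 128)  # (batch_size, seq_len, in_channels)
>>> out = model(x)               # (batch_size, n_neurons[-1]) == (2, 128)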

MOSAIKS

class torchgeo.models.MOSAIKS(dataset, in_channels=3, features=4096, kernel_size=4, bias=-1.0, seed=None)[source]

Bases: RCF

MOSAIKS RCF model with the recommended parameters defined in the paper.

If you use this model in your research, please cite the following paper:

Note

This Module is not trainable. It is only used as a feature extractor.

New in version 0.8.

__init__(dataset, in_channels=3, features=4096, kernel_size=4, bias=-1.0, seed=None)[source]

Initializes the MOSAIKS model.

Parameters:
  • dataset (NonGeoDataset) – a NonGeoDataset to sample from

  • in_channels (int) – number of input channels

  • features (int) – number of features to compute, must be divisible by 2

  • kernel_size (int) – size of the kernel used to compute the RCFs

  • bias (float) – bias of the convolutional layer

  • seed (int | None) – random seed used to initialize the convolutional layer

class torchgeo.models.RCF(in_channels=4, features=16, kernel_size=3, bias=-1.0, seed=None, mode='gaussian', dataset=None)[source]

Bases: Module

This model extracts random convolutional features (RCFs) from its input.

RCFs are used in the Multi-task Observation using Satellite Imagery & Kitchen Sinks (MOSAIKS) method proposed in “A generalizable and accessible approach to machine learning with global satellite imagery”.

This class can operate in two modes, “gaussian” and “empirical”. In “gaussian” mode, the filters will be sampled from a Gaussian distribution, while in “empirical” mode, the filters will be sampled from a dataset.

If you use this model in your research, please cite the following paper:

Note

This Module is not trainable. It is only used as a feature extractor.

__init__(in_channels=4, features=16, kernel_size=3, bias=-1.0, seed=None, mode='gaussian', dataset=None)[source]

Initializes the RCF model.

This is a static model that serves to extract fixed length feature vectors from input patches.

New in version 0.2: The seed parameter.

New in version 0.5: The mode and dataset parameters.

Parameters:
  • in_channels (int) – number of input channels

  • features (int) – number of features to compute, must be divisible by 2

  • kernel_size (int) – size of the kernel used to compute the RCFs

  • bias (float) – bias of the convolutional layer

  • seed (int | None) – random seed used to initialize the convolutional layer

  • mode (str) – “empirical” or “gaussian”

  • dataset (torchgeo.datasets.geo.NonGeoDataset | None) – a NonGeoDataset to sample from when mode is “empirical”

forward(x)[source]

Forward pass of the RCF model.

Parameters:

x (Tensor) – a tensor with shape (B, C, H, W)

Returns:

a tensor of size (B, self.num_features)

Return type:

Tensor
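
A minimal usage sketch in “gaussian” mode (no dataset needed); “empirical” mode additionally requires a NonGeoDataset to sample filters from:

>>> import torch
>>> from torchgeo.models import RCF
>>> model = RCF(in_channels=4, features=16, kernel_size=3, bias=-1.0, mode='gaussian')
>>> x = torch.randn(2, 4, 64, 64)
>>> feats = model(x)  # (2, 16) fixed-length random convolutional features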

ResNet

torchgeo.models.resnet18(weights=None, *args, **kwargs)[source]

ResNet-18 model.

If you use this model in your research, please cite the following paper:

New in version 0.4.

Parameters:
  • weights (ResNet18_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ResNet-18 model.

Return type:

Module

torchgeo.models.resnet50(weights=None, *args, **kwargs)[source]

ResNet-50 model.

If you use this model in your research, please cite the following paper:

Changed in version 0.4: Switched to multi-weight support API.

Parameters:
  • weights (ResNet50_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ResNet-50 model.

Return type:

Module

torchgeo.models.resnet152(weights=None, *args, **kwargs)[source]

ResNet-152 model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • weights (ResNet152_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ResNet-152 model.

Return type:

Module

class torchgeo.models.ResNet18_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

ResNet-18 weights.

For timm resnet18 implementation.

New in version 0.4.

class torchgeo.models.ResNet50_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

ResNet-50 weights.

For timm resnet50 implementation.

New in version 0.4.

class torchgeo.models.ResNet152_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

ResNet-152 weights.

For timm resnet152 implementation.

New in version 0.6.
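
A minimal construction sketch; the chosen weights expect 13-band Sentinel-2 input (see the Sentinel-2 table below), and the weights are downloaded on first use:

>>> from torchgeo.models import ResNet50_Weights, resnet50
>>> model = resnet50(weights=ResNet50_Weights.SENTINEL2_ALL_MOCO)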

Scale-MAE

torchgeo.models.ScaleMAE(res=1.0, *args, **kwargs)[source]

Custom Vision Transformer for Scale-MAE with GSD positional embeddings.

This is a ViT encoder only model of the Scale-MAE architecture with GSD positional embeddings.

If you use this model in your research, please cite the following paper:

class torchgeo.models.ScaleMAELarge16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Scale-MAE Large patch size 16 weights.

New in version 0.6.

Swin Transformer

torchgeo.models.swin_v2_t(weights=None, *args, **kwargs)[source]

Swin Transformer v2 tiny model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • weights (torchgeo.models.swin.Swin_V2_T_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to torchvision.models.swin_transformer.SwinTransformer.

  • **kwargs (Any) – Additional keyword arguments to pass to torchvision.models.swin_transformer.SwinTransformer.

Returns:

A Swin Transformer Tiny model.

Return type:

SwinTransformer

torchgeo.models.swin_v2_b(weights=None, *args, **kwargs)[source]

Swin Transformer v2 base model.

If you use this model in your research, please cite the following paper:

New in version 0.6.

Parameters:
  • weights (torchgeo.models.swin.Swin_V2_B_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to torchvision.models.swin_transformer.SwinTransformer.

  • **kwargs (Any) – Additional keyword arguments to pass to torchvision.models.swin_transformer.SwinTransformer.

Returns:

A Swin Transformer Base model.

Return type:

SwinTransformer

class torchgeo.models.Swin_V2_T_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Swin Transformer v2 Tiny weights.

For torchvision swin_v2_t implementation.

New in version 0.6.

class torchgeo.models.Swin_V2_B_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Swin Transformer v2 Base weights.

For torchvision swin_v2_b implementation.

New in version 0.6.

Panopticon

class torchgeo.models.Panopticon(attn_dim=2304, embed_dim=768, img_size=224)[source]

Bases: Module

Panopticon ViT-Base Foundation Model.

New in version 0.7.

__init__(attn_dim=2304, embed_dim=768, img_size=224)[source]

Initialize a Panopticon model.

Parameters:
  • attn_dim (int) – Dimension of channel attention.

  • embed_dim (int) – Embedding dimension of backbone.

  • img_size (int) – Image size. Panopticon can be initialized with any image size, but the image size is fixed after initialization. For optimal performance, we recommend using the same image size as during training. For the published weights, this is 224.

forward(x_dict)[source]

Forward pass of the model including forward pass through the head.

Parameters:

x_dict (dict[str, torch.Tensor]) –

Dictionary with keys:

Returns:

Embeddings.

Return type:

Tensor

torchgeo.models.panopticon_vitb14(weights=None, img_size=224, **kwargs)[source]

Panopticon ViT-Base model.

Panopticon can handle arbitrary combinations of optical channels and SAR. It can be initialized with any image size, but the image size is fixed after initialization; we recommend 224 in alignment with the pretraining. For more information on how to use the model, see https://github.com/Panopticon-FM/panopticon?tab=readme-ov-file#using-panopticon.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Returns:

The Panopticon ViT-Base model with the published weights loaded.

Return type:

Panopticon
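
A minimal construction sketch; the forward pass expects the dictionary input described for Panopticon.forward() above (see the upstream README linked earlier for the exact keys):

>>> from torchgeo.models import Panopticon_Weights, panopticon_vitb14
>>> model = panopticon_vitb14(weights=Panopticon_Weights.VIT_BASE14, img_size=224)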

class torchgeo.models.Panopticon_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Panopticon weights.

New in version 0.7.

U-Net

torchgeo.models.unet(weights=None, classes=None, *args, **kwargs)[source]

U-Net model.

If you use this model in your research, please cite the following paper:

New in version 0.8.

Parameters:
  • weights (torchgeo.models.unet.Unet_Weights | None) – Pre-trained model weights to use.

  • classes (int | None) – Number of output classes. If not specified, the number of classes will be inferred from the weights.

  • *args (Any) – Additional arguments to pass to segmentation_models_pytorch.create_model

  • **kwargs (Any) – Additional keyword arguments to pass to segmentation_models_pytorch.create_model

Returns:

A U-Net model.

Return type:

Unet
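
A minimal usage sketch; the chosen weights expect 8-channel input (see the Sentinel-2 table below), and the number of output classes is inferred from the weights:

>>> import torch
>>> from torchgeo.models import Unet_Weights, unet
>>> model = unet(weights=Unet_Weights.SENTINEL2_2CLASS_FTW)
>>> x = torch.randn(1, 8, 256, 256)
>>> y = model(x)  # per-class segmentation logits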

class torchgeo.models.Unet_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

U-Net weights.

For smp Unet implementation.

New in version 0.8.

Vision Transformer

torchgeo.models.vit_small_patch16_224(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) small patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.4.

Parameters:
  • weights (ViTSmall16_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ViT small 16 model.

Return type:

Module
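
A minimal construction sketch; the same pattern applies to the other ViT builders below, and the chosen weights expect 13-band Sentinel-2 input (see the Sentinel-2 table below):

>>> from torchgeo.models import ViTSmall16_Weights, vit_small_patch16_224
>>> model = vit_small_patch16_224(weights=ViTSmall16_Weights.SENTINEL2_ALL_DINO)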

torchgeo.models.vit_base_patch16_224(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) base patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (ViTBase16_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ViT base 16 model.

Return type:

Module

torchgeo.models.vit_large_patch16_224(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) large patch size 16 model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (ViTLarge16_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ViT large 16 model.

Return type:

Module

torchgeo.models.vit_huge_patch14_224(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) huge patch size 14 model.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (ViTHuge14_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A ViT huge 14 model.

Return type:

Module

torchgeo.models.vit_small_patch14_dinov2(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) small patch size 14 model for DINOv2.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (ViTSmall14_DINOv2_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A DINOv2 ViT small 14 model.

Return type:

Module

torchgeo.models.vit_base_patch14_dinov2(weights=None, *args, **kwargs)[source]

Vision Transformer (ViT) base patch size 14 model for DINOv2.

If you use this model in your research, please cite the following paper:

New in version 0.7.

Parameters:
  • weights (ViTBase14_DINOv2_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to timm.create_model().

  • **kwargs (Any) – Additional keyword arguments to pass to timm.create_model().

Returns:

A DINOv2 ViT base 14 model.

Return type:

Module

class torchgeo.models.ViTSmall16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Small Patch Size 16 weights.

For timm vit_small_patch16_224 implementation.

New in version 0.4.

class torchgeo.models.ViTBase16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Base Patch Size 16 weights.

For timm vit_base_patch16_224 implementation.

New in version 0.7.

class torchgeo.models.ViTLarge16_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Large Patch Size 16 weights.

For timm vit_large_patch16_224 implementation.

New in version 0.7.

class torchgeo.models.ViTHuge14_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Huge Patch Size 14 weights.

For timm vit_huge_patch14_224 implementation.

New in version 0.7.

class torchgeo.models.ViTSmall14_DINOv2_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Small Patch Size 14 (DINOv2) weights.

For timm vit_small_patch14_dinov2 implementation.

New in version 0.7.

class torchgeo.models.ViTBase14_DINOv2_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

Vision Transformer Base Patch Size 14 (DINOv2) weights.

For timm vit_base_patch14_dinov2 implementation.

New in version 0.7.

YOLO

torchgeo.models.yolo(weights=None, *args, **kwargs)[source]

YOLO model.

New in version 0.8.

Parameters:
  • weights (YOLO_Weights | None) – Pre-trained model weights to use.

  • *args (Any) – Additional arguments to pass to ultralytics.YOLO.

  • **kwargs (Any) – Additional keyword arguments to pass to ultralytics.YOLO.

Returns:

An ultralytics.YOLO model.

Raises:

DependencyNotFoundError – If ultralytics is not installed.

Return type:

Module
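
A minimal construction sketch, assuming the optional ultralytics dependency is installed:

>>> from torchgeo.models import YOLO_Weights, yolo
>>> model = yolo(weights=YOLO_Weights.DELINEATE_ANYTHING)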

class torchgeo.models.YOLO_Weights(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: WeightsEnum

YOLO weights.

For ultralytics YOLO implementation.

New in version 0.8.

Utility Functions

torchgeo.models.get_model(name, *args, **kwargs)[source]

Get an instantiated model from its name.

New in version 0.4.

Parameters:
  • name (str) – Name of the model.

  • *args (Any) – Additional arguments passed to the model builder method.

  • **kwargs (Any) – Additional keyword arguments passed to the model builder method.

Returns:

An instantiated model.

Return type:

Module

torchgeo.models.get_model_weights(name)[source]

Get the weights enum class associated with a given model.

New in version 0.4.

Parameters:

name (collections.abc.Callable[[...], torch.nn.modules.module.Module] | str) – Model builder function or the name under which it is registered.

Returns:

The weights enum class associated with the model.

Return type:

WeightsEnum

torchgeo.models.get_weight(name)[source]

Get the weights enum value by its full name.

New in version 0.4.

Parameters:

name (str) – Name of the weight enum entry.

Returns:

The requested weight enum.

Raises:

ValueError – If name is not a valid WeightsEnum.

Return type:

WeightsEnum

torchgeo.models.list_models()[source]

List the registered models.

New in version 0.4.

Returns:

A list of registered models.

Return type:

list[str]
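
A minimal sketch tying the utility functions together; the model and weight names are taken from the tables below:

>>> from torchgeo.models import get_model, get_weight, list_models
>>> 'resnet18' in list_models()
True
>>> weights = get_weight('ResNet18_Weights.SENTINEL2_ALL_MOCO')  # full enum name as a string
>>> model = get_model('resnet18', weights=weights)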

Pretrained Weights

TorchGeo provides a number of pre-trained models and backbones, allowing you to perform transfer learning on small datasets without training a new model from scratch or relying on ImageNet weights. Depending on the satellite/sensor your data comes from, choose the pre-trained weights below with the best performance metrics for your task.

Sensor-Agnostic

These weights can be used with imagery from any satellite/sensor. In addition to the usual performance metrics, there are columns indicating dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, gained either from the training data (implicit) or from the model architecture (explicit).

Weight

Source

Citation

License

Spatial

Temporal

Spectral

m-bigearthnet

m-forestnet

m-brick-kiln

m-pv4ger

m-so2sat

m-eurosat

m-pv4ger-seg

m-nz-cattle

m-NeonTree

m-cashew-plant

m-SA-crop

m-chesapeake

CopernicusFM_Base_Weights.CopernicusFM_ViT

link

link

CC-BY-4.0

explicit

explicit

explicit

CROMABase_Weights.CROMA_VIT

link

link

CC-BY-4.0

implicit

implicit

CROMALarge_Weights.CROMA_VIT

link

link

CC-BY-4.0

implicit

implicit

DOFABase16_Weights.DOFA_MAE

link

link

CC-BY-4.0

implicit

explicit

65.7

50.9

95.8

96.9

55.1

93.9

94.5

81.4

58.8

51.5

33.0

65.3

DOFALarge16_Weights.DOFA_MAE

link

link

CC-BY-4.0

implicit

explicit

67.5

54.6

96.9

97.3

60.1

97.1

95.0

81.8

59.4

56.9

32.1

66.3

Panopticon_Weights.VIT_BASE14

link

link

Apache-2.0

implicit

explicit

56.3

96.7

96.4

61.7

96.4

95.2

92.6

79.6

59.3

52.6

78.1

ResNet50_Weights.FMOW_RGB_GASSL

link

link

implicit

ScaleMAE_ViTLarge16_Weights.FMOW_RGB_SCALEMAE

link

link

CC-BY-NC-4.0

explicit

YOLO_Weights.CORE_DINO

link

CC-BY-NC-3.0

implicit

YOLO_Weights.DELINEATE_ANYTHING

link

link

AGPL-3.0

implicit

YOLO_Weights.DELINEATE_ANYTHING_SMALL

link

link

AGPL-3.0

implicit

Landsat

| Weight | Landsat | Channels | Source | Citation | License | NLCD (Acc) | NLCD (mIoU) | CDL (Acc) | CDL (mIoU) |
|---|---|---|---|---|---|---|---|---|---|
| ResNet18_Weights.LANDSAT_TM_TOA_MOCO | 4–5 | 7 | link | link | CC0-1.0 | 67.65 | 51.11 | 68.70 | 52.32 |
| ResNet18_Weights.LANDSAT_TM_TOA_SIMCLR | 4–5 | 7 | link | link | CC0-1.0 | 60.86 | 43.74 | 61.94 | 44.86 |
| ResNet50_Weights.LANDSAT_TM_TOA_MOCO | 4–5 | 7 | link | link | CC0-1.0 | 68.75 | 53.28 | 69.45 | 53.20 |
| ResNet50_Weights.LANDSAT_TM_TOA_SIMCLR | 4–5 | 7 | link | link | CC0-1.0 | 62.05 | 44.98 | 62.80 | 45.77 |
| ViTSmall16_Weights.LANDSAT_TM_TOA_MOCO | 4–5 | 7 | link | link | CC0-1.0 | 67.17 | 50.57 | 67.60 | 51.07 |
| ViTSmall16_Weights.LANDSAT_TM_TOA_SIMCLR | 4–5 | 7 | link | link | CC0-1.0 | 66.82 | 50.17 | 66.92 | 50.28 |
| ResNet18_Weights.LANDSAT_ETM_TOA_MOCO | 7 | 9 | link | link | CC0-1.0 | 65.22 | 48.39 | 62.84 | 45.81 |
| ResNet18_Weights.LANDSAT_ETM_TOA_SIMCLR | 7 | 9 | link | link | CC0-1.0 | 58.76 | 41.60 | 56.47 | 39.34 |
| ResNet50_Weights.LANDSAT_ETM_TOA_MOCO | 7 | 9 | link | link | CC0-1.0 | 66.60 | 49.92 | 64.12 | 47.19 |
| ResNet50_Weights.LANDSAT_ETM_TOA_SIMCLR | 7 | 9 | link | link | CC0-1.0 | 57.17 | 40.02 | 54.95 | 37.88 |
| ViTSmall16_Weights.LANDSAT_ETM_TOA_MOCO | 7 | 9 | link | link | CC0-1.0 | 63.75 | 46.79 | 60.88 | 43.70 |
| ViTSmall16_Weights.LANDSAT_ETM_TOA_SIMCLR | 7 | 9 | link | link | CC0-1.0 | 63.33 | 46.34 | 59.06 | 41.91 |
| ResNet18_Weights.LANDSAT_ETM_SR_MOCO | 7 | 6 | link | link | CC0-1.0 | 64.18 | 47.25 | 67.30 | 50.71 |
| ResNet18_Weights.LANDSAT_ETM_SR_SIMCLR | 7 | 6 | link | link | CC0-1.0 | 57.26 | 40.11 | 54.42 | 37.48 |
| ResNet50_Weights.LANDSAT_ETM_SR_MOCO | 7 | 6 | link | link | CC0-1.0 | 64.37 | 47.46 | 62.35 | 45.30 |
| ResNet50_Weights.LANDSAT_ETM_SR_SIMCLR | 7 | 6 | link | link | CC0-1.0 | 57.79 | 40.64 | 55.69 | 38.59 |
| ViTSmall16_Weights.LANDSAT_ETM_SR_MOCO | 7 | 6 | link | link | CC0-1.0 | 64.09 | 47.21 | 52.37 | 35.48 |
| ViTSmall16_Weights.LANDSAT_ETM_SR_SIMCLR | 7 | 6 | link | link | CC0-1.0 | 63.99 | 47.05 | 53.17 | 36.21 |
| ResNet18_Weights.LANDSAT_OLI_TIRS_TOA_MOCO | 8–9 | 11 | link | link | CC0-1.0 | 67.82 | 51.30 | 65.74 | 48.96 |
| ResNet18_Weights.LANDSAT_OLI_TIRS_TOA_SIMCLR | 8–9 | 11 | link | link | CC0-1.0 | 62.14 | 45.08 | 60.01 | 42.86 |
| ResNet50_Weights.LANDSAT_OLI_TIRS_TOA_MOCO | 8–9 | 11 | link | link | CC0-1.0 | 69.17 | 52.87 | 67.29 | 50.70 |
| ResNet50_Weights.LANDSAT_OLI_TIRS_TOA_SIMCLR | 8–9 | 11 | link | link | CC0-1.0 | 64.66 | 47.78 | 62.08 | 45.01 |
| ViTSmall16_Weights.LANDSAT_OLI_TIRS_TOA_MOCO | 8–9 | 11 | link | link | CC0-1.0 | 67.11 | 50.49 | 64.62 | 47.73 |
| ViTSmall16_Weights.LANDSAT_OLI_TIRS_TOA_SIMCLR | 8–9 | 11 | link | link | CC0-1.0 | 66.12 | 49.39 | 63.88 | 46.94 |
| ResNet18_Weights.LANDSAT_OLI_SR_MOCO | 8–9 | 7 | link | link | CC0-1.0 | 67.01 | 50.39 | 68.05 | 51.57 |
| ResNet18_Weights.LANDSAT_OLI_SR_SIMCLR | 8–9 | 7 | link | link | CC0-1.0 | 59.93 | 42.79 | 57.44 | 40.30 |
| ResNet50_Weights.LANDSAT_OLI_SR_MOCO | 8–9 | 7 | link | link | CC0-1.0 | 67.44 | 50.88 | 65.96 | 49.21 |
| ResNet50_Weights.LANDSAT_OLI_SR_SIMCLR | 8–9 | 7 | link | link | CC0-1.0 | 63.65 | 46.68 | 60.01 | 43.17 |
| ViTSmall16_Weights.LANDSAT_OLI_SR_MOCO | 8–9 | 7 | link | link | CC0-1.0 | 66.81 | 50.16 | 64.17 | 47.24 |
| ViTSmall16_Weights.LANDSAT_OLI_SR_SIMCLR | 8–9 | 7 | link | link | CC0-1.0 | 65.04 | 48.20 | 62.61 | 45.46 |
| Swin_V2_B_Weights.LANDSAT_SI_SATLAS | 8–9 | 11 | link | link | ODC-BY | | | | |
| Swin_V2_B_Weights.LANDSAT_MI_SATLAS | 8–9 | 11 | link | link | ODC-BY | | | | |

NAIP

| Weight | Channels | Source | Citation | License |
|---|---|---|---|---|
| Swin_V2_B_Weights.NAIP_RGB_MI_SATLAS | 3 | link | link | ODC-BY |
| Swin_V2_B_Weights.NAIP_RGB_SI_SATLAS | 3 | link | link | ODC-BY |

Sentinel-1

| Weight | Channels | Source | Citation | License |
|---|---|---|---|---|
| ResNet50_Weights.SENTINEL1_GRD_CLOSP | 2 | link | link | OpenRAIL |
| ResNet50_Weights.SENTINEL1_GRD_DECUR | 2 | link | link | Apache-2.0 |
| ResNet50_Weights.SENTINEL1_GRD_GEOCLOSP | 2 | link | link | OpenRAIL |
| ResNet50_Weights.SENTINEL1_GRD_MOCO | 2 | link | link | CC-BY-4.0 |
| ResNet50_Weights.SENTINEL1_GRD_SOFTCON | 2 | link | link | CC-BY-4.0 |
| ViTSmall16_Weights.SENTINEL1_GRD_CLOSP | 2 | link | link | OpenRAIL |
| ViTSmall16_Weights.SENTINEL1_GRD_MAE | 2 | link | link | CC-BY-4.0 |
| ViTSmall16_Weights.SENTINEL1_GRD_FGMAE | 2 | link | link | CC-BY-4.0 |
| ViTBase16_Weights.SENTINEL1_GRD_MAE | 2 | link | link | CC-BY-4.0 |
| ViTBase16_Weights.SENTINEL1_GRD_FGMAE | 2 | link | link | CC-BY-4.0 |
| ViTLarge16_Weights.SENTINEL1_GRD_CLOSP | 2 | link | link | OpenRAIL |
| ViTLarge16_Weights.SENTINEL1_GRD_MAE | 2 | link | link | CC-BY-4.0 |
| ViTLarge16_Weights.SENTINEL1_GRD_FGMAE | 2 | link | link | CC-BY-4.0 |
| ViTHuge14_Weights.SENTINEL1_GRD_MAE | 2 | link | link | CC-BY-4.0 |
| ViTHuge14_Weights.SENTINEL1_GRD_FGMAE | 2 | link | link | CC-BY-4.0 |
| ViTSmall14_DINOv2_Weights.SENTINEL1_GRD_SOFTCON | 2 | link | link | CC-BY-4.0 |
| ViTBase14_DINOv2_Weights.SENTINEL1_GRD_SOFTCON | 2 | link | link | CC-BY-4.0 |
| Swin_V2_B_Weights.SENTINEL1_MI_SATLAS | 2 | link | link | ODC-BY |
| Swin_V2_B_Weights.SENTINEL1_SI_SATLAS | 2 | link | link | ODC-BY |

Sentinel-2

Weight

Channels

Source

Citation

License

BigEarthNet

EuroSAT

So2Sat

OSCD

ResNet18_Weights.SENTINEL2_ALL_MOCO

13

link

link

CC-BY-4.0

ResNet18_Weights.SENTINEL2_RGB_MOCO

3

link

link

CC-BY-4.0

ResNet18_Weights.SENTINEL2_RGB_SECO

3

link

link

Apache-2.0

87.27

93.14

46.94

ResNet50_Weights.SENTINEL2_ALL_CLOSP

13

link

link

OpenRAIL

ResNet50_Weights.SENTINEL2_ALL_DECUR

13

link

link

Apache-2.0

ResNet50_Weights.SENTINEL2_ALL_DINO

13

link

link

CC-BY-4.0

90.7

99.1

63.6

ResNet50_Weights.SENTINEL2_ALL_GEOCLOSP

13

link

link

OpenRAIL

ResNet50_Weights.SENTINEL2_ALL_MOCO

13

link

link

CC-BY-4.0

91.8

99.1

60.9

ResNet50_Weights.SENTINEL2_ALL_SOFTCON

13

link

link

CC-BY-4.0

ResNet50_Weights.SENTINEL2_ALL_SECO_ECO

12

link

link

MIT

ResNet50_Weights.SENTINEL2_ALL_NDVI_SECO_ECO

9

link

link

MIT

ResNet50_Weights.SENTINEL2_MI_MS_SATLAS

9

link

link

ODC-BY

ResNet50_Weights.SENTINEL2_MI_RGB_SATLAS

3

link

link

ODC-BY

ResNet50_Weights.SENTINEL2_SI_MS_SATLAS

9

link

link

ODC-BY

ResNet50_Weights.SENTINEL2_SI_RGB_SATLAS

3

link

link

ODC-BY

ResNet50_Weights.SENTINEL2_RGB_MOCO

3

link

link

CC-BY-4.0

ResNet50_Weights.SENTINEL2_RGB_SECO

3

link

link

Apache-2.0

87.81

ResNet152_Weights.SENTINEL2_MI_MS_SATLAS

9

link

link

ODC-BY

ResNet152_Weights.SENTINEL2_MI_RGB_SATLAS

3

link

link

ODC-BY

ResNet152_Weights.SENTINEL2_SI_MS_SATLAS

9

link

link

ODC-BY

ResNet152_Weights.SENTINEL2_SI_RGB_SATLAS

3

link

link

ODC-BY

ViTSmall16_Weights.SENTINEL2_ALL_CLOSP

13

link

link

OpenRAIL

ViTSmall16_Weights.SENTINEL2_ALL_DINO

13

link

link

CC-BY-4.0

90.5

99.0

62.2

ViTSmall16_Weights.SENTINEL2_ALL_MOCO

13

link

link

CC-BY-4.0

89.9

98.6

61.6

ViTSmall16_Weights.SENTINEL2_ALL_MAE

13

link

link

CC-BY-4.0

ViTSmall16_Weights.SENTINEL2_ALL_FGMAE

13

link

link

CC-BY-4.0

ViTBase16_Weights.SENTINEL2_ALL_MAE

13

link

link

CC-BY-4.0

ViTBase16_Weights.SENTINEL2_ALL_FGMAE

13

link

link

CC-BY-4.0

ViTLarge16_Weights.SENTINEL2_ALL_CLOSP

13

link

link

OpenRAIL

ViTLarge16_Weights.SENTINEL2_ALL_MAE

13

link

link

CC-BY-4.0

ViTLarge16_Weights.SENTINEL2_ALL_FGMAE

13

link

link

CC-BY-4.0

ViTHuge14_Weights.SENTINEL2_ALL_MAE

13

link

link

CC-BY-4.0

ViTHuge14_Weights.SENTINEL2_ALL_FGMAE

13

link

link

CC-BY-4.0

ViTSmall14_DINOv2_Weights.SENTINEL2_ALL_SOFTCON

13

link

link

CC-BY-4.0

ViTBase14_DINOv2_Weights.SENTINEL2_ALL_SOFTCON

13

link

link

CC-BY-4.0

Swin_V2_T_Weights.SENTINEL2_MI_MS_SATLAS

9

link

link

ODC-BY

Swin_V2_T_Weights.SENTINEL2_MI_RGB_SATLAS

3

link

link

ODC-BY

Swin_V2_T_Weights.SENTINEL2_SI_MS_SATLAS

9

link

link

ODC-BY

Swin_V2_T_Weights.SENTINEL2_SI_RGB_SATLAS

3

link

link

ODC-BY

Swin_V2_B_Weights.SENTINEL2_MI_MS_SATLAS

9

link

link

ODC-BY

Swin_V2_B_Weights.SENTINEL2_MI_RGB_SATLAS

3

link

link

ODC-BY

Swin_V2_B_Weights.SENTINEL2_SI_MS_SATLAS

9

link

link

ODC-BY

Swin_V2_B_Weights.SENTINEL2_SI_RGB_SATLAS

3

link

link

ODC-BY

Unet_Weights.SENTINEL2_2CLASS_FTW

8

link

link

CC-BY-4.0

Unet_Weights.SENTINEL2_2CLASS_NC_FTW

8

link

link

non-commercial

Unet_Weights.SENTINEL2_3CLASS_FTW

8

link

link

CC-BY-4.0

Unet_Weights.SENTINEL2_3CLASS_NC_FTW

8

link

link

non-commercial

EarthLoc_Weights.SENTINEL2_RESNET50

3

link

link

MIT

YOLO_Weights.SENTINEL2_RGB_MARINE_VESSEL_DETECTION

3

link

link

AGPL-3.0

Atmospheric

N = Nowcasting, MWF = Medium-Range Weather Forecasting, S2S = Subseasonal to Seasonal, DS = Decadal Scale

| Weight | Sensor | Task | Source | Citation | License |
|---|---|---|---|---|---|
| Aurora_Weights.HRES_T0_PRETRAINED_AURORA | HRES-T0 | MWF | link | link | MIT |
| Aurora_Weights.HRES_T0_PRETRAINED_12HR_AURORA | HRES-T0 | MWF | link | link | MIT |
| Aurora_Weights.HRES_T0_PRETRAINED_SMALL_AURORA | HRES-T0 | MWF | link | link | MIT |
| Aurora_Weights.HRES_T0_AURORA | HRES-T0 | MWF | link | link | MIT |
| Aurora_Weights.HRES_T0_HIGH_RES_AURORA | HRES-T0 | MWF | link | link | MIT |
| Aurora_Weights.HRES_CAMS_AIR_POLLUTION_AURORA | HRES-CAMS | MWF | link | link | MIT |
| Aurora_Weights.HRES_WAM0_WAVE_AURORA | HRES-WAM0 | MWF | link | link | MIT |
