{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copyright (c) TorchGeo Contributors. All rights reserved.\n", "# Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Custom Trainers\n", "\n", "_Written by: Caleb Robinson_\n", "\n", "In this tutorial, we demonstrate how to extend a TorchGeo [\"trainer class\"](https://torchgeo.readthedocs.io/en/latest/api/trainers.html). In TorchGeo there exist several trainer classes that are pre-made PyTorch Lightning Modules designed to allow for the easy training of models on semantic segmentation, classification, change detection, etc. tasks using TorchGeo's [prebuilt DataModules](https://torchgeo.readthedocs.io/en/latest/api/datamodules.html). While the trainers aim to provide sensible defaults and customization options for common tasks, they will not be able to cover all situations (e.g., researchers will likely want to implement and use their own architectures, loss functions, optimizers, and/or metrics in the training routine). If you run into such a situation, then you can simply extend the trainer class you are interested in, and write custom logic to override the default functionality.\n", "\n", "This tutorial shows how to do exactly this to customize a learning rate schedule, logging, and model checkpointing for a semantic segmentation task using the [LandCover.ai](https://landcover.ai.linuxpolska.com/) dataset.\n", "\n", "It's recommended to run this notebook on Google Colab if you don't have your own GPU. Click the \"Open in Colab\" button above to get started." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "As always, we install TorchGeo." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install torchgeo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n", "\n", "Next, we import TorchGeo and any other libraries we need." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections.abc import Sequence\n", "from typing import Any\n", "\n", "import lightning\n", "import lightning.pytorch as pl\n", "from lightning.pytorch.callbacks import ModelCheckpoint\n", "from lightning.pytorch.callbacks.callback import Callback\n", "from torch.optim import AdamW\n", "from torch.optim.lr_scheduler import CosineAnnealingLR\n", "from torchmetrics import MetricCollection\n", "from torchmetrics.classification import (\n", " Accuracy,\n", " FBetaScore,\n", " JaccardIndex,\n", " Precision,\n", " Recall,\n", ")\n", "\n", "from torchgeo.datamodules import LandCoverAI100DataModule\n", "from torchgeo.trainers import SemanticSegmentationTask" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "## Custom SemanticSegmentationTask\n", "\n", "Now, we create a `CustomSemanticSegmentationTask` class that inhierits from `SemanticSegmentationTask` and that overrides a few methods:\n", "- `__init__`: We add two new parameters `tmax` and `eta_min` to control the learning rate scheduler\n", "- `configure_optimizers`: We use the `CosineAnnealingLR` learning rate scheduler instead of the default `ReduceLROnPlateau`\n", "- `configure_metrics`: We add a \"MeanIoU\" metric (what we will use to evaluate the model's performance) and a variety of other classification metrics\n", "- `configure_callbacks`: We demonstrate how to stack `ModelCheckpoint` callbacks to save the best checkpoint as well as periodic checkpoints\n", "- `on_train_epoch_start`: We log the learning rate at the start of each epoch so we can easily see how it decays over a training run\n", "\n", "Overall these demonstrate how to customize the training routine to investigate specific research questions (e.g., the effect of the scheduler on test performance)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class CustomSemanticSegmentationTask(SemanticSegmentationTask):\n", " # any keywords we add here between *args and **kwargs will be found in self.hparams\n", " def __init__(\n", " self, *args: Any, tmax: int = 50, eta_min: float = 1e-6, **kwargs: Any\n", " ) -> None:\n", " super().__init__(*args, **kwargs) # pass args and kwargs to the parent class\n", "\n", " def configure_optimizers(\n", " self,\n", " ) -> 'lightning.pytorch.utilities.types.OptimizerLRSchedulerConfig':\n", " \"\"\"Initialize the optimizer and learning rate scheduler.\n", "\n", " Returns:\n", " Optimizer and learning rate scheduler.\n", " \"\"\"\n", " tmax: int = self.hparams['tmax']\n", " eta_min: float = self.hparams['eta_min']\n", "\n", " optimizer = AdamW(self.parameters(), lr=self.hparams['lr'])\n", " scheduler = CosineAnnealingLR(optimizer, T_max=tmax, eta_min=eta_min)\n", " return {\n", " 'optimizer': optimizer,\n", " 'lr_scheduler': {'scheduler': scheduler, 'monitor': self.monitor},\n", " }\n", "\n", " def configure_metrics(self) -> None:\n", " \"\"\"Initialize the performance metrics.\"\"\"\n", " num_classes: int = self.hparams['num_classes']\n", "\n", " self.train_metrics = MetricCollection(\n", " {\n", " 'OverallAccuracy': Accuracy(\n", " task='multiclass', num_classes=num_classes, average='micro'\n", " ),\n", " 'OverallPrecision': Precision(\n", " task='multiclass', num_classes=num_classes, average='micro'\n", " ),\n", " 'OverallRecall': Recall(\n", " task='multiclass', num_classes=num_classes, average='micro'\n", " ),\n", " 'OverallF1Score': FBetaScore(\n", " task='multiclass',\n", " num_classes=num_classes,\n", " beta=1.0,\n", " average='micro',\n", " ),\n", " 'MeanIoU': JaccardIndex(\n", " num_classes=num_classes, task='multiclass', average='macro'\n", " ),\n", " },\n", " prefix='train_',\n", " )\n", " self.val_metrics = self.train_metrics.clone(prefix='val_')\n", " self.test_metrics = self.train_metrics.clone(prefix='test_')\n", "\n", " def configure_callbacks(self) -> Sequence[Callback] | Callback:\n", " \"\"\"Initialize callbacks for saving the best and latest models.\n", "\n", " Returns:\n", " List of callbacks to apply.\n", " \"\"\"\n", " return [\n", " ModelCheckpoint(every_n_epochs=50, save_top_k=-1, save_last=True),\n", " ModelCheckpoint(monitor=self.monitor, mode=self.mode, save_top_k=5),\n", " ]\n", "\n", " def on_train_epoch_start(self) -> None:\n", " \"\"\"Log the learning rate at the start of each training epoch.\"\"\"\n", " optimizers = self.optimizers()\n", " if isinstance(optimizers, list):\n", " lr = optimizers[0].param_groups[0]['lr']\n", " else:\n", " lr = optimizers.param_groups[0]['lr']\n", " self.logger.experiment.add_scalar('lr', lr, self.current_epoch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train model\n", "\n", "The remainder of the turial is straightforward and follows the typical [PyTorch Lightning](https://lightning.ai/) training routine. We instantiate a `DataModule` for the LandCover.AI 100 dataset (a small version of the LandCover.AI dataset for notebook testing), instantiate a `CustomSemanticSegmentationTask` with a U-Net and ResNet-18 backbone, then train the model using a Lightning trainer.\n", "\n", "The following variables can be modified to control training." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbmake": { "mock": { "batch_size": 1, "fast_dev_run": true, "max_epochs": 1, "num_workers": 0 } } }, "outputs": [], "source": [ "batch_size = 32\n", "num_workers = 8\n", "max_epochs = 50\n", "fast_dev_run = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dm = LandCoverAI100DataModule(\n", " root='data', batch_size=batch_size, num_workers=num_workers, download=True\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task = CustomSemanticSegmentationTask(\n", " model='unet',\n", " backbone='resnet18',\n", " weights=True,\n", " in_channels=3,\n", " num_classes=6,\n", " loss='ce',\n", " lr=1e-3,\n", " tmax=50,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# validate that the task's hyperparameters are as expected\n", "task.hparams" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trainer = pl.Trainer(\n", " fast_dev_run=fast_dev_run, log_every_n_steps=1, min_epochs=1, max_epochs=max_epochs\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trainer.fit(task, dm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test model\n", "\n", "Finally, we test the model (optionally loading from a previously saved checkpoint)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can load directly from a saved checkpoint with `.load_from_checkpoint(...)`\n", "# Note that you can also just call `trainer.test(task, dm)` if you've already trained\n", "# the model in the current notebook session.\n", "\n", "# task = CustomSemanticSegmentationTask.load_from_checkpoint(\n", "# os.path.join('lightning_logs', 'version_0', 'checkpoints', 'epoch=0-step=1.ckpt')\n", "# )\n", "\n", "trainer.test(task, dm)" ] } ], "metadata": { "jupytext": { "formats": "ipynb,py" }, "kernelspec": { "display_name": "geo", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 2 }