Publication details
- Event: (ESRIN, Frascati)
- Year: 2025
- Link:
To leverage Earth observation (EO) data for large-scale analysis, automatic methods are a prerequisite. Since 2012, deep learning (DL) models have revolutionized the analysis of image data and are currently considered state-of-the-art for a broad spectrum of EO tasks. However, a bottleneck with supervised DL models is that they often require vast amounts of labelled data for training, and the research community has therefore started to explore alternatives to supervised learning. Inspired by the progress in large language models, foundation models (FMs) are now being applied extensively in computer vision.
FMs are trained on vast volumes of unlabelled data and can identify complex patterns due to their large-scale learning capabilities. Typically, an additional head or decoder (a small network) is added to the FM and adapted to various use cases using a small amount of labelled data. FMs have also started to be explored for EO applications; however, current EO-based FMs are limited in their ability to handle modalities with large differences in resolution.
Modern FMs are often based on transformers and are trained using self-supervised learning (SSL). Several SSL schemes exist, including masked autoencoders (MAE), where part of the input data is masked and the model is trained to predict the masked portion. This task is not useful by itself, but the model learns a compressed representation of the data that can be leveraged in downstream applications. This potentially makes the FM more useful than models trained on a limited set of labelled data.
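The MAE bookkeeping described above (random patch masking, with the reconstruction loss computed only on masked positions) can be illustrated with a minimal NumPy sketch. This is not the FM4CS implementation; the patch size, mask ratio, and the zero-valued stand-in for the decoder output are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, flattened."""
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

def random_mask(n_patches, mask_ratio, rng):
    """Boolean mask over patches: True = hidden from the encoder."""
    n_mask = int(round(n_patches * mask_ratio))
    idx = rng.permutation(n_patches)
    mask = np.zeros(n_patches, dtype=bool)
    mask[idx[:n_mask]] = True
    return mask

# Toy 'image'; MAE papers typically mask a large fraction (e.g. ~75%).
img = rng.standard_normal((8, 8)).astype(np.float32)
patches = patchify(img, p=2)                  # 16 patches of 4 pixels each
mask = random_mask(len(patches), 0.75, rng)   # 12 of 16 patches hidden

pred = np.zeros_like(patches)                 # stand-in for decoder predictions
loss = np.mean((pred[mask] - patches[mask]) ** 2)  # loss only on masked patches
```

The key design point is the last line: visible patches carry no loss, so the model is forced to infer the hidden content from context rather than copy its input.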
The Norwegian Computing Center and UiT – The Arctic University of Norway, in collaboration with the user partners the Romanian National Meteorological Administration, the Danish Meteorological Institute, Polar View, and the Norwegian Water Resources and Energy Directorate, are developing a multi-modal FM. The FM is designed to process data from the satellite instruments Sentinel-1 SAR, Sentinel-2, and Sentinel-3 OLCI and SLSTR. The FM is based on vision transformers (ViT) but utilizes the same principle as the USat approach to handle the different resolutions between the modalities. The training of the FM is based on the MAE approach, and to ensure that the SSL works efficiently, we have developed a smart sampling scheme that provides relevant and diverse training data. In addition to SSL, we have also created a learning task of regressing climate variables from the ERA5 dataset. To train the FM, over 20 TB of Sentinel data was collected and processed using the LUMI supercomputer.
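One way to reconcile modalities of very different resolutions, in the spirit of the USat approach mentioned above, is to scale the patch size per modality so that every token covers the same ground extent, then project each flattened patch to a shared embedding dimension. The sketch below is a hedged illustration only: the resolutions, the 600 m ground extent per token, and the random projection matrices are assumptions, not the FM4CS configuration.

```python
import numpy as np

# Assumed nominal resolutions (m/pixel) for illustration; the actual
# FM4CS band selection and resolutions may differ.
resolutions = {"S2": 10, "S1": 10, "S3_OLCI": 300}
ground_patch = 600  # metres of ground each token should cover (assumed)

def patch_size(res_m, ground_m):
    """Pixels per patch side so every modality's token spans the same area."""
    assert ground_m % res_m == 0
    return ground_m // res_m

sizes = {k: patch_size(v, ground_patch) for k, v in resolutions.items()}
# e.g. S1/S2 tokens are 60x60 px while an S3 OLCI token is 2x2 px,
# yet all tokens describe the same 600 m x 600 m footprint.

# Modality-specific linear projections map differently sized patches
# to one shared token dimension, so a single ViT can mix them.
embed_dim = 32
rng = np.random.default_rng(1)
proj = {k: rng.standard_normal((s * s, embed_dim)) * 0.01
        for k, s in sizes.items()}

patch_s3 = rng.standard_normal(sizes["S3_OLCI"] ** 2)  # one flattened patch
token = patch_s3 @ proj["S3_OLCI"]                     # shape (embed_dim,)
```

The design choice being illustrated is that resolution differences are absorbed at the tokenization stage, so the transformer backbone itself stays modality-agnostic.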
The multi-modal FM is demonstrated on the following use cases: snow mapping, flood zone mapping, sea ice mapping and monitoring, iceberg detection, early drought warning, and wetland mapping. The downstream tasks are implemented using TerraTorch, an open-source, flexible fine-tuning framework for geospatial FMs. The FM4CS model is one of the first to apply Sentinel-3 data, which makes it attractive for climate applications.