DeepAndes: A Self-Supervised Vision Foundation Model for Multispectral Remote Sensing Imagery of the Andes
/ Authors
/ Abstract
By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, interregional social networks, and human adaptations in the past. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods face challenges in annotating fine-grained archaeological features at scale. In addition, while recent vision foundation models have shown remarkable success in learning from large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multispectral satellite imagery, such as the eight-band data used in our study. In this article, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multispectral satellite images and specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for eight-band multispectral imagery, making it the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance on three tasks: imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pretrained on smaller datasets. This underscores the effectiveness of large-scale self-supervised pretraining in archaeological remote sensing.
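For readers less familiar with two of the metrics named above, the following is a minimal sketch of the standard binary F1 score (used for classification) and Dice score (used for segmentation). It is not the authors' evaluation code: the function names, the flat 0/1 label lists, and the convention for empty masks are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def f1_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = confusion_counts(pred, target)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def dice_score(pred_mask: Sequence[int], target_mask: Sequence[int]) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two flattened binary masks."""
    tp, fp, fn = confusion_counts(pred_mask, target_mask)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 2 * tp / denom if denom else 1.0


# Hypothetical labels: six image tiles (or six pixels of a flattened mask).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(f1_score(pred, target), dice_score(pred, target))
```

For binary labels the two quantities coincide algebraically; they are reported separately because F1 is typically computed over images and Dice over pixels.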
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing