Poster

Exploring Prithvi WXC Adaptation from MERRA2 to ERA5

Abstract

Recent advances in AI-based weather modeling have given rise to large-scale “weather foundation” models pretrained on large volumes of weather data. In this study, we investigate whether the Prithvi weather foundation model, pretrained on 40 years of MERRA2 data at a resolution of 0.5° × 0.625°, facilitates adaptation to ERA5, which has a finer resolution of 0.25° × 0.25°. We fine-tune the Prithvi model on three years of ERA5 data and examine whether the MERRA2 pretraining provides any advantage.
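
To make the resolution gap concrete, the short sketch below compares the number of horizontal grid points per vertical level on the two grids. The grid dimensions follow the standard global 0.5° × 0.625° and 0.25° × 0.25° conventions and are illustrative rather than taken from either data pipeline.

    # Rough comparison of horizontal grid sizes (illustrative; exact grids
    # depend on how the poles and the longitude seam are handled).
    merra2_lat, merra2_lon = 361, 576     # 0.5 deg x 0.625 deg global grid
    era5_lat, era5_lon = 721, 1440        # 0.25 deg x 0.25 deg global grid

    merra2_points = merra2_lat * merra2_lon   # ~0.21M points per level
    era5_points = era5_lat * era5_lon         # ~1.04M points per level

    print(f"ERA5 has ~{era5_points / merra2_points:.1f}x more grid points per level")
    # -> ERA5 has ~5.0x more grid points per level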

We compare three architectures during fine‑tuning:

  1. Encoder–Decoder (ED): the original Prithvi backbone.
  2. Low‑rank Attention Side‑Tuning (LAST): augmenting the ED backbone with low‑rank adapters in the attention layers (a minimal adapter sketch follows this list).
  3. Convolutional‑Only (CONV): replacing the decoder with additional convolutional modules and adopting an encoder‑only transformer.
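
As an illustration of the low-rank adapters referenced in the LAST variant, the following is a minimal LoRA-style wrapper around a linear attention projection in PyTorch; the rank, scaling, and attachment point are assumptions for illustration, not the exact Prithvi implementation.

    import torch
    import torch.nn as nn

    class LowRankAdapter(nn.Module):
        """Wraps a frozen linear projection with a trainable low-rank update.

        Sketch only: the rank, scaling, and where the adapter is attached
        (e.g., query/value projections of each attention block) are assumptions.
        """
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # keep pretrained weights frozen
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.up.weight)       # adapter starts as a zero update
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.scale * self.up(self.down(x))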

Our experiments reveal:

  • Resolution gap: The roughly twofold finer grid spacing and the different variable set in ERA5 introduce a significant domain shift, leading to longer training times even for the model pretrained on MERRA2.
  • One-month vs. three-year fine-tuning: Short-term fine-tuning accelerates initial adaptation but fails to close the convergence gap introduced by the resolution and variable differences, while three-year training still incurs considerable computational cost.
  • Architecture ablations: Among the three architectures, the CONV model achieves the fastest convergence and the highest predictive skill (a sketch of this design follows the list).
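
As an illustration of the encoder-only-plus-convolution design behind the CONV result, the sketch below shows one plausible way to reshape transformer encoder tokens into a 2D feature map and decode them with a small convolutional head; the embedding dimension, channel counts, and upsampling factor are assumptions, not the architecture used in the experiments.

    import torch
    import torch.nn as nn

    class ConvDecodeHead(nn.Module):
        """Decodes encoder tokens back to a gridded field with convolutions.

        Sketch only: embedding dim, patch size, and channel counts are assumed.
        """
        def __init__(self, embed_dim: int = 256, out_channels: int = 4, patch: int = 4):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(embed_dim, 128, kernel_size=3, padding=1),
                nn.GELU(),
                nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False),
                nn.Conv2d(128, out_channels, kernel_size=3, padding=1),
            )

        def forward(self, tokens, grid_hw):
            # tokens: (batch, num_patches, embed_dim); grid_hw: patch-grid (H, W)
            b, n, c = tokens.shape
            h, w = grid_hw
            x = tokens.transpose(1, 2).reshape(b, c, h, w)  # back to a 2D feature map
            return self.head(x)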

We quantify the ablation effects to separate the benefit of MERRA2 pretraining from that of the architecture choice. Our results suggest that, while MERRA2 pretraining provides a useful initialization, substantial fine-tuning (in terms of both data volume and compute) remains necessary when transferring to higher-resolution datasets.