AI for seeing the forest — and the trees
IBM and NASA’s family of foundation models has introduced a fundamentally new way of representing scientific data and transferring knowledge across scales and domains. The models are also winning awards.
NASA’s fleet of Earth-orbiting satellites sends down terabytes of imagery each day, capturing rivers flooding their banks, forests and deserts expanding and retreating, and cities and roads emerging from land formerly covered in vegetation.
It’s too much information for us humans to wrap our heads around, but with today’s powerful AI models, the big picture and the meaningful details are coming into focus all at once. What makes this feat possible is a new algorithmic approach that turns raw data into rich, structured representations of the physical world.
The power of these learned abstractions can be seen in new data-driven models of the ground beneath us, as well as Earth’s oceans, climate, its host star, the Sun, and even the sum total of all scientific knowledge humans have acquired over the ages.
Algorithms have been a constant in IBM’s more than 60 years of collaboration with NASA, from the early guidance computers aboard Saturn rockets to today’s foundation models, shaping how data becomes knowledge at ever-increasing scale. But a new era in algorithmic innovation is now underway, driven by novel computing paradigms and new ways of abstracting real-world observations. It could deliver results more enduring than any single dataset or mission.
IBM and NASA’s family of groundbreaking, open-source foundation models were inspired by this approach. By learning general-purpose representations from massive datasets, these algorithmic primitives can encode and transfer knowledge from one task to another and be endlessly reused. NASA recently recognized the engineers and domain experts behind the work at IBM, NASA, and more than a dozen research universities, with a prestigious NASA Group Achievement Award.
"By developing open, transparent AI foundation models with real scientific rigor, we're giving researchers a new way to extract meaning from NASA’s vast data, accelerating analysis, and enabling breakthrough discoveries," said Rahul Ramachandran, a NASA researcher who leads the AI for Science initiative in the Office of the Chief Science Data Officer.
The collaboration began with the release of Prithvi-EO-1.0, and its successor, Prithvi-EO-2.0, which can be applied to mapping the extent of floods, wildfires, and landslides. Then came Prithvi-WxC, the first global foundation model for weather and climate prediction, and Surya, the first foundation model for the Sun, which can be used to predict the kinds of space weather events that can wreak havoc here on Earth.
IBM and NASA researchers also developed and open-sourced supporting software to make fine-tuning and validating applications built atop these models easier, with the data-processing library TerraTorch, and the GEO-Bench-2 and SuryaBench datasets for evaluating AI models built with observational data.
Since Prithvi-EO-1.0’s debut in 2023, the model family has been collectively downloaded more than 600,000 times and cited in more than 350 studies. NASA has also seen a surge in people using its Harmonized Landsat and Sentinel-2 (HLS) data product, which the Prithvi models were trained on.
Before newer transformer-based models like Prithvi and Surya, traditional machine-learning models required humans to painstakingly label images and other records. It could take weeks to months to bring new capabilities to a model. Foundation models eliminated this work by learning an abstract representation of the raw data that could generalize to new situations.
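The idea behind that shift can be illustrated with a toy masked-reconstruction task, the kind of pretext objective used to pretrain models like Prithvi, reduced here to a linear sketch. The data, basis patterns, and mask below are all invented for illustration; the point is only that structured data lets a model learn to fill in hidden regions without any human labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for satellite image patches: every sample is a linear mix of a
# few shared spatial patterns, so the data has low-dimensional structure that
# can be discovered without labels.
t = np.linspace(0, 1, 16)
basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])  # (3, 16)
coeffs = rng.normal(size=(300, 3))
X = coeffs @ basis  # 300 "patches" of length 16

# Pretext task: hide the last 4 positions and reconstruct them from the
# visible 12. No human annotation is involved at any point.
visible, masked = X[:, :12], X[:, 12:]
W, *_ = np.linalg.lstsq(visible, masked, rcond=None)

# Held-out samples from the same process reconstruct almost perfectly,
# because the learned map captures the data's underlying structure.
X_test = rng.normal(size=(50, 3)) @ basis
err = float(np.abs(X_test[:, :12] @ W - X_test[:, 12:]).max())
print(f"max reconstruction error: {err:.2e}")
```

Real foundation models replace the linear map with a transformer and the toy signals with multispectral imagery, but the training signal is the same: predict what was hidden.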
NASA estimates that the Prithvi models alone have indirectly generated $36 million in economic value by lowering the barriers to scientists using NASA’s archival imagery and making important new discoveries. Celebrating the models' impact, the American Geophysical Union last month awarded the Prithvi-EO team its Open Science Recognition Prize.
And NASA itself has held up the work as an example of how technological innovation can accelerate discovery across the Earth and planetary sciences. Last week, the Prithvi team flew to the Marshall Space Flight Center to accept their award, which also recognizes the researchers behind INDUS, a family of efficient LLMs tailored for scientific search. To mark the occasion, we decided to zoom in on four applications that IBM and NASA’s models have inspired so far, showing how algorithmic advances can translate into real-world scientific impact.
Predicting Canada’s canola harvest
By late summer each year, large swaths of Canada’s prairies turn bright yellow with flowering rapeseed plants. They’ve been specially bred to produce a healthier alternative to traditional rapeseed oil. Harvested and crushed, their seeds are turned into a type of vegetable oil called canola, which stands for Canadian oil, low acid.
About a third of the canola oil consumed worldwide is produced on the Canadian prairies. Vahab Khoshdel, a computer science professor at the University of Manitoba, has been trying to find ways to improve annual harvest estimates. Better forecasting can help insurance companies set premiums that more accurately reflect their risk exposure. More generally, AI image analysis can help them handle claims more efficiently by making losses easier to verify.
Khoshdel recently built an application for predicting regional canola yields by tuning Prithvi-EO-2.0 on high-resolution images of seasonal crop growth. He found that its mid-summer predictions were up to three times more accurate than the leading computer vision models currently in use, with an error of roughly three bushels per acre, or 7-8%. He’s currently working with Agi3, a startup that develops risk-management software for agriculture, to commercialize the work.
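Yield prediction of this kind typically keeps the pretrained encoder frozen and fits a small regression head on its embeddings. The sketch below is a minimal stand-in, not Khoshdel's actual pipeline: a random projection plays the role of the frozen Prithvi-EO-2.0 encoder, the "field tiles" are synthetic, and the yield relationship is defined directly on the embeddings so the head can fit it exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen encoder: a fixed nonlinear projection standing in for a
# pretrained backbone. Its weights are never updated during "fine-tuning".
P = rng.normal(size=(64, 8))
def encode(tiles):                 # tiles: (n, 64) flattened image patches
    return np.tanh(tiles @ P)     # (n, 8) embeddings

# Synthetic field tiles and a toy yield signal (bushels/acre) constructed to
# be a linear function of the embeddings, for illustration only.
tiles = rng.normal(size=(200, 64))
Z = encode(tiles)
c = rng.normal(size=8)
yields = Z @ c + 40.0

# "Fine-tuning" reduced to fitting a small linear head on the embeddings.
A = np.column_stack([Z, np.ones(len(Z))])
w, *_ = np.linalg.lstsq(A, yields, rcond=None)
rmse = float(np.sqrt(np.mean((A @ w - yields) ** 2)))
print(f"training RMSE: {rmse:.2e} bushels/acre")
```

In practice the head would be trained and validated on separate growing seasons, and the encoder itself can be partially unfrozen when enough labeled data is available.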
"Sooner or later, everyone will be using foundation models," he said. "It’s a win-win for both farmers and insurance companies. If they can predict how good a crop will be — or how bad — they can be more transparent about how they set prices and determine coverage policies."
Predicting extreme rainfall
Weather forecasting has improved dramatically in the last 30 years, but predicting the exact timing and intensity of rainfall continues to challenge researchers. Tiny changes in Earth’s atmosphere — things like shifts in temperature, humidity, or wind — can create wildly different outcomes. And though satellites can capture fast-moving storms in detail, integrating this information into conventional weather forecasts has proven difficult.
Take the deadly flash floods that ravaged Texas Hill Country last summer. Four months’ worth of rain fell in just hours as thunderstorms stalled above the Guadalupe River early on July 4, 2025. In minutes, the rising river surpassed the 100-year-flood mark, sweeping away cabins, cars, and homes, killing at least 135 people. The National Weather Service issued a flood watch the afternoon before the floods but predicted only half as much rain as the 15 inches that ultimately fell. Could a data-driven AI model have done better?
Simon Pfreundschuh, an atmospheric scientist at Colorado State University, thinks so. He and his colleagues recently tuned Prithvi-WxC on raw satellite images to see whether supplementing the model's MERRA-2 training data with direct observations could improve its rainfall estimates. Early results look promising.
Their prototype application, called Prithvi-WxC Precip, makes use of sensor data in the microwave and infrared frequencies, which can penetrate clouds and capture the kind of intense, convection-driven rain that fell over central Texas on July 4, 2025.
Using Prithvi-WxC Precip to forecast that storm, Pfreundschuh found that the model predicted rainfall in the right place two to three days in advance, farther out than both a baseline model tuned on MERRA-2 re-analysis data and a state-of-the-art AI forecasting model. Prithvi-WxC Precip also excelled at a short lead time of 12 hours, when many models struggle to see where a storm is headed.
The results suggest that AI-based weather prediction models can be improved by integrating corrected re-analysis data with direct observational data from other satellites. "The satellite observations have a very clear signal of where it’s raining when we initialize the forecast and that appears to improve the early lead-time forecast by more than 30%," he said.
Sizing up wildfires from space
Last year was one of the three warmest years on record, continuing a long-running streak of record-setting heat. Drier conditions in many places are triggering severe wildfires, with fire seasons in the western U.S., Mexico, Brazil, and East Africa now more than a month longer than they were 35 years ago, according to NASA.
Wildfires can ignite with little warning and spread quickly, and longer fire seasons increase the chances that people will get hurt. AI-driven response plans aren’t here yet, but they could become a reality if data could be processed in space instead of here on Earth.
"We could see where a fire is moving in minutes instead of hours," said Andrew Patrick Du, a postdoc at the University of Adelaide who works on AI applications at the edge.
Du is part of a movement to bring AI models directly aboard Earth-orbiting satellites. Through distillation and quantization, he and his colleagues recently compressed Prithvi-EO-2.0 to a fifteenth of its size. Prithvi was now light enough to run on satellite hardware, but another problem remained. Pre-trained on HLS data, this tiny model struggled to interpret images of floods, wildfires, landslides, and clouds, taken by cameras aboard other satellites.
To adapt the compressed model to a new satellite, they fine-tuned it on images captured by their target satellite. They then deployed and ran the model on the IMAGIN-e payload aboard the International Space Station (ISS). The demonstration showed that the tiny Prithvi-EO-2.0 could pick out landscape features as capably as the original model. "Whether you run it on ground or in space, there’s no performance drop," he said.
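The quantization half of that compression can be sketched in a few lines. This is a generic symmetric int8 scheme applied to a toy weight matrix, not the team's actual pipeline; distillation, in which a small student network is trained to match the large model's outputs, would come first.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy weight matrix standing in for one layer of the compressed model.
W = rng.normal(scale=0.1, size=(32, 32)).astype(np.float32)
x = rng.normal(size=(1, 32)).astype(np.float32)

# Symmetric int8 quantization: each weight is stored in 1 byte instead of 4,
# along with a single float scale factor per tensor.
scale = float(np.abs(W).max()) / 127.0
W_q = np.round(W / scale).astype(np.int8)

# Dequantize at inference time; the output stays close to full precision.
y_full = x @ W
y_quant = x @ (W_q.astype(np.float32) * scale)
rel_err = float(np.abs(y_full - y_quant).max() / np.abs(y_full).max())
print(f"int8 vs float32 max relative error: {rel_err:.3f}")
print(f"memory: {W.nbytes} bytes -> {W_q.nbytes} bytes")
```

Production schemes typically quantize per-channel and calibrate activations too, but the trade-off is the same: a fraction of the memory and compute for a small, measurable loss in precision.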
How much carbon can Earth’s forests sock away?
The world’s forests, plants, and soils together soak up about a third of the carbon emissions humans produce each year. It’s a rough estimate, with important implications for calculating the global carbon budget, which influences climate prediction models and can help motivate nature-based climate solutions like reforestation.
A network of flux towers stationed in treetops worldwide provides local measures of productivity, or how much carbon forests absorb through photosynthesis. Yanghui Kang, an assistant professor at Virginia Tech and a Prithvi-EO collaborator, studies the data streaming from these towers to try to get a more precise estimate of ecosystem productivity.
Today, global carbon accounting is highly uncertain because direct observations are too local and intermittent to extend to planetary scale. AI, however, is starting to change that. With Srija Chakraborty, a scientist at the Universities Space Research Association, Kang recently used Prithvi-EO-2.0 to knit together flux tower data with low-resolution satellite views of the landscape.
They fine-tuned Prithvi-EO-2.0 on a subset of flux tower data and found that it returned productivity estimates 20% more accurate than traditional AI methods currently in use. Researchers are now extending the model to 200 more sites and using it to assess nature-based climate solutions.
"Prithvi has changed the way I work,” said Kang. “It allows us to extract information much faster and it’s also easy to use. We are curious to see what more it can do."
What’s next
Work by other collaborators has confirmed that Prithvi-EO-2.0 can map permafrost in the Arctic and craters on Mars, places that never figured into its training data. It’s a glimpse of what a powerful abstraction can do: encode patterns broad enough to transfer to never-before-seen worlds.
This is the deeper story of algorithmic innovation. "Earlier algorithms executed instructions that engineers wrote by hand, while foundation models can learn representations from data that capture its underlying structure," said Juan Bernabé-Moreno, director of IBM Research Europe who leads the NASA collaboration.
As astronauts prepare to return to the moon, and NASA lays the groundwork for landing on Mars, the algorithms traveling with them could be instrumental in unlocking new information about these unknown environments.
"For decades, IBM and NASA have worked together to push the boundaries of knowledge and exploration," said IBM Fellow Alessandro Curioni, VP of algorithms and applications at IBM Research. "Algorithmic research has been at the core of our work—quietly, relentlessly expanding what humanity can discover."