SIAM News Blog
Research

Predicting Climate Change With Data-driven Methods

As of 2021, the average global surface temperature had increased by 1.21° Celsius (C) above pre-industrial levels (using the 1850-1900 average as a baseline) [5], and the global mean sea level had risen by 82 millimeters (mm) compared to the 1986-2005 average [1]. These shifts in temperature and sea level are key indicators of climate change. The record-breaking summer of 2022—with temperatures above 40°C in many parts of Europe and the U.S.—highlighted the severity of the situation and the urgent need for action [2]. 

Climate prediction requires precise models. Historical data and models of complex physical processes can help explain the impact of greenhouse gas emissions and the underlying causes of changes to the climate. In our study, which relies on annual historical data, we develop and employ a unified structural equation model (uSEM) to identify the pathways that contribute to increased global mean surface temperature (GMST) and global mean sea level (GMSL) [3, 6].

Identifying the Relationships Between Climate Variables

To explore the causal pathways between climate variables, we consider the following seven factors:

  • GMSL
  • GMST, including the air temperature above sea ice areas
  • Glacier and ice sheet mass balance
  • Arctic sea ice extent
  • Global specific humidity
  • Greenhouse gases: carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O), collectively represented by their total global warming potential (GWP) to avoid multicollinearity
  • Sunspot number (SSN), which indicates solar activity.

First, we use the adaptive lasso within the regularized uSEM approach to conduct a system-wise variable selection for the entire uSEM equation system. We then perform an equation-wise variable selection on each autoregressive distributed lag (ARDL) model of the uSEM system via backward stepwise selection, which is based on the Akaike information criterion. Both methods identify the same model, confirming the robustness of the climate change pathways that uSEM detected.
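As a rough illustration of the equation-wise step, the following sketch runs backward stepwise selection by Akaike information criterion (AIC) on a single ARDL(1,1) equation. It is a minimal sketch under stated assumptions, not the implementation used in our papers: the data are synthetic, the lag order is fixed at one, and the helper names `ols_aic` and `backward_stepwise` are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """Fit OLS by least squares and return the Gaussian AIC (up to a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def backward_stepwise(X, y, names):
    """Drop one regressor at a time for as long as doing so lowers the AIC.

    Column 0 is treated as the intercept and is never dropped.
    """
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    improved = True
    while improved and len(keep) > 1:
        improved = False
        # Try removing each non-intercept column and record the resulting AIC.
        trials = [(ols_aic(X[:, [c for c in keep if c != j]], y), j) for j in keep[1:]]
        aic, j = min(trials)
        if aic < best:
            best, improved = aic, True
            keep.remove(j)
    return [names[c] for c in keep], best


# Synthetic annual series (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
T = 120
gwp = np.cumsum(rng.normal(0.5, 0.2, T))  # trending driver
ssn = rng.normal(0.0, 1.0, T)             # irrelevant by construction
gmst = np.zeros(T)
for t in range(1, T):
    gmst[t] = 0.6 * gmst[t - 1] + 0.05 * gwp[t] + rng.normal(0, 0.05)

# ARDL(1,1) design: intercept, own lag, current and lagged regressors.
names = ["const", "gmst_lag1", "gwp", "gwp_lag1", "ssn", "ssn_lag1"]
X = np.column_stack([np.ones(T - 1), gmst[:-1], gwp[1:], gwp[:-1], ssn[1:], ssn[:-1]])
y = gmst[1:]

full_aic = ols_aic(X, y)
selected, final_aic = backward_stepwise(X, y, names)
print(selected, round(final_aic, 2))
```

Because a regressor is only dropped when its removal lowers the AIC, the selected model never scores worse than the full model. The system-wise adaptive lasso step is not shown here.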

Figure 1 depicts the model that ARDL and uSEM identified. The significant pathways that uSEM noted are refitted to estimate the path coefficients (red indicates positive coefficients and blue indicates negative coefficients); p-values are shown in parentheses in Figure 1b. This data-driven uSEM, which is expressed as a set of explicit equations, minimizes our reliance on existing scientific knowledge; allows for intuitive, independent inference of climate system status and future trends; and supplements complex physics-driven models, which are often considered black boxes.
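For readers who are unfamiliar with uSEM, the generic form below may help. It is a sketch that follows the general structure in [3], where contemporaneous and lagged relationships are combined in one system; the symbols, the lag order p, and the absence of intercepts are assumptions made for illustration, and the exact specification is given in [6].

```latex
\begin{align}
Y_t &= A\,Y_t + \sum_{i=1}^{p} \Phi_i\,Y_{t-i} + \varepsilon_t,
  \qquad \operatorname{diag}(A) = 0, \\
y_{k,t} &= \sum_{j \neq k} a_{kj}\,y_{j,t}
  + \sum_{i=1}^{p} \sum_{j=1}^{m} \phi^{(i)}_{kj}\,y_{j,t-i}
  + \varepsilon_{k,t}.
\end{align}
```

Here Y_t stacks the m climate variables in year t, A holds the contemporaneous path coefficients, and each Φ_i holds the lag-i coefficients. The second line is the k-th row of the system, which has the form of an ARDL regression; this is why equation-wise selection can be carried out one ARDL model at a time.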

Figure 1. Unified structural equation model (uSEM) with factors that affect climate pathways. Climate factors include global specific humidity; global warming potential (GWP); sunspot number (SSN); global mean surface temperature (GMST), including air temperature above sea ice areas; Arctic sea ice extent; glacier and ice sheet mass balance; and global mean sea level (GMSL). 1a. The full hypothesized uSEM with all conceivable directed paths not contradictory to common sense, depicted by the black arrows. 1b. Confirmed uSEM with significant pathways that were selected in unison via two variable selection methods: the system-wise regularized uSEM approach and equation-wise stepwise variable selection based on each autoregressive distributed lag model. Significant positive and negative pathways are respectively labeled with red and blue arrows; the corresponding path coefficients and p-values (one-sided) are also noted in parentheses. Gray dashed arrows represent insignificant pathways at a significance level of 0.05 (one-sided). Figure courtesy of [6].

Figure 1b indicates that heightened greenhouse gas emissions—represented by increased GWP—significantly raise GMST. Conversely, SSN does not significantly impact GMST, while humidity and GMST exhibit a positive interactive relationship. An increase in GMST significantly decreases sea ice coverage and mass, which in turn considerably increases GMSL. Glacier melt directly impacts sea ice melt and reduces sea ice formation, potentially diminishing albedo and enhancing solar absorption to create a positive feedback loop for rising temperatures and further melt. Sea ice plays a crucial role in temperature regulation, albedo, and ocean circulation; its decline affects the global ocean conveyor belt, which regulates the flow of warm, less dense surface water. Furthermore, a higher GMST both directly and indirectly elevates GMSL through sea ice decrease and glacier melt.

Our data-driven model closely aligns with recent scientific discoveries. For example, GWP’s positive impact on GMST supports the well-understood greenhouse effect. The mutual positive relationship between humidity and GMST aligns with the water vapor greenhouse effect and water cycle theories, which assert that warmer air retains more water vapor, thereby enhancing evaporation and reducing condensation. The connection between a higher GMST and melting glaciers and ice sheets is consistent with warming temperatures; similarly, the negative consequences of reduced sea ice mass or extent on GMSL reflect this process as well. GMST’s impact on GMSL also coincides with the thermal expansion of oceans, which drives an increase in sea level. Overall, the data-driven pathway analysis corroborates climate science principles and parallels physics-based models.

Forecasting GMST and GMSL

At the 2021 United Nations Conference on Climate Change (COP26), which took place in Glasgow, Scotland, representatives from nearly 200 nations gathered to address climate change, formulate global strategies, and set ambitious targets for emission reduction. The conference ultimately yielded the Glasgow Climate Pact, which aims to limit the increase in GMST to no more than 1.5°C above pre-industrial levels. To meet this goal by 2030, anthropogenic CO2 emissions must decrease by 45 percent (relative to 2010 levels) and anthropogenic CH4 emissions must decrease by 30 percent (relative to 2020 levels).

We can use our data-driven model to forecast GMST and GMSL under both an unrestricted scenario and the COP26 limits. Without restrictions, we anticipate that GMST will increase to 1.97°C above pre-industrial levels by 2050, and 3.28°C above pre-industrial levels by 2100. Using the 20-year mean from 1986-2005 as a baseline, GMSL will rise to 246.72 mm above that baseline by 2050 and 655.25 mm by 2100. In contrast, with the COP26 constraints, GMST will increase to 1.66°C above pre-industrial levels by 2050, and 1.88°C above pre-industrial levels by 2100. Relative to the same 1986-2005 baseline, GMSL will rise to 229.67 mm by 2050 and 531.23 mm by 2100.

Our data-driven model demonstrates that the COP26 guidelines are insufficient to keep warming within 1.5°C and curb the effects of severe climate change. It is therefore crucial to formulate stricter limits on greenhouse gas emissions.
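A quick calculation on the forecasts quoted above shows how much of the projected increase the COP26 constraints remove in each case. This is only arithmetic on the published numbers; the dictionary layout and the helper `relative_reduction` are hypothetical, and the percentages depend on the baselines (pre-industrial for GMST, 1986-2005 for GMSL).

```python
# Forecasts quoted in the text: (unrestricted, COP26-constrained).
# GMST is in degrees C above pre-industrial levels; GMSL is in mm above the 1986-2005 mean.
projections = {
    ("GMST", 2050): (1.97, 1.66),
    ("GMST", 2100): (3.28, 1.88),
    ("GMSL", 2050): (246.72, 229.67),
    ("GMSL", 2100): (655.25, 531.23),
}


def relative_reduction(unrestricted, constrained):
    """Percentage of the unrestricted increase that the constraints avoid."""
    return 100.0 * (unrestricted - constrained) / unrestricted


reductions = {
    key: round(relative_reduction(*values), 1) for key, values in projections.items()
}

for (variable, year), pct in reductions.items():
    print(f"{variable} {year}: {pct}% lower under COP26 constraints")
```

By 2100, the constrained scenario avoids a larger share of the projected GMST increase than of the projected GMSL increase.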

Forecasting Regional Mean Sea Level

Regional sea level rise along coastlines is a great concern to both policymakers and the public. For each major coastal region, we use ARDL—a versatile time series regression model—to predict the regional mean sea level based on its previous levels over time, as well as the GMSL at the same and prior time points. Though we studied eight coastal regions in total, we focus here on New York City and Osaka, Japan. 
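The regression described above can be sketched as an ARDL(1,1) equation, in which the regional level depends on its own previous value and on the current and previous GMSL. The code below is a minimal sketch under stated assumptions: the lag order of one, the synthetic series, and the helper names `fit_ardl11` and `forecast_ardl11` are all hypothetical, and the regional models in our papers may use different lag structures.

```python
import numpy as np


def fit_ardl11(regional, gmsl):
    """Least-squares fit of r_t = a + b*r_{t-1} + c*g_t + d*g_{t-1}."""
    X = np.column_stack(
        [np.ones(len(regional) - 1), regional[:-1], gmsl[1:], gmsl[:-1]]
    )
    coef, *_ = np.linalg.lstsq(X, regional[1:], rcond=None)
    return coef


def forecast_ardl11(coef, last_regional, last_gmsl, future_gmsl):
    """Iterate the fitted equation forward along a given future GMSL path."""
    a, b, c, d = coef
    path, r_prev, g_prev = [], last_regional, last_gmsl
    for g in future_gmsl:
        r_prev = a + b * r_prev + c * g + d * g_prev
        g_prev = g
        path.append(r_prev)
    return np.array(path)


# Synthetic, noise-free data (hypothetical) so the true coefficients are recoverable.
rng = np.random.default_rng(1)
true_coef = np.array([2.0, 0.7, 0.9, -0.4])
gmsl = np.cumsum(rng.uniform(0.5, 3.5, 70))  # irregular upward drift, in mm
regional = np.zeros(70)
for t in range(1, 70):
    regional[t] = (
        true_coef[0]
        + true_coef[1] * regional[t - 1]
        + true_coef[2] * gmsl[t]
        + true_coef[3] * gmsl[t - 1]
    )

# Fit on the first 60 years and forecast the last 10 from the known GMSL path.
coef = fit_ardl11(regional[:60], gmsl[:60])
forecast = forecast_ardl11(coef, regional[59], gmsl[59], gmsl[60:])
print(np.round(coef, 3))
```

In practice, the future GMSL path would itself come from the global forecast under each emission scenario, and forecast intervals would account for residual noise, which this sketch omits.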

If the yearly mean sea level in New York City rises to 997.65 mm above its 2021 level by 2100 (the upper bound of the 95 percent forecast interval in the unrestricted scenario), then historical hourly data indicate that the daily highest sea level will exceed the current level by two meters on an average of 45.54 days per year. However, this number could fall to 6.29 days under COP26's greenhouse gas emission restrictions.

In Osaka, if the yearly mean sea level rises to 1065.64 mm above the 2021 level by 2100 (the upper bound of the 95 percent forecast interval under the unrestricted scenario), we predict an average of 3.55 severe flooding days annually, on which daily peak sea levels exceed the current level by two meters. Under the COP26 restrictions, this number could decrease to 0.38 days.
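One simple way to obtain such day counts from historical hourly data is to shift the record by the projected mean rise, take each day's peak, and count the days that exceed the two-meter threshold. The sketch below assumes this uniform-shift approach, which may differ in detail from the procedure in our papers; the function name `exceedance_days_per_year` and the toy record are hypothetical.

```python
import numpy as np


def exceedance_days_per_year(hourly_level_mm, mean_rise_mm, threshold_mm,
                             hours_per_day=24, days_per_year=365.25):
    """Average number of days per year whose shifted daily peak exceeds a threshold.

    hourly_level_mm: historical hourly levels relative to the current mean (mm).
    mean_rise_mm:    projected rise in yearly mean sea level (mm), added uniformly.
    threshold_mm:    flood threshold above the current mean (mm), e.g. 2000.
    """
    levels = np.asarray(hourly_level_mm, dtype=float)
    n_days = len(levels) // hours_per_day
    # Discard any trailing partial day, then take the peak of each full day.
    daily_max = levels[: n_days * hours_per_day].reshape(n_days, hours_per_day).max(axis=1)
    n_exceed = int(np.sum(daily_max + mean_rise_mm > threshold_mm))
    return n_exceed * days_per_year / n_days


# Toy record (hypothetical): four days with one peak hour each, in mm.
record = np.zeros((4, 24))
record[:, 12] = [900.0, 1100.0, 1300.0, 500.0]
hourly = record.ravel()

# With the 997.65 mm rise quoted for New York City, two of the four toy days exceed 2 m.
days_with_rise = exceedance_days_per_year(hourly, 997.65, 2000.0)
days_without_rise = exceedance_days_per_year(hourly, 0.0, 2000.0)
print(days_with_rise, days_without_rise)
```

The toy record is far too short to be meaningful; it only demonstrates the counting logic. A real analysis would use many years of tide gauge data.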

Figure 2. Three-dimensional Google Earth simulations that illustrate regional mean sea level increases of zero, one, and two meters from current levels for New York City (2a-2c) and Osaka, Japan (2d-2f). As a point of comparison, our analysis indicates that New York City’s regional mean sea level will rise to 680.20 millimeters (mm) above the current level under the United Nations Conference on Climate Change (COP26) restrictions, 867.79 mm above the current level with no restrictions, and 1269.97 mm above the current level in the Shared Socioeconomic Pathways SSP5-8.5 greenhouse gas emissions scenario [4]. Osaka’s regional mean sea level will rise to 739.75 mm above the current level under the COP26 restrictions, 939.49 mm above the current level with no restrictions, and 1366.68 mm above the current level in the SSP5-8.5 scenario. These projections underscore the urgent need for drastic actions to mitigate future climate change impacts. Figure courtesy of [6].

In summary, while the COP26 resolutions for the reduction of anthropogenic greenhouse gases offer some mitigation benefits, particularly in controlling the GMST, their impact on global and regional sea level rise is less pronounced. The simulation in Figure 2 uses Google Earth Pro to illustrate potential changes to targeted areas by 2100 under different emission scenarios. Although the sea level rise depicted in these three-dimensional maps may not precisely align with individual greenhouse gas emission forecasts due to limitations in map resolution, the maps effectively convey the scale of the potential consequences across various emission scenarios. Osaka will likely experience more severe impacts than New York City, mainly due to differences in elevation between the two locations.

Further details on data-driven methods are available in our corresponding papers [6, 7]. We are currently exploring additional data-driven approaches to forecast climate change and expect to release related publications in the near future.


Guanchao Tong delivered a minisymposium presentation about this research at the 2024 SIAM Conference on Discrete Mathematics, which took place in Spokane, Wash., last year in conjunction with the 2024 SIAM Annual Meeting.

References
[1] Hamlington, B.D., Bellas-Manley, A., Willis, J.K., Fournier, S., Vinogradova, N., Nerem, R.S., … Kopp, R. (2024). The rate of global sea level rise doubled during the past three decades. Commun. Earth Environ., 5(1), 601.
[2] Heating up [Editorial]. (2022). Nat. Clim. Chang., 12(8), 693.
[3] Kim, J., Zhu, W., Chang, L., Bentler, P.M., & Ernst, T. (2007). Unified structural equation modeling approach for the analysis of multisubject, multivariate functional MRI data. Human Brain Mapp., 28(2), 85-93.
[4] Riahi, K., van Vuuren, D.P., Kriegler, E., Edmonds, J., O'Neill, B.C., Fujimori, S., … Tavoni, M. (2017). The shared socioeconomic pathways and their energy, land use, and greenhouse gas emissions implications: An overview. Glob. Environ. Change, 42, 153-168.
[5] Rohde, R.A., & Hausfather, Z. (2020). The Berkeley Earth land/ocean temperature record. Earth Syst. Sci. Data, 12(4), 3469-3479.
[6] Song, J., Tong, G., Chao, J., Chung, J., Zhang, M., Lin, W., … Zhu, W. (2023). Data driven pathway analysis and forecast of global warming and sea level rise. Sci. Rep., 13(1), 5536.
[7] Tong, G., Chao, J., Ma, W., Zhong, Z., Gupta, G., & Zhu, W. (2025). Leveraging synthetic data to improve regional sea level predictions. Sci. Rep., 15(1), 3546.

About the Authors

Guanchao Tong

Assistant professor, Wenzhou-Kean University and Kean University

Guanchao Tong is an assistant professor in the Department of Mathematical Sciences at Wenzhou-Kean University and Kean University. His research uses mathematical learning models and focuses on data analysis, data mining, and time series modeling, with applications in climate change and biomedicine.

Wenxuan Ma

Undergraduate student, Kean University

Wenxuan Ma is an undergraduate student at Kean University with a strong interest in statistical learning, machine learning, and their application to real-world problems that involve time series analysis.