Using automated machine learning for the upscaling of gross primary productivity

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Using automated machine learning for the upscaling of gross primary productivity. / Gaber, Max; Kang, Yanghui; Schurgers, Guy; Keenan, Trevor.

In: Biogeosciences, Vol. 21, No. 10, 2024, p. 2447-2472.


Harvard

Gaber, M, Kang, Y, Schurgers, G & Keenan, T 2024, 'Using automated machine learning for the upscaling of gross primary productivity', Biogeosciences, vol. 21, no. 10, pp. 2447-2472. https://doi.org/10.5194/bg-21-2447-2024

APA

Gaber, M., Kang, Y., Schurgers, G., & Keenan, T. (2024). Using automated machine learning for the upscaling of gross primary productivity. Biogeosciences, 21(10), 2447-2472. https://doi.org/10.5194/bg-21-2447-2024

Vancouver

Gaber M, Kang Y, Schurgers G, Keenan T. Using automated machine learning for the upscaling of gross primary productivity. Biogeosciences. 2024;21(10):2447-2472. https://doi.org/10.5194/bg-21-2447-2024

Author

Gaber, Max ; Kang, Yanghui ; Schurgers, Guy ; Keenan, Trevor. / Using automated machine learning for the upscaling of gross primary productivity. In: Biogeosciences. 2024 ; Vol. 21, No. 10. pp. 2447-2472.

Bibtex

@article{4573be4878054d68aceb367b844968b1,
title = "Using automated machine learning for the upscaling of gross primary productivity",
abstract = "Estimating gross primary productivity (GPP) over space and time is fundamental for understanding the response of the terrestrial biosphere to climate change. Eddy covariance flux towers provide in situ estimates of GPP at the ecosystem scale, but their sparse geographical distribution limits larger-scale inference. Machine learning (ML) techniques have been used to address this problem by extrapolating local GPP measurements over space using satellite remote sensing data. However, the accuracy of the regression model can be affected by uncertainties introduced by model selection, parameterization, and choice of explanatory features, among others. Recent advances in automated ML (AutoML) provide a novel automated way to select and synthesize different ML models. In this work, we explore the potential of AutoML by training three major AutoML frameworks on eddy covariance measurements of GPP at 243 globally distributed sites. We compared their ability to predict GPP and its spatial and temporal variability based on different sets of remote sensing explanatory variables. Explanatory variables from only Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data and photosynthetically active radiation explained over 70\,\% of the monthly variability in GPP, while satellite-derived proxies for canopy structure, photosynthetic activity, environmental stressors, and meteorological variables from reanalysis (ERA5-Land) further improved the frameworks{\textquoteright} predictive ability. We found that the AutoML framework Auto-sklearn consistently outperformed other AutoML frameworks as well as a classical random forest regressor in predicting GPP but with small performance differences, reaching an $r^2$ of up to 0.75. We deployed the best-performing framework to generate global wall-to-wall maps highlighting GPP patterns in good agreement with satellite-derived reference data. 
This research benchmarks the application of AutoML in GPP estimation and assesses its potential and limitations in quantifying global photosynthetic activity.",
author = "Max Gaber and Yanghui Kang and Guy Schurgers and Trevor Keenan",
note = "Publisher Copyright: {\textcopyright} Author(s) 2024.",
year = "2024",
doi = "10.5194/bg-21-2447-2024",
language = "English",
volume = "21",
pages = "2447--2472",
journal = "Biogeosciences",
issn = "1726-4170",
publisher = "Copernicus GmbH",
number = "10",

}

RIS

TY - JOUR

T1 - Using automated machine learning for the upscaling of gross primary productivity

AU - Gaber, Max

AU - Kang, Yanghui

AU - Schurgers, Guy

AU - Keenan, Trevor

N1 - Publisher Copyright: © Author(s) 2024.

PY - 2024

Y1 - 2024

N2 - Estimating gross primary productivity (GPP) over space and time is fundamental for understanding the response of the terrestrial biosphere to climate change. Eddy covariance flux towers provide in situ estimates of GPP at the ecosystem scale, but their sparse geographical distribution limits larger-scale inference. Machine learning (ML) techniques have been used to address this problem by extrapolating local GPP measurements over space using satellite remote sensing data. However, the accuracy of the regression model can be affected by uncertainties introduced by model selection, parameterization, and choice of explanatory features, among others. Recent advances in automated ML (AutoML) provide a novel automated way to select and synthesize different ML models. In this work, we explore the potential of AutoML by training three major AutoML frameworks on eddy covariance measurements of GPP at 243 globally distributed sites. We compared their ability to predict GPP and its spatial and temporal variability based on different sets of remote sensing explanatory variables. Explanatory variables from only Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data and photosynthetically active radiation explained over 70 % of the monthly variability in GPP, while satellite-derived proxies for canopy structure, photosynthetic activity, environmental stressors, and meteorological variables from reanalysis (ERA5-Land) further improved the frameworks’ predictive ability. We found that the AutoML framework Auto-sklearn consistently outperformed other AutoML frameworks as well as a classical random forest regressor in predicting GPP but with small performance differences, reaching an r² of up to 0.75. We deployed the best-performing framework to generate global wall-to-wall maps highlighting GPP patterns in good agreement with satellite-derived reference data. 
This research benchmarks the application of AutoML in GPP estimation and assesses its potential and limitations in quantifying global photosynthetic activity.

AB - Estimating gross primary productivity (GPP) over space and time is fundamental for understanding the response of the terrestrial biosphere to climate change. Eddy covariance flux towers provide in situ estimates of GPP at the ecosystem scale, but their sparse geographical distribution limits larger-scale inference. Machine learning (ML) techniques have been used to address this problem by extrapolating local GPP measurements over space using satellite remote sensing data. However, the accuracy of the regression model can be affected by uncertainties introduced by model selection, parameterization, and choice of explanatory features, among others. Recent advances in automated ML (AutoML) provide a novel automated way to select and synthesize different ML models. In this work, we explore the potential of AutoML by training three major AutoML frameworks on eddy covariance measurements of GPP at 243 globally distributed sites. We compared their ability to predict GPP and its spatial and temporal variability based on different sets of remote sensing explanatory variables. Explanatory variables from only Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data and photosynthetically active radiation explained over 70 % of the monthly variability in GPP, while satellite-derived proxies for canopy structure, photosynthetic activity, environmental stressors, and meteorological variables from reanalysis (ERA5-Land) further improved the frameworks’ predictive ability. We found that the AutoML framework Auto-sklearn consistently outperformed other AutoML frameworks as well as a classical random forest regressor in predicting GPP but with small performance differences, reaching an r² of up to 0.75. We deployed the best-performing framework to generate global wall-to-wall maps highlighting GPP patterns in good agreement with satellite-derived reference data. 
This research benchmarks the application of AutoML in GPP estimation and assesses its potential and limitations in quantifying global photosynthetic activity.

U2 - 10.5194/bg-21-2447-2024

DO - 10.5194/bg-21-2447-2024

M3 - Journal article

AN - SCOPUS:85194193364

VL - 21

SP - 2447

EP - 2472

JO - Biogeosciences

JF - Biogeosciences

SN - 1726-4170

IS - 10

ER -

ID: 395150451