A global land cover training dataset from 1984 to 2020

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A global land cover training dataset from 1984 to 2020. / Stanimirova, Radost; Tarrio, Katelyn; Turlej, Konrad; McAvoy, Kristina; Stonebrook, Sophia; Hu, Kai Ting; Arévalo, Paulo; Bullock, Eric L.; Zhang, Yingtong; Woodcock, Curtis E.; Olofsson, Pontus; Zhu, Zhe; Barber, Christopher P.; Souza, Carlos M.; Chen, Shijuan; Wang, Jonathan A.; Mensah, Foster; Calderón-Loor, Marco; Hadjikakou, Michalis; Bryan, Brett A.; Graesser, Jordan; Beyene, Dereje L.; Mutasha, Brian; Siame, Sylvester; Siampale, Abel; Friedl, Mark A.

In: Scientific Data, Vol. 10, 879, 2023.


Harvard

Stanimirova, R, Tarrio, K, Turlej, K, McAvoy, K, Stonebrook, S, Hu, KT, Arévalo, P, Bullock, EL, Zhang, Y, Woodcock, CE, Olofsson, P, Zhu, Z, Barber, CP, Souza, CM, Chen, S, Wang, JA, Mensah, F, Calderón-Loor, M, Hadjikakou, M, Bryan, BA, Graesser, J, Beyene, DL, Mutasha, B, Siame, S, Siampale, A & Friedl, MA 2023, 'A global land cover training dataset from 1984 to 2020', Scientific Data, vol. 10, 879. https://doi.org/10.1038/s41597-023-02798-5

APA

Stanimirova, R., Tarrio, K., Turlej, K., McAvoy, K., Stonebrook, S., Hu, K. T., Arévalo, P., Bullock, E. L., Zhang, Y., Woodcock, C. E., Olofsson, P., Zhu, Z., Barber, C. P., Souza, C. M., Chen, S., Wang, J. A., Mensah, F., Calderón-Loor, M., Hadjikakou, M., ... Friedl, M. A. (2023). A global land cover training dataset from 1984 to 2020. Scientific Data, 10, Article 879. https://doi.org/10.1038/s41597-023-02798-5

Vancouver

Stanimirova R, Tarrio K, Turlej K, McAvoy K, Stonebrook S, Hu KT et al. A global land cover training dataset from 1984 to 2020. Scientific Data. 2023;10:879. https://doi.org/10.1038/s41597-023-02798-5

Author

Stanimirova, Radost ; Tarrio, Katelyn ; Turlej, Konrad ; McAvoy, Kristina ; Stonebrook, Sophia ; Hu, Kai Ting ; Arévalo, Paulo ; Bullock, Eric L. ; Zhang, Yingtong ; Woodcock, Curtis E. ; Olofsson, Pontus ; Zhu, Zhe ; Barber, Christopher P. ; Souza, Carlos M. ; Chen, Shijuan ; Wang, Jonathan A. ; Mensah, Foster ; Calderón-Loor, Marco ; Hadjikakou, Michalis ; Bryan, Brett A. ; Graesser, Jordan ; Beyene, Dereje L. ; Mutasha, Brian ; Siame, Sylvester ; Siampale, Abel ; Friedl, Mark A. / A global land cover training dataset from 1984 to 2020. In: Scientific Data. 2023 ; Vol. 10, 879.

Bibtex

@article{a10a65052c7d4cfaa1afd8e18c7282d4,
title = "A global land cover training dataset from 1984 to 2020",
abstract = "State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collection of high-quality training data, which is necessary for accurate land cover mapping, remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets to our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mis-labeled training units. Our training database is relevant for a wide array of studies such as land cover change, agriculture, forestry, hydrology, urban development, among many others.",
author = "Radost Stanimirova and Katelyn Tarrio and Konrad Turlej and Kristina McAvoy and Sophia Stonebrook and Hu, {Kai Ting} and Paulo Ar{\'e}valo and Bullock, {Eric L.} and Yingtong Zhang and Woodcock, {Curtis E.} and Pontus Olofsson and Zhe Zhu and Barber, {Christopher P.} and Souza, {Carlos M.} and Shijuan Chen and Wang, {Jonathan A.} and Foster Mensah and Marco Calder{\'o}n-Loor and Michalis Hadjikakou and Bryan, {Brett A.} and Jordan Graesser and Beyene, {Dereje L.} and Brian Mutasha and Sylvester Siame and Abel Siampale and Friedl, {Mark A.}",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2023",
doi = "10.1038/s41597-023-02798-5",
language = "English",
volume = "10",
pages = "879",
journal = "Scientific Data",
issn = "2052-4463",
publisher = "Nature Publishing Group",

}

RIS

TY - JOUR

T1 - A global land cover training dataset from 1984 to 2020

AU - Stanimirova, Radost

AU - Tarrio, Katelyn

AU - Turlej, Konrad

AU - McAvoy, Kristina

AU - Stonebrook, Sophia

AU - Hu, Kai Ting

AU - Arévalo, Paulo

AU - Bullock, Eric L.

AU - Zhang, Yingtong

AU - Woodcock, Curtis E.

AU - Olofsson, Pontus

AU - Zhu, Zhe

AU - Barber, Christopher P.

AU - Souza, Carlos M.

AU - Chen, Shijuan

AU - Wang, Jonathan A.

AU - Mensah, Foster

AU - Calderón-Loor, Marco

AU - Hadjikakou, Michalis

AU - Bryan, Brett A.

AU - Graesser, Jordan

AU - Beyene, Dereje L.

AU - Mutasha, Brian

AU - Siame, Sylvester

AU - Siampale, Abel

AU - Friedl, Mark A.

N1 - Publisher Copyright: © 2023, The Author(s).

PY - 2023

Y1 - 2023

N2 - State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collection of high-quality training data, which is necessary for accurate land cover mapping, remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets to our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mis-labeled training units. Our training database is relevant for a wide array of studies such as land cover change, agriculture, forestry, hydrology, urban development, among many others.

AB - State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collection of high-quality training data, which is necessary for accurate land cover mapping, remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets to our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mis-labeled training units. Our training database is relevant for a wide array of studies such as land cover change, agriculture, forestry, hydrology, urban development, among many others.

U2 - 10.1038/s41597-023-02798-5

DO - 10.1038/s41597-023-02798-5

M3 - Journal article

C2 - 38062043

AN - SCOPUS:85178953466

VL - 10

JO - Scientific Data

JF - Scientific Data

SN - 2052-4463

M1 - 879

ER -
