Automatic classification of boulders: Processes and concerns regarding machine learning workflows

Publication: Book/anthology/thesis/report, Ph.D. thesis, Research

  • Signe Irene Kirstejn S Hansen
Coastal areas are vulnerable and easily affected by human activity and climate change. To protect these areas, benthic habitats must be mapped so that areas in need of protection can be identified. However, these shallow waters are difficult to access with vessel-borne acoustic systems. Shallow water requires smaller vessels, which makes data collection both expensive and time-consuming. Topo-bathymetric LiDAR measurements make it possible to cover large areas quickly, which makes it feasible to repeat the measurements for an area and thereby investigate how the area changes over time. Taking advantage of these new possibilities produces large amounts of data at frequent intervals, all of which need to be analysed. To compare climate-related seabed changes across different coastal areas of the world, the data must be comparable. The large amount of data, and the requirement of comparability, call for automated and reproducible data processing and analysis methods.

Stone reefs are among the habitats that need protection. In this thesis, topo-bathymetric LiDAR data are used to create machine learning workflows for boulder detection. The overall aim is to improve the understanding of the processes and considerations involved when using machine learning to map boulders in coastal environments from point cloud data.

Random forest machine learning was carried out in Matlab in combination with feature selection algorithms. Features derived from intensity and height were used as input data. Training and test sets were created from orthophoto analysis and ground-truthing. The study resulted in a workflow that recommends separate ground-truthed training and test sets that include seabed diversity representative of the area in which the machine learning model should work. Boulder detection leads to imbalanced data sets, which can be handled by a combination of assigning a weight score to the minority class and using only a subsample of the majority class. Knowledge about habitats, seabed structures and the transition zones between them can lead to a better choice of input data. Knowledge about the special characteristics of different habitats can be used to distinguish between them. Establishing a library is suggested, containing knowledge about the characteristics of different habitats and the relevant discriminators, in combination with small ground-truthed data sets. Discriminating between boulders and vegetation is very difficult. The work on this thesis led to a conceptual model of the interrelations between boulder distribution and geomorphology in Rødsand lagoon.
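The imbalance handling described in the abstract (weighting the minority class and subsampling the majority class) can be illustrated with a minimal sketch. This is not the thesis workflow, which was carried out in Matlab; it is a standard-library Python illustration under stated assumptions. The function names, the label encoding (1 = boulder, 0 = other seabed), the 5:1 subsampling ratio and the inverse-frequency weighting scheme are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def subsample_majority(labels, majority_label, ratio, seed=0):
    """Keep every minority sample and at most ratio * n_minority
    randomly chosen majority samples; return the kept indices, sorted."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rarer classes receive larger weights (assumed scheme)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical labels: 10 boulder points (1) among 990 other points (0).
labels = [1] * 10 + [0] * 990

# Step 1: reduce the majority class to five times the minority class.
idx = subsample_majority(labels, majority_label=0, ratio=5)
sub_labels = [labels[i] for i in idx]

# Step 2: compute class weights on the subsample for use in training.
weights = inverse_frequency_weights(sub_labels)
print(Counter(sub_labels), weights)
```

The resulting weights would then be passed to whichever classifier is used. Combining both steps means that neither has to compensate for the full imbalance on its own.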
Original language: English
Publisher: Department of Geosciences and Natural Resource Management, Faculty of Science, University of Copenhagen
Number of pages: 140
Status: Published - 2023

ID: 347746158