Abstract
Understanding the complexity of ever-growing collections of time-series data is challenging. Data mining approaches have been developed to process and analyse such data effectively and to extract valuable insights and knowledge from it. Dimension reduction (DR) is a commonly employed method for this purpose. Selecting appropriate hyperparameter values for DR and measuring the quality of the resulting visualisation are critical to ensuring that the visualisation is useful. To enhance DR further, we propose integrating it with pseudo labels generated by clustering techniques. This paper presents DataTalk Visualisation (DataTalk-V), an algorithm for visualising time-series data. DataTalk-V automatically clusters high-dimensional data and selects the hyperparameters of the DR method, reducing the data to two dimensions. DataTalk-V is built on IoTtalk, an IoT application development platform, and uses a cost function within Bayesian optimisation to optimise the DR hyperparameters effectively. We demonstrate that the two-dimensional data produced by DataTalk-V not only facilitates visualisation but also improves the prediction accuracy of the k-nearest neighbours (k-NN) algorithm. We further apply the DR model generated by DataTalk-V to analyse the sensitivity of features from soil samples, and show that it successfully predicts the correlation of these features with their respective machine learning models.
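The abstract does not give the exact cost function that DataTalk-V minimises. The sketch below is one plausible reading, offered only as an illustration: it assumes the cost is one minus the leave-one-out k-NN accuracy of the two-dimensional embedding, measured against the clustering pseudo labels. All names (`knn_loo_accuracy`, `dr_cost`, `select_hyperparameter`) are hypothetical. The exhaustive `min` over candidate embeddings is a simple stand-in for Bayesian optimisation, which would instead propose which hyperparameter values to try next.

```python
import math
from collections import Counter


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy of 2-D points against (pseudo) labels."""
    correct = 0
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's label.
        neighbours = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


def dr_cost(embedding, pseudo_labels, k=3):
    """Hypothetical cost (lower is better): 1 - k-NN accuracy on the embedding."""
    return 1.0 - knn_loo_accuracy(embedding, pseudo_labels, k)


def select_hyperparameter(candidates, pseudo_labels, k=3):
    """Return the hyperparameter whose embedding has the lowest cost.

    `candidates` maps a hyperparameter value to the 2-D embedding it produced.
    Exhaustive search here is a stand-in for Bayesian optimisation.
    """
    return min(candidates, key=lambda h: dr_cost(candidates[h], pseudo_labels, k))


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # pseudo labels from a clustering step
    good = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    bad = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    print(dr_cost(good, labels))  # 0.0
    print(select_hyperparameter({30: bad, 5: good}, labels))  # 5
```

In this toy example, the embedding that keeps the pseudo-clusters apart gets zero cost and is selected. An embedding that mixes the clusters is penalised because its neighbours vote for the wrong pseudo label.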
Original language | American English |
---|---|
Pages (from - to) | 63-73 |
Journal | International Journal of Sensor Networks |
Volume | 44 |
Issue number | 2 |
DOIs | |
Publication status | Published - Feb 2024 |