Gap filling crowdsourced air temperature data in cities using data-driven approaches

Miao He, Zhiwen Luo*, Xiaoxiong Xie, Peng Wang, Haichao Wang, Gabriela Zapata-Lancaster

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review


Abstract

Crowdsourced temperature data from citizen weather stations (CWS) in urban areas offer valuable insights into intra-urban temperature distribution, but these records often contain a large number of missing values. Existing gap-filling methods, which are typically effective for random gaps and low missing rates, are inadequate for the continuous gaps and high missing rates common in CWS recordings. This study introduces a novel data-driven approach that fills these gaps by modelling the relationships between CWS data and official weather station (OWS) records during periods when both are available. We evaluate various feature sets and data-driven algorithms, including Multiple Linear Regression (MLR), Random Forest (RF), and Multilayer Perceptron (MLP), using a complete CWS temperature dataset from July 2018 in London. The MLP-based models, which use features such as preceding and subsequent air temperature along with past solar radiation, outperform the other models across the missing-data conditions tested. In the most challenging case, with a missing rate of 70–80% and continuous gaps, the MLP model achieves a Mean Absolute Error (MAE) of 0.59 °C, a Root Mean Squared Error (RMSE) of 0.73 °C, and a coefficient of determination (R²) of 0.94. The robustness of the MLP algorithm is further validated on multiple complete CWS datasets from different areas of London. This study offers effective strategies for handling common missing-data conditions in CWS datasets and serves as a reference for future machine learning applications in urban climatology.
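The approach summarised above trains a model wherever CWS and OWS records overlap, predicts CWS temperature inside the gaps, and scores the filled values with MAE, RMSE and R². The following plain-Python sketch illustrates only that train-where-observed, predict-where-missing pattern, under several assumptions that are not taken from the paper: it uses MLR (one of the three algorithms compared) instead of the best-performing MLP; it interprets "preceding and subsequent air temperature" as the OWS temperature one time step before and after the target step, and "past solar radiation" as solar radiation one step earlier; and it computes R² as 1 − SS_res/SS_tot, which may differ from the paper's definition. The function names (`fill_gaps`, `fit_mlr`, etc.) and the feature layout are hypothetical, not the authors' implementation.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(features, target):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, f):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], f))


def fill_gaps(cws, ows, solar):
    """Fill None entries of cws with an MLR trained where cws is observed.

    The first and last time steps are neither used nor filled, because the
    hypothetical features need one step of OWS data on each side.
    """
    def feats(t):
        # Hypothetical features: OWS temperature at t-1, t, t+1 and
        # solar radiation at t-1.
        return (ows[t - 1], ows[t], ows[t + 1], solar[t - 1])

    inner = range(1, len(cws) - 1)
    train = [t for t in inner if cws[t] is not None]
    beta = fit_mlr([feats(t) for t in train], [cws[t] for t in train])
    filled = list(cws)
    for t in inner:
        if filled[t] is None:
            filled[t] = predict(beta, feats(t))
    return filled


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Holding out a continuous block of true values, filling it, and passing the held-out and filled values to `mae`, `rmse` and `r2` mirrors the kind of evaluation the abstract reports, though on a toy scale. Replacing `fit_mlr` with a Random Forest or MLP regressor (for example scikit-learn's `RandomForestRegressor` or `MLPRegressor`) would keep the same overall pattern.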
Original language: English
Article number: 112593
Journal: Building and Environment
Volume: 271
Early online date: 26 Jan 2025
Publication status: E-pub ahead of print, 26 Jan 2025
