Low-Cost Air Quality Sensors: Improving Data Quality via Calibration & Standardisation
Air quality data collected by citizen scientists is often criticised for lacking appropriate levels of quality and accuracy, which prevents its uptake by decision makers outside the immediate citizen science community, such as public authorities, environmental agencies and business managers. To address this issue, COMPAIR is implementing a distant calibration approach that uses cloud services and data triangulation to make citizen science data policy-ready.
Compact low-cost air quality sensors used in citizen science projects have simple measurement principles, and as such suffer from sensitivity to environmental influences (e.g. temperature, humidity, interfering pollutants), as well as drift and sensitivity changes during their deployment.
Default calibration methods such as factory calibration involve measuring the sensor response to known pollutant concentrations. The response is measured at two concentrations (e.g. 0 and 100 ppm), and the sensor is calibrated with a linear response curve that relates signal to pollutant concentration. This method can estimate pollutant concentrations accurately under conditions similar to those of the lab tests, but accuracy decreases the longer the sensor is deployed outdoors under varying environmental influences.
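To make the two-point approach concrete, here is a minimal sketch of a linear calibration through two lab points. The function names and the voltage readings are hypothetical and chosen only for illustration; they do not describe any particular sensor.

```python
def two_point_calibration(signal_low, conc_low, signal_high, conc_high):
    """Return (slope, intercept) of the straight line through two calibration points."""
    if signal_high == signal_low:
        raise ValueError("the two calibration signals must differ")
    slope = (conc_high - conc_low) / (signal_high - signal_low)
    intercept = conc_low - slope * signal_low
    return slope, intercept


def signal_to_concentration(signal, slope, intercept):
    """Convert a raw sensor signal to a concentration using the fitted line."""
    return slope * signal + intercept


# Hypothetical lab readings: 0.25 V at 0 ppm and 1.25 V at 100 ppm.
slope, intercept = two_point_calibration(0.25, 0.0, 1.25, 100.0)
estimate_ppm = signal_to_concentration(0.75, slope, intercept)
print(f"slope={slope:.1f} ppm/V, intercept={intercept:.1f} ppm, estimate={estimate_ppm:.1f} ppm")
```

With these made-up readings, a 0.75 V signal maps to 50 ppm. The line has no temperature or humidity term, which is why its accuracy degrades once field conditions depart from the lab.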
Another calibration method involves co-locating all low-cost sensors to be deployed at a reference station with high-end measurement equipment, such as a beta attenuation mass monitor for particulate matter or a chemiluminescence monitor for NO2 gas. After a co-location period of typically a few weeks, the sensors are placed at the location of interest. Ideally, after the measurements at that location, the sensors are returned to the reference station for a second co-location period. The data gathered during the co-location periods from the low-cost sensors (LCS) and the reference sensors are used to train a model, usually a multilinear model with raw sensor data, temperature, humidity and cross-interfering pollutants as variables.
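A multilinear co-location model of this kind can be sketched as an ordinary least-squares fit. The example below is illustrative only: the data are synthetic, the coefficient values are invented, and the function names are hypothetical. It uses raw signal, temperature and relative humidity as predictors and omits cross-interfering pollutants for brevity.

```python
import numpy as np


def fit_multilinear(features, reference):
    """Least-squares fit of reference = b0 + b1*x1 + b2*x2 + ...; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, reference, rcond=None)
    return coef


def apply_multilinear(coef, features):
    """Apply fitted coefficients to new low-cost sensor data."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# Synthetic co-location data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 500
raw = rng.uniform(5, 60, n)      # raw low-cost sensor reading
temp = rng.uniform(-5, 30, n)    # temperature in degrees Celsius
rh = rng.uniform(30, 95, n)      # relative humidity in percent
features = np.column_stack([raw, temp, rh])
true_coef = np.array([2.0, 0.8, -0.1, 0.05])
reference = true_coef[0] + features @ true_coef[1:] + rng.normal(0, 0.5, n)

coef = fit_multilinear(features, reference)
calibrated = apply_multilinear(coef, features)
rmse = float(np.sqrt(np.mean((calibrated - reference) ** 2)))
print("fitted coefficients:", np.round(coef, 3), "RMSE:", round(rmse, 3))
```

In practice the model would be evaluated on data held out from training, for example the second co-location period.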
In the case of particulate matter, an additional correction can be performed to account for droplet formation around the particle of interest, which, according to Köhler theory, leads to overestimation of particle size. While co-location calibration yields better concentration estimates than factory calibration in outdoor conditions, a disadvantage is that moving the sensors can be labour-intensive, and because sensor lifetime is limited, co-location periods reduce the time sensors can be used in the region of interest. Additionally, seasonal changes can significantly affect the calibration validity (Ratingen et al. 2021), so an algorithm trained in one set of environmental conditions has varying performance when applied to sensor data from a different set of conditions.
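The humidity correction for particulate matter mentioned above can be illustrated with a simple growth-factor formula derived from κ-Köhler theory. This is a sketch under stated assumptions, not necessarily the correction used in COMPAIR: the hygroscopicity parameter kappa, the particle density and the relative humidity cap are illustrative placeholders, and the function names are hypothetical.

```python
def humidity_correction_factor(rh_percent, kappa=0.4, particle_density=1.65):
    """Growth-related correction factor C = 1 + (kappa / density) / (1 / a_w - 1).

    kappa and particle_density (g/cm3) are assumed, illustrative values.
    Relative humidity is clamped to 0..99 % to avoid division by zero.
    """
    water_activity = min(max(rh_percent, 0.0), 99.0) / 100.0
    if water_activity == 0.0:
        return 1.0
    return 1.0 + (kappa / particle_density) / (1.0 / water_activity - 1.0)


def correct_pm(pm_raw, rh_percent, kappa=0.4, particle_density=1.65):
    """Divide the raw PM reading by the correction factor."""
    return pm_raw / humidity_correction_factor(rh_percent, kappa, particle_density)


for rh in (30, 50, 70, 90):
    print(rh, round(correct_pm(40.0, rh), 2))
```

The factor grows quickly at high humidity, so the same raw reading is corrected downwards far more at 90 % relative humidity than at 50 %.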
In COMPAIR, we plan to use a novel sensor calibration method called distant calibration. First, a few co-located sensors and several sensors in the region of interest are deployed simultaneously. This ensures that they go through a similar process of ageing in the field. The method builds on the observation that pollutants generated mostly by human activity, such as particulate matter and NO2, have lower and relatively stable concentrations at night, when they are also more homogeneously distributed across an area than during the daytime.
The distant calibration algorithm, previously validated using several types of particulate matter and NO2 sensors (Hofman et al. 2020; Hofman et al. 2021), filters out out-of-range sensor data (“sanity checks”) and uses a 34-day moving window of nighttime LCS data, combined with ground truth sensor data, to train a multilinear model. Ground truth sensors are reference sensors located at a distance from the field sensors. They are selected if they lie within a set radius (typically 15 km, extended if there is an insufficient number of reference stations in close proximity) and correlate highly with the LCS measurements.
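The selection of ground truth stations described above can be sketched as a distance and correlation filter. Only the 15 km default radius comes from the text. The correlation threshold, the radius-doubling rule, the maximum radius, the coordinates and all names below are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_ground_truth(sensor, stations, radius_km=15.0, min_corr=0.7,
                        min_stations=1, max_radius_km=60.0):
    """Return (radius used, station ids), doubling the radius if too few stations qualify."""
    radius = radius_km
    while True:
        picked = []
        for st in stations:
            dist = haversine_km(sensor["lat"], sensor["lon"], st["lat"], st["lon"])
            corr = pearson(sensor["night"], st["night"])
            if dist <= radius and corr >= min_corr:
                picked.append((corr, st["id"]))
        if len(picked) >= min_stations or radius >= max_radius_km:
            ordered = sorted(picked, key=lambda c: (-c[0], c[1]))
            return radius, [sid for _, sid in ordered]
        radius = min(radius * 2, max_radius_km)


# Hypothetical nighttime series for one low-cost sensor and three reference stations.
sensor = {"lat": 51.2194, "lon": 4.4025, "night": [10, 12, 9, 14, 11, 13]}
stations = [
    {"id": "A", "lat": 51.2300, "lon": 4.4200, "night": [11, 13, 10, 15, 12, 14]},  # near, correlated
    {"id": "B", "lat": 50.8503, "lon": 4.3517, "night": [11, 13, 10, 15, 12, 14]},  # about 41 km away
    {"id": "C", "lat": 51.2100, "lon": 4.4100, "night": [12, 9, 14, 10, 13, 11]},   # near, uncorrelated
]
print(select_ground_truth(sensor, stations))
print(select_ground_truth(sensor, stations, min_stations=2))
```

With the default settings only station A qualifies. Requiring two stations forces the radius to grow until B is included, while C is rejected at any radius because its series does not track the sensor.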
Then, the parameters extracted from the trained model are applied to calibrate the LCS data of the next day in real time in the cloud environment. Because the calibration takes place in real time, dynamic changes in the microenvironment around the sensors have minimal effect on calibration performance. Calibration performance is evaluated using the sensors co-located at reference stations, which allows drift and ageing (loss of sensitivity) to be taken into account. When calibrating the co-located sensors, the reference sensor next to the LCS is excluded as a candidate ground truth sensor. A schematic representation of the calibration algorithm workflow is presented in the figure below.
Schematic representation of the calibration algorithm workflow
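The training and next-day application steps of this workflow can be sketched as follows. This is a simplified illustration on synthetic data, not the COMPAIR implementation: only the 34-day window comes from the text, while the nighttime hours, the valid range, the predictor set and all names are assumptions. The array ground_truth stands in for the series from the selected distant reference station.

```python
import numpy as np

WINDOW_DAYS = 34                  # moving-window length given in the text
NIGHT_HOURS = (0, 1, 2, 3, 4)     # assumption: hours treated as nighttime
VALID_RANGE = (0.0, 500.0)        # assumption: sanity-check bounds for raw data


def design_matrix(raw, temp, rh):
    """Intercept column plus raw signal, temperature and relative humidity."""
    return np.column_stack([np.ones(len(raw)), raw, temp, rh])


def calibrate_next_day(day, hour, raw, temp, rh, ground_truth, target_day):
    """Train on in-range nighttime data from the previous window, then calibrate target_day."""
    in_window = (day >= target_day - WINDOW_DAYS) & (day < target_day)
    at_night = np.isin(hour, NIGHT_HOURS)
    sane = (raw >= VALID_RANGE[0]) & (raw <= VALID_RANGE[1])
    train = in_window & at_night & sane
    X = design_matrix(raw, temp, rh)
    coef, *_ = np.linalg.lstsq(X[train], ground_truth[train], rcond=None)
    return coef, X[day == target_day] @ coef


# Synthetic hourly data for 40 days (illustrative values only).
rng = np.random.default_rng(1)
n_days = 40
day = np.repeat(np.arange(n_days), 24)
hour = np.tile(np.arange(24), n_days)
raw = rng.uniform(5.0, 80.0, day.size)
temp = rng.uniform(0.0, 30.0, day.size)
rh = rng.uniform(30.0, 95.0, day.size)
true_coef = np.array([1.5, 0.9, -0.05, 0.02])
ground_truth = design_matrix(raw, temp, rh) @ true_coef
raw[10 * 24 + 2] = 9999.0         # out-of-range spike that the sanity check removes

coef, calibrated = calibrate_next_day(day, hour, raw, temp, rh, ground_truth, target_day=39)
print(np.round(coef, 3), calibrated.shape)
```

Repeating the call for each new day moves the window forward, so the coefficients follow gradual changes such as sensor ageing.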
This workflow and algorithm were developed and validated in projects carried out in Belgium and the Netherlands (Hofman et al. 2020; Hofman et al. 2021). Within COMPAIR, sensors will be deployed in four distinct pilot regions with varying microclimates (warm and dry in Sofia/Plovdiv and Athens, compared to mild Berlin and Flanders), and with varying availability and sparser distribution of reference sensor data. We therefore plan to validate the distant calibration approach, evaluate its added value and improve it for COMPAIR use cases. Different training approaches, including machine learning, will also be evaluated.
A common pitfall in research is data management: data that is saved locally, insufficiently labelled and missing context information is hard to re-use. Data standardisation brings sensor data into a common, consistent format that enables deeper user interaction, processing and analysis. In COMPAIR, we intend to use the non-proprietary Open Geospatial Consortium (OGC) SensorThings API data model for air quality data. Below is a diagram that shows how different components of sensor data and metadata are included in the SensorThings API data model.
Components of sensor data and metadata
In the SensorThings API data model, a Thing (such as an IoT device) is linked to a Datastream, which is a collection of Observations grouped by the same ObservedProperty and Sensor. A Thing can have a Location, HistoricalLocations, and multiple Sensors and Datastreams. An Observation is an act that produces a result whose value is an estimate of a property of the target of interest (FeatureOfInterest).
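As an illustration of how these entities nest, the sketch below builds a Thing with one Location and one Datastream as a Python dictionary and serialises it to JSON. The entity and property names follow the SensorThings API data model. The device name, coordinates, definition and metadata URLs, timestamp and measured value are hypothetical placeholders.

```python
import json

# Hypothetical Thing with nested Location, Datastream, Sensor, ObservedProperty and Observation.
thing = {
    "name": "compair-lcs-001",
    "description": "Low-cost air quality sensor node (example)",
    "Locations": [{
        "name": "Deployment site",
        "description": "Where the node is installed",
        "encodingType": "application/geo+json",
        "location": {"type": "Point", "coordinates": [4.4025, 51.2194]},  # GeoJSON order: lon, lat
    }],
    "Datastreams": [{
        "name": "NO2 concentration",
        "description": "Calibrated NO2 measurements from the node",
        "observationType": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
        "unitOfMeasurement": {
            "name": "microgram per cubic metre",
            "symbol": "ug/m3",
            "definition": "https://example.org/units/ug-per-m3",  # placeholder URI
        },
        "ObservedProperty": {
            "name": "NO2",
            "description": "Nitrogen dioxide mass concentration",
            "definition": "https://example.org/properties/no2",   # placeholder URI
        },
        "Sensor": {
            "name": "Example electrochemical NO2 sensor",
            "description": "Sensor type used in the node",
            "encodingType": "application/pdf",
            "metadata": "https://example.org/datasheets/no2.pdf",  # placeholder URL
        },
        "Observations": [
            {"phenomenonTime": "2022-06-01T02:00:00Z", "result": 23.4},
        ],
    }],
}

payload = json.dumps(thing, indent=2)
print(payload)
```

Keeping the unit, observed property and sensor type alongside each Datastream is what lets a consumer interpret a result without out-of-band documentation.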
As the diagram shows, the data format allows for an explicit specification of sensor location, unit of measurement, measured property, sensor type, observation and timestamp. Air quality data to be collected by COMPAIR will be converted to the OGC SensorThings API format within the imec calibration platform. Using the data model will allow data to be shared easily together with its metadata, and to be queried by different attributes as needed, regardless of sensor type.
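Querying by attribute could look like the sketch below, which builds a SensorThings API request URL using the standard $filter and $expand query options. The base URL is a hypothetical placeholder rather than a COMPAIR endpoint, and the function name is invented for illustration.

```python
from urllib.parse import quote

BASE_URL = "https://example.org/sta/v1.1"  # hypothetical service root


def latest_observations_url(property_name, top=10):
    """URL for Datastreams of one observed property, with location and latest observations."""
    filter_expr = f"ObservedProperty/name eq '{property_name}'"
    expand_expr = f"Thing/Locations,Observations($orderby=phenomenonTime desc;$top={top})"
    safe = "/,()=;$"  # keep query-option syntax readable, percent-encode spaces and quotes
    return (
        f"{BASE_URL}/Datastreams"
        f"?$filter={quote(filter_expr, safe=safe)}"
        f"&$expand={quote(expand_expr, safe=safe)}"
    )


url = latest_observations_url("NO2", top=5)
print(url)
```

The same request works for any sensor type that publishes an NO2 Datastream, because the filter addresses the observed property rather than a device-specific field.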
As part of the COMPAIR project, imec is leading the development of air quality sensor calibration algorithms based on a combination of low-cost sensor data, environmental data and data from reference-grade sensors located at a distance from the low-cost sensors. Data from the air quality sensors will be processed in the cloud, with the resulting information made available through different COMPAIR visualisation platforms. In terms of data standardisation, we will implement the OGC SensorThings API, which is a non-proprietary, platform-independent international standard for interconnecting IoT devices, data and applications over the web. All these measures form an integrated calibration and standardisation strategy, which we are going to refine over multiple testing and data collection rounds to ensure that gathered information on air pollution is accurate enough for use by decision makers in policy, industry, academia and research.
Hofman, J., Nikolaou, M. E., Huu Do, T., Qin, X., Rodrigo, E., Philips, W., et al. (2020). Mapping Air Quality in IoT Cities: Cloud Calibration and Air Quality Inference of Sensor Data. IEEE SENSORS 2020 Conference Proceedings. IEEE.
Hofman, J., Nikolaou, M., Stroobants, C., Weijs, S., Shantharam, S. P., & La Manna, V. P. (2021). Distant calibration of low-cost PM and NO2 sensors; evidence from multiple sensor testbeds. Env Sc & Tech.
Ratingen, S. van, Vonk, J., Blokhuis, C., Wesseling, J., Tielemans, E., & Weijers, E. (2021). Seasonal influence on the performance of low-cost NO2 sensor calibrations. Sensors, 21(23), 7919. https://doi.org/10.3390/s21237919