New Stata commands to get the most out of sensor data

We have good news for Stata users! We have developed Stata commands to help you make the most of sensor stream data.

Why collect sensor data?

Sensor data, which we introduced in SurveyCTO 2.50, allows users to capture non-personally identifiable information from device sensors that measure light, movement, and sound. Sensor data is powerful because it can be used to better understand the context in which the data collection took place and whether it was carried out according to plan.

In the case of a household study, for example, survey managers might expect surveys to be conducted indoors, with slight movements and moderate sound levels. As long as the form is open, sensor data can capture these levels and help indicate outliers.

Combined with SurveyCTO’s other quality control features, including automated quality checks, geodata, audio audits, and text audits, this tool allows managers to more powerfully and efficiently detect and correct any issues in their data, ensuring the highest quality possible. It’s also a critical part of our efforts to develop more machine-learning technologies to automate quality control.

How can it be analyzed?

There are two different types of sensor data available to SurveyCTO users:

  1. Sensor statistics: These are individual statistics that summarize sensor data, each providing one statistic per form submission.
  2. Sensor streams: These are streams of sensor data, each providing one .csv file per form submission.

Sensor statistics are the easier of the two to analyze. While the exact levels of light, movement, and sound in the raw data might not be meaningful on their own, they can be easily recorded and tracked with automated quality checks to flag outliers. A review and correction workflow can then be used to examine those flagged submissions more closely.

Sensor streams are more time-consuming to work with because of the volume of data collected. For every submission, a sensor stream records a stream of observations (potentially thousands) and stores it as an additional .csv file attached to the submission. That can easily amount to a lot of data! For this reason, sensor streams are mostly useful to those doing advanced analysis with powerful statistical tools, such as Stata. This documentation topic offers more guidance on the means and limitations of capturing these types of data.

How can Stata commands help?

When using our sensor statistic fields, you must set up thresholds for light, sound, and movement before the data is collected. But what if the thresholds aren’t accurate? Let’s say, for example, that your upper and lower bounds for light levels don’t properly differentiate between indoor and outdoor light, resulting in a statistic that incorrectly suggests that either all or none of the interviews took place indoors. You can always test your devices with different settings in varied conditions ahead of time, but wouldn’t it be even better if you could adjust the thresholds after the fact?

Thanks to a collaboration with longtime SurveyCTO user Kristoffer Bjärkefur, we developed Stata commands to give you that freedom! You can now use the scto package to set threshold levels at any point, even after your survey is completed. You can also segment collected data according to specific time periods and generate statistics for each segment for comparison. This capability can help you generate valuable insights from sensor data and create more effective processes to improve your data quality.
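To make the idea concrete, here is a minimal, language-neutral sketch in Python of what adjusting thresholds after the fact and comparing time segments can look like. It is not the scto package and does not reproduce its commands. The function names, the (seconds, value) layout of the stream, the sample readings, and the threshold values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def share_within(values, lower, upper):
    """Fraction of readings with lower <= value <= upper (0.0 if there are none)."""
    if not values:
        return 0.0
    return sum(lower <= v <= upper for v in values) / len(values)


def segment_means(stream, boundaries):
    """Mean reading for each time segment.

    stream: list of (seconds_since_start, value) pairs (hypothetical layout).
    boundaries: ascending cut points in seconds; segment i covers
    the half-open interval [boundaries[i], boundaries[i + 1]).
    Segments with no readings yield None.
    """
    means = []
    for start, end in zip(boundaries, boundaries[1:]):
        vals = [v for t, v in stream if start <= t < end]
        means.append(mean(vals) if vals else None)
    return means


# Hypothetical light readings from one submission: (seconds, light level)
stream = [(0, 120), (10, 150), (20, 900), (30, 1100), (40, 140), (50, 130)]
levels = [v for _, v in stream]

# Recompute the "within range" share under two candidate threshold pairs
print(share_within(levels, 0, 200))
print(share_within(levels, 0, 1000))

# Compare the first and second half-minute of the interview
print(segment_means(stream, [0, 30, 60]))
```

Because the full stream is kept, the same submission can be re-scored under as many candidate thresholds as you like, which is not possible when only a single precomputed statistic was stored.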

Refer to this support article for instructions on how to install the package and use the commands. You can also read more here about other things you might not know that SurveyCTO can do.

Please email us and engage with the Stata community to share your experience using these commands!