The true front line for data quality – the place where the battle for data accuracy is often won or lost – is the point of original data collection.
This is why we have been focusing our SurveyCTO efforts on helping our users collect higher-quality data, and why we’re painfully aware of the many ways that we and our users could do better.
A recent workshop* hosted by Innovations for Poverty Action (IPA), in New Haven, reminded me of a particularly enduring and thorny challenge: how do we know when we’re doing better or worse? How do we know when we’re collecting higher- or lower-quality data? And how do we know which approaches to data collection and quality control produce the highest-quality data?
Recognizing the challenges
As one of the people helping lead the discussion on electronic data collection at the workshop, I suggested that the fundamental challenges to data quality arise from a few key facts:
- Respondents are human, so they have imperfect recall and motivation, and they can become tired or annoyed.
- Enumerators are human, so they can make mistakes, quickly fill in answers to questions they never asked when they’re sure they know the answer, or even just fake interviews.
- Research assistants are human, so they often fail to implement quality-control efforts in a timely manner, shy away from conflict rather than confront underperforming staff, and often operate with chronic shortages of time and prior experience.
Depending on the setting, you can add to that list: field managers are human, principal investigators are human, monitoring and evaluation officers are human, donors are human, etc., etc. The point is: the inevitable shortcomings or imperfections of individuals involved in data-collection efforts present certain systematic challenges.
The good news is that systems – and technology – can help. For example, using digital data-collection techniques can dramatically reduce the complexity of the enumerator’s job, by allowing the digital instrument to manage skip patterns, flag potential problems, and disallow many kinds of mistakes; making frequent follow-up calls or visits can help to reduce recall difficulties for respondents; and building technology that makes implementing best-practice quality control quicker and easier can reduce the burden on RAs, M&E officers, and even PIs.
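To make the first of those examples concrete, here is a minimal sketch of the kind of checking a digital instrument can do on the enumerator’s behalf. It is purely illustrative: the field names (`age`, `employed`), the age thresholds, and the `check_answers` function are hypothetical, and this is plain Python rather than SurveyCTO’s actual form syntax.

```python
def check_answers(answers):
    """Return a list of problems found in one interview's answers (empty if none).

    Hypothetical rules, for illustration only:
      - constraint: age must be a whole number between 0 and 120
      - skip pattern: the employment question applies only to respondents aged 15+
    """
    problems = []

    age = answers.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age: expected a whole number between 0 and 120")
        return problems  # the skip-pattern check below depends on a valid age

    asks_employment = age >= 15
    if asks_employment and "employed" not in answers:
        problems.append("employed: required for respondents aged 15 or older")
    if not asks_employment and "employed" in answers:
        problems.append("employed: should have been skipped for respondents under 15")

    return problems


# A complete adult interview passes; an answer to a skipped question is flagged.
print(check_answers({"age": 30, "employed": True}))
print(check_answers({"age": 9, "employed": True}))
```

On paper, every one of these rules is something the enumerator has to remember and apply mid-interview. In a digital instrument they are written down once and applied to every interview.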
If you face up to the challenges, there are solutions. And where solutions don’t yet exist, we can create them.
A call to action
We’ve had conversations with colleagues for the last few years about how important it is to generate rigorous evidence on what does or doesn’t work in terms of collecting quality field data. And while we’ve been itching to take the lead on some serious effort at evaluation, we’ve always been concerned that our objectivity would be called into question (since we’re a technology vendor with an arguably vested interest in certain results). Therefore, we’ve been waiting for others to take the lead.
We stand ready to support whoever is ready to step forward and collect rigorous data on this subject. This support can take the form of ideas, supporting technology, and recruitment. On the recruitment front, we would be happy to encourage the many hundreds of organizations and projects that collect data with SurveyCTO to participate in some larger-scale evaluation effort. Not all of our users would have the time or flexibility to participate, but many have a keen interest in data quality and at least some would likely be willing to help.
A few potential research questions include:
- When enumerators know that they might be secretly recorded, what effect does that have on their effort? On the quality of the data they collect?
- When enumerators know that they are being recorded, what effect does that have on their effort? On the quality of the data they collect?
- What is the effectiveness and cost-effectiveness of traditional in-person back-checking (re-visits) vs. random audio audits?
- What is the best way to leverage audio audits for improving data quality? Personally following up with enumerators, re-training and disciplining them as appropriate? Holding regular “learning sessions” in which audio recordings are publicly played and discussed, so that social pressure (and fear of public embarrassment) is brought to bear?
These are just a few examples of questions that we find interesting, based on what we or our users have seen in the field.
The difficulty of measuring (measurement) outcomes
When it comes to measuring the actual impact of different approaches to measurement or quality control, one core challenge is this: how do you identify the truth?
After all, if you don’t know the truth, how would you know if you were doing better or worse at measuring that truth?
One approach is to run a lab experiment where you test different approaches in a controlled environment, so that you know the truth up-front. It was precisely such a lab experiment, in fact, that converted me into a digital-data-collection believer in 2012 (discussed here in my first blog post). For certain types of comparison, this methodology can work well. But to know definitively what works in the field, you have to deal with this “what is the truth?” issue.
Another approach is to treat the number and severity of “caught mistakes” as the outcome of interest. So, for example, you might compare a random back-checking (re-visit) regime with a system of random audio auditing. If you randomize which interviews are assigned to which quality-control approach and you carefully track the number and nature of issues that are caught, then you can learn something about the relative effectiveness (and cost-effectiveness) of the different approaches.
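As a rough sketch of what that comparison could look like in practice, the Python below randomly assigns interviews to one of two quality-control arms and then summarizes issues caught and cost per issue caught for each arm. Everything specific here is an assumption made for illustration: the arm labels, the per-check costs, the issue counts, and the function names are all hypothetical, and a real study would need a proper design and statistical analysis.

```python
import random
from collections import defaultdict

# Hypothetical labels for the two quality-control approaches being compared.
ARMS = ("back_check", "audio_audit")


def assign_arms(interview_ids, seed=0):
    """Randomly assign each interview to one arm, keeping arm sizes balanced."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(interview_ids)
    rng.shuffle(ids)
    return {iid: ARMS[i % len(ARMS)] for i, iid in enumerate(ids)}


def summarize(assignment, issues_caught, cost_per_check):
    """Per arm: interviews checked, issues caught, issues per interview, cost per issue."""
    totals = defaultdict(lambda: {"interviews": 0, "issues": 0})
    for iid, arm in assignment.items():
        totals[arm]["interviews"] += 1
        totals[arm]["issues"] += issues_caught.get(iid, 0)

    summary = {}
    for arm, t in totals.items():
        total_cost = t["interviews"] * cost_per_check[arm]
        summary[arm] = {
            "interviews": t["interviews"],
            "issues": t["issues"],
            "issues_per_interview": t["issues"] / t["interviews"],
            # Undefined when an arm catches nothing, so report None in that case.
            "cost_per_issue": total_cost / t["issues"] if t["issues"] else None,
        }
    return summary


# Illustrative run with made-up numbers: 100 interviews, one issue caught in each
# of interviews 1-20, and invented per-check costs for each arm.
assignment = assign_arms(range(1, 101), seed=42)
issues = {iid: 1 for iid in range(1, 21)}
costs = {"back_check": 4.0, "audio_audit": 1.0}
for arm, stats in sorted(summarize(assignment, issues, costs).items()):
    print(arm, stats)
```

Because the assignment is random, differences between the arms in issues caught per interview can be attributed to the quality-control approach rather than to which interviews happened to be checked.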
Of course, you would have to worry a lot about the dynamic effects: if one approach is very good at identifying problems and the team responds to those problems, then the number of identified problems will naturally go down over time (perhaps rapidly). A very effective quality-control technique might look like it does a poor job identifying problems precisely because it is so effective at helping teams to eliminate problems. So, to get a clean measure of problems identified, it would probably be necessary to delay correcting (or even advertising) problems until after enough data has been collected.
Anecdotally, SurveyCTO users do report that audio audits, for example, help them to identify problems – sometimes systemic, hugely worrisome problems. But obviously, none of us are interested in the problems themselves. We’re interested in the quality of the data. So these dynamic effects that result from teams responding to problems, those are actually quite important. (Identifying the problems to start with, that’s kind of like the first stage.)
It’s a tricky business, and more thinking and more discussion are needed.
We stand ready to support serious efforts at evaluation. Who is ready to step forward and take the lead?
* The IPA/Yale workshop focused on field measurement, with the goal of identifying a potential research agenda around measurement methodologies and generating more systematic evidence about what works and what doesn’t. Participants included a broad range of researchers from IPA, the Abdul Latif Jameel Poverty Action Lab (J-PAL), the World Bank, the Center for Effective Global Action (CEGA), the University of Michigan’s Survey Research Center, and other affiliates (like me!), plus at least one donor representative (from the Gates Foundation). Read more about the workshop from Berk Ozler of the World Bank and Thoai Ngo and Jessie Pinchoff of IPA.