On SurveyCTO: An open letter

It started as a modest experiment in electronic data collection embedded within a much more ambitious experiment involving microfinance in rural India. Today, SurveyCTO has grown into a mature technology platform used to collect data in hundreds of projects around the world. This inaugural blog post seems like a good time to look briefly back, and then forward to where SurveyCTO is headed.

Since our public launch in April 2013, we’ve tried to keep a laser-like focus on the product and its users. Though we’ve never done much to promote the product ourselves, we’ve been fortunate to grow very quickly. As it turns out, if you focus on keeping your users happy, they promote your product for you. One user turns into three, three turn into nine, and so on. It adds up, and for that we’re extremely grateful to our users.

Along these lines, I’d like to single out EGPAF, who were extremely generous in how they featured us in one of their recent blog posts, and JSI, who did the same in one of their earlier blog posts. We need to get better about promoting SurveyCTO ourselves, but until we do, we are lucky to have such great users out there spreading the word.

Why it’s all about data quality

Thanjavur, Tamil Nadu, January 2012. The genesis of SurveyCTO can be traced back to a first mini-experiment in data quality. We were seeing lots of incorrect household ID numbers in data that we were collecting; each of these required a lengthy investigation and cost us an inordinate amount of time. And that was after already enduring repeated, madness-inducing delays and cost overruns to develop and debug sophisticated data-entry software, complete with double-entry and supervisor resolution of all entry conflicts. That pain, delay, and expense were meant to keep errors to a minimum, but we were still seeing many errors. What was going on?

Being development economists of an empirical mindset, we did the natural thing: we conducted an experiment.

We generated a bunch of data – including names, ages, and ID numbers – for randomized interview scripts. We then took a bunch of our enumerators, sat them down, and had them interview each other, one responding from a randomized script while the other noted the responses. It was all in the office, face-to-face, so all very simple and controlled. A random half used our standard paper-and-pencil method of recording responses, the other half used a no-frills electronic survey form we developed on Open Data Kit (ODK). This was our first experiment with electronic data collection.

What we found shocked us. Simple ID numbers had a 12% error rate when data-entered from the hand-written forms: ones looked like sevens, and so on. The electronically-entered numbers – which I believe were double-entered on the device with validation to ensure that both were the same – were essentially error-free.
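For readers who have not seen double-entry validation before, here is a minimal sketch of the idea in Python. It is not the ODK form we actually used; the function name `confirm_double_entry` and the whitespace-trimming rule are assumptions made purely for illustration. The point is only that a value is accepted when two independent entries agree, and rejected otherwise so that it can be re-entered on the spot.

```python
def confirm_double_entry(first: str, second: str) -> str:
    """Return the ID if both entries agree; raise ValueError otherwise.

    Hypothetical helper for illustration only; not part of ODK or SurveyCTO.
    """
    # Assumption: stray leading/trailing whitespace should not count as a mismatch.
    a, b = first.strip(), second.strip()
    if not a:
        raise ValueError("ID is empty")
    if a != b:
        # A mismatch means at least one entry was mistyped, so force re-entry.
        raise ValueError(f"entries differ: {a!r} vs {b!r}")
    return a


if __name__ == "__main__":
    print(confirm_double_entry("1047 ", "1047"))  # matching entries are accepted
    try:
        confirm_double_entry("1047", "7047")
    except ValueError as err:
        print("rejected:", err)
```

A check like this catches a mistyped digit at the moment of entry, while the respondent is still in front of the enumerator, rather than weeks later during data cleaning.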

Forget about cutting our massive printing costs, allowing real-time monitoring of incoming data, dramatically reducing time-to-data, or any of the other advantages of electronic data collection. This paper-based error rate in ID numbers alone sold us on going electronic.

Since that first experiment I have become far less involved with the day-to-day of that particular project, but I know that they quickly introduced barcode scanning to further improve collection of basic ID numbers, and that they’ve used digital photos and other methods to increase data quality. They’ve even experimented with changing the auto-correct dictionaries on their devices to make entering of Tamil names easier.

Meanwhile, with SurveyCTO, we have pushed to make the benefits of ODK less expensive and more accessible to a wider audience of less-technical users – and the quest to keep raising the bar on data quality has continued ever since. For example:

a colleague at Poverty Action Lab had the brilliant idea to add randomized audio recordings for auditing survey administration, and that has been something of a game-changer for quality-control efforts;
we’ve taken the kinds of high-frequency quality checks that were standard in industry-leading Poverty Action Lab or Innovations for Poverty Action projects, and made them more accessible for those who don’t want to write the same code for every project (or don’t want to write code at all!);
we’ve made it easy to track incoming data in MS Excel, Google Spreadsheets, or MS Word; and
we’ve tried to get steadily better about documenting and encouraging best practices.

Still, the economist in me understands that if you want to see more of something, it’s important to lower the cost of that something. In the case of quality control, we can and should do more to make it easier and cheaper to collect data of the highest possible quality. And so we’re working hard to do just that.

We’re audacious enough to believe not only that we can meaningfully improve the quality of data being used in the world – but that we’re already doing it.

Where we’re headed: the product

In addition to making industry-leading quality-control practices cheaper and easier, we recognize that we need to make a lot of other things easier as well. This has become particularly clear as use of SurveyCTO has expanded from primarily long, complex research surveys to also include shorter, simpler M&E instruments. It can take a bit longer for a new user to get to know SurveyCTO – and, importantly, its spreadsheet-based form definitions – and so we see that some users new to electronic data collection prefer a simpler, more online, more drag-and-drop product interface for form design and management.

One option would be to say, “sure, use those other systems for your simple surveys, then come to us when your needs grow,” and that’s effectively what we’ve tended to say. But this is where our social mission kicks in, because many organizations end up poorly served by choosing those simpler-seeming alternatives:

As their needs grow, they often struggle to live within the constraints of their chosen system because, as it always seems, changing systems would just be too difficult right now.
They often end up spending a lot of money on these other solutions, money that would be better spent on other aspects of their organizational missions.
They often live with bugs, poor-to-nonexistent technical support, and far less in the way of quality-control options.

It’s hard for us to watch all of this, particularly in organizations whose social missions we ourselves support. So our job has become clear: make SurveyCTO as easy to get started with as it is to grow with. This is perfectly consistent with our founding mission to take technologies like these and make them cheaper and more accessible to a broader audience. It’s just that the “broader audience” has expanded beyond the more-technical researchers whom we targeted initially to now include somewhat-less-technical (and perhaps more on-the-run) M&E users.

Where we’re headed: the business

A key part of our founding mission was to free nonprofit researchers and M&E professionals from time-sucking adventures in open-source technologies and grant-fueled reinventions of the wheel. The idea was simple: if everybody pays just a little bit, they can share an excellent, professionally-supported, indefinitely-sustained technology platform that frees them to focus on their own very important work.

After all, why should this NGO or that Initiative spend tens or hundreds of thousands of dollars, spin up a whole team, and learn how to develop software? Or build up the IT capacity to manage the technical requirements of an open-source solution? When the grant ends, typically so does the effort. It’s outrageously wasteful. Millions of U.S. aid dollars, for example, have been (indirectly) spent in the mobile-data-collection space, the vast majority of it on dead-end custom solutions that don’t sustain. In social-welfare terms, it’s a disaster.

Research organizations have been quicker to embrace the cost-sharing model of a professional solution, but many other nonprofits have unfortunately continued to prefer custom solutions or nominally-free open-source solutions that are quite costly in terms of staff time, technical resources, and consulting fees. And a trend particularly strong in the international-development sector has proven to be a challenge to our model: people seem willing to spend huge amounts on staff and consultants – but essentially nothing on software. There is this notion that software should be free, and it’s condemning many projects and organizations to use buggy, poorly-supported products and/or create their own unsustainable technology.

Despite this challenge, we’ve reached a very important milestone: a whole lot of little monthly subscription payments are now adding up to fully cover running costs, including server hosting, product development, and support. We still have earlier losses to make up (mostly development costs), but we’ve turned an important corner. And, in the process, we’ve proved that this is a model that can work. We’ve built something sustainable.

Still, only a very small fraction of those who could be benefiting from SurveyCTO are benefiting. So we need to get far better at promoting our product, and we need to make it easier and easier to use so that a wider and wider audience can take advantage. And while we always want to remain true to our “ultra lean start-up” roots, we are keenly aware that greater scale will afford us:

more resources with which to improve the product,
more scope to lower prices, and
more opportunities to cross-subsidize use by small NGOs, under-resourced government agencies, students, and other users with lesser means.

We believe deeply in evidence-based decision-making – in governments, in development programs, in research – and we want to help lots of people collect more and better data. So while we’re off to a great start, we still have our work cut out for us.

Where we’re headed: this blog

For most of our history to date, we primarily communicated with our users via support requests: if somebody ran into trouble, we jumped in to help. Then, recently, we began trying to be a bit more proactive, emailing periodic updates in the form of short newsletters to our users. That’s been good for certain things, like announcing new releases, but it’s less good for other types of content.

In this blog, we’ll post information that’s potentially of interest not just to our current users, but to a broader community of researchers and M&E professionals. Planned topics include planning for electronic data collection, special considerations for students, and choosing between electronic and paper-based data collection. We have built up a lot of expertise – and a lot of opinions! – in this data-collection space, so it’s time that we started sharing it more widely.

It also benefits us to keep an open line of communication between us and you. After all, it’s you, our partners, users, prospects, comrades-in-arms, and even competitors, who are on the front lines and have the best ideas about how data collection can be done better. Open and effective communication with you will always remain a key to our success as a platform.

A final word of thanks

If you’re already a SurveyCTO user, thank you for helping us to get to where we are today. We’re excited about where we are and we’re excited about where we’re headed – and you’re a big part of both. Thank you.

Chris Robert
