I would like to share with you our guiding principles.
As a social enterprise funded through user fees from hundreds of different organizations, we have an unusual degree of freedom: for the most part, we don’t have donors or investors to answer to, and even individual users have limited power to drive our decisions. It’s a great place to be, but it means that we’re fully responsible for charting the path that best serves our overall user base, the sector, and our social mission.
Since the beginning, a set of core principles has guided the development and evolution of SurveyCTO. None of them are entirely static, but we tend not to adjust them very often. They are:
- Data security: private data should be kept as private as possible. Only people who legitimately need to see private data should be able to see it, and that generally doesn’t include our own server administrators or engineers (and it certainly doesn’t include hackers).
- Data quality: data collection should be transparent and carefully monitored. Improved transparency in data collection, coupled with timely and effective monitoring of data quality, is vitally necessary to combat the natural forces that push toward poor data quality.
- Data use: effective visualization and analysis requires other tools designed and built for those purposes. We don’t seek to compete with other products in the database, visualization, or analysis spaces. Rather, we prefer to integrate well with tools like Google Sheets, Airtable, Salesforce, Tableau, Power BI, Stata, SPSS, and R.
- Reliability and support: our users deserve a product that works reliably, and the help they need. Our users have more than enough challenges in their lives, so we have to work 24×7 to make sure that SurveyCTO creates as little stress as possible (and, generally, reduces stress instead).
- Empowerment: we should empower those who need to collect data to use our technology directly, with as few intermediaries as possible. The research, M&E, and field teams that use SurveyCTO know best how to design, deploy, and improve their data-collection instruments, so we should empower them with great technology they can themselves use; IT teams and outside consultants should serve as facilitators and supporters, not intermediaries or gatekeepers.
Below, I expand a little bit on what these principles mean in practice, and how they make us different from other vendors in the space.
Christopher Robert, Founder
1. Data security: private data should be kept as private as possible.
Our approach to data security is simple but also radical: even though we host servers and data for our users, we ourselves don’t want to be able to see sensitive data (not our server administrators, not our engineers, not our support team, not our Amazon Web Services cloud providers, not hackers who might someday gain access to our servers or other systems, none of us). After all, most of our users make confidentiality pledges as part of their data collection, and we want to respect those pledges. The first step in respecting them is ensuring that the smallest possible number of people can view confidential data.
In practice, this means that we strongly encourage users to generate their own 2,048-bit public/private encryption keys, and to use those keys to secure their data. What’s more: we never want to see the private keys, because we never want to be able to decrypt the data ourselves. In terms of implementation:
- In SurveyCTO v1.17, we added options to publish data directly to .kml files in SurveyCTO Sync, so that it would be easy to visualize GPS data in Google Earth, even offline (and without having to send sensitive data to cloud providers like Google Maps).
- In SurveyCTO v1.22, we added the ability to exempt individual non-PII fields from private-key encryption, to allow for easier publishing and sharing (e.g., to dashboards) – without having to lower the overall level of protection for more-sensitive data.
- In SurveyCTO v2.20, we added our Data Explorer for easy, in-browser data monitoring and exploration. This included the ability to decrypt data safely in-browser and view GPS and other data online, but in a fundamentally safer way than in other cloud products. Decrypted data is held in memory only, on the user’s computer, and nobody outside the local computer can see that data. Google Maps provides map tiles on request, for example, so Google does see evidence of the general areas being viewed, but the actual GPS locations (the actual pins) are never sent to Google or anybody else.
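To make the public/private-key model above concrete, here is a toy sketch of the pattern in Python. It uses textbook RSA with tiny primes and no padding, purely for illustration: this is not SurveyCTO’s actual implementation, and real keys (like the 2,048-bit ones described above) involve vastly larger primes and proper padding schemes.

```python
# Toy textbook-RSA sketch of the pattern described above: data is
# encrypted with a public key, and only the holder of the private key
# can decrypt it. Tiny primes, no padding -- for illustration only.
p, q = 61, 53                 # toy primes; real primes are ~1,024 bits each
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (Python 3.8+ modular inverse)

message = 42                  # a (tiny) piece of survey data, must be < n
ciphertext = pow(message, e, n)    # anyone with (e, n) can encrypt
plaintext = pow(ciphertext, d, n)  # only the private-key holder decrypts
assert plaintext == message
```

The crucial property is that the server only ever needs the public pair (e, n): because d stays on the user’s own computer, nobody with server access can recover the plaintext.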
A hallmark of our approach has been to avoid producing features that would tempt users away from protecting their data in the best possible way. For example, years ago we could have very easily added the kind of cloud-based map view that basically every competitor platform has offered; but that would have required (a) that users either forgo their own encryption keys for their GPS data or share those keys with us, and (b) that GPS positions be shared on the cloud, with Google and potentially other providers. In fact, there was tremendous pressure to provide these kinds of features, and we lost out on a fair amount of business because we didn’t offer them. The argument was always “well, you can at least offer these features for unencrypted forms, can’t you?” And it was true: we could have. But then we would have been building in a powerful incentive for people to not use encryption and to share their data more widely than seems prudent.
It took us several years to build what we considered to be a safe solution for viewing GPS and other data online, in-browser, but in the end we were able to pull it off. And we think that our technical approach has struck the right balance between safety and convenience.
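The browser-side map idea described above can be illustrated with the standard Web Mercator “slippy map” tile arithmetic. This is a general sketch of the concept, not our actual code, and the coordinates are made up: because the browser requests map tiles by index only, the tile server learns the general area being viewed but never the individual pins.

```python
# Sketch of the idea behind the in-browser map approach: the browser asks
# the tile server only for tile *indices*, never exact GPS pins. Uses the
# standard Web Mercator "slippy map" tiling formula; coordinates made up.
from math import asinh, tan, radians, pi

def tile_for(lat, lon, zoom):
    """Return the (x, y) map-tile index containing a GPS point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - asinh(tan(radians(lat))) / pi) / 2.0 * n)
    return x, y

# Two nearby (made-up) respondent locations fall in the same tile, so the
# tile server learns only the general area, not the individual pins.
print(tile_for(-1.2921, 36.8219, 10))  # prints (616, 515)
print(tile_for(-1.2950, 36.8300, 10))  # prints (616, 515) -- same tile
```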
2. Data quality: data collection should be transparent and carefully monitored.
In the field, when people are collecting data, there are so many things that can go wrong: the data can be entirely made-up, the wrong person or facility might be interviewed or inspected, questions might be asked in the wrong way or not asked at all, answers might be recorded incorrectly, etc., etc. The challenges are many, and because fieldwork is some unfortunate combination of boring and difficult nearly everywhere in the world, similar challenges arise in nearly every setting. And sadly, all natural forces seem to push in the direction of poor data quality. Without visibility into the data-collection process, it’s very difficult to assess the quality of field-collected data, let alone manage for quality.
So a big part of our job has been to constantly improve the visibility. This means not only collecting richer data and meta-data in the field, but also making the process of reviewing and learning from that data as easy as possible. And here, “easy” means fitting into the challenging realities of field project management. So for example, a great many data-collection projects are difficult to get off the ground: they are running behind schedule, there are still changes to instruments up through training and piloting, and generally there’s not time to think about back-end data systems or QC monitoring before data starts streaming in. Very few teams seem to be able to pre-plan for QC processes, so our job has been to try to require less and less pre-planning. In terms of implementation:
- In SurveyCTO v1.0, we added options for random audio auditing as well as text audits that saved detailed meta-data about the time spent on individual fields or questions. We also introduced support for auto-generated mail merge templates for Microsoft Word, along with features to auto-merge incoming data with those templates in SurveyCTO Sync. This rendered incoming data more readable for those reviewing the data, allowing for a human scrutiny/QC process akin to those traditionally used in paper-based data-collection workflows.
- In SurveyCTO v1.17, we added options to publish data directly to .kml files in SurveyCTO Sync, so that incoming data could be easily reviewed by geographical location.
- In SurveyCTO v1.30, we added the new concept of “speed limits” to flag cases where forms are being completed too quickly, and to automatically trigger audio audits based on speed-limit violations. We also added a suite of “automated quality checks” (otherwise known as “high-frequency checks” or “statistical checks”) to flag potential data-quality issues based on the full distribution of responses so far. So, for example, if one enumerator’s responses are different enough from others’ (statistically speaking), then that enumerator’s responses can be flagged for further review.
- In SurveyCTO v2.20, we added the Data Explorer as the safe, in-browser way to review incoming data – both in aggregate and at the individual level. Importantly, this new interface allows for seamless movement between aggregate views and individual submissions, formats and labels individual submissions for effective review, and includes information and media from automated quality checks, audio audits, text audits, speed limits, and more, all in one place.
- In SurveyCTO v2.40, we added a new review and correction workflow, so that our users can not only catch data-quality problems, but also correct them. It allows for systematic quality-control processes to be more easily put in place, and for key corrections to be made before data is released to dashboards or analysis.
- In SurveyCTO v2.50, we added powerful new sensor meta-data options, like the ability to capture the percentage of interview time that seemed to involve conversation or how much noise, ambient light, or movement there was during the interview. By combining this data with the ability to automatically flag outliers for review, field QC processes can be even more effective.
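To give a flavor of what such a statistical check can look like, here is a minimal, hypothetical sketch in Python. The leave-one-out rule, threshold, field name, and data below are all illustrative inventions, not SurveyCTO’s actual checks:

```python
# A minimal sketch of an automated ("high-frequency") quality check:
# flag enumerators whose mean response on a numeric field is far from
# everyone else's. Illustrative only -- not SurveyCTO's actual checks.
from statistics import mean, stdev

# (enumerator_id, reported_household_size) -- made-up submissions
submissions = [
    ("enum_A", 4), ("enum_A", 5), ("enum_A", 4),
    ("enum_B", 5), ("enum_B", 4), ("enum_B", 6),
    ("enum_C", 12), ("enum_C", 11), ("enum_C", 13),  # suspiciously high
]

def flagged_enumerators(subs, threshold=2.0):
    """Leave-one-out check: flag an enumerator whose own mean differs
    from the others' mean by more than `threshold` standard deviations
    (computed over the others' values)."""
    flagged = []
    for enum_id in sorted({e for e, _ in subs}):
        own = [v for e, v in subs if e == enum_id]
        others = [v for e, v in subs if e != enum_id]
        if abs(mean(own) - mean(others)) > threshold * stdev(others):
            flagged.append(enum_id)
    return flagged

print(flagged_enumerators(submissions))  # prints ['enum_C']
```

Comparing each enumerator against everyone else (rather than against the pooled distribution) keeps one extreme enumerator from inflating the overall variance and hiding their own outliers.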
We believe that the right kind of visibility into the data-collection process is needed for managers to effectively manage their own field teams, but we also believe that it’s vital to correct long-standing problems in the overall market for field data collection. Most organizations outsource data collection to firms that specialize in building and managing field teams, and, for those organizations that do the outsourcing, there has been far too little visibility into the data-collection process itself. Without visibility into data quality, it has been impossible to contract on data quality or enforce any kind of minimum standard. Contracted firms deliver “clean” datasets to their clients, and the clients have very little ability to distinguish when they are or are not receiving accurate, high-quality data. We have steadily sought to provide technology that offers greater visibility, allowing for better client-side monitoring and an expanding range of potential contractual terms.
We’re excited about the progress we’ve made to date, and we’re even more excited about the potential for continued improvement. For example, we’re actively thinking about how we might use machine learning technologies to build upon traditional methods of quality control and provide powerful new monitoring options to our users.
Whether it’s fancy stuff like machine learning or just getting better at the human-centered design process for presenting the right kinds of information to the right people, we’re looking forward to continuing to innovate in this area of data quality. After all, the quality of the data we collect plays a key role in driving the quality of our decision-making. Low-quality data is not only of questionable value to us, it can even be harmful.
3. Data use: effective visualization and analysis requires other tools designed and built for those purposes.
So this is a tough one, because a lot of our users would like an all-in-one, end-to-end solution, and it’s never easy to say no. It’s also never comfortable to admit that you’re not the best at something. But the fact is that our users use the data they collect in SurveyCTO in a dizzying number of ways, and while the mainstream tech world has left offline, illiterate populations effectively unserved, there are massively well-funded efforts to provide data management, visualization, and analysis products that can work for our users. It’s not at all obvious how we, as a small social enterprise operating in this space, can effectively compete with platforms and tools like Google Sheets, Airtable, Salesforce, Tableau, Power BI, Stata, SPSS, and R. Moreover, it’s not at all clear that we should even try.
Now, yes, mainstream products can sometimes be expensive, and so we’re sympathetic to those who want all the features of Product X but would rather pay a low, SurveyCTO-style price. But more and more mainstream tech companies are offering compelling nonprofit discounts, and many have CSR initiatives that offer grants for funding nonprofit adoption of best-in-class technology. Fundamentally, these programs seem a better way to get that kind of great technology into nonprofit hands than having a small organization like SurveyCTO try to duplicate all of that technology at a lower price point.
It’s also true, though, that smaller teams with lower technical capacity might have trouble integrating multiple solutions; for them, an all-in-one might be all that is really technically feasible. Here also, we’re sympathetic – but we still don’t think that reproducing best-in-class reporting or analysis functionality would be the right approach. At the very least, the more time and effort we put into that functionality, the more our prices will have to rise… which would then put SurveyCTO out of reach for many of the smaller teams we had originally hoped to serve.
What we’d like to do is get better and better about how we integrate with other solutions. We’d like deploying SurveyCTO+ProductX to be as clear and simple as possible, so that less and less technical skill is required. While we don’t think that SurveyCTO will ever be a true “all in one” that operates independently, we’d like setting up an overall system and workflow with SurveyCTO to be as close to “all-in-one easy” as possible. That will require more technical work for our R&D team, but then also better materials (checklists, videos, etc.) to support those who are setting up these systems.
And finally, we know that we can’t avoid all data visualization, analysis, or management. After all, we recognize that our data quality mission (above) fundamentally requires that people be able to monitor and correct the data effectively. This actually requires quite a lot in terms of real-time visualization and analysis. So, on the surface, you’ll see us continue to build in ever-more-sophisticated visualization and analysis, and you might think that it should be easy to expand to include a wider range of visualization and analysis options. But here’s the thing: while visualization and analysis for monitoring and data quality technically overlap with visualization and analysis for data use, we can do a far better job delivering on our data quality mission if we stay focused on those needs rather than being dragged down the slippery slope toward the myriad ways our users might use the data they collect.
At the end of the day, it’s all about where we fit in the world. Our focus is on secure, high-quality data collection. That already includes quite a lot – more than enough to keep us busy. And it is also an area that other companies tend to neglect. So we’d prefer to focus our energies there, and integrate with other solutions for data use and management. That way, at the end of the day, our users get the best overall solution for the best possible price.
4. Reliability and support: our users deserve a product that works reliably, and the help they need.
Free open-source software can often be less reliable than professionally-supported software, but even professional software seems to have gotten less and less reliable over time. New versions are pushed onto users constantly, and they too-frequently introduce new problems and headaches. In terms of reliability, standards seem to be falling. And meanwhile, the quality of technical support seems to be falling even faster. At Dobility, we understand how incredibly stressful and complex field data collection can be, and we understand that those engaged in that work need to be able to rely on the technology they use – and be able to get effective help when they need it.
So much of the work we’ve done, from the earliest days of SurveyCTO, concerns reliability and support. When we began building on Open Data Kit’s (ODK’s) strengths, we started by fixing bugs and improving performance for long, complex surveys; we added redundancies to protect against data loss even in the most challenging settings; and we architected our server hosting environment in a way that we knew would prove costly but also expected to be fundamentally safe, reliable, and scalable. Even today, with thousands of teams using SurveyCTO, every subscription has its own back-end database, its own software, and its own server memory space. And every paying user has access to free, expert support 24×7.
In the early days, it was only me, a part-time developer, and a part-time QC person. To answer user support queries in a timely manner, I would wake up before dawn, I would pull my car over mid-drive to peck out a quick response, I would respond from the tops of mountains when on holiday. As the team slowly grew, that level of dedication to responsiveness has continued. And in fact, I’ve had the pleasure of being slapped on the wrist by newer members of our support team when my own responses are not sufficiently quick or helpful; when I jump in on a support query and get something wrong, I get in trouble. And because SurveyCTO has grown so much over the years and is relied upon in such a vast array of challenging settings, it really does require a team approach to continue offering timely, helpful support; most queries still, behind the scenes, involve multiple people collaborating, suggesting, correcting, or, later, critiquing.
Our QC and R&D teams are also pulled into support cases whenever it looks like there might be some kind of product problem. It can often be difficult to distinguish between software problems and, for example, form-programming problems, so our developers are continuously pulled into cases. It’s a huge time-sink and it distracts attention from new feature development, but we view it as absolutely critical to maintaining a high level of reliability and support. It’s a price we’re willing to pay.
Also, when there is a software problem, we generally drop everything and move heaven and earth as needed to get it fixed. As the CEO, I remain actively involved in helping to prioritize and roll out essentially every fix; and I try to reach out and personally apologize to anybody who’s been inconvenienced by some problem that we allowed to make it through our QC processes. It’s a major focus of mine, and it’s distracted attention from, for example, building up the sales and marketing side of our business. But if you want to provide a reliable, well-supported product, then there is a price you have to pay as an organization. To date, we’ve been willing to pay that price, and I very much hope for that to remain true even as we continue to grow.
5. Empowerment: we should empower those who need to collect data to use our technology directly, with as few intermediaries as possible.
Back when digital data collection technologies required technical experts to “program” digital forms, there was a lot of friction and expense that stood between the researchers and M&E professionals who knew what they wanted and the technology that could meet their needs. Those who needed digital forms would describe or document their needs, experts would wield the technology on their behalf, and ultimately that technology would be deployed in the field. If problems or potential improvements were discovered in the field, often it was too hard to fix those problems or implement those improvements: there was just too much distance, friction, and expense between those using the technology and those who served as the technical gatekeepers.
So one core goal of SurveyCTO was to make the technology more directly accessible to those who actually need it – those who will actually use it – so that they can be more empowered to design, manage, and revise digital forms as they see fit. We wanted to eliminate layers of intermediaries so that the researchers and M&E teams would be able to wield the technology themselves. In terms of implementation:
- In SurveyCTO v1.0, we introduced a hosted version of ODK that (a) could be automatically launched as-needed by anybody filling out a simple sign-up form, and (b) had an entirely new user interface that simplified common tasks. Everything from learning about the platform to creating new forms to using cold room computers to creating encryption keys to creating field-validation expressions to using Microsoft Word’s mail-merge features for reviewing data became easier.
- In SurveyCTO v1.30, we introduced “automated quality checks” so that those without a high level of statistical training could still benefit from statistical checks that are important for monitoring data quality.
- In SurveyCTO v2.0, we introduced an entirely new user interface, in part because we’d added so much to the original product that the interface was becoming too complex; we needed a new design in order to keep adding new features and flexibility without making the product too hard to use. We also added a web interface for previewing forms, which was an important step in making it easier to develop and test new forms.
- In SurveyCTO v2.10, we added the online, drag-and-drop form designer, in order to make SurveyCTO accessible to a broader range of new users. Even for those users who will ultimately prefer doing a lot of their work in Excel or Google Sheets, we wanted to provide an easier, more-structured way to get started in form design.
- In SurveyCTO v2.20, we added the Data Explorer for being able to monitor and explore incoming data in a flexible and powerful way, without the need for expertise in outside visualization or analysis tools.
- In SurveyCTO v2.30, we added a new “enterprise” feature-set to make managing multiple projects and teams easier.
- In SurveyCTO v2.40, we added the new review and correction workflow, in order to further simplify the process of not only detecting data-quality issues – but also correcting them.
We’re proud of the work we’ve done in empowering users, in taking ODK’s core capabilities and extending them to be accessible by a broader and broader range of users. But, of course, our job here is never done: there is so much more we can do. So, in the coming months and years, we’re excited about bringing in additional UI/UX design talent and further improving our product’s interface and accessibility. We’re going to keep making it more powerful and flexible all the time, and, ideally, we’ll keep making it easier to use at the same time.