Lessons on data security from MIT

Last week, we joined “Managing confidential research data,” a class on research design and methods at MIT taught by Dr. Micah Altman. We thought SurveyCTO users would appreciate the following take-aways:

  1. Always consider the sensitivity of data relative to its research utility. If there is more risk than reward, don’t collect it. At the extreme, consider anonymous data collection.
  2. Think holistically about the entire project lifecycle. Consider data security and use during project design, collection, analysis, storage, release, and disposal.
  3. Develop the appropriate safeguards for when you do share data. Benefits to sharing data include increased transparency and opportunities for social value. You may also be compelled to share data for legal reasons, so it’s important to be prepared.
  4. Familiarize yourself with the relevant legal frameworks. The laws of both your own country and the country in which you conduct research apply. These laws can dictate what researchers should know, what permissions or consent they must obtain, and whom to notify, when, and how, if certain types of events occur.
  5. Review this simplified guide to protecting your data:
    A) Identify all possible confidential or identifying data
    B) Encrypt your data at rest, on disk, and on devices
    C) Encrypt your data in transit
    D) Maintain “core info hygiene” on your devices: have strong passwords, don’t share passwords, use anti-virus software, revoke access to users who leave your project, and train people who will be handling confidential data
    E) Dispose of data carefully
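
As a rough illustration of step A, here is a minimal Python sketch that flags field names that look like they might hold identifying data. The keyword list, the function name, and the sample field names are all assumptions made up for this example; a real project would need its own list and a manual review, since a keyword match can miss identifying fields or flag harmless ones.

```python
# Hypothetical keyword list; tailor it to your own survey design.
IDENTIFYING_KEYWORDS = ("name", "phone", "address", "email", "gps", "birth")


def flag_identifying_fields(field_names):
    """Return the field names that contain any keyword (case-insensitive)."""
    flagged = []
    for field in field_names:
        lowered = field.lower()
        if any(keyword in lowered for keyword in IDENTIFYING_KEYWORDS):
            flagged.append(field)
    return flagged


# Made-up field names for illustration only.
fields = ["respondent_name", "age_group", "Phone_Number", "household_size", "gps_latitude"]
print(flag_identifying_fields(fields))
```

A check like this is only a first pass: it helps build the list of fields to review, encrypt, or drop, but it does not replace reading through the form itself.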

Finally, when protecting sensitive data, we usually think in terms of protecting individuals: no one should be able to figure out which individual corresponds to a specific record in the data. But there are other types of identification to be careful about:

Indistinguishability: Anonymized data that can’t be linked to an individual can sometimes still be linked to a group or crowd of individuals. The group itself could be endangered or, if the group is small enough, the information could still be used to find and target individuals.
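
One way to make the "small group" risk concrete is to count how many records share each combination of indirectly identifying fields, an idea often discussed under the name k-anonymity. The sketch below is only an illustration under assumptions: the function name, the field names, and the threshold of 3 are hypothetical, and the records are made up.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return each combination of quasi-identifier values shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records with no names attached.
records = [
    {"village": "A", "age_group": "20-29", "answer": "yes"},
    {"village": "A", "age_group": "20-29", "answer": "no"},
    {"village": "A", "age_group": "20-29", "answer": "yes"},
    {"village": "B", "age_group": "60-69", "answer": "yes"},
]

# Flag any village/age-group combination that fewer than 3 records share.
print(small_groups(records, ["village", "age_group"], k=3))
```

Here the single record from village B in the 60-69 age group stands alone, so anyone who knows that village could likely work out whose answer it is, even though no name appears in the data.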

Limited adversarial learning: Anonymized information can still raise the probability of correctly linking a record to an individual. For example, data that does not directly identify anyone might still make a guess about whom a record belongs to 20% more accurate.
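
To see how a released attribute can sharpen a guess, compare the chance of picking the right person before and after the attribute is known. This is a simplified sketch, not a formal privacy measure: it assumes the person behind the record is somewhere in a known pool of candidates and that every remaining candidate is equally likely. The names, the occupation field, and the function names are invented for illustration.

```python
from fractions import Fraction


def guess_probability(candidates):
    """Chance of a correct guess when every candidate is equally likely."""
    return Fraction(1, len(candidates))


def narrow(candidates, attribute, value):
    """Keep only the candidates whose attribute matches the released value."""
    return [person for person in candidates if person[attribute] == value]


# Made-up pool of people the record could belong to.
pool = [
    {"name": "P1", "occupation": "teacher"},
    {"name": "P2", "occupation": "farmer"},
    {"name": "P3", "occupation": "teacher"},
    {"name": "P4", "occupation": "driver"},
    {"name": "P5", "occupation": "farmer"},
]

before = guess_probability(pool)
after = guess_probability(narrow(pool, "occupation", "teacher"))
print(before, after, after - before)
```

In this toy case, learning that the record belongs to a teacher narrows five candidates to two, so the guess goes from a 1-in-5 chance to a 1-in-2 chance without any name ever being released.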

Thoughtful planning and design can help minimize the risks of handling confidential information. Good systems – and good tools – are critical to data security.

Image: “MIT” by Jorge Cancela is licensed under CC BY 2.0

Chris Robert


Chris is the founder of SurveyCTO. He now serves as Director and Founder Emeritus, supporting Dobility in a variety of part-time capacities. Over the course of Dobility’s first 10 years, he held several positions, including CEO, CTO, and Head of Product.

Before founding Dobility, he was involved in a long-term project to evaluate the impacts of microfinance in South India; developed online curriculum for a program to promote the use of evidence in policy-making in Pakistan and India; and taught statistics and policy analysis at the Harvard Kennedy School. Before that, he co-founded and helped grow an internet technology consultancy and led technology efforts for the top provider of software and hardware for multi-user bulletin board systems (the online systems most prominent before the Internet).