Last week, we joined “Managing confidential research data,” a class on research design and methods at MIT taught by Dr. Micah Altman. We thought SurveyCTO users would appreciate the following take-aways:
- Always consider the sensitivity of data relative to its research utility. If there is more risk than reward, don’t collect it. At the extreme, consider anonymous data collection.
- Think holistically about the entire project lifecycle. Consider data security and use during project design, collection, analysis, storage, release, and disposal.
- Develop the appropriate safeguards for when you do share data. Benefits to sharing data include increased transparency and opportunities for social value. You may also be compelled to share data for legal reasons, so it’s important to be prepared.
- Familiarize yourself with the relevant legal frameworks. Both the laws of your own country and those of the country in which you conduct research apply. These laws can dictate what researchers should know, what permissions or consent they must obtain, and when, how, and whom to notify when certain types of events occur.
- Review this simplified guide to protecting your data:
A) Identify all possible confidential or identifying data
B) Encrypt your data at rest (on disk and on devices)
C) Encrypt your data in transit
D) Maintain “core info hygiene” on your devices: have strong passwords, don’t share passwords, use anti-virus software, revoke access for users who leave your project, and train people who will be handling confidential data
E) Dispose of data carefully
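As a rough illustration of step A, here is a minimal Python sketch that flags columns whose names or values look identifying. The name hints, value patterns, column names, and sample records are all hypothetical assumptions for this example, and a scan like this is only a first pass before a human review.

```python
import re

# Hypothetical column-name hints; extend these for your own survey.
NAME_HINTS = re.compile(r"(name|phone|email|address|gps|national_id|dob)", re.IGNORECASE)
# Rough value patterns for emails and phone-like numbers (assumptions, not exhaustive).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?[\d\s\-()]{7,}$")


def flag_identifying_columns(rows):
    """Return a sorted list of column names that look identifying."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            text = str(value).strip()
            # Flag on a suspicious column name or a suspicious-looking value.
            if NAME_HINTS.search(column) or EMAIL.match(text) or PHONE.match(text):
                flagged.add(column)
    return sorted(flagged)


# Made-up sample records, for illustration only.
sample = [
    {"respondent_name": "A. Person", "contact": "a.person@example.org", "age": 34, "village": "North"},
    {"respondent_name": "B. Person", "contact": "+1 555 010 0000", "age": 41, "village": "South"},
]
flagged_columns = flag_identifying_columns(sample)
print(flagged_columns)
```

Note that a field like `village` is not flagged here even though it can help narrow down who a respondent is, so an automated scan is a starting point rather than a complete inventory.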
Finally, when protecting sensitive data, we usually think in terms of protecting individuals: no one should be able to figure out which individual corresponds to a specific record in the data. But there are other types of identification to be careful about:
Indistinguishability: Anonymized data that can’t be linked to an individual may still be linked to a group or crowd of individuals. The group could be directly endangered or, if it is small enough, someone could still use that information to find and target individuals.
Limited adversarial learning: Anonymized information can still raise the probability of correctly linking a record to an individual. For example, data that does not directly link to an individual may still make a guess about who the record belongs to 20% more accurate.
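Both ideas can be made concrete with a small calculation. The Python sketch below, similar in spirit to a k-anonymity check, counts how many records share each combination of indirectly identifying fields and compares a blind guess with a guess informed by those fields. The field names (`village`, `age_band`), the helper functions, and the records are hypothetical and made up for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def guess_probability(records, quasi_identifiers, known_values):
    """Chance of picking the right record at random among those matching known_values."""
    matches = group_sizes(records, quasi_identifiers)[tuple(known_values)]
    return 1 / matches if matches else 0.0


# Made-up, already "anonymized" records with no names attached.
records = [
    {"village": "North", "age_band": "30-39"},
    {"village": "North", "age_band": "30-39"},
    {"village": "North", "age_band": "40-49"},
    {"village": "South", "age_band": "30-39"},
    {"village": "South", "age_band": "30-39"},
    {"village": "South", "age_band": "30-39"},
    {"village": "South", "age_band": "40-49"},
    {"village": "South", "age_band": "40-49"},
]

sizes = group_sizes(records, ["village", "age_band"])
smallest_group = min(sizes.values())  # a group of 1 means someone is effectively singled out
baseline = 1 / len(records)           # blind guess among all records
informed = guess_probability(records, ["village", "age_band"], ["North", "40-49"])

print(smallest_group)  # 1
print(baseline)        # 0.125
print(informed)        # 1.0
```

In this toy data, someone who knows only that a respondent lives in North and is in their forties goes from a one-in-eight guess to a certain match, even though no record carries a name. Coarsening or suppressing fields until the smallest group is larger is one common way to reduce that risk.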
Thoughtful planning and design can help minimize the risks of handling confidential information. Good systems – and good tools – are critical to data security.