What is entity-based data collection, and when should you use it?

Track data gathered over time or steps in a process for entities like beneficiaries, patients, households and more in projects that require more than a basic survey

If you’ve been collecting primary data for a while, you may be familiar with the term entity-based data collection. You might even be proficient in it! However, if you’re newer to the research and data collection space, you might have only a vague idea of what this means, or even be reading this term for the first time.

Either way, you’re in the right place. We’re going to dig into entity-based data collection and related terms, explore the use cases and projects it’s a good fit for, and cover typical data collection challenges for this type of project. Then, we’ll break down a mobile data collection solution designed for exactly this type of work and share some steps you can follow to start your own entity-based project.

What is an “entity” in data collection, and what do entity-based workflows look like?

An “entity” can refer to anything you collect data on. That could mean program beneficiaries for an NGO, students in a school, patients who visit your medical clinics, or customers and products for private businesses. 

An entity doesn’t only refer to individual people, either. It can be a place, like a hospital or other facility, a household, or a farm plot. An entity can also be an incident reported in a system, or a piece of equipment inspected at a facility.

This type of data collection is foundational to many fields, since it organizes data collected on the same subjects over time, or across the multiple actions taken to complete a specific process.

Entity-based data collection stands in contrast to simpler methods of collecting primary data, where a single form or survey gathers all the data you need, and no additional forms or follow-ups with project entities are required.

It also presents a few unique challenges:

  1. Incomplete data: In complex projects with multiple steps or interviews, there’s a risk of incomplete data through attrition, where data collectors are unable to follow up with all entities as the project progresses.  
  2. Tedious data aggregation: If you use more than one form for a project, getting all data into one clean dataset is time-consuming and labor-intensive.
  3. Analysis: Analyzing entity-based data can often involve more advanced statistical techniques to account for the dependencies between observations from the same entities over time. Mixed-effects models and growth curve analysis are examples of these techniques.
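To make the idea of dependent observations concrete, here is a minimal Python sketch of a two-stage approach that is sometimes used as a simple stand-in for growth curve analysis. First, it fits a straight line to each entity's own measurements. Then, it summarizes the per-entity slopes. This is not a mixed-effects model, because it ignores differences in how precisely each slope is estimated. The entity IDs and values are hypothetical. The main point is that entity-based data has to be grouped by entity before it is analyzed.

```python
from collections import defaultdict
from statistics import mean


def per_entity_slopes(observations):
    """Stage 1: fit an ordinary least-squares line to each entity's data.

    observations: iterable of (entity_id, time, value) tuples.
    Returns {entity_id: slope}. Entities measured at only one time point
    are skipped, because no slope can be estimated for them.
    """
    by_entity = defaultdict(list)
    for entity_id, t, y in observations:
        by_entity[entity_id].append((t, y))

    slopes = {}
    for entity_id, points in by_entity.items():
        t_bar = mean(t for t, _ in points)
        y_bar = mean(y for _, y in points)
        sxx = sum((t - t_bar) ** 2 for t, _ in points)
        if sxx == 0:
            continue  # every measurement taken at the same time point
        sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
        slopes[entity_id] = sxy / sxx
    return slopes


if __name__ == "__main__":
    # Hypothetical data: (entity ID, visit number, indicator value).
    data = [
        ("M001", 0, 10.0), ("M001", 1, 11.0), ("M001", 2, 12.0),
        ("M002", 0, 12.0), ("M002", 1, 11.5), ("M002", 2, 11.0),
        ("M003", 0, 9.0),  # followed up only once (attrition)
    ]
    slopes = per_entity_slopes(data)
    print(slopes)                 # {'M001': 1.0, 'M002': -0.5}
    # Stage 2: summarize change across entities.
    print(mean(slopes.values()))  # 0.25
```

Note that entity M003 drops out of the analysis entirely. This small example shows how attrition (challenge 1) feeds directly into the analysis challenge.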

Despite the challenges, entity-based projects can provide a level of insight into critical research questions that a cross-sectional study cannot. And in some situations, entity-based data collection is simply necessary to operate or evaluate programs or manage multi-step processes.

Is entity-based data collection the same thing as “case management”?

If you’re familiar with data collection terms, you may know the term “case management,” and wonder if that’s what we’re talking about in this article. The short answer is yes! The longer answer is more complicated. Case management is indeed sometimes used to describe entity-based data collection projects within the field of survey research. And SurveyCTO uses this term for our own built-in feature for entity-based data collection (more on that below!). But in this article, we wanted to peel back the layers of what case-based or entity-based data collection actually is, and lay out a big-picture perspective. The reality is, while seasoned survey researchers are familiar with these terms and concepts, lots of teams in the social impact space that are learning the ropes of monitoring & evaluation and fieldwork research might not be. 

This is one of the most important concepts and practices to be familiar with in M&E, international development, and fieldwork research, so we wanted to break it down so that it’s easy to understand, and easy to get started with.

One critical point to understand: In the field of mobile data collection and survey research, entity-based data collection is closely associated with projects where you track key data points on the same survey subjects over multiple intervals. This is called longitudinal data collection, and these projects are referred to as longitudinal studies, or panel studies. This type of survey research is fundamentally different from one-time surveys, also known as cross-sectional studies, where you collect data from many different respondents or entities at a single point in time without following up with them again.

Let’s explore an example.

Longitudinal data collection

Imagine that you’re measuring key indicators of maternal health for 100 expecting mothers in a remote area. In this scenario, you’re using computer-assisted personal interviewing (CAPI), meaning that field teams of enumerators travel to each respondent’s home and conduct an interview using a form on a mobile device. 

If you’re conducting a cross-sectional study, and your project is designed around a one-off survey, enumerators will interview the selected entities at a designated point in their pregnancy, and the results from this single set of survey questions will be what is analyzed, reported on and used for decision-making.

But what if you wanted to track how these key aspects of maternal health changed throughout each pregnancy? What if you wanted to include postpartum data as well, perhaps at set intervals during the first year of each baby’s life?


Your study would then be longitudinal. And with such an important change in the structure and scope of your project, your data collection method options would change, too.

You could still create a new standalone survey for each round of interviews, and export data from those surveys once complete.

However, because there would be no link between the one-off surveys, this approach often results in a great deal of data aggregation, cleaning and other data-related work. Field teams would spend valuable time re-recording the same data, such as the name, study ID, and contact information of each study participant, at every follow-up visit. Errors and messy data would be real risks in this scenario.

Fortunately, there are other options for complex workflows like this.

Building complex workflows

Mobile data collection platforms like SurveyCTO provide the flexibility and tools to help you build the data collection forms and systems that work for you. For entity-based projects, you can store data gathered across multiple forms in server datasets, which are repositories of data on a cloud-based server. This data is stored in a table, just like a CSV file or spreadsheet. In fact, it can be helpful to think of server datasets as simply cloud-based spreadsheets!

Once the data is stored in server datasets, you can use pre-loading to pull it into as many surveys or forms as you need, and publish newly collected data back to those datasets.
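Here is a minimal Python sketch of that round trip, which makes the "cloud-based spreadsheet" idea more concrete. It has three parts:
- a dataset stored as a table keyed by an ID column
- a lookup that pre-loads one value into a form
- an update that publishes a form's answers back

The sketch only models the concept. In SurveyCTO itself, pre-loading is configured in the form definition (for example with the pulldata() function), not written in Python. The column names and values below are hypothetical.

```python
import csv
import io

# Hypothetical server dataset, shown as CSV text: one row per entity.
DATASET_CSV = """id,name,village,visits_completed
M001,Respondent 1,Village A,1
M002,Respondent 2,Village B,1
"""


def load_dataset(csv_text):
    """Parse CSV text into a dict of rows keyed by the 'id' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row for row in reader}


def preload(dataset, key_value, column):
    """Pull one column for one entity into a form.

    Returns '' when the ID or column is not found, so the form can detect
    the gap instead of failing.
    """
    row = dataset.get(key_value)
    return row.get(column, "") if row else ""


def publish(dataset, key_value, answers):
    """Write a submitted form's answers back to the entity's row,
    creating the row if the entity is new."""
    dataset.setdefault(key_value, {"id": key_value}).update(answers)


if __name__ == "__main__":
    dataset = load_dataset(DATASET_CSV)
    print(preload(dataset, "M002", "village"))  # Village B
    publish(dataset, "M002", {"visits_completed": "2"})
    print(dataset["M002"]["visits_completed"])  # 2
```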

Here’s one real example of a complex entity-based workflow from the non-profit research and policy organization Innovations for Poverty Action (IPA). In 2023, IPA increased political participation in rural Sierra Leone with a unique approach to tracking entities using SurveyCTO’s advanced offline features.

Custom workflows involving multiple forms and datasets, like the one IPA created, give you many options and great flexibility in how you build your entity-based data collection project. However, building this type of workflow takes a lot of survey programming knowledge and data collection know-how. If you work at an organization with a talented team of survey creators, a custom workflow like this might be the right fit for your projects! But it’s also extremely valuable to be able to use a built-in feature set for complex workflows.

At SurveyCTO, we’ve got exactly this in our product: A feature designed to make using server datasets for entity-based data collection highly efficient for all data collection teams, even those without previous entity-based data collection experience.

Introducing case management, a feature to streamline entity-based data collection


By now we’ve established that entity-based data collection projects come with distinctive needs and challenges that differ from those of one-off, simple surveys or forms, and covered how longitudinal studies are a major use case for this type of data collection. We’ve also touched on ways you can build workflows for entity-based projects using features like server datasets and pre-loading.

But there’s more. High-quality, entity-based data collection is so important, it deserves its own feature to streamline data collection and simplify workflow building.

Enter case management.

Case management for mobile data collection is a solution that organizes and systematizes the process of collecting, managing, and tracking data for specific entities or survey subjects, using an app on a mobile device as your tool. SurveyCTO’s case management solution is designed to streamline entity-based data collection, multi-step processes and longitudinal studies. From collecting data in the field to automatically aggregating longitudinal data for cases to producing organized exported data, case management makes the process of entity-based data collection faster and better. 

Entities and cases both refer to the same thing: the individuals, households, or institutions you're collecting data on and following up with using multiple forms or surveys.

Case management makes it easy to link multiple forms together and connect them back to cases.  It also enables users to systematically aggregate and analyze the complex datasets that result from tracking data from multiple visits, interviews, or steps in a process.

Consider the earlier example of the mothers in the longitudinal study. With case management, each mother would be a case at the center of your data collection. Your cases dataset would aggregate each set of data gathered on her over time.

Here’s an example of what this looks like in SurveyCTO:

A SurveyCTO cases dataset.

With case management, you could seamlessly track the right data for each mother in your study over time in the cases dataset. Data cleaning would be greatly reduced, simplifying work while improving data quality. Duplicate data entry would be eliminated, since key points like IDs, names, and addresses would be stored once in the cases dataset and made automatically available to enumerators in each form used by your project. And if you used case management alongside SurveyCTO’s enumerator management feature, you could even link each mother to the data collector(s) who interviewed her during the project.
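The core idea behind a cases dataset can be sketched in a few lines of Python: identifying details are entered once, and every later form attaches to the same case ID. This is a hypothetical illustration, not how SurveyCTO stores or exports cases. The field names, the `<form_id>_<field>` naming scheme and the handling of unmatched IDs are all assumptions chosen for clarity.

```python
def aggregate_by_case(cases, submissions):
    """Fold form submissions into one row per case.

    cases: {case_id: {...}} holding identifying details entered once.
    submissions: list of dicts, each with 'case_id', 'form_id' and answers.
    Answers are stored as '<form_id>_<field>' so values from different
    visits sit side by side instead of overwriting one another.
    Returns (rows, unmatched), where unmatched lists submissions whose
    case ID is not in the cases dataset (e.g. a mistyped ID).
    """
    rows = {case_id: dict(info) for case_id, info in cases.items()}
    unmatched = []
    for sub in submissions:
        row = rows.get(sub["case_id"])
        if row is None:
            unmatched.append(sub)
            continue
        for field, value in sub.items():
            if field not in ("case_id", "form_id"):
                row[f"{sub['form_id']}_{field}"] = value
    return rows, unmatched


if __name__ == "__main__":
    cases = {"M001": {"name": "Respondent 1"}, "M002": {"name": "Respondent 2"}}
    submissions = [
        {"case_id": "M001", "form_id": "visit1", "weight_kg": 61},
        {"case_id": "M001", "form_id": "visit2", "weight_kg": 63},
        {"case_id": "M00l", "form_id": "visit1", "weight_kg": 58},  # typo in ID
    ]
    rows, unmatched = aggregate_by_case(cases, submissions)
    print(rows["M001"])
    # {'name': 'Respondent 1', 'visit1_weight_kg': 61, 'visit2_weight_kg': 63}
    print(len(unmatched))  # 1
```

When a form is opened from a case, the case ID typically comes from the selected case rather than being typed in. That is why, in a case management workflow, the `unmatched` list should stay empty. With unlinked standalone surveys, that list is where mistyped IDs would end up.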

Advantages of case management for mobile data collection

The above example describes several benefits of case management for a longitudinal study–but there are even more advantages:

  1. Store historical data within your data collection app. Case management makes previously collected data on cases available to data collectors in the form they’re currently using. If you use case management to track program beneficiary interviews, data like addresses, directions and phone numbers collected in a baseline interview can be pulled into all later forms. This avoids duplication and ensures the right respondents are given the right follow-up forms. If you use case management to run a set of tests that students in a school will take on the same day, each student’s earlier scores could be visible to the teacher administering their next exam. Storing historical data within a data collection app also lets you set up helpful downstream options, like warnings or reminders for enumerators that are triggered by certain criteria. For example, if a student didn’t show up to their first test and was marked absent, the teacher administering the next test could be alerted.
  2. Enable better enumerator decisions. As you might imagine from the examples above, field teams are best equipped to do their jobs when they have access to a case’s historical data.
  3. Customize questions for each case. When forms for a case are linked together, you can use relevance (also called skip logic) that is dependent on answers to questions in previous forms to determine which questions from a given form a case can receive. This makes it easy to eliminate unnecessary or insensitive questions where appropriate.
  4. Improve collaboration within field teams. When additional features, like offline case transfers, are combined with case management, it’s even easier for data collectors on the ground to work together to ensure that all cases get visited. This collaboration can be essential to reducing attrition in panel studies and longitudinal research.
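The logic behind items 1 and 3 can be modeled in a short Python sketch. Data stored from an earlier form decides which questions a follow-up form shows and whether a warning appears. In an actual SurveyCTO form, this logic would live in relevance expressions in the form definition, not in Python. The field names and the score threshold of 50 below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StudentCase:
    case_id: str
    test1_status: str                  # hypothetical: "present" or "absent"
    test1_score: Optional[int] = None  # filled in only if present


SUPPORT_THRESHOLD = 50  # hypothetical cut-off for a follow-up question


def plan_next_form(case: StudentCase) -> Tuple[List[str], List[str]]:
    """Return (questions to show, warnings for the teacher) for test 2."""
    questions = ["test2_score"]
    warnings = []
    if case.test1_status == "absent":
        # Mirrors a relevance condition on a value stored by the first form.
        warnings.append(f"{case.case_id} was absent for test 1.")
        questions.insert(0, "test1_makeup_score")
    elif case.test1_score is not None and case.test1_score < SUPPORT_THRESHOLD:
        questions.append("extra_support_needed")
    return questions, warnings


if __name__ == "__main__":
    print(plan_next_form(StudentCase("S01", "absent")))
    # (['test1_makeup_score', 'test2_score'], ['S01 was absent for test 1.'])
    print(plan_next_form(StudentCase("S02", "present", 42)))
    # (['test2_score', 'extra_support_needed'], [])
```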

In addition, a sophisticated case management solution for a mobile data collection tool will provide the following functionalities:

Full offline capabilities

If you use mobile data collection tools, you’re likely gathering data in areas with limited or no internet connectivity. Many data collection apps work offline and sync back to the cloud once the internet is available again. But what if you’re using a complex workflow for tracking cases over time, and need to pull data from one form into another during an interview? Effective case management solutions for mobile data collection will have full offline capabilities that let you run a case management workflow without a continuous internet connection. With SurveyCTO’s advanced offline functionality and case management features, you can create new cases and close completed cases, all without the internet!

Unique case IDs

A good case management solution will have you assign unique identifiers to each case to accurately track data. This way, the data collected at different points in time for each entity will always be aggregated to the right case.
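Below is a Python sketch of two common ways to generate unique case IDs:
- Sequential IDs with a site prefix are easy to read aloud and type.
- Random UUIDs let several devices create cases offline without coordinating.

Both are illustrative assumptions, not SurveyCTO’s own scheme. The duplicate check is worth running on any ID list before it is loaded as a cases dataset.

```python
import uuid


def sequential_case_ids(prefix, count, start=1):
    """Readable IDs such as 'SITE1-00001' (the prefix is hypothetical)."""
    return [f"{prefix}-{n:05d}" for n in range(start, start + count)]


def offline_case_id():
    """A random UUID: collisions are vanishingly unlikely, so devices can
    create cases independently while offline."""
    return str(uuid.uuid4())


def find_duplicate_ids(ids):
    """Return the IDs that appear more than once, sorted."""
    seen, dupes = set(), set()
    for case_id in ids:
        if case_id in seen:
            dupes.add(case_id)
        seen.add(case_id)
    return sorted(dupes)


if __name__ == "__main__":
    print(sequential_case_ids("SITE1", 3))
    # ['SITE1-00001', 'SITE1-00002', 'SITE1-00003']
    print(find_duplicate_ids(["A", "B", "A", "C", "B"]))  # ['A', 'B']
```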

Ability to assign cases to specific users

In a typical entity-based scenario, you’ll want to set your workflow up so that enumerators can only see and access the cases they are responsible for. Assigning cases ensures privacy for all involved, and prevents enumerators from becoming confused about which cases they are responsible for. This also avoids duplicating the efforts of your hard-working team by preventing them from accidentally working the same case!
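Here is a minimal Python sketch of case assignment. It rests on two assumptions made for illustration only:
- Each case row carries a comma-separated `users` field naming the enumerators who may see it.
- A case with no users listed has not been assigned yet, so no enumerator sees it.

Check SurveyCTO’s product documentation for how its cases dataset actually handles assignment.

```python
def cases_for_user(cases, username):
    """Return the IDs of the cases a given enumerator should see.

    cases: list of dicts with an 'id' and a comma-separated 'users' field.
    Assumption of this sketch: a case with no users listed is still
    awaiting assignment, so no enumerator sees it.
    """
    visible = []
    for case in cases:
        assigned = {u.strip() for u in case.get("users", "").split(",") if u.strip()}
        if username in assigned:
            visible.append(case["id"])
    return visible


if __name__ == "__main__":
    cases = [
        {"id": "C1", "users": "enum01"},
        {"id": "C2", "users": "enum01, enum02"},
        {"id": "C3", "users": "enum02"},
        {"id": "C4", "users": ""},
    ]
    print(cases_for_user(cases, "enum01"))  # ['C1', 'C2']
    print(cases_for_user(cases, "enum02"))  # ['C2', 'C3']
```

Hiding unassigned cases errs on the side of privacy. A project that wants a shared pool of cases could instead treat an empty field as "visible to everyone".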

Real-time data monitoring

Tracking data on specific entities means you’ll want to monitor that data consistently over time, too. Real-time data monitoring allows organizations and decision-makers to keep an eye on the latest data on your cases as it’s being collected. This helps teams fix errors quickly and, if applicable, address potential program issues promptly.

Outstanding data security

It’s always vital to keep data safe, and many longitudinal studies and multi-step processes cover sensitive topics. Look for a mobile data collection tool with a case management feature that also includes top-tier security features, like end-to-end encryption and ways to limit and control access to data to only those who absolutely need it.

How you can use case management

If you gather longitudinal data, case management is absolutely for you. Beyond a traditional longitudinal study, case management also has a wide range of uses for other types of entity-based data collection:

  1. Managing multi-step processes. Think of a patient visiting a doctor’s office or clinic. First, the patient completes an intake form with their basic info. Then, the doctor who examines them completes a health assessment checklist. Finally, if the patient needs to see a specialist, a clinic worker may complete a referral form. Or, imagine you must complete multi-step safety assessments on facilities, or get multiple people in a company to sign different forms before you can proceed with a new project. In these scenarios, you need multiple forms completed for each entity, whether that’s a patient, a facility, or a project.
  2. Providing data collectors with a list of interviews they need to complete on their devices.
  3. Allowing multiple field teams to gather data at a single point in time. You can do this using multiple forms that all contribute to one single case. For example, think of gathering a variety of data on citizens’ satisfaction with their local government in several towns. In this workflow, each individual town would be its own case. You could have one team do a survey on law enforcement, another on emergency services, a third on sanitation, and so on.
  4. Breaking long forms into multiple forms for the same cases. Have a very long survey that might work better as multiple shorter interviews? Use case management to break up lengthy questionnaires into several smaller ones, linking all forms to each case.
  5. Managing calls for computer-assisted telephone interviewing (CATI) surveys. When each survey recipient is identified as a case in your CATI project, you can easily keep track of attempts to reach them by phone in the cases dataset.
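For item 5, here is a minimal Python sketch of how a case row might track phone attempts. The field names, the outcome labels and the five-attempt limit are hypothetical project rules, not SurveyCTO settings.

```python
MAX_ATTEMPTS = 5  # hypothetical project rule


def record_call_attempt(case, outcome):
    """Update a case after one phone call.

    outcome: e.g. "completed", "no_answer" or "call_back" (hypothetical labels).
    The case is closed when an interview is completed, and flagged once the
    attempt limit is reached so a supervisor can decide what happens next.
    """
    case["attempts"] = case.get("attempts", 0) + 1
    case["last_outcome"] = outcome
    if outcome == "completed":
        case["status"] = "closed"
    elif case["attempts"] >= MAX_ATTEMPTS:
        case["status"] = "max_attempts_reached"
    else:
        case["status"] = "open"
    return case


if __name__ == "__main__":
    case = {"id": "R001"}
    for _ in range(4):
        record_call_attempt(case, "no_answer")
    print(case["status"], case["attempts"])  # open 4
    record_call_attempt(case, "no_answer")
    print(case["status"], case["attempts"])  # max_attempts_reached 5
```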

To give you an even better idea of what entity-based data collection projects that use a case management feature look like, we’ve compiled a few real-life examples below.

How real organizations conduct entity-based data collection using case management

  1. Making justice user-friendly in Uganda – The Hague Institute for Innovation of Law (HiiL) is an organization dedicated to improving legal justice systems. HiiL knows that people around the world resolve legal problems in many ways outside typical formal institutions, from gatherings of community elders to informal tribunals. HiiL decided to study effective alternative means of justice in Uganda using social media surveys. HiiL created web forms in SurveyCTO and used a unique two-wave methodology for their longitudinal study. First, they asked the cases in this study (individual people) about problems they encountered with their local justice system. A few months later, they followed up to ask how those problems had been resolved.
  2. Multi-dimensional accountability – In a creative example of what can constitute an “entity” in a data collection project, Oxfam designed a feedback study on their Middle East humanitarian response to give those affected by crises a voice on the support and services they were receiving. In this project, each instance of feedback was recorded as its own case in their system. Once logged, these incident report cases were passed on to other teams trained to resolve the type of problem identified in the feedback form. These problem-oriented teams filled out new forms, which then updated the feedback cases until all was resolved and the case could be closed. Because the data was made available in a single electronic system, all Oxfam teams working on the Middle East humanitarian response could then report on and analyze both the feedback received from recipients, and their team’s responses over time.

     

  3. Phone surveys in India and Nepal – During the COVID-19 crisis, the International Food Policy Research Institute (IFPRI) carried out a longitudinal study on farmers and women business owners in designated areas of India and Nepal. Centering their data collection around these cases, they measured the impact of COVID-19 on their livelihoods during the course of the pandemic. The surveys were conducted using SurveyCTO’s CATI features, integrated with case management.

Four things you can do today to get started with case management


So is using a case management solution right for your entity-based data collection projects? It depends. With SurveyCTO, you always have the option of building your own application for entity-based data collection on our platform, just like IPA did in our example from the beginning of the article: How IPA cut survey time in half.

However, if you’ve decided that the case management feature is the answer for your work, we’re pleased to share these next steps for getting started:

  1. Read SurveyCTO’s product documentation for an overview of how case management works on our platform.
  2. Get the hands-on training you need through this webinar from our success team on using case management to organize your data collection.
  3. Explore this in-depth Guide to Case Management from SurveyCTO’s customer success team for further insight.
  4. Get started! One great method here is to install a case management template from the Hub, and then edit it to fit your needs. We’ve got multiple options, including case management workflows for Community Health Workers, sample management, and more.
If you have access to the Hub, you can apply filters to find case management templates.

Keep in mind

Entity-based data collection is a way of gathering repeated data on subjects over time or tracking multiple steps in a process for organized, efficient workflows. If your data collection work is going to go beyond standalone, one-off surveys, you’ll likely be setting up some entity-based data collection projects! With the right mobile data collection tool, you can:

A. Use a case management feature with sophisticated functionalities to make this work as simple and straightforward as possible, or
B. Use your team’s skills to build your own custom solution: flexible workflows of multiple linked forms and datasets.

Have questions about getting started with an entity-based data collection project? Not sure if you want to make use of our case management feature, or build your own workflow? Experts at the SurveyCTO Support Center can advise. If you’re on a paid subscription, don’t hesitate to reach out to them today.

Better data, better decision making, better world.

Melissa Kuenzi

Product Marketing Associate

Melissa is a part of the marketing team at Dobility, the company that powers SurveyCTO. She manages content across SurveyCTO’s external platforms, publishing expert insights on best practices for high-quality data collection and survey research for professionals in international development, global health, monitoring and evaluation, humanitarian aid, government agencies, market research, and more.

Her background in the nonprofit sector allows her to draw on firsthand experience as a user of software solutions for the social impact space to bring SurveyCTO’s tools for uncompromising data quality to researchers all around the world.