54gene enhances phenotypic data collection with the aid of SurveyCTO’s systemized platform

Guest Authors:

Guest Organization: 54gene is an African genomics research, services, and development company.

When faced with unacceptable levels of data entry errors due to a paper-based data collection system, 54gene transitioned to a mobile, virtual system to scale research efforts.

The dearth of data from diverse populations available for genomic insight generation and precision medicine has repeatedly been raised as a point of concern, not only because it deepens inequality, but also because it limits scientists’ ability to gain a more global understanding of variation in the human genome and how that can be used to improve global health (Petrovski & Goldstein, 2016).

African populations have been shown to have more genetic diversity than all other global populations (Choudhury et al., 2018), yet a 2021 study reported that only 1.1% of the genomic data used for genome-wide association studies (GWAS) came from people of African descent (Fatumo et al., 2022), even lower than the 3.0% representation reported in 2019 (Sirugo et al., 2019). Hence, 54gene’s vision is to pioneer the inclusion of the African genome in global research, helping to influence and diversify the drug discovery and development process and its outcomes to benefit people of all populations.

54gene, a life sciences company with headquarters in Washington, DC, and Lagos, Nigeria, was set up in 2019 to significantly scale up efforts to address this diversity gap. Driven by its research partnerships with academic institutions and hospitals, the company is curating a rich resource of genomic data from across various African countries using Illumina’s high-throughput genotyping and sequencing technology platforms. Its overall vision is to equalize precision medicine.

The major challenge: Data collection and storage limitations 

Data collection is integral to the successful conduct of any research study, as it allows research questions to be answered adequately, hypotheses to be tested, and outcomes to be evaluated. However, researchers often face multiple challenges during this process, and 54gene is no exception.

Owing to challenges that its initial research partners perceived in adopting a fully electronic data collection strategy, 54gene started out collecting data for its research studies using a paper-based data collection and management system. This involved collecting questionnaire and laboratory data on paper forms and then transcribing the data into electronic forms on a research portal that was designed in-house, with all industry-standard security features, as a repository of all research data.

Although this data collection method, coupled with the research portal design, was considered innovative in its own right, as more studies were launched and more study partners were engaged it became clear to all stakeholders that a more efficient data collection mechanism was necessary to deliver the large-scale, high-quality, reliable data that 54gene envisioned. This was a hard but quick lesson for the 54gene research team, which identified unacceptable levels of data entry errors through the near real-time data quality assurance mechanism it had put in place.

Before adopting SurveyCTO, 54gene used paper questionnaires that were later manually transcribed into their data portal

Other issues with the paper-based method included the many hours required to transcribe data onto the research portal, the risk that paper records would be accessed by unauthorized persons or lost during the research process, and a growing backlog of data not yet entered onto the research portal, which affected the timeliness of QC and response.

SurveyCTO’s addition expedites a large portion of the data collection process. 

Following a careful review of commercial research-focused electronic data collection solutions, 54gene found SurveyCTO, a solution provided by Dobility Inc, to be the most suitable replacement for the data collection mechanism it was using at the time.

SurveyCTO’s portal is much more versatile and efficient than any data collection platform 54gene previously employed. Since its introduction in August 2021, most of the data collection process has become mobile and virtual, further helping to advance 54gene’s research initiatives and objectives.

A comparison of the old data collection method and the streamlined process with SurveyCTO

Some advantages of SurveyCTO’s applications include: 

1. Proper consent review via QC (Quality Control) and ethical standard compliance 

SurveyCTO’s Collect App (available on Android and iOS devices) allows research Site Team Leads or Quality Assurance personnel to review the consent data captured before finalizing and uploading it to the SurveyCTO database. 

Through its image capture feature, which allows site teams and the QC team to interface, the authorization page containing the required signatures is uploaded to SurveyCTO for review as soon as consent is received. Issues such as misspelled names or illegible text are flagged using Participant IDs and then reported to the participant data collector, also known as the Research Assistant (RA), for resolution.

Once resolved, all information is re-uploaded and reviewed to ensure participant data collection is in line with the ethical regulations and standards of the Institutional Review Board (IRB) and General Data Protection Regulation (GDPR). These standards are of great importance, as they assure participant safety, well-being, and privacy while ensuring the implementation of only ethically and scientifically valid research. 

The approved consent form is the minimum requirement for accepting participant biological samples into the 54gene lab or biobank. 

2. Better management of site teams 

SurveyCTO’s Collect App allows research site teams to collect data on mobile devices conveniently and efficiently. The interface is easy for site teams to use and understand, and it supports better data management by giving teams a defined workflow.

The SurveyCTO database helps record and track flagged consent forms and allows more efficient resolution of the issues highlighted. This feature also enables the QC team to identify gaps and problem areas in a bid to facilitate more targeted training of data collectors, thus reducing errors. 
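As a rough illustration of how tracked flags can point to training needs, the sketch below tallies flagged consent issues per Research Assistant. The flag records, RA labels, and issue names are hypothetical examples, not 54gene data, and the code does not use any SurveyCTO API.

```python
from collections import Counter, defaultdict

# Hypothetical QC flags: (participant_id, research_assistant, issue_type).
FLAGS = [
    ("P001", "RA-01", "illegible text"),
    ("P002", "RA-01", "misspelled name"),
    ("P003", "RA-02", "missing signature"),
    ("P004", "RA-01", "illegible text"),
]


def summarize_flags(flags):
    """Count flagged issues per research assistant and issue type."""
    summary = defaultdict(Counter)
    for _participant_id, assistant, issue_type in flags:
        summary[assistant][issue_type] += 1
    return summary


summary = summarize_flags(FLAGS)
for assistant, counts in sorted(summary.items()):
    # most_common(1) gives the issue type this assistant is flagged for most often.
    top_issue, top_count = counts.most_common(1)[0]
    total = sum(counts.values())
    print(f"{assistant}: {total} flag(s), most common: {top_issue} ({top_count})")
```

A summary like this could help a QC team decide which issue types to cover in the next training session for a given site team.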

With SurveyCTO, the research team can test surveys and forms, gather feedback, and make changes before launch, thus ensuring that surveys are developed with due consideration of the end users (data collectors).

3. Maintaining a robust, high-quality data bank (by reducing duplication and waste) 

The PI Survey Library allows for direct data publishing from SurveyCTO to Python, where large volumes of data undergo edit checks and reviews. Also, publishing directly from SurveyCTO to Google Sheets helps avoid instances of missing data.

As the volume of 54gene’s research datasets increased across its studies, it became imperative to augment Google Spreadsheet’s capabilities with a tool more accommodating of large volumes of data and proficient in data privacy and encryption features. The Python automation provided by SurveyCTO was the best solution for this. With this feature, 54gene’s research team can review, clean, and analyze large volumes of data, resulting in a robust, high-quality data bank. 
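To give a sense of what an edit check in Python might look like once data has been exported, here is a minimal sketch that scans rows for missing required fields and duplicate participant IDs. The column names and sample rows are hypothetical, and the code works on plain CSV text rather than calling any SurveyCTO interface.

```python
import csv
import io
from collections import Counter

# Hypothetical export; column names and values are illustrative only.
SAMPLE_EXPORT = """participant_id,age,consent_signed
P001,34,yes
P002,,yes
P001,34,yes
P003,51,
"""

REQUIRED_FIELDS = ("participant_id", "age", "consent_signed")


def run_edit_checks(csv_text):
    """Return a list of (row_number, participant_id, issue) tuples."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    id_counts = Counter(row["participant_id"] for row in rows)
    issues = []
    for row_number, row in enumerate(rows, start=1):
        pid = row["participant_id"]
        # Flag any required field that is empty or whitespace only.
        for field in REQUIRED_FIELDS:
            if not (row[field] or "").strip():
                issues.append((row_number, pid, f"missing {field}"))
        # Flag every row whose participant ID appears more than once.
        if pid and id_counts[pid] > 1:
            issues.append((row_number, pid, "duplicate participant_id"))
    return issues


issues = run_edit_checks(SAMPLE_EXPORT)
for issue in issues:
    print(issue)
```

Each issue is keyed by row number and participant ID, so it can be sent back to the responsible data collector for resolution.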

Constraints can be placed on data fields when using SurveyCTO to ensure accurate data entry. The 54gene research team utilizes this feature in their survey development process and has found it helpful in streamlining the data collected by the RAs. 
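The idea behind field constraints can be sketched in a few lines of Python. In SurveyCTO the constraints are set up as part of the form design, not in Python, so the code below only mimics the behavior. The field names, age range, and ID pattern are assumptions made for illustration.

```python
import re

# Hypothetical constraints: field name -> (check function, message on failure).
CONSTRAINTS = {
    "participant_id": (
        lambda v: re.fullmatch(r"P\d{3}", v) is not None,
        "ID must be 'P' followed by three digits",
    ),
    "age": (
        lambda v: v.isdecimal() and 18 <= int(v) <= 120,
        "age must be a whole number from 18 to 120",
    ),
}


def validate_entry(entry):
    """Return a dict of field -> error message for every failed constraint."""
    errors = {}
    for field, (check, message) in CONSTRAINTS.items():
        value = entry.get(field, "").strip()
        if not check(value):
            errors[field] = message
    return errors


good = validate_entry({"participant_id": "P001", "age": "34"})
bad = validate_entry({"participant_id": "54", "age": "200"})
print(good)
print(bad)
```

Rejecting a value at the moment of entry means the data collector can correct it while the participant is still present, rather than leaving the error to be found during later review.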

Aside from its numerous functions, the platform offers a clean and accessible interface that allows for better flexibility, leaving little to no room for error.

“SurveyCTO has a clean and swift interface that is user-friendly and easy to navigate.” 

  • Annie, 54gene Site Quality Assurance Officer. 

Data collection process flow 

The image below depicts the current process flow of 54gene’s research team when using SurveyCTO:

An outline of 54gene’s data collection process flow

Research outcomes using SurveyCTO

Data collection for health research with human participants can be quite challenging, all the more so given the numerous ethical considerations that need to be taken into account.

SurveyCTO has helped 54gene’s research team meet, and sometimes exceed, these ethical standards, enabling them to effectively manage the work done by site teams during data collection and to store data securely in ways that aid data analysis processes. 

Given that SurveyCTO is constantly devising innovative and more efficient tools and workflows for data collection, the 54gene team is assured of cost-effective, seamless, and streamlined approaches to improve their current processes and, ultimately, the quality of their data. 

“SurveyCTO helped us do large-scale retrospective review of our data and enabled us to clear the backlog of unreviewed data. This was possible because we could input the QC parameters into SurveyCTO, and it did the work of reviewing the data.” 

  • Lynda, 54gene Data Quality Control Specialist. 


  1. Choudhury, A., Aron, S., Sengupta, D., Hazelhurst, S., & Ramsay, M. (2018). African genetic diversity provides novel insights into evolutionary history and local adaptations. Human Molecular Genetics, 27(R2), R209-R218.
  2. Fatumo, S., Chikowore, T., Choudhury, A., Ayub, M., Martin, A. R., & Kuchenbaecker, K. (2022). A roadmap to increase diversity in genomic studies. Nature Medicine, 28(2), 243-250.
  3. Petrovski, S., & Goldstein, D. B. (2016). Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine. Genome Biology, 17(1), 1-3.
  4. Sirugo, G., Williams, S. M., & Tishkoff, S. A. (2019). The missing diversity in human genetic studies. Cell, 177(1), 26-31.


Ezekiel Ogundepo, Blessing Bassey, Idris Ogunsola, Lynda Enemali, Okikioluwa Balogun, Jeremiah Oyedemi, Stephen Darkoh, Mobolaji Olabisi, Gift Eseoghene, Ndubuisi Ezumezu, Onome Braimah, Oyindamola Thomas, Aminu Yakubu, Arjun Biddanda, Colm O’Dushlaine, Esha Joshi, Golibe Eze-Echesi, Yemisi Osakwe, Olubukunola Oyedele.

Lindo Simelane

Digital Analytics Associate

Lindo is a part of the marketing team at Dobility, the company that powers SurveyCTO. He is responsible for all aspects of data management and analytics for SurveyCTO’s website, social media, and other marketing efforts.

Lindo has a passion for research to inform policy in international development and economics. He has extensive experience in the ICT4D space that includes research work, and a background in international business and finance.