How Equilibrium scaled remote tutoring sessions by leveraging AI feedback

"SurveyCTO is a crucial platform for Equilibrium's data collection and analysis, particularly remarkable in the remote tutoring project. Its flexibility and substantial storage capacity were instrumental in tracking approximately 200,000 tutoring sessions. The platform also enabled the organized management of tutoring session recordings, linking them to individual tutors. This streamlined process facilitated subsequent LLM analysis to deliver timely and precise feedback. Equilibrium is extremely satisfied with SurveyCTO's performance and features."

Meet Equilibrium | Business, Data, and Communities

Equilibrium BDC is a leading consulting and research firm based in Latin America and the Caribbean. With a team whose expertise lies in technical rigor, they generate evidence that helps clients make strategic, data-backed decisions.

To date, they have completed 200 consultancy projects for clients across 18 countries in the region, spanning sectors such as business, finance, sustainability, health, and education. All of this work aims to generate evidence from the region, for the region.

Offices: Peru, Colombia, Ecuador, Paraguay, Venezuela, and Panama
Sector: Projects across the business and development sectors
Use case: Market research and consulting
Features used: CATI, case management, audio audits, APIs, data encryption, enumerator management, default device configuration

The Challenge: Delivering personalized teaching feedback at national scale

The National Remote Tutoring Program in Paraguay is a pedagogical intervention designed to strengthen foundational math and literacy skills among primary school students. The program specifically targets children aged 9 to 11.

Delivered via low-tech mobile devices to ensure maximum accessibility, the program connects university students acting as tutors with primary schoolers for weekly 20-minute sessions. The curriculum utilizes the “Teaching at the Right Level” (TaRL) methodology, which prioritizes personalized instruction based on the student’s current understanding rather than their grade level, filling essential learning gaps that classroom instruction may have missed.

After a successful proof-of-concept for remote tutoring, which initially served around 1,000 students, this program was massively scaled by the Ministry of Education with support from the Inter-American Development Bank (IADB).

The operation then expanded across the country to serve over 15,000 students. This expansion involved onboarding 500 tutors who, between them, delivered more than 200,000 remote tutoring sessions. The logistical success proved the scalability of the phone-based tutoring mechanism, but simultaneously introduced a new challenge.

The challenge, commonly referred to as a “voltage drop” in the context of social programs, occurs when interventions that are highly effective in controlled pilot programs fail to maintain their efficacy when scaled up nationally.

To measure the efficacy of the remote tutoring program and combat the voltage drop, the IADB brought in Equilibrium to monitor tutor performance, hold tutors accountable for their work, and provide personalized teaching feedback tailored to each student.

The Solution: Leveraging case management, audio audits, and AI for rigorous quality assurance

To overcome these challenges, Equilibrium BDC, together with the Dequeni Foundation, a Paraguayan non-profit dedicated to improving education opportunities for children, created a robust data collection system centered on SurveyCTO.

This system was crucial for two main objectives:

- Maintaining continuous data tracking over time (i.e., longitudinal continuity), and
- Allowing for detailed confirmation (i.e., granular verification) of the services provided.

Equilibrium used SurveyCTO’s case management feature to effectively manage and formally track the relationship between each tutor and student. This integral workflow facilitated the monitoring of student progress, including remaining sessions, and allowed for the assignment of students to specific tutors within a unified platform.

SurveyCTO also captured valuable data from multiple tutoring sessions, which was essential for assessing both tutor and student capabilities.

Specifically, the data collection process used SurveyCTO’s audio audit feature to capture audio for a random sample of sessions, serving as a quality assurance measure. This feature enabled the collection of verifiable evidence of each interaction, establishing a secure audit trail for every session delivered within the national framework. These audio recordings were the primary data source that fed the next step: using AI to provide customized feedback to each tutor.
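The idea of auditing a random sample of sessions can be illustrated with a minimal Python sketch. It is not how SurveyCTO's audio audit feature is configured, and the function name, the 25% rate, and the seed are all assumptions made for illustration. It only shows how a reproducible random sample of session IDs might be drawn.

```python
import random


def sample_sessions_for_audit(session_ids, sample_rate, seed=0):
    """Return a reproducible random subset of session IDs to audit.

    Hypothetical helper: the name, arguments, and seeding scheme are
    illustrative assumptions, not part of SurveyCTO.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Sort first so the same IDs and seed always give the same sample,
    # regardless of the order in which the IDs arrive.
    ordered = sorted(session_ids)
    sample_size = round(len(ordered) * sample_rate)
    rng = random.Random(seed)
    return set(rng.sample(ordered, sample_size))


# Example: audit a quarter of 20 sessions.
sessions = [f"session-{i:02d}" for i in range(20)]
audited = sample_sessions_for_audit(sessions, sample_rate=0.25, seed=42)
print(len(audited))
```

Fixing the seed keeps the sample auditable in its own right: anyone rerunning the draw with the same inputs gets the same set of sessions.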

Experimental innovation: The AI-powered pedagogical feedback loop

Because tutoring sessions took place every 2-3 days, feedback had to arrive quickly enough for tutors to digest it and incorporate it into their next lessons. Equilibrium used the high-fidelity audio data already securely captured and centralized by SurveyCTO to design an automated audio processing pipeline.

This automated pipeline served as a feedback mechanism that delivered results within two days and significantly enhanced the effectiveness of human tutors.
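The flow described here (recording in, transcript, analysis, feedback out, all within two days) can be sketched as follows. This is a simplified illustration under stated assumptions, not Equilibrium's implementation: the `Recording` type, the `process_recording` function, and the three injected steps are hypothetical names, and the real transcription, LLM, and messaging services are replaced by plain callables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

# From the text: feedback reached tutors within two days.
FEEDBACK_WINDOW = timedelta(days=2)


@dataclass
class Recording:
    """Hypothetical record linking one audio file to one tutor."""
    tutor_id: str
    audio_path: str
    recorded_at: datetime


def process_recording(
    recording: Recording,
    transcribe: Callable[[str], str],
    analyze: Callable[[str], str],
    deliver: Callable[[str, str], None],
    now: datetime,
) -> bool:
    """Run one recording through the pipeline.

    Returns True if the feedback went out inside the two-day window.
    The three steps are passed in as callables (assumptions), so real
    services can be swapped in without changing the flow.
    """
    transcript = transcribe(recording.audio_path)
    feedback = analyze(transcript)
    deliver(recording.tutor_id, feedback)
    return now - recording.recorded_at <= FEEDBACK_WINDOW


# Example with stand-in steps instead of real services.
outbox = []
rec = Recording("tutor-17", "session-001.wav", datetime(2024, 5, 6, 10, 0))
on_time = process_recording(
    rec,
    transcribe=lambda path: f"transcript of {path}",
    analyze=lambda text: f"feedback based on {text}",
    deliver=lambda tutor_id, message: outbox.append((tutor_id, message)),
    now=datetime(2024, 5, 7, 10, 0),
)
print(on_time, outbox)
```

Passing the steps in as arguments keeps the sketch testable and avoids assuming any particular vendor API.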

Equilibrium prompted Google’s Gemini LLM to act as an expert educational consultant, analyzing transcripts of math tutoring sessions for primary school students (grades 4-6).

The AI evaluated the tutor’s teaching effectiveness by scrutinizing two core areas:

1) Instructional clarity (structure, accuracy, and the use of analogies) and

2) Student engagement (verification of understanding and interactivity)

Based on this analysis, Equilibrium’s prompt instructed the model to generate a set of direct, actionable, and encouraging recommendations—such as simplifying abstract concepts or increasing checking-for-understanding questions—designed to guide the tutor toward immediate professional growth.
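To make the structure of such a prompt concrete, here is a minimal sketch that assembles one from the two evaluation areas listed above. The wording is an assumption reconstructed from this description, not Equilibrium's actual prompt, and `RUBRIC` and `build_feedback_prompt` are hypothetical names. The sketch only builds the text; sending it to a model is left out.

```python
# Evaluation areas and criteria as described in the text above.
RUBRIC = {
    "Instructional clarity": ["structure", "accuracy", "use of analogies"],
    "Student engagement": ["verification of understanding", "interactivity"],
}


def build_feedback_prompt(transcript: str) -> str:
    """Assemble an illustrative evaluation prompt for one session transcript.

    The phrasing is assumed for illustration; the actual prompt used in
    the program is not published in this article.
    """
    lines = [
        "You are an expert educational consultant.",
        "Analyze this transcript of a math tutoring session "
        "for a primary school student (grades 4-6).",
        "Evaluate the tutor's teaching effectiveness on:",
    ]
    for number, (area, criteria) in enumerate(RUBRIC.items(), start=1):
        lines.append(f"{number}) {area}: {', '.join(criteria)}")
    lines.append(
        "Then give direct, actionable, and encouraging recommendations "
        "for the tutor."
    )
    lines.append("")
    lines.append("Transcript:")
    lines.append(transcript.strip())
    return "\n".join(lines)


print(build_feedback_prompt("Tutor: Today we divide 24 by 8..."))
```

Keeping the rubric in a separate structure means the criteria can be revised without rewriting the rest of the prompt.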

The Results: Transforming data collection into a scalable, proactive education program

The resulting feedback mechanism consisted of a double loop as follows:

General feedback for tutors

Every week, the AI model evaluated each tutor’s overall pedagogical approach. By analyzing their interactions with students, it generated hyper-personalized feedback and tailored recommendations, which were delivered automatically via WhatsApp.

- Example from the field: In a math session covering division, the system detected that a tutor was rushing through the explanation.
- The AI suggested a concrete way to make the concept stick: “Instead of just saying ‘divide by 8’, try using a pizza analogy: ‘If we cut a pizza into 8 slices for your friends, how many slices does each person get?’ This makes the abstract math real.”
- This type of feedback gave the tutor a specific, relatable script for future use.

Targeted student feedback

The system went beyond evaluating tutor performance to identifying students struggling with specific conceptual gaps or showing signs of confusion. It provided a diagnostic analysis of student comprehension and offered instructional guidance to the tutor.

- Example from the field: For a student named Liam who was struggling with the concept of regrouping (often called “carrying” or “borrowing”) in addition, the AI generated a specific remediation plan.
- It advised the tutor to “practice simple addition problems that don’t require regrouping first to build confidence,” and then to “introduce regrouping by explaining that when a room (the ones place) gets too full, ten ones have to move next door to the tens room.”
- This transformed the tutor from a generalist into a data-informed specialist with a clear roadmap for Liam’s next lesson.

These interventions do more than just solve a math problem. They represent a fundamental shift in how the program manages educational risk. The integration of this automated feedback loop into the SurveyCTO infrastructure offers three profound implications for education policy, particularly in developing contexts:

First, by automating the detection of learning difficulties at the individual level, this system acts as an early warning mechanism. It allows the policy to shift from remedial (fixing problems after failure) to preventative (addressing confusion soon after it arises). This ensures that the most vulnerable students receive the targeted support necessary to keep pace with their peers.
Second, the initial program using low-tech mobile phones demonstrated strong cost-effectiveness and scalability by integrating technology with tutor monitoring and training. The AI intervention took this several steps further by enabling national scale: the AI-powered feedback loop automated tutor supervision and, by offering personalized coaching, allowed each tutor to teach significantly more students without sacrificing quality. This operational efficiency substantially lowers the marginal cost of scaling, which makes high-quality, personalized tutoring a financially viable solution for national education systems.
Lastly, while “voltage drop” is a concern when scaling programs this far, a robust data infrastructure can mitigate its effects. With SurveyCTO, Equilibrium collected both structured and unstructured longitudinal data, creating a data architecture that goes beyond a simple administrative record and serves as the foundation for continuous improvement, quality, and accountability.

"Our challenge was converting hundreds of hours of audio into actionable data. This workflow solved that by treating every session as a data point for improvement. The system automatically ingests the SurveyCTO recordings, transcribes the interaction, and uses Gemini to act as a 'master teacher,' identifying exactly where a student got stuck, but also, where the tutor needed improvement. The ability to push that specific feedback directly to the tutor’s WhatsApp—right before their next call—changes the game from monitoring compliance to ensuring quality at scale."