Testing voice AI agents with DSPy signatures and auto-healing graphs

Tuesday, February 03, 2026

Platforms like Retell, VAPI, and LiveKit have made it straightforward to build phone-based AI assistants. But testing these agents before they talk to real customers remains painful: platform-specific formats, per-minute simulation charges, and no way to iterate on prompts without bleeding money.

voicetest is a test harness that solves this by running agent simulations with your own LLM keys. But beneath the surface, it’s also a proving ground for something more interesting: auto-healing agent graphs that recover from test failures and optimize themselves using JIT synthetic data.

voicetest CLI demo

The interactive shell loads agents, configures models, and runs test simulations against DSPy-based judges.

The architecture: AgentGraph as IR

All platform formats (Retell CF, Retell LLM, VAPI, Bland, LiveKit, XLSForm) compile down to a unified intermediate representation called AgentGraph:

class AgentGraph:
    nodes: dict[str, AgentNode]      # State-specific nodes
    entry_node_id: str               # Starting node
    source_type: str                 # Import source
    source_metadata: dict            # Platform-specific data
    default_model: str | None        # Model from import

class AgentNode:
    id: str
    state_prompt: str                # State-specific instructions
    tools: list[ToolDefinition]      # Available actions
    transitions: list[Transition]    # Edges to other states

This IR enables cross-platform testing and format conversion as a side effect. Import a Retell agent, test it, export to VAPI format. But more importantly, it gives us a structure we can reason about programmatically.
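For illustration, a minimal two-node graph might be assembled like this (keyword construction and the exact Transition and ToolDefinition fields are assumptions, not voicetest's confirmed API):

# Hypothetical two-node graph; field names follow the IR above, but the
# constructors and the Transition/ToolDefinition fields are illustrative.
greet = AgentNode(
    id="greet",
    state_prompt="Greet the caller and ask how you can help.",
    tools=[],
    transitions=[Transition(target="book", condition="caller wants an appointment")],
)
book = AgentNode(
    id="book",
    state_prompt="Collect a preferred date and time, then confirm the booking.",
    tools=[ToolDefinition(name="check_availability")],
    transitions=[],
)
graph = AgentGraph(
    nodes={"greet": greet, "book": book},
    entry_node_id="greet",
    source_type="manual",
    source_metadata={},
    default_model=None,
)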

DSPy signatures for structured LLM calls

Every LLM interaction in voicetest goes through DSPy signatures. This isn’t just for cleaner code—it’s the foundation for prompt optimization.

The MetricJudgeSignature handles LLM-as-judge evaluation:

class MetricJudgeSignature(dspy.Signature):
    transcript: str = dspy.InputField()
    criterion: str = dspy.InputField()
    # Outputs
    score: float = dspy.OutputField()        # 0-1 continuous score
    reasoning: str = dspy.OutputField()      # Explanation
    confidence: float = dspy.OutputField()   # 0-1 confidence

Continuous scores (not binary pass/fail) are critical. A 0.65 and a 0.35 both “fail” a 0.7 threshold, but they represent very different agent behaviors. This granularity becomes training signal later.

The UserSimSignature generates realistic caller behavior:

class UserSimSignature(dspy.Signature):
    persona: str = dspy.InputField()            # Identity/Goal/Personality
    conversation_history: str = dspy.InputField()
    current_agent_message: str = dspy.InputField()
    turn_number: int = dspy.InputField()
    # Outputs
    should_continue: bool = dspy.OutputField()
    message: str = dspy.OutputField()
    reasoning: str = dspy.OutputField()

Each graph node gets its own StateModule registered as a DSPy submodule:

class ConversationModule(dspy.Module):
    def __init__(self, graph: AgentGraph):
        super().__init__()
        self._state_modules: dict[str, StateModule] = {}
        for node_id, node in graph.nodes.items():
            state_module = StateModule(node, graph)
            setattr(self, f"state_{node_id}", state_module)
            self._state_modules[node_id] = state_module

This structure means the entire agent graph is a single optimizable DSPy module. We can apply BootstrapFewShot or MIPROv2 to tune state transitions and response generation.
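As a sketch of what that could look like (the judge wrapper, the 0.7 threshold, the prediction's transcript attribute, and the trainset construction are assumptions, not voicetest's confirmed API):

import dspy
from dspy.teleprompt import BootstrapFewShot

# metric_judge wraps the MetricJudgeSignature shown earlier.
metric_judge = dspy.Predict(MetricJudgeSignature)

def judge_metric(example, prediction, trace=None):
    result = metric_judge(transcript=prediction.transcript,
                          criterion=example.criterion)
    # Bootstrapping wants a bool; plain evaluation can use the raw 0-1 score.
    return result.score >= 0.7 if trace is not None else result.score

trainset = [dspy.Example(criterion="The agent confirms the caller's appointment")
            .with_inputs("criterion")]           # illustrative example set

module = ConversationModule(graph)               # graph: an imported AgentGraph
optimizer = BootstrapFewShot(metric=judge_metric)
compiled = optimizer.compile(module, trainset=trainset)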


Auto-healing the agent graph on test failures (coming soon)

When a test fails, the interesting question is: what should change? The failure might indicate a node prompt needs tweaking, or that the graph structure itself is wrong for the conversation flow.

The planned approach:

  1. Failure analysis: Parse the transcript and judge output to identify where the agent went wrong. Was it a bad response in a specific state? A transition that fired incorrectly? A missing edge case?

  2. Mutation proposals: Based on the failure mode, generate candidate fixes. For prompt issues, suggest revised state prompts. For structural problems, propose adding/removing transitions or splitting nodes.

  3. Validation: Run the mutation against the failing test plus a regression suite. Only accept changes that fix the failure without breaking existing behavior.

This isn’t implemented yet, but the infrastructure is there: the AgentGraph IR makes mutations straightforward, and the continuous metric scores give us a fitness function for evaluating changes.
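The shape of the loop falls out of those three steps. In this sketch the analyze_failure, propose_mutations, and run_test helpers are hypothetical:

# Planned shape only -- none of these helpers exist in voicetest yet.
def heal(graph, failing_test, regression_suite):
    failure = analyze_failure(failing_test.transcript, failing_test.judge_output)
    for candidate in propose_mutations(graph, failure):   # prompt edits or structural changes
        if not run_test(candidate, failing_test).passed:
            continue                                      # didn't fix the original failure
        if all(run_test(candidate, t).passed for t in regression_suite):
            return candidate                              # fixes the failure, breaks nothing
    return graph                                          # no safe mutation found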

JIT synthetic data for optimization

DSPy optimizers like MIPROv2 need training examples. For voice agents, we generate these on demand:

  1. Test case expansion: Each test case defines a persona and success criteria. We use the UserSimSignature to generate variations—different phrasings, edge cases, adversarial inputs.

  2. Trajectory mining: Successful test runs become positive examples. Failed runs (with partial transcripts) become negative examples with the failure point annotated.

  3. Score-based filtering: Because metrics produce continuous scores, we can select examples near decision boundaries (scores around the threshold) for maximum training signal.

The current implementation has the infrastructure:

# Mock data generation for testing the optimization pipeline
simulator._mock_responses = [
    SimulatorResponse(message="Hello, I need help.", should_end=False),
    SimulatorResponse(message="Thanks, that's helpful.", should_end=False),
    SimulatorResponse(message="", should_end=True),
]
metric_judge._mock_results = [
    MetricResult(metric=m, score=0.9, passed=True)
    for m in test_case.metrics
]

The production version will generate real synthetic data by sampling from the UserSimSignature with temperature variations and persona mutations.
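The score-based filtering in step 3, for instance, reduces to sorting by distance from the threshold; this sketch assumes run records that carry their judge score:

# Keep the k runs whose judge scores sit closest to the pass/fail
# threshold -- near-boundary examples carry the most training signal.
def boundary_examples(runs, threshold=0.7, k=20):
    return sorted(runs, key=lambda r: abs(r.score - threshold))[:k]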

Judgment pipeline

Three judges evaluate each transcript:

Rule Judge (deterministic, zero cost): substring includes/excludes, regex patterns. Fast pre-filter for obvious failures.
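A deterministic pre-filter along these lines costs nothing per call; the rule fields here are illustrative, not voicetest's actual schema:

import re

# Zero-cost rule check: substring includes/excludes plus a regex pattern.
def rule_judge(transcript: str, rule) -> bool:
    if rule.must_include and rule.must_include not in transcript:
        return False
    if rule.must_exclude and rule.must_exclude in transcript:
        return False
    if rule.pattern and not re.search(rule.pattern, transcript):
        return False
    return True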

Metric Judge (LLM-based, semantic): evaluates each criterion with continuous scores. Per-metric threshold overrides enable fine-grained control. Global metrics (like HIPAA compliance) run on every test automatically.

Flow Judge (optional, informational): validates that node transitions made logical sense given the conversation. Uses the FlowValidationSignature:

class FlowValidationSignature(dspy.Signature):
    graph_structure: str = dspy.InputField()
    transcript: str = dspy.InputField()
    nodes_visited: list[str] = dspy.InputField()
    # Outputs
    flow_valid: bool = dspy.OutputField()
    issues: list[str] = dspy.OutputField()
    reasoning: str = dspy.OutputField()

Flow issues don’t fail tests but get tracked for debugging. A pattern of flow anomalies suggests the graph structure itself needs attention.

voicetest web UI

The web UI visualizes agent graphs, manages test cases, and streams transcripts in real-time during execution.

CI/CD integration

Voice agents break in subtle ways. A prompt change that improves one scenario can regress another. voicetest runs in GitHub Actions:

name: Voice Agent Tests
on:
  push:
    paths: ["agents/**"]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv tool install voicetest
      - run: voicetest run --agent agents/receptionist.json --tests agents/tests.json --all
        env:
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

Results persist to DuckDB, enabling queries across test history:

SELECT
    agents.name,
    COUNT(*) as total_runs,
    AVG(CASE WHEN results.passed THEN 1.0 ELSE 0.0 END) as pass_rate
FROM results
JOIN runs ON results.run_id = runs.id
JOIN agents ON runs.agent_id = agents.id
GROUP BY agents.name

What’s next

The current release handles the testing workflow: import agents, run simulations, evaluate with LLM judges, integrate with CI. The auto-healing and optimization features are in POC stage.

The roadmap:

  • v0.3: JIT synthetic data generation from test case personas
  • v0.4: DSPy optimization integration (MIPROv2 for state prompts)
  • v0.5: Auto-healing graph mutations with regression protection

To try it:

uv tool install voicetest
voicetest demo --serve

Code at github.com/voicetestdev/voicetest. API docs at voicetest.dev/api. Apache 2.0 licensed.


wt: A Git Worktree Orchestrator for Parallel AI Agent Development

Monday, January 26, 2026

Concurrent AI agent development introduces coordination problems when multiple agents modify the same repository. File conflicts occur when agents operate on overlapping code paths, and resolving these conflicts consumes agent context that would otherwise be applied to the task.

Git worktrees provide filesystem isolation while sharing repository state—each worktree maintains an independent working directory and branch reference without duplicating the object database. However, the native git worktree interface requires verbose commands, manual cleanup, and lacks primitives for managing multiple concurrent sessions.

wt is a CLI tool that addresses these limitations.

wt demo showing workspace creation and navigation

Overview

wt wraps git worktree operations in an interface designed for multi-agent workflows:

# Create an isolated workspace
wt new feature/auth

# Spawns a subshell with modified working directory and branch
# Shell prompt reflects workspace context
(wt:feature/auth) $ claude

# On completion, exit subshell and merge
exit
git merge feature/auth

Workspaces provide complete filesystem isolation. Each agent operates on an independent working directory, and changes integrate via standard git merge or rebase operations.

Workspace Management

wt provides both interactive and direct workspace access:

wt ls                  # Interactive picker (TUI) for workspace selection
wt use feature/auth    # Enter a specific workspace directly
wt which               # Display current workspace context
wt rm feature/auth     # Remove worktree and optionally delete branch

Tmux Session Management

wt integrates with tmux to coordinate multiple concurrent agent processes:

wt session new my-sprint
wt session add feature/auth
wt session add bugfix/header
wt session add feature/payments

Each workspace maps to a tmux window with a configurable pane layout: 2-pane (agent CLI + terminal) or 3-pane (agent + terminal + editor).

The wt session watch command displays agent activity state by monitoring pane output:

● feature/auth      (running)
○ bugfix/header     (idle)
● feature/payments  (running)

Activity detection uses output buffer analysis to distinguish active processing from prompt-waiting states.
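wt itself is Rust, but the idea fits in a few lines of Python: capture each pane's buffer and treat a changing hash between polls as activity (the pane target and poll interval are illustrative):

import hashlib
import subprocess
import time

# A pane whose captured output changes between polls is "running";
# a stable buffer means the agent is idle at a prompt.
def pane_state(target: str, interval: float = 2.0) -> str:
    def snapshot() -> str:
        out = subprocess.run(["tmux", "capture-pane", "-p", "-t", target],
                             capture_output=True, text=True).stdout
        return hashlib.sha256(out.encode()).hexdigest()
    before = snapshot()
    time.sleep(interval)
    return "running" if snapshot() != before else "idle"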

Claude Code Integration

wt includes a /do skill for Claude Code that implements issue-driven workflows:

/do gh 123          # GitHub issue #123
/do sc 45678        # Shortcut story 45678

The skill executes the following operations:

  1. Fetch issue/story metadata from the respective API
  2. Create a worktree with a branch name derived from the issue identifier
  3. Populate agent context with issue description and acceptance criteria
  4. On task completion, create a commit with the issue reference in the message

Configuration

Configuration follows a precedence chain: CLI flags → repo-local .wt.toml → global ~/.wt/config.toml → defaults.

# .wt.toml (repository root)
panes = 3
agent_cmd = "claude"
editor_cmd = "nvim"

Files listed under a # wt copy marker in .gitignore are symlinked from the main worktree into each spawned worktree:

# wt copy
.env
.env.local
credentials.json

This mechanism handles environment files and credentials that are gitignored but required at runtime.
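In Python-flavored pseudocode the mechanism amounts to scanning .gitignore for the marker and linking each listed file into the new worktree (wt's actual Rust implementation may differ in the details):

from pathlib import Path

# Symlink every file listed after the "# wt copy" marker in .gitignore
# from the main worktree into a freshly created worktree.
def link_copy_files(main: Path, worktree: Path) -> None:
    active = False
    for line in (main / ".gitignore").read_text().splitlines():
        entry = line.strip()
        if entry == "# wt copy":
            active = True
        elif active and entry and not entry.startswith("#"):
            src = main / entry
            if src.exists():
                (worktree / entry).symlink_to(src)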

Implementation

wt is implemented in Rust. The crate structure:

  • WorktreeManager: Wraps `git worktree` subcommands, maintains branch-to-directory mappings, handles cleanup of stale worktrees
  • TmuxManager: Interfaces with tmux via `tmux` CLI commands, monitors pane output buffers for activity detection
  • ShellEnvironment: Spawns subshells (bash, zsh, fish) with modified `$PWD` and environment variables reflecting workspace state
  • SessionState: Persists session metadata to JSON, reconciles with actual tmux state on load

Branch names containing forward slashes require path sanitization. Git permits feature/auth as a branch name, but using this directly as a directory path would create nested directories. wt normalizes slashes to double-dashes in directory names (feature--auth) while preserving the git ref name.
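A Python sketch of the same rule (wt implements this in Rust):

# Slashes in the git ref become double dashes in the worktree directory;
# the git ref itself is left untouched.
def worktree_dir_name(branch: str) -> str:
    return branch.replace("/", "--")

assert worktree_dir_name("feature/auth") == "feature--auth"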

Installation

Pre-compiled binaries are available for macOS and Linux (x86_64, aarch64). Building from source:

git clone https://github.com/pld/wt
cd wt
cargo build --release

The binary includes an install subcommand that configures shell aliases and installs the Claude Code skill definition.

Discussion

Single-agent workflows operate adequately with standard git branching. Multi-agent parallelism introduces coordination overhead that scales with agent count: conflict resolution, context switching between agent sessions, and manual worktree lifecycle management.

wt reduces this overhead by mapping each agent to an isolated worktree and providing primitives for session orchestration. The tradeoff is an additional abstraction layer over git; the benefit is that agents operate independently until explicit merge points.

Source: github.com/pld/wt


Building a macOS UI for Ollama GenAI models in Rust

Friday, March 07, 2025

As we integrate AI dev tools into our teams, I experimented by writing a desktop app in a language I had never used, relying on LLM code generation. I primarily used Claude via web and API through aider, with some help on more generic Rust questions from Gemini Advanced.

iced LLM UI in Rust

The main areas for human intervention were:

  • tweaking imports and syntax that weren’t worth the tokens,
  • removing template code to get it to take user input,
  • fixing the eventing to get messages sent from the UI to the local Ollama API,
  • properly sharing state and using channels for realtime API-to-UI updates.

For some of these I’d manually intervene to get it on the right path then have it clean up. For others it probably would have figured it out with enough prompting direction, but it was easier for me to write the code than the prompts.

I have not tried it, but if you’re looking for a proper macOS LLM GUI, Enchanted seems reasonable, though its last commit being 4 months ago at the time of writing is a concern, and it is written in Swift. I chose Rust with the goal of cross-platform support, and I’m not sure I will maintain this. See the code for a macOS UI for Ollama GenAI models in Rust.


Team-based care with OpenSRP 2

Tuesday, March 19, 2024

Introduction to team-based care

Team-based care is a model of health care delivery in which health professionals come together to provide comprehensive and coordinated care for a patient. Team-based care aims to improve the quality and continuity of care, increase efficiency, and reduce administrative burden by sharing health records and patient-centric workflows among health practitioners. In 2022, the Patrick J. McGovern Foundation awarded Ona a grant to build a suite of FHIR-native digital health tools to accelerate team-based care in Indonesia. Led by the Summit Institute for Development (SID), which also received a grant from Grand Challenges Canada for team-based care, Ona has developed OpenSRP apps for midwives (BidanKu app), vaccinators (VaksinatorKu app), and community health workers (CHWs - KaderKu app)—using a standardized data model and shared database—to provide coordinated care for patients in communities and facilities across Indonesia.

Vision behind team-based care

For over a decade, SID and Ona have been applying technology to improve health care outcomes through the digitization of patient data and health care workflows, the integration of diagnostics and laboratory information systems, and the implementation / publication of impactful research and training methodologies. One pillar of our collaboration is to build out the plug-and-play suite of platforms and integrations needed to bring patient-centered team-based care to the communities where it can most significantly reduce mortality and morbidity. Rebuilding the OpenSRP product has allowed us to flexibly deliver apps purpose-built for the health worker teams that SID works alongside in Indonesia.

Collaboration between midwife, vaccinator, and CHW is essential to ensure patients receive high-quality care. Midwives play a crucial role in providing antenatal care to pregnant mothers, in part by monitoring the health of the mother and baby during pregnancy. The midwife also assists the pregnant mother in delivery and provides postnatal care to both the mother and newborn. The vaccinator is responsible for administering the correct vaccinations to people from 0 to 18 years of age, following the guidelines defined by the Indonesian Pediatric Society and required by the Indonesian Ministry of Health. The CHW is a community-based volunteer who cares for children, pregnant women, teenagers and the elderly. They provide education and counseling on nutrition, remind people to visit the Posyandu (community-based integrated health posts held one day per month), and provide complementary food for pregnant women based on their nutritional status and the village policy.

Figure 1: Midwives and vaccinators are assigned at the village level and responsible for patients in all the sub-villages. CHWs are assigned at the sub-village level and responsible for patients in a particular sub-village.

These frontline health care workers collaborate through:

  • Information sharing: Midwives, vaccinators, and CHWs share information about pregnancies, births and immunizations in assigned locations.
  • Referrals: CHWs can refer patients (e.g. those needing ANC care, PNC follow up, assistance during delivery, or seeking follow up for minor illnesses) to midwives, and those seeking immunizations to a vaccinator or midwife.

Figure 1 shows how different healthcare workers are assigned at different levels in the administrative hierarchy, with access to different groups of patients in their respective villages or catchment areas.

Some of the challenges reported by health workers, government supervisors, and patients, when using the current health information system in Indonesia, include:

  • Data quality challenges, such as data clumping, heaping, and other health record inaccuracies;
  • Limited and slow access to health records, leading to delayed or unavailable data analysis;
  • Non-synchronized data due to limited internet connection;
  • When using paper records, risk of damage or loss of the records themselves;
  • Limited or no ability to share data and communicate between systems;
  • Data is used for reporting only, not for providing feedback to frontline workers, supervisors, and community members, nor for taking action in real time;
  • Difficulty managing patients at scale. Health workers need to manage thousands of patients and schedule visits manually on paper, which has led to missed appointments for essential health services.

Figure 2: The left panel shows the main menu in the BidanKu app, where midwives can select the particular set of clients they are interested in viewing. The right panel shows the patient view for a synthetic patient and the dropdown menu with actions to take for that synthetic patient.

To address these challenges, SID and Ona developed three FHIR-native OpenSRP 2 apps that enable team-based care between three health care workers, later expanding to additional front line health care workers. The initial three apps are:

  • BidanKu for midwives to use in primary health care centers and during Posyandu,
  • VaksinatorKu for vaccinators to use during Posyandu and service delivery at primary care facilities (Puskesmas), and
  • KaderKu for community health workers (CHWs) to use during Posyandu and in household visits.

Figure 3 shows how the three apps are used in overlapping locations that are part of the patient’s care journey.

Figure 3: Midwives, vaccinators, and CHWs operate in overlapping locations, making information sharing essential to providing efficient patient-centered care.

Shared data = patient centered care

The three apps — BidanKu, VaksinatorKu, and KaderKu — all synchronize their transactional health data, such as patient records, and their health workflow data with the same health data store. This ensures the most up-to-date data is always synced, so any health care worker providing services to a patient can view the latest information on that patient.

Figure 4: FHIR-native components communicate directly with the health data store, which can be used to store shared health records (SHRs) as well as patient, facility, and other registries. These systems can use various intermediaries to communicate with additional external systems, both apps and clinical electronic medical record systems (EMRs).

The health data store can integrate with direct to client apps, through web apps, smartphone apps, or messaging platforms, like WhatsApp. For example, caregivers can fill in the details of sick child assessment forms from their phones at home, and then a midwife and CHW can access those assessments in OpenSRP 2 to view their symptoms, a preliminary diagnosis, and follow-up actions.

OpenSRP 2 is a platform for team-based apps

Each of these apps achieves its unique user-based functionality through configurations written in standards-based FHIR files. This gives us a well-defined separation between the platform (OpenSRP 2) that runs the apps and the apps themselves. This allows SID, Ona, and future collaborators to continue working together to build additional apps tailored to more health care workers and related service providers, all of whom are essential to delivering unified team-based care at the community level.

The other unique separation of responsibilities, which we have implemented through OpenSRP 2 and demonstrated in this work, is between the apps and where health information is stored. Using an off-the-shelf FHIR store for health system data allows seamless interoperability between the apps and external systems that understand FHIR. This lets the apps function as different “views” of the data, instead of silos holding health vertical or worker specific information.

Figure 5: Reference workflow content is combined with reference worker content and the latest app features to produce an app with the correct health workflow and worker content for the Indonesian context, while producing generic and reusable materials.

For at least the past decade, team-based care has been widely discussed in digital health, but to our knowledge it has never been deployed in a global health context supported by smartphone apps. Collaborating closely with SID, an experienced institution for health program implementation, as well as with the health care workforce members delivering care, has allowed us to ground the features required for any team-based care platform in reality. Building on top of a standards-based platform has allowed us to define the features and configurations required for team-based care in a way that goes beyond the specific location, health care workers, and health care workflows we are currently targeting.

Written in collaboration with:

Yuni Dwi Setiyawati CEO, Summit Institute for Development

Edward Sutanto DPhil Student, Nuffield Department of Medicine, University of Oxford

Anuraj Shankar Research fellow, Nuffield Department of Medicine, University of Oxford


Using AI to estimate and target immunization coverage with True Cover

Monday, March 04, 2024

Ona created True Cover, a methodology and set of tools that combine high-resolution satellite imagery, spatial sample modeling methods, and lightweight mobile data collection to enable health facility staff to more accurately calculate coverage for their catchment area. The type of coverage is flexible — it can be the coverage of up-to-date and fully immunized children or adults, mothers receiving full antenatal care, the distribution of prophylactic malaria medications, or any other metric. This modeling technique lets health implementers use AI to target their services towards the poorly covered areas and households in their community that can benefit the most from increases in coverage.

Developing the True Cover algorithm

To develop True Cover, we talked to statistical sampling experts and spatial modelers in order to explore different approaches to this problem. We initially planned to use random cluster spatial sampling, an approach commonly used to fill in missing spatial data. In a previous partnership with a university research team, we had successfully used random cluster spatial sampling to conduct a national water point verification exercise with the government of Tanzania. However, while powerful in the hands of trained statisticians, this approach requires considerable expertise to interpret, and discipline in what data to collect and how to use it. Additionally, using data only from sample and replacement points limits the ability to benefit from the full set of coverage data collected.

Figure 1: The True Cover algorithm chooses samples from an area of interest to collect data against, then based on the output model’s predicted accuracy recommends additional points to sample.

We expect that many local health teams would struggle to follow the rigor such an approach requires, and wanted to find an approach that let us leverage as much of the existing data as possible. In discussions with Dr. Hugh Sturrock, an epidemiologist at UCSF who we are collaborating with on malaria elimination work, we realized we could leverage a set of spatial modeling approaches he developed to help predict the prevalence of neglected tropical diseases (NTDs) and malaria in populations. Following these discussions, we partnered with Dr. Sturrock to develop, test, and create the True Cover algorithms and build a web service around them.

The True Cover algorithm works by taking as input observations of service coverage at specific spatial locations (defined by a longitude and latitude tuple) and then predicting the coverage for that service across all structural points of interest that we expect people to live in within the target area. In the case of a local community, these are the locations of household structures. Advances in the availability of high-resolution satellite imagery and machine learning based feature identification methods have made household structure locations available at the global scale. For example, in 2019 the Microsoft Bing team released 18 million building footprints in Uganda and Tanzania [1]. Over time we can expect to know where all people in a community live via satellite imagery, establishing an accurate and complete population to model and predict across. To make True Cover successful, we are bringing these two innovations together, then packaging and operationalizing them for use in the communities that can benefit most from them.

Figure 2: Given a set of unsampled points of interest (e.g. households) and a chosen geographical area (i.e. catchment area), the left panel shows all the households in that catchment area tagged for analysis. The right panel shows the True Cover algorithm selecting the set of households to sample, which it predicts will lead to the highest increase in prediction accuracy.

The resulting steps of the True Cover algorithm are:

  1. GIVEN high resolution satellite imagery, OUTPUT household labels for where community members live,
  2. GIVEN household labels AND optional known data on households AND an adaptive spatial sampling method, OUTPUT a spatially distributed sample of households to collect data from,
  3. GIVEN coverage data on household labels, OUTPUT estimates and confidence for all households,
  4. GOTO STEP 2.
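In code-shaped pseudocode, steps 2 through 4 form a loop; every function named below is illustrative rather than the actual True Cover API:

# Sample where the model is least confident, collect observations,
# refit, and repeat until overall confidence is acceptable.
def true_cover_loop(households, observations, budget_per_round, min_confidence):
    model = fit_spatial_model(households, observations)
    while model.mean_confidence() < min_confidence:
        targets = model.least_confident(households, k=budget_per_round)
        observations += collect_field_data(targets)    # health worker visits
        model = fit_spatial_model(households, observations)
    return model.predict(households)                   # coverage estimate per household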

We then developed a task-based data collection approach, on top of OpenSRP, to allow health workers to download and view all structures on a map and collect data on the structures they are assigned to visit. Using the OpenSRP app they can easily navigate to the house they are assigned to — indicated by color-coding the structure green — and collect data related to coverage. In the case of immunization, this is the number of children at the household who are up-to-date on their immunizations or fully immunized.

Figure 3: The OpenSRP 2 app shows POIs overlayed on high-resolution satellite imagery. OpenSRP 2 color-codes POIs to guide the user toward where to collect data and updates the color-coding based on users’ completed forms linked to POIs. All OpenSRP 2 configurations and data are FHIR-native.

The collected observations are fed into True Cover, which combines these new observations with previously collected observations to update the predicted coverage across all structures in the community, both those with and without observations. When visualized, this produces an intuitive coverage map that makes it easy to identify problem areas. Because True Cover is an active learning model, the accuracy of its predictions improves as more observations are added.

Figure 4: Left panel shows structures color-coded by True Cover’s prediction of the likelihood of household members being fully immunized (coverage). Right panel shows those same structures color-coded by True Cover’s confidence in its coverage estimates.

One of True Cover’s key features is its ability to calculate the confidence of each prediction it makes. The right panel in Figure 4 shows both the areas where the model is able to predict with high confidence, and the areas where more data is needed to improve the prediction. The model’s confidence scores are an input to the adaptive sampling approach, which selects the locations to collect data from that will provide the greatest increase to the model’s overall predictive accuracy. This adaptive sampling approach is more efficient than random sampling because it requires less data collection to achieve an equivalent improvement in accuracy. Less data collection leads to greater impact, e.g. increased vaccination coverage rates, with less time and money spent.

Validating True Cover

Our original goal was to field test the full True Cover methodology in a pilot site. However, we were unfortunately unable to do so due to delays in project site approval and the reprioritization of community efforts towards COVID-19 prevention through testing and vaccination. As a result, we found opportunities to validate the model across a number of different use cases with projects that could make their existing data available to us.

Figure 5: Left panel shows the BCG vaccination coverage estimates in Uganda generated by True Cover. Right panel shows confidence in BCG vaccination coverage estimates.

Using publicly available Bacillus Calmette–Guérin (BCG) data we were able to generate a BCG coverage and confidence map for Uganda. This map shows lower coverage in south-western Uganda up to the border, and generally lower confidence in the predictions along borders. This aligns with demographic data showing that wealth is more concentrated in larger cities, such as Kampala in central Uganda, and with the sparser data available near the borders.

Working with a research partner in Bangladesh, we were able to use True Cover to predict antenatal care coverage rates, which interestingly were found to correspond with maternal mid-upper arm circumference (MUAC) classifications from a different data source. This also shows the expected pattern of lower coverage rates closer to the Brahmaputra river, where people are more economically disadvantaged.

Figure 6: Left panel shows True Cover’s prediction of antenatal care coverage in the area studied in Bangladesh. Right panel shows the average maternal MUAC score aggregated by sector (the finest-grained administrative boundary) in the same area.

Why current approaches to measuring coverage fail

We set out to develop a methodology and set of supporting analytical and data collection tools that enable health teams to better visualize service coverage (e.g. immunization coverage) at the community level. Our goal with True Cover is to enable communities to develop strategies to improve health equity by helping them better understand where problems in their community lie.

True Cover addresses some of the challenges inherent in measuring and estimating coverage at the community level. Most methods of estimating service coverage rely on denominators based on target populations. As coverage rises, however, coverage estimates become increasingly sensitive to errors in target population estimates [2]. For example, a 10% error in the target population estimate at 90% coverage would lead to a coverage range from 81% to 99% (because 10% of 90% equals 9%), whereas the corresponding error range when estimating at 50% coverage would be from 45% to 55%. Additionally, population estimates are largely based on five-to-ten-year-old census data and are often incorrect, sampled at too high an administrative level (e.g. district level), and sensitive to factors like human migration, which is more prevalent in at-risk and marginalized communities.

To circumvent these denominator challenges, demographic and health surveys (DHS) and multiple indicator cluster surveys (MICS) are used. These methods are very expensive and are only able to provide an accurate snapshot of coverage once every several years. These surveys are sampled to provide accurate coverage estimates at national and sub-national levels (e.g. district or sub-district). For a nurse at a health facility, this only provides a general sense of immunization coverage within their area. They will not know whether local coverage is higher or lower over time, or be able to visualize the heterogeneity of coverage within their community. Their ability to improve coverage is fundamentally a visibility problem: if we can empower them to see where in their community people are more likely to miss vaccinations, they can target those areas and improve coverage.

Applying True Cover to save lives

With True Cover we have successfully built the technical foundations to generate on-demand predictive coverage maps capable of playing an active role in guiding vaccination, resource distribution, and other campaigns. This integrates with the OpenSRP 2 app’s task-based data collection approach, which links tasks to household structures and has been successfully used to coordinate the spraying of hundreds of thousands of homes during indoor residual spraying (IRS) campaigns. Further field testing of the system is required to demonstrate True Cover’s value with campaign-based apps and validate the effectiveness and utility of the True Cover approach to estimating coverage.

That said, our partners have already found the predictive coverage maps generated by True Cover useful in their decision-making. One of our research partners in Bangladesh has used True Cover to spur important discussions, including programmatic changes. While we wish we had been able to go further to validate this approach, we are excited to see that the ability to spatially sample points using the True Cover algorithm has proven useful on its own.

As the frequency of geospatial monitoring and the power of spatial machine learning continue to increase, we are increasingly convinced of the underlying value of True Cover.

Acknowledgements

This work was made possible with the support of the Bill and Melinda Gates Foundation, Johnson and Johnson, and the Goldsmith Foundation.

References

  1. Microsoft releases 18M building footprints in Uganda and Tanzania to enable AI Assisted Mapping
  2. Assessing the quality and accuracy of national immunization program reported target population estimates from 2000 to 2016
