How this data was gathered, cleaned, weighted, and presented.
Yoot is a WhatsApp-based civic polling infrastructure operated by Youth Ki Awaaz. Questions are deployed through the WhatsApp Business API to a growing panel that currently stands at 12,000+ registered participants across India. Over 500 days of polling, the panel has generated 2,00,000+ responses across 500+ questions.
Participants join voluntarily through organic recruitment across Youth Ki Awaaz's platforms. Each poll is sent as a WhatsApp message; respondents tap to answer. Not all participants answer every question. Response counts vary per question, depending on topic, timing, and community engagement.
Raw response data exported from BigQuery undergoes a multi-step cleaning pipeline:
For this archive, only questions with identifiable categorical response options are included. Free-text and open-ended questions are excluded from the scatter visualisation.
Yoot applies raking-based post-stratification to adjust for demographic imbalances in the panel relative to India's young population. Weights are calculated at the question level, accounting for differential non-response across questions.
Population benchmarks are drawn from Census 2011 projections and NFHS-5 data. The raking algorithm iteratively adjusts weights across a three-way State × Gender × Age cross-classification until convergence (tolerance: 1e-6, max 50 iterations). Weights are normalised to a mean of 1 at each iteration.
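As a rough illustration of the raking step described above, the following Python sketch adjusts weights one dimension at a time, normalises them to a mean of 1 after each pass, and stops once every adjustment factor is within the tolerance of 1 or the iteration cap is reached. It is not Yoot's production code. The function name `rake`, the input layout, and the choice to match each dimension's marginal shares (rather than full State × Gender × Age cells) are assumptions made for illustration, and the toy panel and target shares are invented.

```python
from collections import defaultdict


def rake(respondents, targets, tol=1e-6, max_iter=50):
    """Return one weight per respondent so that weighted category shares
    approximate the target shares on every dimension.

    respondents: list of dicts, e.g. {"gender": "F", "age": "18-24"}
    targets: {dimension: {category: population share}}; shares sum to 1
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for dim, shares in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this dimension.
            by_cat = defaultdict(float)
            for r, w in zip(respondents, weights):
                by_cat[r[dim]] += w
            # Scale each respondent so the category hits its target share.
            for i, r in enumerate(respondents):
                factor = shares[r[dim]] * total / by_cat[r[dim]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Normalise to a mean weight of 1 at each iteration.
        mean_w = sum(weights) / n
        weights = [w / mean_w for w in weights]
        if max_change < tol:
            break
    return weights


# Toy panel (hypothetical): men and younger respondents are over-represented.
panel = [
    {"gender": "M", "age": "18-24"},
    {"gender": "M", "age": "18-24"},
    {"gender": "M", "age": "25-34"},
    {"gender": "F", "age": "18-24"},
    {"gender": "F", "age": "25-34"},
]
toy_targets = {
    "gender": {"M": 0.5, "F": 0.5},
    "age": {"18-24": 0.6, "25-34": 0.4},
}
toy_weights = rake(panel, toy_targets)
```

Because weights are calculated at the question level, a function like this would be run once per question, on only the respondents who answered it.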
Design effects of 1.5 to 2.0 are typical for weighted digital surveys of this kind. Margins of error are reported as Wilson score intervals at 95% confidence, computed on effective sample sizes that account for these design effects.
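To make the margin-of-error calculation concrete, here is a minimal Python sketch of a Wilson score interval computed on an effective sample size. The helper names are hypothetical, and estimating the design effect with Kish's approximation (from the variation in the weights) is our assumption; the text above does not say how design effects are estimated.

```python
import math


def kish_design_effect(weights):
    """Kish's approximate design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(n, deff):
    """Shrink the nominal sample size by the design effect."""
    return n / deff


def wilson_interval(p_hat, n_eff, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives ~95% confidence."""
    denom = 1 + z * z / n_eff
    centre = (p_hat + z * z / (2 * n_eff)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / n_eff + z * z / (4 * n_eff * n_eff)
    ) / denom
    return centre - half, centre + half


# Example: 400 responses, 55% choosing an option, design effect of 1.5.
n_eff = effective_sample_size(400, 1.5)
low, high = wilson_interval(0.55, n_eff)
```

A larger design effect means a smaller effective sample size and therefore a wider interval for the same raw response count.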
Note: The visualisation on the Questions tab shows unweighted response counts and percentages. Weighted analysis is applied in Yoot's thematic reports and partner deliverables.
To map 500+ questions into a navigable visual space, we used the following pipeline:
Question text was embedded with the all-MiniLM-L6-v2 sentence transformer model (384-dimensional embeddings), which captures semantic similarity between questions regardless of surface wording.
Dot size corresponds to response count. Colour corresponds to cluster assignment.
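For readers unfamiliar with embeddings, the sketch below shows one common way to compare two embedding vectors: cosine similarity. It is illustrative only. The three-dimensional toy vectors stand in for the 384-dimensional vectors the model would produce, the function name is hypothetical, and using cosine similarity as the comparison is our assumption rather than a documented step of the pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for question embeddings (real ones have 384 dimensions).
q_jobs_1 = [0.9, 0.1, 0.0]   # e.g. a question about employment
q_jobs_2 = [0.8, 0.2, 0.1]   # a differently worded employment question
q_climate = [0.0, 0.2, 0.9]  # an unrelated question

similar = cosine_similarity(q_jobs_1, q_jobs_2)
dissimilar = cosine_similarity(q_jobs_1, q_climate)
```

Questions on the same topic should score closer to 1 than unrelated ones, even when they share few words.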
This archive is a civic data project, not a nationally representative survey. We want to be clear about what it is and what it is not:
We share this data because we believe civic transparency matters more than methodological perfection. These responses are authentic, even if the sample is imperfect in several instances. We see this as a contribution to the well-recognised need for a civic data commons in India, and we welcome scrutiny as part of that process.