Evaluation & Impact in Civic Technology
“If we can’t measure what matters, we’ll keep funding what’s loudest.”
Why Evaluate?
Builds credibility and trust — essential in civic tech and policy.
Turns stories into strategy — bridges activism and evidence.
Ensures resources go where impact is real, not assumed.
Translates innovation into policy adoption.
Open by asking: “Who here has had to prove impact to a funder or policymaker?” Emphasize evaluation as learning + accountability.
What Counts as Evaluation?
Evaluation ≠ Reporting
Reporting: what we did
Evaluation: what difference it made
Levels:
Outputs → activities or reach
Outcomes → behavioural or system change
Impact → long-term societal effects
Evaluation is the bridge from prototype → policy.
Causality 101
Goal: Understand if outcomes happened because of your intervention.
Correlation ≠ causation
Counterfactuals matter — what if the program didn’t exist?
Example: Volunteer sign-ups rose after a civic app launched, but a local festival ran that same month. Which caused the rise?
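A minimal simulated sketch of why the counterfactual matters: the numbers below are invented, with the festival as a hypothetical confounder that also drives app adoption. The naive comparison overstates the app's effect; comparing within festival strata recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical residents

festival = rng.random(n) < 0.5
# The app was promoted at the festival, so attendees are likelier to install it (confounding)
app_user = rng.random(n) < np.where(festival, 0.7, 0.3)

# Simulated truth: the festival adds 0.15 to sign-up probability, the app adds only 0.05
signed_up = rng.random(n) < (0.10 + 0.15 * festival + 0.05 * app_user)

naive = signed_up[app_user].mean() - signed_up[~app_user].mean()
adjusted = np.mean([
    signed_up[app_user & (festival == f)].mean() - signed_up[~app_user & (festival == f)].mean()
    for f in (True, False)
])
print(f"Naive app effect:    {naive:.3f}")     # inflated by the festival
print(f"Adjusted app effect: {adjusted:.3f}")  # close to the simulated 0.05
```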
Designing Randomness In
For credible evaluation, you either design randomness in or find it later.
Prospective approaches (experimental):
RCTs: randomized treatment/control groups
A/B tests: feature flags, phased rollouts
Pre-registration: analysis plans, power calculations, balance checks
Visual flow:
Design Phase → Random Assignment → Measure Outcomes → Compare
Key advantage: Strong causal claims because you control the assignment
Emphasize: if you can randomize, do it. It’s the gold standard.
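A minimal sketch of what designing randomness in buys you, on simulated data: with random assignment, a simple difference in means estimates the effect directly. The sample size, rates, and “completed a report” outcome are all illustrative assumptions, not real pilot figures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical pilot: 2,000 residents, half randomly assigned to the new reporting flow
n = 2_000
treated = rng.permutation(np.repeat([True, False], n // 2))

# Simulated outcome: 25% completion with the new flow vs 20% without
completed = (rng.random(n) < np.where(treated, 0.25, 0.20)).astype(float)

lift = completed[treated].mean() - completed[~treated].mean()
t_stat, p_value = stats.ttest_ind(completed[treated], completed[~treated])
print(f"Estimated lift: {lift:.3f} (p = {p_value:.3f})")
```

In a real A/B test you would pre-register this comparison and log who was actually exposed to the new flow.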
Finding Randomness Later
Retrospective approaches (quasi-experimental):
Natural experiments: staggered rollouts across councils
Regression discontinuity: eligibility thresholds (age, geography)
Instrumental variables: policy timing, external shocks
Visual flow:
Existing Variation → Identify It → Exploit It → Measure → Compare
Sanity checks needed: parallel trends, placebo tests, robustness checks
When randomization isn’t possible, look for “as-if random” variation in the real world.
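With a staggered rollout across councils, one common quasi-experimental estimator is difference-in-differences with council and month fixed effects. A minimal statsmodels sketch; the file and column names (council_reports.csv, council, month, adopted, reports) are hypothetical placeholders for your own panel data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly panel: one row per council per month;
# 'adopted' = 1 in the months after that council rolled out the tool, else 0
df = pd.read_csv("council_reports.csv")

# Two-way fixed-effects difference-in-differences:
# council effects absorb stable differences between councils,
# month effects absorb shocks common to all councils (seasonality, national events)
did = smf.ols("reports ~ adopted + C(council) + C(month)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["council"]}  # cluster SEs by council
)
print(did.params["adopted"], did.bse["adopted"])
```

Before trusting the estimate, run the sanity checks above: plot pre-adoption trends by rollout cohort and try placebo adoption dates.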
Evaluation Approaches
| Approach | How it works | Strength | Limitation |
| --- | --- | --- | --- |
| RCTs | Randomly assign treatment/control | Strong causal inference | Costly, sometimes unethical |
| Quasi-experimental | Natural variation or rollout timing | Often feasible | Harder to prove causality |
| Observational | Compare existing data patterns | Easy, fast | Prone to bias |
| Qualitative / Participatory | Interviews, ethnography, co-design | Context-rich | Hard to generalize |
Choosing Your Method: Decision Tree
```mermaid
graph TD
    A[Can you randomize assignment?] -->|Yes| B[RCT / A-B Test]
    A -->|No| C[Is there natural variation?]
    C -->|Yes| D[Quasi-experimental<br/>Natural experiment, RD, IV]
    C -->|No| E[Strong theory + controls?]
    E -->|Yes| F[Observational<br/>with caution]
    E -->|No| G[Qualitative / Mixed Methods<br/>Focus on mechanism]
```
Case Study: Sure Start (UK)
Early evaluations: little measurable impact.
Long-term follow-ups: strong benefits in education and health.
Demonstrates value of persistence and longitudinal data.
Lesson: Good programs take time to show impact.
Sure Start Impact Study →
Case Study: FixMyStreet (mySociety)
Evaluated via behavioural data (reports before/after launch).
Found sustained increases in citizen engagement.
Method: Quasi-experimental (variation by council rollout).
Lesson: Behavioural data can be powerful when context is understood.
mySociety Research →
Case Study: CitizenLab
Combines platform analytics (quant) + partner interviews (qual).
Found: engagement increases when participants see visible outcomes.
Illustrates value of mixed methods.
CitizenLab Impact Report →
Case Study: D.A.R.E. (US)
Popular anti-drug education program.
Evaluations showed no measurable effect on drug use.
Yet widely funded for decades.
Lesson: Popular ≠ effective — evidence must challenge assumptions.
Quantitative vs Qualitative
| Quantitative | Qualitative |
| --- | --- |
| How much, how often | Why and how |
| Numbers, patterns | Stories, meaning |
| Scale | Context |
Best practice: Combine both. Quant shows the pattern; qual explains the mechanism.
Theory of Change: The Foundation
Visual framework:
Inputs → Activities → Outputs → Outcomes → Impact

| Inputs | Activities | Outputs | Outcomes | Impact |
| --- | --- | --- | --- | --- |
| Staff, funding | Workshops, platforms | # people reached | Behaviour change | Long-term societal change |
Example (civic tech):
Input: Developer time, funding
Activity: Build reporting platform
Output: 1,000 reports submitted
Outcome: Council response time ↓ 40%
Impact: Improved local accountability
Designing Meaningful Metrics
Principles (beyond SMART):
Decision-linked: if the metric moves, someone acts
Behavioural: measure revealed behaviours, not just attitudes
Attributable: plausibly tied to your intervention (via design/ID)
Affordable: feasible to collect reliably
✅ Examples:
% of residents who resolve an issue via the platform (not just reports)
Response-time distribution by council before/after tool launch (sketched in the snippet below)
Evidence of policy uptake: citations, budget lines, guidelines changed
Push “outcomes, not outputs”; tie each metric to a use-case/decision.
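To make the response-time metric concrete, a pandas sketch comparing its distribution by council before and after launch. The file name, column names, and launch date are assumptions standing in for your own data.

```python
import pandas as pd

# Hypothetical issue-level export: one row per reported issue
issues = pd.read_csv("issues.csv", parse_dates=["created", "resolved"])
issues["response_days"] = (issues["resolved"] - issues["created"]).dt.days  # NaN if unresolved

LAUNCH = pd.Timestamp("2024-01-01")  # assumed tool launch date
issues["period"] = issues["created"].ge(LAUNCH).map({False: "before", True: "after"})

# Median and 90th-percentile response time per council, before vs after launch
summary = (
    issues.groupby(["council", "period"])["response_days"]
          .describe(percentiles=[0.5, 0.9])[["count", "50%", "90%"]]
)
print(summary)
```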
Finding the Right Metrics
Start from theory of change → map inputs → activities → outputs → outcomes → impact
Ask: “If it worked, what would be observably different in 3, 6, 12 months?”
Prioritise leading indicators (early, sensitive) + a few lagging indicators (durable)
Sources:
Product logs / admin data
Surveys (pre/post, panels)
Open data / FOI
Policy documents / Hansard / guidance changes
Data Quality Challenges
Common issues:
Missingness: incomplete records, dropout
Duplicates: same user, multiple accounts
Bots & spam: automated submissions
Channel shifts: users move between platforms
Definition drift: what counts as “engagement” changes over time
Mitigations (a cleaning sketch follows this list):
Instrument events early; pilot measures
Pre-register core outcomes; freeze queries
Track adoption & exposure (who actually saw the “treatment”?)
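A rough cleaning sketch for the duplicate and bot issues listed above, assuming a pandas DataFrame of submissions; the column names and thresholds are illustrative and should be calibrated against a manually reviewed sample.

```python
import pandas as pd

# Hypothetical submissions export: user_id, body (free text), created (timestamp)
subs = pd.read_csv("submissions.csv", parse_dates=["created"]).sort_values("created")

# Duplicates: the same user re-posting identical text within an hour
dupe = (
    subs.duplicated(subset=["user_id", "body"], keep="first")
    & (subs.groupby(["user_id", "body"])["created"].diff() < pd.Timedelta("1h"))
)

# Bots & spam: crude heuristic, flag accounts with implausibly many submissions
bot_like = subs.groupby("user_id")["created"].transform("count") > 200

clean = subs[~dupe & ~bot_like]
print(f"Kept {len(clean)} of {len(subs)} submissions")
```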
Design & Ethics Challenges
Key considerations:
Ethics & consent: DPIA, minimal data collection, retention policies
Statistical power: enough N to detect plausible effects (MDE thinking)
Exposure tracking: who actually experienced the intervention?
Timing: seasonality, external events affecting baseline
Best practices:
Get ethical approval early
Calculate required sample sizes upfront (see the power sketch after this list)
Document all design decisions
Plan for sensitivity analyses
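A minimal power calculation for the “sample sizes upfront” point, using statsmodels; the baseline rate, minimum detectable effect, alpha, and power are illustrative planning assumptions, not recommendations.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Planning assumptions (illustrative): baseline completion 20%,
# smallest effect worth detecting is a lift to 25%, two-sided alpha 0.05, power 0.8
effect = proportion_effectsize(0.25, 0.20)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8, ratio=1.0)
print(f"About {n_per_arm:.0f} participants per arm")
```

If the required N is out of reach, revisit the minimum detectable effect rather than running an underpowered study.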
Communicating Findings
Why it matters: Evidence without communication ≈ no impact.
Compare these:
❌ Weak communication:
“500 people used our app and we got positive feedback.”
✅ Strong communication:
“Digital reporting reduced council response times by 40% (from 14 to 8 days, p<0.01, RCT with n=20 councils, 6-month follow-up).”
Key elements: Outcome + magnitude + uncertainty + method + context
Audience-First Packaging
Tailor your message:
Policymakers/Funders: 2–3 page brief, counterfactual logic, cost/benefit, timing
Practitioners: playbooks, checklists, implementation notes
Public/Media: plain-language summary, visuals, examples
Do better than the PDF:
One-pager + open dataset + reproducible notebook
Accessible dashboards with methods appendix
Share failures + “what we’d change next time”
Checklist: headline, context, method (in one paragraph), 3 findings, limits, action items
Linking Evaluation to Policy Impact
Example: “Digital participation improves local accountability” carries far more policy weight than “Users liked our app.”
Common Pitfalls
Evaluating too early or too narrowly.
Over-attributing success to tech layer.
Ignoring social/economic context.
Treating evaluation as funder compliance, not learning.
Project: Design Your Evaluation
Task (in groups of 2-3):
Pick a civic-tech/policy project (real or planned)
Sketch the theory of change (inputs → impact)
Propose one key metric + one identification strategy
Where’s the randomness/variation?
Draft how you’ll communicate results
Who’s the audience? What format?
Deliverables:
5-minute presentation to class
One-page evaluation plan (template provided)
Give 20 minutes for group work, then 2-3 groups present.
Key Takeaways
Evaluation moves civic tech from experiment → evidence → policy.
You need randomness, or a strong quasi-experimental identification strategy, for credible impact claims.
Good metrics are decision-linked and feasible to measure.
Communication is part of the science.
“What gets measured gets improved — and what gets understood gets funded.”
Further Reading
Andrew Leigh, Randomistas: How Radical Researchers Changed the World
Nesta, Standards of Evidence
mySociety Research Reports
Institute for Fiscal Studies, Sure Start Evaluations
What Works Digital, Evaluation Guidance for Digital Services (2024)