Research for democratic capabilities
There is an ocean of research required to help democracy keep pace with AI.
The gap map contains over 250 research questions linked directly to the capabilities they will improve and the goals they will help meet. If you spot gaps the map is missing, you can suggest improvements to its coverage on the contributions page.
This page includes sections for: what needs to be done first, underserved niches, the most impactful research, and where your effort can have an outsize payoff.
What needs to be done first
Resolving a few key bottlenecks has the potential to massively speed up the overall ecosystem. This list contains goals that, once met, would improve the maturity of their parent capability. It covers only the capabilities we’ve tagged as most urgent, either because improving them speeds up the overall rate of improvement (simulation and measurability) or because they fill critical gaps in getting processes to “good enough” for near-term use cases (resisting manipulation and reaching participants).
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What design variables in deliberative formats can AI systems reliably identify as leverage points for optimization through automated multi-agent simulation?
For what uses, in what contexts and with what level of faithfulness is it helpful or appropriate to use simulations, and what are the philosophical, moral, and political implications?
What simulation fidelity level (agent realism, dialogue authenticity, decision distributions) accurately predicts outcomes for specific deliberative formats under real-world constraints, and where does increased fidelity stop improving predictive value?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
What are the key technical blockers (agent behavior calibration, emergent group dynamics modeling, preference faithfulness) to effective and trustworthy multi-agent simulation, and which are tractable with current methods?
What kinds of systems are appropriate for simulation?
How can the impacts of interventions on complex systems be simulated quickly and accurately?
What is the Pareto frontier of speed, accuracy, and usable interactivity?
What consent, anonymization, and data governance protocols (comparing opt-in vs. opt-out, persistent vs. temporary storage, restricted vs. open licensing) enable practitioners to balance participant privacy and autonomy against the research value of maintaining rich deliberative records?
How do downstream effects from participation systematically vary across different deliberative process formats (comparing citizens' assemblies, deliberative polls, mini-publics, and online forums), and what process features predict effect heterogeneity?
What particular knock-on effects from participation (spanning civic engagement, political efficacy, discussion spillover, network influence, or policy awareness) are most important to measure, and what longitudinal methods best capture them without excessive participant burden?
What observable deliberative quality dimensions (such as turn-taking equity, argument depth, perspective inclusion, or respectfulness) can be reliably measured through automated content analysis or human observation in real time, and what does measurement reveal about facilitator behavior changes?
What measurement approaches (comparing explicit belief statements, semantic mapping, implicit preference tasks, or network analysis of argument adoption) best capture individual and group learning and preference shifts while remaining feasible to administer at deliberation intervals?
How do different methods for measuring preference transformation (pre/post surveys, in-process journaling, exit interviews, or network tracking) correlate with one another and with long-term behavioral change, under different deliberative process formats?
What recording modalities (comparing video, audio-only, spatial tracking, or multimodal combinations) most reliably preserve the substance of deliberation while remaining minimally intrusive and respectful of participant discomfort?
Which transcription and annotation approaches (comparing human verbatim, human semantic, hybrid human-AI, or AI-only) best handle cross-talk, non-verbal communication, and emotional valence while maintaining accuracy standards?
How can we design adaptive learning systems that provide personalized learning programs?
What are the best methods for efficiently educating people?
How can individual learning be mediated through group learning to lift all boats?
How can individual learning agents identify and pair learning partners for defined objectives (idea crosspollination, depolarization, information gaps)?
How can AI systems translate, generate and integrate learning materials into diverse formats (text, audio, visual, etc)?
Which evaluation metrics (comparing single-dimension vs. composite indices) are sensitive enough to detect quality differences within similar processes but robust enough for valid comparison across different topics, geographies, and participant populations?
What constellation of outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) must any democratic process achieve to be considered successful, and how do these vary with process purpose?
How can process outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) be operationalized as measurable indicators practitioners can feasibly collect?
How can practitioners balance (through adaptive protocols or meta-evaluation frameworks) universal standards for cross-context learning against context-specific adaptations required by local stakeholder concerns and governance structures?
What are the most efficient ways of recruiting participants?
How best to implement global sortition given limited resources or access to population data?
How can we handle the real-world failure modes of recruitment?
What are the best approaches to recruiting a participant pool that captures the complexity and intersections of society while minimising self-selection biases?
What strategies can be used to motivate participation in less-democratic contexts?
For a given budget, location, panel size, and unique quotas, how can we design a recruitment plan that will maximize response rates and the representativeness of the sample?
How to manage recruitment in geographies with extremely poor access and weak digital and physical infrastructure?
How can we quantify the fairness of different approaches to sampling the population? (One possible operationalization is sketched after this list.)
What kinds of recruitment methods reach which kinds of people?
How can we distinguish between legitimate persuasion and manipulative influence in deliberative settings?
What behavioral indicators reliably signal attempts to game deliberative processes?
How can we create standardized integrity assessment frameworks for evaluating completed assemblies?
How can we develop manipulation impact metrics that distinguish between minor and outcome-altering influences?
How can we design information presentation formats that minimize susceptibility to framing effects?
What are the tradeoffs between openness/transparency and manipulation resistance?
How can we develop real-time detection systems for coordinated manipulation attempts during participant recruitment and selection?
How can we quantify and test the manipulation resistance of different assembly design choices?
What conditions allow commitments to remain binding when the regulatory or political environment shifts significantly after the commitment was made?
Under what conditions is it reasonable to not stick with commitments? (e.g. does the reversal of a commitment require an explicit mandate, either through an election or a subsequent deliberative process?)
What mix of carrots and sticks is necessary to protect commitments internally?
What are the internal barriers that prevent commitment from happening (e.g. employee pressure, incentive systems, decision-making culture, organizational structure)?
What role can legal or compliance infrastructure play in embedding deliberative commitments into operations? Under what conditions can it be counter-productive?
How do we measure commitment drift, i.e. the extent to which commitments have not stuck over time?
What properties should commitments have to make them truly adaptable? (e.g. specificity vs. breadth, time boundedness, rules for how commitments evolve over time)
What practices protect commitments from reversal when leadership or staff changes in an organization or government?
Could there be templated approaches to socialising and developing internal commitments?
What are the most common barriers that prevent AI labs from binding to deliberative outcomes? Which barriers are structural versus contingent on political will?
What alternative mechanisms most effectively replicate the functional properties of a legal bind?
Regarding timelines, when does the obligation need to begin? How long a delay, after a decision has been made, is acceptable for a bind to be considered respected? What prevents indefinite deferral?
Is there a demonstrable trade-off between the degree of legal bindingness imposed on AI labs and their capacity for rapid AI innovation? If so, under what governance designs is that trade-off minimized?
How should the degree of bindingness be calibrated to the characteristics of the decision at stake?
Under what conditions should a binding deliberative outcome be legally contestable or reversible?
What are the most common barriers that prevent governments from binding to deliberative outcomes? Which barriers are structural versus contingent on political will?
What existing analogues (e.g. binding arbitration) provide legal precedents, and what do they fail to address for AI governance contexts?
How does the degree of isolation of a citizen participation office affect its resilience to political interference? What level of integration vs. independence optimizes legitimacy?
What legal mechanisms can a private company set up to make deliberative outcomes enforceable?
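For a flavour of what answering the sampling-fairness question above might involve, here is a minimal Python sketch that scores a recruited panel against census marginals using total variation distance. The panel records, attribute names, and census shares are all hypothetical placeholders, and the metric only checks marginals, so it deliberately ignores the intersectional structure that the recruitment questions above flag as an open problem.

```python
from collections import Counter

def marginal_tv_distance(panel, census_share, attribute):
    """Total variation distance between the panel's share of each
    category and the census share, for one demographic attribute.
    0.0 = perfectly representative marginals, 1.0 = maximally skewed."""
    counts = Counter(person[attribute] for person in panel)
    n = len(panel)
    categories = set(census_share) | set(counts)
    return 0.5 * sum(
        abs(counts.get(cat, 0) / n - census_share.get(cat, 0.0))
        for cat in categories
    )

# Hypothetical data: a six-person panel scored against invented census shares.
panel = [
    {"age_band": "18-34", "region": "north"},
    {"age_band": "18-34", "region": "south"},
    {"age_band": "35-54", "region": "north"},
    {"age_band": "35-54", "region": "south"},
    {"age_band": "55+", "region": "north"},
    {"age_band": "55+", "region": "north"},
]
census = {
    "age_band": {"18-34": 0.30, "35-54": 0.35, "55+": 0.35},
    "region": {"north": 0.50, "south": 0.50},
}
for attr, shares in census.items():
    print(attr, round(marginal_tv_distance(panel, shares, attr), 3))
```

A fuller treatment would compare scores like this across recruitment methods and weigh them against response rates and cost, as the budget-constrained recruitment question above asks.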
Underserved niches
Many actors are working on the same obvious problems, leaving other key challenges comparatively neglected. This list contains research questions whose parent capabilities suffer from either “high” or “extreme” neglectedness and “minimal” or “low” maturity. There is ample reason why some of these challenges have been neglected: some call for entire teams to tackle a complex problem, like simulation and forecasting, while others are less glamorous and require specific knowledge of the processes and systems they feed, like output implementability, gathering process data, and activating learning in participants.
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What design variables in deliberative formats can AI systems reliably identify as leverage points for optimization through automated multi-agent simulation?
For what uses, in what contexts and with what level of faithfulness is it helpful or appropriate to use simulations, and what are the philosophical, moral, and political implications?
What simulation fidelity level (agent realism, dialogue authenticity, decision distributions) accurately predicts outcomes for specific deliberative formats under real-world constraints, and where does increased fidelity stop improving predictive value?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
What are the key technical blockers (agent behavior calibration, emergent group dynamics modeling, preference faithfulness) to effective and trustworthy multi-agent simulation, and which are tractable with current methods?
What kinds of systems are appropriate for simulation?
How can the impacts of interventions on complex systems be simulated quickly and accurately?
What is the Pareto frontier of speed, accuracy, and usable interactivity?
What consent, anonymization, and data governance protocols (comparing opt-in vs. opt-out, persistent vs. temporary storage, restricted vs. open licensing) enable practitioners to balance participant privacy and autonomy against the research value of maintaining rich deliberative records?
How do downstream effects from participation systematically vary across different deliberative process formats (comparing citizens' assemblies, deliberative polls, mini-publics, and online forums), and what process features predict effect heterogeneity?
What particular knock-on effects from participation (spanning civic engagement, political efficacy, discussion spillover, network influence, or policy awareness) are most important to measure, and what longitudinal methods best capture them without excessive participant burden?
What observable deliberative quality dimensions (such as turn-taking equity, argument depth, perspective inclusion, or respectfulness) can be reliably measured through automated content analysis or human observation in real time, and what does measurement reveal about facilitator behavior changes?
What measurement approaches (comparing explicit belief statements, semantic mapping, implicit preference tasks, or network analysis of argument adoption) best capture individual and group learning and preference shifts while remaining feasible to administer at deliberation intervals?
How do different methods for measuring preference transformation (pre/post surveys, in-process journaling, exit interviews, or network tracking) correlate with one another and with long-term behavioral change, under different deliberative process formats?
What recording modalities (comparing video, audio-only, spatial tracking, or multimodal combinations) most reliably preserve the substance of deliberation while remaining minimally intrusive and respectful of participant discomfort?
Which transcription and annotation approaches (comparing human verbatim, human semantic, hybrid human-AI, or AI-only) best handle cross-talk, non-verbal communication, and emotional valence while maintaining accuracy standards?
What checks and balances are needed when making fully binding decisions?
How can cryptography create locking mechanisms and binding incentive structures?
How can technically binding decisions integrate with AI alignment in gradual ways?
How can we design adaptive learning systems that provide personalized learning programs?
What are the best methods for efficiently educating people?
How can individual learning be mediated through group learning to lift all boats?
How can individual learning agents identify and pair learning partners for defined objectives (idea crosspollination, depolarization, information gaps)?
How can AI systems translate, generate and integrate learning materials into diverse formats (text, audio, visual, etc)?
How to unobtrusively measure individual and group understanding?
How to balance finding common ground within a limited time, while minimally sacrificing depth of final outputs?
What are the best methods for providing impartial robustness checking and critical friend support for output refinement?
How can we measure the concreteness of statements and recommendations?
How can we ensure that outputs go beyond abstract, high-level principles to specific, actionable proposals?
What are the most efficient ways of recruiting participants?
How best to implement global sortition given limited resources or access to population data?
How can we handle the real-world failure modes of recruitment?
What are the best approaches to recruiting a participant pool that captures the complexity and intersections of society while minimising self-selection biases?
What strategies can be used to motivate participation in less-democratic contexts?
For a given budget, location, panel size, and unique quotas, how can we design a recruitment plan that will maximize response rates and the representativeness of the sample?
How to manage recruitment in geographies with extremely poor access and weak digital and physical infrastructure?
How can we quantify the fairness of different approaches to sampling the population?
What kinds of recruitment methods reach which kinds of people?
How can we effectively account for uncertainty in scenario consequences?
How can we enumerate a comprehensive set of scenarios or cases that a policy needs to address?
How can we identify the likelihood that key scenarios are missing?
How can we represent scenarios in an interactive and educational process (not predictive modelling)?
How can we track and mitigate biases within scenario mapping?
How can we develop criteria and methods for prioritizing scenarios based on likelihood, impact, and relevance to deliberative decisions?
How should we best treat low probability but high impact edge cases?
How can deliberative outputs be developed to accommodate revisions over time whilst preserving their intended motivations?
What data triage and routing processes (structured as decision trees vs. algorithmic vs. moderator-driven) enable process organizers to respond to emerging issues during deliberations, measured by time-to-action and intervention appropriateness?
Which visualization and dashboard designs (comparing temporal vs. spatial vs. network-based layouts) best support real-time information use by facilitators under time pressure, and when do practitioners choose to ignore dashboard signals?
How can deliberative outputs be formatted as functions such that they can automatically adapt?
What are the best methods for enabling iterative and ongoing citizen engagement so recommendations can be updated as contexts shift?
What machine translation and annotation approaches (comparing human-in-the-loop vs. automated vs. hybrid) maintain semantic accuracy for multilingual data in international or diverse assemblies, particularly for idioms and context-dependent meaning?
Which open standards and API specifications (building on ActivityPub, NDJSON, or deliberation-specific formats) best enable interoperability between different tools while operating within organizations' existing tech stacks and governance constraints?
What unified data models and schema (using RDF, JSON-LD, or domain-specific approaches) enable structured and unstructured inputs to be harmonized across different deliberative tool ecosystems, without losing fidelity to participants' original contributions? (An illustrative record format is sketched after this list.)
To what extent can AI be used to provide reliable real-time fact-checking within deliberations?
Under what conditions can AI-simulated participants maintain democratic legitimacy?
How can we ensure simulated participants accurately represent missing demographics?
How can automatic logging of key events improve access for verifiers?
How do we balance efficiency with resilience in resource-constrained environments?
How do we communicate changes to stakeholders without undermining confidence in outcomes?
How to enable AI-provided context that is appropriately comprehensive and sufficiently unbiased?
How can we measure and address the "conflict hangover" effect on subsequent deliberations?
What redundancies and buffers are most cost-effective for different types of disruptions?
What are culturally sensitive approaches to conflict that work across different contexts?
What are the tipping points where adaptation compromises core democratic values?
What transparency and consent mechanisms are required for hybrid assemblies?
How can we identify verbal and non-verbal cues that predict conflict escalation in deliberative settings?
How to fairly identify and fill perspective or empirical gaps in the background information?
How to suitably treat information hierarchies and data privacy while accumulating and mapping the information space?
To what extent can a structured repository of interpretive precedents — built from annotated implementation decisions linked back to the deliberative rationale that grounds them — function as a reliable 'case law' for navigating ambiguity in process outputs?
How reliably can language models trained on deliberative transcripts, stated rationales, and value-elicitation outputs distinguish between implementation decisions that are consistent with versus divergent from the normative commitments embedded in process outputs?
How can we measure whether conflict resolution preserved or suppressed minority viewpoints?
How can we define and measure "minimum viable" conditions for different assembly objectives?
What pre-commitments and transparency measures best preserve legitimacy during adaptations?
How do we prevent gaming or manipulation of AI backup systems?
How to develop real-time dashboards that track process health across multiple dimensions?
How do we distinguish between productive tension that enhances deliberation and destructive conflict?
How can we design responsive information systems that provide accurate context in real-time?
What restorative practices are most effective in deliberative settings?
What role can sentiment analysis and emotion recognition play in real-time conflict monitoring?
How can we systematically stress-test assembly designs before implementation?
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What hybrid approaches can combine fast simulation with selective human input to optimize both speed and accuracy for urgent decisions?
What are the best methods to measure the faithfulness of simulations?
What are the best methods to measure the accuracy of simulations?
How can we solve the technical blockers to effective and trustworthy multi-agent simulation and modelling?
How can we develop realistic simulation environments that accurately predict how different deliberative formats will perform according to different design choices?
For what uses, in what contexts, and with what level of faithfulness is it helpful or appropriate to use simulations, and what are the philosophical, moral, political, and other implications?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
How can deliberative processes operating at different governance layers be coordinated such that they inform rather than contradict each other, especially when underlying values or priorities differ across regions?
What structural alignment mechanisms enable deliberative outputs from multiple jurisdictions to coherently influence transnational policy bodies while respecting subsidiarity and local democratic autonomy?
Under what conditions do transnationally-integrated deliberative processes strengthen the legitimacy of transnational institutions versus creating legitimacy backlash by appearing to bypass national democratic processes?
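To make the unified-data-model question above concrete, the sketch below shows one hypothetical JSON-LD-style record, written as a Python dictionary, that harmonizes an unstructured spoken contribution with a linked structured vote. The @context URL, field names, and types are invented for illustration and are not an existing standard.

```python
import json

# Hypothetical record harmonizing a structured vote with an unstructured
# spoken contribution under one shared vocabulary. Every field name and
# the @context URL below are illustrative, not part of any real standard.
contribution = {
    "@context": "https://example.org/deliberation-vocab",  # hypothetical
    "@type": "Contribution",
    "id": "contrib-0193",
    "participant": "participant-42",  # pseudonymous identifier
    "session": "assembly-2025-plenary-3",
    "modality": "speech",  # e.g. "speech" | "text" | "vote"
    "content": {
        "transcript": "I worry the proposal ignores rural broadband.",
        "language": "en",
        "annotations": [
            {"type": "topic", "value": "infrastructure"},
            {"type": "stance", "target": "proposal-7", "value": "concern"},
        ],
    },
    "linkedStructuredInput": {
        "@type": "Vote",
        "target": "proposal-7",
        "value": "oppose",
    },
}
print(json.dumps(contribution, indent=2))
```

Keeping the raw transcript alongside machine annotations is one way such a schema could preserve fidelity to participants' original contributions while still supporting interoperable analysis.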
The most impactful research
We have rated twelve capabilities as extremely important because improvements to them will do the most to raise the overall maturity of deliberative processes. These capabilities are particularly load-bearing for process legitimacy, feasibility, and implementation in high-stakes settings. Part of what makes them so important is that the work is technically difficult and requires direct insight into practical implementation barriers.
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What design variables in deliberative formats can AI systems reliably identify as leverage points for optimization through automated multi-agent simulation?
For what uses, in what contexts and with what level of faithfulness is it helpful or appropriate to use simulations, and what are the philosophical, moral, and political implications?
What simulation fidelity level (agent realism, dialogue authenticity, decision distributions) accurately predicts outcomes for specific deliberative formats under real-world constraints, and where does increased fidelity stop improving predictive value?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
What are the key technical blockers (agent behavior calibration, emergent group dynamics modeling, preference faithfulness) to effective and trustworthy multi-agent simulation, and which are tractable with current methods?
What kinds of systems are appropriate for simulation?
How can the impacts of interventions on complex systems be simulated quickly and accurately?
What is the Pareto frontier of speed, accuracy, and usable interactivity?
What consent, anonymization, and data governance protocols (comparing opt-in vs. opt-out, persistent vs. temporary storage, restricted vs. open licensing) enable practitioners to balance participant privacy and autonomy against the research value of maintaining rich deliberative records?
How do downstream effects from participation systematically vary across different deliberative process formats (comparing citizens' assemblies, deliberative polls, mini-publics, and online forums), and what process features predict effect heterogeneity?
What particular knock-on effects from participation (spanning civic engagement, political efficacy, discussion spillover, network influence, or policy awareness) are most important to measure, and what longitudinal methods best capture them without excessive participant burden?
What observable deliberative quality dimensions (such as turn-taking equity, argument depth, perspective inclusion, or respectfulness) can be reliably measured through automated content analysis or human observation in real time, and what does measurement reveal about facilitator behavior changes?
What measurement approaches (comparing explicit belief statements, semantic mapping, implicit preference tasks, or network analysis of argument adoption) best capture individual and group learning and preference shifts while remaining feasible to administer at deliberation intervals?
How do different methods for measuring preference transformation (pre/post surveys, in-process journaling, exit interviews, or network tracking) correlate with one another and with long-term behavioral change, under different deliberative process formats?
What recording modalities (comparing video, audio-only, spatial tracking, or multimodal combinations) most reliably preserve the substance of deliberation while remaining minimally intrusive and respectful of participant discomfort?
Which transcription and annotation approaches (comparing human verbatim, human semantic, hybrid human-AI, or AI-only) best handle cross-talk, non-verbal communication, and emotional valence while maintaining accuracy standards?
What checks and balances are needed when making fully binding decisions?
How can cryptography create locking mechanisms and binding incentive structures?
How can technically binding decisions integrate with AI alignment in gradual ways?
How can we design adaptive learning systems that provide personalized learning programs?
What are the best methods for efficiently educating people?
How can individual learning be mediated through group learning to lift all boats?
How can individual learning agents identify and pair learning partners for defined objectives (idea crosspollination, depolarization, information gaps)?
How can AI systems translate, generate and integrate learning materials into diverse formats (text, audio, visual, etc)?
How to unobtrusively measure individual and group understanding?
How to balance finding common ground within a limited time, while minimally sacrificing depth of final outputs?
What are the best methods for providing impartial robustness checking and critical friend support for output refinement?
How can we measure the concreteness of statements and recommendations?
How can we ensure that outputs go beyond abstract, high-level principles to specific, actionable proposals?
Which evaluation metrics (comparing single-dimension vs. composite indices) are sensitive enough to detect quality differences within similar processes but robust enough for valid comparison across different topics, geographies, and participant populations?
What constellation of outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) must any democratic process achieve to be considered successful, and how do these vary with process purpose?
How can process outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) be operationalized as measurable indicators practitioners can feasibly collect?
How can practitioners balance (through adaptive protocols or meta-evaluation frameworks) universal standards for cross-context learning against context-specific adaptations required by local stakeholder concerns and governance structures?
What are the most efficient ways of recruiting participants?
How best to implement global sortition given limited resources or access to population data?
How can we handle the real-world failure modes of recruitment?
What are the best approaches to recruiting a participant pool that captures the complexity and intersections of society while minimising self-selection biases?
What strategies can be used to motivate participation in less-democratic contexts?
For a given budget, location, panel size, and unique quotas, how can we design a recruitment plan that will maximize response rates and the representativeness of the sample?
How to manage recruitment in geographies with extremely poor access and weak digital and physical infrastructure?
How can we quantify the fairness of different approaches to sampling the population?
What kinds of recruitment methods reach which kinds of people?
How can we distinguish between legitimate persuasion and manipulative influence in deliberative settings?
What behavioral indicators reliably signal attempts to game deliberative processes?
How can we create standardized integrity assessment frameworks for evaluating completed assemblies?
How can we develop manipulation impact metrics that distinguish between minor and outcome-altering influences? (One possible counterfactual test is sketched after this list.)
How can we design information presentation formats that minimize susceptibility to framing effects?
What are the tradeoffs between openness/transparency and manipulation resistance?
How can we develop real-time detection systems for coordinated manipulation attempts during participant recruitment and selection?
How can we quantify and test the manipulation resistance of different assembly design choices?
What are the best ways of anticipating key objections core power holders may raise against recommendations?
How can deliberative processes produce outputs that meet legal, technical, or administrative requirements without compromising participant ownership?
What are the most effective methods and formats for presenting process outputs to decision makers, and what tools can support this process?
What are the most effective methods of testing the compatibility of outputs with legal/constitutional/jurisdictional or other fundamental constraints on recommendation uptake?
What conditions allow commitments to remain binding when the regulatory or political environment shifts significantly after the commitment was made?
Under what conditions is it reasonable to not stick with commitments? (e.g. does the reversal of a commitment require an explicit mandate, either through an election or a subsequent deliberative process?)
What mix of carrots and sticks is necessary to protect commitments internally?
What are the internal barriers that prevent commitment from happening (e.g. employee pressure, incentive systems, decision-making culture, organizational structure)?
What role can legal or compliance infrastructure play in embedding deliberative commitments into operations? Under what conditions can it be counter-productive?
How do we measure commitment drift, i.e. the extent to which commitments have not stuck over time?
What properties should commitments have to make them truly adaptable? (e.g. specificity vs. breadth, time boundedness, rules for how commitments evolve over time)
What practices protect commitments from reversal when leadership or staff changes in an organization or government?
Could there be templated approaches to socialising and developing internal commitments?
Can AI systems identify their own biases and reasoning errors more reliably than individual humans can identify their own cognitive biases when making sense of inputs?
How much authentic human value is lost at each level of AI involvement (AI note-taker vs. AI facilitator vs. AI co-deliberator) and where is the steepest drop-off in the value-cost curve?
If 'doing the work' of synthesizing and clustering is more valuable than having an AI do it, do participants benefit equally from 'doing this work' or does it privilege those with more skills and stamina?
How to develop an AI facilitator that is attentive to power imbalances, adaptive to group dynamics and effective in guiding groups towards successful outcomes?
How can digital tools assist human facilitators to more effectively facilitate deliberations?
What are the effects of AI facilitation on public perceptions, group dynamics and deliberative quality?
How can delibtech tools expand the space of policy scenarios and considerations in a transparent and fair way?
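As one illustration of the manipulation-impact question above, a counterfactual outcome test recomputes a decision with flagged contributions removed and reports whether the result flips. The Python sketch below applies the idea to a simple majority vote; the upstream detector that flags votes is assumed to exist, and all data are made up.

```python
from collections import Counter

def outcome(votes):
    """Winner of a simple majority vote over {participant_id: choice};
    exact ties return None."""
    tally = Counter(votes.values())
    top = tally.most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None
    return top[0][0]

def manipulation_impact(votes, flagged_ids):
    """Counterfactual impact metric: does removing the votes flagged as
    manipulated (by some assumed upstream detector) change the winner?"""
    clean = {pid: v for pid, v in votes.items() if pid not in flagged_ids}
    observed, counterfactual = outcome(votes), outcome(clean)
    return {
        "observed": observed,
        "counterfactual": counterfactual,
        "outcome_altering": observed != counterfactual,
        "share_flagged": len(flagged_ids) / len(votes),
    }

# Made-up data: seven votes, two of which a detector has flagged.
votes = {f"p{i}": choice for i, choice in enumerate("AAAABBB")}
print(manipulation_impact(votes, flagged_ids={"p0", "p1"}))
```

A more robust version would sweep across many plausible flag sets and report how often the outcome flips, rather than trusting a single detector run.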
Where your effort can have an outsize payoff
Some research projects aren’t as glamorous as others, so some of the most consequential and foundational research is still waiting for someone capable of tackling it. Addressing these questions offers the chance to make an outsize impact with a comparatively small investment of resources. They are rated on a scale of “opportunity” for improving their parent capability, meaning we think there is significant headroom for investment to improve maturity.