So What do I take Away from The Great Evidence Debate? Final thoughts (for now)

The trouble with hosting a massive argument, as this blog recently did on the results agenda (the most-read debate ever on this blog), is that I then have to make sense of it all, if only for my own peace of mind. So I’ve spent a happy few hours digesting 10 pages of original posts and 20 pages of top-quality comments (I couldn’t face adding the Twitter traffic).

(For those of you that missed the wonk-war, we had an initial critique of the results agenda from Chris Roche and Rosalind Eyben, a take-no-prisoners response from Chris Whitty and Stefan Dercon, then a final salvo from Roche and Eyben + lots of comments and an online poll. Epic.)

On the debate itself, I had a strong sense that it was unhelpfully entrenched throughout – the two sides were largely talking past each other, accusing each other of ‘straw manism’ (with some justification) and lobbing in the odd cheap shot (my favourite, from Chris and Stefan: ‘Please complete the sentence “More biased research is better because…”’ – debaters take note). Commenter Marcus Jenal summed it up perfectly:

‘The points of critique focus on the partly absurd effects of the current way the results agenda is implemented, while the proponents run a basic argument to whether we want to see if our interventions are effective or not. I really think the discussion should be much less around whether we want to see results (of course we do) and much more around how we can obtain these results without the adverse effects.’

There were some interesting convergences, though, particularly Whitty and Dercon’s striking acknowledgement of the importance of power and politics, which are often assumed to be excluded from the results agenda. But what they actually said was:

‘Understanding power and politics and how to assist in social change also require careful and rigorous evidence.’

True, but what about reversing the equation? Does understanding the role of evidence in development also require a careful and rigorous understanding of power and politics? They never fully address that crucial point, which is at the heart of Roche and Eyben’s critique.

Both sides (rather oddly, as acknowledged experts in their fields) decried the role of experts. Whitty and Dercon called for ‘moving from expert (i.e. opinion-based, seniority-based and anecdote-based) to evidence-based policy’. Ah, turns out that what is actually being suggested is a move from one kind of expert (practitioners) to another (evidence/evaluation).

As a non-number-cruncher, I also took exception to their apparent belief that only those who understand the methodological intricacies of different evaluation techniques are eligible to pass judgement. On that basis politicians would be out of a job, and only rocket scientists would get to pronounce on Trident.

There was also a really confusing exchange on the hierarchy of evidence. Whitty and Dercon show a surprising (to me at least) commitment to multi-disciplinarity: ‘Methods from all disciplines, qualitative and quantitative, are needed, with the mix depending on the context….. it is not a matter of just RCTs, but of rigour, and of combining appropriate methods, including more qualitative and political economy analysis.’

Music to the ears of the critics, but is it actually, you know, true? Everything I hear from evaluation bods is that DFID does actually see RCTs as the gold standard, and other forms of evidence as inferior. Roche and Eyben returned to the attack on this in their response, arguing that what Whitty and Dercon call the ‘evidence-barren areas in development’ are only barren if you discount sociology and anthropology, among others, as credible sources of evidence. By the way, Ed Carr has a brilliant new post on the (closely linked) clash between quants and quals, arguing that while quants can establish causation, only quals can explain how that causation occurs.

But the exchange did provide me with one important (I think) lightbulb moment. It was about failure. Whitty and Dercon were particularly convincing on this: the evidence agenda ‘involves stopping doing things which the expert consensus agreed should work, but which when tested do not’. This is a nice Popperian twist – the role of evidence is not to prove that things work, but to prove they don’t, forcing us to challenge received wisdom and standard approaches. This is indeed what I noticed about Oxfam’s recent ‘effectiveness reviews’ – if you find no or negative impact, then you (rightly) start to re-examine all your assumptions. But if this is the proper role for the evidence agenda, is it politically possible? By coincidence I have just read Ed Carr’s forceful critique of Bill Gates’ approach to evaluation, arguing that failure is often airbrushed out in order to safeguard funding and credibility. That seems a pretty fundamental contradiction.

The comments were just as thought-provoking. One of the key messages that emerged is the gulf between these debates and what those in charge of gathering results in aid agencies actually face – highly constrained resources, crazy time pressure, and the need to deliver some (any!) results to feed the MEL machine. Oxfam’s Jennie Richmond reflected on the gap between theory and practice yesterday.

Commenter Enrique Mendizabal asked whether we are demanding a different role for evidence in poor countries than in our own.

‘In the UK, health policy is decided by a great many number of factors or appeals (evidence, sure, but also values, tradition, biases, political calculations, etc). We may complain about it but we accept that it is a system that works. But health policy for Malawi (or other heavily Aid dependent countries) is decided mainly by evidence (or what often passes as evidence at the time) and usually by foreign experts…. would we be happy with USAID funding a large evidence-based campaign to reform the NHS or our education policy?’

But he took his argument a step further – if the final decision should be left to the interplay of evidence (of different sorts), politics and negotiation, then DFID and other donors would be better advised to boost the ‘enabling environment’ for such debates and decisions by investing in tertiary education in developing countries:

‘strengthening economic policy debate is a more adequate objective than achieving policy change (even if it is evidence based).’

Commenter David highlighted a fundamental point that rather went missing in the initial exchange – how the results agenda does or doesn’t work in complex systems:

‘The results agenda approach tends, by presenting development as objectively knowable if broken down into discrete and small bits, to drive attention toward small, more easily measurable interventions to test, particular those that are suited to situations that are simple or complicated rather than complex. Current processes around evidence-based results fail to grapple with complex systems, interaction effects, and emergent properties that dominate most aid project landscapes.

A fundamental critique of the evidence-based revolution is that it actually diminishes efforts to get rigorous evidence about addressing complex challenges. We all want evidence, it’s a question of whether the current framing of “evidence-based” is distorting what types of evidence we gather and value. For those who think that the current emphases on methods to test what works are distorting how we value the evidence coming in (RCT=gold, qualitative methods=junk), this offers little other than platitudes about lots of other methods existing.

Personally, I would be a bigger proponent of the evidence-based revolution if it was coming to folks interested in power, politics, and development, and asking them what their questions are and what evidence might contribute to their work. Absent a learning agenda set to fit complex space and concern itself with power, it will continue to seem to me to be an instance of methods leading research – or searching for keys under the light rather than inventing a flashlight.’

To be fair, Roche and Eyben explicitly chose to focus on the politics of evidence, rather than the implications of complex systems (for example, the question of external validity in complex systems – or lack of it – raised by Lant Pritchett in our recent conversation).

Final thoughts? After about 500 votes, the poll went narrowly to Whitty and Dercon (34% v 31% for Roche and Eyben, with a pleasing late rally for the ‘totally confused’ camp – my natural habitat). I think Chris Roche and Rosalind Eyben need to work on their communication style (more punchy, less abstract, more propositional). Chris Whitty and Stefan Dercon should give some examples of gold standard anthropological or sociological evidence to allay the doubts over their true commitment to multi-disciplinarity, and take the complex systems question more seriously.

A massive thank you to all who took part, and please can you come back for another go in a year or so? This one isn’t going away.



12 Responses to “So What do I take Away from The Great Evidence Debate? Final thoughts (for now)”
  1. kieran

    I wonder if part of the debate is really rooted in the potentially “abstract” world of “truth”. When we ask how we know something has worked, any philosopher will be able to reel off various arguments to prove there really is no such thing as truth. To which a tabloid editor might say: tell that to people who have just been bombed. However, we know paradigms do shift, and values, politics and power really do play a big part in how people perceive and interact with the world.
    I have spent 25 years working for various INGOs and have jumped through many M&E hoops. My sense has been that many staff sadly do see M&E as something they do for others, and as a manager you are often looking for PR rather than real learning. We are not really into celebrating learning from multi-million-pound failures.
    Furthermore, it is often a struggle getting budgets for M&E teams, and there is often a feeling in many INGOs that staff in M&E and policy teams are on another planet. So M&E can often become a contractual obligation that is hastily delivered to keep funders happy.
    The answer? Perhaps we need to look outside the world of INGOs and ask why governments, at a time when so much modern ideology is unravelling, seem intent on defining development in stark number-crunching terms. It may sound a bit soft and liberal, but surely development agencies, donors and governments need to acknowledge that often we do not have the “big” answer, but we can try to learn by using a range of methodologies that must include listening to the people we are seeking to benefit. It may not be as “sexy” to sell a learning approach as guaranteed “magic bullets”, but in the end I would argue that really is the best we can do.

  2. David Hudson

    A truly heroic summary, Duncan. Congratulations. And on hosting the debate too. Epic indeed.
    I think Enrique Mendizabal’s point that you flag up here is central.
    We absolutely need evidence. This can be RCT, survey data, process tracing, focus groups, ethnographies, and so forth. Agreed.
    But what is done with this evidence base matters. It is rarely clear-cut, and there are always, always trade-offs between goals and resources. The decision to fund or champion one policy initiative always comes at the expense of another. And the benefits of the two are not always comparable – is reducing poverty, inequality, violence, or environmental damage better?
    If what we’re really interested in is success (which I think is the case), we need to talk about what success is (not just evidence) and how it is achieved. For my money – and I know this is close to your heart too, Duncan – it’s about political change. Political change that is locally owned, locally appropriate, legitimate and sustainable. See the Developmental Leadership Program for more along these lines (lots of case studies).
    Like in the UK, other donor countries, and (say it quietly) in DFID and the World Bank even, we need to accept that evidence is part of (and a force for good in!) a complicated process of consensus building, compromise, political and personal competition. To assume differently is Utopian and naive.
    Three cheers for rigorous evidence, but let’s get real about the politics too.

  3. Duncan

    This from Ben Ramalingam (who seems allergic/too busy to comment directly):
    ‘Have enjoyed the posts on results in the last week or so. Just wanted to suggest that one way around the impasse / wonk war is to try and apply a horses-for-courses / portfolio approach.
    See my June 2011 post avoiding civil war in results here: and the Sept 2012 follow-up post, co-authored with Owen, here:

  4. A few points to contribute to this excellent discussion:
    On who should be making choices:
    I should clarify that my intention was to suggest that while in developed countries decisions are public and political (or at least we demand them to be), in developing countries where donors still have a say in how resources are allocated, decisions are often made privately and presented as ‘technocratic’. At least this is what the industry tends to say when confronted with claims of ideology: ‘no, no,’ they seem to say, ‘we are doing this because it’s evidence based (or informed)’.
    E.g. cash transfers are a neoclassical solution (don’t give services; instead, give money and let individuals maximise their utility by buying the services they want) that came out of Latin America at a time of economic liberalisation, in between decades of left-wing socialist idealism, governments, and even dictatorships. They served very Latin American purposes: they addressed the social pressures associated with budget cuts, were controlling (had conditions) but not too much (market freedom was at the centre of politics at the time), and fitted nicely into a very Latin American political practice: populism.
    When donors and NGOs push them left, right and centre as ‘evidence-based’ solutions, they forget their rich (and interesting and important) political origins and history.
    So whether the idea emerged from research or experience or ideology is, I think, not the most important thing. What is important is: WHO makes the choice because this can define whether it incorporates politics and power or not.
    There is an excellent speech from Woodrow Wilson in which he warns of the threat of technocrats and experts to democracy. If all policy discussions are controlled by experts the public will soon be crowded out from all spaces where decisions about them are being made.
    There is a great study about Argentinean economic policy that shows this sudden change from ‘public experts’ (in political parties, NGOs, universities) to ‘private experts’ (from consultancies, foreign banks, international development organisations) when it came to advising on economic policy in the decade before the last crash there.
    Now, what appears to be taking place is that it is not enough to be an expert on a subject or country or culture, etc.; it is more important to be an expert on a particular type of research method.
    In the UK the Institute of Ideas organises a fantastic festival of debates in which the panel is not the expert and has no more time to voice its views than the audience. Everyone is allowed an opinion.
    This inclusiveness happens because the debates ask important and fundamental questions (what is the purpose of education?) rather than problem-solving ones (how do we get more young people from poor backgrounds into university?). The former lets everyone in (from all backgrounds and levels of specific knowledge and expertise), while the latter makes it easier for the ‘experts’ to control the conversation.
    In aid (and RCTs in a way need this) we seem to have answered the fundamental questions (even if we are still asking them in our own ‘donor’ countries) and appear to focus on the technical ones: how to.
    But RCTs (if and when carried out properly) should and can also be employed to address these fundamental concerns.
    On studying politics and power:
    I particularly liked the characterisation of this often repeated statement:
    ‘Understanding power and politics and how to assist in social change also require careful and rigorous evidence.’
    What many researchers and decision makers in donor or intervening countries do not seem to get is that the people who live in these countries, about which more careful and rigorous research is needed, do not need more careful and rigorous research.
    The idea that they are able to survive without understanding power and politics and how to navigate the complex social, economic, political and intellectual spaces that they live and work in is not just patronising but also ridiculous.
    The problem is not that ‘they’ (in Malawi, Indonesia, Peru) do not know; the problem is that ‘we’ (in London, New York, Canberra) do not know. And this is only a problem because we want to intervene and make choices for them.
    But if we cannot stop interfering in other people’s business, we should at least acknowledge that to be better informed we do not need to reinvent the wheel. We are lucky to have the tools to understand (or get as close as possible to understanding) other cultures, societies and political systems: how about some good old-fashioned anthropology, sociology, demographics, economic and political history, language studies, etc.?
    Why not? Because it takes too long and it is hard work. And because, I think, few aid workers today have either the patience or the interest they claim to have in the countries, peoples and societies that they work with.
    They are not interested to know them; just to change them.
    On the debate:
    It would be unfair to dismiss the value of RCTs. I am not one to say that they are NEVER useful. They can be and are quite a fantastically powerful source of argumentative power; as well as fun to design and implement.
    But we must not give them more power than they deserve, certainly not at the expense of methods that, as Ed Carr says, help to explain the findings of RCTs. After all, it would be impossible to design an RCT intervention without anthropology, sociology, history, politics, etc.
    And since they are powerful tools we should not forget they are also political ones. Conducting an RCT on programme X is a political statement that programme X is liked, preferred to others that did not even get an RCT, or supported.
    These are not cheap interventions, and they are designed on theories, which are inherently ideological (they are, after all, based on assumptions about how the world is and is not, which in turn rest on our assumptions about human nature and our understanding of fundamental concepts such as justice and fairness). So sanctioning one or the other can be seen as a proxy for sanctioning the ideological assumptions that underpin them.
    And here lies the beauty of the debate: by choosing a particular method (and expertise) as the most desired method to inform the manner in which we go about solving society’s problems, its advocates are being just as ideological as those who claim that markets or the State should solve social problems.

  5. Thanks for the debate and summary.
    This bit made me think: “Chris Whitty and Stefan Dercon should give some examples of gold standard anthropological or sociological evidence to allay the doubts over their true commitment to multi-disciplinarity, and take the complex systems question more seriously.”
    Perhaps they should indeed give some examples but I think the purpose of this type of research is fundamentally different from the purpose of an RCT. One of the attractions of RCTs is that by randomising they can deal with the problem of bias that cannot otherwise be measured/adjusted for; and that therefore the conclusions are fairly generalisable. In my view, qualitative and sociological evidence is much more focused on identifying differences and biases and subjectivities within contexts. The methods might be applicable, but while the findings of this sort of research might tell us what we should be watching out for when implementing a given intervention, I don’t think they can ever be generalised. So while an RCT can tell us what effect, all else being equal, we can expect an intervention to have, qual research is essential to figuring out how the intervention may play out in a specific situation.

  6. Not allergic, Duncan 🙂 just too busy to write something substantive last week, and a little reticent about simply posting links…
    Very much agree with Marcus Jenal’s summary – I think the set-up as a ‘wonk-war’ may have contributed to both sides talking past each other. In fact, the cruxes of the two arguments were less contradictory, and more coherent, than the authors suggest. I just wanted to add two points to the debate, drawing on previous writings:
    1. In one previous post, I suggest that there is more that unites the different philosophies around results than divides them. Specifically:
    – Data availability, coverage and quality are perennial problems
    – Participation and ownership; as Robert Chambers might ask: ‘whose results count?’
    – Incentives and disincentives to use information and results, especially when they run counter to individual and institutional interests
    – Bureaucratic inertia: all too often results-related work is placed on top of and increases the already considerable bureaucratic and administrative burden on aid agencies, rather than simplifying and reducing it
    – Risks and fear of failure: how can we manage and be transparent about the different kinds of risk and failure inherent to development projects and programmes?
    – Many conflicting imperatives: learning vs accountability, policy vs operations, domestic vs international
    I think it is clear that these issues apply equally to both sides of the results ‘tug of war’. By bridging the divide, we can deal with these collectively rather than in entrenched intellectual silos.
    2. In a follow-up post, Owen Barder and I argued for what we call results 2.0, which explores what a complexity-informed approach might look like. I don’t know how to post visuals into blog comments, but this is the framework we suggested could be utilised:
    We suggest that the key starting point for any discussion of results approaches is to think through the problem, the context and the intervention at hand. We conclude that “there is no contradiction between an iterative, experimental approach and a central place for results in decision-making: on the contrary, a rigorous and energetic focus on results is at the heart of effective adaptation.”
    I’d suggest that a good follow-up to this series of exchanges would be to identify the things that people actually agree on, and to establish a shared agenda for moving this area of work forward. A kind of wonk’s reconciliation commission, perhaps?

  7. If the Big Push Forward is going to help southern CSOs and NGOs participate in the big evidence of impact debate, then it would be performing an important service. To date most of civil society has been absent from this debate and has been passively on the receiving end of whatever the methodological flavour of the month is in the donor community. While many of us complain about the logframes that we need to fill out or the lack of nuance in some of the RCTs being conducted, we have not done much to propose an alternative.
    One point of concern, though: When Roche and Eyben argue that permissible evidence should include ‘other ways of doing and knowing’ and ‘multiple perspectives and understandings of what’s at stake’ it sounds a lot like they are saying that anything goes or at least that it should. Of course cultural and political sensitivity is good, but a suggestion that everyone has their own culturally relative set of definitions and standards is not helpful.
    In the experience of the International Budget Partnership, southern civil society has a deep and nuanced understanding of its own impact. We should find ways to bring that understanding into the evidence debate, not yield to the temptation of scientific relativism that will struggle to get beyond a discussion of ideology and power analysis.

  8. Kate

    The new DFID How to Note on Assessing the Strength of Evidence is interesting given your comment on whether DFID does actually favour RCTs. The guidance notes the value of multiple methods, but overall suggests that strong evidence = experimental designs and systematic reviews. (I’m sure my reading of it will get slammed by DFID staff – apologies in advance for any misinterpretation, and it would be good to hear more from those in the know.)