Can we demonstrate effectiveness without bankrupting our NGO and/or becoming a randomista?

Back in March there was a fascinating exchange on this blog between Ros Eyben and Claire Melamed on the role of measurement in development work (my commentary on that debate here). Now one of Oxfam’s brightest bean counters (aka ‘Programme Effectiveness Adviser’), Karl Hughes, explains where Oxfam has got to on this:

Eric Roetman, in a recent 3ie working paper, A can of worms? Implications of rigorous impact evaluations for development agencies, tells a provocative tale of the experiences of International Child Support (ICS) in Kenya carrying out randomised control trials (RCTs) in partnership with several world-renowned quantitative impact evaluation specialists. ICS saw itself evolve into a “development lab”, where the bulk of its staff became devoted to supporting the organisation’s research, as opposed to development, operations. Given ICS’s desire to return to its roots, it eventually opted to get out of the RCT business.

ICS’s story relates directly to issues further explored in another recent 3ie working paper, which I co-authored with Claire Hutchings, another of Oxfam GB’s global MEL advisers, entitled Can we obtain the required rigour without randomisation? Oxfam GB’s non-experimental Global Performance Framework. The central issue is this: we in the international NGO community are all too aware of our need to up our game in both understanding and demonstrating the impact – or lack thereof – of the various things we do. But what really baffles us is just how to do so without going down the “development lab” route. (This is not to imply that “development labs” are bad; in fact, the more their findings inform our programming, the better.)

The bottom line, as outlined in our paper, is that evaluation is research, and, like all credible research, it takes time, resources, and expertise to do well. This is equally true no matter what our epistemological perspective – positivist, realist, constructionist, etc.
This is perhaps why, rather than using the approaches offered by mainstream academia, we as a sector are so quick to experiment with seemingly more doable alternatives such as Most Significant Change, social return on investment (SROI), outcome mapping, and participatory M&E. They’re all very well, but those of us who feel a need to go further find ourselves at a loss.

One popular way of attempting to demonstrate effectiveness, being pursued by several international NGOs, which we comprehensively bash in the paper, is dubbed “global outcome indicator tracking.” Here, the organisation in question gets all its programmes/partners to collect common data on particular outcome measures, e.g. household income. All these data are then aggregated (only the gods know how) to track the welfare of global cohorts of programme “beneficiaries” over time. If there is positive change in relation to the indicator from time 1 to time 2, the organisation can boast about how much impact it is generating. Aggregation complexities aside, the underlying foundations of this approach are inherently precarious. In general, outcome-level change is influenced by numerous extraneous factors, e.g. rainfall patterns in rain-fed agricultural communities. Consequently, even if we are able to capture reliable data on a decent outcome indicator, its status will go up and down and all around no matter what our interventions are and/or how well they are implemented. Any consideration of attribution is entirely absent.

But what of the fact that donors have been encouraging us to pursue outcome indicator tracking for decades now through instruments such as the logframe, as part of ‘good practice’? In a paper entitled The Road to Nowhere, Howard White argues that the United States Agency for International Development (USAID) identified the futility of the outcome indicator tracking strategy some years ago and, consequently, abandoned it.
I worked on a USAID-funded orphan and vulnerable children (OVC) programme from 2005 to 2010, and yes, we were only required to report on outputs, so perhaps this was the consequence of this realisation. (Incidentally, USAID also came to the realisation that there was no evidence base established on what works and what does not in OVC programming after all the billions that it spent, and seems to regret not having supported the rigorous evaluation of key OVC care and support interventions.) To what extent have the other donor agencies recognised the fallibility of outcome indicator tracking? Sadly, there is plenty of evidence to suggest that many are still operating in this outdated paradigm.

So where does this leave us as NGOs? While Oxfam GB has not come up with a panacea, it is attempting to pursue a strategy that is reasonably credible. Each year, we are randomly selecting and then evaluating, using relatively rigorous methods by NGO standards, 40-ish mature interventions in various thematic areas. The causal inference strategy differs depending on the nature of the intervention. For community-based interventions, for instance, where we are targeting many people (aka large n interventions), we are attempting to mimic what RCTs do by statistically controlling for measured differences between intervention and comparison populations. Evaluating our policy influencing and “citizen voice” work (aka small n interventions), on the other hand, requires a different approach. Here, a qualitative research method known as process tracing is being used to explore the extent to which there is evidence that can link the intervention in question to any observed outcome-level change.

It is not that the above approaches are free of limitations.
In the case of large n interventions, for instance, given that programme participants have not been randomly assigned to intervention groups, coupled with the conspicuous absence of proper baseline data, we cannot absolutely guarantee that any observed outcome differences are the result of the workings of the intervention in question. The process tracing approach is also retrospective in nature, when ideally the research should take place throughout the life of the advocacy or popular mobilisation initiative. But, hey, what we are doing is not too shabby, especially considering that we are no “development lab.” Moreover, every evaluation design – even the golden RCT – has inherent limitations. Nonetheless, if anyone has any suggestions on how NGOs in general and Oxfam in particular can do a better job at both understanding and demonstrating impact, I’d love to hear them.
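
For readers who want to see what "statistically controlling for measured differences" can look like, here is a minimal sketch using stratification, one simple form of statistical control. It is not necessarily the method Oxfam GB uses, which the post does not specify. All of the data are invented for illustration: the `large_farm` covariate, the income figures, and the assumed true effect of 10 are hypothetical. The point is only that a raw comparison of intervention and comparison households can mislead when the two groups differ on something that also drives the outcome.

```python
from statistics import mean

TRUE_EFFECT = 10  # hypothetical income gain caused by the intervention


def household(large_farm, treated):
    """Build one invented household record (no noise, for clarity)."""
    income = 100 + 50 * large_farm + TRUE_EFFECT * treated
    return {"large_farm": large_farm, "treated": treated, "income": income}


# Invented sample: large-farm households are over-represented among
# participants, so the two groups are not comparable as they stand.
households = (
    [household(1, 1)] * 30 + [household(0, 1)] * 10    # intervention group
    + [household(1, 0)] * 10 + [household(0, 0)] * 30  # comparison group
)


def mean_income(rows, treated, large_farm=None):
    """Mean income for one group, optionally within one farm-size stratum."""
    return mean(
        r["income"]
        for r in rows
        if r["treated"] == treated
        and (large_farm is None or r["large_farm"] == large_farm)
    )


def naive_difference(rows):
    """Raw gap between groups; mixes the effect with farm-size differences."""
    return mean_income(rows, 1) - mean_income(rows, 0)


def stratified_difference(rows):
    """Compare like with like inside each stratum, then weight by stratum size."""
    estimate = 0.0
    for stratum in (0, 1):
        size = sum(1 for r in rows if r["large_farm"] == stratum)
        gap = mean_income(rows, 1, stratum) - mean_income(rows, 0, stratum)
        estimate += gap * size / len(rows)
    return estimate


print(naive_difference(households))       # 35.0, far above the assumed effect
print(stratified_difference(households))  # 10.0, the assumed effect
```

The adjustment only works here because farm size was measured. As the post acknowledges, differences that nobody measured stay uncontrolled, which is why this kind of design cannot offer the guarantee that random assignment does.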


17 Responses to “Can we demonstrate effectiveness without bankrupting our NGO and/or becoming a randomista?”
  1. Thanks for the great post. It’s always fascinating to hear how NGOs are tackling these issues in practice, so I really appreciate the update. Please let us know how you get on.
    I couldn’t agree more about the problems with tracking outcome indicators. Unfortunately, there still seems to be a lot of it about. I recently blogged on the same point. Some of the best alternative approaches to generating management data (rather than evaluations) seem to come from building bottom up tools that assess performance from beneficiaries’ views. There are some fascinating examples emerging. See
    Are you looking at this kind of management approach? I’d also love to hear how you are intend

  2. Thanks for this post, and the interesting accompanying report.
    I have a couple of comments. The first relates to the discussion of RCTs. I think there are other reasons why RCTs may not be the most appropriate means of evaluating the impact of a programme or organisation. The value of an RCT is in testing the efficacy of an intervention (or set of interventions), so that the most efficacious ones can be promoted and hopefully adopted as policy. Any organisation that is trying out a new intervention should seriously consider testing it using an RCT. The purpose of such a test isn’t to prove the value of an organisation or a programme, but to further knowledge about what works.
    But I don’t see RCTs as appropriate means of evaluating programmes if they are using interventions that have already been shown to be worthwhile. Not because I don’t think they can provide the answers, but because I think ultimately, if we are using good interventions, we shouldn’t be interested in having “controls” any more, we should be trying to get everyone into the intervention group. And also because experimental studies are expensive and complicated, as you point out.
    My second comment is about the business of evaluating the work of NGOs. The proposals in this post and the report are very useful, and as an independent I was very happy to see the report mention the problem of consultants being engaged to evaluate based on vague terms of reference and poor data. I’d like to add something to your discussion on attribution. While it is true that attribution is difficult because of the “noise” around an organisation’s work, in some cases I think it is impossible, irrespective of the noise. It is increasingly the case that what a lot of NGOs do is deliver one or more “components” of a programme or intervention, but not the entire intervention. In some cases it might be possible to evaluate and attribute the NGOs’ contribution to the success or failure of the programme: for instance, if each component has an incremental effect. But what if some of the components are essential but not on their own sufficient for success? Here’s an overly simplified example. An HIV programme in a given town might be dependent on condoms or treatment being available as well as the population knowing about them and being willing to use condoms and take treatment. The availability of condoms or treatment is not, on its own, sufficient for impact. Population knowledge and ability are not sufficient either. Different organisations might be delivering each component, contracted by different government departments or donors. Even with excellent data, none of the implementing organisations can claim responsibility for the impact. The challenge here is to shift thinking among donors, governments and NGOs toward more collaborative forms of impact evaluation which recognise the roles of each actor.

  3. Nice post, thanks. I have found these two papers by Victora, Habicht and others (links below) to be very helpful when thinking about these issues. As you imply, not all evaluations have to be RCTs – indeed, for many interventions RCTs are not possible. The papers lay out a continuum of evaluation – from adequacy, to plausibility, to probability evaluations. NGOs such as Oxfam may most often find themselves at the adequacy end of the continuum – and that is appropriate. Well done adequacy evaluations of development interventions are needed, but are in short supply.

  4. Karl Hughes

    Just a couple of quick responses to these useful comments:
    Overall – I am in basic agreement with all points made.
    Alex: All of the evaluation reports will be available from the OGB website, as per our open information policy. Hopefully, some of the more interesting work will be published in journals. We also hope to organise some learning events around reviewing the findings, perhaps each year, inviting some critical friends to keep us in check. However, this is all emergent at the current time.
    Katerine: You can read more in Eric’s paper, but it was not that ICS did not find the research valuable; it was more a case of undesired mission drift.
    Alan: The title of the blog is all Duncan’s doing – not mine! Personally, I find the work of “development labs” such as J-PAL and Innovations for Poverty Action very valuable and, as NGOs, we need to start informing our interventions with their findings (where relevant). We already openly acknowledge the fact that facilitating development and transformative change is not a simple thing and should, therefore, welcome any of their useful insights with open arms.
    Matt: Yes, could not agree with you more. Bearing in mind external validity issues, if we have good evidence that something works, we should focus ourselves on ensuring the sound implementation of the intervention in question. In my view, there is a lot of bad implementation out there, and NGOs, as well as other actors, could certainly up their game in this area as well.
    However, many NGO-type interventions are emergent, the outcome of participatory, bottom-up, and contextually grounded planning processes. Many NGOs also do not implement standardised intervention models. I guess this is where the innovation comes in. Moreover, many of these emergent, untested interventions are in their developmental stages, and may not be “ripe” for rigorous RCT style evaluation. Michael Quinn Patton argues that his “developmental evaluation” approach is much more suited for such interventions. This is where evaluative questioning, reasoning, and whatever evaluative evidence happens to be available are used to develop and improve the intervention. This approach is particularly suited for advocacy interventions, but even, I would argue, some large n type interventions as well.
    Your point about intervention components and the fact that different actors may be responsible for delivering each is also very relevant. As we randomly select projects, we are directly confronted with this issue. And, in fact, we don’t try to separate the project out from any larger intervention that it may be a part of, which may involve other actors (including other Oxfams). In fact, if Oxfam or a partner has been working in an area for a good number of years through various projects, we can only really evaluate the entire package, particularly if the projects are inter-related.

  5. Karl, thanks for your response. I take the point about the character of NGO-type interventions. I’m an NGO type myself, but I sometimes wonder if this can be a little over-indulged. I think there is some pressure on NGOs to be seen to be different and innovative, but while we don’t know the answer to everything (far from it) I wonder if NGOs are sometimes slow on the uptake of the things we do know.
    I think your commitment to evaluating collective efforts is really encouraging by the way, I hope it catches on.

  6. Hi
    Re Karl’s comment "Many NGOs also do not implement standardised intervention models. I guess this is where the innovation comes in." – readers may be interested in reading a recent blog posting of mine which looks at what can be done to assess impact where interventions are not standardised (and thus not suited to an RCT). It is titled "Relative rather than absolute counterfactuals: A more useful alternative?" and can be found here at Karl has added some comments on the end of this blog posting

  7. Very good post. I will definitely look into the full paper and the Oxfam evaluation strategy.
    However, we should aim higher: through cooperation like ALNAP, or perhaps donor-led efforts, we should be able to compare approaches not only within an NGO, but across actors.
    While RCT is methodologically tempting, it is far better to do what is realistically as good as possible, than aim for the best, and fail dismally, or just not do it at all in the end.

  8. Cathy Shutt

    Great to see a pushback against the tracking global indicators models. There has been far too little discussion about the implications of the results agenda on the organisational management practices of complex international organisations. Given Oxfam’s influence on the sector I hope others will engage with that part of the paper as well as the critique of RCTs. Nice to see the necessity of expensive baselines being questioned too. I am glad you have flagged different epistemological positions and would be interested to learn more about whether the methodological implications of different positions are discussed within Oxfam? In my experience these differences are too often glossed over, which can result in evaluation methodologies that seem to include quite contradictory and methodologically incommensurable elements. Similarly, do you talk much about the values underpinning different inquiry paradigms?
    What most interested me in your paper is the section 4.3, Searching for Signatures and Smoking Guns. We definitely need to do better qualitative contextual research to understand how and why change happens and our contribution to the process. I feel that developmental evaluation approaches have something to offer all evaluations (even those with a large n). How extensively does Oxfam explore the effects of Oxfam and partner staff’s implementation models, attitudes and behaviours on change? The kind of issues that Rick talks about in his blog? Do you use complexity thinking at all to focus lenses on individuals and relationships? Many of the ‘successful’ initiatives are largely dependent on individuals, yet they are almost always invisible in evaluation reports. It is a difficult and sensitive issue for partnership organisations to tackle. How do you go about it?
    And, lastly has Oxfam any experience using participatory numbers/statistics promoted by Robert Chambers and Carlos Barahona within your evaluation approaches?

  9. Thanks a lot for a very thoughtful article, triggering also excellent comments.
    I am responsible for measuring and reporting on the development results in the International Finance Corporation (IFC). Many of the issues you are struggling with are also relevant for us.
    In my view, no single indicator or method can give you all the answers, and what is important is to combine different approaches for maximum effect: Ultimately to find out what works and what doesn’t and to continuously adjust your operations to improve results.
    The higher up on the results chain you go (from inputs, to outputs, to outcomes, to before-and-after comparisons, to ultimately impacts allowing you to attribute a result to your activities), the more difficult, costly and time-consuming it typically gets to obtain the necessary information.
    Ideally your monitoring system should give you sufficient information up to and including outcomes (with as much as possible standardized indicators, to allow you to compare and aggregate results), but it is often very difficult to track impacts across your portfolio.
    As a general rule, if a lower level indicator already tells you something isn’t working, try to understand why and don’t waste time on the higher level indicators. For example, if you are not reaching the intended beneficiaries (or only very few of them), you don’t need to check whether their livelihoods are improving – focus on why you are not reaching them.
    That’s where evaluations come in – and there are many different evaluation methods, RCTs being just one of them: If used strategically (and we are still working on improving our own evaluation strategy), they can help fill the gaps the monitoring system can’t address.

  10. I would like to echo Cathy’s first statement. I have been frustrated at the renewed interest in metrics and indicators–not that I don’t think they are useful–in the social enterprise/entrepreneurship and PPP sector. What is difficult to explain is that the data is out there and just how much energy it will take to get it.
    I also greatly appreciate the helpful links by commenters Jaime, Rick, Peter and Alex.
    It sounds banal but it really is all about triangulation and striking a balance between knowing and doing.
    RCTs are great, but even in the best case scenario they are not enough; at the same time, we can only do so much measurement without compromising what we are doing.
    Until we find some magic feedback mechanism that flawlessly tells us whether people prefer what they are getting to what they could otherwise get (like the profit motive in business) I think we will always be working to find that balance.

  11. Thanks for a really interesting article, report and discussion. Two points:
    1 – Chris Blattman had a nice phrase (link below), arguing that NGOs should do R&D not M&E – implying Oxfam shouldn’t randomly pick the projects to evaluate but rather evaluate the projects which offer the greatest likelihood of an interesting result. But I can understand why Oxfam would choose to do M&E, given the strength that it gives to communications. I suppose this is the tension between being a development lab (valuing possible future improvements) and an NGO (valuing effective communication, due to budget concerns). The unexpected point is that incentives point the academic to cherry-pick interesting findings, and the NGO to randomise which interventions to study.
    2 – The report suggests NGOs should establish links with universities. I would suggest a more pragmatic approach – aim for PhD students. I suspect that while many interventions are difficult to evaluate there are a large number where the data exist but the time/skills don’t. A suitable PhD student (best found through faculty) offers more attention than faculty can, a much cheaper alternative (given that a lot of PhD students would love such an opportunity, for free) and the chance of a long-term relationship. Alas, the suggestion comes too late to help me!

  12. Jerry Adams

    Very helpful and encouraging. It is important to challenge the myth that RCTs are a panacea when they can destroy what they are attempting to measure (a bit like nailing jelly to the wall – apologies to Mike Edwards for pinching that). I am moving towards keeping impact studies to a reasonable scope and scale with a view to developing an increasing body of evidence that is critiqued and updated, rather than attempting a study that tries (and fails) to answer all of the issues.

  13. Thank you for this very useful post.
    I wonder if the solution is, instead, to look at inputs and activities.
    A couple of years ago I had to interview someone for a job as communications manager. We asked her what processes she would incorporate to make sure that the outputs coming out of her team were of high quality. We were looking for a list of processes and maybe a description of steps to follow (peer reviews, external reviews, etc.):
    ‘Hire the right people’, she said. If they are good at their job then the outputs will be good.
    I thought that was brilliant. I think now that we spend too much time worrying about processes and outcomes when we should be looking at what is the cause of all our problems (if there are any).
    Given that we cannot control the effects of an intervention and we do not have evidence that something always works, we should turn to those things that we can control and can ‘measure’.
    My focus is mostly influencing interventions. Take an initiative to influence education policy. We do not know if a media strategy will work or not. We may decide that given the current context, this is a good option: education stories tend to sell as they concern parents and children. But it is perfectly possible that the recommendations made by the NGO or think tanks are never properly picked up by the media, that other issues concern them more, etc.
    We could spend years trying to assess if we had ANY influence (and some organisations will bend the truth to demonstrate it) but it would be best to look at what we can know and can control. What can we know?
    Was the research good?
    Were the researchers competent?
    Was the media strategy properly developed?
    Was it properly implemented? (did the press releases go out on time and were they properly written, etc?)
    You will probably find that most of the time, failures can be explained by either bad luck or poor practice (and often this is linked to having poorly qualified staff).
    If NGOs want to do one thing then they should show that they have the right people and are doing the right things -and that care has gone into it.
    The chances are that properly planned, by qualified and experienced people, NGO interventions will be more successful (and fewer) and ‘showing’ impact or value will be much easier -it will be much more obvious.
    Duncan: Couldn’t agree more Enrique. I’ve always been struck by how cavalier the job recruitment is – on the basis of some paperwork and a couple of hours of interviews in an incredibly false setting, and we make decisions that will profoundly affect the organization (and the people hired/rejected). Wouldn’t it be better to ask the best 2 or 3 to each come and work with us for a week (paid) before we make the decision? The only decision which is even more baseless is probably buying a house after a half hour visit…….
