Dean's Speech: The science that dares not speak its name by Professor Gabriel Leung at 2018 ST Lee Oration, University of Sydney

21 August 2018

(Professor) Andrew (Wilson), (Professor) Adam (Elshaug), distinguished guests, colleagues, students, ladies and gentlemen,

Thank you so very, very much for giving me the honour of being the 2018 ST Lee Orator this evening, completing the first decade of this annual lecture series. I am particularly pleased that through this lectureship I may now claim to have become part of the scholarly fraternity that has been carefully curated and endowed by Dr Lee Seng Tee (李成智), his father Dr Lee Kong Chian (李光前) and the Lee Foundation since the 1950s. One look at this rich worldwide web of distinguished institutions affirms the scholarly legacy the Lees have laid over two generations, and may that long continue. This once again testifies to the critical role of private philanthropy as a partner in academic life. To the business community, then, such largesse must be seen as an investment that provides excellent value, a topic to which I will return.

Today I should like to share a few thoughts on clinical evaluative science, more commonly and better recognised as technology assessment writ large. I would assert that it was Archie Cochrane who christened this field with Effectiveness and Efficiency: Random Reflections on Health Services, published in 1972, the year I was born. We are therefore just a few years shy of its half century, so it may be an opportune time to look forward by looking back.

Cochrane, who is now universally celebrated as the patron saint of clinical evaluative science and whose eponymous Collaboration provides the gold standard of clinical evidence, was an irreverent anti-establishment maverick. His detractors, of whom there were many, particularly from the preeminent professorial ranks, often pilloried his person and his work. These clinical authorities would not recognise his indefatigable pursuit of evidence, as opposed to expert opinion, as a proper science.

For that, one had to await the dynamic duo of David Sackett and Gordon Guyatt of the then fledgling McMaster medical school in Ontario, Canada. Encouraged by the founding dean John Evans, incidentally Tim Evans’s father, who also established the World Bank’s Population, Health and Nutrition division that his son now leads, they mentored a whole generation of like-minded individuals who together developed the discipline of clinical epidemiology and, importantly, popularised its application at the bedside as evidence-based medicine (EBM).

In 1994, Sackett “retired” from McMaster only to start a new programme at Oxford, continuing the evangelisation of EBM with the John Radcliffe Hospital as his base. The Oxonian ground of scientific medicine had long been most fertile, of course, having been brilliantly tilled by another scientific father-and-son pair, Richard Doll and Richard Peto of the Clinical Trial Service Unit and latterly Green College. Doll and Peto literally defined the conduct of clinical trials, mega cohorts and meta-analyses.

The final and most recent iconic builder of the scientific base of clinical epidemiology I should like to highlight is John Ioannidis, a New York-born Athenian at Stanford who has broadened meta-analyses to meta-research more generally. His mission has been to increase the yield of validated and useful findings through rigorous empirical evaluation.

In the parallel universe of economics, addressing the “efficiency” half of Cochrane’s original magnum opus, there are John von Neumann and Oskar Morgenstern, who together founded the field of game theory with their Theory of Games and Economic Behavior. Incidentally, this work was undertaken around the end of the Second World War, in the same era as Richard Doll and Austin Bradford Hill’s landmark case-control study of smoking and lung cancer. Directly relevant to clinical evaluative science, von Neumann and Morgenstern were the first to show that, under certain axioms of rational behaviour, a decision maker (be she a patient, payer or provider) faced with a set of probabilistic outcomes will maximise the expected value of her utility function. This preference-based expected utility theorem laid the foundation of game theory generally and, for technology assessment in particular, the basis of the quality-adjusted life-year (QALY). The QALY metric and process are of course central to all cost-effectiveness research. It was, however, a pair of Harvard professors who applied the von Neumann-Morgenstern theorem to clinical decision making: Harvey Fineberg, who was successively dean and provost at Harvard before becoming President of the then US Institute of Medicine (now the National Academy of Medicine), and Milt Weinstein.
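To make the expected-utility idea concrete, here is a minimal Python sketch of how expected QALYs might be compared across options with uncertain outcomes. The `Outcome` type, the function names and all the probabilities, durations and utility weights are hypothetical; the sketch ignores discounting and simply assumes that QALYs serve as the utility measure, which is itself a simplifying assumption of QALY-based analysis.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # chance that this outcome occurs
    years: float        # years lived in this health state
    utility: float      # health-state weight: 0 = dead, 1 = full health


def qalys(outcome: Outcome) -> float:
    # Undiscounted QALYs: time in a health state weighted by its utility
    return outcome.years * outcome.utility


def expected_qalys(outcomes: list[Outcome]) -> float:
    # Expected value over a set of mutually exclusive, probabilistic outcomes
    total = sum(o.probability for o in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * qalys(o) for o in outcomes)


def best_option(options: dict[str, list[Outcome]]) -> str:
    # An expected-utility maximiser picks the option with the highest expected QALYs
    return max(options, key=lambda name: expected_qalys(options[name]))


# Hypothetical example: a risky but effective treatment versus a safer, weaker one
options = {
    "treatment_a": [Outcome(0.9, 10.0, 0.8), Outcome(0.1, 1.0, 0.5)],
    "treatment_b": [Outcome(1.0, 9.0, 0.75)],
}
for name, outcomes in options.items():
    print(name, round(expected_qalys(outcomes), 2))
print("preferred:", best_option(options))
```

With these made-up inputs, option A yields 7.25 expected QALYs (0.9 × 8 + 0.1 × 0.5) against 6.75 for option B, so an expected-utility maximiser would choose A despite its 10% chance of a poor outcome.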

Thus the twinned strands of epidemiology and economics have become intertwined as tightly as the DNA double helix to form the scientific basis of technology assessment. Little wonder, then, that governments around the world have instituted clinical evaluative agencies, or at least embedded such programmes within the health system, to implement this new science during the past two decades. In fact, the 2016 ST Lee Orators, in their double act, explained how their respective countries set fundable priorities through the UK’s NICE¹ and Thailand’s HiTAP².

Apart from some growing pains and bumps along the way, typified by the Oregon experience, it has, perhaps surprisingly, been more or less so far so good worldwide, until now, potentially. Here is why I worry. While technocracy may have reigned supreme, recent changes in the body politic around the world could have knock-on effects on the way we make health care decisions. The most obvious system-wide example is of course Donald Trump’s spectacular, albeit thankfully still failing, attempt to dismantle the Affordable Care Act.

Don’t just take it from me. David Runciman, Professor and Head of Politics and International Studies at Cambridge, has just published a new book, How Democracy Ends, chronicling, amongst other contemporary phenomena, the denigration of expertise and the celebration of ignorance; in other words, the wholesale repudiation of technocracy. This observation has already translated into everyday work, as in the guidance on drafting FY2019 federal budget requests issued by the US Department of Health and Human Services to the CDC. It contained the seven infamous terms that dare not speak their names: “vulnerable”, “entitlement”, “diversity”, “transgender”, “fetus”, “evidence-based” and “science-based”. The list led to outcries of Orwellian censorship and sparked a huge backlash in the public health and scientific communities, amongst whom “#sciencenotsilence” quickly went viral.

Mind you, one should not blame it all on Trump. Fact check: censorship edicts concerning the use of scientific evidence in health care decisions had already been issued under the Obama administration. The Affordable Care Act provided for the establishment of a Patient-Centered Outcomes Research Institute, or PCORI for short. However, and I quote from the relevant paragraph of the legislation:

PCORI…shall not develop or employ a dollars per quality adjusted life year (or similar measure that discounts the value of a life because of an individual's disability) as a threshold to establish what type of health care is cost effective or recommended. The Secretary shall not utilize such an adjusted life year (or such a similar measure) as a threshold to determine coverage, reimbursement, or incentive programs under title XVIII.

In other words, there should be no mention whatsoever of QALYs or cost-effectiveness. I rather suspect it was a political compromise to garner the necessary votes in Congress. The White House was probably very keen to counter the mischaracterisation of Obamacare as an extreme form of “socialised medicine” and to rebut charges of “death panels” sending granny to an early grave.

Whatever the reasons, we are indeed in an uphill battle to preserve the role of technocracy and to improve on it, lest we allow “fake disruption followed by institutional paralysis, and all the while the real dangers continue to mount. Ultimately, that is how democracy ends,”³ as Runciman concludes.

As the scientific community comes together to fight this good fight, there is an important distinction to make. We must not allow technocracy to morph into epistocracy. In other words, the rational deployment of science to understand how the system could work optimally must not turn into “rule by the knowers”. As Runciman puts it, “technocracy is more like plumbing than philosophy”. While it is virtually impossible to define who the “knowers” are (should we administer tests to vet potential voters?), there is no lack of prominent advocates of epistocracy, from Plato to John Stuart Mill to the contemporary philosopher Jason Brennan. The frightening prospect that the knowers would deem themselves infallible is one reason why the masses may have turned against experts. To scientists, Galileo’s fate at the hands of a Church that deemed itself infallible still chills. To err is human, as Pope (Alexander, that is, not the lodger at the Apostolic Palace) famously wrote. Therefore democracy, with its self-correcting feature that relies on the wisdom and, importantly, the vested mandate of the masses, remains the Churchillian least worst form of government. Of course, as with authoritarianism or indeed any other form of government, it needs to be buttressed by the intelligence and diligence of technocrats who generate the evidence base on which political decisions can sensibly be made.

So phew! There is still a job for us after all.

Now I would like to share with you one aspect of my own research journey over the past two decades as I grew with the developing field of clinical evaluative science. In particular, I will show how popular will and contemporary sociopolitical trends have shaped the direction of the science I have undertaken.

My first project as a wet-behind-the-ears junior lecturer at HKU looked at mammography screening in context. It was an apparently foolish choice, since the practice had long become usual care in the West and the Hong Kong authorities were about to expand coverage after a few pilot offerings. I also carried the double burden that my head of department, an aetiological epidemiologist, did not think clinical evaluative science, and particularly its translation into health policy, would be sufficiently “academic” to prove my worth. If I insisted on doing work in this field, it would have to be in my own time. During the week, I had to burnish my credentials by publishing high-impact papers as a card-carrying conventional (read aetiological) epidemiologist. In retrospect, however, it was one of the most important choices I made as a public health academic. Even the authoritative US Preventive Services Task Force, in its latest 2016 guidance, recommended a narrower eligible age range, questioning the benefits of mammography for younger women with lower baseline risk; more on this later.

The story begins with understanding the biology of breast cancer. A screening test can only work optimally when there is a detectable pre-disease state. The fact is that pathologists have yet to identify such a precursor lesion, quite unlike, say, colorectal or cervical cancer. DCIS, or ductal carcinoma in situ, is not it, I am afraid, contrary to popular myth.

That said, we first carried out a meta-analysis of the eight randomised controlled trials of screening, cumulatively involving half a million Caucasian women, and concluded that in those populations there would likely be 20% fewer deaths due to breast cancer. We assumed that the same risk reduction would apply to Hong Kong Chinese women, even though our disease rates are less than half of those in western women and follow a demonstrably different age distribution.
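The talk does not say which pooling model was used, so the sketch below simply assumes a standard fixed-effect, inverse-variance meta-analysis of relative risks on the log scale, with each trial's standard error backed out of its 95% confidence interval. The function name and input format are hypothetical; this is an illustration of the technique, not the authors' code.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def pooled_relative_risk(
    trials: list[tuple[float, float, float]],
) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of relative risks.

    Each trial is (relative_risk, ci_lower, ci_upper) with a 95% CI.
    Returns the pooled relative risk and its 95% CI.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in trials:
        log_rr = math.log(rr)
        # Standard error recovered from the CI width on the log scale
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se**2  # more precise trials count for more
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )
```

A pooled relative risk of about 0.8 would correspond to the 20% reduction in breast cancer deaths described above. If the trials were heterogeneous, a random-effects model (for example DerSimonian-Laird) would be the usual alternative and would widen the interval.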

We then asked the next logical question: if there were new funding to save lives through mass cancer screening generally, what would give the best bang for the buck? We considered the only three cancers that could apply to all women and for which there was a validated screening test, namely breast, cervical and colorectal cancers. Here is a set of efficiency frontiers from a generalised cost-effectiveness analysis, an approach that Chris Murray, David Evans and Tessa Tan Torres at WHO headquarters were very keen to push as part of the WHO-CHOICE programme at the time. The curves show that we should first and foremost invest in ensuring all women receive free Pap smears every three years, then cover colonoscopy every ten years for those aged 50 or over, and, if we still have extra funds, consider mammography for those on the wrong side of 50. In Hong Kong, the most recent government Budget promises mass screening for colorectal cancer, but we are still leaving women out of pocket for Pap tests and HPV vaccines, which would avert deaths from cervical cancer. Consideration of the less cost-effective option of publicly funded mass mammography would therefore be premature.
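For readers unfamiliar with efficiency frontiers, the following Python sketch shows one common way to trace a cost-effectiveness frontier: drop strongly dominated options (costlier and no more effective than another), then drop extendedly dominated ones so that incremental cost-effectiveness ratios (ICERs) rise along the frontier. It treats the options as mutually exclusive strategies and includes a zero-cost "Null" comparator, loosely in the spirit of generalised cost-effectiveness analysis. The strategy names and numbers are hypothetical and are not the WHO-CHOICE or Hong Kong results.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    cost: float    # e.g. cost per 100,000 women (hypothetical units)
    effect: float  # e.g. life-years gained per 100,000 women


def icer(a: Strategy, b: Strategy) -> float:
    # Incremental cost per extra unit of effect when moving from a to b
    return (b.cost - a.cost) / (b.effect - a.effect)


def efficiency_frontier(strategies: list[Strategy]) -> list[tuple[str, Optional[float]]]:
    # Order by cost; at equal cost, list the more effective option first
    ordered = sorted(strategies, key=lambda s: (s.cost, -s.effect))

    # Strong dominance: discard anything no more effective than a cheaper option
    undominated: list[Strategy] = []
    for s in ordered:
        if not undominated or s.effect > undominated[-1].effect:
            undominated.append(s)

    # Extended dominance: ICERs must rise along the frontier, so drop any
    # middle option whose ICER exceeds that of the following step
    frontier: list[Strategy] = []
    for s in undominated:
        frontier.append(s)
        while len(frontier) >= 3 and icer(frontier[-3], frontier[-2]) > icer(frontier[-2], frontier[-1]):
            del frontier[-2]

    # Report each frontier option with its ICER versus the previous one
    return [
        (s.name, None if i == 0 else icer(frontier[i - 1], s))
        for i, s in enumerate(frontier)
    ]


candidates = [
    Strategy("Null", 0.0, 0.0),
    Strategy("A", 100.0, 10.0),
    Strategy("B", 250.0, 12.0),  # extendedly dominated in this example
    Strategy("C", 300.0, 20.0),
    Strategy("D", 400.0, 15.0),  # strongly dominated by C
]
for name, ratio in efficiency_frontier(candidates):
    print(name, ratio)
```

Reading the frontier from left to right gives the order in which options would be funded as the budget grows, which is the sense in which such curves rank interventions by value for money.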

Next we asked what would happen if we focused only on breast cancer and set out to prevent as many related deaths as possible, disregarding other cancers for the sake of argument. We looked at the entire spectrum of services, from screening to palliation, and compared the current standard of care offered by public hospitals with the best available care that money can buy. Again, the efficiency curves show that, in descending order of value for money, we should reduce waiting times, enhance home-based palliative care for those close to the end of their suffering, provide aromatase inhibitors and adjuvant endocrine treatment where appropriate, and then, if additional funds were still available, consider screening well women at average risk of disease by mammography.

Both sets of efficiency analyses converge in that screening mammography would yield the least bang for the buck at the societal level, even assuming it works just as well in Chinese women who present differently compared with their western counterparts.

In parallel, Peter Gotzsche, who directs the Nordic Cochrane Centre, began casting serious doubt on the methodological rigour of the eight original trials, five of which were carried out in the Nordic countries, thus calling into question the beneficial effect of mammography they had demonstrated. In part, he had been prompted to question mammography screening when he noticed that breast cancer death rates appeared no different in communities that had a population programme compared with those that did not. A couple of prominent groups in the US and UK also raised queries about screening effectiveness, as well as disproving the putative benefits of screening women in their 40s.

So far I have presented to you the technocratic answer to the screening mammography question. That the cumulative corpus of findings, both our own and those of others elsewhere, appears to converge should give us added confidence in its robustness. In other words, we are as certain as we can be that the qualitative direction is correct, if not the precise magnitude of effect. Archie Cochrane summed up my feelings better than I can when he wrote in Random Reflections: “what I decided I could not continue doing was making decisions about intervening when I had no idea whether I was doing more harm than good.”

On the other hand, radiologist mammographers and breast surgeons, in alliance with women’s rights groups, cancer NGOs and glitterati do-gooders (including our lady Chief Executive), have mounted formidable resistance that does not address the science head on but instead appeals to postmodern relativist persuasion and Confucian ethics-of-care arguments. While this is not exactly populist bombast, the net effect is similar, if not stronger, precisely because it is much more in tune with conservative Chinese sensibilities, even in Hong Kong, the most westernised of Chinese cities.

I keep reminding myself to practise what I preach by not allowing technocracy to turn into epistocracy. Moreover, there is likely a useful role for mammography screening in a segment of the population, just not for all well women defined by age alone. After all, this is precision preventive medicine, which targets those whose benefit-to-harm ratio is greater than unity. We should therefore find a way to inform individual women of their personalised risk, preferably taking into account their own attitude to risk, present it in an easy-to-understand manner, and then let them decide. Return the kratos, or power, to the people while keeping the knowledge (episteme) or know-how (techne) separate from it. Fixing power to knowledge, however justifiable and sensible it may seem at the outset, is a recipe for disaster in the long run, as I have tried to explain earlier.

This is exactly what we are attempting to do at the moment. Rather than stratifying risk simply by age, which led to our previous all-or-none policy recommendation of mass screening above a specified age cutoff, we reason that risk assessment could be personalised in three steps. First, we apply a prediction algorithm that takes into account the traditional risk factors as well as a genetic risk score. The more astute amongst you will immediately realise that the feasibility of this precision preventive measure depends on a sufficiently cheap multi-locus gene chip. Second, we benchmark the predicted absolute cancer risk against thresholds inferred from well-established mass screening programmes elsewhere. Finally, we personalise this threshold according to each individual’s risk aversion as assessed by standard gamble. One can therefore begin to foresee an integrated, self-administered interface that would facilitate personalised decision making in a more precise, evidence-based way. The power of making one’s own decisions returns to the people, albeit informed by robust science. QED.
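The three steps lend themselves to a short illustration. The Python sketch below is entirely hypothetical: the logistic risk model and its coefficients are invented, the benchmark is taken to be the predicted risk of an illustrative "reference woman" aged 50 with no other risk factors, and the way a standard-gamble response adjusts the threshold (harms scaled by the woman's own disutility, 1 - u, relative to a reference value) is an assumption made for illustration, not the published method. It encodes the rule stated earlier in the talk that screening suits those whose benefit-to-harm ratio exceeds unity.

```python
import math
from typing import Sequence

# All coefficients below are invented for illustration only.
INTERCEPT = -6.0
COEFS = (0.04, 0.5, 0.3)  # hypothetical: age (years), family history (0/1), dense breasts (0/1)
PRS_COEF = 0.4            # hypothetical log-odds per SD of a polygenic risk score


def predicted_risk(risk_factors: Sequence[float], prs: float) -> float:
    # Step 1: logistic model combining traditional risk factors with a genetic risk score
    linear = INTERCEPT + sum(b * x for b, x in zip(COEFS, risk_factors)) + PRS_COEF * prs
    return 1.0 / (1.0 + math.exp(-linear))


# Step 2: benchmark = risk of a hypothetical woman aged 50 with no other risk factors,
# standing in for the entry age of an established screening programme
BENCHMARK_RISK = predicted_risk((50.0, 0.0, 0.0), prs=0.0)
REFERENCE_UTILITY = 0.95  # hypothetical standard-gamble utility of screening harms


def benefit_harm_ratio(risk: float, sg_utility: float) -> float:
    # Step 3 (assumed form): benefit scales with absolute risk relative to the
    # benchmark; harm scales with the woman's own disutility (1 - u) of screening
    # harms, elicited by standard gamble, relative to the reference disutility
    if not 0.0 <= sg_utility < 1.0:
        raise ValueError("standard-gamble utility must lie in [0, 1)")
    benefit = risk / BENCHMARK_RISK
    harm = (1.0 - sg_utility) / (1.0 - REFERENCE_UTILITY)
    return benefit / harm


def recommend_screening(risk_factors: Sequence[float], prs: float, sg_utility: float) -> bool:
    # Offer screening only when expected benefit outweighs expected harm
    return benefit_harm_ratio(predicted_risk(risk_factors, prs), sg_utility) > 1.0
```

Under these assumptions, a woman whose predicted risk exceeds the benchmark may still fall below her personal threshold if she regards the harms of screening, such as false positives, as especially burdensome; that is the point of returning the final decision to her.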

Of late, clever business types have recast the science of cost-effectiveness decision analysis as a discourse about value. One of the cleverest of that ilk is Michael Porter at Harvard, whose eponymous “five forces” are jokingly parodied as the strategy consulting world’s Obi-Wan Kenobi blessing: may Porter’s Five Forces be with you! Fundamentally, however, my rather pedestrian medical brain cannot detect any difference between Porter’s reconceptualisation of “value” and what I have learnt and practised over the past two decades.

Be that as it may, here goes the refreshed proposition. Complex, chronic, non-communicable diseases impose the dominant burden in any health system, even in resource-poor populations.  The care cycle involves multiple specialties and numerous interventions over decades, usually from diagnosis until death.  Providers create value by their interdependent and cumulative efforts over the full cycle of care. Thus value should be measured by tracking outcomes and costs longitudinally. “What is not measured cannot be improved”, so goes the Druckerian mantra. Therefore value can only be enhanced by measuring and comparing outcomes and costs.

We recently embarked on a value of diabetes care project to estimate and compare outcomes of cohorts of diabetics across the Asia Pacific. In fact this project forms part of an ongoing collaboration with the 2014 ST Lee Orator Dr Karen Eggleston of Stanford University, which had in turn grown out of work with her doctoral advisor Professor Joe Newhouse at Harvard.

Using OECD data, we compared avoidable admission rates and spending on diabetes-related complications in Japan, Singapore, Hong Kong, and rural and peri-urban Beijing over the period 2008–14. We found that spending on diabetes-related avoidable hospital admissions was substantial and increased during the period of observation. Annual expenditures for people with an avoidable admission were 6 to 20 times those for people without one. In all of our study sites, after controlling for severity, we found that people with more outpatient visits in a given year were less likely to experience an avoidable admission in the following year, which suggests that primary care management of diabetes has the potential to improve quality and achieve cost savings.

Given these ecological observations, we proceeded to assess the potential value that optimal routine diabetes care could bring. First, we specified a highly sensitive and specific case definition with which to interrogate the population database of the Hospital Authority, Hong Kong’s equivalent of the National Health Service. We then formally estimated the burden of diabetes in terms of prevalence and incidence.

Next, we developed prediction algorithms for death and cardiovascular disease amongst diabetics, using the same local data source and validated them in Singapore’s diabetic population cohort maintained by the Ministry of Health.  The prediction rules demonstrated better discrimination and calibration than existing models.

Then we were ready to estimate the value of care by assessing changes in mean modifiable five-year risk, as per the prediction equations, and comparing them across four biennial periods. We computed the latest risk estimates in each period while keeping age and diabetes duration at baseline values, so that the difference between risk estimates reflected changes in risk factors potentially attributable to clinical care. This next slide shows modest reductions in absolute risk from baseline, ranging from 0.44% to 0.85%. We are currently carrying out the costing exercise, that is, estimating the denominator in the value equation of improvement in clinical outcome per unit cost. The cross-country comparison should provide interesting data with direct policy relevance, especially in this 40th anniversary year of the Alma-Ata Declaration.
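The device of freezing age and diabetes duration at baseline can be sketched in a few lines of Python. The risk equation below is a made-up logistic model standing in for the published prediction rules, the `Record` fields and patient values are assumptions, and the cost passed to `value_of_care` is a placeholder, since the costing exercise described above was still under way.

```python
import math
from statistics import mean
from typing import NamedTuple


class Record(NamedTuple):
    age: float       # years (non-modifiable)
    duration: float  # years since diabetes diagnosis (non-modifiable)
    hba1c: float     # % (modifiable)
    sbp: float       # systolic blood pressure, mmHg (modifiable)


def five_year_risk(r: Record) -> float:
    # Hypothetical logistic risk equation; coefficients are invented for illustration
    lp = -9.0 + 0.05 * r.age + 0.03 * r.duration + 0.2 * r.hba1c + 0.01 * r.sbp
    return 1.0 / (1.0 + math.exp(-lp))


def modifiable_risk_change(baseline: list[Record], later: list[Record]) -> float:
    """Mean change in predicted risk due to modifiable factors only.

    baseline[i] and later[i] describe the same patient in two periods. Age and
    duration are frozen at baseline, so ageing cannot mask or mimic changes in care.
    """
    counterfactual = [
        base._replace(hba1c=post.hba1c, sbp=post.sbp)
        for base, post in zip(baseline, later)
    ]
    return mean(map(five_year_risk, counterfactual)) - mean(map(five_year_risk, baseline))


def value_of_care(risk_change: float, cost_per_patient: float) -> float:
    # Absolute risk reduction (numerator) per unit cost (denominator)
    return -risk_change / cost_per_patient


baseline = [Record(60, 5, 8.0, 140), Record(55, 3, 7.5, 135)]
later = [Record(62, 7, 7.4, 132), Record(57, 5, 7.1, 130)]
change = modifiable_risk_change(baseline, later)
print(f"change in mean five-year risk: {change:.4f}")
print(f"risk reduction per unit cost: {value_of_care(change, cost_per_patient=1000.0):.2e}")
```

Because only the modifiable fields are carried forward, a cohort that merely grows older with unchanged HbA1c and blood pressure shows zero change, which is exactly the property the frozen-baseline design is meant to guarantee.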

As you can tell by now, there is much beyond mastery of epidemiology and decision science in clinical evaluative research, as my own journey testifies. I am a firm believer that there remains an important role for the “jobbing public health practitioner”: a generalist with a strategic command of the full ecoscape of all that matters to population health, but one who can at once become a competent, even expert, specialist in a specific area when called upon. Or, as the 2017 ST Lee Orator Professor Kee Seng Chia put it, “one mile deep and one mile wide”. We need public health leaders fashioned after Thomas Young, dubbed “the last man who knew everything”, “who proved Newton wrong, explained how we see, cured the sick and deciphered the Rosetta Stone, amongst other feats of genius”.

Ladies and gentlemen, let me finish with a postscript on the title. Not only does it reflect my genuine worry about a post-truth, post-technocratic era defined instead by blind populism; it also borrows from the last line of Lord Alfred Douglas’s Two Loves. The final stanza goes:

…What is thy name?’ He said, ‘My name is Love.’
Then straight the first did turn himself to me
And cried, ‘He lieth, for his name is Shame,
But I am Love, and I was wont to be
Alone in this fair garden, till he came
Unasked by night; I am true Love, I fill
The hearts of boy and girl with mutual flame.’
Then sighing, said the other, ‘Have thy will,
I am the love that dare not speak its name.’

Bosie is of course better remembered as Oscar Wilde’s friend and lover. I made this free association because the last time Andrew and I saw each other was at the second WHO WPRO Universal Health Coverage Technical Advisory Group meeting in Manila, when all the Aussies in the room were paying more attention to refreshing their smartphone screens in anticipation of the postal survey results on legalising gay marriage than to the proceedings at hand. Well done, my Australian friends. Let us pray that the world will not need to wait another 123 years, the time between the publication of Two Loves and the vote, for clinical evaluative science to be fundamentally embedded in routine medical practice. For that to happen sooner, the Menzies Centre’s work takes on special importance. Good luck, and thank you, thank you and thank you again for this privilege of sharing my thoughts with you today.

1 National Institute for Health and Care Excellence
2 Health Intervention and Technology Assessment Program
3 Runciman D. Is this how democracy ends? London Review of Books. 2016;38(23):5-6.