Are machines intelligent, or are they just “intelligently” imitating humans? Dr. Mark Gooding explains the difference, when it matters, and when it doesn’t in the field of healthcare.
Rapid advancements in artificial intelligence (AI) have ushered in a new era of automation with cutting-edge systems and technologies based on machines that can be trained to learn, predict, and imagine. But as with all disruptive technologies, AI is sparking both fear and fascination in a wide range of domains, including healthcare.
While AI has shown great promise to improve clinical outcomes, assist decision-making, and solve long-standing scientific problems, its overwhelming potential can be frightening to many. At our current rate of progress, it is easy to imagine that hospitals may one day replace human doctors with robots and AI entirely, making presently valuable medical skills obsolete. These fears are far from new: they can be traced back to the dawn of computing, when scientists first imagined the possibility that machines might evolve to think for themselves.
In a webcast held on the 19th of October, 2021, Dr. Mark Gooding, Chief Scientific Officer of Mirada Medical, grappled with the concept of AI and shared how it is being introduced into clinical practice. By revisiting Alan Turing’s 1950 paper, “Computing Machinery and Intelligence,” which introduced the imitation game, alongside his own example from radiation oncology, Gooding explored how hurdles of the past that parallel present clinical challenges can help us better understand AI’s role in society today.
What is Artificial Intelligence?
AI is the simulation of human intelligence by computer systems through learning and reasoning. Since human intelligence spans many capabilities, AI systems are likewise classified into various subcategories, including reasoning and planning, as in GPS navigators that guide us along the shortest viable route; vision, used in self-driving cars to “see” roads and detect obstructions; speech and language, as in machine translation; and motion, exemplified by Boston Dynamics’ Atlas robots doing parkour.
An emerging branch of AI, called machine learning, enables a computer to learn to make predictions from past examples or experiences. One class of machine learning is deep learning, which uses deep, multi-layered neural networks – such as convolutional networks – that imitate the way humans gain certain types of knowledge. Rather than processing data in a single linear step, deep learning models stack layers in a hierarchy of increasing complexity and abstraction.
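As a rough illustration of that layered hierarchy, the toy network below is a minimal NumPy sketch – the layer sizes, random weights, and label names are made up for illustration, and this is not any production architecture. Each layer re-represents the previous layer’s output at a higher level of abstraction:

```python
import numpy as np

def relu(x):
    # Non-linearity applied between layers
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# A toy three-layer "deep" network: each weight matrix maps one
# level of representation to the next, more abstract one.
w1 = rng.normal(size=(64, 32))   # layer 1: raw input -> low-level features
w2 = rng.normal(size=(32, 16))   # layer 2: low-level -> mid-level features
w3 = rng.normal(size=(16, 2))    # layer 3: mid-level -> prediction

def forward(x):
    h1 = relu(x @ w1)
    h2 = relu(h1 @ w2)
    return h2 @ w3               # final scores, e.g. "organ" vs "background"

x = rng.normal(size=(1, 64))     # one input sample
print(forward(x).shape)          # (1, 2)
```

Training would adjust the weights from labelled examples; the sketch only shows the stacked, hierarchical structure the paragraph describes.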
Despite our lingering concerns about redundancy and the uncertainties of the unknown, there is excitement about welcoming AI into our lives, since such technologies have undeniably enhanced the speed, precision, and effectiveness of human efforts in ways never seen before. Another reason, according to Gooding, may be our long-standing interest in artificial beings and robots that tap into our imagination.
Imagination, however, allows us to envision both the best and worst possible scenarios. Today, scientists have successfully built “intelligent” machines equipped with incredible processing speed, decision-making, and strategic planning, among other superhuman abilities. Naturally, many believe it is possible to create artificial systems that not only rival human intelligence but perhaps even exceed it. However, a central question in all this remains overlooked: what exactly is intelligence?
Intelligence According to Turing
“Turing’s paper from the 1950s in MIND was all about artificial intelligence. In fact, the term ‘artificial intelligence’ [had not] been defined at that stage. [But] Turing was asking the question about what is intelligence. Can machines think?” explained Gooding.
In the paper, Alan Turing sought to answer this question using the imitation game, now called the Turing test. When thinking of the Turing test, many are likely to imagine the scenario popularised by the Loebner Prize competition, wherein a human must try to determine whether they are talking to another human through an interface or a computer. However, this is not what the Turing test is.
In the original imitation game, a human interrogator questions two other players in order to determine their genders correctly. One player attempts to help the interrogator reach the right answer, while the other tries to trick the interrogator into the wrong one. Turing proposed replacing the deceiving player with a computer, so that the interrogator now questions the honest player and the machine. Under these new conditions, the interrogator must once again determine the genders of the players. The point of interest, however, is not whether the interrogator can succeed by guessing correctly.
“It’s about playing the game as well as a human can play the game,” explained Gooding. “What Turing says is, essentially, we can determine the intelligence of the machine if it can trick [the interrogator] as well as [the lying player] could trick [the interrogator].”
The seemingly minor distinction between what Turing proposed and what many of us believe the imitation game to be elevates the complexity of machine intelligence. Instead of merely passing off as a human, the machine must be skilled enough to bluff and convince as well as a human can to be considered intelligent. This has led many scientists and researchers to develop machines that can play games that involve bluffing and lying to test their intelligence. The results were astonishing.
In 1997, the IBM supercomputer Deep Blue beat world chess champion Garry Kasparov. In October 2015, Google DeepMind’s AlphaGo beat the European champion and 2-dan professional Fan Hui in all five games. To raise the challenge, scientists then tested how machines would perform at poker, a game that involves not only bluffing but also luck. The resulting system, DeepStack, successfully defeated professional poker players. All of these matches demonstrate that AI can exhibit some form of intelligent behaviour at least equivalent to humans, but are they enough to support the deployment of AI in clinical practice?
How the Turing Test Applies to Healthcare
Current applications of artificial intelligence in healthcare largely focus on systems that serve as an interface for doctor-patient communication. Chatbots, in particular, are becoming increasingly popular for assisting diagnosis, managing prescriptions, and helping in emergencies. However, scientists believe that artificial intelligence also has the potential to improve the accuracy, precision, efficiency, and overall quality of radiation therapy for patients with cancer.
“The task of planning a radiation therapy treatment is one of maximising the radiation to the tumour to kill it, whilst minimising all of the radiation to those healthy tissues that we want to preserve,” said Gooding.
In radiotherapy, oncologists use treatment planning systems to draw contours around tumours and healthy tissues, distinguishing the areas to be irradiated and specifying the dose and its distribution accordingly. However, these treatment plans are constrained by the capabilities of clinicians. Because human organs and structures are roughly similar in shape, size, and location across patients, AI systems have been developed to automate the contouring process. Using a simple brightness threshold, organs like the lungs can be contoured easily by separating bright and dark areas on CT scans. But contouring becomes more difficult where soft tissue meets soft tissue.
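The brightness-threshold idea can be sketched in a few lines. The image values and the cutoff below are illustrative assumptions for a toy example, not clinical parameters:

```python
import numpy as np

# Toy CT slice in Hounsfield units: air-filled lung is very dark
# (around -1000 HU), soft tissue much brighter (around 0-100 HU).
ct = np.full((8, 8), 40.0)        # soft-tissue background
ct[2:6, 2:6] = -950.0             # a dark, air-filled "lung" region

LUNG_THRESHOLD = -400.0           # assumed cutoff between air and tissue

lung_mask = ct < LUNG_THRESHOLD   # True where the image is dark enough
print(int(lung_mask.sum()))       # 16 voxels contoured as lung
```

The same trick fails where two soft tissues of similar brightness touch – there is no threshold that separates them, which is the difficulty the paragraph notes.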
“Another technique that came into play was atlas-based contouring. In atlas-based contouring, we take a previous [patient’s CT scan image] that we have already contoured – because we have to contour them for treatment – and we perform what’s called a deformable image registration. Essentially, we align that previous patient to our current patient and we can deform the image to make it match as well as possible. And having worked out that deformation we would need to match our previous patient to our current one, we can then bend the contours from that previous patient onto the new patient and we get a good outline,” said Gooding.
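A heavily simplified sketch of the atlas idea follows. To keep it self-contained, it substitutes a brute-force search over integer translations for true deformable registration, and the images, contour, and displacement are all synthetic – this is a toy stand-in, not Mirada’s method:

```python
import numpy as np

def register_translation(atlas_img, new_img, max_shift=3):
    # Brute-force search for the integer shift that best aligns
    # the atlas image with the new patient's image (similarity = -MSE).
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(atlas_img, (dy, dx), axis=(0, 1))
            score = -np.mean((shifted - new_img) ** 2)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Atlas patient: an image with a bright "organ" and its saved contour.
atlas_img = np.zeros((10, 10)); atlas_img[2:5, 2:5] = 1.0
atlas_contour = atlas_img > 0.5

# New patient: the same organ, displaced by (2, 1).
new_img = np.roll(atlas_img, (2, 1), axis=(0, 1))

shift = register_translation(atlas_img, new_img)
# "Bend" the stored contour onto the new patient using the found alignment.
propagated = np.roll(atlas_contour, shift, axis=(0, 1))
print(shift, np.array_equal(propagated, new_img > 0.5))  # (2, 1) True
```

Real atlas-based contouring replaces the translation with a dense deformation field, but the workflow – register the old patient to the new one, then map the old contours through that transform – is the same.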
In his own project, Gooding set out to create an auto-contouring system that relies on a database of contoured scans from previous patients to perform atlas-based contouring for new patients. He proposed pooling the databases of many hospitals into one cloud to increase the likelihood of finding the best matches, which in turn would accelerate auto-contouring. However, upon further research, his team found that the measures commonly used to find similar scans do not work well: these surrogates correlate poorly with actual contouring performance.1
This prompted the researchers to explore deep learning networks, where they discovered that deep learning can produce much better results than atlas-based contouring. Instead of having to find the best match for every new patient, the neural network learns how to contour CT scans from the many thousands of contoured images fed to it during training. Within a few months, Gooding’s team successfully developed the envisioned system – an AI that can contour many organs in a CT scan in a fully automated manner – which led to the first clinical evaluation, in 2018, comparing deep learning contouring with atlas-based contouring.2
Typically, comparisons are made quantitatively – by measuring the distance between the two surfaces, or the overlap between the deep learning contours and the ground-truth clinical contours drawn by radiation oncologists. However, Gooding wondered whether these quantitative measures are valid at all. Such measures implicitly demand that the machine perfectly reproduce the contours drawn by humans. Yet even between oncologists, no two contoured images are ever exactly identical, given human variability.
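One standard overlap measure of this kind is the Dice similarity coefficient. The sketch below computes it on toy binary masks (the masks are illustrative, not clinical data) and shows why even a modest shift between two reasonable contours drags the score well below 1:

```python
import numpy as np

def dice(a, b):
    # Dice similarity coefficient between two binary masks:
    # 2*|A∩B| / (|A| + |B|); 1.0 means identical contours.
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

clinician = np.zeros((10, 10), dtype=bool); clinician[2:6, 2:6] = True  # 16 voxels
model = np.zeros((10, 10), dtype=bool); model[3:7, 3:7] = True          # same size, shifted

print(round(dice(clinician, clinician), 2))  # 1.0 - identical contours
print(round(dice(clinician, model), 2))      # overlap 3x3 = 9 -> 2*9/32 = 0.56
```

A one-voxel diagonal shift – well within the variation between two oncologists – already drops Dice to 0.56 here, which is the kind of penalty that made Gooding question whether such measures are the right yardstick.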
This dilemma inspired Gooding to perform his own imitation game. He presented pairs of contoured CT scans – one contoured by a computer, the other by a human – and asked human observers whether they could tell which was which. The blind test revealed that computer contours were accepted at rates similar to human contours, showing that the AI not only mimicked humans but performed its task well enough by human standards.
The Limitation Game: Addressing the Flaws of AI
While Gooding’s blind test demonstrated the strengths of AI in performing a clinical task, it does not show that machines are intelligent. Of the nine objections Turing addressed in his paper as possible arguments against his proposition, the Disability argument contends that because a machine can only do specific tasks, and not everything a human can do, it is not intelligent. For instance, compared to experienced doctors who can spot tumours, diseases, and various complications on CT scans, a computer would miss anomalies it was not programmed to detect.
Another argument called the Lady Lovelace objection asserts that machines inherently lack human creativity and to be intelligent, one must be able to invent and create as opposed to performing simple imitations. Human doctors can learn, problem-solve, and research for specific patients, but computers fall short as they merely follow instructions programmed into them. Both objections highlight that “mimicking is not good enough” to define intelligence.
“Mimicking the task the doctor is trying to do is not good enough because it can’t go further; it can’t do the research piece, it can’t extend human knowledge,” said Gooding.
However, scientists have debated whether AI’s narrow scope and task-specific functions are valid reasons to disregard its potential to advance healthcare entirely. The Rock objection, for one, makes an interesting counterpoint. It poses a scenario: if a person sticks their foot through a hole in a door and cannot tell whether their foot has been stomped on or a rock has been dropped on it, does that mean the rock is intelligent?
The answer is obviously no. But, “In the case of auto-contouring, the task we’re trying to do is contour around it. Actually, to do that task, I don’t care about intelligence. I care about getting the job done, so if I can save myself the effort of stomping on somebody’s toe […] by having a machine that drops a rock on their toe, well, I don’t actually care if the rock is intelligent. What I care about is their toe getting thoroughly pounded,” said Gooding.
In such instances, mimicking is perhaps good enough because it “gets the job done.” Although not intelligent in some senses of the word, a computer’s ability to perform specific, well-defined duties quickly, accurately, and efficiently may be an asset just as valuable as intelligence – or even more so – if we choose to welcome it.
“This is perhaps where we have the biggest problem in healthcare. It’s not [about] whether the computer is good enough to do the job, but it’s about whether we accept that the computer is good enough to do the job.”
“Looking at AI in healthcare, we need to think about what we want to achieve. Really, we’re not focused on the intelligence and we shouldn’t be focused on intelligence. We’re not trying to make computers that are smart and intelligent and able to do creativity or do different tasks. We’re focused, actually, on the artificial. What we want to do is mimic human behaviour. We want to build a better rock. But we’re still faced with the head in the sand objection and the only way we can overcome that head in the sand objection is through education, is through showing the performance of the machine, and explaining what AI is and how it works, what it can do and what it can’t do. And that’s the task we now face,” concluded Gooding.
- Schipaanboord et al. (2019). An Evaluation of Atlas Selection Methods for Atlas-Based Automatic Segmentation in Radiotherapy Treatment Planning. IEEE Transactions on Medical Imaging, 38(11), 2654-2664.
- Lustberg et al. (2018). Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer. Radiotherapy and Oncology, 126(2), 312-317.
- Gooding, M. (2021, October 19). Artificial Intelligence, healthcare and the Turing test [Webcast]. Nature Research.