AI makers claim their systems can read human facial expressions, but the science says this claim is both wrong and dangerous to society

In Brief: Startups and Big Tech claim that their new AI-based affect recognition systems can detect human emotions by analyzing “universal” facial expressions. This long-standing idea has shaped both professional and popular thinking for decades. New research, however, disputes the claims made by the makers of these systems, concluding that there is little, if any, scientific basis for the idea that facial expressions are universal and thus readable by a machine.

Analysis: Facial analysis is one of the more public, prevalent, and intriguing applications in the current AI landscape. The goals behind systems that identify someone from an image, known as facial recognition, are generally understood: computers are trained to scan the contours and details of a face and then identify that person correctly. Affect recognition, another branch of facial analysis, is far less well understood, and its increasing use makes it essential to examine the technique’s applications and efficacy. New research from Kate Crawford, a professor at USC Annenberg, aims to do just that.

The modern story of affect recognition has an unlikely birthplace: the Salpêtrière asylum in Paris, which in the 1800s housed up to 5,000 people with a wide range of mental illnesses and neurological conditions. It was there that a doctor named Guillaume Duchenne de Boulogne decided to photograph patients in order to categorize their facial movements and expressions. His analysis, Mécanisme de la physionomie humaine, which tried to connect facial expressions with emotional and psychological states, was foundational not only for luminaries like Charles Darwin but also for contemporaries of ours, most notably the psychologist Paul Ekman.

Ekman entered the story in the 1960s, when he arrived in Papua New Guinea to conduct a novel experiment: he would show the native people images and then register their emotional reactions. Ekman believed that facial expressions of emotions and sentiments were universal and expected his experiment to prove it. The experiments, however, were a failure, frustrating both Ekman and his subjects. Undeterred, Ekman pressed on, notes Crawford:

During the mid-1960s, opportunity knocked at Ekman’s door in the form of a large grant from what is now called the Defense Advanced Research Projects Agency (DARPA), a research arm of the Department of Defense. DARPA’s sizable financial support allowed Ekman to begin his first studies to prove universality in facial expression. In general, these studies followed a design that would be copied in early AI labs…Subjects were presented with photographs of posed facial expressions, selected by the designers as exemplifying or expressing a particularly “pure” affect, such as fear, surprise, anger, happiness, sadness, and disgust. Subjects were then asked to choose among these affect categories and label the posed image. The analysis measured the degree to which the labels chosen by subjects correlated with those chosen by the designers.

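To make that design concrete, here is a minimal sketch, in Python, of the kind of agreement measure the passage above describes: for each posed photograph, how often subjects picked the affect label the designers had intended. The photos, responses, and numbers are hypothetical illustrations of the general idea, not Ekman’s actual data or analysis code.

```python
# Hypothetical forced-choice study data: the label the designers intended for
# each posed photograph, and the label each subject chose for it.
from collections import Counter

designer_labels = {"photo_01": "fear", "photo_02": "happiness", "photo_03": "anger"}
subject_choices = {
    "photo_01": ["fear", "surprise", "fear", "fear"],
    "photo_02": ["happiness", "happiness", "happiness", "happiness"],
    "photo_03": ["disgust", "anger", "anger", "sadness"],
}

for photo, intended in designer_labels.items():
    choices = subject_choices[photo]
    # Agreement = share of subjects whose choice matched the designers' label.
    agreement = choices.count(intended) / len(choices)
    print(f"{photo}: intended={intended!r}, agreement={agreement:.0%}, "
          f"responses={dict(Counter(choices))}")
```

High agreement on posed, pre-selected photographs is exactly what critics such as Ruth Leys would later flag as circular: the forced-choice format builds the designers’ assumptions into the only answers subjects are allowed to give.
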
Once again, Ekman’s research was problematic:

From the start, the methodology had problems. Ekman’s forced-choice response format would be later criticized for alerting subjects to the connections that designers had already made between facial expressions and emotions. Further, the fact that these emotions were faked would raise questions about the validity of the results.

Again, Ekman was not held back by poor results. In 1978, he and his collaborator Wallace Friesen published the Facial Action Coding System (FACS). Though difficult and tedious to use, the system proved a success, and the U.S. government continued to support the field in the years that followed. So did labs in other countries, notes Crawford:

By the end of the decade, machine-learning researchers had started to assemble, label, and make public the data sets that drive much of today’s machine-learning research. Academic labs and companies worked on parallel projects, creating scores of photo databases. For example, researchers in a lab in Sweden created Karolinska Directed Emotional Faces. This database comprises images of individuals portraying posed emotional expressions corresponding to Ekman’s categories. They’ve made their faces into the shapes that accord with six basic emotional states: joy, anger, disgust, sadness, surprise, and fear. When looking at these training sets, it is difficult to not be struck by a sense of pantomime: Incredible surprise! Abundant joy! Paralyzing fear! These subjects are literally making machine-readable emotion.

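To see how such posed, labeled photographs become “machine-readable emotion,” here is a minimal sketch of a supervised classifier restricted to Ekman’s six categories. The feature vectors and labels below are randomly generated stand-ins for real images; this illustrates the general training setup, not any vendor’s actual system.

```python
# Minimal sketch: a classifier trained on posed-expression data can only ever
# answer with one of the six Ekman categories it was given, whatever the
# photographed person actually felt. Data here are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

EKMAN_CATEGORIES = ["joy", "anger", "disgust", "sadness", "surprise", "fear"]

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 128))                       # 600 "photos" as 128-dim feature vectors
y = rng.integers(0, len(EKMAN_CATEGORIES), size=600)  # each labeled with one of the six categories

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Whatever face it is shown, the model's output is confined to the taxonomy
# it was trained on.
print(EKMAN_CATEGORIES[clf.predict(X_test[:1])[0]])
```

The design choice worth noticing is that the taxonomy is fixed before any learning happens: the model maps every input onto one of six posed categories, which is the “narrow taxonomy of emotions” Crawford warns is being coded into machine-learning systems as a proxy for real emotional experience.
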
Fast forward to 2021, and affect recognition systems are now widely deployed, particularly in hiring. For example, the AI hiring support systems from HireVue, which lists Goldman Sachs, Intel, and Unilever among its clients, use machine learning to infer people’s suitability for a job. In addition, companies such as Microsoft, IBM, Amazon, and Apple are all investing in emotion detection systems in the belief that computers can, in fact, detect underlying mental states from the expressions we make. But is this really the case? By all indications, notes Crawford, the answer is a resounding no:

A 2019 systematic review of the scientific literature on inferring emotions from facial movements, led by the psychologist and neuroscientist Lisa Feldman Barrett, found that there is no reliable evidence that you can accurately predict someone’s emotional state in this manner. “It is not possible to confidently infer happiness from a smile, anger from a scowl, or sadness from a frown, as much of current technology tries to do when applying what are mistakenly believed to be the scientific facts,” the study concludes.

Yet another critic of the Ekman approach is the historian of science Ruth Leys, “who sees a fundamental circularity in Ekman’s method.” Barrett herself puts it bluntly: “Companies can say whatever they want, but the data are clear. They can detect a scowl, but that’s not the same thing as detecting anger.”

In short, Crawford’s new research makes a strong case that there is very little scientific basis for the claims made by the creators of affect recognition systems. Yet these systems are increasingly used in hiring decisions, security screening, and worker management. As Crawford states, “a narrow taxonomy of emotions—grown from Ekman’s initial experiments—is being coded into machine-learning systems as a proxy for the infinite complexity of emotional experience in the world.”

For Crawford and those who agree with her, the rush to validate AI facial recognition abilities is part of a wider effort to validate the efficacy of AI systems in general and to embed them in society without fully acknowledging their limitations, weaknesses, and risks. Writing in 2020, Crawford and Alexander Campolo of the University of Chicago coined the term enchanted determinism for the paradox that arises when AI creators ascribe to their systems super-human, almost magical abilities while at the same time claiming that they are governed by the laws of logic and programming. As they note:

We term this ensemble enchanted determinism: a discourse that presents deep learning techniques as magical, outside the scope of present scientific knowledge, yet also deterministic, in that deep learning systems can nonetheless detect patterns that give unprecedented access to people’s identities, emotions and social character. These systems become deterministic when they are deployed unilaterally in critical social areas, from healthcare to the criminal justice system, creating ever more granular distinctions, relations, and hierarchies that are outside of political or civic processes, with consequences that even their designers may not fully understand or control.

The authors make the point that at least some AI creators often discuss their systems as if they had some magical power beyond human understanding (and that this is somehow acceptable):

As early as 2012, before deep learning techniques were widely used, the machine learning researcher Pedro Domingos wrote, “developing successful machine learning applications requires a substantial amount of ‘black art’ that is difficult to find in textbooks.” More recent articles in the deep learning literature continue to express a tension between performance or efficacy and lack of knowledge. To quote a typical example, “deep neural networks have proved astoundingly effective at a wide range of empirical tasks…Despite these successes, understanding of how and why neural network architectures achieve their empirical successes is still lacking.”

AI enchantment is most dangerous when empirical accuracy and predictive success defy the intuitions of even the most knowledgeable experts, who admit that they don’t fully understand the theoretical basis for why deep learning works as well as it does. This matters because in 2021 AI systems don’t just describe or analyze the world; they create it in companies and courtrooms, hospitals and homes, public and private spaces. They also shape it, the authors note, “deepening and naturalizing socially contested classifications and hierarchies and foreclosing contestation or political discussion.”

With facial recognition, as with many other tasks AI systems are now asked to perform, the problem is that when people see the results they often perceive “super-human” powers at work. That is not really the case. What they are seeing is “a form of complex statistical modeling and prediction that has extraordinarily detailed information about patterns of life but lacks the social and historical context that would inform such predictions responsibly — an irrational rationalization.” In other words, the fact that a computer can do something millions of times faster than a human does not mean it is doing anything more than increasing the rate at which it gets something right, or, as in so many cases with AI facial recognition, wrong. Moreover, the simplification, or in some cases elimination, of wider social and human context from these determinations means that what little social context remains comes, consciously or unconsciously, from the AI programmers themselves, who are accountable neither to society nor to those affected by their systems.

The authors conclude as much in their paper:

The work of classifying and predicting identities, credit risks and many other social characteristics is not done in a social vacuum, neither does it reflect a set of underlying social signals that are perceptible only by the “genius” of deep learning systems. It is a process that is actively shaped by system designers and the data used to reflect the world. When we see discourses of enchanted determinism at work, we should ask whose interests they serve and where the responsibility for the impacts of that system will ultimately rest.

Their final point is the critical issue. With facial recognition, as with so many other aspects of AI, we should pause to understand the sources and validity of the claims made for these new technologies by their creators before accepting that they have a place in our most critical decisions. Fortunately, the recently proposed European AI directives are a significant first step in the right direction. The EU debate is one all free societies should have, for there is too much at stake already to let these technologies expand their hold on our lives without a wider social debate about their true meaning and their creators’ greater purpose.

The Research

Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20, 1–68. doi:10.1177/1529100619832930

Crawford, K. (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.

Crawford, K., & Campolo, A. (2020). Enchanted determinism: Power without responsibility in artificial intelligence. Engaging Science, Technology, and Society, 6. doi:10.17351/ests2020.277

Posted by: Carlos Alvarenga