In 2017, the Washington County Sheriff’s Office, just outside Portland, Oregon, wanted to find a man covered in dollar bills. The unidentified man, whose profile photo showed him lying on a bed covered in paper money, had been making concerning posts on Facebook, the sheriff’s office said. So the department ran the image through a powerful new facial recognition and analysis system built by Amazon, called Rekognition, and used Rekognition to compare it with booking photos used by the department. According to Chris Adzima, a senior systems information analyst, the officers “found a close to 100% match.”
Adzima touted the efficacy of Rekognition in a guest blog post for Amazon. He had other examples of its usefulness, too: a suspect who was wanted for allegedly stealing from a hardware store, another who’d been captured on surveillance cameras using a credit card later reported as stolen. Overall, Adzima wrote, Rekognition represented “a powerful tool for identifying suspects for my agency. As the service improves, I hope to make it the standard for facial recognition in law enforcement.”
For the most part, that’s how Rekognition has been introduced and sold: as a wondrous new tool designed to keep the public safer; Amazon’s one-stop superpower for law enforcement agencies.
But superpowers tend to come with unintended consequences, and Rekognition, in particular, has some prodigious, and highly concerning, blind spots, especially around gender identity. A Jezebel investigation has found that Rekognition frequently misgenders trans, queer and nonbinary individuals. Furthermore, in a set of photos of explicitly nonbinary individuals, Rekognition misgendered all of them, a mistake that’s baked into the program’s design, since it measures gender as a binary. In itself, that’s a problem: it erases the existence of an already marginalized group of people, and, in doing so, creates a system that mirrors the myriad ways that nonbinary people are left out of basic societal structures. What’s more, as Rekognition becomes more widely used among government agencies, police departments, researchers and tech companies, that oversight has the potential to spread.
This isn’t a new problem. As Vice wrote earlier this year, “automatic gender recognition,” or AGR, has long been baked into facial recognition and analysis programs, and it has virtually always been done in a way that erases trans and nonbinary people. Jezebel’s investigation shows that these same issues exist deep within Rekognition.
The program is designed around Amazon’s assumptions about gender identity, an omission that becomes even more disturbing as Amazon’s software gets silently integrated into our lives. On GitHub, a platform where software developers maintain their code, there are 6,994 instances where Rekognition and gender are mentioned together. These projects represent a future where Rekognition’s technology, and its assumptions of gender, form the backbone of other apps and programs; thousands of basic systems baked into society all silently designed to flatten gender identity. (A spokesperson for Amazon initially agreed to speak to Jezebel about our research, then failed to respond to five followup emails and a phone call.)
As Os Keyes, a PhD student at the University of Washington, writes in their 2018 study of the subject:
If systems are not designed to include trans people, inclusion becomes an active struggle: individuals must actively fight to be included in things as basic as medical systems, legal systems or even bathrooms. This creates space for widespread explicit discrimination, which has (in, for example, the United States) resulted in widespread employment, housing and criminal justice inequalities, increased vulnerability to intimate partner abuse and, particularly for trans people of colour, increased vulnerability to potentially fatal state violence.
These harms have the potential to be even more insidious as the technology becomes embedded in our day-to-day lives. What happens when you can’t use a bathroom because an AI lock thinks that you shouldn’t be there? What happens to medical research or clinical drug trials when a dataset misgenders or omits thousands of people? And what happens when a cop looks at your license and your machine-predicted gender doesn’t match what they see? A world governed by these tools is one that erases entire populations. It’s a world where individuals have to conform to be seen.
When Amazon introduced Rekognition in November 2016, it was depicted as a fun search engine, a service meant to help users “detect objects, scenes, and faces in images.” The product was relatively non-controversial for its first two years of life, but that changed in May of 2018, when the ACLU of California revealed Rekognition was being sold to police departments as a fundamentally different tool: more serious, and more powerful. Amazon had been aggressively marketing the product as a potent surveillance tool, the ACLU reported, suitable for both government agencies and private companies.
It’s not just Amazon, of course: as companies, governments, and technologists tout the business and security insights gleaned from machine intelligence, public debate over the rollout of such software has escalated, particularly as examples of gross misuse emerge. The Chinese government has used facial recognition to profile and track Uighurs, a persecuted, largely Muslim minority population. In the U.S., CBP continues to tout the efficacy of biometric surveillance both on the U.S.-Mexico border and within the country’s interior, where airports nationwide are beginning to roll out facial recognition-based check-in.
In April of this year, Microsoft’s president said the company had rejected a request from a law enforcement agency in California to install their facial recognition system in officers’ cars and body cameras due to concerns of false positives, particularly of women and people of color, since the system had largely been tested on photos of white men. Those concerns are proving to be well-founded: last year a study from MIT and University of Toronto found that the technology tends to mistake women, especially those with dark skin, for men. Separately, an investigation by the ACLU found that Rekognition falsely matched 28 members of Congress with booking photos.
Meanwhile, though, facial recognition products continue to be scooped up by police departments and private companies, and used in opaque, unregulated, and increasingly bizarre ways. Amazon clearly wants to be the industry leader: the company’s shareholders recently voted down a proposal by activist investors to ban the sale of facial recognition to governments and government agencies.
The Rekognition technology, and its flaws around identifying trans and nonbinary people, arrives at a particularly bad time for those populations in the U.S. The Trump administration recently proposed a measure that would revoke an Obama-era rule that prevented discrimination against transgender people in medical settings and within health insurance. The Department of Housing and Urban Development recently proposed another draft rule that would allow federally funded homeless shelters and housing programs to bar transgender people from entry. Earlier this year, the Trump administration even tightened its definition of “biological sex,” basing it on a person’s “chromosomes, gonads, hormones, and genitals.” These regulations demonstrate how powerful and potentially violent definitions of sex and gender can be.
“When we’re thinking about imperfect systems like facial recognition there are two distinct concerns,” Daniel Kahn Gillmor, a senior staff technologist at the ACLU’s Speech, Privacy and Technology Project, says. “One of them is that the systems might not be good enough, that they’ll sweep up the wrong people, that they’re biased against certain populations because they haven’t been trained on those populations, that they’re more likely to make mistakes or put people in certain categories based on biases that are built in.”
But on the other hand, Gillmor says, “If there are no technical problems with the machinery,” which might be the case in a decade or so, “we have another set of problems, because of the scale at which this technology can be deployed.” Facial recognition and analysis, Gillmor says, “provides a mechanism for large scale potentially long-lasting surveillance” at a scale, he says, “that humans have never really been able to do before.”
That’s why the technology is a concern, regardless of whether it’s extremely accurate or extremely inaccurate, according to Gillmor. “If it’s not good enough, it’s a problem,” he told us. “And it’s a problem if it is good enough. It’s always a problem.”
To understand how Amazon’s software analyzes trans and nonbinary individuals, we built a working model. But it was unclear whether we could responsibly look into Rekognition’s gender analysis systems at all.
Researching corporate machine learning algorithms is fraught with ethical concerns. Because of their ability to optimize in real time, the act of testing a system inevitably ends up refining it. And by calling out flaws in its design, you may be tacitly condoning an update to the system rather than recommending that the entire premise of Automated Gender Recognition should be reconsidered.
And everything that we load into the program becomes part of Amazon’s large corpus of data. According to Amazon’s FAQ, Rekognition stores image and video inputs in order to improve their software; Rekognition data becomes Amazon’s training data. This means, for instance, that Happy Snap, a Rekognition-powered find-and-seek adventure mobile app designed for kids, is likely inadvertently training the Washington County Sheriff’s Office’s surveillance operations. (Happy Snap did not immediately return an email requesting comment.)
You are able to opt out of this default behavior, but the process isn’t exactly easy or clear. Before we felt comfortable using Rekognition to investigate their AGR, we wanted to ensure that our research would not inadvertently optimize a surveillance system that we’re simply not sure should have existed in the first place. To opt out, Jezebel contacted Amazon’s Technical Support, who in turn contacted the Rekognition engineers. After two weeks we received confirmation that the images we would use to test the AGR components of Rekognition would not be used to optimize the system. Only then did we feel comfortable starting our experiment.
Amazon provides developers with detailed documentation for how to build a facial analysis and recognition system using their Rekognition software—a troubling fact given how easily such technology can be weaponized. We used this documentation to build a version of Rekognition that compared the gender predictions and confidence thresholds across hundreds of photos of nonbinary, queer, and trans individuals with binary ones.
Our nonbinary and trans dataset was sourced from the Gender Spectrum Collection, a stock photo library of trans and gender non-conforming individuals created by Broadly, a former Vice subsite focusing on gender and identity.
Zackary Drucker, the photographer and artist who shot the photos for Vice, explains to Jezebel that the stock photos were created “to fill a void of images of trans and nonbinary people in everyday life. Stock photographs have always neglected to include gender diverse people, and that has further perpetuated a world in which trans people are not seen existing in public life.”
For our purposes, the GSC also helped us draw a very direct comparison between how Rekognition treats gender-conforming versus non-conforming individuals. We sourced our binary dataset by using the captions from the Broadly photos to identify visually similar photos of gender-conforming individuals on Shutterstock. For instance, if a photo from Broadly’s dataset was captioned “a transmasculine person drinking a beer at a bar” we would scrape Shutterstock for photos of “a man drinking a beer at a bar.”
We fed these images through the Facial Analysis functions of Amazon’s Rekognition software to retrieve each individual’s predicted gender score. Gender is assigned with a binary Male or Female variable and an associated confidence score for the prediction.
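As a rough illustration of that step, the sketch below shows one way a gender prediction might be read out of Rekognition’s face-analysis response using the boto3 Python SDK. It is not the code used in this investigation. The helper names `extract_gender_predictions` and `analyze_image` and the sample response are hypothetical; the `detect_faces` call and the `Gender` field with its `Value` and `Confidence` entries follow Amazon’s public documentation.

```python
from typing import Any


def extract_gender_predictions(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Pull (label, confidence) pairs out of a DetectFaces-style response."""
    return [
        (face["Gender"]["Value"], face["Gender"]["Confidence"])
        for face in response.get("FaceDetails", [])
        if "Gender" in face
    ]


def analyze_image(path: str) -> list[tuple[str, float]]:
    """Send one local image to Rekognition (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper above works offline

    client = boto3.client("rekognition")
    with open(path, "rb") as image_file:
        response = client.detect_faces(
            Image={"Bytes": image_file.read()},
            Attributes=["ALL"],  # the default attribute set omits Gender
        )
    return extract_gender_predictions(response)


# Hypothetical response, trimmed to the two fields discussed above.
sample_response = {
    "FaceDetails": [{"Gender": {"Value": "Female", "Confidence": 98.7}}]
}
print(extract_gender_predictions(sample_response))
```

Note that the label can only ever be one of two strings, so there is no value the service could return for a nonbinary person.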
Of the 600 photos we analyzed, on average Rekognition was 10% more confident of its AGR scores on the Broadly dataset than the Shutterstock dataset. However, in spite of these confidence scores, self-identified transmasculine and transfeminine individuals were misgendered far more frequently. Misgendering occurred in 31% of the images that contained a self-identified trans person. Meanwhile, misgendering only occurred in 4% of the images in the Shutterstock dataset of binary individuals.
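For readers curious how figures like these can be tallied, here is a minimal sketch of the arithmetic: a misgendering rate and an average confidence per dataset. The records are invented toy values, not the investigation’s data, and the `summarize` function and the dataset labels are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Invented toy records: (dataset, expected label, predicted label, confidence).
records = [
    ("gsc", "Male", "Female", 97.0),
    ("gsc", "Female", "Female", 99.0),
    ("shutterstock", "Male", "Male", 88.0),
    ("shutterstock", "Female", "Female", 90.0),
]


def summarize(rows):
    """Return the misgendering rate and mean confidence for each dataset."""
    grouped = defaultdict(list)
    for dataset, expected, predicted, confidence in rows:
        # A prediction counts as misgendering when it differs from the
        # label taken from the photo's caption.
        grouped[dataset].append((expected != predicted, confidence))
    return {
        dataset: {
            "misgender_rate": sum(missed for missed, _ in items) / len(items),
            "mean_confidence": mean(conf for _, conf in items),
        }
        for dataset, items in grouped.items()
    }


for dataset, stats in summarize(records).items():
    print(dataset, stats)
```

Reporting the two numbers side by side is what makes the mismatch visible: a system can be highly confident and still wrong.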
Interestingly, misgendering was also inconsistent across individuals. Broadly’s dataset contains multiple photos of the same people; there were instances where Rekognition’s AGR correctly gendered a person they had previously misgendered.
Though our data sets were admittedly limited, the difference in how Rekognition performs on data with trans and nonbinary individuals is alarming. More concerning, however, is that Rekognition misgendered 100% of explicitly nonbinary individuals in the Broadly dataset. This isn’t because of bad training data or a technical oversight, but a failure in engineering vocabulary to address the population. That their software isn’t built with the capacity or vocabulary to treat gender as anything but binary suggests that Amazon’s engineers, for whatever reason, failed to see an entire population of humans as worthy of recognition.
Alyza Enriquez is a social producer at Vice and a nonbinary person, who participated in putting together the Gender Spectrum Collection and is featured in some of its photos. They weren’t surprised to learn that Rekognition doesn’t recognize nonbinary identities.
“It’s obviously disconcerting,” they told us. “When you talk about practical applications and using this technology with law enforcement, that feels like a dangerous precedent to set. But at the same time, I’m not surprised. We’re no longer shocked by people’s inability to recognize that nonbinary identity exists and that we don’t have to conform to some sort of category, and I don’t think people are there yet.”
Morgan Klaus Scheuerman is a PhD student in information science at the University of Colorado-Boulder. He and his collaborators have been conducting a large-scale analysis of how individuals of different gender identities are classified, correctly or otherwise, in AGR systems, particularly large-scale, cloud-based systems like Amazon and Microsoft.
Scheuerman’s research with his collaborators and his advisor, Jed Brubaker, an assistant professor in the Information Science department, found many of the same issues that we identified. They, too, struggled with whether it was ethical to test AGR systems at all. “We were going to analyze a specific system, and we decided not to,” Scheuerman says, “because their TOS said it was going to retain these images. We had to do the same tradeoff: what is the benefit of this research versus the impact it could have? Even if you can take your data back out, it’s probably embedded in the model already.”
In their lab, Brubaker and Scheuerman and their collaborators look at all kinds of areas of uncertainty, where tech systems struggle to respond adequately to the shades of grey in people’s lives. “We look at gender, death, breakups: scenarios where systems don’t understand our social lives appropriately,” Brubaker explains.
Those shades of grey are often disregarded outright in building new technologies. Scheuerman points out that AGR, with all of its problems, is the continuation of a vastly oversimplified classification system that’s existed for a very long time, one that sorts people into “male” and “female” and has very little room for any other identity.
“Even as people become more aware of differences in gendered experiences, there’s still this view that when that difference occurs, it’s an outlier,” Scheuerman says. “Or the term that engineers use, an ‘exception case,’ a ‘boundary case.’”
The “edge cases,” Scheuerman says, are often disregarded when building new technologies because, as he puts it, “they make things more technically difficult for people to accomplish. So we have this system whose main task is to identify gender. [S]tarting with a clear binary is technically the most feasible. But then when you say we want to include nonbinary, that makes the entire task obsolete. Basically, the data that you’re using then makes men and women no longer fall into a specific category.”
Brubaker, Scheuerman’s advisor, points out that AGR is also engineered towards how gender looks, not how people experience it.
“The systems have a strong bias towards how gender is presented, not how it’s experienced,” he says. “These are systems whose only way of understanding the world is through vision. They just see. That’s all they do. So if you think of gender as something that can only be seen, not self-reported, that’s a very narrow, particular, and in some cases very bizarre way to think about gender.”
Zackary Drucker, the photographer who shot the photos we used in testing Rekognition, said our use of the photos is “further evidence that the collection has utilitarian purpose in the world far beyond what we may have imagined. On the other hand, it’s so alarming that technology may be used to identify people.”
“There’s this element to trans and nonbinary people just throwing a wrench into the system that disproves its accuracy,” she added. “If this system is equally confident that one person is one gender or the other in different situations, that’s evidence that this technology is as false as the gender binary.”
In the past, Amazon has responded to critiques of its product by arguing that critics fail to understand the nuances of the tech. In a response to a New York Times article, for instance, Amazon stated: “Facial analysis and facial recognition are completely different in terms of the underlying technology and the data used to train them. Trying to use facial analysis to gauge the accuracy of facial recognition is ill-advised, as it’s not the intended algorithm for that purpose (as we state in our documentation).”
More specifically, facial analysis is meant to predict human attributes like age, emotion, or gender, whereas recognition is about matching two faces. Our investigation focused specifically on the gender predictions from Rekognition’s facial analysis algorithms. Automatic Gender Recognition (AGR) is a subfield of facial recognition that seeks to identify the gender of individuals from images or videos.
Researchers have historically built these systems by analyzing the frequency of a human voice, looking at the texture of skin, measuring facial alignment and proportions, and analyzing the shape of breasts. AGR systems, in other words, have been designed under the presumption that gender is physiologically rooted, with the body as the source of a binary gender, male or female.
This design shortcoming has real consequences, and the problems will only be compounded as they’re added to other surveillance systems.
This isn’t just about calling out the evident biases of Amazon’s engineers. The real concern here is that these biases are so deeply embedded into the very structure of an institution as powerful as Amazon, which deploys technology for our schools, businesses, and governments. As writer and educator Janus Rose points out, “If we allow these assumptions to be built into systems that control people’s access to things like healthcare, financial assistance or even bathrooms, the resulting technologies will gravely impact trans people’s ability to live in society.”
Rose points out that AGR has already been used to bizarre effect:
Evidence of this problem can already be found in the wild. In 2017, a Reddit user discovered a creepy advertisement at a pizza restaurant in Oslo, which used face recognition to spy on passers-by. When the ad crashed, its display revealed it was categorising people who walked past based on gender, ethnicity and age. If the algorithm determined you were male, the display would show an advertisement for pizza. If it read you as female, the ad would change to show a salad.
The problem is, in other words, both larger and more intimate than just how it might be used by police. We can envision, for instance, a world in which this AGR system is implemented in an office to prohibit someone of the “wrong” gender from entering a bathroom. With this technology, it’s entirely possible that a trans individual would be unable to use a single bathroom in the building.
We reached out to all the companies using Rekognition for facial analysis as listed on Amazon’s information page for the product. Only two got back to us in a meaningful way. One was Limbik, a startup that uses machine learning to help companies understand whether their videos are being watched, and by whom. They told us that Amazon’s binary gender settings posed a problem for them: “We have noticed this as an issue for us, as the better we can tag videos with proper tags the more accurate we can be with predictions and improvement recommendations. It would be best if we could get this type of information as it would help us categorize videos better and help with prediction.”
Without that information, Limbik added, they have to specify to customers what their analysis, using Rekognition, does and doesn’t do. “Since Rekognition only returns a binary value for gender, we have to make sure that, to customers, we specify that it is biological sex that is examined and not gender specifically and that it isn’t perfect. We have internal conversations about this issue and have discussed remedies but as we can have upwords of 1000 tags connected to a video coming from other Rekognition services, our internal tagging methods, manual human tagging and other methods, we haven’t found a good way to address this.”
Arguably, no one will find a “good way to address this,” or one that even remotely repairs the potential harms that Rekognition could do to vulnerable populations. That Amazon is pushing software like this at a time when trans rights are under full-on assault by the Trump administration indicates that Amazon, intentionally or not, is working against the interests of some of society’s most vulnerable members. This is not a theoretical concern. It’s an urgent one, and the dystopic implications of how it might be used are getting more real every day.
Correction: An earlier version of this post misidentified Os Keyes as a PhD. They are a PhD student.