Ever since Microsoft’s chatbot Tay started spewing racist and inflammatory tweets after 24 hours of interacting with humans on Twitter, it has been obvious that our AI creations can fall prey to human prejudice. Now a group of researchers has figured out one reason why that happens. Their findings shed light on more than our future robot overlords, however. They’ve also worked out an algorithm that can actually predict human prejudices based on an intensive analysis of how people use English online.
The implicit bias test
Many AIs are trained to understand human language by learning from a massive corpus known as the Common Crawl. The Common Crawl is the result of a large-scale crawl of the Internet in 2014 that contains 840 billion tokens, or words. Princeton Center for Information Technology Policy researcher Aylin Caliskan and her colleagues wondered whether that corpus—created by millions of people typing away online—might contain biases that could be discovered by an algorithm. To figure it out, they turned to an unusual source: the Implicit Association Test (IAT), which is used to measure often unconscious social attitudes.
People taking the IAT are asked to put words into two categories. The longer it takes for a person to place a word in a category, the less they associate the word with the category. (If you’d like to take an IAT, there are several online at Harvard University.) The IAT is used to measure bias by asking people to associate words with categories like gender, race, disability, and age. Outcomes are often unsurprising: for example, most people associate women with family, and men with work. But that obviousness is actually evidence for the IAT’s usefulness in discovering people’s latent stereotypes about each other. (It’s worth noting that there is some debate among social scientists about the IAT’s accuracy.)
Using the IAT as a model, Caliskan and her colleagues created the Word-Embedding Association Test (WEAT), which analyzes chunks of text to see which concepts are more closely associated than others. The “word-embedding” part of the test comes from a project at Stanford called GloVe, which packages words together into “vector representations,” basically lists of associated terms. So the word “dog,” if represented as a word-embedded vector, would be composed of words like puppy, doggie, hound, canine, and all the various dog breeds. The idea is to get at the concept of dog, not the specific word. This is especially important if you are working with social stereotypes, where somebody might be expressing ideas about women by using words like “girl” or “mother.” To keep things simple, the researchers limited each word’s vector representation to 300 dimensions.
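To make the idea of vector representations concrete, here is a minimal sketch. This is not the researchers’ code: the three-dimensional vectors below are invented for illustration (real GloVe embeddings have 300 dimensions), but the mechanism is the same — cosine similarity places “dog” near “puppy” and far from “car”:

```python
# Toy word embeddings: each word is a list of numbers (a vector).
# These 3-D values are made up for demonstration only.
import math

embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "car":   [0.10, 0.20, 0.90],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "dog" is far more similar to "puppy" than to "car" in this space.
print(cosine(embeddings["dog"], embeddings["puppy"]))  # high
print(cosine(embeddings["dog"], embeddings["car"]))    # low
```

In a real embedding space trained on a corpus like the Common Crawl, the nearest neighbors of a word by cosine similarity surface exactly the kind of associated terms described above.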
To see how concepts get associated with each other online, the WEAT looks at a variety of factors to measure their “closeness” in text. At a basic level, Caliskan told Ars, this means how many words apart the two concepts are, but it also accounts for other factors like word frequency. After going through an algorithmic transform, closeness in the WEAT is equivalent to the time it takes for a person to categorize a concept in the IAT. The further apart the two concepts, the more distantly they are associated in people’s minds.
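The core of that closeness measure can be sketched as a differential association score. This is a simplified stand-in, not the published test: the two-dimensional vectors are invented toys, and the real WEAT operates on GloVe embeddings and adds an effect size and a permutation test on top of this basic quantity:

```python
# WEAT-style association sketch: how much closer is a target word to
# attribute set A than to attribute set B? (Toy vectors, illustration only.)
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def association(word, set_a, set_b, emb):
    """Mean similarity of `word` to set A minus mean similarity to set B.
    Positive values mean the word sits closer to A in the embedding space."""
    mean_a = sum(cosine(emb[word], emb[a]) for a in set_a) / len(set_a)
    mean_b = sum(cosine(emb[word], emb[b]) for b in set_b) / len(set_b)
    return mean_a - mean_b

# A toy space where "flower" leans toward the pleasant words.
emb = {
    "flower":   [0.90, 0.10],
    "pleasant": [0.80, 0.20],
    "lovely":   [0.85, 0.15],
    "awful":    [0.10, 0.90],
    "nasty":    [0.20, 0.80],
}

score = association("flower", ["pleasant", "lovely"], ["awful", "nasty"], emb)
print(score > 0)  # True: "flower" is closer to the pleasant set
```

A positive score here plays the role that a fast categorization plays in the IAT, while a score near zero corresponds to concepts that are only distantly associated.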
The WEAT worked beautifully to discover biases that the IAT had found before. “We adapted the IAT to machines,” Caliskan said. And what that tool revealed was that “if you feed AI with human data, that’s what it will learn. [The data] contains biased information from language.” That bias will affect how the AI behaves in the future, too. As an example, Caliskan made a video in which she shows how the Google Translate AI mistranslates words into English based on stereotypes it has learned about gender.
This is important because it touches on one of the more overlooked aspects of the recent alt-right encroachment: a lot of it is about doing it “for the lulz”. Making traditionally marginalized groups angry or reactive is a form of entertainment for some, and the fact that it happens to work toward reclaiming white male supremacy is more of a secondary bonus. That’s what makes it so difficult to logically or “correctly” combat; the most forward face isn’t an ideology or philosophy, so to speak, but rather a form of indulgence. It’s not a grand mission, it’s a game they play.
This week saw the official release of “Let’s Learn Náhuatl” (Ma tiwelikan nawatl), an app created to teach and help preserve the indigenous Mexican Nahuatl language.
“In a sub-conscious way, you’ll know some Náhuatl; you’ll have heard the greetings, numbers, some verbs, maybe the animals, body parts, types of maize or sacred places that govern the Nahuatl world.”
The project is the result of a collaboration between Manuvo, the National Institute of Indigenous Languages, and the Laboratory of Digital Citizenship, and offers a playful experience for those interested in learning Náhuatl words and expressions in the variant that originated in Acatlán, Guerrero.
The visuals that accompany the application were created by the design collective Metzican. They highlight the nuances between the various Mexican communities whilst carefully avoiding clichés.
Presently there are around 1,586,884 speakers of Náhuatl, or Mexikatl (“mexicano”), living across Mexico. The aim of this initiative is to utilise technology as a mechanism to “disseminate and generate interest in the indigenous languages, the community’s way of living and the cosmology of the indigenous villages in Mexico”.
Also in development is an app to learn Purépecha (the language spoken mainly in the northwestern region of Michoacán), which will similarly represent the identity of the village.
The app “Let’s Learn Náhuatl” is available to download for free on iOS and Android. For more info, check out Manuvo.
Richard T. Heffron. Android in Futureworld. 1976.
When Planned Parenthood was founded a century ago, it was illegal to even hand out information about birth control. Thanks to generations of brave women and men who formed secret societies, challenged unjust laws, and started Planned Parenthood health centers in their own towns, we’ve come a long way since. Millions of people, regardless of income or insurance coverage, now have access to birth control, cancer screenings, and STI testing and treatment. Each year, Planned Parenthood proudly provides health information to nearly 70 million people online and 1 million people in classrooms and communities across the country. Today, America is at a 30-year low in unintended pregnancy and a historic low in teen pregnancy.
But all of that progress is a reminder of how much women and men in America now stand to lose. Extreme politicians at every level of government are doing everything they can to block millions of people from coming to Planned Parenthood, deny access to affordable health care, and roll back women’s rights over their own bodies. We are facing a national health disaster, especially in our most vulnerable communities.
That’s why we’re calling on the tech industry to join Tumblr in standing with Planned Parenthood and standing up for access to health care.
A 100-year-old health care provider and the platform powering 335 million blogs may seem like an unlikely pair. But over the last few years, Tumblr and Planned Parenthood have teamed up to provide information and organize communities in support of reproductive rights. We’re proud of all we’ve accomplished together and with overwhelming support from the Tumblr community.
Technology has become instrumental in the fight for fairness and equality across a range of issues. It has the power to influence public debate, mobilize communities, and — most importantly — offer creative solutions to help people receive better care, no matter where they live or who they are. Finally, the tech industry owes its success to the brilliant people it employs and the communities it serves — and we cannot take their health for granted.
It won’t be easy, but doing nothing isn’t an option when lives are at stake. We need to work together to break down barriers to care and information for the millions of people desperate to take ownership of their sexual and reproductive health, and tackle disparities in health care access and outcomes.
Now is the time to be vocal, visible, and active in your support of Planned Parenthood — starting with the #TechStandsWithPP hashtag to share stories about how Planned Parenthood has touched your life, or the life of anyone you know. Call on your co-workers and peers to do the same.
In health care, education, and nearly every industry, we’re doing things that would have been unthinkable a century ago. Think of all we can achieve together in the decades to come if we combine the creativity, innovation, and energy of the tech community with Planned Parenthood’s commitment to helping people everywhere — no matter what.
— David Karp + Cecile Richards