Ever since Microsoft’s chatbot Tay started spouting racist commentary after 24 hours of interacting with humans on Twitter, it has been obvious that our AI creations can fall prey to human prejudice. Now a group of researchers has figured out one reason why that happens. Their findings shed light on more than our future robot overlords, however. They’ve also worked out an algorithm that can actually predict human prejudices based on an intensive analysis of how people use English online.
The implicit bias test
Many AIs are trained to understand human language by learning from a massive corpus known as the Common Crawl. The Common Crawl is the result of a large-scale crawl of the Internet in 2014 that contains 840 billion tokens, or words. Princeton Center for Information Technology Policy researcher Aylin Caliskan and her colleagues wondered whether that corpus—created by millions of people typing away online—might contain biases that could be discovered by algorithm. To figure it out, they turned to an unusual source: the Implicit Association Test (IAT), which is used to measure often unconscious social attitudes.
People taking the IAT are asked to put words into two categories. The longer it takes for the person to place a word in a category, the less they associate the word with the category. (If you’d like to take an IAT, there are several available online at Harvard University.) The IAT is used to measure bias by asking people to associate random words with categories like gender, race, disability, age, and more. Outcomes are often unsurprising: for example, most people associate women with family, and men with work. But that obviousness is actually evidence for the IAT’s usefulness in discovering people’s latent stereotypes about each other. (It’s worth noting that there is some debate among social scientists about the IAT’s accuracy.)
Using the IAT as a model, Caliskan and her colleagues created the Word-Embedding Association Test (WEAT), which analyzes chunks of text to see which concepts are more closely associated than others. The “word-embedding” part of the test comes from a project at Stanford called GloVe, which packages words together into “vector representations,” basically lists of associated terms. So the word “dog,” if represented as a word-embedding vector, would be composed of words like puppy, doggie, hound, canine, and all the various dog breeds. The idea is to get at the concept of dog, not the specific word. This is especially important if you are working with social stereotypes, where somebody might be expressing ideas about women by using words like “girl” or “mother.” To keep things simple, the researchers limited each word vector to 300 dimensions.
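To make “closeness” between word-embedding vectors concrete, here is a minimal sketch (not the researchers’ code) of the standard way similarity between embeddings is measured, using cosine similarity. The toy 4-dimensional vectors below are illustrative stand-ins for real 300-dimensional GloVe embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two word vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for 300-dimensional GloVe embeddings.
dog   = np.array([0.9, 0.8, 0.1, 0.0])
puppy = np.array([0.8, 0.9, 0.2, 0.1])
car   = np.array([0.1, 0.0, 0.9, 0.8])

print(cosine_similarity(dog, puppy))  # high: related concepts
print(cosine_similarity(dog, car))    # low: unrelated concepts
```

With real GloVe vectors, “dog” and “puppy” land close together in this sense while unrelated words land far apart, which is what lets the WEAT quantify which concepts a corpus has bundled together.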
To see how concepts get associated with each other online, the WEAT looks at a variety of factors to measure their “closeness” in text. At a basic level, Caliskan told Ars, this means how many words apart the two concepts are, but it also accounts for other factors like word frequency. After going through an algorithmic transform, closeness in the WEAT is equivalent to the time it takes for a person to categorize a concept in the IAT. The further apart the two concepts, the more distantly they are associated in people’s minds.
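The statistic behind the WEAT is straightforward to sketch. In Caliskan and colleagues’ formulation, each target word’s association score is the difference between its mean cosine similarity to one attribute set and to the other, and the reported effect size is a standardized difference of those scores across the two target sets. The toy 2-D vectors below are purely illustrative, not real embeddings:

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two word vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def assoc(w, A, B):
    """s(w, A, B): how much more strongly w associates with attributes A than B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y."""
    s_X = [assoc(x, A, B) for x in X]
    s_Y = [assoc(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y)

# Toy vectors: X-words lean toward attribute set A, Y-words toward B.
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
X = [np.array([0.95, 0.05]), np.array([0.85, 0.15])]
Y = [np.array([0.05, 0.95]), np.array([0.15, 0.85])]

print(weat_effect_size(X, Y, A, B))  # positive: X sits closer to A than Y does
```

A large positive effect size here plays the same role as a large timing gap in the IAT: it signals that the corpus associates one set of concepts far more tightly with one attribute than the other.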
The WEAT worked beautifully to discover biases that the IAT had found before. “We adapted the IAT to machines,” Caliskan said. And what that tool revealed was that “if you feed AI with human data, that’s what it will learn. [The data] contains biased information from language.” That bias will affect how the AI behaves in the future, too. As an example, Caliskan made a video (see above) where she shows how the Google Translate AI actually mistranslates words into the English language based on stereotypes it has learned about gender.
In Canada’s hyper-concentrated and vertically integrated telecoms sector, data caps are a normal part of life; and where there are data caps, there is cable company fuckery in the form of “zero rating” – when your telecom sells you to online service providers, taking bribes not to count their services against your cap.
The Canadian Radio-television and Telecommunications Commission, which regulates the country’s ISPs, has now banned the practice, in response to a 2015 complaint against the Quebecois ISP Videotron.
MTV Decoded has been nominated for the 2017 Webby Awards for public service & activism but we need your votes to win!!
Please visit bit.ly/DecodedWebby to cast your vote!!
voting closes 4/20 & you can only vote once, so please share or tag a friend to help spread the word!! thanks loves! #Webbys
TOMORROW 4/20 is the last day to vote! Please vote or reblog to help support MTV Decoded!
TODAY 4/20 is the last day to vote!!
Her name is Faith, and she deserves good business!
Let’s help this black sis!
Super disgusting! People are FUCKED UP.
It’s racial discrimination to treat someone less politely than someone else would be treated in the same circumstances, because of race. We can’t ignore cases of blatant racism. We MUST draw public attention to such cases. We must ensure that racists are identified and socially discredited. There are no reasons or excuses for racism. It’s just disgusting.
from the KTLA news article:
When Suh said she would report the action to Airbnb officials, the host replied: “It’s why we have Trump.”
Suh said that comment made her painfully aware of how threatened minorities have become under the Trump administration.
“For me personally, to now have someone say something racist to me and say it’s because of Trump, it was my fears coming true,” Suh said. “That people who held these racist beliefs felt emboldened.”
The host went on to say she would “not allow this country to be told what to do by foreigners.”
Suh is an American citizen who has called the U.S. home since she was 3 years old.
“If this is my experience as a light-skinned Asian woman, what is it like for people who have darker skin than me or are Muslim?” Suh wondered aloud. “What is it like for people who are undocumented or not U.S. citizens yet?”
Steve Smith of Cambridge, Massachusetts contacted Ars and Gillula after our recent article about how the US Senate vote to eliminate ISP privacy rules affects users and what Internet users can do to hide their browsing history. He’s a proponent of this new browser-pollution approach.
“Perhaps more constructively than using a VPN or Tor, fill up your monthly bandwidth allotment with data pollution,” Smith wrote to us. “You’re already paying for the bandwidth, so use it all if your ISP is going to sell your private data. This has the dual benefits of obscuring your actual browsing habits, and, if enough people adopt this practice, discouraging ISPs from selling private data.
“I’ve written a Python class to do this for my household—it crawls for links it finds using random word searches—and have shared the code,” he continued. Smith’s code is available on GitHub. Internet users often have to worry about data caps, but Smith set the default rate to use 50GB a month, or about five percent of a 1TB data cap.
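Smith’s actual code is on GitHub; the sketch below is only an illustration of the general idea under assumed details (the word list, the search URL, and the helper names are all placeholders, not Smith’s): fetch pages for random word searches while tracking the bytes downloaded against a monthly budget.

```python
import random
import urllib.request

# Illustrative stand-in word list; a real polluter would use a large dictionary.
WORDS = ["apple", "ocean", "quartz", "lantern", "meadow"]
MONTHLY_BUDGET = 50 * 10**9  # ~50 GB, about 5% of a 1 TB data cap

class Polluter:
    """Generates decoy traffic until a monthly byte budget is exhausted."""

    def __init__(self, budget=MONTHLY_BUDGET):
        self.budget = budget
        self.used = 0

    def remaining(self):
        return self.budget - self.used

    def record(self, nbytes):
        """Count downloaded bytes; return True while budget remains."""
        self.used += nbytes
        return self.remaining() > 0

    def crawl_once(self):
        """Fetch one random-word search page (hypothetical target URL)."""
        query = random.choice(WORDS)
        url = "https://www.bing.com/search?q=" + query
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
        return self.record(len(body))
```

Capping the decoy traffic at a fraction of the data cap, as Smith does with his 50GB default, is what keeps the approach usable for households that still need most of their allotment for real browsing.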
Smith’s “ISP Data Pollution” project isn’t the only such effort. For instance, there’s a project called “RuinMyHistory” that opens a popup window that cycles through different websites and a browser plugin called Noiszy designed to “create meaningless Web data” by visiting various websites. […]
Under the Protecting Data at the Border Act, devices “belonging to or in the possession of a United States person” (a citizen or Green Card holder) could no longer be searched at the border without a warrant. Agents would no longer be able to deny US persons entry or exit on the basis of a refusal to allow such a search (but they could seize the equipment).
It doesn’t cover visitors or visa holders, but it does have bipartisan support in the Senate (Wyden D-OR; Paul R-KY) and the House (Polis D-CO; Farenthold R-TX).
The Customs and Border Protection agency conducted more warrantless device searches in February 2017 than it did in all of 2015.