
Art Project Uncovers Bias in AI Training Models

A website set up to gather faces for an art project has stirred controversy over the use of artificial intelligence to classify human beings.

The faces collected at the ImageNet Roulette site are being incorporated into a work of art at the Fondazione Prada Osservatorio in Milan, but that is only one reason American artist Trevor Paglen and Microsoft researcher Kate Crawford created the site.

“ImageNet Roulette was launched earlier this year as part of a broader project to draw attention to the things that can — and regularly do — go wrong when artificial intelligence models are trained on problematic training data,” the creators wrote.

“ImageNet Roulette is trained on the ‘person’ categories from a data set called ImageNet (developed at Princeton and Stanford Universities in 2009), one of the most widely used training sets in machine learning research and development,” they explained.

“We created ImageNet Roulette as a provocation: it acts as a window into some of the racist, misogynistic, cruel, and simply absurd categorizations embedded within ImageNet,” they continued. “It lets the training set ‘speak for itself,’ and in doing so, highlights why classifying people in this way is unscientific at best, and deeply harmful at worst.”
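The mechanics the creators describe are a conventional image-classification pipeline: a model trained on ImageNet returns a label for whatever photo it is given. The sketch below illustrates that general pattern with a stock torchvision ResNet and its standard 1,000-class ImageNet labels; it is only an illustration of the pipeline, not the Roulette model itself, which was trained on ImageNet's "person" synsets.

```python
# Minimal sketch of an ImageNet-based classification pipeline: load a
# photo, run it through a model trained on ImageNet, return the top label.
# Uses a stock torchvision ResNet (1,000-class ILSVRC labels), NOT the
# actual ImageNet Roulette model.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

def classify(path: str) -> str:
    """Return the model's top-1 ImageNet label for the image at `path`."""
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)      # shape: (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)
    idx = int(logits.argmax(dim=1))
    return weights.meta["categories"][idx]

# print(classify("headshot.jpg"))  # the label is whatever the training data taught it
```

Whatever the architecture, the output can only reflect the categories and examples in the training set, which is exactly the point the creators are making.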

Racist Roulette

ImageNet Roulette allows you to upload a photo of a face, and it will return a classification for it. I uploaded my Portuguese American head shot and received the classification “psycholinguist.” The site wasn’t as benign to some other folks who used it.

Tabong Kima, 24, an African American mentioned in a New York Times article about the site, was labeled “wrongdoer” and “offender.”

Julia Carrie Wong, an Asian American reporter for The Guardian, received the classification “gook, slant-eye.”

“ImageNet Roulette has been incredibly powerful in showing the artificial intelligence we incorporate into our daily lives is fundamentally flawed and limited by the human decisions that go into it,” said Albert Fox Cahn, executive director of the Surveillance Technology Oversight Project (STOP) in New York City, an advocacy group seeking an end to discriminatory surveillance.

“This is why we find that every visual recognition program on the market has biases on the basis of race and error rates that are different for men and women,” he told TechNewsWorld.

“No system is completely objective,” Cahn added. “However we set these tools up, the prejudices and assumptions human beings bring into these tools will shape the outcomes that we get.”

Garage Car Reasoning

ImageNet Roulette has made its point about AI, observed Joshua New, senior policy analyst at the Center for Data Innovation, part of the Information Technology & Innovation Foundation, a research and public policy organization in Washington, D.C.

“It’s useful in that it shows that AI isn’t this infallible, magic secret sauce that a lot of companies are guilty of marketing it to be,” he told TechNewsWorld.

“At the same time, it raises the important issue that if you have bad data that you’re using irresponsibly, you’re going to get bad results,” New said.

“It used a training data set that is notoriously full of wonky labels, some of them extreme, offensive and containing race and gender bias, which is concerning. But just because that exists in the world doesn’t mean all AI systems are going to exhibit those problems,” he pointed out.

“If a company cares about fighting bias and acting responsibly, they’re going to take steps to do so. A bad actor that doesn’t care about fighting bias will exhibit bias, whether they’re using AI or not,” New remarked.

“AI can be done irresponsibly. It does happen, and this tool does a good job of highlighting that, but to point to this as an example of why we shouldn’t be using AI is akin to pointing to a car someone made in their garage without seatbelts, brakes and airbags and saying, ‘This is unsafe, therefore everyone shouldn’t be driving.’”

Better Data, Better AI

One remedy often offered to counter bias in data sets is to increase their size. The idea is that the more data in the set, the lower the probability that something will be omitted. The solution is more complex than that, though, especially when it comes to facial recognition, noted George Brostoff, CEO of Sensible Vision, a face authentication company in Cape Coral, Florida.

“When facial recognition looks at a photograph, you have to take into account the quality of that photograph. Does it have enough resolution? Is there enough detail? Is there noise in it? All those factors are just as important as having lots of images,” he told TechNewsWorld.
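Brostoff's point about photo quality can be made concrete with a simple pre-check that rejects images that are too small or too blurry before they ever reach a recognition model. The sketch below is purely illustrative: the thresholds and the variance-of-Laplacian sharpness test are assumptions, not Sensible Vision's actual criteria.

```python
# Illustrative image-quality gate: reject photos that are too small or
# too blurry before passing them to a face-recognition model.
# Thresholds are arbitrary examples, not any vendor's real criteria.
import cv2

MIN_SIDE = 224           # assumed minimum resolution, in pixels
MIN_SHARPNESS = 100.0    # assumed variance-of-Laplacian threshold

def usable_for_recognition(path: str) -> bool:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return False                               # unreadable file
    h, w = img.shape
    if min(h, w) < MIN_SIDE:
        return False                               # not enough resolution/detail
    sharpness = cv2.Laplacian(img, cv2.CV_64F).var()
    return sharpness >= MIN_SHARPNESS              # low variance ~ blurry or noisy
```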

“AI will get better with better data,” Brostoff added. “Will it always be able to make the right decision? No, because the data isn’t always going to be perfect.”

Improved data sets can help reduce bias, but so can greater transparency, said Jake Laperruque, senior counsel at the Project on Government Oversight, part of The Constitution Project, a government watchdog group in Washington, D.C.

“You can’t open source Google’s algorithm, but on the other hand, it is important to find some means of independent transparency,” he told TechNewsWorld. “That’s easier for facial recognition than for something like an algorithm.”

There are efforts in Congress to increase AI transparency, STOP’s Cahn noted. There’s a bill in the House to remove trade secret protections when AI systems are used in the criminal justice system, for example.

“If someone’s life or liberty is going to be taken away because of one of these AI tools, then their defense attorney should know exactly how it operates,” he said. “That should trump any concern for trade secrets.”

Law Enforcement Models

Removing bias from law enforcement models can be a particularly thorny problem.

To remove all bias from a law enforcement AI model, you need a bias-free historical data set, and all models must use the same data. Both those conditions are difficult to meet, said Lian Jye Su, principal AI analyst at ABI Research, a technology advisory company headquartered in Oyster Bay, New York.

To get a bias-free data set, you need to be 100 percent certain that all past law enforcement records are free of discrimination and bias, which is impossible, she told TechNewsWorld.

“Even if we try to remove all the biases by not including known and alleged cases, the inherent socioeconomic factors that cause such sentiment are incomprehensible to AI as of right now,” Su said.

Consider, for example, a bank that had been using AI for lending based on an applicant’s background, she said. Racial information was removed from the data set to avoid discrimination — yet the model was still found to be biased against African American applicants.

What the data scientists discovered was that the AI used addresses as a determining factor for loans, Su said. That created a clear bias against applicants who came from poor neighborhoods but who were not necessarily poor.

“As humans, we would have discerned that very quickly, but AI would not be able to understand that,” she pointed out.
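The proxy problem Su describes can be tested for directly: after dropping the protected attribute from the training features, check whether any remaining feature still predicts it. The sketch below is hypothetical; the column names ("zip_code", "race") and the data file are invented for illustration.

```python
# Hypothetical proxy-bias check for the lending example: even with race
# removed from the features, test whether a remaining feature (here,
# 'zip_code') still predicts the protected attribute on its own.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_score(df: pd.DataFrame, feature: str, protected: str) -> float:
    """How well does `feature` alone predict the protected attribute?

    Cross-validated accuracy well above the majority-class baseline
    suggests the feature acts as a proxy and can smuggle bias back in.
    """
    X = pd.get_dummies(df[[feature]], columns=[feature])
    y = df[protected]
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    baseline = y.value_counts(normalize=True).max()
    return acc - baseline   # > 0 means the feature leaks protected information

# Example with made-up data:
# applicants = pd.read_csv("applications.csv")
# print(proxy_score(applicants, feature="zip_code", protected="race"))
```

A human reviewer might spot the address-as-proxy pattern quickly, as Su notes; the model will not, so checks like this have to be built into the workflow.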

Training a law enforcement model with the same data also can be challenging.

“It’s almost logistically impossible. Even if law enforcement agencies find a way to share training data that is aggregated, there are many country-specific regulations and socioeconomic contexts,” Su noted.

“While as humans we all share the understanding of universal values and ethic codes, AI does not have the same understanding. A slight difference in training data will result in very different decision-making processes, which means we will end up with unexplainable actions,” she added.

“One way to resolve this is to have completely transparent and explainable AI,” Su suggested. “Many AI startups are working toward that vision, but personally I think we are still very far from it.”

The hullabaloo caused by ImageNet Roulette appears to have had an impact on the researchers behind ImageNet. A few days ago they decided to purge the data set of 1.5 million images in its person category.

Meanwhile, the creators of ImageNet Roulette said they’ve proved their point and will be taking the site down on Friday. The art exhibit, though, will remain on display in Milan until February 2020.

John P. Mello Jr.

John P. Mello Jr. has been an ECT News Network reporter since 2003. His areas of focus include cybersecurity, IT issues, privacy, e-commerce, social media, artificial intelligence, big data and consumer electronics. He has written and edited for numerous publications, including the Boston Business Journal, the Boston Phoenix, Megapixel.Net and Government Security News. Email John.
