Skeptics of giving computers control over high-risk activities like driving cars were given some ammunition last week when researchers at OpenAI discovered their two-month-old machine vision system could be tricked with a pen and paper into misidentifying objects.
The AI laboratory published a paper on March 4 revealing that its new system could be fooled into identifying an apple as an iPod by attaching a note to the apple with the word “iPod” written on it.
In another flub, the system also identified a chainsaw as a piggy bank when dollar signs were sprinkled over a photo of the tool.
Despite those errors, OpenAI remains optimistic about its experimental machine vision system, called CLIP, which explores how AI systems might learn to identify objects without close supervision by training on large databases of image-text pairs.
The company’s researchers explained in a blog post that they had discovered multimodal neurons in CLIP. In the human brain, such neurons respond to clusters of abstract concepts centered on a common high-level theme, rather than to any specific visual feature.
“Our discovery of multimodal neurons in CLIP gives us a clue as to what may be a common mechanism of both synthetic and natural vision systems — abstraction,” the researchers wrote.
New Attack Surface
However, the system’s ability to link words and images at an abstract level also created a new vector of attack not previously seen in machine vision systems.
“By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model,” the researchers noted.
Attacks using “adversarial images” have been mounted against commercial machine vision systems before, but against CLIP they require no more technology than a pen and paper.
Design flaws in machine vision systems like CLIP aren’t unusual.
“It’s very common for machine learning systems to have flaws in the way they classify objects,” observed Jonathan Spring, an analyst with the CERT Division in the Software Engineering Institute at Carnegie Mellon University in Pittsburgh.
“You can’t always predict what a computer is going to do even though you know what it has been programmed to do,” he told TechNewsWorld.
Trick the Machine Distraction
Humans don’t fully understand how they process the world through visual stimuli, yet they’re trying to teach a machine how to do it, explained Vilas Dhar, president of the Boston-based Patrick J. McGovern Foundation, which focuses on the impact of artificial intelligence and data science on society.
“This means we take shortcuts,” he told TechNewsWorld. “Rather than teaching a system what an apple is as a conceptual object, with inherent meaning, we treat it as a configuration of pixels, a pattern that gets meaning through context.”
“This approach works most of the time, but can fail spectacularly when the context is not part of the training set,” he continued. “When we train machine vision systems without meaning, we get further from the reality of machines seamlessly navigating our real built environment.”
Kjell Carlsson, a principal analyst at Forrester Research, warned against being distracted by trick-the-machine stories.
“It is absolutely important that people investigate the limitations of these models, but the fact that you can trick a model says nothing useful on its own,” he told TechNewsWorld.
“It is, for example, extraordinarily easy to trick people,” he continued. “People systematically think that folks wearing glasses are intelligent, and we have some 86 billion biological neurons at our disposal.”
Mike Jude, research director for video surveillance and vision applications at IDC, explained that machine vision systems can be spoofed by playing to the biases implicit in the training that went into the network.
“It’s possible to do the same thing with a human being,” he told TechNewsWorld.
“With a machine vision system, it can be a lot more subtle,” he continued. “You can have cues embedded into the picture that the human eye can’t perceive but the computer can because it’s just looking at pixels.”
“Any machine vision system can be hacked,” he added. “There are flaws in any system that can be exploited by the right kind of attack.”
“This supports some of the concerns that people have about machine vision applications,” he said.
Critical for Self-Driving Cars
One application area of particular concern is the use of machine vision in self-driving vehicles.
“Machine vision is absolutely crucial to self-driving cars,” asserted Sam Abuelsamid, principal analyst for e-mobility at Guidehouse Insights, a market intelligence company in Detroit.
“It’s critical because the system needs to not only classify where objects are around the vehicle but what those objects are in order to make decisions on what to do, just as we do as human drivers,” he told TechNewsWorld.
“If a plastic bag blows across the road, that’s not a risk, but if a child is in the road, that’s a great risk,” he added.
He explained that a vehicle needs to understand the semantics of what its sensors are seeing, so accurately classifying the objects being detected is absolutely critical for an automated driving system to be safe and reliable.
Automated driving systems that are overreliant on machine vision are generally not very reliable, he contended.
“Machine vision systems still have a lot of flaws, especially those dependent on AI and neural networks,” he said. “The successful perception systems for automated driving all use multiple systems to understand the environment around the vehicle.”
“They use a mix of machine vision and deterministic sensors that measure the position and distance of objects,” he continued. “You need radar, lidar and thermal imaging in order to get an accurate view of what’s out there.”
Although OpenAI’s CLIP system has been described as “state-of-the-art,” Abuelsamid was skeptical of that label. “Any machine vision system that’s fooled by a note is not state-of-the-art,” he countered.
Adding Security and Reliability
Spring noted that there are ways to make machine vision systems more secure and reliable.
“There are test beds available to people who want to train their machine learning models and make them harder to attack, but it will never get rid of all attacks,” he observed.
“We also need a social system on top of the technical system where we can manage what people know about tricking these systems, and if there are flaws with great impact, we need ways to update them to get rid of those flaws,” he continued. “That’s similar to the way we handle vulnerabilities in software in general.”
“Then there’s defense in depth so you have systems in place to help the machine learning system deal with the ways it’s likely to fail or be tricked,” he added. “Sometimes that may be a human monitoring the system. Other times it may be physical constraints on a robot or car that don’t let it do certain things.”