Fine-Tuning Spam Filtering
May 18, 2004 6:30 AM PT
Because the volume of spam has increased from about 10 percent of all e-mail in 2001 to more than 50 percent today, corporations and ISPs have been trying to find ways to keep the junk mail from overwhelming users' inboxes. Filtering products, which rely on several techniques to separate needed messages from unwanted solicitations, have helped cut down on the bulk-mail deluge.
However, these filtering products have a dark side: They can inadvertently block wanted messages, often without the user ever being aware of the block. This is a significant problem that vendors are working diligently to fix, but a remedy seems more likely in the long term than in the short term.
Everyone agrees spam has evolved from a minor annoyance into a significant drain on corporate resources. "Curbing spam is the top priority for many corporate IT staffs," said Michael Osterman, president of Osterman Research, a market research firm focused on spam. "Users not only complain about it, but they also spend a lot of time sifting through a growing number of spam messages."
In response, companies such as Brightmail, Cloudmark, MailFrontier, Postini, Trend Micro and Tumbleweed Communications have developed products to deal with the problem. Two techniques have been widely used to block unwanted messages.
As Clear as Black and White
The first technique, called either whitelisting or blacklisting, examines the origin of e-mail messages. After monitoring incoming e-mail, companies develop two lists (a whitelist and a blacklist) and apply a different routing action to each. A whitelist is a collection of senders whose correspondence should always pass through the corporate network without being checked. A blacklist is the opposite: Everything sent by a listed sender is considered spam and is therefore blocked.
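The list-based routing described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the addresses, the function name route_by_sender and the "inspect" fallback for unlisted senders are all assumptions made for the example.

```python
# Hypothetical sender lists; the addresses are invented for illustration.
WHITELIST = {"partner@example.com"}
BLACKLIST = {"bulk@example.net"}


def route_by_sender(sender: str) -> str:
    """Return a routing action based only on the claimed sender address."""
    address = sender.strip().lower()
    if address in WHITELIST:
        return "deliver"   # passes through without further checks
    if address in BLACKLIST:
        return "block"     # treated as spam
    return "inspect"       # assumption: unlisted senders get further checks


print(route_by_sender("Partner@Example.com"))
print(route_by_sender("bulk@example.net"))
print(route_by_sender("stranger@example.org"))
```

Note that the function trusts whatever address the message claims to come from, so its decisions are only as reliable as that address.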
The problem with this technique is that companies never really know who is generating a message. "Spoofing (the process of putting another person's or organization's e-mail address in the header) is a major issue, and more than one out of every three spam messages does not come from the address listed," said Richi Jennings, leader of the antispam practice at Ferris Research, a market research firm.
Content filtering has been the other main technique used to block spam, and Bayesian filters are the most popular technique within this category. Such products examine transmissions and then assign statistical probabilities concerning the likelihood that a particular message is spam. The probabilities are based on message content. For instance, a message with the word Viagra will result in a higher rating than one without it.
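To make the idea of content-based probabilities concrete, here is a rough sketch of a naive Bayes word filter in Python. It is a simplified illustration under stated assumptions, not how any commercial product works: the four training messages are invented, words are split on whitespace only, and add-one smoothing is used so that unseen words do not zero out the estimate.

```python
import math
from collections import Counter

# Invented training data: (message text, is_spam) pairs.
TRAINING = [
    ("cheap viagra offer", True),
    ("viagra discount today", True),
    ("meeting agenda for monday", False),
    ("quarterly report attached", False),
]


def train(messages):
    """Count word occurrences and message totals for spam and non-spam."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_words = sum(counts[True].values())
    ham_words = sum(counts[False].values())
    # Start from the prior odds, then add each word's log-likelihood ratio.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_word_spam = (counts[True][word] + 1) / (spam_words + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_words + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


counts, totals = train(TRAINING)
print(spam_probability("viagra offer", counts, totals))
print(spam_probability("meeting agenda", counts, totals))
```

With this toy data, a message containing "viagra" receives a higher spam probability than the same message without it, which mirrors the behavior described above.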
As e-mail passes through a filter, each message is assigned a score, such as from 1 to 99. The higher the number, the more likely a message is spam. A network operator then selects possible actions based on the scores. If a message scores 97 or higher, for example, it could be blocked. For scores from 85 to 96, a note saying "this may be spam" could be added to the subject line as the message is relayed to the end user.
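The threshold policy in this example maps directly onto a small function. The sketch below uses the cutoffs given above (97 and 85 on a 1-to-99 scale); the function names and the exact format of the subject-line tag are assumptions for illustration.

```python
from typing import Optional


def action_for_score(score: int) -> str:
    """Map a 1-99 spam score to an action using the example cutoffs."""
    if not 1 <= score <= 99:
        raise ValueError("score must be between 1 and 99")
    if score >= 97:
        return "block"
    if score >= 85:
        return "tag"
    return "deliver"


def relay_subject(subject: str, score: int) -> Optional[str]:
    """Return the subject line to relay, or None if the message is blocked."""
    action = action_for_score(score)
    if action == "block":
        return None
    if action == "tag":
        # Hypothetical tag format; the note text comes from the example above.
        return f"[this may be spam] {subject}"
    return subject


print(relay_subject("Quarterly results", 40))
print(relay_subject("Limited time offer", 90))
print(relay_subject("Act now", 98))
```

An operator who sees too many blocked legitimate messages could raise the cutoffs, at the cost of letting more spam through.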
To Be or Not To Be?
Although filters can be useful, spammers are constantly examining what is being blocked and then taking steps to find ways around the checkpoints. "A number of spammers are now including passages from Shakespeare in their mailings, because they have found that the filters view items with such passages as legitimate e-mail," Osterman told TechNewsWorld.
This game of leapfrog -- a sort of arms race on the network level -- has hurt users. As companies and ISPs have ratcheted up their efforts to block spam, they also have begun blocking genuine correspondence, errors known as false positives. To avoid the filters, spammers started to use "Re:" in their subject fields. Many popular filters were then altered to block such spam tactics, but they also occasionally stopped legitimate transmissions.
The spam crackdown is causing headaches for companies, such as newsletter publishers, that ship large volumes of legitimate e-mail. These companies are seeing spikes in the number of undelivered messages. Consequently, users increasingly are missing important communications, and often they don't even realize it until they talk with the senders. More and more employees are becoming frustrated because they expect e-mail delivery to be guaranteed, and they are putting pressure on IT departments that have few technical alternatives at this stage.
Shutting off the filters ensures message delivery but results in spam overload. Another potential solution is setting up quarantine areas -- places where suspected spam sits while users are notified that someone has sent them messages that appear to be bogus. Employees then can examine the messages to determine whether they are legitimate. While potentially helpful, this process is inefficient: Users must check the quarantine area, and IT staffs must set it up and manage it.
Creating Customer Dissatisfaction
The end result of this give-and-take in the e-mail world is that few companies are completely satisfied with today's filtering capabilities, and many are pressuring vendors to improve them. Osterman Research found that just 25 percent of users in a survey were "very satisfied" with their spam filter's ability to avoid false positives, and 16 percent expressed a degree of dissatisfaction.
In response, vendors are searching for better authentication techniques, and researchers are working on the problem from several different angles. One technique, championed by companies such as Microsoft and Yahoo, involves use of "domain keys," which use public-key encryption technology to verify e-mail senders. If this approach were implemented, ISPs could enable authenticated e-mail messages to reach end users.
But to be effective, domain keys -- much like all the alternative spam-fighting techniques still in the research lab -- require widespread adoption. They are now only in an early stage of development. Also, Microsoft and Yahoo have different techniques for verifying mail, and it is unclear whether a group like the Internet Engineering Task Force can craft a standard that all vendors will support.
Corporations would have to upgrade their e-mail systems to support the new functions. "Long term, I think technologies like domain keys will help us cut down on the volume of spam generated," Ferris Research's Jennings told TechNewsWorld. "However, in the short term, it will continue to be difficult for companies to block spam but still deliver needed messages to their users."