Ever-inventive spam senders have come up with all kinds of ways to fool e-mail filters and get their ads into your inbox. Congress and computer experts are fighting back, but the battle is far from over.
By BILL DURYEA
Published December 14, 2003
Someone calling herself Amelia Schroeder sent a colleague of mine an e-mail the other day about an urgent matter. "i found a great de.al!" the e-mail read. "want it bi.ger, nows the time." It went on: "PE.N1S ENL@RGE.MENT P1.LLS DISCOUNT PRICE!" This is spam, the nickname for what is officially known as "unsolicited bulk e-mail." In case you missed the nuance of Amelia's marketing strategy, it's penis enlargement pills she's peddling. The antispam filters here at the St. Petersburg Times missed the true intent of this e-mail and let it through.
Like so many other companies, the Times has invested handsomely to spare its employees the daily frustration of deleting dozens of these garbagey come-ons for hot porn, low rates, cheap pills and, irony of ironies, spam-blocking software.
On the same day that Amelia's spam wiggled through our network like a pint-sized pickpocket on a crowded subway car, the Times' filters intercepted 165,000 other spam messages, roughly 80 percent of the total volume of e-mail sent to the company, a figure that is both impressive and scary.
So how did Amelia's patently spammy message, as well as an estimated 14,500 equally unwanted solicitations, make it to the inboxes of Times employees? The same way spammers are overwhelming you at home, the same way they are confounding legislators and threatening the integrity of e-mail as a means of modern communication.
Bad spelling for starters.
Spam or ham?
Last January, nearly 600 software engineers and computer security experts gathered at MIT for the first annual spam conference. The idea was to share information about the best practices in the battle against spam, which at that time was estimated at 15-billion messages, or half of the world's daily e-mail traffic.
Most of the panels were as dry and technical as you might expect at an MIT conference, dealing mostly with specific products that were then just hitting the market. One of the panels promised to examine "the Sparse Binary Polynomial Hash Message Filtering technique. . . . As implemented in the GPLware 'CRM114 Discriminator,' and combined with supervised ADABOOST learning, SBPH can deliver >99.5 percent accuracy on real-time e-mail without whitelists or blacklists."
But one panelist, John Graham-Cumming, a 36-year-old Oxford graduate with a doctorate in computer security, took a slightly different, more lighthearted approach that aimed to show just how smart and inventive an adversary the antispammers were up against. He titled his talk "The Spammers' Compendium." It was a list of spammers' techniques then in vogue and an explanation of how each worked.
He had latched onto the idea for such a survey a few months earlier as he developed a piece of e-mail-organizing software. Called POPFile, the software would automatically route the numerous e-mails he received daily into different categories, "Family," "Work," etc.
"It would make mistakes on spam," he said. "I didn't know why. Then I looked at the spam and found out the obfuscation and trickery going on."
That first list detailed 11 tricks, which Graham-Cumming gave witty names such as "Hypertextus Interruptus" and "Slice and Dice."
The compendium was so popular (among a certain group of people, that is), that on Monday an updated version was released by Sophos, the British antivirus and antispam software company for which Graham-Cumming now works. Now called the "Field Guide to Spam," the list is 27 tricks long "with another five or six in the wings," Graham-Cumming said.
"They're coming in at a rate of about two a month," he said.
The Field Guide (www.sophos.com/spaminfo/) is like a bestiary of the ever-mutating messages that you might encounter in your inbox. Remember that e-mail from Amelia Schroeder, the one that looked like it came from a typing pool staffed by chimpanzees? Thumb through the Field Guide, and voila, you come across the ploy Graham-Cumming calls "Ze Foreign Accent," a common, not particularly sophisticated trick he first identified on Jan. 17.
In Amelia's message, the word penis was spelled PE.N1S. This was no typo. Spammers learned that replacing letters with numbers that resemble letters (1 for i, 4 for A) or putting nonsense accents on regular English words would confound a filter that was programmed to look for the normal rendering of the word. In a variation of that ruse, which Graham-Cumming calls "L O S T i n S P A C E," spammers splice punctuation marks between each letter. Hence, Viagra becomes V-i-a-g-r-a or even V'/'a'g'r'a'.
The filters at the Times, for example, are text-based and only look for terms and strings of words that they have been instructed to search for. The list, which includes sender addresses, phone numbers and product names, is roughly 26,400 items long.
If the filter hasn't been told that V'/'a'g'r'a is as spammy a word as Viagra, then "by default this system will deliver it," said Frank Levy, a Times senior systems engineer and data services administrator.
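To see why a list-driven filter stumbles on these spellings, here is a minimal Python sketch. It is an illustration only, not the Trend Micro product or the Times' actual rules: the three-word blocklist, the look-alike table and both function names are assumptions made up for the example. The first function matches whole words exactly as written; the second undoes the number-for-letter swaps and strips punctuation before matching.

```python
import re

# Hypothetical blocklist for illustration; the Times' real list is
# described as roughly 26,400 items long.
BLOCKLIST = {"viagra", "penis", "enlargement"}

# Assumed look-alike substitutions of the kind the article describes
# (1 for i, 4 for a), plus a few common extras.
LOOKALIKES = str.maketrans(
    {"1": "i", "4": "a", "@": "a", "0": "o", "3": "e", "$": "s"}
)


def naive_match(text):
    """Flag the text only if a blocklisted word appears exactly as spelled."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKLIST for word in words)


def normalized_match(text):
    """Undo look-alike characters and drop punctuation before matching."""
    lowered = text.lower().translate(LOOKALIKES)
    # Keep only letters and whitespace, so "pe.nis" collapses to "penis".
    squeezed = re.sub(r"[^a-z\s]", "", lowered)
    return any(word in BLOCKLIST for word in squeezed.split())


if __name__ == "__main__":
    for sample in ("PE.N1S ENL@RGE.MENT P1.LLS", "V-i-a-g-r-a"):
        print(sample, naive_match(sample), normalized_match(sample))
```

Even the second function is easy to beat. It does not know that a slash can stand in for the letter i, so V'/'a'g'r'a still slips past, which is the arms race in miniature.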
You may have noticed recently an increase in subject lines that contain gibberish strings of letters and numbers. Graham-Cumming calls this ploy "Speaking in Tongues," and it is another example of spammers trying to confound a filter into passing along a message that it cannot automatically identify as spam.
Text-based filters, such as the Trend Micro product the Times uses, also have a hard time with spam messages that contain nothing but a picture with information written inside it, Levy said. The filter sees only an image, not words, so it lets it through. Graham-Cumming calls this one "The Big Picture." Sometimes there is no picture, just an empty box. The picture appears only when the person clicks on it.
"It creates spam after the fact," Levy said.
Spammers also like to insert invisible commands that splice obvious spam words in two, convincing the filter, but not the eye, that "Via" and "gra" are somehow separate terms.
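A rough Python sketch of the countermeasure follows. It assumes the invisible commands are HTML comments or empty tags, which is one common form but is not something the Times' engineers specified, and the function name is hypothetical. Removing the markup before looking at the words puts the two halves back together.

```python
import re

# Matches an HTML comment first, then any other tag. This is an assumed,
# simplified pattern; real HTML can be messier than a regex can handle.
MARKUP = re.compile(r"<!--.*?-->|<[^>]*>", flags=re.DOTALL)


def strip_invisible_markup(html):
    """Return the text a reader would see, with comments and tags removed."""
    return MARKUP.sub("", html)


if __name__ == "__main__":
    print(strip_invisible_markup("Via<!-- lunch -->gra"))
    print(strip_invisible_markup("Via<b></b>gra"))
```

A filter that runs its word checks on the stripped text sees the same word the eye does.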
The tricks evolved to the point that spammers would conceal lengthy encyclopedia passages in a color that was so close to the background color that the eye could not register the difference. But the computer could. Graham-Cumming dubbed the technique "Camouflage." In one such e-mail, spammers hoped that a few hundred words of dull biographic material on King Juan Carlos of Spain would trick the filter into thinking the message was not actually about weight-loss pills.
But filters have also evolved, and the most sophisticated ones can spot these ploys with ease. In fact, the spammers' inventiveness has come to work against them.
"The more complex the obfuscation, the more identifiable it is as spam," Graham-Cumming says. "I call them Redcoat spam; they stand out pretty clearly against the undergrowth of e-mail. Those have become easy pickings."
The antispam software that works best generally uses something called "Bayesian" analysis. Named for Thomas Bayes, an 18th-century English minister and mathematician, this form of analysis predicts the probability of a future occurrence from information gleaned from past experience.
Bayesian analysis was the center of the artificial intelligence push in the 1970s, when scientists attempted to teach computers to think. It is also the mathematical basis for a doctor's cancer prognosis; the probability a patient will die in a certain number of months is based on how other patients with similar cancers have fared.
In 2002, a software engineer named Paul Graham proposed applying Bayesian, or adaptive, analysis to spam filtering. He had a 99 percent success rate, with no "false positives," good e-mail that is blocked by too crude a filter. Nearly every filter now hitting the market is Bayesian.
(Large organizations, such as newspapers that receive numerous unsolicited though legitimate e-mails, have been hesitant to use adaptive filters because they are afraid of so-called "false positives." What happens if the Times' filters intercept a legitimate news tip about Viagra causing cancer (not true, so far as we know)? Better to err on the side of caution and let e-mails through than thwart the Pulitzer Prize-winning story. Or the $100,000 deal, in the case of a stockbroker.)
With a Bayesian filter, the person receiving the message doesn't need to red-flag a term. He just tells the computer, "This is an example of spam," and the computer figures out why by analyzing the content.
For example, even if a spammer has disguised the word Viagra by cutting it in two, the filter knows that "gra" is not a word that would appear in any e-mail you would want to see. Likewise, it is not going to be confused by a lengthy passage about King Juan Carlos, because this is simply neutral information, neither good nor bad. The one reference to Viagra will correctly identify the message as spam.
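The idea can be sketched in a few dozen lines of Python. This is a toy naive Bayes classifier with add-one smoothing, offered as an illustration under stated assumptions; it is not Paul Graham's published formula or any commercial product, and the class name, the training messages and the choice to ignore never-seen words are all invented for the example. The user labels a handful of messages, and the program works out which words tilt a new message toward spam.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyBayes:
    """Toy spam scorer: naive Bayes over words with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one example message labeled 'spam' or 'ham'."""
        self.counts[label].update(tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam."""
        if not self.messages["spam"] or not self.messages["ham"]:
            raise ValueError("train on at least one spam and one ham message")
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the prior odds: how much of the training mail was spam.
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for word in tokens(text):
            if word not in vocab:
                continue  # assumption: never-seen words count as neutral
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + len(vocab))
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    f = TinyBayes()
    f.train("cheap via gra pills discount price", "spam")
    f.train("discount pills lose weight now", "spam")
    f.train("meeting moved to friday afternoon", "ham")
    f.train("lunch on friday with the news desk", "ham")
    print(f.spam_probability(
        "king juan carlos of spain was born in rome via gra pills"))
    print(f.spam_probability("lunch meeting on friday"))
```

In this toy setup the words about King Juan Carlos have never been seen in either pile, so they add nothing to the score, while "via," "gra" and "pills" push the first message firmly toward spam. A real filter trains on thousands of messages and weighs unseen words more carefully.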
Bayesian filters can even detect patterns that a human would be unlikely to notice. Spammers typically put two spaces in front of a dollar sign, Graham-Cumming said, a typing quirk that is unlikely to turn up in a nonspam message.
The latest trend in bogus e-mails, Graham-Cumming said, is something he calls "Pico spams."
"What's getting through are really, really small spams that say as little as possible," he said. That's why you may have been noticing an increase in the number of e-mails that say little more than "Hi." and "Thought you might be interested . . . " with a link to some unspecified Web site that more often than not turns out to be pornography or mortgage rates.
But even these tricks would be caught by a good Bayesian filter, Graham-Cumming said. These e-mails have other problems - forged headers, for example - that would make them stand out.
"My prediction is all these tricks are going to go away," Graham-Cumming said.
Does that mean the antispammers will have won?
"I doubt it," he said.
The cost of sending spam is so small (it's free if the spammer is using an innocent person's service provider as the source) that it doesn't make economic sense for them to stop.
A Pew Center study on Americans' use of the Internet and attitudes toward spam found that a majority of e-mail users say spam has made them less trusting of e-mail in general, but 7 percent of e-mail users "have ordered a product that was offered in an unsolicited e-mail."
Spammers make money not only from selling you products, but by selling your address to other spammers. A list of 142-million e-mail addresses fetches $149, which says something about how cheaply our privacy is valued.
Congress just passed legislation that aims to control spam by requiring senders to provide legitimate postal addresses and to offer recipients the choice to "opt out" of future contact. But often the opt-out offer is simply another trick to get the recipient to confirm that his e-mail address is valid. In Europe, where laws are stricter, spammers may contact only those people who have chosen to "opt in" by putting their names on a list.
Most experts think the best defense against spam will be some combination of spam filters and legislation to prosecute fraudulent e-mail. Ultimately, this two-pronged approach may redirect spam to people who don't object to it. For such people, spam, while unsolicited, is not unwelcome.
The second spam conference at MIT is coming up in January. Graham-Cumming will be there. His talk this year is about ways to defeat adaptive filters. One of them is called "Web bugs," one-pixel graphics that spammers enclose in an e-mail. The spammers bombard the recipient with slightly altered versions of the same pitch. If one gets through and is opened, the Web bug is triggered to phone back to the spammers' Web site, thereby indicating to the spammer what variation made it through the filter.
"This technique does not yet seem to be in use," Graham-Cumming said, "but, given spammers' continued ingenuity, it is perhaps a matter of time."