
 

Guide to Fighting Spam: The Latest Tools
By Dan Calloway, TheWorldJournal.com





By all accounts, spam is on the rise. Brightmail, an anti-spam vendor, estimates that 42 percent of all Internet e-mail is now spam. That's an 8 percent rise since 2001. The rise in spam, however, has been met with an almost equal rise in anti-spam vendors, who have come up with many solutions to rid your mailboxes of unwanted e-mail. At least 25 such vendors currently offer products they promise will get rid of most, but not all, of your spam.

One such product is SpamAssassin Pro, acquired by Network Associates. I have used SpamAssassin Pro with Outlook XP, and it works very well. Network Associates is rolling out an enterprise-level software package called SpamKiller, based on SpamAssassin's heuristics engine. SpamKiller was first released as an individual consumer product but has now been reengineered for the corporate workplace.

E-mail has become an all-important communication tool, not only for the home user but even more so for the business world. As a result, corporations can no longer tolerate the proliferation of unwanted, unsolicited mail from vendors, porn sites, and others who will stop at almost nothing to make certain you see their e-mail.

One of the biggest problems faced by corporations and consumers today, however, is how to block unsolicited e-mail without blocking the mail you wish to see. Multiple techniques have been developed because spam is generated by human beings who want their e-mail to get through to you, either to get you to buy their products or to get you to visit their websites. Like writers of viruses and Internet worms, spammers adapt their messages to beat the system.

Some common methods of detecting spam today are:


Keyword Searches

Keyword searches are static filters that scan subject lines and message body text looking for words that you have identified. If the software or anti-spam plug-in detects these words or combinations of these words as you have specified, the e-mail is identified as spam and it is either blocked, deleted, or moved to a junk mail folder or other folder you specify.

While keyword searches give the user very granular control over incoming e-mail, there is a high risk of "false positives," which prevent the user from seeing mail that they want to see. For instance, if you put the word "breast" in your keyword list, an e-mail containing information about breast cancer will be blocked, even though it is a legitimate message in which you may have a real interest.

In addition, keyword searches are easily defeated by spammers, who may intentionally misspell certain words so that your software does not detect them. For instance, spammers might misspell the word "porn" as "p0rn," using a zero instead of the letter "o," to get around the software's detection process. Spammers can also use HTML (Hypertext Markup Language) markup, which is invisible to the reader but seen by the software, to fool the filter.
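To make both weaknesses concrete, here is a minimal sketch of a keyword filter in Python. The keyword list and the function name are made up for illustration; a real product would use whatever words the user specifies and would likely do far more than this.

```python
import re

# Hypothetical keyword list, chosen only to mirror the examples in the text.
BLOCKED_KEYWORDS = {"porn", "breast"}

def is_spam_by_keyword(subject: str, body: str) -> bool:
    """Return True if any blocked keyword appears as a whole word."""
    text = f"{subject} {body}".lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return not BLOCKED_KEYWORDS.isdisjoint(words)

# A plain spam subject is caught.
caught = is_spam_by_keyword("Free porn inside", "")
# The "p0rn" misspelling slips through because it is a different word.
evaded = is_spam_by_keyword("Free p0rn inside", "")
# A legitimate health newsletter is blocked: a false positive.
false_positive = is_spam_by_keyword("Breast cancer screening dates", "")
```

In this sketch the first and third messages are flagged and the second is not, which is exactly the trade-off described above.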

Black Lists

A black list is a list of unwanted e-mail addresses or headers, created by the software vendor and then expanded by the user of the anti-spam software. All e-mail that matches an entry on the list is blocked.

Black lists are widely available on the Internet. One such provider is Mail Abuse Prevention Systems (MAPS), located at www.mailabuse.com. This site maintains a Realtime Blackhole List (RBL), a database of URLs or IP addresses of mail servers known to be friendly, or at least neutral, to spammers.

Another well-known black list is SpamCop, located at www.spamcop.net, and another is the Open Relay Database, located at www.ordb.org. Many anti-spam products maintain their own black lists and include optional subscriptions to third-party black list services. One major drawback to black lists, however, is that if you block an entire domain, you may be blocking as much as 90 percent of wanted mail while blocking only 10 percent of unwanted spam.
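A black list check can be sketched in a few lines of Python. The addresses, domains, and function name below are invented for illustration, and the sketch does not query any real service such as the RBL.

```python
# Hypothetical entries; a real list would come from the vendor, the user,
# or a third-party black list service.
BLACKLISTED_ADDRESSES = {"offers@bulk-mailer.example"}
BLACKLISTED_DOMAINS = {"spam-source.example"}

def is_blacklisted(sender: str) -> bool:
    """Return True if the sender's address or its whole domain is listed."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in BLACKLISTED_ADDRESSES or domain in BLACKLISTED_DOMAINS
```

Note that the domain entry blocks every sender at that domain, wanted or not, which is where the drawback mentioned above comes from.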

White Lists

A white list is a collection of trusted e-mail addresses and domains. A white list guarantees that mail from a trusted sender comes through, but it does nothing to block spam. It must therefore be used in conjunction with black lists to strike the proper balance, minimizing "false positives" while still receiving all the e-mail you want. White lists also speed up delivery, since any e-mail from a sender on the list bypasses all the other filters that look for spam.

On the downside, however, white lists require constant maintenance to be very effective. If not properly maintained, you run a high risk of losing e-mail from legitimate sources.
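The bypass behavior can be sketched as follows. All names and list entries here are hypothetical, and the single "filter" passed in at the end is only a stand-in for whatever spam checks a real product would run.

```python
from typing import Callable, Iterable

# Hypothetical trusted entries that the user must keep up to date.
TRUSTED_SENDERS = {"alice@partner.example"}
TRUSTED_DOMAINS = {"mycompany.example"}

def is_whitelisted(sender: str) -> bool:
    """Return True if the address or its domain is on the white list."""
    sender = sender.strip().lower()
    return sender in TRUSTED_SENDERS or sender.rpartition("@")[2] in TRUSTED_DOMAINS

def classify(sender: str, text: str,
             spam_filters: Iterable[Callable[[str], bool]]) -> str:
    """Deliver white-listed mail at once; otherwise run every spam filter."""
    if is_whitelisted(sender):
        return "deliver"  # the other filters are skipped entirely
    if any(spam_filter(text) for spam_filter in spam_filters):
        return "junk"
    return "deliver"

# Stand-in filter used only for this example.
filters = [lambda text: "free money" in text.lower()]
trusted = classify("alice@partner.example", "Free money for the project", filters)
stranger = classify("bob@unknown.example", "Free money for you", filters)
```

Here the trusted sender's message is delivered even though it would trip the filter, while the stranger's message goes to junk.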

Hashes/Signatures

This is a very popular anti-spam technique. A computer program derives the checksum or cryptographic hash of a known spam message, in effect creating a signature or fingerprint of that message. Because spammers send tens of thousands of messages that are identical, the message can be easily identified from the fingerprint and blocked effectively.

The strength of this technique is its certainty: if the fingerprint of a known spam message matches that of a future message, that message will be blocked.

However, spammers have discovered workarounds to fingerprints. They insert random strings of letters or HTML code into the subject line or body text, invisible to the recipient of the mail but included when the software calculates the checksum mentioned above. These random characters let what is essentially an identical e-mail circumvent the checksum algorithm, so that a message that should be blocked gets through.
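A short Python sketch shows both the exact match and the workaround. The choice of SHA-256, the whitespace normalization, and the sample message are assumptions made for illustration; actual products may fingerprint messages quite differently.

```python
import hashlib

def fingerprint(body: str) -> str:
    """Hash the message text after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical database holding the fingerprint of one known spam message.
KNOWN_SPAM = {fingerprint("Buy cheap watches now!")}

def matches_known_spam(body: str) -> bool:
    return fingerprint(body) in KNOWN_SPAM

# An identical copy (apart from spacing and case) matches the fingerprint.
identical = matches_known_spam("buy  cheap watches   NOW!")
# A few random characters change the hash, so the copy slips through.
padded = matches_known_spam("Buy cheap watches now! xk7q2")
```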

Heuristics

Heuristic analysis is another method, which involves running an e-mail message through a variety of tests. These tests search for characteristics that are typical of spam. Each characteristic is assigned a spam probability, and the message is given a cumulative probability score based on the overall test results. If a certain probability threshold is reached, the e-mail is determined to be spam and is blocked. If not, the e-mail goes through.

By weighing a variety of characteristics, heuristic analysis increases the confidence that a message with a high spam score is actually junk mail.

However, heuristics can produce "false positives." This has led anti-spam vendors to develop new tests to reduce the number of these "false positives," which are usually a result of spammers trying to work around the tests themselves. Another downside to heuristic techniques is that checking each e-mail can be time-consuming.
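The scoring idea can be sketched in Python as below. The four tests, their weights, and the threshold are invented for illustration, and simply adding the weights is an assumption of this sketch, not a description of how SpamKiller or any other product combines its results.

```python
# Each rule: (description, weight, test on subject and body).
# Rules, weights, and threshold are hypothetical.
RULES = [
    ("subject in all capitals", 1.5, lambda s, b: s.isupper()),
    ("phrase 'click here'", 1.0, lambda s, b: "click here" in b.lower()),
    ("three or more exclamation marks", 1.0, lambda s, b: (s + b).count("!") >= 3),
    ("mentions 'free'", 0.5, lambda s, b: "free" in f"{s} {b}".lower()),
]
THRESHOLD = 3.0

def spam_score(subject: str, body: str) -> float:
    """Add up the weights of every rule the message triggers."""
    return sum(weight for _, weight, test in RULES if test(subject, body))

def is_spam(subject: str, body: str) -> bool:
    return spam_score(subject, body) >= THRESHOLD

junk_score = spam_score("FREE MONEY!!!", "Click here now")
normal_score = spam_score("Lunch on Friday", "Are you free at noon?")
```

The second message triggers the "free" rule but stays well under the threshold, which is how weighing several characteristics avoids the single-word false positives of a keyword filter.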

Reverse DNS Lookups

This method runs DNS queries on the IP addresses of incoming e-mails to determine whether the host names the senders claim match the actual host names for those IP addresses. Because many spammers use misconfigured hosts to disguise the source of the spam, a query that doesn't return a matching host and IP address is a good indication that the message is junk.

On the downside, however, many legitimate e-mail servers are incorrectly configured, or have intentionally not registered a name with DNS, so a reverse query that doesn't return a matching host name isn't incontrovertible proof of spam. In addition, running DNS queries on a large number of e-mails, as a large corporation would have to, is very taxing on that corporation's network resources.
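One common way to do such a check, sketched here in Python, is to look up the host name for the connecting IP address and then confirm that the name resolves back to the same address. The function names are hypothetical, and the lookups are passed in as parameters so the logic can be tried without touching the network; the defaults use the standard socket module.

```python
import socket
from typing import Callable, List, Optional

def system_reverse(ip: str) -> Optional[str]:
    """Return the host name registered for an IP address, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def system_forward(host: str) -> List[str]:
    """Return the IPv4 addresses a host name resolves to, or an empty list."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []

def passes_reverse_dns(ip: str,
                       reverse: Callable[[str], Optional[str]] = system_reverse,
                       forward: Callable[[str], List[str]] = system_forward) -> bool:
    """True only if the IP has a host name that resolves back to the same IP."""
    host = reverse(ip)
    if host is None:
        return False
    return ip in forward(host)
```

A failed check is best treated as one warning sign among several, since a legitimate but misconfigured server fails it too.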

Header Analysis

This technique scans for e-mail headers that deviate from the specifications outlined in RFCs. Many spammers spoof headers to make it harder for investigators to track down the source of the spam, which makes malformed or spoofed headers a strong indicator of unwanted mail. Header analysis has an advantage over other techniques in that headers are much shorter than the full body text, so there is less to scan.
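As a small illustration, the Python sketch below checks only the two header fields that RFC 2822 makes mandatory, From and Date, using the standard email package. The function name and the choice of checks are assumptions for this example; real header analysis looks at many more fields.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_tz

def header_problems(raw_message: str) -> list:
    """List problems found in the From and Date headers of a raw message."""
    msg = message_from_string(raw_message)
    problems = []
    if msg["From"] is None:
        problems.append("missing From")
    elif "@" not in parseaddr(msg["From"])[1]:
        problems.append("malformed From")
    if msg["Date"] is None:
        problems.append("missing Date")
    elif parsedate_tz(msg["Date"]) is None:
        problems.append("malformed Date")
    return problems

good = header_problems(
    "From: Alice <alice@example.com>\n"
    "Date: Wed, 16 Apr 2003 10:00:00 -0400\n"
    "Subject: Hello\n\nSee you soon."
)
bad = header_problems("From: undisclosed\nDate: someday\nSubject: Hi\n\nBuy now.")
```

The first message yields an empty list, while the second is reported with a malformed From and a malformed Date.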

Bayesian Filtering

Bayesian filtering is a form of text classification that can be applied to spam detection, and it improves the more you use it. The filter examines the language used in a set of spam messages and the language used in normal messages, then classifies new mail by comparing it against both.

As new messages arrive, the filter rounds up the words or phrases that have the highest probabilities in either direction--spam or not spam. Then the filter calculates a new probability that the message is spam or not spam using the individual scores of the collected words.
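The steps above can be sketched in Python as follows. This is a simplified illustration, not the algorithm used by CRM114, ifile, or Spambayes: the class name, the smoothing, the equal weighting of spam and normal mail, and the default of 15 words are all assumptions of the sketch.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> set:
    """Split a message into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

class BayesFilter:
    def __init__(self) -> None:
        self.spam_words, self.ham_words = Counter(), Counter()
        self.spam_msgs = self.ham_msgs = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Learn from one message already labeled spam or not spam."""
        if is_spam:
            self.spam_words.update(tokens(text))
            self.spam_msgs += 1
        else:
            self.ham_words.update(tokens(text))
            self.ham_msgs += 1

    def word_probability(self, word: str) -> float:
        """Chance that a message containing this word is spam (smoothed)."""
        in_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
        in_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
        return in_spam / (in_spam + in_ham)

    def spam_probability(self, text: str, top_n: int = 15) -> float:
        """Combine the words whose scores lie farthest from neutral (0.5)."""
        probs = sorted((self.word_probability(w) for w in tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))

f = BayesFilter()
f.train("cheap watches buy now", is_spam=True)
f.train("buy cheap pills now", is_spam=True)
f.train("meeting agenda for monday", is_spam=False)
f.train("lunch on monday?", is_spam=False)
likely_spam = f.spam_probability("buy cheap watches")
likely_ham = f.spam_probability("monday meeting")
```

With this tiny training set the first message scores well above 0.5 and the second well below it. Every call to train shifts the word scores, which is the sense in which the filter learns as you use it.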

The creator of a Bayesian filter called CRM114, Bill Yerazunis, claims that over 99.9 percent of all e-mail is accurately detected as either spam or not spam. The website for more information on this filter is http://crm114.sourceforge.net.

The major drawback of the Bayesian filter is that it is more computationally intensive than other methods for detecting spam.

Bayesian filtering is currently receiving a lot of attention in open-source projects because of its high accuracy rates and extremely low "false positive" rates. CRM114 and ifile (www.nongnu.org/ifile) are the first tools that applied Bayesian filtering to spam detection. Spambayes (http://spambayes.sourceforge.net) is developing new techniques to improve Bayesian filtering. All three are open-source.


When deciding which methods to employ to rid your mailboxes of spam, keep in mind that granular control should be first and foremost. Filtering can increase the likelihood that spam will be blocked. Choosing a solution that gives you multiple options for dealing with suspected e-mails is probably your best bet.

I am currently trying a subscription service on a 30-day free trial to see whether I like it. The service is called SpamArrest (www.spamarrest.com). It costs $19.95 for 6 months or $34.95 for 1 year. The service can be used with any POP3 mail client, including Outlook Express, Outlook, and others. I happen to be using it with Eudora Pro 5.2.1. In a future report, I'll let you know how it fared against some of the other methods I've investigated.

© April 16, 2003
 



Since 1999 © TheWorldJournal.com, All rights reserved.