Technology

Software designed to fend off spam may use different techniques. Below, you will see which state-of-the-art methods are built into VirusBuster's spam filter applications.

There are basically two approaches for a system to decide whether an incoming message is spam or not. One is to check on the sender's address, the other is to analyze message content and structure.

Filtering by the sender's address

Black list
Most spam originates from a limited number of addresses. In most cases, these belong to otherwise harmless machines that spammers have taken over. Rejecting mail sent from these addresses also helps the operators of such abused machines: although the rejections may cause them trouble at first, they are prompted to fix their configuration, which reduces the overall spam threat.

A simple way to configure a spam filter is to set up a list of rejected addresses in a configuration file. This is the black list, which then requires ongoing manual maintenance.
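As an illustration only, a black list check could be sketched as follows. The file format (one address per line, `#` comments, `@domain` entries for whole domains) and the names `load_blacklist` and `is_blacklisted` are assumptions made for this example, not VirusBuster's actual configuration syntax.

```python
def load_blacklist(text):
    """Parse a black list: one entry per line, '#' starts a comment."""
    entries = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            entries.add(entry)
    return entries


def is_blacklisted(sender, blacklist):
    """Return True if the sender address or its whole domain is listed."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in blacklist or ("@" + domain) in blacklist


# Hypothetical configuration file content, for illustration only.
SAMPLE_CONFIG = """
# rejected senders
bulk@example.com
@spam.example   # reject the whole domain
"""

BLACKLIST = load_blacklist(SAMPLE_CONFIG)
```

Because the list is a plain file, every new spam source has to be added by hand, which is the maintenance burden mentioned above.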

White list
If you decide not to accept mail sent from unknown locations, you can set up a list containing all your mail partners, and then program the spam filter to reject all mail coming from unlisted addresses. Such a list is called a white list. Senders of rejected messages are informed automatically about how to get onto the white list or what other contact (e.g. an alternative e-mail address) is available.
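A minimal sketch of the white list decision, assuming a simple set of known addresses. The function name `check_whitelist`, the contact address and the wording of the automatic reply are hypothetical and chosen only for this example.

```python
def check_whitelist(sender, whitelist, contact="postmaster@example.org"):
    """Accept mail only from listed senders.

    Returns (accepted, notice). For rejected mail, notice is the text of
    the automatic reply telling the sender how to get listed.
    """
    if sender.strip().lower() in whitelist:
        return True, None
    notice = (
        "Your message was not delivered because your address is not on "
        "the recipient's white list. To be added, or to reach the "
        "recipient another way, please write to " + contact + "."
    )
    return False, notice


# Hypothetical list of known mail partners.
WHITELIST = {"partner@example.org", "colleague@example.net"}
```

For example, `check_whitelist("stranger@example.com", WHITELIST)` returns `False` together with the notice text, while a listed partner is accepted with no notice.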

Realtime Blackhole List (RBL)
An RBL contains IP addresses that have been used - directly or indirectly - to send spam. Such lists are available on the internet and maintained continuously. Before accepting a message, the server checks whether the sender's IP address is on an RBL, and, if it is, the message is rejected.
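Many public lists of this kind are queried over DNS: the octets of the IPv4 address are reversed, the list's zone name is appended, and the resulting name resolves only if the address is listed. The sketch below assumes that convention; the text above does not say how VirusBuster performs the lookup, and the zone `rbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket


def rbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone


def is_listed(ip, zone, resolver=socket.gethostbyname):
    """Return True if the lookup resolves, i.e. the address is on the list.

    The resolver is a parameter so the check can be tried without network
    access; by default it performs a real DNS query.
    """
    try:
        resolver(rbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name does not resolve: address is not listed
    return True
```

A mail server would call something like `is_listed(client_ip, "rbl.example.org")` when a connection arrives and refuse the message if the result is true.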

Filtering by message content and structure

VirusBuster products use content filtering based on a statistical method called the Bayes method. This is one of the most powerful filtering principles. It analyzes incoming messages, splits them into parts, and uses their attributes (word occurrences, message structure) to categorize them by comparison with its database. The database can be continuously updated and taught, providing evolving, customizable protection to users, who can also contribute to the solution's development.
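To make the principle concrete, here is a deliberately small sketch of Bayesian filtering on word occurrences. It is not VirusBuster's implementation: the class `BayesFilter`, the tokenizer, the add-one smoothing and the choice to score only the words present in a message are simplifying assumptions, and message structure is ignored entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into the set of lowercase words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class BayesFilter:
    def __init__(self):
        # Number of training messages of each class containing each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Teach the database one sample; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | words) by summing smoothed log-odds."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # prior odds
        for word in tokenize(text):
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp overflow
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Calling `train` with further samples is the "teaching" step: each new labelled message shifts the word statistics, so the same filter adapts to the mail seen in a particular environment.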

It is important to note that such algorithms may occasionally categorize valuable messages as spam. Normally, however, this causes much less trouble than the loads of spam pouring in.

Custom database

Users can complement VirusBuster's database with their own custom database, which gradually learns the typical spam characteristics of the given environment. In this way, detection becomes more efficient and the risk of false positives decreases.

When creating a custom spam database, the most important task is to gather a lot of samples to teach the system. Samples must be selected with care. In general, there should be more valuable mail in the sample collection than spam, since spam text varies much less than the text in normal mail.

Send samples!

Spam filter reliability is measured with two indicators: the detection rate and the false positive rate. Both depend to a great extent on the samples used for teaching and verification. Of the different approaches to spam filtering, the statistical method shows the best indicators: its false positive rate is about 0.1%, and its detection rate can exceed 99%.
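The two indicators can be computed from a set of verified samples as sketched below. The definitions used here are the common ones and are an assumption about how the figures above were obtained: the detection rate is the share of spam that was flagged, and the false positive rate is the share of valuable mail that was wrongly flagged. The function name `filter_rates` is hypothetical.

```python
def filter_rates(results):
    """Compute (detection_rate, false_positive_rate).

    results is an iterable of (is_spam, flagged) pairs of booleans, one
    per verified sample message.
    """
    spam_total = spam_flagged = ham_total = ham_flagged = 0
    for is_spam, flagged in results:
        if is_spam:
            spam_total += 1
            spam_flagged += flagged
        else:
            ham_total += 1
            ham_flagged += flagged
    detection_rate = spam_flagged / spam_total if spam_total else 0.0
    false_positive_rate = ham_flagged / ham_total if ham_total else 0.0
    return detection_rate, false_positive_rate
```

With 1000 spam samples of which 990 are flagged, and 1000 valuable messages of which 1 is flagged, this yields a 99% detection rate and a 0.1% false positive rate.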

VirusBuster's spam filtering is based on the statistical approach, where detection reliability increases with the number of samples available. Your contribution to database development is also welcome. Send us the spam you have received - in its original form - to spam@virusbuster.hu. VirusBuster customers can send false positives (also in their original form) to spamlab@virusbuster.hu.

Important!
Please send spam and false positive messages IN THEIR ORIGINAL FORM. Compress and attach them or insert them into your message, but don't use forwarding.
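One way to keep a sample in its original form is to compress the raw message bytes and attach the archive to a new message, as in this sketch. The function name `build_sample_report`, the subject line, the archive and file names, and the sender address are assumptions for illustration; only the recipient address comes from the text above.

```python
import io
import zipfile
from email.message import EmailMessage


def build_sample_report(raw_message, sender, recipient="spam@virusbuster.hu"):
    """Wrap the untouched bytes of a message in a zip attachment."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sample.eml", raw_message)  # headers and body, unmodified

    report = EmailMessage()
    report["From"] = sender
    report["To"] = recipient
    report["Subject"] = "Spam sample"
    report.set_content("Original message attached, compressed.")
    report.add_attachment(
        archive.getvalue(),
        maintype="application",
        subtype="zip",
        filename="sample.zip",
    )
    return report
```

Unlike forwarding, this leaves the original headers and body byte-for-byte intact inside the archive, which is what the analysis of a sample relies on.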