What Is Bayesian Filtering and What Does It Mean?

Email marketers deal with many antispam methods, but one question keeps coming up: what is Bayesian Filtering? Simply put, Bayesian Filtering is an antispam system based on experience and tracking rather than strict rules.

Think of a buffet. There’s a LOT of food to choose from. You mentally score the food you see, whether you realize it or not. You have “Like it,” “Love it,” and “Hate it” categories. One particular item may score higher if paired with another food, or lower if it’s made a particular way.

Bayesian Filtering in email is something like this. Every word in a message is examined separately, and a database is built. If you mark an email as junk, the words in that email have their scores adjusted based on that action. Marking another email as “not junk” causes another adjustment. Then, when a new email arrives, each word is checked against the database to decide whether the email is spam or ham (spam is bad email, ham is good email).
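
To make that concrete, here is a minimal Python sketch of the loop described above: mark mail as junk or not junk, then score a new message. It is a toy naive Bayes classifier, not the code of any real antispam product. The WordDatabase class, its method names, the tokenizer, and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


class WordDatabase:
    """Hypothetical per-user word database (illustration only)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of junk emails containing it
        self.ham_words = Counter()   # word -> number of good emails containing it
        self.spam_messages = 0
        self.ham_messages = 0

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase runs of letters, digits, and apostrophes are "words".
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def mark(self, text, is_junk):
        """Adjust the database when the user marks an email junk or not junk."""
        words = self.tokenize(text)
        if is_junk:
            self.spam_words.update(words)
            self.spam_messages += 1
        else:
            self.ham_words.update(words)
            self.ham_messages += 1

    def spam_probability(self, text):
        """Combine the per-word evidence into one probability between 0 and 1."""
        # Start from how much junk vs. good mail has been seen (add-one smoothing).
        log_odds = math.log((self.spam_messages + 1) / (self.ham_messages + 1))
        for word in self.tokenize(text):
            in_spam = (self.spam_words[word] + 1) / (self.spam_messages + 2)
            in_ham = (self.ham_words[word] + 1) / (self.ham_messages + 2)
            log_odds += math.log(in_spam / in_ham)
        return 1 / (1 + math.exp(-log_odds))


db = WordDatabase()
db.mark("Claim your free prize now", is_junk=True)
db.mark("Free prize claim today", is_junk=True)
db.mark("Meeting notes for the team", is_junk=False)
db.mark("Lunch with the team today", is_junk=False)

junk_score = db.spam_probability("Claim your prize")    # roughly 0.95
ham_score = db.spam_probability("Team meeting notes")   # roughly 0.08
```

Real filters differ in how they tokenize, smooth, and combine the numbers, but the shape is the same: every “junk” or “not junk” click shifts the word counts, and new mail is judged against those counts.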

Every word is tracked. Even non-words are tracked. The word “the” is tracked, but it’s so common in both good and bad email that it becomes a non-issue. I call it ubiquitous irrelevancy. Other words such as “beach,” “avenue,” or even “Ave.” will not be quite as common, and will be scored differently.
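
A quick way to see why “the” stops mattering is to score single words. The sketch below uses one simple scoring formula (the share of a word’s appearances that come from junk mail); the function name and all the counts are hypothetical, and a real filter may compute its scores differently.

```python
def word_score(spam_count, ham_count, spam_total, ham_total):
    """Score one word from 0.0 (only seen in good mail) to 1.0 (only seen in junk).

    0.5 means the word says nothing either way.
    """
    spam_rate = spam_count / spam_total  # fraction of junk emails containing the word
    ham_rate = ham_count / ham_total     # fraction of good emails containing the word
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen before: treat as neutral
    return spam_rate / (spam_rate + ham_rate)


# Made-up counts out of 100 junk emails and 100 good emails.
the_score = word_score(95, 96, 100, 100)    # about 0.5: appears everywhere, so neutral
beach_score = word_score(12, 3, 100, 100)   # about 0.8: rarer, and mostly seen in junk
```

A word that shows up almost equally in junk and good mail lands near 0.5, which is the ubiquitous irrelevancy in action. A less common word that leans one way gets a score that actually moves the verdict.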

You would think that this would make the whole process much easier.  Knowing how the system works can help you choose better performing words, right?

There’s a problem with this idea. Everybody has a different database of words. Someone running antispam software at the client (PC) level will have a Bayesian Filtering database that is totally different from someone else’s. Email service providers may keep a centralized database, may let each user’s choices affect only that user’s own database, or may do both.

Because everyone is running their own database of words, there is no way to tailor your message to fit easily with this antispam method.

The drawback to the Bayesian Filtering method is that it can easily cause false positives, especially if too much weight (score) is given to it in the antispam tool’s settings.
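
Here is a rough sketch of how that weight setting can tip a legitimate email into the junk folder. It assumes a hypothetical antispam tool that adds the Bayesian result, multiplied by a configurable weight, to the points from its other rules and flags anything at or above a threshold. The function name, the parameter names, and every number are invented for illustration; actual tools combine scores in their own ways.

```python
def is_flagged(rule_score, bayes_probability, bayes_weight, threshold=5.0):
    """Return True if the combined score reaches the junk threshold.

    rule_score:        points from the tool's other (non-Bayesian) checks
    bayes_probability: Bayesian spam probability for the message, 0.0 to 1.0
    bayes_weight:      how many points a probability of 1.0 is worth (the setting)
    """
    total = rule_score + bayes_weight * bayes_probability
    return total >= threshold


# A legitimate email that happens to use words like "confirm" and "claim":
# the other rules barely object, but the word database is suspicious.
modest_weight = is_flagged(rule_score=1.0, bayes_probability=0.8, bayes_weight=3.0)
heavy_weight = is_flagged(rule_score=1.0, bayes_probability=0.8, bayes_weight=6.0)
# modest_weight is False (delivered); heavy_weight is True (a false positive)
```

Nothing about the email changed between the two calls. Only the weight did, which is why an aggressive setting produces false positives.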

In the end, Bayesian Filtering can be a tough barrier to overcome. Some words are necessary, yet they can still cause problems. Here are some words that are high-scoring in my own email client’s Bayesian Filtering database:

  • claim
  • completely
  • rules
  • least
  • matches
  • label
  • file
  • relay
  • confirm
  • experiences
  • editor
  • spam
  • longer

Crazy, right? Those aren’t even uncommon words. You probably use at least two or three of them together in an email on a regular basis. On the other hand, other common words will score lower, pulling the overall score low enough for the email to be delivered…

…in theory.

When you’re fighting to get your message into the inbox, Bayesian Filtering can be frustrating. Hopefully, understanding a little more about it can help reduce the frustration.
