To summarize the message of the previous chapters: Content Negotiation is a nearly impossible thing to do perfectly. The complexity of the topic Information in Emails calls for an intelligent and fault-tolerant implementation. The approach has to be modeled on nature: how do human beings understand and rate such complex information structures?
Patterns
The Nature Model shows the way. Information is negotiated by patterns that are correlated and rated by a network of experiences (knowledge). Combining new patterns with already well-proven patterns constructs a piece of information.
Pattern Correlation
Recurring or similar patterns can be grouped together into a unique pattern used to negotiate the category they belong to. The following two pieces of information are used to compare two approaches that lead to different results:
- Invitation to team meeting 9:30 GMT 1 room 4.22
- Invitation to our wedding May, 26th 2006
One Shot Matching
The simplest approach is to identify a unique and significant key pattern. This pattern can be the word Invitation, which is a key word of the Private Category. With One Shot Matching, both pieces of information are therefore correlated as private email. Unfortunately, this is only correct for the second one.
Relation Matching
This more intelligent approach starts like One Shot Matching by identifying a unique and significant key pattern. The two pieces of information differ, yet at this point both would be rated private. Therefore a second (and potentially further) key pattern is identified for each piece of information. These secondary key patterns can be meeting and wedding.
Now the knowledge base is consulted. Both Business Email and Private Email contain correlations with the pattern Invitation. The pattern Meeting exists only in the business category; conversely, the pattern Wedding is found only in the private category. The correct result is that information 1 is rated as Business Email and information 2 as Private Email.
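The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the knowledge base is a hypothetical dictionary of key patterns per category, patterns are single lower-cased words, and the function names are invented for this example.

```python
# Hypothetical knowledge base: category -> key patterns known for it.
KNOWLEDGE_BASE = {
    "Business Email": {"invitation", "meeting"},
    "Private Email": {"invitation", "wedding"},
}

def tokenize(text):
    """Lower-case the text and split it into a set of simple word patterns."""
    return {word.strip(".,:;!?").lower() for word in text.split()}

def one_shot_match(text, key_pattern="invitation", category="Private Email"):
    """Rate by a single key pattern that is tied to exactly one category."""
    return category if key_pattern in tokenize(text) else None

def relation_match(text, knowledge_base=KNOWLEDGE_BASE):
    """Rate by counting how many known patterns of each category occur."""
    patterns = tokenize(text)
    scores = {cat: len(patterns & known) for cat, known in knowledge_base.items()}
    best = max(scores.values())
    winners = [cat for cat, score in scores.items() if score == best]
    # Return a category only if it wins unambiguously.
    return winners[0] if best > 0 and len(winners) == 1 else None

info_1 = "Invitation to team meeting 9:30 GMT 1 room 4.22"
info_2 = "Invitation to our wedding May, 26th 2006"

print(one_shot_match(info_1), "|", relation_match(info_1))
print(one_shot_match(info_2), "|", relation_match(info_2))
```

One Shot Matching rates both examples as Private Email, while Relation Matching separates them because the secondary patterns break the tie. A text that contains only the shared pattern Invitation stays unrated in this sketch.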
Deriving Rules
From an information-processing point of view, a pattern can be formulated as a rule. Rules can be implemented and used by an algorithm. Using a set of rules in combination with a special mnemonic can help to solve semiotic problems.
Integrity of Rules
Rules have to ensure integrity. This means that new rules must not affect information that was previously negotiated correctly: a piece of information that has been correlated with a class beyond doubt must not be moved to another class by applying a new rule. The R.I.P.P.E.R algorithm (see next chapter [ripper]), for example, does a good job of this. When new rules are derived, the algorithm checks the integrity of classes and rules. If conflicts appear, a so-called Reduction Phase is invoked and the rule set is reduced until integrity is restored.
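The integrity requirement can be illustrated with a small Python sketch. It is not the R.I.P.P.E.R algorithm: instead of reducing the rule set, it simply rejects a new rule that would move an already confirmed example to another class. The rule format (pattern, category), the first-match-wins order and all names are assumptions made for this example.

```python
def classify(text, rules):
    """Return the category of the first rule whose pattern occurs in the text."""
    words = text.lower().split()
    for pattern, category in rules:
        if pattern in words:
            return category
    return None

def add_rule_with_integrity(rules, new_rule, confirmed):
    """Add new_rule only if no confirmed example changes its category.

    `confirmed` is a list of (text, category) pairs that were already
    negotiated correctly. Returns the (possibly unchanged) rule list.
    """
    candidate = [new_rule] + rules  # assumption: new rules get highest priority
    for text, category in confirmed:
        if classify(text, candidate) != category:
            return rules  # conflict: reject the rule to keep integrity
    return candidate

rules = [("meeting", "Business Email"), ("wedding", "Private Email")]
confirmed = [
    ("invitation to team meeting", "Business Email"),
    ("invitation to our wedding", "Private Email"),
]

# Would move the confirmed business invitation to Private Email: rejected.
rejected = add_rule_with_integrity(rules, ("invitation", "Private Email"), confirmed)
# Touches no confirmed example: accepted.
accepted = add_rule_with_integrity(rules, ("invoice", "Business Email"), confirmed)
```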
Teaching Rules
The aim of teaching rules is to discover and build a knowledge base (Nature Model). Several methods derived from Software Engineering Techniques can be applied to the teaching process.
Unit Training
A Content Negotiation Algorithm using Unit Training (see figure [unittraining]) is trained on examples for each category before processing begins. The aim is to provide both unambiguous and problematic examples. This helps the algorithm to draw conclusions and build rules that preserve integrity. After the training phase has finished, the algorithm should produce good results. If not, another training round is needed.
Figure: Unit Training and Processing
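A minimal Python sketch of Unit Training, assuming that the knowledge base is nothing more than word counts per category and that the quality of the result is checked with a simple accuracy measure. The function names and the training examples are hypothetical.

```python
from collections import Counter, defaultdict

def train(examples):
    """Build a knowledge base: category -> Counter of words seen in training."""
    knowledge = defaultdict(Counter)
    for text, category in examples:
        knowledge[category].update(text.lower().split())
    return knowledge

def process(text, knowledge):
    """Rate a text with the category whose trained words overlap it most."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in knowledge.items()}
    return max(scores, key=scores.get) if scores and max(scores.values()) > 0 else None

def accuracy(examples, knowledge):
    """Fraction of examples rated correctly; a low value calls for more training."""
    hits = sum(process(text, knowledge) == category for text, category in examples)
    return hits / len(examples)

training_set = [
    ("invitation to team meeting", "Business Email"),
    ("project meeting agenda", "Business Email"),
    ("invitation to our wedding", "Private Email"),  # problematic: shares "invitation"
    ("wedding photos from may", "Private Email"),
]

knowledge = train(training_set)                      # training phase
print(process("invitation to the budget meeting", knowledge))  # processing phase
```

The training phase is completed before the first email is processed. If `accuracy` on a set of checked examples is too low, the training set is extended and `train` is run again.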
Real Time Training
Real Time Training (see figure [realtimetraining]) builds a knowledge base in a way similar to the Software Engineering Waterfall Concept. The software begins processing immediately and at first often misattributes email. Training then consists of letting the categorizer software work and improving its learning through user interception whenever it fails. With every user interception the software derives new rules. An interception after a miscorrelation makes the training doubly effective:
- The software knows what to prevent in future.
- The software knows a new kind of correlation.
Figure: Real Time Training with User Interception
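The double effect of a user interception can be sketched in Python as follows. The class is a hypothetical, perceptron-like categorizer invented for this illustration: every interception lowers the weights that led to the wrong category (what to prevent) and raises the weights for the correct one (the new correlation).

```python
class RealTimeCategorizer:
    """Hypothetical categorizer that learns only from user interceptions."""

    def __init__(self, default="Inbox"):
        self.default = default
        self.weights = {}        # (word, category) -> learned weight
        self.categories = set()  # categories seen in interceptions so far

    def rate(self, text):
        """Return the best-scoring category, or the default if none scores > 0."""
        words = text.lower().split()
        best, best_score = self.default, 0
        for category in sorted(self.categories):
            score = sum(self.weights.get((w, category), 0) for w in words)
            if score > best_score:
                best, best_score = category, score
        return best

    def intercept(self, text, wrong, correct):
        """User correction: penalize the wrong category, reward the correct one."""
        self.categories.update((wrong, correct))
        for w in text.lower().split():
            self.weights[(w, wrong)] = self.weights.get((w, wrong), 0) - 1
            self.weights[(w, correct)] = self.weights.get((w, correct), 0) + 1

# Processing starts immediately; the user intercepts whenever the rating is wrong.
stream = [
    ("invitation to our wedding", "Private Email"),
    ("invitation to team meeting", "Business Email"),
] * 2

categorizer = RealTimeCategorizer()
interceptions = 0
for text, truth in stream:
    rated = categorizer.rate(text)
    if rated != truth:
        categorizer.intercept(text, wrong=rated, correct=truth)
        interceptions += 1
```

In this small run the categorizer misattributes the first emails and rates both kinds of invitation correctly after a few interceptions.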
Paradigm Training
Paradigm Training is a multi-phase approach (see figure [paradigmtraining]). Processing begins only after the learning phases have finished.
- Observing Phase: The software observes the user's actions and stores every event in a local database.
- Conclusion Phase: The software reads all saved events from the database and tries to derive rules and knowledge by conclusion.
- Processing Phase: The software starts processing using the rules and the knowledge base.
This technique follows the paradigm: Watch -> Think -> Do
Figure: Paradigm Training – Understanding User Actions
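The three phases can be sketched in Python with an in-memory SQLite database standing in for the local event database. The table layout, the function names and the conclusion criterion (a sender whose mail the user always moved to the same folder, at least twice) are assumptions made for this illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the local event database
db.execute("CREATE TABLE events (sender TEXT, folder TEXT)")

def observe(sender, folder):
    """Observing Phase: store one user action (mail moved to a folder)."""
    db.execute("INSERT INTO events VALUES (?, ?)", (sender, folder))

def conclude(min_events=2):
    """Conclusion Phase: derive sender -> folder rules from consistent actions."""
    rows = db.execute(
        "SELECT sender, MIN(folder) FROM events GROUP BY sender "
        "HAVING COUNT(*) >= ? AND COUNT(DISTINCT folder) = 1",
        (min_events,),
    )
    return dict(rows.fetchall())

def process(sender, rules, default="Inbox"):
    """Processing Phase: apply the derived rules to new mail."""
    return rules.get(sender, default)

# Watch: the user's actions are recorded.
for sender, folder in [
    ("foo@bar.com", "Foomail"),
    ("foo@bar.com", "Foomail"),
    ("news@example.org", "Spam"),
    ("news@example.org", "Inbox"),  # inconsistent behaviour: no rule is derived
]:
    observe(sender, folder)

rules = conclude()                   # Think
print(process("foo@bar.com", rules))  # Do
```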
Static Negotiation
The Pattern and Rules Concept is implemented in nearly all email software available today. The simplest approach, and one that is sufficient for small sites, is static negotiation. This technique uses a set of user-defined rules to determine how to categorize incoming email. No knowledge base is involved.
Mail User Agents
Popular MUAs like Microsoft Outlook and its derivatives implement static negotiation with the help of rule sets. These rules help to pre-categorize email in everyday work. The agents let the user define simple rules using predefined templates and macros (see also figure [applemail-static]):
- If sender is foo@bar.com then move to folder Foomail
- If subject contains My Project then move to folder Projects
- If sender is my wife then mark mail with color red
- If subject contains Viagra then move to folder Spam
- If sender is postmaster then move mail to folder Notification and mark with color magenta
Figure: Apple Mail Using Static Rules
The problem with Static Negotiation Techniques is that they cannot handle new (and therefore unknown) email information, and that they suffer from all the semiotic problems described in section [semiotic].
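Static rules of this kind can be sketched in Python as a fixed table that is applied to every incoming email. The `Mail` class and the rule tuple layout are assumptions made for this example and do not reflect how any particular MUA stores its rules.

```python
from dataclasses import dataclass

@dataclass
class Mail:
    sender: str
    subject: str
    folder: str = "Inbox"
    color: str = ""

# Each static rule: (header field, match mode, value, action, argument)
RULES = [
    ("sender", "is", "foo@bar.com", "move", "Foomail"),
    ("subject", "contains", "My Project", "move", "Projects"),
    ("subject", "contains", "Viagra", "move", "Spam"),
    ("sender", "is", "postmaster", "move", "Notification"),
    ("sender", "is", "postmaster", "color", "magenta"),
]

def apply_rules(mail, rules=RULES):
    """Apply every matching rule in order; there is no knowledge base."""
    for header, mode, value, action, argument in rules:
        actual = getattr(mail, header)
        matched = actual == value if mode == "is" else value in actual
        if matched:
            if action == "move":
                mail.folder = argument
            elif action == "color":
                mail.color = argument
    return mail

print(apply_rules(Mail("postmaster", "Delivery failed")))
print(apply_rules(Mail("unknown@example.org", "cheap viagra")))
```

The second email stays in the Inbox because the rule only knows the literal pattern Viagra, which illustrates how static rules fail on variations they were not written for.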
Mail Transfer Agents
MTAs like Sendmail and Exim4, as well as delivery agents like Procmail, provide simple rule processing like the MUAs do. It is possible to configure email routes and aliases, deny mail from specific hosts or addresses, move email, or use external tools to categorize and rate email. A promising approach is the Milter Interface, which allows third-party applications to rate an email through a plugin interface (see figure [milter]). This allows complex filter software to be plugged into otherwise simple MTAs.
Figure: MTA using the Milter Interface
Milter software ranges from simple spam and antivirus filters to complex content negotiation software.
Possible MTA Rules:
- If client domain is evil.org then deny relaying
- If client ip is other than 127.0.0.1 then deny connection
- If sender is mywife@bar.com then move email to mailbox private
- If recipient is foo@bar.com then recipient is real@foo.com
- If subject matches the regular expression /viagra/i then move to mailbox spam
- If email was rated spam (ask milter interface) then move email to mailbox spam
- If email contains virus (ask milter interface) then delete email and log
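Some of the rules above can be sketched in Python as a single decision function. The `rate_spam` parameter is a hypothetical callable that stands in for an external filter; it does not implement the real Milter protocol, and the function signature and return value are assumptions made for this example.

```python
import re

SPAM_SUBJECT = re.compile(r"viagra", re.IGNORECASE)
ALIASES = {"foo@bar.com": "real@foo.com"}

def mta_decide(client_domain, sender, recipient, subject, rate_spam=None):
    """Return (action, mailbox, final recipient) for one incoming email."""
    if client_domain == "evil.org":
        return ("deny", None, recipient)
    recipient = ALIASES.get(recipient, recipient)   # alias rewriting
    if sender == "mywife@bar.com":
        return ("deliver", "private", recipient)
    if SPAM_SUBJECT.search(subject):
        return ("deliver", "spam", recipient)
    # Ask the external filter, if one is plugged in.
    if rate_spam is not None and rate_spam(subject):
        return ("deliver", "spam", recipient)
    return ("deliver", "inbox", recipient)

def lottery_filter(subject):
    """Hypothetical external rater used only for this example."""
    return "lottery" in subject.lower()

print(mta_decide("bar.com", "mywife@bar.com", "foo@bar.com", "Dinner"))
print(mta_decide("x.org", "a@x.org", "me@bar.com", "You won the Lottery", lottery_filter))
```

Keeping the external rater behind a plain callable mirrors the idea of the plugin interface: the MTA rules stay static, while the plugged-in filter can be as complex as needed.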
Dynamic Negotiation
The disadvantages of the Static Negotiation approach are obvious. Without a knowledge base, every piece of information is treated as new information each time. Whether information is negotiated correctly is therefore uncertain with each arriving email.
Because email information is diverse and constantly changing, a dynamic approach is needed to keep pace with the flood of information. Machine Learning Techniques address this issue.
Machine Learning
The advantage of rule sets using Machine Learning is the ever-growing Knowledge Base. The training of rules never has to end (see figures [realtimetraining] and [paradigmtraining] on Real Time Training and Paradigm Training).
Unique advantages are:
- Processing is based on a knowledge base
- The knowledge base improves over time, and so do the results
- User interception is possible
- Rules and patterns are derived by conclusion
- Fault tolerance
A good example of a (simple) Machine Learning implementation is the MUA Apple Mail (see figure [applemail]). The negotiation software starts processing with an initial (pre-trained) knowledge base, which already produces good results. If an email is misrated, the user can intercept: Apple Mail offers a button to tell the software that the last action was incorrect.
Figure: Smart Real Time Teaching in Apple Mail