Learning Rules that Classify E-Mail (1996) William W. Cohen - Two methods for learning text classifiers are compared on classification problems that might arise in filtering and filing personal e-mail messages: a "traditional IR" method based on TF-IDF weighting, and a new method for learning sets of "keyword-spotting rules" based on the RIPPER rule learning algorithm. It is demonstrated that both methods obtain significant generalizations from a small number of examples, that both methods are comparable in generalization performance on problems of this type, and that both methods are reasonably efficient, even with fairly large training sets.
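To make the "traditional IR" baseline concrete, here is a minimal sketch of TF-IDF weighting combined with a cosine-similarity nearest-centroid (Rocchio-style) classifier. This is a generic textbook construction, not the paper's exact configuration: the whitespace tokenizer, the `log(N / df)` IDF formula, unit-length normalization, and the helper names (`tokenize`, `idf_table`, `tfidf`, `train`, `classify`) are all assumptions made for illustration. The RIPPER rule learner is not shown.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace (no punctuation handling).
    return text.lower().split()


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    # Assumed IDF: idf(w) = log(N / df(w)), where df(w) counts documents containing w.
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return {w: math.log(n / c) for w, c in df.items()}


def normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else dict(vec)


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; words unseen in training get weight 0.
    tf = Counter(tokens)
    return normalize({w: c * idf.get(w, 0.0) for w, c in tf.items()})


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def train(examples: list[tuple[str, str]]):
    # examples: (message text, class label) pairs.
    docs = [tokenize(text) for text, _ in examples]
    idf = idf_table(docs)
    sums: dict[str, Counter] = {}
    for tokens, (_, label) in zip(docs, examples):
        # Sum the TF-IDF vectors of each class's training messages.
        sums.setdefault(label, Counter()).update(tfidf(tokens, idf))
    # Each class prototype is its normalized summed vector (a centroid direction).
    centroids = {label: normalize(dict(s)) for label, s in sums.items()}
    return idf, centroids


def classify(text: str, idf: dict[str, float], centroids: dict[str, dict[str, float]]) -> str:
    # Assign the label whose prototype is most similar to the message.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Toy, made-up messages purely for illustration.
    examples = [
        ("seminar talk abstract speaker", "talk"),
        ("talk announcement speaker room", "talk"),
        ("meeting agenda budget", "admin"),
        ("budget report deadline", "admin"),
    ]
    idf, centroids = train(examples)
    print(classify("speaker abstract", idf, centroids))
```

On the toy data, "speaker abstract" shares weighted terms only with the "talk" prototype, so it is assigned to that class. Normalizing both messages and prototypes keeps long messages from dominating the similarity scores.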
