Bayesian Classification for weblog entries

XML.com: Working with Bayesian Categorizers [Nov. 19, 2003] by Jon Udell makes an interesting read. I disagree with him, though, that the classification didn’t work well. I think you have to tune the algorithm more than he did (and maybe classify Author, Subject and Body separately and then combine the weightings).

That’s one of the ideas I’ve been playing with anyhow. It makes things a lot more complicated, and I’d guess much slower, especially as there doesn’t seem to be a fast (C-based) Bayes implementation for either PHP or Python, but it should give extra accuracy.
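As a rough illustration of the per-field idea, here is a minimal pure-Python sketch: one small naive Bayes classifier per field, with the per-field log scores combined through a weighted sum. Everything in it is an assumption made for illustration (the `FieldBayes` class, the `combine` helper, the field weights and the toy training entries); it is not Udell's code or a finished design.

```python
import math
from collections import Counter, defaultdict


class FieldBayes:
    """Tiny multinomial naive Bayes for a single field (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training entries
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Return log P(category) + sum of log P(word | category)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for w in words:
                # Add-one (Laplace) smoothing so unseen words don't zero the score.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            scores[cat] = score
        return scores


def combine(field_scores, weights):
    """Weighted sum of per-field log scores, normalised so the results sum to 1."""
    combined = defaultdict(float)
    for field, scores in field_scores.items():
        for cat, s in scores.items():
            combined[cat] += weights[field] * s
    top = max(combined.values())
    exp = {c: math.exp(s - top) for c, s in combined.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


FIELDS = ("author", "subject", "body")
WEIGHTS = {"author": 1.0, "subject": 2.0, "body": 1.0}  # assumed, needs tuning

classifiers = {f: FieldBayes() for f in FIELDS}
training = [  # made-up toy entries
    ({"author": "alice", "subject": "python tips",
      "body": "python list comprehension tricks"}, "programming"),
    ({"author": "bob", "subject": "garden notes",
      "body": "tomato plants need water"}, "gardening"),
]
for entry, category in training:
    for f in FIELDS:
        classifiers[f].train(entry[f], category)


def classify(entry, weights=WEIGHTS):
    field_scores = {f: classifiers[f].log_scores(entry[f]) for f in FIELDS}
    return combine(field_scores, weights)


result = classify({"author": "alice", "subject": "python question",
                   "body": "python tricks"})
```

Here `result` maps each category to a normalised score. Raising a field's weight makes that field count for more in the final ranking, which is where most of the tuning would happen.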

I couldn’t understand why he wanted each entry to show up in only one category. My system (I know exactly how it *should* work; I just haven’t coded it yet) classifies each entry into multiple categories, using a cut-off percentage to restrict it to the relevant ones. The cut-off will probably vary depending on how many categories score above 0.01.
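One way the multi-category cut-off might look, sketched in Python. The post does not say how the cut-off should vary, so the rule used here (a fixed `share` of an even split among the categories scoring above 0.01) is purely an assumption, as are the function name `select_categories` and the example scores.

```python
def select_categories(scores, floor=0.01, share=0.5):
    """Pick every category whose score clears an adaptive cut-off.

    scores: dict of category -> probability-like score in [0, 1].
    floor:  scores at or below this are ignored (the 0.01 from the post).
    share:  hypothetical tuning knob; the cut-off is `share` times the
            even split among the remaining candidates.
    """
    candidates = {c: s for c, s in scores.items() if s > floor}
    if not candidates:
        return []
    # More plausible candidates -> lower cut-off (assumed rule).
    cutoff = share / len(candidates)
    chosen = [c for c, s in candidates.items() if s >= cutoff]
    return sorted(chosen, key=lambda c: candidates[c], reverse=True)


# Made-up scores for a single entry.
scores = {"python": 0.55, "php": 0.30, "weblogs": 0.13,
          "cooking": 0.015, "sport": 0.005}
selected = select_categories(scores)
```

With these numbers, four categories score above 0.01, so the cut-off is 0.5 / 4 = 0.125 and the entry lands in `python`, `php` and `weblogs`.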
