Oct. 12th, 2005 01:10 am

The Porter Stemmer in Prolog

I recently completed a project to implement the Porter Stemming algorithm in Prolog. The algorithm uses a rough model of English morphology to quickly estimate the stem of a word for Information Retrieval (IR) purposes. The code is available here.

I first attempted this a few years ago with Jon McClain, who has since graduated. Our original approach was flawed, and we soon abandoned it as we became distracted by other things. I came across our code a few days ago and decided to rewrite it with a better method in mind. It seems to comply with the algorithm specification, correctly stemming the sample vocabulary on the website.

Unfortunately, it does not compare very favorably to the implementations in other languages as far as speed is concerned, probably largely because Prolog most naturally handles data as linked lists, which have poor random access time. On my computer, it takes 3.4 seconds to process a megabyte of text, compared to the Perl version's 2.8.
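
For readers who have not seen the algorithm, here is a rough sketch of its first rule group (Step 1a in Porter's description, which strips plural endings). It is written in Python purely for illustration and is not the Prolog code from this project; the names `step_1a` and `STEP_1A_RULES` are made up for the example.

```python
# Illustrative sketch of Step 1a of the Porter stemming algorithm.
# Not the Prolog implementation described in this post; the names
# STEP_1A_RULES and step_1a are hypothetical.

# Rules are tried in order, and the first matching suffix wins.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (left unchanged)
    ("s", ""),       # cats     -> cat
]


def step_1a(word: str) -> str:
    """Apply the first matching Step 1a rule, or return the word as-is."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            # Drop the matched suffix and append its replacement.
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", step_1a(w))
```

Every rule here inspects the end of the word. On an array-backed string that check is cheap, whereas on a singly linked list the tail is only reachable by walking the whole list (or reversing it first), which may be part of the speed gap mentioned above.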

Update: It is now posted on Martin Porter's website.