Autonomous Agent - October 12th, 2005

Oct. 12th, 2005 01:10 am
The Porter Stemmer in Prolog
I recently completed a project to implement the Porter Stemming algorithm in Prolog. The algorithm uses a rough model of English morphology to quickly estimate the stem of a word for Information Retrieval (IR) purposes. The code is available here. I first attempted this a few years ago with Jon McClain, who has since graduated. Our original approach was flawed, and we soon abandoned it as we became distracted by other things. I came across our code a few days ago and decided to rewrite it with a better method in mind. It seems to comply with the algorithm specification, correctly stemming the sample vocabulary on the website. Unfortunately, it does not compare very favorably on speed with the implementations in other languages, probably largely because Prolog most naturally handles data as linked lists, which have poor random access time. On my computer, it takes 3.4 seconds to process a megabyte of text, compared to the Perl version's 2.8.
Update: It is now posted on Martin Porter's website.
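To give a flavor of the "rough model of English morphology" mentioned above, here is a minimal sketch of just the first rule group of the Porter algorithm (step 1a, which handles plural-style endings). It is written in Python rather than Prolog and is not the code from this post; the function name `step_1a` is made up for illustration.

```python
def step_1a(word):
    """Apply Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing).

    The rules are tried in order and only the first matching one is applied.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (left unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats"]:
        print(w, "->", step_1a(w))
```

The full algorithm has several more steps, most of which also check a measure of the remaining stem before removing a suffix, so this shows only the simplest case.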

Oct. 12th, 2005 03:30 pm
The Music of Philip John Brooks
My father, Philip John Brooks, is a musician. He started out touring the UK and US playing rock music, but now he focuses on original folk music in both American and British styles. The songs on his first solo album, Fishermen of Fleetwood, are about growing up in a fishing village on the northwest coast of England, while those on his newest, A Different Time, a Different Place, are about the American Southwest, especially New Mexico, and its current events and colorful history. He now has a new website, The Tin Whistle, with links to reviews, places to sample and buy his CDs, and a touring schedule (but an unfortunate color scheme, which I think may change soon). Anyway, go listen, and feel free to order a few CDs!
I'll be touring with him as a roadie of sorts in New Mexico at the end of this month and the beginning of next. I'm looking forward to it. I haven't been there in years, but it used to be a frequent Brooks family vacation spot before I went off to college.

Oct. 12th, 2005 04:02 pm
C is BAD
As part of a homework assignment a few years ago, I needed to use Prolog to generate random numbers from a Gaussian (bell curve) distribution with a given mean and standard deviation. A friend found the code that does this in the Python standard library, so I translated it into Prolog.
When SWI Prolog 5.2.10 decompiles a compiled program, it names the first variable to occur A, the second B, and so on. Part of my translation of the Python looked like this in the source file:

    gaussian(SD,Mean,R) :-
        gaussian_z(Z),
        !,
        retractall(gaussian_z(_)),
        R is Mean + SD * Z.

When compiled and displayed, it reads:

    gaussian(A, B, C) :-
        gaussian_z(D), !,
        retractall(gaussian_z(E)),
        C is B+A*D.

In spite of Prolog's apparent dislike of the language, I've since translated this from Prolog into C (really, C++) for a quick and dirty Gaussian random number generator for use in SAGA.
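
For anyone curious about the technique behind the clause above: Python's `random.gauss` uses the Box-Muller transform, which produces standard normal values in pairs, and keeps the second value of each pair for the next call. That is presumably the role the stored `gaussian_z` fact plays in the Prolog version. A minimal Python sketch of the idea follows. The class name `GaussianSource` and its attribute names are made up for illustration, and this is neither the code from the post nor a copy of the library source.

```python
import math
import random


class GaussianSource:
    """Box-Muller generator that caches the second value of each pair."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # Spare standard normal value; assumed analogue of the gaussian_z fact.
        self._spare_z = None

    def gaussian(self, sd, mean):
        """Return one sample with the given standard deviation and mean."""
        z = self._spare_z
        self._spare_z = None  # consume the cached value, like retractall
        if z is None:
            # Draw a random angle and radius, then convert to two
            # independent standard normal values.
            angle = self._rng.random() * 2.0 * math.pi
            # 1.0 - random() lies in (0, 1], so the logarithm is defined.
            radius = math.sqrt(-2.0 * math.log(1.0 - self._rng.random()))
            z = math.cos(angle) * radius
            self._spare_z = math.sin(angle) * radius
        # Same arithmetic as "R is Mean + SD * Z".
        return mean + sd * z


if __name__ == "__main__":
    source = GaussianSource(seed=2005)
    print([round(source.gaussian(2.0, 10.0), 3) for _ in range(5)])
```

The argument order `(sd, mean)` mirrors the Prolog clause head. Every second call reuses a cached value, so the two uniform draws are only needed on alternate calls.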
