Thursday, August 10, 2006

Incredible data-mining project on US Senate

Ars Technica has a fascinating report on a major data-mining project involving the Congressional Record. The project uses an unsupervised learning program to assign each of several million speeches to one of forty automatically generated categories. Using this method, the authors show some interesting variation in agenda salience over time, which suggests that the method is attuned to fluctuations in the agenda. The model could be ported relatively easily to other languages, including Italian, as long as one has a program to 'stem' the words (reduce them to a common root) so as to populate the vocabulary matrix.
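To make the stemming step a little more concrete, here is a minimal sketch of how stemmed words might populate a vocabulary matrix (one row per speech, one column per stem). It is only an illustration under my own assumptions, not the authors' code: `crude_stem`, `vocabulary_matrix` and the suffix list are hypothetical names and choices, and a real project would use a proper stemmer for the language in question.

```python
import re
from collections import Counter

# Toy suffix list: an assumption for illustration, not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary_matrix(speeches):
    """Return (sorted vocabulary, one row of stem counts per speech)."""
    counts = [
        Counter(crude_stem(w) for w in re.findall(r"[a-z]+", s.lower()))
        for s in speeches
    ]
    vocab = sorted(set().union(*counts))
    rows = [[c[stem] for stem in vocab] for c in counts]
    return vocab, rows


# Two made-up sentences standing in for speeches.
speeches = ["Senators debated taxes", "The senator debates a tax"]
vocab, rows = vocabulary_matrix(speeches)
print(vocab)
for row in rows:
    print(row)
```

With these two made-up sentences, "Senators" and "senator" end up in the same column, as do "debated" and "debates", which is the whole point of stemming. Swapping in an Italian stemmer would only mean replacing `crude_stem`.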


1 comment:

Chris Hanretty said...

Hi Chris, not really interested in your article... just thought I'd say hello as we share the same name, albeit I am a woman and you a young man! Oh yeah, I married a Hanretty. We live in Australia.