aaron.harnly.net

NLTK’s “ing words”: variations

April 22nd, 2007 by aaronharnly

NLTK, the “natural language toolkit” for Python, is a wonderful lightweight framework that provides a wealth of NLP tools. The other day, in reading through its documentation, I came across a little appendix describing the advantages of Python for implementing and (especially) teaching NLP.

The authors show a simple sample program to find and list words ending in “ing” from the standard input:

    import sys
    for line in sys.stdin.readlines():
        for word in line.split():
            if word.endswith('ing'):
                print word

and contrast this elegant Python implementation with a variety of monstrosities in other languages. I won’t disagree that the Python is nice, but it seemed like a good little exercise to see whether I could produce something almost as good in my languages du jour.

To wit, a Ruby version:

    for line in ARGF
      for word in line.split
        if word.match(/ing$/) then puts word
        end
      end
    end

which is almost identical to the Python version, though showing Ruby’s not-exactly-pretty fascination with the ‘end’ keyword.

And a Scala version using for-comprehensions. Note to Scala creators: It’s really frustrating having the various ways of executing Scala — as a script, as an object, etc. — all disagree slightly on how the outermost wrapper of a procedure should be formatted.

    import scala.io._
    object IngWords extends Application {
      for (
        val line <- Source.fromInputStream(System.in).getLines;
        val word <- line.split(" ");
        word.endsWith("ing")
      ) Console.println(word)
    }

(Aside: I need a decent syntax highlighting package for WP, it seems.)
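
For readers on Python 3, where the `print word` statement no longer parses, here is a minimal sketch of the same filter. The function name `ing_words` and the sample sentences are made up for illustration; they are not from the NLTK documentation.

```python
def ing_words(lines):
    """Yield every whitespace-separated word that ends in 'ing'."""
    for line in lines:
        for word in line.split():
            if word.endswith("ing"):
                yield word


# Hypothetical sample input, standing in for lines read from standard input.
sample = ["I was walking and singing", "nothing to see here"]
for word in ing_words(sample):
    print(word)
```

Passing `sys.stdin` instead of `sample` (after `import sys`) should give the behaviour of the original script, since a file object yields its lines when iterated.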

Scala is my new Ruby

March 19th, 2007 by aaronharnly

Scala is my new Ruby, i.e. the language I love to tinker in. Rather more practical, too, as the fact that Ruby is dog-slow has gotten in the way of my work more than once recently.

From DTD to Rails Migrations

March 7th, 2007 by aaronharnly

In the category of tools that I want, but better not make right now, lest it turn into a “paroxysm of generalization”:

I have a DTD, describing a bunch of entities, their relationships, and their attributes. I’m going to push data from a set of XML files (adhering to said DTD) into a Ruby-on-Rails savvy database. Wouldn’t it be nice to have a simple tool that, in the most general way possible, would, given that DTD:

  1. Issue a series of ‘script/generate model Foo’ commands for the various entities.

  2. Populate the Rails migration files appropriately, to manage the creation of the database tables for these entities. That would include inserting :foo_id columns for has-many and has-and-belongs-to-many relationships (though differentiating between the two might require some human supervision), and exploiting the wonderful Red Hill Foreign Key Migrations to create appropriate FK constraints.

  3. In addition / as an alternative to using the Red Hill plugin, insert the appropriate has_many / habtm declarations in the model files.

  4. And finally, make a script that can read a set of such XML files and fill the database appropriately.

Well, sounds nice to me, anyway. Put it on the someday-maybe list.
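
A rough sketch of what steps 1 and 2 might look like, written in Python only to keep it short. Everything here is an assumption for illustration: the helper names (`parse_dtd`, `class_name`, `generate_commands`, `migration_columns`), the toy DTD, and the type mapping (CDATA and NMTOKEN become `:string` columns, an IDREF attribute `foo` becomes an integer `foo_id` column, ID attributes are skipped because Rails supplies its own `id`). It handles only simple `<!ELEMENT>` and `<!ATTLIST>` declarations with `#REQUIRED` or `#IMPLIED` defaults, and it makes no attempt at the has-many versus habtm question, which would still need human supervision.

```python
import re

# Simplifying assumptions: no parameter entities, no enumerated types,
# and only #REQUIRED / #IMPLIED attribute defaults.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+([\w-]+)\s+[^>]*>")
ATTLIST_RE = re.compile(r"<!ATTLIST\s+([\w-]+)\s+([^>]*)>")
ATTR_RE = re.compile(r"([\w-]+)\s+(CDATA|NMTOKEN|IDREF|ID)\s+#(?:REQUIRED|IMPLIED)")


def parse_dtd(text):
    """Map each declared element to a list of (attribute, dtd_type) pairs."""
    entities = {name: [] for name in ELEMENT_RE.findall(text)}
    for element, body in ATTLIST_RE.findall(text):
        entities.setdefault(element, []).extend(ATTR_RE.findall(body))
    return entities


def class_name(element):
    """Turn a DTD element name like 'book-item' into 'BookItem'."""
    return "".join(part.capitalize() for part in re.split(r"[-_]", element))


def generate_commands(entities):
    """Step 1: one generator command per entity."""
    return ["script/generate model " + class_name(e) for e in entities]


def migration_columns(attrs):
    """Step 2: column lines to paste inside a create_table block."""
    lines = []
    for name, dtd_type in attrs:
        if dtd_type == "ID":
            continue  # Rails adds its own id column
        if dtd_type == "IDREF":
            lines.append("t.column :%s_id, :integer" % name)  # guessed FK column
        else:
            lines.append("t.column :%s, :string" % name)
    return lines


# Hypothetical two-entity DTD, just to exercise the functions.
dtd = """
<!ELEMENT author (#PCDATA)>
<!ATTLIST author id ID #REQUIRED name CDATA #REQUIRED>
<!ELEMENT book (#PCDATA)>
<!ATTLIST book isbn CDATA #REQUIRED author IDREF #IMPLIED>
"""

entities = parse_dtd(dtd)
for command in generate_commands(entities):
    print(command)
for element, attrs in entities.items():
    print(element, migration_columns(attrs))
```

A real tool would want a proper DTD parser rather than regular expressions, plus Rails’ own pluralization rules for table names; this only shows that the mechanical part of the job is small.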