NLTK’s “ing words”: variations
NLTK, the “natural language toolkit” for Python, is a wonderful lightweight framework that provides a wealth of NLP tools. The other day, in reading through its documentation, I came across a little appendix describing the advantages of Python for implementing and (especially) teaching NLP.
The authors show a simple sample program to find and list words ending in “ing” from the standard input:
import sys
for line in sys.stdin.readlines():
    for word in line.split():
        if word.endswith('ing'):
            print(word)
and contrast this elegant Python implementation with a variety of monstrosities in other languages. I won’t disagree that the Python is nice, but it seemed like a good little exercise to see whether I could produce something almost as good in my languages du jour.
To wit, a Ruby version:
for line in ARGF
  for word in line.split
    if word.match(/ing$/) then
      puts word
    end
  end
end
which is almost identical to the Python version, though showing Ruby’s not-exactly-pretty fascination with the ‘end’ keyword.
And a Scala version using for-comprehensions. Note to Scala creators: It’s really frustrating having the various ways of executing Scala — as a script, as an object, etc. — all disagree slightly on how the outermost wrapper of a procedure should be formatted.
import scala.io._
object IngWords extends App {
  for (
    line <- Source.fromInputStream(System.in).getLines();
    word <- line.split(" ");
    if word.endsWith("ing")
  )
    Console.println(word)
}
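For anyone who wants to check a port against the original, here is a minimal sketch of the same filter packaged as a Python function that can be run on a fixed sample instead of the standard input. The name `ing_words` and the sample text are my own inventions for illustration, not anything from the NLTK documentation, and the function assumes the same whitespace tokenization as the program above.

```python
import io


def ing_words(stream):
    """Yield each whitespace-separated word in `stream` that ends in 'ing'.

    `ing_words` is a hypothetical helper name, not part of NLTK.
    """
    for line in stream:
        for word in line.split():
            if word.endswith("ing"):
                yield word


if __name__ == "__main__":
    # io.StringIO stands in for sys.stdin so the sketch runs without piped input.
    sample = io.StringIO("singing in the morning\nnothing rhymes with orange\n")
    for word in ing_words(sample):
        print(word)
```

Passing `sys.stdin` instead of the `StringIO` object should give the same behaviour as the original script, since both are iterated line by line.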
(Aside: I need a decent syntax highlighting package for WP, it seems.)



