My current project involves predicting the roles a person plays in their organization by automated analysis of the language they use in their emails, and the topology of the network of people they communicate with.
I have a latent interest in the computational biology of networks, as well. I've done some work on the evolution of gene regulatory networks.
This is Yet Another section of my site I haven't really pulled together yet.
Email Thread Reassembly Using Similarity Matching
J Yeh and A Harnly, 2006. Email Thread Reassembly Using Similarity Matching. Conference on Email and Anti-Spam (CEAS).
[PDF] | [plaintext] | [Bibtex]
Email thread reassembly is the task of linking messages by parent-child relationships. In this paper, we present two approaches to address this problem. One exploits previously undocumented header information from the Microsoft Exchange Protocol. The other uses string similarity metrics and a heuristic algorithm to reassemble threads in the absence of header information. The pros and cons of both methods are discussed. The similarity matching method is evaluated using the Enron email corpus and found to perform well.
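As a rough illustration of the similarity-matching idea, and not the algorithm from the paper, here is a minimal sketch. It assumes replies quote their parent with leading `>` characters, uses the `difflib.SequenceMatcher` ratio from Python's standard library as the string similarity metric, and picks an arbitrary threshold of 0.6. The message fields `id`, `time`, and `body`, and the function names, are hypothetical.

```python
from difflib import SequenceMatcher


def split_quoted(body):
    """Split a message body into (new text, quoted text).

    Assumption: quoted lines are the ones starting with '>'.
    """
    new, quoted = [], []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            quoted.append(stripped.lstrip(">").strip())
        else:
            new.append(line.strip())
    return "\n".join(new).strip(), "\n".join(quoted).strip()


def similarity(a, b):
    """String similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def reassemble(messages, threshold=0.6):
    """Map each message id to its most likely parent id, or None.

    messages: list of dicts with hypothetical keys 'id', 'time', 'body'.
    Heuristic: a parent must be earlier in time, and its new text must
    resemble the child's quoted text with a score >= threshold.
    """
    ordered = sorted(messages, key=lambda m: m["time"])
    parents = {}
    for i, msg in enumerate(ordered):
        _, quoted = split_quoted(msg["body"])
        best_id, best_score = None, threshold
        if quoted:
            for cand in ordered[:i]:
                cand_new, _ = split_quoted(cand["body"])
                score = similarity(quoted, cand_new)
                if score >= best_score:
                    best_id, best_score = cand["id"], score
        parents[msg["id"]] = best_id
    return parents
```

Calling `reassemble` on a list of message dicts returns a child-to-parent mapping from which the thread tree can be rebuilt. Messages with no quoted text are left as roots in this sketch.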
Automation of Summary Evaluation by the Pyramid Method
A Harnly, A Nenkova, R Passonneau, and O Rambow, 2005. Automation of summary evaluation by the Pyramid Method. Recent Advances in Natural Language Processing (RANLP).
[PDF] | [plaintext] | [Bibtex]
The manual Pyramid method for summary evaluation, which focuses on the task of determining if a summary expresses the same content as a set of manual models, has shown sufficient promise that the Document Understanding Conference 2005 effort will make use of it. However, an automated approach would make the method far more useful for developers and evaluators of automated summarization systems. We present an experimental environment for testing automated evaluation of summaries, pre-annotated for shared information. We reduce the problem to a combination of similarity measure computation and clustering. The best results are achieved with a unigram overlap similarity measure and single-link clustering, which yields high correlation to manual pyramid scores (r=0.942, p=0.01), and shows better correlation than the n-gram overlap automatic approaches of the ROUGE system.
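To make the combination of similarity measure and clustering concrete, here is a minimal sketch and not the paper's implementation. It assumes unigram overlap is computed as the Jaccard coefficient over lowercased word sets, and it cuts single-link clustering at an arbitrary similarity threshold of 0.5, which is equivalent to taking connected components of the graph that links every pair at or above the threshold. The function names and the threshold are hypothetical.

```python
import re
from itertools import combinations


def unigrams(text):
    """Set of lowercased word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unigram_overlap(a, b):
    """Jaccard overlap of the two unigram sets (an assumed definition)."""
    ua, ub = unigrams(a), unigrams(b)
    if not ua or not ub:
        return 0.0
    return len(ua & ub) / len(ua | ub)


def single_link_clusters(items, threshold=0.5):
    """Group items so that any pair with overlap >= threshold shares a cluster.

    Single-link clustering cut at a fixed threshold, implemented with a
    small union-find structure over item indices.
    """
    parent = list(range(len(items)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if unigram_overlap(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(items[i])
    return list(clusters.values())
```

Each resulting cluster groups text units that appear to express the same content. How clusters are weighted and turned into a pyramid score is outside the scope of this sketch.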