aaron.harnly.net

Research

My current project involves predicting the roles a person plays in their organization through automated analysis of the language they use in their emails and of the topology of the network of people they communicate with.

I have a latent interest in the computational biology of networks, as well. I've done some work on the evolution of gene regulatory networks.

This is Yet Another section of my site I haven't really pulled together yet.

Publications & Talks

Email Thread Reassembly Using Similarity Matching

J Yeh and A Harnly, 2006. Email Thread Reassembly Using Similarity Matching. Conference on Email and Anti-Spam (CEAS). PDF | plaintext | BibTeX

Email thread reassembly is the task of linking messages by parent-child relationships. In this paper, we present two approaches to address this problem. One exploits previously undocumented header information from the Microsoft Exchange Protocol. The other uses string similarity metrics and a heuristic algorithm to reassemble threads in the absence of header information. The pros and cons of both methods are discussed. The similarity matching method is evaluated using the Enron email corpus and found to perform well.
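
As a rough illustration of the similarity-matching idea, here is a minimal sketch. It is not the algorithm from the paper: the Message type, the reliance on ">"-prefixed quoted lines, the use of difflib's SequenceMatcher as the string similarity metric, and the 0.6 threshold are all assumptions made for this example. Each message is linked to the earlier message whose own text best matches what it quotes.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    sent: int   # send time; any increasing integer works for this sketch
    body: str


def quoted_text(body: str) -> str:
    """Text a reply quotes from its parent (lines starting with '>')."""
    lines = [ln.lstrip("> ").strip() for ln in body.splitlines() if ln.startswith(">")]
    return "\n".join(lines)


def own_text(body: str) -> str:
    """Text the sender wrote themselves (lines not starting with '>')."""
    return "\n".join(ln.strip() for ln in body.splitlines() if not ln.startswith(">"))


def find_parent(msg: Message, candidates: List[Message],
                threshold: float = 0.6) -> Optional[Message]:
    """Pick the earlier message most similar to what msg quotes, if any."""
    quote = quoted_text(msg.body)
    if not quote:
        return None  # nothing quoted, so this sketch treats it as a thread root
    best, best_score = None, threshold
    for cand in candidates:
        if cand.sent >= msg.sent:
            continue  # a parent must have been sent before its child
        score = SequenceMatcher(None, quote, own_text(cand.body)).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best


def reassemble(messages: List[Message]) -> Dict[str, Optional[str]]:
    """Map each message id to its inferred parent id (None for roots)."""
    links: Dict[str, Optional[str]] = {}
    for msg in messages:
        parent = find_parent(msg, messages)
        links[msg.msg_id] = parent.msg_id if parent else None
    return links


messages = [
    Message("a", 1, "Can we meet on Tuesday to review the budget?"),
    Message("b", 2, "> Can we meet on Tuesday to review the budget?\nTuesday works for me."),
    Message("c", 3, "> Tuesday works for me.\nGreat, see you then."),
    Message("d", 4, "Lunch menu for Friday is attached."),
]
print(reassemble(messages))
```

A real corpus would need more than this: replies that trim or reformat the quoted text, forwarded messages, and messages with no quoting at all are the cases where a heuristic layer on top of the raw similarity score matters.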


Automation of Summary Evaluation by the Pyramid Method

A Harnly, A Nenkova, R Passonneau, and O Rambow, 2005. Automation of summary evaluation by the Pyramid Method. Recent Advances in Natural Language Processing (RANLP). PDF | plaintext | BibTeX

The manual Pyramid method for summary evaluation, which focuses on the task of determining if a summary expresses the same content as a set of manual models, has shown sufficient promise that the Document Understanding Conference 2005 effort will make use of it. However, an automated approach would make the method far more useful for developers and evaluators of automated summarization systems. We present an experimental environment for testing automated evaluation of summaries, pre-annotated for shared information. We reduce the problem to a combination of similarity measure computation and clustering. The best results are achieved with a unigram overlap similarity measure and single-link clustering, which yields high correlation to manual pyramid scores (r=0.942, p=0.01), and shows better correlation than the n-gram overlap automatic approaches of the ROUGE system.
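
To make the combination of a unigram overlap measure and single-link clustering concrete, here is a minimal sketch. The exact overlap formula used in the paper is not given above, so Jaccard overlap on lowercased word sets is an assumption, as are the 0.7 threshold and the example sentences. With a fixed threshold, single-link clustering amounts to finding connected components of the graph that links every pair of texts scoring at or above the threshold.

```python
from itertools import combinations
from typing import Dict, List


def unigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two texts' word sets (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def single_link_clusters(texts: List[str], threshold: float = 0.7) -> List[List[int]]:
    """Group text indices; two clusters merge if ANY cross pair is similar enough."""
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(texts)), 2):
        if unigram_overlap(texts[i], texts[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


texts = [
    "the senate passed the bill",
    "senate passed the budget bill",
    "the budget bill passed",
    "heavy rain flooded the city",
]
print(single_link_clusters(texts))
```

In this example the first and third texts overlap at only 0.6, below the threshold, yet they end up in the same cluster because each overlaps the second text at 0.8. That chaining behaviour is the defining property of single-link clustering.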
