<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>aaron.harnly.net &#187; research</title>
	<atom:link href="http://harnly.net/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://harnly.net</link>
	<description>Yes, we have a soul. But it's made of lots of tiny robots.</description>
	<pubDate>Sat, 23 Aug 2008 04:59:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
			<item>
		<title>Email Thread Reassembly Using Similarity Matching</title>
		<link>http://harnly.net/2006/research/email-thread-reassembly-using-similarity-matching/</link>
		<comments>http://harnly.net/2006/research/email-thread-reassembly-using-similarity-matching/#comments</comments>
		<pubDate>Fri, 28 Jul 2006 15:24:26 +0000</pubDate>
		<dc:creator>aaronharnly</dc:creator>
		
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://harnly.net/2006/research/email-thread-reassembly-using-similarity-matching/</guid>
		<description><![CDATA[J Yeh and A Harnly, 2006. Email Thread Reassembly Using Similarity Matching. Conference on Email and Anti-Spam (CEAS). 
[PDF] &#124; 
[plaintext] &#124;
[Bibtex]

Email thread reassembly is the task of linking messages by
parent-child relationships. In this paper, we present two approaches to 
address this problem. One exploits previously undocumented 
header information from the Microsoft Exchange [...]]]></description>
			<content:encoded><![CDATA[<p>J Yeh and A Harnly, 2006. <em>Email Thread Reassembly Using Similarity Matching</em>. Conference on Email and Anti-Spam (CEAS). 
<a href="/publications/Yeh-2006-Email Thread Reassembly Using Similarity Matching.pdf"><img src="/images/software/filetypes/thumbs/pdf.png" alt="pdf">[PDF]</a> | 
<a href="/publications/Yeh-2006-Email Thread Reassembly Using Similarity Matching.markdown"><img src="/images/software/filetypes/thumbs/markdown.png" alt="markdown">[plaintext]</a> |
<a href="/publications/Yeh-2006-Email Thread Reassembly Using Similarity Matching.bib"><img src="/images/software/filetypes/thumbs/bib.png" alt="bibtex">[Bibtex]</a></p>

<p>Email thread reassembly is the task of linking messages by
parent-child relationships. In this paper, we present two approaches to 
address this problem. One exploits previously undocumented 
header information from the Microsoft Exchange Protocol. The 
other uses string similarity metrics and a heuristic algorithm to 
reassemble threads in the absence of header information. The pros 
and cons of both methods are discussed. The similarity matching 
method is evaluated using the Enron email corpus and found to 
perform well. </p>

<p><span id="more-60"></span></p>

<h1>Full Text </h1>

<h2>ABSTRACT</h2>

<p>Email thread reassembly is the task of linking messages by
parent-child relationships. In this paper, we present two approaches to 
address this problem. One exploits previously undocumented 
header information from the Microsoft Exchange Protocol. The 
other uses string similarity metrics and a heuristic algorithm to 
reassemble threads in the absence of header information. The pros 
and cons of both methods are discussed. The similarity matching 
method is evaluated using the Enron email corpus and found to 
perform well. </p>

<h2>1. INTRODUCTION </h2>

<p>One key difference between emails and other types of documents 
is the existence of threading, i.e. hierarchical, referential 
relationships among emails. Recently, email thread structure has 
been profitably employed in several applications, including email 
search [3], email summarization [9], email classification [1], and 
visualization [5]. However, the lack of reliable, widely applicable 
methods for thread reassembly has limited the use of thread 
structure. </p>

<p>Email thread reassembly is the task of relating messages by 
parent-child relationships, grouping messages together based on 
which messages are replies to which others. In many cases, this 
task can be achieved based on specific data within email headers. 
However, no standard protocol for thread structure headers is 
universally observed, making thread reassembly difficult in general. </p>

<p>In this paper, we present two approaches to threading email 
messages. The first employs a specific header – “Thread-Index,” 
which is defined in the Microsoft Exchange Protocol, while the 
second links two messages mainly by measuring the content 
similarity between them. The second approach also takes several 
heuristics into account, such as subject, time, and sender/recipient relationships 
among emails. Furthermore, since some messages in a thread may 
not exist in the corpus (e.g., if deleted), we also discuss how to 
recover missing messages. Here, a missing message, as defined in 
[2], is an email that is not itself present in the archive but has 
been quoted in subsequent emails kept in a user’s folder. </p>

<p>The contributions of this work are twofold. First, this paper offers 
a method of thread reassembly in the absence of header 
information. Second, we evaluated the method in a case study 
with the Enron corpus. In the following, Section 2 introduces 
previous related work. In Sections 3-4, we describe the proposed 
methods which aim to address the email thread reassembly task. 
Preliminary results and discussions are given in Section 5 and 
Section 6. </p>
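
<p>As a rough illustration of linking messages by content similarity, 
the following Python sketch attaches each message to its most similar 
earlier message. It is only a sketch under stated assumptions, not the 
algorithm evaluated in this paper: the <code>Message</code> fields, the 
Jaccard token overlap, the subject bonus of 0.1, and the 0.2 threshold 
are all hypothetical choices made for the example. </p>

<pre>
```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal message record; the field names are assumptions.
    msg_id: str
    subject: str
    body: str
    timestamp: float  # seconds since epoch


def tokens(text):
    """Set of lowercased alphanumeric tokens used for the similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def normalize_subject(subject):
    """Strip leading reply/forward prefixes such as 'Re:' or 'Fwd:'."""
    stripped = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.I)
    return stripped.strip().lower()


def similarity(a, b):
    """Jaccard overlap of body tokens, plus a small bonus for matching subjects."""
    ta, tb = tokens(a.body), tokens(b.body)
    union = ta.union(tb)
    score = len(ta.intersection(tb)) / len(union) if union else 0.0
    if normalize_subject(a.subject) == normalize_subject(b.subject):
        score += 0.1  # assumed heuristic weight, not taken from the paper
    return score


def link_parents(messages, threshold=0.2):
    """Map each msg_id to the id of its best earlier match, or None for a root."""
    ordered = sorted(messages, key=lambda m: m.timestamp)
    parents = {}
    for i, child in enumerate(ordered):
        best_id, best_score = None, threshold
        # Time heuristic: a parent must precede its reply.
        for candidate in ordered[:i]:
            score = similarity(child, candidate)
            if score >= best_score:
                best_id, best_score = candidate.msg_id, score
        parents[child.msg_id] = best_id
    return parents
```
</pre>

<p>Calling <code>link_parents</code> on a list of messages returns a 
dictionary from each message id to its inferred parent id; messages 
with no sufficiently similar predecessor are treated as thread roots. </p>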
]]></content:encoded>
			<wfw:commentRss>http://harnly.net/2006/research/email-thread-reassembly-using-similarity-matching/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Automation of Summary Evaluation by the Pyramid Method</title>
		<link>http://harnly.net/2005/research/automation-of-summary-evaluation-by-the-pyramid-method/</link>
		<comments>http://harnly.net/2005/research/automation-of-summary-evaluation-by-the-pyramid-method/#comments</comments>
		<pubDate>Sat, 22 Oct 2005 17:28:19 +0000</pubDate>
		<dc:creator>aaronharnly</dc:creator>
		
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://harnly.net/2005/blog/autobiography/automation-of-summary-evaluation-by-the-pyramid-method/</guid>
		<description><![CDATA[A Harnly, A Nenkova, R Passonneau, and O Rambow, 2005. Automation of summary evaluation by the Pyramid Method. Recent Advances in Natural Language Processing (RANLP).
[PDF] &#124; 
[plaintext] &#124;
[Bibtex]

The manual Pyramid method for summary evaluation, which focuses on the
task of determining if a summary expresses the same content as a set of
manual models, has shown sufficient promise [...]]]></description>
			<content:encoded><![CDATA[<p>A Harnly, A Nenkova, R Passonneau, and O Rambow, 2005. <em>Automation of summary evaluation by the Pyramid Method</em>. Recent Advances in Natural Language Processing (RANLP).
<a href="/publications/Harnly-2005-Automation of Summary Evaluation by the Pyramid Method.pdf"><img src="/images/software/filetypes/thumbs/pdf.png" alt="pdf">[PDF]</a> | 
<a href="/publications/Harnly-2005-Automation of Summary Evaluation by the Pyramid Method.markdown"><img src="/images/software/filetypes/thumbs/markdown.png" alt="markdown">[plaintext]</a> |
<a href="/publications/Harnly-2005-Automation of Summary Evaluation by the Pyramid Method.bib"><img src="/images/software/filetypes/thumbs/bib.png" alt="bibtex">[Bibtex]</a></p>

<p>The manual Pyramid method for summary evaluation, which focuses on the
task of determining if a summary expresses the same content as a set of
manual models, has shown sufficient promise that the Document
Understanding Conference 2005 effort will make use of it.  However, an
automated approach would make the method far more useful for developers
and evaluators of automated summarization systems.  We present an
experimental environment for testing automated evaluation of summaries,
pre-annotated for shared information.  We reduce the problem to a
combination of similarity measure computation and clustering.  The best
results are achieved with a unigram overlap similarity measure and
single-link clustering, which yields high correlation to manual pyramid
scores (r=0.942, p=0.01), and shows better correlation than the n-gram
overlap automatic approaches of the ROUGE system.</p>

<p><span id="more-61"></span></p>

<h1>Full Text</h1>

<ul>
<li>Aaron Harnly</li>
<li>Ani Nenkova</li>
<li>Rebecca Passonneau</li>
<li>Owen Rambow</li>
</ul>

<p><code>Department of Computer Science, Center for Computational Learning Systems
Columbia University
New York, NY, USA</code></p>

<p>{aaron, ani, becky, rambow}@cs.columbia.edu</p>

<h2>Abstract</h2>

<p>The manual Pyramid method for summary evaluation, which focuses on the
task of determining if a summary expresses the same content as a set of
manual models, has shown sufficient promise that the Document
Understanding Conference 2005 effort will make use of it.  However, an
automated approach would make the method far more useful for developers
and evaluators of automated summarization systems.  We present an
experimental environment for testing automated evaluation of summaries,
pre-annotated for shared information.  We reduce the problem to a
combination of similarity measure computation and clustering.  The best
results are achieved with a unigram overlap similarity measure and
single-link clustering, which yields high correlation to manual pyramid
scores (r=0.942, p=0.01), and shows better correlation than the n-gram
overlap automatic approaches of the ROUGE system.</p>
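
<p>As a rough sketch of how a unigram overlap measure and single-link
clustering fit together, consider the Python fragment below.  It is a
minimal illustration rather than the experimental environment itself: the
overlap formula (shared unigrams divided by the size of the smaller set),
the whitespace tokenization, and the 0.5 threshold are assumptions chosen
for the example.</p>

<pre>
```python
def unigrams(text):
    """Set of lowercased whitespace-separated tokens (an assumed tokenization)."""
    return set(text.lower().split())


def unigram_overlap(a, b):
    """Shared unigrams divided by the size of the smaller set (assumed formula)."""
    ua, ub = unigrams(a), unigrams(b)
    if not ua or not ub:
        return 0.0
    return len(ua.intersection(ub)) / min(len(ua), len(ub))


def single_link_clusters(spans, threshold=0.5):
    """Group spans so that any pair with overlap of at least `threshold`
    ends up in the same cluster.  This matches cutting a single-link
    dendrogram at that similarity level; it is implemented with union-find."""
    parent = list(range(len(spans)))

    def find(i):
        # Follow parent pointers up to the representative of the cluster.
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if unigram_overlap(spans[i], spans[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, span in enumerate(spans):
        clusters.setdefault(find(i), []).append(span)
    return list(clusters.values())
```
</pre>

<p>Because single-link clustering merges two groups as soon as any one pair
of members is similar enough, chains of partially overlapping spans can end
up in one cluster even when their endpoints share few words.</p>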

<h2>Introduction</h2>

<p>Automatic summarization is usually evaluated through comparison to human
summarization choices for the same texts. Traditionally,
the comparison is done by eliciting human judgments on content.  When
humans write short, abstractive summaries based on their reading of
multiple documents, they select content they think belongs in a summary,
and put it in their own words.  While many words and phrases may be similar
to those another human summarizer would employ, people can use different
forms of the same words (inflectional or derivational variants), different
word order, syntactic structure, and paraphrases.  See for example the
spans of words in bold below, coming from five different summaries of the
same set of documents about a Swissair
crash off Nova Scotia in 1998, all expressing the fact that the cause of
the crash has not been determined.</p>
]]></content:encoded>
			<wfw:commentRss>http://harnly.net/2005/research/automation-of-summary-evaluation-by-the-pyramid-method/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

