Methods for Mining and Summarizing Text Conversations by Giuseppe Carenini, Gabriel Murray, Raymond Ng

Due to the Internet Revolution, human conversational data, in written form, are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. Advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, creating numerous new and valuable applications. This book presents a set of computational methods to extract information from conversational data and to provide natural language summaries of the data. The book begins with an overview of basic concepts, such as the differences between extractive and abstractive summaries, and metrics for evaluating the effectiveness of summarization and various extraction tasks. It also describes some of the benchmark corpora used in the literature. The book introduces extraction and mining methods for performing subjectivity and sentiment detection, topic segmentation and modeling, and the extraction of conversational structure. It also describes frameworks for conducting dialogue act recognition, decision and action item detection, and extraction of thread structure. There is a specific focus on performing all of these tasks on conversational data, such as meeting transcripts (which exemplify synchronous conversations) and emails (which exemplify asynchronous conversations). Very recent approaches for dealing with blogs, discussion forums and microblogs (e.g., Twitter) are also discussed. The second half of the book focuses on natural language summarization of conversational data. It gives an overview of several extractive and abstractive summarizers developed for emails, meetings, blogs and forums. It also describes attempts at building multi-modal summarizers. Last but not least, the book concludes with thoughts on topics for further development.

Table of Contents: Introduction / Background: Corpora and Evaluation Methods / Mining Text Conversations / Summarizing Text Conversations / Conclusions / Final Thoughts

3. EVALUATION METRICS FOR SUMMARIZATION

• The group presented their final budget.

The first sentence is from a system summary and the second sentence is from a gold-standard human summary of the document. We can see that the bigram "final budget" occurs in each sentence, so we say that there is a bigram overlap between this sentence and the gold standard. If we permit intervening terms between the words of the bigram, we can identify further overlaps, which are called skip bigram overlaps. The following pair of sentences illustrates skip bigram overlap:

• So let’s look at the final revised budget.
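The difference between bigram and skip bigram overlap can be sketched in a few lines of Python. This is only an illustration, not the ROUGE implementation: the function names, the simple regular-expression tokenizer, and the `max_skip` parameter are assumptions introduced here. The sketch also pairs the two sentences that survive in this excerpt, which is a choice made for illustration because the original sentence pairs are incomplete.

```python
import re


def tokens(sentence):
    """Lowercase the sentence and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def bigrams(toks):
    """Set of adjacent word pairs."""
    return set(zip(toks, toks[1:]))


def skip_bigrams(toks, max_skip):
    """Set of in-order word pairs with at most `max_skip` tokens between them."""
    pairs = set()
    for i, left in enumerate(toks):
        # The slice covers the next token plus up to max_skip tokens beyond it.
        for right in toks[i + 1 : i + 2 + max_skip]:
            pairs.add((left, right))
    return pairs


first = tokens("The group presented their final budget.")
second = tokens("So let's look at the final revised budget.")

# "final budget" is contiguous in only one of the sentences, so no bigram is shared.
shared_bigrams = bigrams(first) & bigrams(second)

# Allowing one intervening token lets "final revised budget" match "final budget".
shared_skip_bigrams = skip_bigrams(first, 1) & skip_bigrams(second, 1)

print(shared_bigrams)
print(shared_skip_bigrams)
```

Setting `max_skip` to 0 reduces `skip_bigrams` to ordinary bigrams, which makes the relationship between the two measures explicit.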

A major reason why the summarization community has been slow to adopt “official” evaluation metrics (compared with, say, the machine translation community) is precisely that results regarding such correlations conflict across domains. Liu and Liu [2010] is a recent example of work trying to measure the usefulness of a popular intrinsic evaluation software package (ROUGE, described in Chapter 2) on noisy conversational data. Extrinsic evaluations, on the other hand, measure the usefulness of a summary in aiding some real-world task, such as document classification or reading comprehension.

Machine summaries are then annotated for SCUs as well and can be scored based on the sum of SCU weights compared with the sum of SCU weights for an optimal summary. [...] SCU6), and two possible optimal summaries containing four SCUs are indicated. These summaries are optimal because they each contain all of the SCUs of weight 4, the highest weight level, and the remaining SCUs from weight 3, the next highest level of the Pyramid. Using the SCU annotation, one can calculate both precision-based and recall-based summary scores for a given machine summary.
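The scoring idea described above can be sketched as follows. The SCU weights are hypothetical, chosen only so that they agree with the description (six SCUs, and exactly two optimal four-SCU summaries built from all weight-4 SCUs plus one weight-3 SCU); they are not taken from the book's figure. The function names and the `target_size` parameter are likewise assumptions. One common reading, assumed here, is that a precision-based score normalizes by the best total attainable with as many SCUs as the machine summary expresses, while a recall-based score normalizes by the best total for a fixed target number of SCUs.

```python
# Hypothetical pyramid: SCU id -> weight (number of human summaries expressing it).
SCU_WEIGHTS = {"SCU1": 4, "SCU2": 4, "SCU3": 4, "SCU4": 3, "SCU5": 3, "SCU6": 2}


def optimal_weight(scu_weights, n):
    """Largest total weight any summary expressing n SCUs could reach."""
    return sum(sorted(scu_weights.values(), reverse=True)[:n])


def pyramid_score(summary_scus, scu_weights, target_size=None):
    """Ratio of observed SCU weight to the optimal weight.

    With target_size=None the optimum uses the summary's own SCU count
    (precision-like); otherwise it uses target_size (recall-like).
    """
    found = set(summary_scus)
    # SCUs absent from the pyramid contribute weight 0 but still count toward size.
    observed = sum(scu_weights.get(scu, 0) for scu in found)
    size = len(found) if target_size is None else target_size
    best = optimal_weight(scu_weights, size)
    return observed / best if best else 0.0


# A hypothetical machine summary annotated with three SCUs (total weight 4 + 3 + 2 = 9).
machine_summary = ["SCU1", "SCU4", "SCU6"]

precision_like = pyramid_score(machine_summary, SCU_WEIGHTS)               # 9 / 12
recall_like = pyramid_score(machine_summary, SCU_WEIGHTS, target_size=4)   # 9 / 15

print(precision_like, recall_like)
```

With these weights, the optimal total for a four-SCU summary is 4 + 4 + 4 + 3 = 15, which either {SCU1, SCU2, SCU3, SCU4} or {SCU1, SCU2, SCU3, SCU5} achieves.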

