
What is Text?
Content-based Structure
ATHENS, Greece (AP) A strong earthquake shook the Aegean Sea island of Crete on Sunday but caused no injuries or damage. The quake had a preliminary magnitude of 5.2 and occurred at 5:28 am (0328 GMT) on the sea floor 70 kilometers (44 miles) south of the Cretan port of Chania. The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital. No injuries or damage were reported.
• Describe the strength and the impact of an
What is Text?
Domain-dependent Text Structures
A product of structural relations (coherence)
S1: A strong earthquake shook the Aegean Sea island of Crete on Sunday
S2: but caused no injuries or damage.
S3: The quake had a preliminary magnitude of 5.2
Regina Barzilay
March 1, 2003
Analogy with Syntax
Domain-independent Theory of Sentence Structure
• Fixed set of word categories (nouns, verbs, . . .)
• Fixed set of relations (subject, object, . . .)
Motivation
Extract a representative subsequence from a set of sentences
Find an answer to a question in natural language
Order a set of information-bearing items into a coherent
Find the best translation taking context into account
Rhetorical Structure
Two Approaches to Text Structure
Content-based models
Rhetorical models
Rhetorical Structure Theory (Next Class)
Argumentative Zoning
Motivation
• Scientific articles exhibit (consistent across
BACKGROUND
OWN CONTRIBUTION
RELATION TO OTHER WORK
• Automatic structure analysis can benefit:
Q&A
summarization
citation analysis
Many of the recent advances in Question Answering have followed from the insight that systems can benefit by exploiting the redundancy in large corpora.
Brill et al. (2001) describe using the vast amount of data available on the WWW to achieve impressive performance . . .
The Web, while nearly infinite in content, is not a complete repository of useful information . . .
In order to combat these inadequacies, we propose a strategy in which information is extracted from . . .
Today: Domain-Specific Models
Argumentative Zoning of Scientific Articles
Supervised (Duboue & McKeown, 2001)
Unsupervised (Barzilay & Lee, 2004)
Argumentative Zoning
BACKGROUND: Many of the recent advances in Question Answering have followed from the insight that systems can benefit by exploiting the redundancy
Brill et al. (2001) describe using the vast amount of data available on the WWW to achieve impressive performance . . .
The Web, while nearly infinite in content, is not a complete repository of useful information . . .
OWN CONTRIBUTION: In order to combat these inadequacies, we propose a strategy in which information is extracted from . . .
Examples
We have proposed a method of clustering words
Section 2 describes three parsers which are . . .
Contrast
However, no method for extracting the relationship from superficial linguistic expressions was
Features
• Lexical Features (“other researchers claim that”)
Approach
• Goal: Rhetorical segmentation with labeling
Own work: aim, own, textual
Background
Other Work: contrast, basis, other
Kappa Statistics
(Siegel & Castellan, 1988; Carletta, 1996)
Kappa controls agreement P(A) for chance agreement
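The kappa statistic referred to above is usually written K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed agreement and P(E) the agreement expected by chance. A minimal sketch for two annotators follows; it assumes the Cohen variant, in which P(E) comes from each annotator's own label distribution (Siegel & Castellan pool the annotators instead), and the function name and toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # P(A): fraction of items on which the two annotators agree.
    p_a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # P(E): probability of agreeing by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_a - p_e) / (1.0 - p_e)


# Toy zone labels for four sentences (hypothetical data).
annotator_1 = ["own", "own", "background", "other"]
annotator_2 = ["own", "background", "background", "other"]
print(cohen_kappa(annotator_1, annotator_2))
```

Raw agreement on the toy data is 3 out of 4, but kappa is lower because part of that agreement is expected by chance.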
Supervised Content Modeling
• Goal: Find types of semantic information characteristic to a domain and ordering constraints
• Approach: find patterns in a set of transcripts
Semantic Sequence
age, gender, pmh, pmh, pmh, pmh, med-preop, med-preop, med-preop, drip-preop, med-preop, ekg-preop, echo-preop, hct-preop, procedure, . . .
Annotated Transcript
He is 58-year-old male. History is significant for Hodgkin’s disease, treated with . . . to his neck, back and chest. Hyperspadias, BPH, hiatal hernia and proliferative lymph edema in his right arm. No IV’s or blood pressure down in the left arm. Medications: Inderal, Lopid, Pepcid, nitroglycerine and heparin. EKG has PAC’s. . . .
Labels: pmh, med-preop, drip-preop
Example of Learned Pattern
Content Models
• Content models represent topics and their ordering
Topics: “strength”, “location”, “casualties”, . . .
Order: “casualties” prior to “rescue efforts”
• Assumption: Patterns in content organization are
Evaluation
Pattern Detection
Analogous to motif detection
T1: A B C D F A A B F D
T2: F C A B D D F F
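The motif-detection analogy can be illustrated on the two toy sequences T1 and T2. The sketch below is only the simplest reading of "pattern": contiguous n-grams of labels that occur in every sequence. How patterns are actually detected is not specified here, and the function name is hypothetical.

```python
def shared_ngrams(sequences, n):
    """Return the contiguous n-grams (as tuples) that occur in every sequence.

    Assumes at least one sequence is given.
    """
    ngram_sets = [
        {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        for seq in sequences
    ]
    return set.intersection(*ngram_sets)


t1 = "A B C D F A A B F D".split()
t2 = "F C A B D D F F".split()
print(sorted(shared_ngrams([t1, t2], 2)))  # [('A', 'B'), ('D', 'F')]
```

With n = 2 the only label pairs common to both sequences are A B and D F; no label triple is shared.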
Computing Content Model
• State-transitions represent ordering constraints
Similarity in Domain Texts
TOKYO (AP) A moderately strong earthquake with a preliminary magnitude reading of 5.1 rattled northern Japan early Wednesday, the Central Meteorological Agency said. There were no immediate reports of casualties or damage. The quake struck at 6:06 am (2106 GMT) 60 kilometers (36 miles) beneath the Pacific Ocean near the northern tip of the main island of Honshu. . . .
ATHENS, Greece (AP) A strong earthquake shook the Aegean Sea island of Crete on Sunday but caused no injuries or damage. The quake had a preliminary magnitude of 5.2 and occurred at 5:28 am (0328 GMT) on the sea floor 70 kilometers (44 miles) south of the Cretan port of Chania. The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital. . . .
Narrative Grammars
• Propp (1928): fairy tales follow a “story grammar”
• Bartlett (1932): formulaic text structure facilitates
• Wray (2002): texts in multiple domains exhibit
Initial Topic Induction
Agglomerative clustering with cosine similarity measure (Iyer & Ostendorf, 1996; Florian & Yarowsky, 1999; Barzilay & Elhadad, 2003)
The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital.
Seismologists in Pakistan’s Northwest Frontier Province said the temblor’s epicenter was about 250 kilometers (155 miles) north of the provincial capital Peshawar.
The temblor was centered 60 kilometers (35 miles) northwest of the provincial capital of Kunming, about 2,200 kilometers (1,300 miles) southwest of Beijing, a bureau seismologist said.
Estimating Emission Probabilities
• Estimation for the “insertion” state:
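The estimation formulas are not preserved at this point, so the following is a sketch under stated assumptions rather than the lecture's own definition. It assumes, along the lines of Barzilay & Lee (2004), that each topic state emits words from an add-delta smoothed bigram model trained on its cluster, and that the "insertion" state gets the complementary distribution: it favours bigrams that no topic state explains well. The function names, the value of delta and the toy vocabulary are all illustrative.

```python
from collections import Counter


def bigram_emission(cluster_sentences, vocab, delta=0.1):
    """Add-delta smoothed bigram model p(w2 | w1) for one topic state."""
    history, bigram = Counter(), Counter()
    for sent in cluster_sentences:
        for w1, w2 in zip(sent, sent[1:]):
            bigram[(w1, w2)] += 1
            history[w1] += 1  # times w1 appears as a bigram history
    size = len(vocab)

    def prob(w2, w1):
        return (bigram[(w1, w2)] + delta) / (history[w1] + delta * size)

    return prob


def insertion_emission(topic_probs, vocab):
    """Complementary model: mass goes to words the topic states find unlikely."""
    def prob(w2, w1):
        def leftover(u):
            return 1.0 - max(p(u, w1) for p in topic_probs)
        return leftover(w2) / sum(leftover(u) for u in vocab)

    return prob


# Toy data (hypothetical): one "location" cluster and one "casualties" cluster.
vocab = ["the", "temblor", "epicenter", "was", "no", "injuries"]
location = bigram_emission([["the", "temblor", "epicenter", "was"]], vocab)
casualties = bigram_emission([["no", "injuries"]], vocab)
insert = insertion_emission([location, casualties], vocab)
print(location("temblor", "the"), insert("temblor", "the"))
```

Both models are proper distributions over the vocabulary for a given history word, so they can be used directly as emission scores when sentences are decoded.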
Model Construction
• Determining states, emission and transition
From Clusters to States
• Each large cluster constitutes a state
• Agglomerate small clusters into an “insert” state
Viterbi re-estimation
• Decode the training data with Viterbi decoding
• Use the new clustering as the input to the parameter
Information Ordering: Algorithm
Estimating Transition Probabilities
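Transition estimation and Viterbi decoding can be sketched together. The sketch assumes that transitions are smoothed relative frequencies of adjacent state pairs, with g(ci, cj) counting how often a sentence from cluster cj directly follows one from cluster ci, and that per-sentence emission log-likelihoods are already available. The smoothing constant, function names and toy numbers are assumptions for illustration.

```python
import math
from collections import Counter


def estimate_transitions(state_sequences, n_states, delta=0.1):
    """Smoothed P(next state j | current state i) from adjacency counts."""
    pair_counts = Counter()  # g(ci, cj): state j directly follows state i
    from_counts = Counter()  # state i is followed by any state
    for seq in state_sequences:
        for i, j in zip(seq, seq[1:]):
            pair_counts[(i, j)] += 1
            from_counts[i] += 1
    return [
        [(pair_counts[(i, j)] + delta) / (from_counts[i] + delta * n_states)
         for j in range(n_states)]
        for i in range(n_states)
    ]


def viterbi(log_emit, log_trans, log_start):
    """Most likely state sequence for a non-empty text.

    log_emit[t][s] is the log-likelihood of sentence t under state s.
    """
    n_states = len(log_start)
    best = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_best, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: best[r] + log_trans[r][s])
            pointers.append(prev)
            new_best.append(best[prev] + log_trans[prev][s] + log_emit[t][s])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy run: three training texts already labelled with states 0, 1, 2.
trans = estimate_transitions([[0, 1, 2], [0, 2, 1], [0, 1, 2]], n_states=3)
log_trans = [[math.log(p) for p in row] for row in trans]
log_start = [math.log(p) for p in (0.8, 0.1, 0.1)]
log_emit = [[math.log(p) for p in row]
            for row in ([0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8])]
print(viterbi(log_emit, log_trans, log_start))
```

In a re-estimation loop, the decoded paths would define a new assignment of sentences to states, from which emissions and transitions are estimated again.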
g(ci, cj) is the number of adjacent sentences (ci, cj)
Application: Information Ordering
Text summarization
Natural Language Generation
“get married” prior to “give birth” (in some domains)
Summarization: Algorithm
Input: source text
Training data: parallel corpus of summaries and source texts (aligned)
• Employ Viterbi on source texts and summaries
• Compute state likelihood to generate summary
• Given a new text, decode it and extract sentences
Baselines for Ordering
• “Straw” baseline: Bigram Language model
• “State-of-the-art” baseline: (Lapata, 2003)
represent a sentence using lexico-syntactic
compute pairwise ordering preferences
find an optimal global order
Application: Summarization
specify types of important information
use information extraction to identify this
• Domain-independent summarization: (Kupiec et al.)
represent a sentence using shallow features
use a classifier
Evaluation: Data
Baselines for Summarization
• “Straw” baseline: n leading sentences
• “State-of-the-art” Kupiec-style classifier:
Sentence representation: lexical features and
Classifier: BoosTexter
Results: Summarization
Results: Ordering
Ordering: Learning Curve
Summarization: Learning Curve

Source: http://www.computing.dcu.ie/~ebicici/Week3/DomainTextStructure.pdf
