Avoiding Deceptive Annotations in the Semantic Web
Deceptive annotations are becoming an important problem as more and more people start to tag documents, and the problem has become an argument against the Semantic Web. Skeptics believe that developers make mistakes when annotating documents, and developers may even abuse annotations from time to time. Due to the difficulty of detecting and resolving deceptive tags, these skeptics openly wonder whether semantic annotations may bring more trouble than benefit. In this paper we present a deception avoidance resolution method. By adding personal specifications about ontology concepts through instance recognition semantics, Semantic Web users can avoid being deceived by improperly annotated data. At the same time, our deception avoidance strategy also passively discourages annotators from falsely tagging documents by decreasing the profit they can gain from deceptive annotations. Finally, our deception avoidance mechanism still preserves the right to annotate text [...]

Written mainly while this author was on an extended visit [...]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAAW ’06 Athens, GA, USA
Copyright 200X ACM X-XXXXX-XX-X/XX/XX .$5.00.

Deceptive annotations, or deceptive tags, are becoming more and more of a problem as people start to tag their documents. The problem has, in fact, become an argument against the Semantic Web. As an example, at a recent conference in Boston, Peter Norvig, the Google Director of Search and an AAAI Fellow, asked Tim Berners-Lee, the inventor of the Web and the current director of W3C, a question about deception in the Semantic Web [4]. Norvig said, “We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that’s not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive.” In this question, Norvig reveals one of his concerns about the Semantic Web. Without question, Internet deception is a severe problem. Particularly in the Semantic Web, annotations are easily abused. If we cannot resolve deceptive annotations, we may have negative experiences with Semantic Web applications due to the unsure [...]

In general, there are two opposing opinions on this problem.1 Some people believe that deceptive annotation is a type of cheating. In their definition, deceptive annotations are false claims whose purpose is to mislead. Advocates of total freedom on the Internet, however, suggest that everybody has a right to say and write whatever is on their mind. So, in essence no annotations are “deceptive,” but are only [...]
1 http://www.bloghop.com/tagview.htm?itemid=deceptive

In the Semantic Web, deceptive annotations are the annotations with instances that deviate from their commonly expected meanings. For example, if “UTAH” is annotated as a NATION, this annotation is deceptive because a NATION is commonly understood as an independent country in the [...]

At the same time, however, we must not prohibit the freedom of people to annotate as they wish. Things can change and new knowledge is discovered from time to time. For example, annotating “Montenegro” as a NATION before June 3, 2006 would have been deceptive. But it is no longer deceptive after June 3, 2006, when Montenegro declared its independence. Moreover, people should have the freedom to annotate a document according to their own understanding even if it is seen as deceptive by others. For example, a Montenegro independence movement member may annotate “Montenegro” as a NATION even before June 3, 2006. This was what the person believed and expected although it would certainly have been a deceptive annotation as viewed by others. Because the web is designed to be an open and free space, a resolution to the deceptive annotation problem should not override the freedom of tagging.

There are three strategies we can apply to solve the deceptive annotation problem: deception protection, deception detection, or deception avoidance. A deception protection strategy would allow only trusted authorities to annotate all web pages and would encrypt annotations so that no one can abuse them. Based on current Internet security technologies, we can believe that the deceptive annotation problem can be solved by deception protection methods. A problem with this resolution, however, is that it generally dismisses the right of individual web developers to annotate their own [...]

A deception detection strategy would check the correctness of mappings between annotated data and their annotations based on formal definitions of rules in ontologies. Such a process is usually expensive to execute, however. For example, to check whether “Montenegro” is a NATION, a process must at least compare the annotating date to the independence date of Montenegro. Even worse, it could be very difficult to construct these rules and agree on them. Both defining rules as well as processing them would likely be costly. Researchers must first resolve all these sophisticated issues before we could really apply deception detection [...]

Figure 1 (partial):
exception phrase: \s.*ba(th)?s?\b.*r(oo)?ms?
context keyword: b(r|d)s? | bdrms? | bed(rooms?)?
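
The exception phrase and context keyword patterns listed above are Perl-style regular expressions, and Python's re module accepts the same syntax. The following is a minimal sketch of how they behave, not the authors' tooling. It assumes that the spaces around "|" in the context keyword line are layout only (hence re.VERBOSE) and that keywords should match whole words (hence the added \b anchors); the function names has_context_keyword and is_exception are hypothetical.

```python
import re

# Pattern copied verbatim from the "exception phrase" line.
EXCEPTION_PHRASE = re.compile(r"\s.*ba(th)?s?\b.*r(oo)?ms?", re.IGNORECASE)

# Pattern copied from the "context keyword" line.
# Assumptions: whitespace around "|" is not significant (re.VERBOSE ignores it),
# and the \b word-boundary anchors are added here for illustration.
CONTEXT_KEYWORD = re.compile(
    r"\b(?: b(r|d)s? | bdrms? | bed(rooms?)? )\b",
    re.IGNORECASE | re.VERBOSE,
)


def has_context_keyword(text: str) -> bool:
    """Return True if a bedroom-related keyword appears anywhere in the text."""
    return CONTEXT_KEYWORD.search(text) is not None


def is_exception(right_context: str) -> bool:
    """Return True if the text after a number looks like a bathroom phrase."""
    return EXCEPTION_PHRASE.search(right_context) is not None


if __name__ == "__main__":
    for sample in ("3 bdrm apartment", "2 bed house", "large garden"):
        print(sample, "->", has_context_keyword(sample))
    for sample in (" bath room", " large room"):
        print(repr(sample), "->", is_exception(sample))
```

Under these assumptions, "3 bdrm apartment" carries a context keyword, while " bath room" is caught by the exception phrase and " large room" is not.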
In this paper, we present a deception avoidance strategy. Rather than detecting false annotations, the deception avoidance strategy avoids looking for potentially deceptive cases. Our method is based on two observations and assumptions: (1) users need not care about whether an annotation is deceptive unless they are interested in the annotation; and (2) if users are interested in an annotation, they can avoid being deceived by explicitly and clearly expressing their interests about the annotation. We proffer instance recognition semantics to allow Semantic Web users to specify their personal interests to avoid deceptive annotations. The degree of vulnerability to deceptive annotations depends on how precisely they have specified their instance recognition semantics in ontologies. Moreover, our deception avoidance strategy also passively discourages annotators from falsely tagging documents by decreasing the profit they can gain from deceptive annotations. At the same time, our deception avoidance method still preserves the right of people to [...]

To explain how our strategy works, we briefly introduce instance recognition semantics in Section 2. In Section 3, we show how we use instance recognition semantics in our [...]

Instance recognition semantics, which can also be called instance semantics recognizers (ISR),2 are formal specifications that identify instances of a concept C in ordinary text. The text may be unstructured, semi-structured, or fully structured. For Semantic Web applications, the concept C should be a lexical element of a formal ontology (e.g. concepts such as date, time, place, location, name, telephone number, email address, various weights and measures, etc.). Thus, instance recognition semantics of an ontology concept (e.g. Telephone Number) interpret instances in a text fragment (e.g. the contact number in “Call me at 222-1234.”) to have the intensional meaning of the defined [...]
2 We avoid the acronym IRS (Internal Revenue Service) because instance recognition semantics are not tax collectors.

Figure 1 shows a partial ISR declaration we have used in an apartment-rental domain ontology for the concept BedroomCount.3 Although recognition patterns can be expressed variously in different syntaxes, in our study we have used Perl-style regular expressions. In general, an ISR declaration includes defined recognition patterns and auxiliary filtering specifications. We specify recognition patterns in an external representation clause. In Figure 1 we specify that any legal instantiation of BedroomCount should be a string of digits representing numbers between 1 and 20. Defined auxiliary filtering specifications help to precisely identify an instance. In Figure 1, we declare the left immediate context (left context phrase) to be a legal word boundary and the right immediate context (right context phrase) to be the regular expression “r(oo)?ms?” with possibly several other words in between, e.g. “large room.” The exception phrase excludes some negative phrases from the previously specified patterns, which is the right context phrase in our example. In our case, we exclude, for example, “bath room” from being a legal right context phrase. The context keywords are a carefully selected set of keywords that typically appear close to the concept locations. They are mainly for the purpose of improving the accuracy of automated semantic annotation processes. Although this example looks somewhat complicated, many times ISR declarations can be as simple as a list of potential instances, such as a list of country names [...]
3 The ontology can be found in the DEG web site: [...]

ISR augmentations to ontologies help separate the work load between domain experts (who are individual annotators) and data-extraction engineers (who design and build data-extraction engines). This separation is key in our automatic deception avoidance mechanism. Because ISR rules are declarative, domain experts can create instance recognition rules for domain concepts without having to do any programming; and because ISR rules are embedded inside of ontologies, domain experts need not be concerned about mapping recognized concepts to domain ontologies.4 These two properties of ISR rules enable domain experts to create and update their ISR declarations without the need to consult with data-extraction engineers. Since domain experts know their domain best, their ISR declarations can [...]
4 Mapping concepts to domain ontologies is a major concern in current semantic annotation approaches [1, 2, 3].

Using ISR declarations, domain experts implicitly “personalize” the meanings of specified ontology concepts. Here personalize means that domain experts cast the recognition of a generally defined concept to their own expectations. Figure 2 illustrates this idea with a simple example. Without ISR declarations, an arbitrary positive integer number could be a legal instantiation of the concept BedroomCount,5 although in reality we rarely can find a single apartment with more than 4 or 5 bedrooms. With ISR declarations, we can restrict the instantiation of BedroomCount to be between 3 and 4, perhaps because we need an apartment with at least three bedrooms and we do not anticipate ever needing more than four bedrooms. Therefore, our BedroomCount with this ISR declaration becomes a specialization of the BedroomCount without an ISR declaration or with a different (more generalized) ISR declaration. Hence the meaning of the concept BedroomCount is personalized to our perspective. With personalized concepts, ontologies become personalized, augmented by personalized [...]
5 In theory there is no restriction why one cannot build a [...]

Figure 2: Concept with ISR declarations, equivalent to declaring a special subclass of itself.

Deceptive annotations are harmless if users are not interested in them. For example, if “Viagra” is falsely annotated as a FOOD, users will not be deceived unless they are looking for FOOD. Therefore, users can automatically avoid deceptive annotations in which they are not interested. Moreover, if users are interested in an annotation, they can avoid being deceived by explicitly and clearly expressing their interests about the annotation. For example, if users are looking for FOOD, and they have clearly specified that their FOOD consists of lists of breads, meats, and vegetables, they can also avoid being deceived by “Viagra” since it is not on their list. These two scenarios constitute the basis of our deception avoidance methodology.

By augmenting ISR declarations, ontologies become personalized ontologies. Therefore, any annotations that contradict specified personal interests can be automatically ignored. For example, the following house-rental advertisement is from a real online web site,6 and we have intentionally annotated it deceptively with our apartment-rental [...]

<BedroomCount>3.5</BedroomCount> Bed,
<BathroomCount>2.5</BathroomCount> Bath
House with <Feature>Pool</Feature>,
<Feature>Large LCD HDTV</Feature>,
<Feature>High speed internet</Feature>

6 http://www.villas2000.com/frbvo/homes/3345.php.

By applying the ISR declarations in Figure 1, however, machines can avoid being deceived by these deceptive annotations because “3.5” is not recognized as a data instance of interest by the specified external representation for the concept BedroomCount. In this process, machines do not generate any logic rules from ontologies to detect the semantic meaning of this annotated data; nor do machines perform any domain identification methods to verify the application domain for this advertisement. Machines avoid this deceptive case simply because of the ISR declaration in the [...]

On the other hand, perhaps we begin to notice several n.5 bedroom counts. The following example is also from a real online advertisement,7 and we have annotated it manually

<Feature>Large</Feature> <BedroomCount>2.5</BedroomCount> room apartment 70 qm available
<AvailableDate>July 1</AvailableDate>

7 http://berlin.craigslist.org/apa/173491092.html. Checked [...]

Although we still may not know what the meaning of a “.5 bedroom” is, somebody truly has expressed the number of [...] modify our external representation declaration so that it accepts n.5 as a legal representation for room numbers, or we can keep ignoring them and continue to treat them as deceptive annotations because we do not like n.5 bedrooms. Both choices are fine, and the decision totally depends on [...]

Using this same technique, we can resolve the problem that a deceiver falsely annotates “Viagra” as a FOOD in order to attract more readers to a Viagra-sales web page. This deception may not be easy to detect through ontology reasoning because Viagra is edible, which satisfies one of the crucial features of FOOD. But we can avoid this problem by applying our deception avoidance method. Based on different conditions, there are two ways to avoid this deception. First, if users specify a list of FOOD items that does not contain Viagra, straightforwardly they avoid this deceptive web page based upon unmatched interests. Second, if users are open to trying new foods that they do not know, they can simply leave the external representation of their FOOD declaration blank, which means that they accept whatever is annotated as a FOOD to be FOOD. Then they will be deceived by this deceptive annotation the first time. But after they learn that this is a deception, they can avoid it by simply adding an exception phrase “Viagra” for their external representation about FOOD. Hence they would never be trapped in this deception again. This update avoids not only this deceptive web page, but also all the other web pages that play the same deceptive trick on [...]

In our deception avoidance method, we must emphasize that the vulnerability of users to deceptive annotations depends very much on how carefully users build and improve their ISR declarations. It is fair, however. Just like in any human society, humans who are too lazy to learn will be repeatedly deceived by the same trick. Only if they learn from previous experiences, i.e. only if they update their own ISR declarations by their experiences, can they avoid being deceived again. When we continually update our knowledge by our experiences, we become harder and harder to deceive. Hence our deception avoidance method is partly an [...]

Since our method does not depend on annotations, but rather on recognizers, our method preserves total freedom for annotators to tag whatever they want to any textual content. Our method is applied to the user side rather than the annotator side. While users have the power to avoid what they believe to be deception, annotators can still annotate everything freely. For example, our method does not prohibit annotators from tagging “Viagra” to be a FOOD.

If our deception avoidance methods were used extensively on the web, deceptive annotators would find that they lose much more than they gain by deceptive annotations. For example, the reason deceivers falsely annotate “Viagra” as a FOOD is that they want to increase the hit rate of a web page. With our deception avoidance strategy, real food-seekers will soon learn that this is a deceptive web page and thus avoid visiting it any more. At the same time, real Viagra-seekers may look for annotations such as MEDICINE rather than FOOD because they do not think Viagra is a FOOD. Even if deceivers annotate “Viagra” simultaneously to be both FOOD and MEDICINE, they still decrease their own opportunities to have their real customers because the thought that Viagra is not FOOD overrides the thought that Viagra is both FOOD and MEDICINE. Therefore, our mechanism not only provides an active deception avoidance method for users, but also becomes a passive deception avoidance strategy from an annotator’s perspective.
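
The user-side filtering described in Sections 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the Recognizer class, the trusted_annotations function, the tag-matching regular expression, and the sample food list are hypothetical names and values introduced here. Only the 1 to 20 range, the 3 to 4 restriction, the n.5 variant, and the “Viagra” exception come from the text.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Recognizer:
    """Hypothetical stand-in for one concept's user-side ISR declaration."""
    # Regex a tagged value must fully match; None models a blank declaration
    # that accepts whatever is annotated.
    external_representation: Optional[str] = None
    # Values the user has learned to reject (compared case-insensitively).
    exception_phrases: Set[str] = field(default_factory=set)

    def accepts(self, value: str) -> bool:
        text = value.strip()
        if text.lower() in {p.lower() for p in self.exception_phrases}:
            return False
        if self.external_representation is None:
            return True
        return re.fullmatch(self.external_representation, text, re.IGNORECASE) is not None


# Assumed tag syntax: <Concept>value</Concept>, as in the advertisements above.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def trusted_annotations(text: str, recognizers: Dict[str, Recognizer]) -> List[Tuple[str, str]]:
    """Keep tagged values only for concepts the user declared and whose recognizer accepts them."""
    kept = []
    for concept, value in TAG.findall(text):
        recognizer = recognizers.get(concept)  # undeclared concepts are simply ignored
        if recognizer is not None and recognizer.accepts(value):
            kept.append((concept, value.strip()))
    return kept


# Digits representing numbers between 1 and 20, as the text describes for Figure 1.
BEDROOM_1_TO_20 = Recognizer(r"[1-9]|1[0-9]|20")
# Personalized restriction to 3 or 4 bedrooms.
PERSONAL_BEDROOMS = Recognizer(r"[34]")
# Optional relaxation that also accepts n.5 values.
BEDROOM_WITH_HALVES = Recognizer(r"(?:[1-9]|1[0-9]|20)(?:\.5)?")

HOUSE_AD = (
    "<BedroomCount>3.5</BedroomCount> Bed, "
    "<BathroomCount>2.5</BathroomCount> Bath House with <Feature>Pool</Feature>"
)
rejected = trusted_annotations(HOUSE_AD, {"BedroomCount": BEDROOM_1_TO_20})

# FOOD, first way: an explicit list (sample items are made up for illustration).
FOOD_LIST = Recognizer(r"bread|beef|carrot")
# FOOD, second way: a blank declaration, later refined with an exception phrase.
open_food = Recognizer()
PAGE = "<FOOD>Viagra</FOOD>"
first_visit = trusted_annotations(PAGE, {"FOOD": open_food})
open_food.exception_phrases.add("Viagra")
later_visit = trusted_annotations(PAGE, {"FOOD": open_food})

if __name__ == "__main__":
    print(rejected)     # []
    print(first_visit)  # [('FOOD', 'Viagra')]
    print(later_visit)  # []
```

The design choice mirrored here is that nothing inspects or restricts the annotator: the filter runs entirely on the user side, and each added exception phrase applies to every page that repeats the same trick.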
Deceptive annotations are becoming a severe problem as more and more people start to tag web data. The problem has been used as an argument against the realization of the Semantic Web. In this paper we presented a new deception avoidance resolution. By augmenting ontologies with ISR declarations, our method not only provides active deception avoidance for users, but may also passively decrease the rate of deception by reducing the chances that deceivers may obtain benefits from deceptive annotations. We expect that our work may lead to more attention being paid to this important and interesting research problem.
[1] Y. Ding, D.W. Embley, and S.W. Liddle. Automatic creation and simplified querying of semantic web content: An approach based on information-extraction ontologies. In Proceedings of the first Asian Semantic Web Conference (ASWC 2006), LNCS 4185, pages 400–414, Beijing, China, September 2006.
[2] S. Handschuh, S. Staab, and F. Ciravegna. S-cream semi-automatic creation of metadata. In Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW-2002), pages 358–372, Madrid, Spain, October 2002.
[3] A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. Semantic annotation, indexing, and retrieval. Journal of Web Semantics, 2(1):49–79, December 2004.
[4] C. Lombardi. Google exec challenges Berners-Lee. http://news.zdnet.com/2100-9588 22-6095705.html.