Namescape Named Entity Annotation

Tagging of named entities in the Namescape project

 

As pointed out in, for instance, [van Dalen-Oskam 2012], standard NE tagging is not adequate for a systematic study of the onymic landscape. We therefore define an annotation scheme for named entity information as an extension to the TEI P5 annotation guidelines.

TEI extension for named entities

 

The main reasons to extend TEI instead of using the existing tagging guidelines for names are:
1. We need all searchable properties of names to be ‘inline’, not standoff. This entails the introduction of several additional attributes for tagged named entity occurrences.
2. We prefer a single tag for named entities and a single, different tag for entity parts. TEI offers either a range of tags (persName, geoName, orgName) or the single tag “name”. The latter is inconvenient for querying, because tagging name parts leads to nested “name” tags:


<name type="person">
 <name type="forename">Jan</name>
 <name type="surname">Janssen</name>
</name>
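
The querying problem can be illustrated with a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser and wraps both fragments in a hypothetical "p" element; the second fragment uses the "ne" and "nePart" elements defined later in this document. It is an illustration only, not part of the Namescape tooling.

```python
import xml.etree.ElementTree as ET

# Nested TEI-style tagging: the entity and its parts share the tag "name".
nested = ET.fromstring(
    '<p><name type="person">'
    '<name type="forename">Jan</name> '
    '<name type="surname">Janssen</name>'
    '</name></p>'
)

# A plain descendant query returns the entity and its parts alike ...
all_names = nested.findall(".//name")
# ... so selecting entities requires an extra filter on the type attribute.
entities = [n for n in all_names if n.get("type") == "person"]

# Namescape-style tagging: separate tags for entities and entity parts.
NS = {"ns": "http://www.namescape.nl/"}
flat = ET.fromstring(
    '<p xmlns:ns="http://www.namescape.nl/"><ns:ne type="person">'
    '<ns:nePart type="forename">Jan</ns:nePart> '
    '<ns:nePart type="surname">Janssen</ns:nePart>'
    '</ns:ne></p>'
)
ne_elements = flat.findall(".//ns:ne", NS)
ne_parts = flat.findall(".//ns:nePart", NS)

print(len(all_names), len(entities))   # 3 1
print(len(ne_elements), len(ne_parts)) # 1 2
```

With separate tags, a query for entities never has to inspect attribute values to exclude name parts.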

 

The formal definition of the extension has the form of a TEI ODD file[1], cf. the delivered file “namescape.odd.xml”. We include the documentation from the ODD as a description of the proposed encoding guidelines.

TEI modules

header, core, tei, textstructure, analysis, dictionaries, drama, namesdates, figures, iso-fs

Element ne

Namespace: http://www.namescape.nl/

Named Entity


<ns:ne
   xmlns:ns="http://www.namescape.nl/"
   type="person"
   gloss="MAIN CHARACTER"
   structure="forename"
   nymRef="nym7"
   normalizedForm="MICHIEL"
   resolution="plotInternal">
       <ns:nePart type="forename" sex="male">Michiel</ns:nePart>
 </ns:ne>
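
As a rough sketch of how the inline attributes can be read, the following Python fragment parses the example above with the standard-library xml.etree.ElementTree module. The dictionary layout and the variable names are assumptions made for illustration; they are not prescribed by the annotation scheme.

```python
import xml.etree.ElementTree as ET

NS = {"ns": "http://www.namescape.nl/"}

# The example element from this section, reproduced as a string.
SAMPLE = """<ns:ne
   xmlns:ns="http://www.namescape.nl/"
   type="person"
   gloss="MAIN CHARACTER"
   structure="forename"
   nymRef="nym7"
   normalizedForm="MICHIEL"
   resolution="plotInternal">
       <ns:nePart type="forename" sex="male">Michiel</ns:nePart>
 </ns:ne>"""

ne = ET.fromstring(SAMPLE)

# All searchable properties sit on the element itself (inline, not standoff).
record = {name: ne.get(name) for name in
          ("type", "gloss", "structure", "nymRef", "normalizedForm", "resolution")}
# Surface text of the entity, with surrounding whitespace removed.
record["text"] = "".join(ne.itertext()).strip()
# Name parts as (type, sex, text) tuples.
record["parts"] = [(p.get("type"), p.get("sex"), p.text)
                   for p in ne.findall("ns:nePart", NS)]

print(record)
```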

Belongs to element classes:

model.nameLike

Content model

 

<rng:ref name="macro.phraseSeq"></rng:ref>

Attributes

 

Name Description
type Named entity type. Data type: text

Possible values: person, location, organisation, misc

nymRef Reference to the “nym” element in the header. Data type: text
gloss For persons, this defines the role of the character; for other entities, it is a subcategorisation. Data type: text
structure This (redundant) attribute gives the internal structure of a person name, such as forename_surname. Data type: text
normalizedForm Form of the entity without punctuation or genitive “s”, uppercased. Data type: text
resolution Does the named entity refer to a plot-internal or plot-external concept? Data type: text

Possible values: plotInternal, plotExternal

 

Element nePart

Namespace: http://www.namescape.nl/

Person name part (forename, surname, addname)

Belongs to element classes:

model.persNamePart

Content model

<rng:ref name="macro.phraseSeq"></rng:ref>

Attributes
Name Description
type Part type (forename, surname, addname). Data type: text

Possible values: surname, forename, addname

sex Sex of the person referred to; only used with forenames. Data type: text

Possible values: male, female, unknown

surnameType “modern” means an established, registered surname according to modern usage; “historical” is for surname-like (geonymic, patronymic) designations predating modern practice; “collective” is for usage like “the Clintons”. Data type: text

Possible values: modern, historical, collective

normalizedForm Form of the entity part without punctuation or genitive “s”, uppercased. Data type: text

 

Element nym

 

“nym” is a TEI element. The listNym (in the sourceDesc of the header) enumerates the different entities found in the text, serving as a small lexicon of the names in the text. We add a few Namescape-specific attributes.


Example entry:
<nym ns:id="nym7" ns:resolution="plotInternal" ns:gloss="MAIN CHARACTER" ns:type="person">
  <usg type="frequency">531</usg>
  <form type="nym">MICHIEL VAN BEUSEKOM</form>
  <form type="witnessed">
    <orth type="original">Michiel</orth>
    <orth type="normalized">MICHIEL</orth>
    <usg type="frequency">501</usg>
  </form>
  <form type="witnessed">
    <orth type="original">Michiels</orth>
    <orth type="normalized">MICHIEL</orth>
    <usg type="frequency">25</usg>
  </form>
  <form type="witnessed">
    <orth type="original">Michiel van Beusekom</orth>
    <orth type="normalized">MICHIEL VAN BEUSEKOM</orth>
    <usg type="frequency">4</usg>
  </form>
  <form type="witnessed">
    <orth type="original">v.B.</orth>
    <orth type="normalized">V.B.</orth>
    <usg type="frequency">1</usg>
  </form>
</nym>
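
The entry above can be processed along the following lines. This is a minimal sketch using Python's standard-library xml.etree.ElementTree; it assumes that the "ns" prefix is bound to http://www.namescape.nl/ and, for brevity, that the elements carry no default namespace (in a real TEI file they would normally be in the TEI namespace). It reads the overall frequency and checks it against the frequencies of the witnessed forms.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.namescape.nl/"

# The example entry, with the "ns" prefix declared so that it parses standalone.
ENTRY = """<nym xmlns:ns="http://www.namescape.nl/" ns:id="nym7"
     ns:resolution="plotInternal" ns:gloss="MAIN CHARACTER" ns:type="person">
  <usg type="frequency">531</usg>
  <form type="nym">MICHIEL VAN BEUSEKOM</form>
  <form type="witnessed"><orth type="original">Michiel</orth>
    <orth type="normalized">MICHIEL</orth><usg type="frequency">501</usg></form>
  <form type="witnessed"><orth type="original">Michiels</orth>
    <orth type="normalized">MICHIEL</orth><usg type="frequency">25</usg></form>
  <form type="witnessed"><orth type="original">Michiel van Beusekom</orth>
    <orth type="normalized">MICHIEL VAN BEUSEKOM</orth><usg type="frequency">4</usg></form>
  <form type="witnessed"><orth type="original">v.B.</orth>
    <orth type="normalized">V.B.</orth><usg type="frequency">1</usg></form>
</nym>"""

nym = ET.fromstring(ENTRY)

# Namespaced attributes are addressed as "{namespace-uri}localname".
nym_id = nym.get("{%s}id" % NS_URI)

# The usg element that is a direct child of nym holds the overall frequency.
total = int(nym.find("usg[@type='frequency']").text)

# Map each witnessed original spelling to its own frequency.
witnessed = {
    form.find("orth[@type='original']").text: int(form.find("usg").text)
    for form in nym.findall("form[@type='witnessed']")
}

print(nym_id, total, sum(witnessed.values()))  # nym7 531 531
```

In this example the frequencies of the witnessed forms (501 + 25 + 4 + 1) add up to the overall frequency of 531.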

Attributes
name namespace description
type http://www.namescape.nl/ Data type: text. Possible values: person, location, organisation, misc
gloss http://www.namescape.nl/ Data type: text
resolution http://www.namescape.nl/ Data type: text

Element p

Attributes
name namespace description
id http://www.politicalmashup.nl Data type:
numTokens http://www.politicalmashup.nl Data type:

 

Element docStats

Namespace: http://www.politicalmashup.nl

Belongs to element classes

model.teiHeaderPart

Content model

<rng:zeroOrMore>
  <rng:choice>
    <rng:ref name="histogram"></rng:ref>
    <rng:ref name="parTokensMedian"></rng:ref>
    <rng:ref name="pagebreaks"></rng:ref>
  </rng:choice>
</rng:zeroOrMore>

 

Element histogram

Namespace: http://www.politicalmashup.nl

Content model

<rng:zeroOrMore>
  <rng:element name="entry" ns="http://www.politicalmashup.nl">
    <rng:attribute name="bin" ns="http://www.politicalmashup.nl">
      <rng:data type="integer"></rng:data>
    </rng:attribute>
    <rng:attribute name="count" ns="http://www.politicalmashup.nl">
      <rng:data type="integer"></rng:data>
    </rng:attribute>
  </rng:element>
</rng:zeroOrMore>
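
To make the content model more concrete, the sketch below reads a histogram instance with Python's standard-library xml.etree.ElementTree. The "pm" prefix and the bin and count values are invented for illustration; the source does not state what a bin represents. Note that, according to the content model, both attributes are in the http://www.politicalmashup.nl namespace, so they must be addressed with a qualified name.

```python
import xml.etree.ElementTree as ET

PM = "http://www.politicalmashup.nl"

# Hypothetical instance conforming to the content model: zero or more
# "entry" elements, each with namespaced integer attributes "bin" and "count".
SAMPLE = """<pm:histogram xmlns:pm="http://www.politicalmashup.nl">
  <pm:entry pm:bin="10" pm:count="3"/>
  <pm:entry pm:bin="20" pm:count="7"/>
</pm:histogram>"""

hist = ET.fromstring(SAMPLE)

# Build a {bin: count} mapping; qualified names use the "{uri}local" form.
bins = {
    int(entry.get("{%s}bin" % PM)): int(entry.get("{%s}count" % PM))
    for entry in hist.findall("{%s}entry" % PM)
}

print(bins)
```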

Attributes

name namespace description
description http://www.politicalmashup.nl Data type:

 

Element parTokensMedian

Namespace: http://www.politicalmashup.nl

Content model

<rng:data type="integer"></rng:data>

 

Element pagebreaks

Namespace: http://www.politicalmashup.nl

Content model

<rng:data type="integer"></rng:data>

 

Element collection

Namespace: http://www.politicalmashup.nl

Belongs to element classes:

model.teiHeaderPart

Content model

<rng:text></rng:text>

 

Element pseudonym-id

Namespace: http://www.politicalmashup.nl

Belongs to element classes:

model.teiHeaderPart

Content model

<rng:text></rng:text>

 

Element cleanParagraphs

Namespace: http://www.politicalmashup.nl

Belongs to element classes:

model.teiHeaderPart

Content model

<rng:text></rng:text>

 

Element genre

Namespace: http://www.politicalmashup.nl

Belongs to element classes:

model.teiHeaderPart

Content model

<rng:text></rng:text>