Tagging of named entities in the Namescape project
As pointed out in, for instance, [van Dalen-Oskam 2012], standard NE tagging is not adequate for a systematic study of the onymic landscape. We therefore define an annotation scheme for named entity information as an extension of the TEI P5 annotation guidelines.
TEI extension for named entities
The main reasons to extend TEI instead of using the existing tagging guidelines for names are:
1. We need all searchable properties of names to be ‘inline’, not standoff. This entails the introduction of several additional attributes for tagged named entity occurrences.
2. We prefer a single tag for named entities and a single (but different) tag for entity parts. TEI offers either a range of tags (persName, geoName, orgName) or the single tag “name”. The latter choice is inconvenient for querying, because tagging name parts leads to nested “name” tags:
<name type="person">
<name type="forename">Jan</name>
<name type="surname">Janssen</name>
</name>
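The querying problem can be illustrated with a small sketch using Python's standard xml.etree.ElementTree. It is only an illustration: the wrapping `p` element and the second fragment, which re-tags the same name with the ne and nePart elements defined later in this document, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.namescape.nl/"

# Plain TEI-style tagging: whole names and name parts share the tag "name".
nested = ET.fromstring(
    '<p><name type="person">'
    '<name type="forename">Jan</name> '
    '<name type="surname">Janssen</name>'
    '</name></p>'
)

# Searching for "name" returns the whole name and each of its parts.
all_names = ["".join(n.itertext()) for n in nested.iter("name")]

# Hypothetical re-tagging of the same fragment with ne / nePart.
tagged = ET.fromstring(
    f'<p xmlns:ns="{NS}"><ns:ne type="person">'
    '<ns:nePart type="forename">Jan</ns:nePart> '
    '<ns:nePart type="surname">Janssen</ns:nePart>'
    '</ns:ne></p>'
)

# Searching for "ne" returns whole entities only, with no extra filtering.
entities = ["".join(n.itertext()) for n in tagged.iter(f"{{{NS}}}ne")]

print(all_names)
print(entities)
```

With nested “name” tags, every query for whole names needs an additional filter on the type attribute; with two distinct tags, the element name alone is sufficient.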
The extension is formally defined in a TEI ODD file [1], cf. the delivered file “namescape.odd.xml”. We include the documentation from the ODD below as a description of the proposed encoding guidelines.
TEI modules
header, core, tei, textstructure, analysis, dictionaries, drama, namesdates, figures, iso-fs
Element ne
Namespace: http://www.namescape.nl/
Named Entity
<ns:ne
xmlns:ns="http://www.namescape.nl/"
type="person"
gloss="MAIN CHARACTER"
structure="forename"
nymRef="nym7"
normalizedForm="MICHIEL"
resolution="plotInternal">
<ns:nePart type="forename" sex="male">Michiel</ns:nePart>
</ns:ne>
Belongs to element classes:
Content model
<rng:ref name="macro.phraseSeq"></rng:ref>
Attributes
Name | Description |
type | Named entity type. Data type: text. Possible values: person, location, organisation, misc |
nymRef | Reference to the “nym” element in the header. Data type: text |
gloss | For persons, this defines the role of the character; for other entities, it is a subcategorisation. Data type: text |
structure | This (redundant) attribute gives the internal structure of a person name, such as forename_surname. Data type: text |
normalizedForm | Form of the entity without punctuation and genitive “s”, in upper case. Data type: text |
resolution | Does the named entity refer to a plot-internal or a plot-external concept? Data type: text. Possible values: plotInternal, plotExternal |
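Because the structure attribute is redundant, it can in principle be recomputed from the nePart children and compared with the stored value. The sketch below assumes that the value is simply the nePart types joined by underscores, in document order. That rule is inferred from the examples “forename” and “forename_surname” and is not stated by the guidelines. The function name derived_structure is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.namescape.nl/"


def derived_structure(ne):
    """Join the type of each nePart child with underscores (assumed rule)."""
    parts = ne.findall(f"{{{NS}}}nePart")
    return "_".join(p.get("type", "") for p in parts)


# Illustrative fragment, not taken from the corpus.
ne = ET.fromstring(
    f'<ns:ne xmlns:ns="{NS}" type="person" structure="forename_surname">'
    '<ns:nePart type="forename">Jan</ns:nePart> '
    '<ns:nePart type="surname">Janssen</ns:nePart>'
    '</ns:ne>'
)

# True when the stored attribute agrees with the recomputed value.
consistent = derived_structure(ne) == ne.get("structure")
print(derived_structure(ne), consistent)
```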
Element nePart
Namespace: http://www.namescape.nl/
Person name part (forename, surname, addname)
Belongs to element classes:
Content model
<rng:ref name="macro.phraseSeq"></rng:ref>
Attributes
Name | Description |
type | Part type. Data type: text. Possible values: surname, forename, addname |
sex | Sex of the person referred to; only with forenames. Data type: text. Possible values: male, female, unknown |
surnameType | “modern” means an established, registered surname according to modern usage; “historical” is for surname-like (geonymic, patronymic) designations predating modern practice; “collective” is for usage like “the Clintons”. Data type: text. Possible values: modern, historical, collective |
normalizedForm | Form of the entity part without punctuation and genitive “s”, in upper case. Data type: text |
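The value lists and the restriction of sex to forenames lend themselves to a simple automatic check. The following is a minimal sketch, not part of the Namescape tooling; the function name check_ne_part and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.namescape.nl/"

# Possible values as listed in the attribute table for nePart.
ALLOWED = {
    "type": {"surname", "forename", "addname"},
    "sex": {"male", "female", "unknown"},
    "surnameType": {"modern", "historical", "collective"},
}


def check_ne_part(part):
    """Return a list of problems found on one nePart element."""
    problems = []
    for attr, allowed in ALLOWED.items():
        value = part.get(attr)
        if value is not None and value not in allowed:
            problems.append(f"{attr}={value!r} is not one of {sorted(allowed)}")
    # The guidelines state that sex occurs only with forenames.
    if part.get("sex") is not None and part.get("type") != "forename":
        problems.append("sex is only expected on forenames")
    return problems


good = ET.fromstring(
    f'<ns:nePart xmlns:ns="{NS}" type="forename" sex="male">Michiel</ns:nePart>'
)
bad = ET.fromstring(
    f'<ns:nePart xmlns:ns="{NS}" type="surname" sex="male">Janssen</ns:nePart>'
)

print(check_ne_part(good))
print(check_ne_part(bad))
```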
Element nym
“Nym” is a TEI element. The listNym (in sourceDescription in the header) enumerates the different entities found in the text, forming something of a small lexicon of the names in the text. We add a few Namescape-specific attributes.
Example entry:
<nym ns:id="nym7" ns:resolution="plotInternal" ns:gloss="MAIN CHARACTER" ns:type="person">
<usg type="frequency">531</usg>
<form type="nym">MICHIEL VAN BEUSEKOM</form>
<form type="witnessed">
<orth type="original">Michiel</orth>
<orth type="normalized">MICHIEL</orth>
<usg type="frequency">501</usg>
</form>
<form type="witnessed">
<orth type="original">Michiels</orth>
<orth type="normalized">MICHIEL</orth>
<usg type="frequency">25</usg>
</form>
<form type="witnessed">
<orth type="original">Michiel van Beusekom</orth>
<orth type="normalized">MICHIEL VAN BEUSEKOM</orth>
<usg type="frequency">4</usg>
</form>
<form type="witnessed">
<orth type="original">v.B.</orth>
<orth type="normalized">V.B.</orth>
<usg type="frequency">1</usg>
</form>
</nym>
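In the example entry the frequencies of the witnessed forms (501, 25, 4 and 1) add up to the overall frequency of 531. Whether this holds for every entry is an assumption, but where it does it offers a cheap consistency check. The sketch below uses an abbreviated copy of the entry; the namespace declaration is added so that the fragment parses on its own, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example entry (normalized forms omitted).
NYM = """
<nym xmlns:ns="http://www.namescape.nl/" ns:id="nym7" ns:type="person">
  <usg type="frequency">531</usg>
  <form type="nym">MICHIEL VAN BEUSEKOM</form>
  <form type="witnessed"><orth type="original">Michiel</orth><usg type="frequency">501</usg></form>
  <form type="witnessed"><orth type="original">Michiels</orth><usg type="frequency">25</usg></form>
  <form type="witnessed"><orth type="original">Michiel van Beusekom</orth><usg type="frequency">4</usg></form>
  <form type="witnessed"><orth type="original">v.B.</orth><usg type="frequency">1</usg></form>
</nym>
"""

nym = ET.fromstring(NYM.strip())

# Overall frequency: the usg element that is a direct child of nym.
total = int(nym.find("usg[@type='frequency']").text)

# Frequency per witnessed form, keyed by the original spelling.
witnessed = {
    form.find("orth[@type='original']").text: int(form.find("usg[@type='frequency']").text)
    for form in nym.findall("form[@type='witnessed']")
}

print(total, sum(witnessed.values()))
```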
Attributes
name | namespace | description |
type | http://www.namescape.nl/ | Data type: text. Possible values: person, location, organisation, misc |
gloss | http://www.namescape.nl/ | Data type: text |
resolution | http://www.namescape.nl/ | Data type: text |
Element p
Attributes
name | namespace | description |
id | http://www.politicalmashup.nl | Data type: |
numTokens | http://www.politicalmashup.nl | Data type: |
Element docStats
Namespace: http://www.politicalmashup.nl
Belongs to element classes
Content model
<rng:zeroOrMore>
<rng:choice>
<rng:ref name="histogram"></rng:ref>
<rng:ref name="parTokensMedian"></rng:ref>
<rng:ref name="pagebreaks"></rng:ref>
</rng:choice>
</rng:zeroOrMore>
Element histogram
Namespace: http://www.politicalmashup.nl
Content model
<rng:zeroOrMore>
<rng:element name="entry" ns="http://www.politicalmashup.nl">
<rng:attribute name="bin" ns="http://www.politicalmashup.nl">
<rng:data type="integer"></rng:data>
</rng:attribute>
<rng:attribute name="count" ns="http://www.politicalmashup.nl">
<rng:data type="integer"></rng:data>
</rng:attribute>
</rng:element>
</rng:zeroOrMore>
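The content model above allows any number of entry elements, each carrying an integer bin and an integer count attribute, all in the politicalmashup namespace. The sketch below mirrors that pattern by hand for illustration; it is not a substitute for validation against the ODD-derived schema. The function names and the sample values are hypothetical, and Python's int() only approximates the lexical space of the schema's integer type.

```python
import xml.etree.ElementTree as ET

PM = "http://www.politicalmashup.nl"


def is_int(value):
    """Rough stand-in for the schema's integer data type."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


def histogram_problems(histogram):
    """Check the children of a histogram element against the content model."""
    problems = []
    for i, child in enumerate(histogram):
        if child.tag != f"{{{PM}}}entry":
            problems.append(f"child {i}: unexpected element {child.tag}")
            continue
        # Both attributes are declared in the politicalmashup namespace.
        for attr in ("bin", "count"):
            if not is_int(child.get(f"{{{PM}}}{attr}")):
                problems.append(f"child {i}: {attr} is missing or not an integer")
    return problems


# Made-up instance; the second entry has a non-integer count.
sample = ET.fromstring(
    f'<pm:histogram xmlns:pm="{PM}">'
    '<pm:entry pm:bin="10" pm:count="3"/>'
    '<pm:entry pm:bin="20" pm:count="many"/>'
    '</pm:histogram>'
)

print(histogram_problems(sample))
```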
Attributes
name | namespace | description |
description | http://www.politicalmashup.nl | Data type: |
Element parTokensMedian
Namespace: http://www.politicalmashup.nl
Content model
<rng:data type="integer"></rng:data>
Element pagebreaks
Namespace: http://www.politicalmashup.nl
Content model
<rng:data type="integer"></rng:data>
Element collection
Namespace: http://www.politicalmashup.nl
Belongs to element classes:
Content model
<rng:text></rng:text>
Element pseudonym-id
Namespace: http://www.politicalmashup.nl
Belongs to element classes:
Content model
<rng:text></rng:text>
Element cleanParagraphs
Namespace: http://www.politicalmashup.nl
Belongs to element classes:
Content model
<rng:text></rng:text>
Element genre
Namespace: http://www.politicalmashup.nl
Belongs to element classes:
Content model
<rng:text></rng:text>