The Namescape NER taggers

Using the Namescape web application

The named entity tagging web application can be accessed at http://ner.namescape.nl/namescape/tagger

The Namescape website accepts files, text or a URL for named entity recognition. Several parameters must be set in the HTML form:

  • Input format: Namescape supports several different input formats. This matters most for file uploads, but can also apply to URLs.
  • Output type: Namescape can provide the result as a link to the tagged file, or output the resulting TEI structure immediately. Keep in mind that the INL does not guarantee persistence of the linked result files for more than a couple of hours.
  • Tagger type: Namescape provides two taggers: the Namescape tagger and the Stanford tagger. Apart from using different algorithms, the two differ in that the Namescape tagger also tags the parts of person names.

Styled output: results can be inspected visually by having the output TEI formatted by an XSLT stylesheet (output as “styled”).

The styled output will highlight named entities, display lists of entities found and offer a simple visualisation of entity cooccurrence.

Web service

Using the Namescape web service

Namescape can also be called as a REST web service that returns responses in XML, so it can be part of a web service tool chain. The URLs for calling the Namescape tagger are as follows:

  • /namescape/text for tagging text.
  • /namescape/url for tagging URLs.
  • /namescape/file for tagging uploaded files.

Each of these expects a number of parameters:

  • input: (mandatory) the text (as a string), URL (as a string) or files (as a multipart/form-data file upload) to process.
  • format: (mandatory) the format of the input. Supported values are:
    • text: plain text
    • html: HTML encoded text
    • epub: EPUB e-book format
    • word: Microsoft Word format (.doc)
    • tei: TEI encoded text
  • output: the format of the output. Supported values are:
    • link: (default) tagger results are provided as links
    • raw: tagger results are provided as TEI code directly
    • styled: tagger results are provided as TEI formatted by an XSLT stylesheet for visual inspection
  • tagger: the tagger to use. Supported values are:
    • namescape: (default) Namescape tagger
    • stanford: Stanford tagger
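
The parameters above map directly onto a query string. The following is a minimal sketch, not part of Namescape itself, of a shell helper that sends a GET request and lets curl URL-encode each value. The function name namescape_get is hypothetical, the SERVER and PORT defaults are copied from the upload example in this document, and it assumes the text and url endpoints accept GET requests, as the example invocations suggest.

```shell
# Hypothetical helper (not part of Namescape): call a Namescape endpoint with GET
# and let curl URL-encode every parameter value.
SERVER="${SERVER:-ner.namescape.nl}"   # assumed host, taken from the upload example
PORT="${PORT:-80}"

namescape_get() {
  # $1 = endpoint ("text" or "url"); remaining arguments are key=value pairs
  endpoint="$1"
  shift
  # Rewrite each key=value argument as "--data-urlencode key=value" (POSIX sh, no arrays).
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --data-urlencode "$1"
    shift
    n=$((n - 1))
  done
  curl --silent --show-error --get "$@" "http://$SERVER:$PORT/namescape/$endpoint"
}

# Example call (performs a network request):
# namescape_get text format=text "input=Amsterdam is een plaats." output=raw
```

Passing the values through --data-urlencode means spaces and punctuation in the input do not have to be escaped by hand.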

Output format

Tagged output is returned as a TEI-encoded, tokenized XML document. The document structure is documented in “Description of the curated resource” and “(GTB woordenboeken codering working paper)”.

Example invocations

Tagging plain text

http://SERVER:PORT/namescape/text?format=text&input=Amsterdam+is+een+plaats.

http://SERVER:PORT/namescape/text?format=text&input=Amsterdam+is+een+plaats.&output=raw

Tagging an HTML page at a specified URL

http://SERVER:PORT/namescape/url?format=html&input=http://www.nrc.nl&output=raw

http://SERVER:PORT/namescape/url?format=html&input=http://www.gutenberg.org/ebooks/10820.html.gen&output=styled
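
When the page to be tagged has a query string of its own, characters such as ? and & in the input value would be read as part of the Namescape request unless they are URL-encoded. A minimal sketch, assuming curl is available: the function name tag_url and the example page address are hypothetical, and the SERVER and PORT defaults are copied from the upload example in this document.

```shell
# Hypothetical helper: tag the HTML page at a given address.
# curl encodes the input value, so "?" and "&" inside it stay part of the page URL.
SERVER="${SERVER:-ner.namescape.nl}"   # assumed host, as in the upload example
PORT="${PORT:-80}"

tag_url() {
  # $1 = address of the page to tag, $2 = output mode (defaults to raw)
  curl --silent --show-error --get \
    --data-urlencode "format=html" \
    --data-urlencode "input=$1" \
    --data-urlencode "output=${2:-raw}" \
    "http://$SERVER:$PORT/namescape/url"
}

# Example call (performs a network request; the page address is made up):
# tag_url "http://www.example.org/page?id=1&lang=nl"
```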

Uploading and tagging a TEI file using curl

FILE=zomer_in_kashmir.xml
SERVER=ner.namescape.nl
PORT=80
curl --form tagger=impact --form "files=@$FILE" --form output=raw --form format=tei "http://$SERVER:$PORT/namescape/file"
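
Because linked result files are only kept for a limited time, it can be convenient to request raw output and store it locally. A minimal sketch under that assumption: the function name tag_file and the .tagged.xml naming are hypothetical, the form field name files is copied from the example above, and the tagger value namescape is the documented default.

```shell
# Hypothetical helper: upload one TEI file and save the raw tagged result next to it.
SERVER="${SERVER:-ner.namescape.nl}"
PORT="${PORT:-80}"

tag_file() {
  # $1 = path to a TEI file, e.g. zomer_in_kashmir.xml
  in="$1"
  out="${in%.xml}.tagged.xml"   # assumed naming scheme for the local copy
  # --fail makes curl return a non-zero status on HTTP errors,
  # so the output path is only printed when the upload succeeded.
  curl --silent --show-error --fail \
    --form "tagger=namescape" \
    --form "format=tei" \
    --form "output=raw" \
    --form "files=@$in" \
    --output "$out" \
    "http://$SERVER:$PORT/namescape/file" && printf '%s\n' "$out"
}

# Example call (performs a network request):
# tag_file zomer_in_kashmir.xml
```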

The underlying named entity taggers

We deploy two distinct named entity taggers trained on the Namescape training corpus. One is the Stanford tagger with default settings; the other is an SVM-based tagger which is set up to use some information from the main target corpus (corpus Sanders).