Using the Namescape web application
The named entity tagging web application can be accessed at http://ner.namescape.nl/namescape/tagger
One can use the Namescape website to submit files, text or a URL for named entity recognition. To do so, several parameters must be set in the HTML form:
- The input format: Namescape supports several different input formats. This is most relevant with file uploads, but may also factor into URLs.
- Output type: Namescape can provide the result as a link to the tagged file, or output the resulting TEI structure immediately. Keep in mind that the INL does not guarantee persistence of the linked result files for more than a couple of hours.
- Tagger type: Namescape provides two taggers: the Namescape tagger and the Stanford tagger. The two taggers use different algorithms; in addition, the Namescape tagger tags the parts of person names.
- Styled output: results can be inspected visually by having the output TEI formatted by an XSLT stylesheet (output type “styled”).
The styled output will highlight named entities, display lists of entities found and offer a simple visualisation of entity cooccurrence.
Web service
Using the Namescape web service
Namescape can also be called as a REST web service that returns responses in XML, so that it can be part of a web service tool chain. The URLs for calling the Namescape tagger are as follows:
· /namescape/text for tagging text.
· /namescape/url for tagging URLs.
· /namescape/file for tagging files.
Each of these expects a number of parameters:
- input: (mandatory) the text (as a string), URL (as a string) or files (as a multipart/form-data file upload) to process.
- format: (mandatory) the format of the input. Supported values are:
  - text: plain text
  - html: HTML encoded text
  - epub: EPUB e-book format
  - word: Microsoft Word format (.doc)
  - tei: TEI encoded text
- output: the format of the output. Supported values are:
  - link: (default) tagger results are provided as links
  - raw: tagger results are provided as TEI code directly
  - styled: tagger results are formatted by an XSLT stylesheet for visual inspection
- tagger: the tagger to use. Supported values are:
  - namescape: (default) Namescape tagger
  - stanford: Stanford tagger
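The parameters above can be assembled into a request URL with a small helper. The following is a minimal sketch and not part of Namescape itself: the function name build_request_url and its arguments are assumptions made for illustration, and the value styled is accepted only because this documentation mentions styled output elsewhere. The sketch covers the text and url endpoints; file uploads need a multipart request instead.

```python
from urllib.parse import urlencode

# Values copied from the parameter list in this documentation.
FORMATS = {"text", "html", "epub", "word", "tei"}
OUTPUTS = {"link", "raw", "styled"}  # "styled" is mentioned under styled output
TAGGERS = {"namescape", "stanford"}


def build_request_url(base, endpoint, input_value, fmt,
                      output="link", tagger="namescape"):
    """Return a GET URL for the /namescape/text or /namescape/url endpoint.

    `base` is the scheme, host and optional port, e.g. "http://SERVER:PORT".
    """
    if endpoint not in ("text", "url"):
        raise ValueError("endpoint must be 'text' or 'url'")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if output not in OUTPUTS:
        raise ValueError(f"unsupported output: {output}")
    if tagger not in TAGGERS:
        raise ValueError(f"unsupported tagger: {tagger}")
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    query = urlencode({
        "format": fmt,
        "input": input_value,
        "output": output,
        "tagger": tagger,
    })
    return f"{base}/namescape/{endpoint}?{query}"
```

Calling build_request_url("http://SERVER:PORT", "text", "Amsterdam is een plaats.", "text", output="raw") yields a URL whose query string starts with format=text&input=Amsterdam+is+een+plaats.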
Output format
Tagged output is returned as a TEI-encoded tokenized XML document. The document structure is documented in “Description of the curated resource” and “(GTB woordenboeken codering working paper)”.
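As an illustration of how such output might be consumed, the sketch below lists the tagged entities in a TEI document. It rests on assumptions that are not confirmed here: that entities are wrapped in TEI name elements carrying a type attribute, that tokens are w elements, and that the standard TEI namespace is used. The SAMPLE document is hypothetical; the referenced documents describe the actual structure, and the element names should be adjusted accordingly.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed (not confirmed) to be used by the tagger output.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical tagger output, made up for this sketch.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<name type="location"><w>Amsterdam</w></name> '
    '<w>is</w> <w>een</w> <w>plaats</w><pc>.</pc>'
    '</p></body></text></TEI>'
)


def extract_entities(tei_xml):
    """Return (type, text) pairs for every assumed <name> element."""
    root = ET.fromstring(tei_xml)
    entities = []
    for name in root.iter(f"{{{TEI_NS}}}name"):
        # Join the tokens inside the entity into one surface string.
        tokens = [w.text for w in name.iter(f"{{{TEI_NS}}}w") if w.text]
        entities.append((name.get("type"), " ".join(tokens)))
    return entities


print(extract_entities(SAMPLE))
```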
Example invocations
Tagging plain text
http://SERVER:PORT/namescape/text?format=text&input=Amsterdam+is+een+plaats.
http://SERVER:PORT/namescape/text?format=text&input=Amsterdam+is+een+plaats.&output=raw
Tagging an HTML page at a specified URL
http://SERVER:PORT/namescape/url?format=html&input=http://www.nrc.nl&output=raw
http://SERVER:PORT/namescape/url?format=html&input=http://www.gutenberg.org/ebooks/10820.html.gen&output=styled
Uploading and tagging a TEI file using curl
FILE=zomer_in_kashmir.xml
SERVER=ner.namescape.nl
PORT=80
curl --form tagger=impact --form files=@$FILE --form output=raw --form format=tei http://$SERVER:$PORT/namescape/file
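The same upload can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions: the helper names encode_multipart and build_upload_request are hypothetical, the form field name files is taken from the curl command above, and the tagger defaults to the documented value namescape. The code only builds the request; nothing is sent until the request is passed to urllib.request.urlopen.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns the body as bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(server, port, path,
                         tagger="namescape", output="raw", fmt="tei"):
    """Build (but do not send) a POST request for /namescape/file."""
    with open(path, "rb") as handle:
        data = handle.read()
    body, content_type = encode_multipart(
        {"tagger": tagger, "output": output, "format": fmt},
        "files",  # field name as used in the curl example
        os.path.basename(path),
        data,
    )
    return urllib.request.Request(
        f"http://{server}:{port}/namescape/file",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send the request, pass the result of build_upload_request("ner.namescape.nl", 80, "zomer_in_kashmir.xml") to urllib.request.urlopen and read the response body, which should contain the TEI result when output is raw.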
The underlying named entity taggers
We deploy two distinct named entity taggers trained on the Namescape training corpus. One is the Stanford tagger with default settings; the other is an SVM-based tagger which is set up to use some information from the main target corpus (corpus Sanders).