Software

ANC Tool

The ANC Tool is a Java program used to process the ANC files with standoff annotations into files with inline annotations--that is, with the annotations merged into the primary data and contained in a single file.

The ANC GrAF Tool can be downloaded in a number of formats

Note: Generating the Mac OS X and Windows executables from the Java jar files is still experimental. If the native executables appear to hang or quit unexpectedly, see the instructions below for running the jar file from the command line with increased memory settings.

Output formats

The following output formats are supported.

Installation

Download the ANCTool and unzip the file to any convenient location.

Running

The ANCTool is an executable Java application that can be started on most operating systems by double-clicking on the jar file. However, it is recommended that the jar file be started from the command line with the following command:

    java -Xmx500M -jar ANCTool-x.y.z-jar.jar

Here x.y.z is the version number (e.g. 1.2.6). The -Xmx500M option sets the maximum amount of memory (here 500 MB) that Java will make available to the ANC Tool. If the ANC Tool appears to hang or quits abruptly, run the jar file from the command line with a larger value (for example, -Xmx1000M).
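To check that a memory setting actually took effect, a small stand-alone Java class can report the heap limit the JVM was started with. This is only a sketch and is not part of the ANC Tool; the class name HeapInfo is made up for illustration.

```java
class HeapInfo {
    // Maximum heap the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running the class with the same -Xmx option that is passed to the ANC Tool shows the limit Java applied. Depending on the JVM, the reported value may be slightly below the requested one.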

The first time you run the ANC Tool you will be asked to select the ANC home directory. This is the root directory of your ANC installation and should include (at least) the ANC's data directory.

Using

Additional information on using the ANC Tool can be found here.


XCES Parser - A Java SAX (like) Parser

The xces-parser package gives programmers and tool developers an easy way to use the ANC files with existing XML tools. For example, the ANCTool above is simply a front end for the xces-parser that uses different org.xml.sax.helpers.DefaultHandler implementations to achieve the desired output.

The classes in the Java package org.xces.standoff.parsers implement (most of) the SAXParserFactory, SAXParser, and XMLReader interfaces to permit SAX-aware applications to process ANC documents with standoff annotations as if they were XML files; there is no need to preprocess the ANC into XML ahead of time. For example, the parser can be used with Saxon to apply XSLT stylesheets directly to an ANC document with standoff annotations.
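As a rough illustration of the handler approach, the sketch below defines a DefaultHandler subclass that counts element names. It is an assumption-laden stand-in: it uses the JDK's default SAXParserFactory on an inline XML string because the factory class names inside org.xces.standoff.parsers are not given here, and the element names doc, s and tok are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    // Element name -> number of start tags seen, in order of first appearance.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    // Parses the XML text with a SAX parser and returns the element counts.
    static Map<String, Integer> count(String xml) {
        try {
            // Assumption: with xces-parser, the factory would instead come from
            // org.xces.standoff.parsers; the JDK default is used here as a stand-in.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical inline-annotated document: two sentences, three tokens.
        System.out.println(count("<doc><s><tok/><tok/></s><s><tok/></s></doc>"));
    }
}
```

Because the handler only depends on the standard SAX callbacks, the same class should work unchanged with any parser that implements the SAX interfaces.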

The jar file can also be run from the command line to merge ANC documents into XML files with inline annotations. The usage is:

     java -jar xces-parser.jar -a -b -c [-milestone|-nest|-discard] [-utf8|-utf16|-ascii] [-offsets] filename.anc
	    

Where:

So, for example, the command:

     java -jar xces-parser.jar -s -hepple -np Chapter1.anc

would create an XML file that included the sentence annotations, Hepple part of speech tags, and the noun phrase annotations.


Character Conversion

Download the Java jar file.

The ANC was processed on a dual processor Macintosh and currently the text files include Unix end of line characters. If you want to process the text files on a Windows system you may need to convert the end of line characters. This program will do a bulk conversion of the end of line characters on a directory of files. The program can also be used to convert between UTF-16 and UTF-8 character encodings at the same time.
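For readers who prefer to do the same conversion in their own code, the following is a minimal in-memory sketch using only the Java standard library. It is not the ANCFileConversion implementation; the class and method names are hypothetical, and it assumes the input is UTF-16 that Java's UTF_16 charset can decode (byte order mark if present, big-endian otherwise).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;

class EolAndEncoding {
    // Replaces every line ending (CRLF, lone CR, or lone LF) with the requested one.
    static String convertLineEndings(String text, String eol) {
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(eol));
    }

    // Decodes UTF-16 bytes, normalizes the line endings, and re-encodes as UTF-8.
    static byte[] utf16ToUtf8(byte[] utf16Bytes, String eol) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return convertLineEndings(text, eol).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unix line endings in, Windows line endings out.
        byte[] input = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_16);
        byte[] output = utf16ToUtf8(input, "\r\n");
        System.out.println(input.length + " bytes in, " + output.length + " bytes out");
    }
}
```

A bulk conversion would apply utf16ToUtf8 to the bytes of each file in a directory and write the result to the output directory.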

Like the ANC Tool and the XCES Parser, the ANCFileConversion jar file can be run from the command line:

    java -jar ANCFileConversion.jar

Using

Select the input and output directories. You may type the directory paths directly into the input fields, or click on the button beside each field to navigate to, and select a directory.

Select the "File pattern filter". This should normally be left at the default (*.txt), but can be changed to process other types of files, including all the files in a directory. The ANCFileConversion utility does not process the contents of the files. For example, if changing the encoding of *.xml files, it will not change the encoding attribute in the XML declaration; that will need to be done separately.
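Updating the declared encoding of a converted XML file is a small text substitution. The sketch below shows one way to do it on the file's text once it has been read into a string; the class and method names are hypothetical and it only handles a declaration at the very start of the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlDeclarationFix {
    // Captures the text up to the opening quote of the encoding value in a
    // leading XML declaration, then the value itself, then the closing quote.
    private static final Pattern ENCODING =
            Pattern.compile("^(<\\?xml[^>]*?encoding=[\"'])[^\"']*([\"'])");

    // Returns the XML text with the declared encoding replaced; text without
    // a leading declaration that names an encoding is returned unchanged.
    static String setDeclaredEncoding(String xml, String encoding) {
        Matcher m = ENCODING.matcher(xml);
        if (!m.find()) {
            return xml;
        }
        return m.replaceFirst("$1" + Matcher.quoteReplacement(encoding) + "$2");
    }

    public static void main(String[] args) {
        String before = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><doc/>";
        System.out.println(setDeclaredEncoding(before, "UTF-8"));
    }
}
```

The substitution changes only the label in the declaration, so it should be applied to files whose bytes have already been converted to the new encoding.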

Select the input and output character encodings and how end of line characters should be handled. If you select ASCII as the output encoding, you will get garbage characters in the output, since it is not possible to map all the characters used in the ANC into the 256 characters provided by 8-bit ASCII. No attempt is made to guess the most appropriate replacement for an un-mappable character.
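Before choosing ASCII, it can be useful to find out how much of a text would be lost. The sketch below uses Java's 7-bit US-ASCII charset as a stand-in for the tool's ASCII option, which is an assumption; Java substitutes a question mark for each un-mappable character, whereas the utility's own output may differ. The class and method names are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // Counts the UTF-16 code units in the text that US-ASCII cannot represent.
    static long countUnmappable(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return text.chars().filter(c -> !encoder.canEncode((char) c)).count();
    }

    // Round-trips the text through US-ASCII; String.getBytes replaces each
    // un-mappable character with the charset's replacement byte ('?').
    static String toAsciiLossy(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String sample = "na\u00efve caf\u00e9"; // contains two non-ASCII letters
        System.out.println(countUnmappable(sample) + " un-mappable characters");
        System.out.println(toAsciiLossy(sample));
    }
}
```

A count of zero means the text can be written as ASCII without loss; otherwise UTF-8 is the safer output encoding.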

Note: Performing the end of line character conversion will not be necessary for most applications. Conversion from UTF-16 to UTF-8 may be required depending on your application's handling of Unicode characters.


GATE Tools

Download the GATE tools [zip | tgz]

GATE is an application framework (among other things) that was used to prepare the ANC. GATE is therefore a natural companion to the ANC for many types of processing tasks. Several plugins are provided that allow GATE to work with ANC documents and to load and save standoff annotations.

To install the plugins, extract the contents of the GateTools archive into GATE's plugins folder, launch GATE, and add the plugins with GATE's "Manage Plugins" dialog. See GATE's documentation for further information on adding custom plugins.

After installation you should find the following items available in GATE:

Documents

Processing Resources

There is no corresponding ANC Load Content processing resource. Since the content of an ANC document is stored as a text file, the content can be loaded into a normal GATE document (remember, the text files are UTF-16).

Using GATE

Detailed instructions on using the ANC resources in GATE can be found here.


UIMA Tools

The ANC's UIMA tools convert ANC standoff annotation files (GrAF) to UIMA's Common Analysis Structure (CAS) files, and export annotations in UIMA CAS format to GrAF.

Instructions on using the UIMAUtils API and the executable UIMAUtils.jar file can be found here.