The ANC Tool is a Java program that converts ANC files with standoff annotations into files with inline annotations (that is, files in which the annotations are merged into the primary data and contained in a single file).
The ANC GrAF Tool can be downloaded in a number of formats.
Note: Generating the Mac OS X and Windows executables from the Java jar files is still experimental. If the native executables appear to hang or quit unexpectedly, see the instructions below for running the jar file from the command line with increased memory settings.
The following output formats are supported.
Download the ANCTool and unzip the file to any convenient location.
The ANCTool is an executable Java application that can be launched on most operating systems by double-clicking the jar file. However, it is recommended that the jar file be started from the command line with the following command:
java -Xmx500M -jar ANCTool-x.y.z-jar.jar
Where x.y.z is the version number (e.g. 1.2.6). The -Xmx500M option increases the amount of memory Java will make available to the ANC Tool. If the ANC Tool appears to hang or quits abruptly, run the jar file from the command line and increase the value given to -Xmx.
The first time you run the ANC Tool you will be asked to select the ANC home directory. This is the root directory of your ANC installation and should include (at least) the ANC's data directory.
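To confirm that an -Xmx setting is actually reaching the JVM on your system, a small stand-alone check like the one below can help. It is not part of the ANC Tool; the class name HeapCheck is hypothetical, and it only reports what the java launcher you are using grants to a program.

```java
// Prints the maximum heap the JVM will try to use, so you can confirm
// that an option such as -Xmx500M was picked up by the java launcher.
final class HeapCheck {

    // Returns the JVM's maximum heap size in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

With Java 11 or later it can be run directly from source, e.g. `java -Xmx500M HeapCheck.java`. On some JVMs the reported value is slightly lower than the -Xmx value, because part of the heap is not counted by maxMemory().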
Additional information on using the ANC Tool can be found here.
With the xces-parser package, programmers and tool developers have an easy way to use the ANC files with existing XML tools. For example, the ANCTool above is simply a front end for the xces-parser that uses different org.xml.sax.helpers.DefaultHandler implementations to produce the desired output.
The classes in the Java package org.xces.standoff.parsers implement (most of) the SAXParserFactory, SAXParser, and XMLReader APIs so that SAX-aware applications can process ANC documents with standoff annotations as if they were XML files; there is no need to convert the ANC to XML ahead of time. For example, the parser can be used with Saxon to apply XSLT stylesheets directly to an ANC document with standoff annotations.
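The DefaultHandler approach described above can be sketched with standard JAXP. The handler below simply counts element names; the class ElementCounter is hypothetical. Because this page does not name the factory class in org.xces.standoff.parsers, the sketch parses an in-memory XML string with whatever SAXParserFactory the JDK selects. Assuming the xces-parser factory is selected instead (for example by setting the standard javax.xml.parsers.SAXParserFactory system property to its class name), the same handler would receive events from an ANC document.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// A SAX handler that counts how often each element name occurs.
final class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is always filled in; localName may be empty when the
        // parser is not namespace-aware (the JAXP default).
        counts.merge(qName, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    // Parses XML text with the SAXParserFactory that JAXP selects.
    static Map<String, Integer> countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts();
    }
}
```

Swapping in a different handler subclass, while keeping the parser setup unchanged, is how one front end can produce several output formats.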
The jar file can also be run from the command line to merge ANC documents into XML files with inline annotations. The usage is:
java -jar xces-parser.jar -a -b -c [-milestone|-nest|-discard] [-utf8|-utf16|-ascii] [-offsets] filename.anc
Where:
So, for example, the command:
java -jar xces-parser.jar -s -hepple -np Chapter1.anc
would create an XML file that includes the sentence annotations, Hepple part-of-speech tags, and noun phrase annotations.
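To run the same merge from a Java program (for instance, over many .anc files), one option is to launch the jar with ProcessBuilder. This is a sketch only: the class XcesCommand and its build method are hypothetical, and the flags are the ones from the example command above.

```java
import java.util.ArrayList;
import java.util.List;

// Assembles and runs an xces-parser command line.
final class XcesCommand {

    // flags are annotation options such as "-s", "-hepple", "-np".
    static List<String> build(String jar, List<String> flags, String ancFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.addAll(flags);
        cmd.add(ancFile);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("xces-parser.jar", List.of("-s", "-hepple", "-np"), "Chapter1.anc");
        // inheritIO() forwards the parser's console output to this process.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```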
Download the Java jar file.
The ANC was processed on a dual-processor Macintosh, and the text files currently use Unix end-of-line characters (LF). If you want to process the text files on a Windows system, you may need to convert the end-of-line characters. This program performs a bulk conversion of the end-of-line characters in a directory of files, and can convert between the UTF-16 and UTF-8 character encodings at the same time.
Like the programs above, the ANCFileConversion jar file can be run from the command line:
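The two operations described above can be sketched for a single file with java.nio. This is not the ANCFileConversion source: the class EolConverter is hypothetical, and it assumes each file fits in memory.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Re-encodes one text file and normalizes its line endings.
final class EolConverter {
    // CRLF is tried first so it is not split into two separate breaks.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    // Replaces every LF, CRLF, or lone CR with the requested separator.
    static String convertEol(String text, String eol) {
        return EOL.matcher(text).replaceAll(Matcher.quoteReplacement(eol));
    }

    // Decodes `in` with `from`, converts line endings, writes `out` with `to`.
    static void convertFile(Path in, Charset from, Path out, Charset to, String eol) throws IOException {
        String text = new String(Files.readAllBytes(in), from);
        Files.write(out, convertEol(text, eol).getBytes(to));
    }
}
```

For example, `convertFile(in, StandardCharsets.UTF_16, out, StandardCharsets.UTF_8, "\r\n")` turns a UTF-16 file with Unix line endings into a UTF-8 file with Windows line endings. Java's UTF_16 decoder honours and strips a byte-order mark; when writing UTF-16, it emits a big-endian byte-order mark.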
java -jar ANCFileConversion.jar
Select the input and output directories. You may type the directory paths directly into the input fields, or click the button beside each field to navigate to and select a directory.
Select the "File pattern filter". This should normally be left at the default (*.txt), but it can be changed to process other types of files, or all of the files in a directory. The ANCFileConversion utility does not process the contents of the files. For example, if you change the encoding of *.xml files, it will not change the encoding attribute in the XML declaration; that must be done separately.
Select the input and output character encodings and how end-of-line characters should be handled. If you select ASCII as the output encoding, you will get garbage characters in the output, since ASCII (a 7-bit encoding with only 128 characters) cannot represent all of the characters used in the ANC. Nor do we make any attempt to guess the most appropriate replacement for an unmappable character.
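The JDK shows the same effect when text is forced into ASCII. The sketch below is illustrative only (the class AsciiCheck is hypothetical, and this page does not say which replacement, if any, ANCFileConversion writes): String.getBytes silently substitutes '?' for each character US-ASCII cannot represent, while CharsetEncoder.canEncode lets you test for that beforehand.

```java
import java.nio.charset.StandardCharsets;

final class AsciiCheck {
    // getBytes(Charset) replaces each unmappable character with the
    // charset's replacement byte, which is '?' for US-ASCII.
    static String roundTripAscii(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    // True only if every character of s exists in US-ASCII.
    static boolean fitsInAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }
}
```

For example, `roundTripAscii("caf\u00e9")` returns `"caf?"`, and `fitsInAscii("caf\u00e9")` returns false.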
Note: Performing the end-of-line conversion will not be necessary for most applications. Conversion from UTF-16 to UTF-8 may be required, depending on how your application handles Unicode characters.
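The "File pattern filter" step above uses glob-style patterns such as *.txt. As a rough illustration of how such a filter selects files, java.nio can apply a glob to the names in one directory. The class PatternFilter is hypothetical, and the sketch does not descend into subdirectories; this page does not say whether ANCFileConversion does.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PatternFilter {
    // Lists the regular files directly inside dir whose names match a
    // glob such as "*.txt", or "*" for every file.
    static List<Path> matching(Path dir, String glob) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```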
Download the GATE tools [zip | tgz]
GATE is an application framework (among other things) that was used to prepare the ANC, so it is a natural companion to the ANC for many types of processing tasks. Several plugins are provided that allow GATE to work with ANC documents and to load and save standoff annotations.
To install the plugins, simply extract the contents of the GateTools archive into GATE's plugins folder, launch GATE, and add the plugins with GATE's "Manage Plugins" dialog. See GATE's documentation for further information on adding custom plugins.
After installation you should find the following items available in GATE:
loadStandoff: A boolean indicating whether any standoff annotations should be loaded. Defaults to false.
standoffASName: The name of the annotation set the standoff annotations will be added to.
standoffAnnotations: A list of standoff annotation types to be loaded.

document: The document the annotations will be added to.
sourceUrl: The URL of the standoff annotation file.
standoffAsName: The annotation set to add the annotations to.

document: The document containing the annotations to save.
destination: The URL of the standoff annotation file to create.
inputASName: The annotation set containing the annotations to be saved.
standoffTags: A list of annotation types to be saved. If this property is left empty, all of the annotations in the input annotation set will be saved.
encoding: The character encoding to use when writing the standoff annotation file. Defaults to UTF-8.
namespace: The namespace to use for the XML document. Defaults to http://www.xces.org/schema/2003
version: The value of the version attribute on the root element of the document. Defaults to 1.0.
schemaLocation: If set, an xsi:schemaLocation attribute is added to the root element of the standoff annotation file.

document: The document to save.
destination: The URL of the file to create.
encoding: The character encoding to use when creating the file. Defaults to UTF-8.
There is no corresponding ANC Load Content processing resource. Since the content of an ANC document is stored as a text file, the content can be loaded into a normal GATE document (remember, the text files are UTF-16).
Detailed instructions on using the ANC resources in GATE can be found here.
The ANC's UIMA tools convert ANC standoff annotation files (GrAF) to UIMA's Common Analysis Structure (CAS) format, and export annotations in UIMA CAS format to GrAF.
Instructions on using the UIMAUtils API and the executable UIMAUtils.jar file can be found here.