This page gives a short description of the XStandoff format. See Stührenberg and Goecke 2008 for an article describing XStandoff's ancestor, the Sekimo Generic Format, and Stührenberg and Jettka 2009 for an article describing XStandoff together with its toolkit.
For demonstration purposes we use a simple example, the single sentence shown in the listing below:
The sun shines brighter.
We then apply two inline annotation levels to this simple example: one for the morpheme structure and a second for the syllable structure.
The inline annotation of the morpheme structure:
<?xml version="1.0" encoding="UTF-8"?>
<morphemes xmlns="http://www.xstandoff.net/morphemes"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.xstandoff.net/morphemes ../xsd/morphemes.xsd">
<m>The</m> <m>sun</m> <m>shine</m><m>s</m> <m>bright</m><m>er</m>.
</morphemes>
This annotation can be easily rendered as the following graphic:

The inline annotation of the syllable structure:
<?xml version="1.0" encoding="UTF-8"?>
<syllables xmlns="http://www.xstandoff.net/syllables"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.xstandoff.net/syllables ../xsd/syllables.xsd">
<s>The</s> <s>sun</s> <s>shines</s> <s>brigh</s><s>ter</s>.
</syllables>
Again, this annotation can be rendered as a graphic:

When we try to combine these two annotation levels we get an overlapping structure which can be seen in the following figure:

These overlaps cannot be represented with inline markup in a single XML instance, because XML elements must be properly nested. Standoff annotation methods, however, can represent them.
XStandoff (like other standoff formats) uses character positions in the primary data to specify where annotation elements begin and end:
  T  h  e     s  u  n     s  h  i  n  e  s     b  r  i  g  h  t  e  r  .
00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24
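The character 'T' ranges from position 0 to 1, the character 'h' from 1 to 2, and so on; a span is thus given by the position before its first character and the position after its last one. We will use this information when constructing an XStandoff (XSF) instance. In addition, character positions make it easy to identify where the two annotations overlap: in this case between positions 20 and 21, which hold the letter 't' (the last letter of the morpheme 'bright' and the first letter of the syllable 'ter').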
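The offset arithmetic can be checked with a short Python sketch. It assumes the units of each annotation level are given as plain lists (the names MORPHEMES and SYLLABLES are hypothetical) and locates each unit in the primary data by searching forward from the end of the previous one. It then reports the regions where a morpheme and a syllable cross, i.e. share characters without one containing the other, which is exactly the situation inline XML cannot nest.

```python
# Primary data of the example sentence.
TEXT = "The sun shines brighter."

# Hypothetical input lists: the units of each annotation level in reading order.
MORPHEMES = ["The", "sun", "shine", "s", "bright", "er"]
SYLLABLES = ["The", "sun", "shines", "brigh", "ter"]


def spans(text, units):
    """Return (unit, start, end) triples with half-open character offsets.

    Each unit is searched for from the end of the previous one, so spaces and
    the final period are skipped. str.index raises ValueError if a unit is
    not found.
    """
    result = []
    pos = 0
    for unit in units:
        start = text.index(unit, pos)
        end = start + len(unit)
        result.append((unit, start, end))
        pos = end
    return result


def crossing_regions(a_spans, b_spans):
    """Return the (start, end) regions where a span of a crosses a span of b.

    Two spans cross if they share at least one character but neither of
    them contains the other; such pairs cannot be nested as XML elements.
    """
    regions = []
    for _, a_start, a_end in a_spans:
        for _, b_start, b_end in b_spans:
            start, end = max(a_start, b_start), min(a_end, b_end)
            a_in_b = b_start <= a_start and a_end <= b_end
            b_in_a = a_start <= b_start and b_end <= a_end
            if start < end and not a_in_b and not b_in_a:
                regions.append((start, end))
    return regions


morph = spans(TEXT, MORPHEMES)
syll = spans(TEXT, SYLLABLES)
print(morph)
print(syll)
print(crossing_regions(morph, syll))  # [(20, 21)]
print(repr(TEXT[20:21]))              # 't'
```

The computed offsets correspond to the start and end attributes of the segment elements in the XStandoff instances that follow.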
We will now construct two XStandoff instances corresponding to the two example annotations. To convert the inline annotations into XStandoff instances we use the inline2XSF XSLT stylesheet from the XStandoff Toolkit (see the download section). The results can be seen in the next two listings.
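The conversion itself is done by the XSLT stylesheet; the underlying idea can be illustrated with a Python sketch using xml.etree.ElementTree. This is not the inline2XSF implementation. It simply counts characters while walking the inline tree (each element's text plus the tails of its children) and assumes that the inline document contains no whitespace beyond the primary data, which is why the example string is written without line breaks. The function name to_segments is hypothetical.

```python
import xml.etree.ElementTree as ET

# The inline morpheme annotation, written without line breaks so that its
# character content is exactly the primary data "The sun shines brighter."
INLINE = ('<morphemes xmlns="http://www.xstandoff.net/morphemes">'
          '<m>The</m> <m>sun</m> <m>shine</m><m>s</m> '
          '<m>bright</m><m>er</m>.</morphemes>')


def to_segments(root):
    """Return (local_name, start, end) for every element, in document order."""
    segments = []
    offset = 0

    def walk(elem):
        nonlocal offset
        slot = len(segments)
        segments.append(None)                # reserve the pre-order position
        start = offset
        offset += len(elem.text or "")       # text before the first child
        for child in elem:
            walk(child)
            offset += len(child.tail or "")  # text following the child
        local_name = elem.tag.split("}")[-1]  # drop the "{namespace}" part
        segments[slot] = (local_name, start, offset)

    walk(root)
    return segments


root = ET.fromstring(INLINE)
text = "".join(root.itertext())
for i, (name, start, end) in enumerate(to_segments(root), start=1):
    print(f'seg{i}: {name} {start}-{end} "{text[start:end]}"')
```

The sketch prints seg1 with the span 0-24 for the root element, followed by seg2 to seg7 with the spans 0-3, 4-7, 8-13, 13-14, 15-21 and 21-23, which are the segments of the morpheme instance shown next.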
The XStandoff representation of the morpheme structure:
<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd" xsfVersion="1.1"
xml:id="sentence_morphemes">
<xsf:primaryData start="0" end="24">
<xsf:primaryDataRef uri="../pd/sentence.txt" mimeType="text/plain"/>
</xsf:primaryData>
<xsf:segmentation>
<xsf:segment xml:id="seg1" type="char" start="0" end="24"/>
<xsf:segment xml:id="seg2" type="char" start="0" end="3"/>
<xsf:segment xml:id="seg3" type="char" start="4" end="7"/>
<xsf:segment xml:id="seg4" type="char" start="8" end="13"/>
<xsf:segment xml:id="seg5" type="char" start="13" end="14"/>
<xsf:segment xml:id="seg6" type="char" start="15" end="21"/>
<xsf:segment xml:id="seg7" type="char" start="21" end="23"/>
</xsf:segmentation>
<xsf:annotation>
<xsf:level xml:id="sentence_morphemes_l">
<xsf:layer xmlns:m="http://www.xstandoff.net/morphemes"
xsi:schemaLocation="http://www.xstandoff.net/morphemes ../xsd/morphemes.xsd"
priority="0">
<m:morphemes xsf:segment="seg1">
<m:m xsf:segment="seg2"/>
<m:m xsf:segment="seg3"/>
<m:m xsf:segment="seg4"/>
<m:m xsf:segment="seg5"/>
<m:m xsf:segment="seg6"/>
<m:m xsf:segment="seg7"/>
</m:morphemes>
</xsf:layer>
</xsf:level>
</xsf:annotation>
</xsf:corpusData>
The XStandoff representation of the syllable structure:
<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd" xsfVersion="1.1"
xml:id="sentence_syllables">
<xsf:primaryData start="0" end="24">
<xsf:primaryDataRef uri="../pd/sentence.txt" mimeType="text/plain"/>
</xsf:primaryData>
<xsf:segmentation>
<xsf:segment xml:id="seg1" type="char" start="0" end="24"/>
<xsf:segment xml:id="seg2" type="char" start="0" end="3"/>
<xsf:segment xml:id="seg3" type="char" start="4" end="7"/>
<xsf:segment xml:id="seg4" type="char" start="8" end="14"/>
<xsf:segment xml:id="seg5" type="char" start="15" end="20"/>
<xsf:segment xml:id="seg6" type="char" start="20" end="23"/>
</xsf:segmentation>
<xsf:annotation>
<xsf:level xml:id="sentence_syllables_s">
<xsf:layer xmlns:s="http://www.xstandoff.net/syllables"
xsi:schemaLocation="http://www.xstandoff.net/syllables ../xsd/syllables.xsd"
priority="0">
<s:syllables xsf:segment="seg1">
<s:s xsf:segment="seg2"/>
<s:s xsf:segment="seg3"/>
<s:s xsf:segment="seg4"/>
<s:s xsf:segment="seg5"/>
<s:s xsf:segment="seg6"/>
</s:syllables>
</xsf:layer>
</xsf:level>
</xsf:annotation>
</xsf:corpusData>
In the next step, we combine these two XStandoff instances by means of the mergeXSF XSLT stylesheet. This is possible since both instances use the same primary data.
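What merging means for the segmentation can be sketched in Python. The procedure below is an assumption, not the mergeXSF code: identical spans are shared, the unique spans are ordered by start position (longer spans first) and renumbered, and every original segment ID is mapped to its new ID so that the xsf:segment references of each layer can be rewritten. With this ordering the sketch happens to reproduce the segment IDs of the merged listing. The names merge_segments, MORPH_SEGS and SYLL_SEGS are hypothetical.

```python
# Segment tables (xml:id -> (start, end)) of the two instances above.
MORPH_SEGS = {"seg1": (0, 24), "seg2": (0, 3), "seg3": (4, 7), "seg4": (8, 13),
              "seg5": (13, 14), "seg6": (15, 21), "seg7": (21, 23)}
SYLL_SEGS = {"seg1": (0, 24), "seg2": (0, 3), "seg3": (4, 7), "seg4": (8, 14),
             "seg5": (15, 20), "seg6": (20, 23)}


def merge_segments(*tables):
    """Merge segment tables that refer to the same primary data.

    Returns the merged table (new_id -> span) and one dictionary per input
    table that maps each old segment ID to its new ID.
    """
    # Identical spans collapse into one entry; sort by start, longer spans first.
    unique = sorted({span for table in tables for span in table.values()},
                    key=lambda span: (span[0], -span[1]))
    new_ids = {span: f"seg{i}" for i, span in enumerate(unique, start=1)}
    merged = {new_id: span for span, new_id in new_ids.items()}
    remaps = [{old_id: new_ids[span] for old_id, span in table.items()}
              for table in tables]
    return merged, remaps


merged, (morph_map, syll_map) = merge_segments(MORPH_SEGS, SYLL_SEGS)
print(len(MORPH_SEGS) + len(SYLL_SEGS), "->", len(merged))  # 13 -> 10
print(morph_map)
print(syll_map)
```

Rewriting a layer then only means replacing each xsf:segment value with its entry in the corresponding mapping; for example, the morpheme "er" moves from seg7 to seg10.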
The resulting XStandoff instance:
<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd"
xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xml:id="c1" xsfVersion="1.1">
<xsf:primaryData start="0" end="24" xml:lang="en">
<textualContent>The sun shines brighter.</textualContent>
</xsf:primaryData>
<xsf:segmentation>
<xsf:segment xml:id="seg1" type="char" start="0" end="24"/>
<xsf:segment xml:id="seg2" type="char" start="0" end="3"/>
<xsf:segment xml:id="seg3" type="char" start="4" end="7"/>
<xsf:segment xml:id="seg4" type="char" start="8" end="14"/>
<xsf:segment xml:id="seg5" type="char" start="8" end="13"/>
<xsf:segment xml:id="seg6" type="char" start="13" end="14"/>
<xsf:segment xml:id="seg7" type="char" start="15" end="21"/>
<xsf:segment xml:id="seg8" type="char" start="15" end="20"/>
<xsf:segment xml:id="seg9" type="char" start="20" end="23"/>
<xsf:segment xml:id="seg10" type="char" start="21" end="23"/>
</xsf:segmentation>
<xsf:annotation>
<xsf:level xml:id="l_morph">
<xsf:layer xmlns:m="http://www.xstandoff.net/morphemes"
xsi:schemaLocation="http://www.xstandoff.net/morphemes ../xsd/morphemes.xsd" priority="0">
<m:morphemes xsf:segment="seg1">
<m:m xsf:segment="seg2"/>
<m:m xsf:segment="seg3"/>
<m:m xsf:segment="seg5"/>
<m:m xsf:segment="seg6"/>
<m:m xsf:segment="seg7"/>
<m:m xsf:segment="seg10"/>
</m:morphemes>
</xsf:layer>
</xsf:level>
<xsf:level xml:id="l_syll">
<xsf:layer xmlns:s="http://www.xstandoff.net/syllables"
xsi:schemaLocation="http://www.xstandoff.net/syllables ../xsd/syllables.xsd" priority="1">
<s:syllables xsf:segment="seg1">
<s:s xsf:segment="seg2"/>
<s:s xsf:segment="seg3"/>
<s:s xsf:segment="seg4"/>
<s:s xsf:segment="seg8"/>
<s:s xsf:segment="seg9"/>
</s:syllables>
</xsf:layer>
</xsf:level>
</xsf:annotation>
</xsf:corpusData>
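As can be seen, the merged instance contains 10 instead of 13 (7 + 6) segment elements, because the segments for the whole sentence, for 'The', and for 'sun' are shared by both annotation levels. XStandoff makes heavy use of XML's built-in ID/IDREF(S) mechanism: each segment carries an xml:id, and annotation elements refer to segments through the xsf:segment attribute. This way a segment of the primary data can be connected to one or several annotation layers.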
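Following these ID/IDREF links requires nothing beyond a standard XML library. The Python sketch below applies xml.etree.ElementTree to an abridged copy of the merged instance (the xsi:schemaLocation attributes are left out). It builds a table from segment IDs to character spans and then, for every annotation level, replaces the xsf:segment reference of each leaf element with the primary-data text it points to. The function name resolve_levels is hypothetical.

```python
import xml.etree.ElementTree as ET

XSF = "http://www.xstandoff.net/2009/xstandoff/1.1"
NS = {"xsf": XSF}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # xml:id in Clark notation
SEGMENT_REF = f"{{{XSF}}}segment"                     # the xsf:segment attribute

# Abridged copy of the merged instance above.
MERGED = """\
<xsf:corpusData xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
    xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xml:id="c1" xsfVersion="1.1">
  <xsf:primaryData start="0" end="24" xml:lang="en">
    <textualContent>The sun shines brighter.</textualContent>
  </xsf:primaryData>
  <xsf:segmentation>
    <xsf:segment xml:id="seg1" type="char" start="0" end="24"/>
    <xsf:segment xml:id="seg2" type="char" start="0" end="3"/>
    <xsf:segment xml:id="seg3" type="char" start="4" end="7"/>
    <xsf:segment xml:id="seg4" type="char" start="8" end="14"/>
    <xsf:segment xml:id="seg5" type="char" start="8" end="13"/>
    <xsf:segment xml:id="seg6" type="char" start="13" end="14"/>
    <xsf:segment xml:id="seg7" type="char" start="15" end="21"/>
    <xsf:segment xml:id="seg8" type="char" start="15" end="20"/>
    <xsf:segment xml:id="seg9" type="char" start="20" end="23"/>
    <xsf:segment xml:id="seg10" type="char" start="21" end="23"/>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level xml:id="l_morph">
      <xsf:layer xmlns:m="http://www.xstandoff.net/morphemes" priority="0">
        <m:morphemes xsf:segment="seg1">
          <m:m xsf:segment="seg2"/><m:m xsf:segment="seg3"/>
          <m:m xsf:segment="seg5"/><m:m xsf:segment="seg6"/>
          <m:m xsf:segment="seg7"/><m:m xsf:segment="seg10"/>
        </m:morphemes>
      </xsf:layer>
    </xsf:level>
    <xsf:level xml:id="l_syll">
      <xsf:layer xmlns:s="http://www.xstandoff.net/syllables" priority="1">
        <s:syllables xsf:segment="seg1">
          <s:s xsf:segment="seg2"/><s:s xsf:segment="seg3"/>
          <s:s xsf:segment="seg4"/><s:s xsf:segment="seg8"/>
          <s:s xsf:segment="seg9"/>
        </s:syllables>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>"""


def resolve_levels(xml_string):
    """Map each level's xml:id to the texts covered by its leaf annotation elements."""
    root = ET.fromstring(xml_string)
    text = root.find("xsf:primaryData/xsf:textualContent", NS).text
    spans = {seg.get(XML_ID): (int(seg.get("start")), int(seg.get("end")))
             for seg in root.iterfind("xsf:segmentation/xsf:segment", NS)}
    result = {}
    for level in root.iterfind("xsf:annotation/xsf:level", NS):
        units = []
        for elem in level.iter():
            ref = elem.get(SEGMENT_REF)
            # Leaves only: the root element of each layer spans the whole sentence.
            if ref is not None and len(elem) == 0:
                start, end = spans[ref]  # a KeyError here means a dangling IDREF
                units.append(text[start:end])
        result[level.get(XML_ID)] = units
    return result


for level_id, units in resolve_levels(MERGED).items():
    print(level_id, units)
```

Because both levels point into the same segment table, the shared segments seg2 and seg3 are resolved once and used by both the morpheme and the syllable layer.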
Each XStandoff instance can store several annotation levels, each of which can in turn contain several annotation layers (cf. the examples section).
Metadata can be applied at various locations throughout the document:
<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd"
xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xml:id="c1" xsfVersion="1.1">
<xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/" xmlns="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/ ../xsd/meta/olac.xsd">
<olac:olac>
<creator>Maik Stührenberg</creator>
<date>2009-02-19</date>
<description>Example sentence "The sun shines brighter" annotated with morphemes and syllables.</description>
</olac:olac>
</xsf:meta>
<xsf:primaryData start="0" end="24" xml:lang="en">
<textualContent>The sun shines brighter.</textualContent>
</xsf:primaryData>
<xsf:segmentation>
<xsf:segment xml:id="seg1" type="char" start="0" end="24"/>
<xsf:segment xml:id="seg2" type="char" start="0" end="3"/>
<xsf:segment xml:id="seg3" type="char" start="4" end="7"/>
<xsf:segment xml:id="seg4" type="char" start="8" end="14"/>
<xsf:segment xml:id="seg5" type="char" start="8" end="13"/>
<xsf:segment xml:id="seg6" type="char" start="13" end="14"/>
<xsf:segment xml:id="seg7" type="char" start="15" end="21"/>
<xsf:segment xml:id="seg8" type="char" start="15" end="20"/>
<xsf:segment xml:id="seg9" type="char" start="20" end="23"/>
<xsf:segment xml:id="seg10" type="char" start="21" end="23"/>
</xsf:segmentation>
<xsf:annotation>
<xsf:level xml:id="l_morph">
<xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/" xmlns="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/
../xsd/meta/olac.xsd">
<olac:olac>
<creator>Maik Stührenberg</creator>
<date>2009-02-19</date>
<description>Morpheme annotation. Manually annotated.</description>
</olac:olac>
</xsf:meta>
<xsf:layer xmlns:m="http://www.xstandoff.net/morphemes"
xsi:schemaLocation="http://www.xstandoff.net/morphemes ../xsd/morphemes.xsd" priority="0">
<m:morphemes xsf:segment="seg1">
<m:m xsf:segment="seg2"/>
<m:m xsf:segment="seg3"/>
<m:m xsf:segment="seg5"/>
<m:m xsf:segment="seg6"/>
<m:m xsf:segment="seg7"/>
<m:m xsf:segment="seg10"/>
</m:morphemes>
</xsf:layer>
</xsf:level>
<xsf:level xml:id="l_syll">
<xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/" xmlns="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/
../xsd/meta/olac.xsd">
<olac:olac>
<creator>Maik Stührenberg</creator>
<date>2009-02-19</date>
<description>Syllable annotation. Manually annotated.</description>
</olac:olac>
</xsf:meta>
<xsf:layer xmlns:s="http://www.xstandoff.net/syllables"
xsi:schemaLocation="http://www.xstandoff.net/syllables ../xsd/syllables.xsd" priority="1">
<s:syllables xsf:segment="seg1">
<s:s xsf:segment="seg2"/>
<s:s xsf:segment="seg3"/>
<s:s xsf:segment="seg4"/>
<s:s xsf:segment="seg8"/>
<s:s xsf:segment="seg9"/>
</s:syllables>
</xsf:layer>
</xsf:level>
</xsf:annotation>
</xsf:corpusData>
XStandoff's meta element is a wrapper for metadata whose structure is defined in an external schema file (or by no schema at all). In this respect it resembles the layer element, which likewise wraps annotation elements defined in external schemas.
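Because the content of meta comes from a foreign namespace, a processor can treat it generically. The following Python sketch (the function name collect_descriptions is hypothetical) looks for an xsf:meta child on every element and reads the Dublin Core description from it, keyed by the xml:id of the element the metadata is attached to. It runs on an abridged version of the listing above that keeps only the metadata, so the string is not meant to validate against the XStandoff schema.

```python
import xml.etree.ElementTree as ET

NS = {"xsf": "http://www.xstandoff.net/2009/xstandoff/1.1",
      "dc": "http://purl.org/dc/elements/1.1/"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the listing above: segmentation and layers are omitted.
DOC = """\
<xsf:corpusData xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
    xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xml:id="c1" xsfVersion="1.1">
  <xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/"
      xmlns="http://purl.org/dc/elements/1.1/">
    <olac:olac>
      <description>Example sentence "The sun shines brighter" annotated with morphemes and syllables.</description>
    </olac:olac>
  </xsf:meta>
  <xsf:annotation>
    <xsf:level xml:id="l_morph">
      <xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/"
          xmlns="http://purl.org/dc/elements/1.1/">
        <olac:olac><description>Morpheme annotation. Manually annotated.</description></olac:olac>
      </xsf:meta>
    </xsf:level>
    <xsf:level xml:id="l_syll">
      <xsf:meta xmlns:olac="http://www.language-archives.org/OLAC/1.0/"
          xmlns="http://purl.org/dc/elements/1.1/">
        <olac:olac><description>Syllable annotation. Manually annotated.</description></olac:olac>
      </xsf:meta>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>"""


def collect_descriptions(xml_string):
    """Map the xml:id of each element carrying xsf:meta to its dc:description."""
    root = ET.fromstring(xml_string)
    found = {}
    for owner in root.iter():
        meta = owner.find("xsf:meta", NS)  # direct child only
        if meta is None:
            continue
        # The wrapped metadata is foreign content, so search it at any depth.
        description = meta.find(".//dc:description", NS)
        if description is not None:
            found[owner.get(XML_ID)] = description.text
    return found


for owner_id, text in collect_descriptions(DOC).items():
    print(owner_id, "->", text)
```

The same pattern works for any metadata vocabulary; only the namespace and element name passed to find would change.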
An XStandoff instance can be used in two different ways: for storing a whole corpus, or for storing single corpus entries that are listed in a separate corpus definition file. The following listing shows the first variant:
<corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
<corpusData xml:id="c1" xsfVersion="1.1">
<!-- [...] -->
</corpusData>
<corpusData xml:id="c2" xsfVersion="1.1">
<!-- [...] -->
</corpusData>
</corpus>
In this case the root element of an XStandoff instance is not the corpusData element but the corpus element.
The more common approach is to use a set of files: one for the corpus definition and at least one more for the individual corpus entries (instances with a corpusData root element). The following listing shows the corpus definition file:
<corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
<corpusDataRef xml:id="c1" uri="c1.xml" mime-type="text/xml" encoding="UTF-8" xsfVersion="1.1"/>
<corpusDataRef xml:id="c2" uri="c2.xml" mime-type="text/xml" encoding="UTF-8" xsfVersion="1.1"/>
</corpus>
Note that corpusDataRef elements are used here instead of corpusData elements.
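A processor that accepts both variants has to distinguish embedded corpusData elements from corpusDataRef pointers. The Python sketch below (the function name corpus_entries is hypothetical) does this for a corpus element. It assumes the lower-case namespace URI used by the other listings on this page, and it resolves each uri attribute relative to the directory of the corpus definition file, which is an assumption rather than documented XStandoff behavior.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

XSF = "http://www.xstandoff.net/2009/xstandoff/1.1"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Variant 1: the whole corpus in one file (entry content elided as in the listing).
INLINE_CORPUS = """\
<corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1">
  <corpusData xml:id="c1" xsfVersion="1.1"><!-- [...] --></corpusData>
  <corpusData xml:id="c2" xsfVersion="1.1"><!-- [...] --></corpusData>
</corpus>"""

# Variant 2: a corpus definition file pointing to separate entry files.
REF_CORPUS = """\
<corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1">
  <corpusDataRef xml:id="c1" uri="c1.xml" mime-type="text/xml" encoding="UTF-8" xsfVersion="1.1"/>
  <corpusDataRef xml:id="c2" uri="c2.xml" mime-type="text/xml" encoding="UTF-8" xsfVersion="1.1"/>
</corpus>"""


def corpus_entries(corpus_xml, corpus_dir):
    """List (xml:id, location) for every entry of a corpus element.

    Embedded corpusData entries get the location None; for corpusDataRef
    entries the uri is joined to corpus_dir (an assumption, see above).
    """
    root = ET.fromstring(corpus_xml)
    entries = []
    for child in root:
        if child.tag == f"{{{XSF}}}corpusData":
            entries.append((child.get(XML_ID), None))
        elif child.tag == f"{{{XSF}}}corpusDataRef":
            location = PurePosixPath(corpus_dir) / child.get("uri")
            entries.append((child.get(XML_ID), location))
    return entries


print(corpus_entries(INLINE_CORPUS, "corpus"))
print(corpus_entries(REF_CORPUS, "corpus"))
```

Each referenced file could then be parsed on its own, so large corpora never have to be loaded as a single document.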
XStandoff can be used in several ways: in Stührenberg and Goecke 2008 we describe the use of an XStandoff-annotated corpus for extracting possible antecedent candidates in the field of anaphora resolution. Stührenberg and Jettka 2009 describe the analysis of cross-level relations. Other use cases have been tested as well, such as calculating inter-annotator agreement or comparing the quality of different annotation software (e.g., linguistic parsers). Have a look at the examples section to get an impression.