General

What is Callisto?

Callisto is an annotation tool developed to support linguistic annotation of textual sources for any Unicode-supported language. It is written in Java. The initial development of the tool by the MITRE Corporation was funded by the U.S. government.

Where do I get help on Callisto?

This FAQ answers some frequently asked questions. If you do not find the answer to your question here, consider joining the callisto-users-list and posting your question there. The Callisto developers are subscribed as well.

What system requirements does Callisto have?

Callisto requires Java version 1.4 or later to run. Note that there is an incompatibility with Java 1.5 for certain tables in some tasks (ace2004 in particular). If you are affected, please continue to use Java 1.4 until it can be fixed.

Sun Microsystems provides: http://java.sun.com/j2se/1.4/download.html.

As of 2004-01-27, Sun's latest version is 1.4.2_03. When downloading, you want the J2SE JRE (Java 2 Standard Edition, Java Runtime Environment; see below).

Apple provides: http://www.apple.com/java/.

As of 2004-01-27, Apple's latest version is 1.4.1, Update 1. It requires Mac OS X v10.2.6 or later. Callisto is only minimally tested on Macintosh and is known to have problems.

IBM provides: http://www.ibm.com/developerworks/java/jdk/index.html.

As of 2004-01-27, IBM's latest version is 1.4.1. Callisto is untested on IBM's implementation, and we welcome information on your experiences.

Sun's naming and versioning scheme is somewhat complex and confusing. There are three major "Editions" of the Java Platform available (ignore the "2": it refers to a major change in the Java architecture, not a release version):

  • Java 2 Micro Edition ("J2ME")
  • Java 2 Standard Edition ("J2SE")
  • Java 2 Enterprise Edition ("J2EE")

The latter two (J2SE and J2EE) have two variants: a Software Development Kit ("SDK" or "J2SDK") and a Java Runtime Environment ("JRE" or "J2RE"). The SDK was formerly called the Java Development Kit ("JDK").

If you're developing programs, you want the SDK. To just use Java programs, you want the JRE. Sun explains the difference in greater detail at http://java.sun.com/j2se/overview.html.

Using Callisto

Can I annotate files which are not UTF-8 encoded?

Yes. You can specify the character encoding of the signal file (the default is UTF-8) when opening or importing it. If you choose the wrong encoding, you may see your text in the wrong font, or some characters will look meaningless (though this can also be caused by using a font that does not have all the characters in the text). You can re-read the file in a different encoding by selecting a different "Character Encoding" from the "Format" menu.
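
To see why a wrong encoding makes characters look meaningless, here is a minimal Python sketch. It is an illustration only and not part of Callisto; the sample text and the choice of Latin-1 as the wrong encoding are assumptions made for the example.

```python
# Illustration only: the sample text and encodings are assumptions.
text = "café"
raw = text.encode("utf-8")      # the bytes as they would be stored on disk

wrong = raw.decode("latin-1")   # wrong guess: each byte becomes its own character
right = raw.decode("utf-8")     # correct guess: the original text comes back

print(wrong)                    # cafÃ©
print(right)                    # café
print(len(wrong), len(right))   # 5 4
```

Note that the wrongly decoded text is also one character longer, so character offsets computed against it would not match the original.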

Why are tag offsets in my colleague's file wrong?

This is almost always caused by a program automatically changing the new-line characters while the files are being exchanged.

The cause of the problem:

Different operating systems use different characters to represent "new-line": some use two characters, while others use only one. With stand-off annotation, if the new-lines in the data files are changed, the annotation file must have all of its offsets updated, or each annotation will be "off by one" for each preceding new-line.
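
The effect can be reproduced with a few lines of Python. This is a sketch under assumptions: the sample text and the single (start, end) offset pair are made up for illustration and do not reflect Callisto's actual file format.

```python
# Signal text as saved with UNIX (one-character) new-lines.
unix_text = "Alice met Bob.\nBob met Carol.\nCarol left.\n"
# The same text after some tool converted it to DOS (two-character) new-lines.
dos_text = unix_text.replace("\n", "\r\n")

# Hypothetical stand-off annotation: character offsets into unix_text.
start = unix_text.index("Carol left")
end = start + len("Carol")

print(repr(unix_text[start:end]))   # 'Carol'
print(repr(dos_text[start:end]))    # '\r\nCar'

# The drift equals the number of new-lines that precede the annotation.
shift = unix_text[:start].count("\n")
print(shift)                                        # 2
print(repr(dos_text[start + shift:end + shift]))    # 'Carol'
```

The annotation is preceded by two new-lines, so in the converted file it points two characters too early.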

The following means are known to "auto-convert" files:

  • Using FTP to transfer files in ASCII mode
  • Using WinZip to un-zip archives (WinZip's default settings will convert automatically, though you can change that in its preferences)
  • Sending files as text attachments in e-mail. Many e-mail clients will convert when attaching and detaching.

How you can fix it

We've considered several means of automatically correcting the problem in Callisto. Unfortunately, without embedding the original data file in the standoff annotation file, it's impossible to automatically correct all problems.

That said, correction could be as easy as changing all new-lines to DOS or UNIX style. This can be done in several ways.

  • Good text editors can save using different new-lines (e.g. emacs, EditPlus, JEdit, BBEdit)
  • There are several utilities that just change new-lines (e.g. dos2unix, unix2dos, d2u, u2d)
  • If you have Perl, these one-liners will work:
    Conversion     Perl script
    DOS to UNIX    perl -i -pe 's/\x0d\x0a/\x0a/g' <filename>
    UNIX to DOS    perl -i -pe 's/\x0a/\x0d\x0a/g' <filename>
    MAC to DOS     perl -i -pe 's/\x0d/\x0d\x0a/g' <filename>
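
If you have Python rather than Perl, the same conversions can be sketched as follows. The function names convert_newlines and convert_file are our own, chosen for illustration. Unlike the one-liners, this sketch first normalizes every new-line to UNIX style, so it should also be safe on a file whose new-lines are mixed or already in the target style.

```python
def convert_newlines(data: bytes, target: bytes = b"\n") -> bytes:
    """Return data with every DOS, old Mac, or UNIX new-line replaced by target."""
    # Normalize to UNIX first: DOS pairs before lone carriage returns.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix if target == b"\n" else unix.replace(b"\n", target)


def convert_file(path: str, target: bytes = b"\n") -> None:
    """Rewrite the file at path in place with the requested new-line style."""
    # Binary mode keeps Python from translating new-lines on its own.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(convert_newlines(data, target))
```

For example, convert_file("signal.txt") converts to UNIX style and convert_file("signal.txt", b"\r\n") converts to DOS style ("signal.txt" is a placeholder name). As with the Perl scripts, make a backup first, since the file is overwritten.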

How you can prevent it

The most reliable mechanism we have found is to use the "tar" (and optionally "gzip") utilities to archive files before transferring them and to unpack them afterwards. Windows users can get these command line tools with the Cygwin tools.
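
If neither tar nor Cygwin is at hand, Python's standard tarfile module can produce and unpack the same kind of archive. This is a minimal sketch, not something we ship with Callisto; the function names pack and unpack are hypothetical.

```python
import os
import tarfile


def pack(archive_path, file_paths):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            # Store each file under its base name only; contents are copied byte for byte.
            tar.add(path, arcname=os.path.basename(path))


def unpack(archive_path, dest_dir):
    """Extract every member of the archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

Because the archive is handled as binary data, the new-lines inside the signal and annotation files come out exactly as they went in, provided the archive itself is transferred in binary mode.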

Windows users can use WinZip if the preferences are corrected on the machine where the files are unpacked. Open WinZip, and open the "Options->Configuration" menu. Under the "Miscellaneous" tab, in the "Other" group, un-check the "TAR file smart CR/LF conversion" option.