Video Sequences

Click here to access annotated Quicktime movies of ASL sentences. (Click here to download a Quicktime player from Apple.) Uncompressed AVI files for most of these videos are available upon request (send e-mail to athitsos AT cs.bu.edu). Calibration sequences can be found here.

The signing was captured simultaneously from four different cameras, at a frame rate of 60 frames per second and at an image resolution of 648x484 (width x height). The Quicktime movies available on the web are downsampled versions of the original video sequences. In particular, the Quicktime movies are at 30 frames per second, and they have half the width and half the height of the original files, i.e., they are 324x242.

For each sequence recorded, there are up to four different viewpoints available. The filenames are in the format

field1_field2_small_number.format

The ID number of the sequence is always in field2, so for every sequence there are up to four files, which share the same ID in field2. The format extension is .avi for AVI files and .mov for Quicktime files.
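
As a rough illustration of the naming scheme, the sketch below groups a list of filenames by the sequence ID in field2, so that the different viewpoints of one sequence end up together. It assumes that field1 itself contains no underscore; the function names and the example filenames are hypothetical, chosen only to match the pattern described above.

```python
from collections import defaultdict
from pathlib import PurePath


def sequence_id(filename):
    """Return field2 (the sequence ID) of a name shaped like
    field1_field2_small_number.format.

    Assumption: field1 contains no underscore, so field2 is the second
    underscore-separated piece of the name.
    """
    stem = PurePath(filename).stem  # drop any directory and the extension
    fields = stem.split("_")
    if len(fields) < 2:
        raise ValueError(f"unexpected filename: {filename!r}")
    return fields[1]


def group_by_sequence(filenames):
    """Map each sequence ID to the list of files that share it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[sequence_id(name)].append(name)
    return dict(groups)


# Hypothetical filenames, for illustration only.
example = ["abc_0042_small_0.mov", "abc_0042_small_1.mov", "abc_0043_small_0.mov"]
groups = group_by_sequence(example)
```

Here `groups["0042"]` would hold the two files that show sequence 0042 from different viewpoints.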

Frame Format

The image on the right shows a sample frame from one of the sequences. As you can see, each frame contains two parts: The top part is the original image data, captured from one of the cameras. The bottom part (70 pixel rows) contains information about the frame, as follows:
First text row, from left to right:
Second text row, from left to right:
Third row:
Bottom 14 pixel rows

In the AVI files, which are all uncompressed, black pixels have RGB values set to 0, and white pixels have RGB values set to 255. In the Quicktime files the values may have been modified. However, the average intensity value in each square should make it obvious whether the square is supposed to be a "white" or a "black" square.
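
The following is a minimal sketch of the two steps just described: separating the bottom 70 pixel rows from the image data, and deciding from the average intensity whether a square is "white" or "black". It assumes a full-resolution frame that has already been loaded as a list of pixel rows, and grayscale intensities in the range 0 to 255. The threshold of 128 and the function names are assumptions made for illustration, not part of the dataset specification.

```python
INFO_ROWS = 70  # height of the information strip in full-resolution frames


def split_frame(frame):
    """Split a frame (list of pixel rows, top to bottom) into
    (image_rows, info_rows), where info_rows are the bottom 70 rows."""
    if len(frame) <= INFO_ROWS:
        raise ValueError("frame is too short to contain the information strip")
    return frame[:-INFO_ROWS], frame[-INFO_ROWS:]


def is_white_square(pixels, threshold=128):
    """Classify one square from the intensities of the pixels inside it.

    Returns True for a "white" square and False for a "black" one.
    The midpoint threshold is an assumption; compressed Quicktime frames
    may need a different value.
    """
    values = list(pixels)
    if not values:
        raise ValueError("no pixels given")
    return sum(values) / len(values) >= threshold


# A dummy 484-row frame, standing in for real image data.
frame = [[0] * 4 for _ in range(484)]
image_rows, info_rows = split_frame(frame)
```

Averaging over the whole square, rather than testing a single pixel, is what makes the decision tolerant of the value changes introduced by compression.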

Text Stream Format

Each AVI file, in addition to the video stream, which is visible if you play the file using any AVI player, also contains a text stream. This stream should not be confused with the bottom 70 pixel rows of the frames in the video stream, although it contains the same information as those 70 pixel rows.

In the text stream, there is one frame for each frame in the video stream. So, the first text frame contains information about the first video frame and so on. Each frame contains 74 bytes. The information in those bytes is in the following format:

As you may have noted, the text frame contains no information that is not also visible in the bottom 70 rows of the video frame, except for the checksum. Text streams are added to the AVI files just to make that information easier to access. People who download the Quicktime versions of the sequences have to recover all this information by decoding the bottom part of the frames. Any binary information in the text frame is stored in PC byte order (little-endian). To verify that you are using the right byte order, simply check your values against what you see in the video frame.
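
A short sketch of how the text stream might be handled once its raw bytes have been extracted from the AVI file (the extraction itself is not shown). It cuts the stream into 74-byte text frames and decodes a binary value in PC byte order, which on x86 PCs is little-endian. The 4-byte width and the offset passed to `read_uint32_le` are hypothetical, as are the function names; the actual field layout must be taken from the format description.

```python
import struct

TEXT_FRAME_SIZE = 74  # bytes per text frame, one text frame per video frame


def split_text_frames(stream_bytes):
    """Cut the raw text stream into consecutive 74-byte frames."""
    if len(stream_bytes) % TEXT_FRAME_SIZE != 0:
        raise ValueError("stream length is not a multiple of 74 bytes")
    return [
        stream_bytes[i:i + TEXT_FRAME_SIZE]
        for i in range(0, len(stream_bytes), TEXT_FRAME_SIZE)
    ]


def read_uint32_le(text_frame, offset):
    """Decode a 4-byte unsigned integer stored little-endian at `offset`.

    The offset and the 4-byte width are placeholders for illustration.
    """
    return struct.unpack_from("<I", text_frame, offset)[0]


# Synthetic two-frame stream: the second frame starts with the value 1000.
stream = bytes(TEXT_FRAME_SIZE) + struct.pack("<I", 1000) + bytes(TEXT_FRAME_SIZE - 4)
frames = split_text_frames(stream)
value = read_uint32_le(frames[1], 0)
```

If a decoded value does not match what is printed in the bottom rows of the corresponding video frame, the most likely causes are a wrong byte order or a wrong offset.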

For any comments, suggestions, questions, or problems, please don't hesitate to contact Vassilis Athitsos (athitsos@cs.bu.edu).