Video Sequences

Click here to access annotated Quicktime movies of ASL sentences. (Click here to download a Quicktime player from Apple.) Uncompressed AVI files for most of these videos are available upon request (send e-mail to athitsos AT cs.bu.edu). Calibration sequences can be found here.

The signing was captured simultaneously from four different cameras, at a frame rate of 60 frames per second and at an image resolution of 648x484 (width x height). The Quicktime movies available on the web are downsampled versions of the original video sequences. In particular, the Quicktime movies are at 30 frames per second, and they have half the width and half the height of the original files, i.e. they are 312x242.

For each sequence recorded, there are up to four different viewpoints available. The filenames are in the format

field1_field2_small_number.format

The ID number of the sequence is always in field2, so for every sequence there are up to four files, which share the same ID in field2. The format extension is .avi for AVI files and .mov for Quicktime files.
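The grouping of files by sequence can be sketched as follows. This is only an illustration: the function names and the sample filenames are made up, and the code assumes that field1 itself contains no underscore, so that field2 is always the second underscore-separated field.

```python
from collections import defaultdict
from pathlib import Path


def sequence_id(filename):
    """Return field2 (the sequence ID) of a name like field1_field2_small_number.format.

    Assumption: field1 contains no underscore.
    """
    fields = Path(filename).stem.split("_")
    if len(fields) < 2:
        raise ValueError(f"unexpected filename: {filename!r}")
    return fields[1]


def group_by_sequence(filenames):
    """Map each sequence ID to the files (up to four viewpoints) that share it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[sequence_id(name)].append(name)
    return dict(groups)


# Hypothetical filenames, for illustration only.
example_files = [
    "abc_0042_small_1.mov",
    "abc_0042_small_2.mov",
    "xyz_0007_small_1.avi",
]
example_groups = group_by_sequence(example_files)
```

Files that land under the same key are the different viewpoints of one sequence.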

Frame Format

The image on the right shows a sample frame from one of the sequences. As you can see, each frame contains two parts: The top part is the original image data, captured from one of the cameras. The bottom part (70 pixel rows) contains information about the frame, as follows:

First text row, from left to right:

Date on which the frame was recorded.
Time at which the frame was recorded.

Second text row, from left to right:

Number of this frame in the sequence. Note that the first frame is number 1, and the frame numbers increase by 2 and not by 1. The reason is that, as mentioned earlier, these files are downsampled versions of the original sequences, which were recorded at 60 frames per second.
Number of milliseconds from beginning of recording. This number will probably not be of much use to anyone outside our lab. Note that the first frame of the sequence is not necessarily recorded at time 0.

Third row:

Frame ID of this frame. This frame ID is guaranteed to be unique among all frames ever made available on this site. Two frames share the same frame ID if and only if they correspond to multiple viewpoints captured at the same instant. So, in the four AVI files that correspond to the four viewpoints of a sequence, the frame ID numbers are identical and in one-to-one correspondence.

Bottom 14 pixel rows:

In the bottom 14 pixel rows of each frame you will notice some black and white squares. Each square has a size of 7x7 pixels. This is a binary encoding of the frame ID. We use that encoding to make it easy to automatically recover the frame ID of a frame using trivial computer vision techniques. The squares are lined up in two rows of 32 squares each. Each row starts at the left edge of the image and extends 224 pixels to the right (enough for 32 squares). A white square stands for the binary digit 1 and a black square stands for the binary digit 0. The least significant digit is at the bottom right square. Digit significance increases from right to left and from bottom to top. So, for example, the 32nd least significant digit is represented by the leftmost square in the bottom row of squares, and the 33rd by the rightmost square in the top row.

In the AVI files, which are all uncompressed, black pixels have RGB values set to 0, and white pixels have RGB values set to 255. In the Quicktime files the values may have been modified. However, the average intensity value in each square should make it obvious whether the square is supposed to be a "white" or a "black" square.
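The decoding described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the sequences. It assumes the frame has already been converted to grayscale and is given as a list of pixel rows (top to bottom), each a list of intensities from 0 to 255. The names decode_frame_id and encode_strip, and the threshold of 128, are assumptions made for this example.

```python
SQUARE = 7     # each square is 7x7 pixels
PER_ROW = 32   # squares per row
ROWS = 2       # rows of squares (14 pixel rows in total)


def decode_frame_id(gray, threshold=128):
    """Recover the frame ID from the bottom 14 pixel rows of a grayscale frame.

    A square counts as white (1) if its average intensity is at least
    `threshold`; the threshold value is an assumption of this sketch.
    """
    height = len(gray)
    frame_id = 0
    for r in range(ROWS):                  # r = 0 is the bottom row of squares
        y0 = height - (r + 1) * SQUARE
        for c in range(PER_ROW):           # c = 0 is the leftmost square
            x0 = c * SQUARE
            total = sum(gray[y][x]
                        for y in range(y0, y0 + SQUARE)
                        for x in range(x0, x0 + SQUARE))
            if total / (SQUARE * SQUARE) >= threshold:
                # Significance grows right to left, then bottom to top.
                bit = r * PER_ROW + (PER_ROW - 1 - c)
                frame_id |= 1 << bit
    return frame_id


def encode_strip(frame_id, width=PER_ROW * SQUARE):
    """Build a synthetic 14-row strip encoding frame_id (used only for testing)."""
    strip = [[0] * width for _ in range(ROWS * SQUARE)]
    for bit in range(ROWS * PER_ROW):
        if (frame_id >> bit) & 1:
            r, k = divmod(bit, PER_ROW)
            c = PER_ROW - 1 - k
            y0 = len(strip) - (r + 1) * SQUARE
            for y in range(y0, y0 + SQUARE):
                for x in range(c * SQUARE, (c + 1) * SQUARE):
                    strip[y][x] = 255
    return strip
```

Averaging over each whole square, rather than sampling one pixel, is what makes the decoding tolerant of the altered pixel values in the Quicktime files.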

Text Stream Format

Each AVI file, in addition to the video stream, which is visible if you play the file using any AVI player, also contains a text stream. This stream should not be confused with the bottom 70 pixel rows of the frames in the video stream, although it contains the same information as those 70 pixel rows.

In the text stream, there is one frame for each frame in the video stream. So, the first text frame contains information about the first video frame and so on. Each frame contains 74 bytes. The information in those bytes is in the following format:

Bytes 0-19: The date and time the video frame was recorded, in text.
Bytes 20-34: The 32 most significant bits of the number of milliseconds since the beginning of the recording, in text. Note that the first frame was not necessarily recorded at time 0.
Bytes 35-49: The 32 least significant bits of the number of milliseconds since the beginning of the recording, in text.
Bytes 50-53: A checksum of the data in the top 242 rows of the video frame. 4 bytes, in binary.
Bytes 54-57: The time at which the frame was recorded, in binary, encoded as the number of seconds since the first second of the year 1970 (as returned by the standard C function "time").
Bytes 58-61: The 32 most significant bits of the number of milliseconds since the beginning of the recording, in binary.
Bytes 62-65: The 32 least significant bits of the number of milliseconds since the beginning of the recording, in binary.
Bytes 66-73: The frame id of the video frame, in binary.

As you may have noted, there is no information in the text frame that is not visible in the bottom 70 rows of the video frame, except for the checksum. Text streams are added to the AVI files just to make it easier to access that information. People who download the Quicktime versions of the sequences have to recover all this information by decoding the bottom part of the frames. The binary fields in the text frame are stored in PC (little-endian) byte order. To verify that you are using the right byte order, simply check your values against what you see in the video frame.
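As a rough sketch, one 74-byte text frame could be unpacked as shown below. Several things here are assumptions rather than facts from this page: that "PC byte order" means little-endian, that the binary fields are unsigned, that the 8-byte frame ID is a single 64-bit integer, and that the text fields are ASCII padded with null bytes or spaces. The names TEXT_FRAME, TextFrame and parse_text_frame are made up for the example. Extracting the text stream from the AVI file is not shown.

```python
import struct
from collections import namedtuple

# Bytes 0-19, 20-34, 35-49 are text; 50-65 are four 32-bit values;
# 66-73 is the 64-bit frame ID. "<" = little-endian, no padding (assumed).
TEXT_FRAME = struct.Struct("<20s15s15sIIIIQ")

TextFrame = namedtuple(
    "TextFrame",
    "date_time ms_high_text ms_low_text checksum unix_time milliseconds frame_id",
)


def _text(raw):
    """Decode a fixed-width text field, dropping null/space padding (assumed)."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace").strip()


def parse_text_frame(buf):
    """Unpack one 74-byte text frame into its fields."""
    if len(buf) != TEXT_FRAME.size:
        raise ValueError(f"expected {TEXT_FRAME.size} bytes, got {len(buf)}")
    (date_time, hi_text, lo_text,
     checksum, unix_time, ms_high, ms_low, frame_id) = TEXT_FRAME.unpack(buf)
    # Combine the two 32-bit halves into one millisecond count.
    milliseconds = (ms_high << 32) | ms_low
    return TextFrame(_text(date_time), _text(hi_text), _text(lo_text),
                     checksum, unix_time, milliseconds, frame_id)
```

If the parsed frame ID or millisecond count does not match what is printed in the bottom rows of the corresponding video frame, the byte-order or field-width assumptions above are the first things to revisit.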

For any comments, suggestions, questions, problems, please don't hesitate to contact Vassilis Athitsos (athitsos@cs.bu.edu).