Video Sequences
Annotated Quicktime movies of ASL sentences are available on this
site (a Quicktime player can be downloaded from Apple). Uncompressed
AVI files for most of these videos are available upon request (send
e-mail to athitsos AT cs.bu.edu). Calibration sequences are also
available.
The signing was captured simultaneously from four different cameras,
at a frame rate of 60 frames per second and at an image resolution of
648x484 (width x height). The Quicktime movies available on the web
are downsampled versions of the original video sequences. In
particular, the Quicktime movies are at 30 frames per second, and they
have half the width and half the height of the original files,
i.e., they are 324x242.
For each sequence recorded, there are up to four different viewpoints
available. The filenames are in the format
field1_field2_small_number.format
The ID number of the
sequence is always in field2, so for every sequence there are up to
four files, which share the same ID in field2. The format extension is
.avi for AVI files and .mov for Quicktime files.
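Because all viewpoints of a sequence share the same field2 value, the files can be grouped by sequence with a few lines of code. The following is a minimal Python sketch. It assumes, hypothetically, that field1 contains no underscore, so field2 is always the second underscore-separated field of the file name. The function names are illustrative only.

```python
from collections import defaultdict
from pathlib import PurePath

# Extensions named in the text: .avi (AVI files) and .mov (Quicktime files).
VIDEO_EXTENSIONS = {".avi", ".mov"}


def sequence_id(filename):
    """Return field2 of a name shaped like field1_field2_small_number.format.

    Assumption (not stated in the text): field1 contains no underscore, so
    field2 is the second underscore-separated field of the file stem.
    Returns None for names that do not fit that shape.
    """
    path = PurePath(filename)
    if path.suffix.lower() not in VIDEO_EXTENSIONS:
        return None
    fields = path.stem.split("_")
    if len(fields) < 4:
        return None
    return fields[1]


def group_by_sequence(filenames):
    """Group files into {sequence ID: sorted list of its viewpoint files}."""
    groups = defaultdict(list)
    for name in filenames:
        seq = sequence_id(name)
        if seq is not None:
            groups[seq].append(name)
    return {seq: sorted(names) for seq, names in groups.items()}
```

Each resulting list should contain at most four files, one per camera viewpoint.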
Frame Format
The image on the right shows a sample frame from one of the
sequences. As you can see, each frame contains two parts: the top part
is the original image data, captured from one of the cameras. The
bottom part (70 pixel rows) contains information about the frame, as
follows:
- First text row, from left to right:
- Date on which the frame was recorded.
- Time at which the frame was recorded.
- Second text row, from left to right:
- Number of this frame in the sequence. Note that the first frame is
number 1, and the frame numbers increase by 2 and not by 1. The reason
is that, as mentioned earlier, these files are downsampled versions of
the original sequences, which were recorded at 60 frames per second.
- Number of milliseconds since the beginning of the recording. This
number will probably not be of much use to anyone outside our lab.
Note that the first frame of the sequence is not necessarily recorded
at time 0.
- Third text row:
- Frame ID of this frame. This frame ID is guaranteed to be unique
among all frames ever made available on this site. Two frames share
the same frame ID if and only if they correspond to different
viewpoints captured at the same instant. Consequently, the four AVI
files that correspond to the four viewpoints of a sequence contain
identical frame IDs, in one-to-one correspondence.
- Bottom 14 pixel rows:
- The bottom 14 pixel rows of each frame contain black and white
squares, each 7x7 pixels in size. These squares are a binary encoding
of the frame ID, which makes it easy to recover the frame ID of a frame
automatically using simple computer vision techniques. The squares are
lined up in two rows of 32 squares each. Each row starts at the left
edge of the image and extends 224 pixels to the right (32 squares x 7
pixels). A white square stands for the bit 1 and a black square stands
for the bit 0. The least significant bit is the bottom-right square,
and bit significance increases from right to left and from bottom to
top. So, for example, the 32nd least significant bit is the leftmost
square of the bottom row, and the 33rd least significant bit is the
rightmost square of the top row.
In the AVI files, which are all uncompressed, black pixels have RGB values
set to 0, and white pixels have RGB values set to 255. In the
Quicktime files the values may have been modified. However, the
average intensity value in each square should make it obvious whether
the square is supposed to be a "white" or a "black" square.
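The following is a minimal Python sketch of the decoding. It assumes the frame has already been converted to a grayscale image stored as a list of pixel rows (for RGB data, average the three channels first). The 127.5 threshold is an assumption that applies the average-intensity rule above. The names `decode_frame_id` and `render_code_strip` are hypothetical and not part of any tools distributed with the data.

```python
SQUARE = 7                 # each square is 7x7 pixels
BITS_PER_ROW = 32          # two rows of 32 squares encode a 64-bit frame ID
CODE_ROWS = 2 * SQUARE     # the code occupies the bottom 14 pixel rows


def _square_position(bit):
    """(square row, square column) of a bit; square row 1 is the bottom row."""
    square_row = 1 if bit < BITS_PER_ROW else 0
    column = BITS_PER_ROW - 1 - (bit % BITS_PER_ROW)  # bit 0 is rightmost
    return square_row, column


def _square_mean(gray, top, left):
    """Average intensity of the 7x7 square whose top-left pixel is (top, left)."""
    total = 0
    for y in range(top, top + SQUARE):
        total += sum(gray[y][left:left + SQUARE])
    return total / (SQUARE * SQUARE)


def decode_frame_id(gray, threshold=127.5):
    """Recover the 64-bit frame ID from a grayscale frame (list of rows)."""
    height = len(gray)
    if height < CODE_ROWS or len(gray[0]) < BITS_PER_ROW * SQUARE:
        raise ValueError("frame too small to contain the frame ID code")
    frame_id = 0
    for bit in range(2 * BITS_PER_ROW):
        square_row, column = _square_position(bit)
        top = height - CODE_ROWS + square_row * SQUARE
        # A square whose mean is above the threshold counts as white (bit 1).
        if _square_mean(gray, top, column * SQUARE) > threshold:
            frame_id |= 1 << bit
    return frame_id


def render_code_strip(frame_id):
    """Draw the 14x224 black/white strip for frame_id (useful for testing)."""
    strip = [[0] * (BITS_PER_ROW * SQUARE) for _ in range(CODE_ROWS)]
    for bit in range(2 * BITS_PER_ROW):
        if (frame_id >> bit) & 1:
            square_row, column = _square_position(bit)
            for y in range(square_row * SQUARE, (square_row + 1) * SQUARE):
                for x in range(column * SQUARE, (column + 1) * SQUARE):
                    strip[y][x] = 255
    return strip
```

Averaging over the whole square, rather than sampling a single pixel, tolerates the value changes that the Quicktime versions may introduce near square edges.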
Text Stream Format
Each AVI file, in addition to the video stream, which is visible if
you play the file using any AVI player, also contains a text
stream. This stream should not be confused with the bottom 70 pixel
rows of the frames in the video stream, although it contains almost
the same information as those 70 pixel rows.
In the text stream, there is one frame for each frame in the video
stream. So, the first text frame contains information about the first
video frame and so on. Each frame contains 74 bytes. The information
in those bytes is in the following format:
- Bytes 0-19: The date and time the video frame was recorded, in text.
- Bytes 20-34: The 32 most significant bits of the number of
milliseconds since the beginning of the recording, in text. Note that
the first frame was not necessarily recorded at time 0.
- Bytes 35-49: The 32 least significant bits of the number of
milliseconds since the beginning of the recording, in text.
- Bytes 50-53: A checksum of the data in the top 242 rows of the
video frame. 4 bytes, in binary.
- Bytes 54-57: The time at which the frame was recorded, in binary,
encoded as the number of seconds since 00:00:00 UTC on January 1, 1970
(as returned by the standard C function time()).
- Bytes 58-61: The 32 most significant bits of the number of
milliseconds since the beginning of the recording, in binary.
- Bytes 62-65: The 32 least significant bits of the number of
milliseconds since the beginning of the recording, in binary.
- Bytes 66-73: The frame ID of the video frame, in binary.
As you may have noticed, there is no information in the text frame
that is not visible in the bottom 70 rows of the video frame, except
for the checksum. Text streams are added to the AVI files just to make
that information easier to access. People who download the Quicktime
versions of the sequences have to recover all this information by
decoding the bottom part of the frames. All binary fields in the text
frame are stored in PC (little-endian) byte order. To verify that you
are using the right byte order, check your values against what you see
in the video frame.
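The 74-byte layout above maps directly onto a Python `struct` format string with little-endian byte order and no padding. This sketch assumes that the binary fields are unsigned integers and that the text fields are ASCII padded with NUL bytes or spaces. Neither detail is stated above. The names `parse_text_frame` and `TextFrame` are hypothetical. Extracting the text-stream chunks from the AVI container itself is not shown.

```python
import struct
from typing import NamedTuple

# 20s date/time text, 15s ms high (text), 15s ms low (text),
# I checksum, I seconds since 1970, I ms high, I ms low, Q frame ID.
# "<" selects little-endian ("PC byte order") with no alignment padding.
TEXT_FRAME = struct.Struct("<20s15s15sIIIIQ")
assert TEXT_FRAME.size == 74


class TextFrame(NamedTuple):
    date_time: str
    ms_text_high: str
    ms_text_low: str
    checksum: int
    unix_time: int
    milliseconds: int
    frame_id: int


def _text(field):
    # Assumption: text is ASCII, terminated or padded with NULs and/or spaces.
    return field.split(b"\0", 1)[0].decode("ascii", errors="replace").strip()


def parse_text_frame(data):
    """Decode one 74-byte text frame into its fields."""
    if len(data) != TEXT_FRAME.size:
        raise ValueError(f"expected {TEXT_FRAME.size} bytes, got {len(data)}")
    (date_time, ms_hi_text, ms_lo_text, checksum, unix_time,
     ms_hi, ms_lo, frame_id) = TEXT_FRAME.unpack(data)
    return TextFrame(
        date_time=_text(date_time),
        ms_text_high=_text(ms_hi_text),
        ms_text_low=_text(ms_lo_text),
        checksum=checksum,
        unix_time=unix_time,
        # Recombine the two 32-bit halves into one millisecond count.
        milliseconds=(ms_hi << 32) | ms_lo,
        frame_id=frame_id,
    )
```

Comparing `frame_id` with the value decoded from the squares in the corresponding video frame is one way to carry out the byte-order check described above.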
For any comments, suggestions, questions, or problems, please don't
hesitate to contact
Vassilis Athitsos (athitsos@cs.bu.edu).