MovieNet is the first comprehensive dataset that integrates multiple modalities, such as video, audio, and text, to help machines understand complex stories. Its contents include:

- 3,000 hours of video, 3.9 million photos, and 10 million text sentences.
- 1.1 million character bounding boxes with identities.
- 92,000 tags for cinematic style (lighting, camera motion, view scale) and 65,000 tags for action and location.
- Trailers, photos, subtitles, scripts, and plot descriptions, all linked within the dataset.

These annotations support tasks such as:

- Retrieving specific segments of a movie based on story beats.
- Classifying how a film was shot, for example by view scale or camera movement.
- Tracking and identifying actors across different scenes.

The Role of "Verified" Annotations
