UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Topics

UCF101, Playing Musical Instrument, HMDB51, UCF50, Action Classes, Action Recognition Datasets, Action Recognition, Action Recognition Method, Unconstrained Videos, Camera Motion

5,401 Citations

CNN-LSTM Architecture for Action Recognition in Videos

Carlos Ismael Orozco, M. Buemi, J. J. Berlles

Computer Science

2019

A CNN–LSTM architecture in which a pre-trained VGG16 convolutional neural network extracts the features of the input video and an LSTM classifies the video into a particular class.

Spatial Attention Adapted to a LSTM Architecture with Frame Selection for Human Action Recognition in Videos

Carlos Ismael Orozco, M. Buemi, J. J. Berlles

Computer Science

LatinX in AI at International Conference on…

2021

This work proposes an attention mechanism adapted to a CNN–LSTM base architecture that can be used for action recognition in videos and evaluates the performance of the system using accuracy as the evaluation metric.

1

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

João Carreira, Andrew Zisserman

Computer Science

2017 IEEE Conference on Computer Vision and…

2017

A new Two-Stream Inflated 3D ConvNet (I3D), based on 2D ConvNet inflation, is introduced. After pre-training on Kinetics, I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101.

7,014

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng

Computer Science

ArXiv

2024

A novel Cross-modality and Cross-action Modeling (CoCo) framework for zero-shot action recognition (ZSAR) is proposed that significantly outperforms the state of the art on three popular ZSAR benchmarks (Kinetics-ZSAR, UCF101, and HMDB51) under two different learning protocols.

5

Human Action Recognition in Videos using a Robust CNN LSTM Approach

Carlos Ismael Orozco, Eduardo Xamena, M. Buemi, J. J. Berlles

Computer Science

Ciencia y Tecnología

2020

A CNN–LSTM architecture is implemented in which a pre-trained VGG16 convolutional neural network first extracts the features of the input video, and an LSTM then classifies the video into a particular class.

The Kinetics Human Action Video Dataset

W. Kay, João Carreira, Andrew Zisserman

Computer Science

ArXiv

2017

The dataset, its statistics, and how it was collected are described, and some baseline performance figures are given for neural network architectures trained and tested for human action classification on this dataset.

3,302

Video Action Transformer Network

Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman

Computer Science

2019 IEEE/CVF Conference on Computer Vision and…

2019

The Action Transformer model for recognizing and localizing human actions in video clips is introduced and it is shown that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others.

TaiChi: A Fine-Grained Action Recognition Dataset

Shan Sun, Feng Wang, Qi Liang, Liang He

Computer Science

ICMR

2017

TaiChi consists of unconstrained user-uploaded web videos containing camera motion and partial occlusions which pose new challenges to fine-grained action recognition compared to the existing datasets.

10

Revisiting hand-crafted feature for action recognition: a set of improved dense trajectories

K. Matsui, Toru Tamaki, Gwladys Auffret, B. Raytchev, K. Kaneda

Computer Science

ArXiv

2017

Experimental results on the UCF50, UCF101, and HMDB51 action datasets demonstrate that TS is comparable to the state of the art and outperforms many other methods; on HMDB it reaches an accuracy of 85.4%, compared with the best accuracy obtained by a deep method.

A Study of Action Recognition Problems: Dataset and Architectures Perspectives

Bassel S. Chawky, A. S. Elons, A. Ali, Howida A. Shedeed

Computer Science

2018

Different action recognition datasets are explored to highlight their ability to evaluate different models, and a usage is proposed for each dataset based on the content and format of data it includes, the number of classes and challenges it covers.

...

13 References

HMDB: A large video database for human motion recognition

Hilde Kuehne, Hueihan Jhuang, Estíbaliz Garrote, T. Poggio, Thomas Serre

Computer Science

2011 International Conference on Computer Vision

2011

This paper uses the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions.

3,482

Recognizing realistic actions from videos “in the wild”

Jingen Liu, Jiebo Luo, M. Shah

Computer Science

2009 IEEE Conference on Computer Vision and…

2009

This paper presents a systematic framework for recognizing realistic actions from videos “in the wild”. It uses motion statistics to acquire stable motion features and clean static features, and uses PageRank to mine the most informative static features.

1,035

Recognizing human actions: a local SVM approach

Christian Schüldt, I. Laptev, B. Caputo

Computer Science

Proceedings of the 17th International Conference…

2004

This paper constructs video representations in terms of local space-time features, integrates such representations with SVM classification schemes for recognition, and presents results of action recognition.

3,989

Actions in context

Marcin Marszalek, I. Laptev, C. Schmid

Computer Science

2009 IEEE Conference on Computer Vision and…

2009

This paper automatically discovers relevant scene classes and their correlation with human actions, shows how to learn selected scene classes from video without manual supervision, and develops a joint framework for action and scene recognition that demonstrates improved recognition of both in natural video.

1,353

Action Recognition from Arbitrary Views using 3D Exemplars

Daniel Weinland

Computer Science

2007

A new framework is proposed in which actions are modeled using three-dimensional occupancy grids, built from multiple viewpoints, in an exemplar-based HMM. A 3D reconstruction is not required during the recognition phase; instead, learned 3D exemplars are used to produce 2D image information that is compared to the observations.

Action Recognition from Arbitrary Views using 3D Exemplars

Daniel Weinland, Edmond Boyer, Rémi Ronfard

Computer Science

2007 IEEE 11th International Conference on…

2007

Actions as space-time shapes

M. Blank, Lena Gorelick, Eli Shechtman, M. Irani, R. Basri

Computer Science

Tenth IEEE International Conference on Computer…

2005

The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known. Its robustness is demonstrated against partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.

2,316

Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification

Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei

Computer Science

ECCV

2010

A framework is presented for modeling motion by exploiting the temporal structure of human activities; it represents activities as temporal compositions of motion segments, and the algorithm is shown to perform better than other state-of-the-art methods.

Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition

Mikel D. Rodriguez, J. Ahmed, M. Shah

Computer Science

2008 IEEE Conference on Computer Vision and…

2008

This paper generalizes the traditional MACH filter to video (3D spatiotemporal volume), and vector valued data, and analyzes the response of the filter in the frequency domain to avoid the high computational cost commonly incurred in template-based approaches.

1,322

Detecting Carried Objects in Short Video Sequences

D. Damen, David C. Hogg

Computer Science

ECCV

2008

A new method for detecting objects such as bags carried by pedestrians depicted in short video sequences by comparing the temporal templates against view-specific exemplars generated offline for unencumbered pedestrians, which yields a segmentation of carried objects using the MAP solution.

...
