Graham Mann, Indulis Bernsteins
DIMEA 2007
Action recognition is an important problem in multimedia understanding. This paper addresses this problem by building an expressive compositional action model. We model one action instance in the video with an ensemble of spatio-temporal compositions: a number of discrete temporal anchor frames, each of which is further decomposed into a layout of deformable parts. In this way, our model can identify a Spatio-Temporal And-Or Graph (STAOG) to represent the latent structure of actions, e.g. triple jumping, swinging and high jumping. The STAOG model comprises four layers: (i) a batch of leaf-nodes at the bottom for detecting various action parts within video patches; (ii) the or-nodes above the leaf-nodes, i.e. switch variables that activate their children leaf-nodes for structural variability; (iii) the and-nodes within an anchor frame for verifying spatial composition; and (iv) the root-node at the top for aggregating scores over temporal anchor frames. Moreover, contextual interactions are defined between leaf-nodes in both the spatial and temporal domains. For model training, we develop a novel weakly supervised learning algorithm which iteratively determines the structural configuration (e.g. the production of leaf-nodes associated with the or-nodes) along with the optimization of multi-layer parameters. By fully exploiting spatio-temporal compositions and interactions, our approach handles large intra-class action variance well (e.g. different views, individual appearances, spatio-temporal structures). Experimental results on challenging databases demonstrate superior performance of our approach over other methods. Copyright © 2013 ACM.
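The four-layer structure described in the abstract can be illustrated with a minimal bottom-up scoring sketch. This is not the authors' implementation: it assumes each or-node takes the maximum over its child leaf-nodes, and that and-nodes and the root-node simply sum their children. It leaves out the contextual interactions, part deformation, and learned weights. The names `or_node_score`, `and_node_score`, `root_score`, `selected_leaves`, and the nested-list layout of `scores` are hypothetical.

```python
from typing import List

# Hypothetical layout (an assumption for illustration only):
# scores[t][p][k] is the detector response of the k-th candidate leaf-node
# for part p in temporal anchor frame t.
LeafScores = List[List[List[float]]]


def or_node_score(candidates: List[float]) -> float:
    """Or-node: a switch variable that activates its best-scoring child leaf."""
    return max(candidates)


def and_node_score(parts: List[List[float]]) -> float:
    """And-node: combine the selected parts of one anchor frame.

    Assumption: a plain sum stands in for the spatial-composition check.
    """
    return sum(or_node_score(candidates) for candidates in parts)


def root_score(scores: LeafScores) -> float:
    """Root-node: aggregate and-node scores over all temporal anchor frames."""
    return sum(and_node_score(frame) for frame in scores)


def selected_leaves(scores: LeafScores) -> List[List[int]]:
    """Return, per frame and per part, the index of the activated leaf-node."""
    return [
        [max(range(len(c)), key=c.__getitem__) for c in frame]
        for frame in scores
    ]


if __name__ == "__main__":
    # Two anchor frames, two parts each, with a varying number of candidates.
    example = [
        [[0.2, 0.9], [0.5]],
        [[0.1, 0.3, 0.7], [0.4, 0.6]],
    ]
    print(root_score(example))
    print(selected_leaves(example))
```

The choice of leaf under each or-node is what the abstract calls the structural configuration. In this sketch it is read off at inference time, whereas the paper determines it iteratively during weakly supervised training.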
Amit Anil Nanavati, Nitendra Rajput, et al.
MobileHCI 2011
Amol Thakkar, Andrea Antonia Byekwaso, et al.
ACS Fall 2022
Dimitrios Christofidellis, Giorgio Giannone, et al.
MRS Spring Meeting 2023