In this first blog entry I will briefly explain the topic of my Master's thesis. I started this thesis around a month ago and it is scheduled to take up around 6 months of my time.
The topic of my Master's thesis is "Action Recognition in Movies". The aim of this research is to learn realistic human actions in diverse and realistic video settings. Actions in this context can be kissing, answering a phone, getting out of a car, etc. You can think of many purposes for this kind of action recognition. For instance, it would be possible to search for movies in which particular actions occur. Such a query could then return the videos together with the time at which the action occurs. Another use might be in surveillance, where particular actions, like fighting, could be of interest. Detecting these actions automatically could save time and costs.
I will work together with my supervisor Ivo Everts to build the software for this recognition. The software will rely on previous work, combining existing methods in the hope of achieving better performance.
The dataset that we will be using is the same dataset Ivan Laptev used in his work on action recognition in movies (Hollywood2). It consists of a large set of short video fragments from a variety of Hollywood movies. The dataset can be found on his website: http://www.irisa.fr/vista/actions/hollywood2/.
For previous work on this subject, you can read the following articles. These articles show that most work focuses on either detecting the context of a video or detecting actions in movies. I will combine both methods, context and action recognition, to try to build a more robust system.
- Video Google: A Text Retrieval Approach to Object Matching in Videos (2003)
  J. Sivic and A. Zisserman; in ICCV, volume 2, Oxford, United Kingdom.
- On Space-Time Interest Points (2005)
  I. Laptev; in International Journal of Computer Vision, vol 64, number 2/3, pp.107-123.
- Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words (2006)
  J. C. Niebles, H. Wang and L. Fei-Fei; in BMVC, Barranquilla, Colombia.
- Retrieving actions in movies (2007)
  I. Laptev and P. Pérez; in Proc. ICCV'07, Rio de Janeiro, Brazil.
- Learning realistic human actions from movies (2008)
  I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld; in Proc. CVPR'08, Anchorage, US.
- Actions in Context (2009)
  M. Marszałek, I. Laptev and C. Schmid; in Proc. CVPR'09, Miami, US.