After a few weeks of reading to familiarize myself with the project I am working on, it is finally time to start working on the software. Ivo already extracted the necessary Space-Time Interest Points (STIP) using Ivan Laptev's software. I will be using these descriptors to build a visual vocabulary for the video fragments. This method is best explained in the article by Josef Sivic and Andrew Zisserman: Video Google: A Text Retrieval Approach to Object Matching in Videos.
Basically, all space-time descriptors from all video fragments are put together in one bag. A subset of these descriptors is clustered using k-means. Each resulting cluster center is a so-called visual word, and together the cluster centers form the visual vocabulary. Based on this vocabulary, a histogram can be created for each video sample: each descriptor of the sample is assigned to the cluster center with the smallest Euclidean distance, and the histogram counts how many of the sample's descriptors fall on each visual word.
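The clustering and histogram steps can be sketched in plain MATLAB (it should also run in Octave, without any toolbox). This is a minimal sketch on made-up 2-D toy data, not the real STIP descriptors; the names D, C, S, sqdist and the hand-picked initial centers are hypothetical. In practice one would more likely call kmeans from the Statistics and Machine Learning Toolbox for the clustering step.

```matlab
% Minimal bag-of-visual-words sketch on made-up toy data (hypothetical;
% the real input would be the STIP descriptors, one descriptor per row).
D = [0 0; 0.1 0; 0 0.2; 5 5; 5.1 4.9; 10 0];   % descriptor subset, N x d
k = 3;                                          % vocabulary size

% Squared Euclidean distances between the rows of X and the rows of C
sqdist = @(X, C) repmat(sum(X.^2, 2), 1, size(C, 1)) ...
               + repmat(sum(C.^2, 2)', size(X, 1), 1) - 2 * X * C';

% Plain k-means (Lloyd iterations), initialized with k hand-picked rows
C = D([1 4 6], :);
for iter = 1:20
    [~, labels] = min(sqdist(D, C), [], 2);     % nearest center per descriptor
    for j = 1:k
        if any(labels == j)                     % keep old center if cluster is empty
            C(j, :) = mean(D(labels == j, :), 1);
        end
    end
end
% Each row of C is now one visual word.

% Histogram for one video sample: assign each of its descriptors to the
% nearest visual word and count how often each word occurs.
S = [0.05 0.1; 5 5; 5.2 5.1; 9 1];             % descriptors of one sample
[~, words] = min(sqdist(S, C), [], 2);
h = accumarray(words, 1, [k 1]);                % k x 1 word counts
% For this toy data, h is [1; 2; 1].
```

The Video Google paper additionally applies tf-idf weighting to these counts; that step is left out of this sketch.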
As the space-time descriptors are based on changes over time, a lot of descriptors will be found at the transition from one shot to the next (the so-called shot boundaries). Since the descriptors on these boundaries are not relevant for action detection, they have to be filtered out. Luckily, the Hollywood2 dataset contains the frame number of each shot boundary in the video fragments. Because each descriptor spans several frames, some code is required to find all descriptors that overlap a boundary.
Yesterday I created the code for this filtering. It is written in MATLAB, which will be the main programming tool for this project. As I want to show off the syntax coloring on my website, here is a small piece of the code.
```matlab
% Get the frame numbers of the descriptors (column vector)
frameNums = descStruct.t;

% Get the start and end frame of each descriptor, using tau2
frameStart = descStruct.t - descStruct.tau2;
frameEnd   = descStruct.t + descStruct.tau2;

n_descriptors = size(frameNums, 1);
n_boundaries  = size(boundaries, 2);

% Make sure all matrices are the same size (n_descriptors x n_boundaries)
frameStartAll = repmat(frameStart, [1 n_boundaries]);
frameEndAll   = repmat(frameEnd, [1 n_boundaries]);
boundariesAll = repmat(boundaries, [n_descriptors 1]);

% Subtract the boundaries from the start and end frames. If the
% descriptor lies on a boundary frame, the start becomes negative and
% the end positive, or one of them is 0
frameStartFinal = frameStartAll - boundariesAll;
frameEndFinal   = frameEndAll - boundariesAll;

% Find the descriptors that do not lie on any boundary
% (indices is true for the descriptors to keep)
indices = (sum((frameStartFinal .* frameEndFinal <= 0), 2) == 0);
```
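This code shows the filtering process. It first determines the start and end frame of each descriptor. Next, it subtracts the frame numbers of the shot boundaries. Lastly, a descriptor whose shifted start is lower than or equal to zero and whose shifted end is higher than or equal to zero for at least one boundary overlaps that boundary and has to be filtered out. The code detects this by checking whether the product of shifted start and shifted end is at most zero; the resulting logical vector indices is true only for the descriptors that are kept.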
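To check the sign test on concrete numbers, here is a small worked example with made-up descriptors and boundaries (the values and the names keep, keepProd and tKept are hypothetical). It computes the mask twice: once directly as frameStart <= boundary <= frameEnd using bsxfun, and once with the product test from the filtering code. Because frameEnd is never smaller than frameStart (tau2 is non-negative), both should agree.

```matlab
% Toy descriptors (hypothetical values): center frame t and temporal scale tau2
descStruct.t    = [10; 50; 98; 150];
descStruct.tau2 = [3; 4; 5; 2];
boundaries      = [100 200];      % shot-boundary frames, row vector

frameStart = descStruct.t - descStruct.tau2;   % [7; 46; 93; 148]
frameEnd   = descStruct.t + descStruct.tau2;   % [13; 54; 103; 152]

% Direct interval test: does boundary b fall inside [frameStart, frameEnd]?
onBoundary = bsxfun(@le, frameStart, boundaries) & ...
             bsxfun(@ge, frameEnd, boundaries);  % n_descriptors x n_boundaries
keep = ~any(onBoundary, 2);

% Product test as in the filtering code: start <= 0 and end >= 0 after shifting
prodTest = bsxfun(@minus, frameStart, boundaries) .* ...
           bsxfun(@minus, frameEnd, boundaries);
keepProd = sum(prodTest <= 0, 2) == 0;

% Descriptor 3 spans frames 93..103 and therefore contains boundary 100,
% so keep is the logical vector [1; 1; 0; 1].
tKept = descStruct.t(keep);         % [10; 50; 150]
```

On MATLAB R2016b and later, implicit expansion would also allow writing frameStart <= boundaries directly instead of using bsxfun or repmat.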