Many eye tracking studies rely on complex video stimuli and real world settings, which makes analysis of the resulting data difficult. This technique allows a much richer, automated approach to analyzing video-based data than currently available methods, enabling the extraction of more detailed information. The method could be used in many different eye tracking applications, particularly in real world situations or studies that use video as a stimulus.
Landscape studies have long relied on assumptions about how people react to different visual stimuli. This technique, combined with eye tracking, could be used to test those assumptions. For this type of research, a team approach is essential, as there are multiple aspects that require high-level input and consideration.
Demonstrating the procedure with me will be my postgraduate student, Andrew Treller. Show the film sequences in an eye tracking laboratory in which natural light is available but can be controlled to avoid reflections on the screen. Use as large a screen as possible so that it occupies as much of the visual field as possible, thereby avoiding distractions from outside the field of view. After seating the participant 60 to 65 centimeters away from the screen, ask them to imagine being in need of restoration, using a sentence that allows the participant to imagine themselves in the context of the eye tracking video.
Then play the films for the participant in a predetermined random order, using a desktop eye tracking device to record the participant's eye movements during each video. To design the areas of interest, select items that are relevant to the study, such as trees, shrubs, signposts, buildings, paths, and steps. For optimal performance and minimal training requirements, use elements that are easily distinguishable from each other to the naked eye and/or that consistently occupy different regions of each video frame.
In general, including sufficient training examples that depict the visually distinguishing features of each AOI should be enough for robust performance. When all of the items have been identified, select an appropriate number of training frames to make up the training set. There is no fixed number that is appropriate.
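As a starting point, training frames can simply be sampled evenly across the video. The sketch below assumes OpenCV and uses hypothetical file and folder names (walk_sequence_1.mp4, training_frames) and an arbitrary frame count purely for illustration.

```python
# Minimal sketch: extract evenly spaced candidate training frames from a video
# using OpenCV. The video path, output folder, and frame count are assumptions
# for illustration; adjust them to the actual study materials.
import os
import cv2

VIDEO_PATH = "walk_sequence_1.mp4"   # hypothetical video file
OUT_DIR = "training_frames"          # hypothetical output folder
N_TRAINING_FRAMES = 20               # no fixed number is prescribed; start small

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Sample frame indices evenly across the sequence so the training set
# covers the range of scenes and lighting conditions in the video.
indices = [int(i * total / N_TRAINING_FRAMES) for i in range(N_TRAINING_FRAMES)]

for idx in indices:
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(os.path.join(OUT_DIR, f"frame{idx:05d}.png"), frame)

cap.release()
```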
Next, open each training frame from the video in the image editing software. For each frame, overlay a transparent image layer on the loaded image for labeling, and create a color palette providing one color for each object class of interest. To label a sample area of interest, click and drag over pixels within the area to color in the sample region with the appropriate palette choice. Once the labeling of a frame is complete, export the overlaid layer as a separate image file, taking care that its base filename matches the original frame's base filename with a C appended to the end.
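Because the naming convention matters downstream, it can help to check it automatically. The sketch below is a simple example, assuming hypothetical folder names (training_frames, training_labels) and example palette colors; the actual classes and RGB values should match whatever was chosen in the image editor.

```python
# Minimal sketch: define a color palette for the object classes and check that
# every exported label image follows the naming convention described above
# (original base filename with "C" appended). Folder names and the exact
# palette colors are illustrative assumptions.
import os

# One RGB color per object class of interest (values are examples only).
PALETTE = {
    "tree":     (0, 128, 0),
    "path":     (128, 128, 128),
    "building": (255, 0, 0),
    "sky":      (0, 0, 255),
}

FRAME_DIR = "training_frames"   # original frames, e.g. frame00042.png
LABEL_DIR = "training_labels"   # exported overlays, e.g. frame00042C.png

for name in sorted(os.listdir(FRAME_DIR)):
    base, ext = os.path.splitext(name)
    expected = f"{base}C{ext}"
    if not os.path.exists(os.path.join(LABEL_DIR, expected)):
        print(f"Missing label image for {name}: expected {expected}")
```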
To quantitatively validate the accuracy of the trained classifier, select frames from the original video sequence that were not included in the training set, and label the pixels in each frame as just demonstrated for the training frames, being as precise and as comprehensive as possible. When the labeling of a frame is complete, use the same naming convention as for the training frames, saving the files in a separate validation frames folder. For automatic pixel labeling of the video sequence, launch the Darwin graphical user interface and click load training labels.
To configure the GUI for training and labeling, select create project and give the project a name using the popup dialogue box. In the popup window, select the folder containing all of the original frames of the video sequence. Using the popup file explorer dialogue box, select the folder containing the labeled training images for the relevant video sequence.
And in the file explorer dialogue box, select the folder containing all of the labeled validation images for the relevant video sequence. Follow the prompt to select a destination folder for all of the output frames, which will be labeled images using the same color palette as was used in training. Using the popup dialogue box, under areas of interest, enter the areas of interest to label, including the red/green/blue values used to mark each region in the training examples.
The algorithm will examine each labeled training frame and learn a model of appearance for classifying pixels into any of the specified object classes of interest. Once the training is complete, click validate training, and in the file explorer dialogue box, select the folder containing all of the labeled validation images for the relevant video sequence.
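The underlying quantitative check amounts to a per-pixel comparison between the generated labels and the hand-labeled validation frames. The sketch below is a generic version of that computation, not Darwin's internal code; it assumes the predicted and validation label images share filenames and reuses the hypothetical folder names from the earlier sketches.

```python
# Minimal sketch of the quantitative check: compare each generated label image
# against its hand-labeled validation counterpart pixel by pixel. Folder names
# and matching filenames between the two folders are assumptions.
import os
import numpy as np
from PIL import Image

PRED_DIR = "output_labels"       # labels generated by the trained classifier
VALID_DIR = "validation_labels"  # hand-labeled validation frames

correct = 0
total = 0
for name in sorted(os.listdir(VALID_DIR)):
    truth = np.array(Image.open(os.path.join(VALID_DIR, name)).convert("RGB"))
    pred = np.array(Image.open(os.path.join(PRED_DIR, name)).convert("RGB"))
    match = np.all(truth == pred, axis=-1)   # per-pixel color agreement
    correct += int(match.sum())
    total += match.size

print(f"Pixel accuracy: {correct / total:.3f}")
```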
To visually validate the generated labels, click visual validation. Each generated labeled image will be displayed next to the original validation frame. If the accuracy observed in either the quantitative or qualitative validation falls below acceptable levels, include further training examples and retrain.
Once the classifier training and validation phases are complete, click run inference to begin the full labeling of all of the frames in the video sequence using the trained classifier. Once the labeling is complete, which may take several hours, click browse output to see the resulting labels. Most eye tracking software will show that, on average, participants scanned left and right along the x-coordinate in the first video, whereas the heat map for the second video shows a rounder shape.
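One simple way to quantify that left-right scanning pattern is to compare the spread of gaze positions along the x and y axes for each video. The sketch below assumes a hypothetical CSV export of gaze samples with video, gaze_x, and gaze_y columns; the actual export format will depend on the eye tracking software used.

```python
# Minimal sketch: quantify horizontal versus vertical gaze spread per video.
# The CSV filename and column names are assumptions for illustration.
import pandas as pd

gaze = pd.read_csv("gaze_export.csv")  # hypothetical export: video, gaze_x, gaze_y

for video, group in gaze.groupby("video"):
    ratio = group["gaze_x"].std() / group["gaze_y"].std()
    # A ratio well above 1 indicates predominantly horizontal (left-right)
    # scanning; a ratio near 1 corresponds to the rounder heat map shape.
    print(f"{video}: horizontal/vertical spread ratio = {ratio:.2f}")
```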
Using the machine learning pixel labeling technique described in this paper, we can see more detail. This graphical representation of the percent fixation time shows that the path is clearly visible throughout the course of the video. However, as this figure from the eye tracking data shows, the participant only looked at this feature occasionally, at key points.
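This extra detail comes from combining the gaze samples with the per-frame labels: the object class under each gaze point can be read directly from the labeled frame. The sketch below shows one way to do this, reusing the hypothetical palette, folder names, and gaze export columns (frame, gaze_x, gaze_y in pixels) assumed in the earlier sketches.

```python
# Minimal sketch: look up the object class under each gaze sample by indexing
# into the corresponding labeled frame, then summarize percent fixation time
# per class. File layout, column names, and the palette are assumptions.
import os
from collections import Counter

import numpy as np
import pandas as pd
from PIL import Image

PALETTE = {
    "tree": (0, 128, 0),
    "path": (128, 128, 128),
    "building": (255, 0, 0),
    "sky": (0, 0, 255),
}
COLOR_TO_CLASS = {v: k for k, v in PALETTE.items()}

LABEL_DIR = "output_labels"            # per-frame labels from the classifier
gaze = pd.read_csv("gaze_export.csv")  # hypothetical: frame, gaze_x, gaze_y (pixels)

counts = Counter()
for row in gaze.itertuples():
    label_path = os.path.join(LABEL_DIR, f"frame{int(row.frame):05d}.png")
    labels = np.array(Image.open(label_path).convert("RGB"))
    color = tuple(labels[int(row.gaze_y), int(row.gaze_x)])
    counts[COLOR_TO_CLASS.get(color, "unlabeled")] += 1

total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls}: {100 * n / total:.1f}% of fixation samples")
```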
Shown here is a summary of the dwell time of all 39 participants in this representative study when looking at objects throughout the length of the video. In this graph, the same dwell time data were divided by the amount of time and space that different objects occupied in the video. A value of one indicates that the dwell time can be fully accounted for by the amount of the object present in the video.
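The normalization itself is straightforward: the share of dwell time on each class is divided by the share of labeled pixels that class occupies across all frames. The sketch below illustrates this under the same hypothetical folder and palette assumptions; the dwell-share numbers are placeholders, not results from the study.

```python
# Minimal sketch of the normalization described above: divide the share of
# dwell time spent on each object class by the share of video pixels that
# class occupies over all frames. A value near 1 means the dwell time is
# about what the object's size and duration on screen would predict.
import os

import numpy as np
from PIL import Image

LABEL_DIR = "output_labels"
COLOR_TO_CLASS = {
    (0, 128, 0): "tree",
    (128, 128, 128): "path",
    (255, 0, 0): "building",
    (0, 0, 255): "sky",
}

# Proportion of all labeled pixels belonging to each class across the video.
pixel_counts = {cls: 0 for cls in COLOR_TO_CLASS.values()}
for name in sorted(os.listdir(LABEL_DIR)):
    labels = np.array(Image.open(os.path.join(LABEL_DIR, name)).convert("RGB"))
    for color, cls in COLOR_TO_CLASS.items():
        pixel_counts[cls] += int(np.all(labels == color, axis=-1).sum())

total_pixels = sum(pixel_counts.values())

# dwell_share: fraction of total dwell time per class, e.g. computed as in the
# previous sketch; the values below are placeholders, not study data.
dwell_share = {"tree": 0.30, "path": 0.40, "building": 0.20, "sky": 0.10}

for cls, share in dwell_share.items():
    expected = pixel_counts[cls] / total_pixels
    print(f"{cls}: normalized dwell = {share / expected:.2f}")
```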
For example, less pertinent objects, such as the sky in both videos, were viewed comparatively less than other objects. Artificial objects such as street lamps and benches were dwelled on to a greater extent than other, natural objects. These sorts of analyses have many wide ranging uses for examining questions of attention and saliency, and have broad applications across different research areas.
As the use of short films as visual stimuli becomes increasingly common, we expect this technique to become more popular.