Visual attention is the process of selecting the most interesting areas (known as salient regions) from the visual information that humans receive in daily life. Measuring the significance (i.e., saliency) of different regions of a frame requires an understanding of how different visual cues affect the human visual system. To this end, we designed an empirical study of bottom-up features, including color, texture, and motion, in video sequences in order to derive a ranking system that states saliency priority. In this work, we introduce a saliency detection model that uses a Bayesian framework for static scenes, and we consider feature combination scenarios for dynamic scenes under conditions free of cognitive bias. First, we modeled our test data as videos in a virtual environment to avoid any cognitive bias. Then, we performed an eye-tracking experiment with human subjects to determine how colors, textures, motion directions, and motion speeds interact with each other to attract human attention. This work provides a benchmark for identifying the most salient stimulus, with comprehensive information for both static and dynamic scenes. The main goal of this work is to assign a saliency priority to the entirety of an image or video frame, rather than simply extracting a salient object or area, as is common in state-of-the-art methods.

Bayesian Network, Bottom-up Features, Dynamic Scenes, Saliency Detection
2nd International Conference on Multimedia Information Processing and Retrieval, MIPR 2019
School of Information Technology

Hosseinkhani, J. (Jila), & Joslin, C. (2019). Saliency Priority Using Bottom-up Features for Static and Dynamic Scenes Without Cognitive Bias. In Proceedings - 2nd International Conference on Multimedia Information Processing and Retrieval, MIPR 2019 (pp. 189–192). doi:10.1109/MIPR.2019.00041