
On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2021-02-15



 ORCID:

Tian Feng

https://orcid.org/0000-0001-9691-3266

Yunzhan ZHOU

https://orcid.org/0000-0003-1676-0015


Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.1 P.101-112

http://doi.org/10.1631/FITEE.2000318


EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum


Author(s):  Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, Henry Been-Lirn DUH

Affiliation(s):  Department of Computer Science, Durham University, Durham DH1 3LE, UK

Corresponding email(s):   yunzhan.zhou@durham.ac.uk, t.feng@zju.edu.cn

Key Words:  Visual attention, Virtual museums, Eye-tracking datasets, Gaze detection, Deep learning


Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, Henry Been-Lirn DUH. EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(1): 101-112.

@article{title="EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum",
author="Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, Henry Been-Lirn DUH",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="1",
pages="101-112",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000318"
}

%0 Journal Article
%T EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum
%A Yunzhan ZHOU
%A Tian FENG
%A Shihui SHUAI
%A Xiangdong LI
%A Lingyun SUN
%A Henry Been-Lirn DUH
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 1
%P 101-112
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2000318

TY - JOUR
T1 - EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum
A1 - Yunzhan ZHOU
A1 - Tian FENG
A1 - Shihui SHUAI
A1 - Xiangdong LI
A1 - Lingyun SUN
A1 - Henry Been-Lirn DUH
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 1
SP - 101
EP - 112
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2000318
ER -


Abstract: 
Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations toward the development of a visual attention mechanism using eye-tracking data have so far been limited to 2D cases; researchers have yet to approach this topic in a 3D virtual environment or from a spatiotemporal perspective. We present EDVAM, the first 3D eye-tracking dataset for visual attention modeling in a virtual museum. In addition, a deep learning model is devised and tested on EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
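The abstract describes, but does not specify, the deep learning model that predicts a user's next visual attention target from previous eye movements. As a rough illustration only, the following is a minimal sketch assuming an LSTM-based classifier over short windows of 3D gaze samples; the class name GazeAttentionLSTM, the six-dimensional feature layout, the number of areas of interest, and the 30 Hz window length are all hypothetical and are not taken from the paper.

# Hypothetical sketch, not the authors' published architecture: an LSTM that maps
# a window of past 3D gaze samples to the museum area of interest (AOI) the user
# is predicted to look at next.
import torch
import torch.nn as nn

class GazeAttentionLSTM(nn.Module):
    def __init__(self, feature_dim=6, hidden_dim=128, num_aois=10):
        super().__init__()
        # feature_dim: per-sample gaze features, e.g. a 3D gaze point plus a 3D
        # head position (assumed layout); num_aois: number of candidate regions.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_aois)

    def forward(self, gaze_seq):
        # gaze_seq: (batch, window_length, feature_dim)
        _, (h_n, _) = self.lstm(gaze_seq)
        return self.head(h_n[-1])  # logits over AOIs for the next time step

# Toy usage: predict the next AOI from a 2 s window of gaze samples at an assumed 30 Hz.
model = GazeAttentionLSTM()
window = torch.randn(4, 60, 6)           # 4 users, 60 gaze samples, 6 features each
next_aoi = model(window).argmax(dim=-1)  # predicted attention region per user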

EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum (Chinese version, translated)

Yunzhan ZHOU^1, Tian FENG^2, Shihui SHUAI^3, Xiangdong LI^4, Lingyun SUN^5, Henry Been-Lirn DUH^2
1 Department of Computer Science, Durham University, Durham DH1 3LE, UK
2 Department of Computer Science and Information Technology, La Trobe University, Victoria 3086, Australia
3 Alibaba Group, Hangzhou 311121, China
4 Department of Digital Media, Zhejiang University, Hangzhou 310027, China
5 International Design Institute, Zhejiang University, Hangzhou 310058, China

Abstract: Predicting visual attention helps build an adaptive virtual museum environment and provides a context-aware, interactive user experience. To date, research exploring visual attention mechanisms with eye-tracking data has been confined to 2D scenes; researchers have yet to investigate this problem in a 3D virtual scene from a spatiotemporal perspective. To this end, we constructed the first 3D eye-tracking dataset for visual attention modeling in a virtual museum, named EDVAM. We also built a deep learning model that predicts a user's future visual attention regions from historical gaze trajectories and used it to test EDVAM. This work provides a reference for visual attention modeling and context-aware interaction in virtual museums.

Keywords: Visual attention; Virtual museums; Eye-tracking datasets; Gaze detection; Deep learning


