Two main devices were used for our data collection: the Insta360 One X2 and the Snapchat Spectacles 3. The One X2 has two fish-eye cameras that capture 360° panoramic visual information of the scene at a resolution of 5760×2880 and a frame rate of 25 FPS. Directional audio was additionally recorded using four microphones in directional audio mode. The Spectacles 3 has a stereo camera attached to a pair of glasses, which captures the egocentric binocular view of the scene at a resolution of 2432×1216 and a frame rate of 60 FPS.
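
To make the difference between the two video streams concrete, the following minimal sketch encodes the resolutions and frame rates stated above and derives the pixel throughput of each device. It also maps a frame index from one stream to the nearest frame of the other. The names `CaptureDevice`, `ONE_X2`, `SPECTACLES_3`, and `nearest_frame` are hypothetical, and the frame mapping assumes that both recordings start at the same instant, which is an assumption of this sketch and not a property of the devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureDevice:
    """Video parameters of one recording device (hypothetical helper type)."""
    name: str
    width: int
    height: int
    fps: int

    @property
    def pixels_per_frame(self) -> int:
        return self.width * self.height

    @property
    def pixels_per_second(self) -> int:
        return self.pixels_per_frame * self.fps


# Values taken from the device description in the text.
ONE_X2 = CaptureDevice("360 One X2", 5760, 2880, 25)
SPECTACLES_3 = CaptureDevice("Spectacles 3", 2432, 1216, 60)


def nearest_frame(index: int, src: CaptureDevice, dst: CaptureDevice) -> int:
    """Map a frame index in `src` to the nearest frame index in `dst`.

    Assumes both recordings start at the same instant (an assumption of
    this sketch). Integer arithmetic rounds half up and avoids float error.
    """
    return (2 * index * dst.fps + src.fps) // (2 * src.fps)


if __name__ == "__main__":
    for dev in (ONE_X2, SPECTACLES_3):
        print(dev.name, dev.pixels_per_frame, dev.pixels_per_second)
    # Frame 25 of the 25 FPS stream is at t = 1 s in the 60 FPS stream.
    print(nearest_frame(25, ONE_X2, SPECTACLES_3))
```

Both frames have a 2:1 aspect ratio, and although the Spectacles 3 records at a higher frame rate, the One X2 produces more pixels per second because of its larger frame size.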