Specifications

A summarized view of the AMAS and its component specifications

This section describes the specifications for the VR Application, Volumetric Telepresence, Teleoperation, and Audio.

The VR Application focuses on providing a VR interface compatible with specific hardware, natural control inputs, adjustable scaling, remote real-world operation, and immersive bi-directional audio. The Volumetric Telepresence emphasizes flexible camera fusion, combined sensory inputs, HD colored 3D geometry, 3rd party camera support, and hardware acceleration, all coupled with bi-directional audio capabilities.

Specification VR

Technical Specification Extend Robotics Advanced Mechanics Assistance System (AMAS) v6.3

Factor | Impact | VR Application

Frame rate

The frame rate indicates the frequency at which frame-based images are continuously displayed. For VR services it is critical for a quick reaction to the user's head motion, because the difference between the user's real-world motion and the motion reflected in VR is the main reason users feel dizzy when using VR services.

Up to 120 FPS (PC + Quest 2); 72 FPS (Quest 2 standalone)

Resolution

Resolution refers to the number of pixels in the video content. The video resolution must be compatible with the resolution of the display device; otherwise, the video may be shown at reduced resolution or may not be displayed at all. For better VR video quality, a 4K or higher resolution is required, because VR provides a 360° panoramic display and the single-eye resolution of that display determines the VR image quality. Low-resolution VR content is magnified by the near-eye display.

1832 x 1920 per eye (Quest 2)

FOV

The FOV measures the extent of the visual environment that is visible at any given time. A wider FOV makes users feel more immersed, so FOV is an important parameter for evaluating how immersive an experience the VR device can create. In the model, FOV is considered an important factor for the spatial influence module and the occlusion influence module.

360 degree 3D

Motion to Photon Latency

Motion-to-photon (MTP) latency is the time the VR content takes to respond after the user performs a movement (e.g. turns their head). The value of this indicator must be less than 20 ms for cloud VR games and less than 60 ms for cloud VR videos.

<10 ms (PC + Quest 2)

Degree of Freedom

DoF indicates the ways in which an object can move in space (translation along and rotation about three axes give 6 DoF). It is a key factor in creating an immersive environment for users.

6 DoF (head tracking); 6 DoF per hand (hand tracking)

Loading delay

For video services, the loading delay refers to the initial delay at startup. Generally, the delay must be less than 10 seconds to ensure user experience. For Cloud VR games, the loading delay refers to the delay from the time when a user chooses to play a game to the time when the game starts. Generally, the delay must be less than 3 seconds to ensure user experience.

1 second loading after logo screen

Stalling

Stalling is a key factor in how users perceive the smoothness of streaming media. During VR video playback, the player keeps a buffer holding a certain amount of data. If the buffer runs empty, stalling occurs and the user experience suffers. Stalling generally occurs because the download throughput cannot keep up with the bit rate required by the video encoding quality.

No

Freezing

Freezing occurs when the user perceives a pause in the game image, and it is a key factor for evaluating game smoothness. If a key frame (such as an I-frame) is discarded during VR gaming, the key information of the image is lost and the image cannot be displayed. As a result, freezing occurs, affecting the user experience in VR gaming.

No

Tiling artifacts/mosaic

For cloud VR games, tiling artifacts (mosaic) are a key factor for evaluating game smoothness, because users perceive mosaics in some areas of the game image. During VR gaming, if some video frame information (some block information in the video frame) is lost, the image can still be displayed, but mosaics appear in some areas, affecting the perceived fluency of the VR game.

No

Specification Volumetric Telepresence

Technical Specification Extend Robotics Advanced Mechanics Assistance System (AMAS) v6.3

Factor | Impact | Volumetric Telepresence

Frame rate

The frame rate indicates how fast the 3D real-world model in VR is updated from sensor data. This should not be confused with the VR update frame rate, which is higher to avoid motion sickness.

30 FPS
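The distinction between the two frame rates can be made concrete with a little arithmetic. The sketch below uses only figures from this specification (120 FPS and 72 FPS for VR rendering, 30 FPS for the 3D model); the function names are illustrative and not part of AMAS.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def vr_frames_per_model_update(vr_fps: float, model_fps: float) -> float:
    """How many VR frames are rendered between two updates of the 3D model."""
    return frame_interval_ms(model_fps) / frame_interval_ms(vr_fps)


MODEL_FPS = 30  # volumetric update rate from the specification

for label, vr_fps in [("PC + Quest 2", 120), ("Quest 2 standalone", 72)]:
    ratio = vr_frames_per_model_update(vr_fps, MODEL_FPS)
    print(f"{label}: VR frame every {frame_interval_ms(vr_fps):.1f} ms, "
          f"{ratio:.1f} VR frames per 3D model update")
```

In other words, the headset re-renders the most recent 3D model several times from the user's current head pose before the next model update arrives, which is why the lower sensor rate does not by itself cause motion sickness.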

Resolution

Resolution refers to the number of pixels in the streamed content. This should not be confused with the VR rendering resolution. The depth resolution is important for 3D reconstruction. The RGB resolution is important for texturing the 3D reconstruction.

1280x720 RGB, 848x480 depth (RealSense)

1280x720 RGB, 640x576 depth (Kinect4A)

FOV

The field of view of a single 3D sensor. This should not be confused with the VR FOV.

60-90 degree 3D per camera (typically)

85°x58° depth, 69°x42° RGB (RealSense D435)

87°x58° depth, 87°x58° RGB (RealSense D455)

75°x65° depth, 90°x59° RGB (Kinect4A)

Other options are possible

Compression

Raw video and depth data transfer would require a link of hundreds of MB/s. Data compression is therefore necessary in non-local applications. Compression affects latency, and lossy compression affects data quality. Video compression is mature, and we use standard video codecs. 3D compression is an area of active research worldwide, and we use ER's own solutions.

RGB: H.264, MJPEG

Depth: statistically lossless compression (accuracy-range-curve quantised lossless)
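To see why compression is needed, the raw data rate of one sensor can be estimated from the resolution and frame rate listed above. This is a rough sketch: the pixel formats (24-bit RGB, 16-bit depth) are assumptions, not values stated in this specification.

```python
def raw_rate_bytes_per_s(width: int, height: int,
                         bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed data rate of one image stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps


FPS = 30  # volumetric frame rate from the specification

# Assumed pixel formats: 3 bytes per RGB pixel, 2 bytes per depth pixel.
rgb = raw_rate_bytes_per_s(1280, 720, 3, FPS)
depth = raw_rate_bytes_per_s(848, 480, 2, FPS)
total = rgb + depth

print(f"RGB {rgb / 1e6:.1f} MB/s, depth {depth / 1e6:.1f} MB/s, "
      f"total {total / 1e6:.1f} MB/s per sensor")

# Compare with the 5 - 20 Mbps per-sensor bit rate quoted in this document.
raw_mbps = total * 8 / 1e6
for target_mbps in (5, 20):
    print(f"{target_mbps} Mbps target needs roughly "
          f"{raw_mbps / target_mbps:.0f}:1 overall compression")
```

Under these assumptions a single RealSense-class sensor already produces on the order of 100 MB/s of raw data, so a multi-sensor setup quickly reaches hundreds of MB/s.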

Glass to Glass Latency

Time difference measured from a photon hitting the glass of the RGBD camera to the photon of the corresponding point cloud hitting the glass of the player's headset. The value of this indicator must be less than 300 ms.

< 150 ms (typical) during 3D streaming on a local network, from SenseKit to 3D rendering on the operator's screen. With WAN (internet) streaming, network latency may be higher. Depends on the sensor used.

Degree of Freedom

The maximum degrees of freedom with which the user can visualise the workspace captured by the volumetric camera without distortion.

6 DoF (volumetric viewing angle and location). Data quality from viewing angles that differ from the sensor's may be limited.

Loading delay

For video services, the loading delay refers to the initial delay at startup. The delay must be less than 3 seconds to ensure user experience.

<1s

Bit rate

Bit rate refers to the number of audio or video bits transmitted or processed per unit of time. Bit rate is an important factor when network bandwidth ("speed") is limited, and it is related to audio and video quality. A high resolution, a high frame rate, or low compression usually increases the bit rate in the same encoding environment. In volumetric streaming, most of the bandwidth requirement comes from 3D data streaming.

5 - 20 Mbps per 3D sensor depending on settings

Situational awareness

How the user is made aware of the remote workspace and robot state.

Mesh with colour texture; HD video

Communication Protocol

Communication protocol between robot and AMAS app

Custom RGBD protocol (TCP/IP, with frame dropping in an upper layer on congestion or limited bandwidth)
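The custom protocol itself is not documented here, but the general idea of dropping frames above TCP can be illustrated. The following is a minimal sketch of one common approach, a bounded send queue that discards the oldest pending frame when the link falls behind; the class name, queue depth, and payloads are hypothetical and do not describe the actual AMAS implementation.

```python
from collections import deque


class FrameDropQueue:
    """Bounded send queue: when full, the oldest pending frame is dropped."""

    def __init__(self, max_pending: int = 2):
        self._frames = deque(maxlen=max_pending)
        self.dropped = 0

    def push(self, frame_id: int, payload: bytes) -> None:
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # deque evicts the oldest entry on append
        self._frames.append((frame_id, payload))

    def pop(self):
        """Next frame to write to the TCP socket, or None if nothing is pending."""
        return self._frames.popleft() if self._frames else None


# Simulate a stalled link: five frames are captured before any can be sent.
queue = FrameDropQueue(max_pending=2)
for frame_id in range(5):
    queue.push(frame_id, b"rgbd-frame")

sent = []
while (item := queue.pop()) is not None:
    sent.append(item[0])

print("sent frame ids:", sent)
print("dropped frames:", queue.dropped)
```

Because TCP never discards data on its own, dropping stale frames before they reach the socket is what keeps latency bounded when bandwidth is limited: the receiver sees fewer frames, but the ones it sees are recent.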

Specification Teleoperation

Technical Specification Extend Robotics Advanced Mechanics Assistance System (AMAS) v6.3

Factor | Impact | Interactive Digital Twin Control

Control Frame Rate

The frame rate indicates the frequency at which frame-based images are continuously displayed. For VR services, it is critical for a quick reaction to the user's head motion. For Volumetric Telepresence, it determines how fast the real-world model is updated in VR. The frame rate of VR services is higher than that of common 2D video services, because the difference between the user's real-world motion and the motion reflected in VR is the main reason users feel dizzy when using VR services.

50 Hz

Perception

Resolution refers to the number of pixels in the video content. The video resolution must be compatible with the resolution of the display device; otherwise, the video may be shown at reduced resolution or may not be displayed at all. For better VR video quality, a 4K or higher resolution is required, because VR provides a 360° panoramic display and the single-eye resolution of that display determines the VR image quality. Low-resolution VR content is magnified by the near-eye display.

6 DoF volumetric viewing angle (mesh with colour texture, HD video)

Motion to Motion Latency (round trip)

Time difference measured from the player's arm control motion to the corresponding motion in the point cloud reaching the glass of the player's headset. The value of this indicator must be less than 500 ms.

< 200 ms (typical) during teleoperation on a local network. With WAN (internet) streaming, network latency may be higher. Depends on the SenseKit used.

Degree of Freedom

DoF indicates the motion allowed for the supported digital twins. It is a key factor in creating an immersive environment for users.

Per arm: 6 or 7 DoF end of arm pose output, 1 DoF end effector actuator output

Loading delay

For video services, the loading delay refers to the initial delay at startup. Generally, the delay must be less than 10 seconds to ensure user experience. For Cloud VR games, the loading delay refers to the delay from the time when a user chooses to play a game to the time when the game starts. Generally, the delay must be less than 3 seconds to ensure user experience.

<1s

Bit rate

Bit rate refers to the number of audio or video bits transmitted or processed per time period. Bit rate is a more common indicator for measuring audio and video quality. A high resolution, high frame rate, or low compression usually increases the bit rate in the same encoding environment. For VR services, bit rate is not a key indicator. However, it is a basic indicator that can ensure that VR services provide high-quality images.

5 - 20 Mbps per camera (robot uplink, user downlink)

Control method

The action the user needs to perform to send a control command to the robot.

Gesture: Grasp and drag the end effector or any other interactive target (for example, the elbow or torso) of the robot simulation in VR

Situational awareness

How the user is made aware of the remote workspace and robot state.

Mesh with HD colour texture; HD video;

3D robot simulation in the VR scene, overlaid with the coloured point cloud in a geometrically correct manner

Safety features

How the teleoperation system ensures the safety of the robot, the environment, and the user.

Large motion detection in the robot arm;

Inverse kinematics singularity detection;

The user's hand is detached from the robot by default (the user has to actively grasp to activate control of the robot);

Colour change in the digital twin according to robot status.
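Two of these features, large motion detection and grasp-to-activate control, can be sketched as a simple gate on each commanded target. This is an illustration only: the speed limit and function names are hypothetical, and the actual AMAS checks are not described in this specification. The 50 Hz rate is the control frame rate listed above.

```python
import math

CONTROL_RATE_HZ = 50      # control frame rate from the specification
MAX_SPEED_M_PER_S = 0.5   # hypothetical limit, not taken from the specification


def is_large_motion(prev_xyz, target_xyz,
                    max_speed=MAX_SPEED_M_PER_S, rate_hz=CONTROL_RATE_HZ):
    """Return True if the target jumps further than one control step allows."""
    max_step_m = max_speed / rate_hz
    return math.dist(prev_xyz, target_xyz) > max_step_m


def accept_command(grasping, prev_xyz, target_xyz):
    """Forward a target only while the user grasps and the step is small."""
    if not grasping:
        return False  # the hand is detached from the robot by default
    return not is_large_motion(prev_xyz, target_xyz)


origin = (0.0, 0.0, 0.0)
print(accept_command(True, origin, (0.005, 0.0, 0.0)))   # small step while grasping
print(accept_command(True, origin, (0.05, 0.0, 0.0)))    # jump larger than one step
print(accept_command(False, origin, (0.005, 0.0, 0.0)))  # user is not grasping
```

With the assumed limit, one control step may move the target by at most 0.5 / 50 = 0.01 m, so only the first call is accepted.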

Communication Protocol

Communication protocol between robot and AMAS app

(ROS) RosSharp WebSocket (TCP/IP)
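As an illustration of what travels over such a WebSocket, the sketch below builds a JSON publish message in the style of the rosbridge protocol, assuming the RosSharp client talks to a rosbridge server on the robot side. The topic name, frame name, and pose values are hypothetical, and the header timestamp is omitted for brevity.

```python
import json


def make_pose_publish(topic, frame_id, position, orientation):
    """Build a rosbridge-style 'publish' message carrying a stamped pose."""
    message = {
        "op": "publish",
        "topic": topic,
        "msg": {
            "header": {"frame_id": frame_id},
            "pose": {
                "position": dict(zip("xyz", position)),
                "orientation": dict(zip("xyzw", orientation)),  # quaternion
            },
        },
    }
    return json.dumps(message)


# Hypothetical end-of-arm target: 6 DoF pose = position + orientation.
text = make_pose_publish(
    topic="/demo/target_pose",        # hypothetical topic name
    frame_id="base_link",             # hypothetical reference frame
    position=(0.4, 0.0, 0.3),
    orientation=(0.0, 0.0, 0.0, 1.0),
)
print(text)
```

A pose message of this shape corresponds to the 6 DoF end-of-arm pose output listed under Degree of Freedom; the 1 DoF end effector actuator output would travel as a separate message.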

Specification Audio

Technical Specification Extend Robotics Advanced Mechanics Assistance System (AMAS) v6.3

Factor | Impact | Volumetric Telepresence

Compression

Raw audio requires a high-bandwidth link. Data compression is necessary in non-local applications. Compression affects latency, and lossy compression affects data quality. Audio compression is mature, and a high-quality, open, royalty-free solution is available.

Opus Codec https://opus-codec.org/

Latency

Time difference measured from the audio source to the transmitted audio being played through the speaker. The value of this indicator must be less than 300 ms.

On the order of 200 ms during streaming on a local network from SenseKit to a PC, using a USB microphone and a built-in laptop speaker. With WAN (internet) streaming, network latency may be higher. Depends on the audio devices used.

Loading delay

For video services, the loading delay refers to the initial delay at startup. The delay must be less than 3 seconds to ensure user experience.

<1s

Bit rate

Bit rate refers to the number of audio or video bits transmitted or processed per unit of time. Bit rate is an important factor when network bandwidth ("speed") is limited, and it is related to audio quality.

0.4 Mbps for bi-directional dual-channel audio (~ 0.1 Mbps per channel, 2x directions with 2x channels)
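The audio figure follows directly from the per-channel rate, and it is small next to the 3D streams. The short sketch below reproduces the arithmetic using only numbers from this document; the function name is illustrative.

```python
def audio_bitrate_mbps(per_channel_mbps: float = 0.1,
                       channels: int = 2, directions: int = 2) -> float:
    """Total audio bit rate across all channels and both directions."""
    return per_channel_mbps * channels * directions


audio = audio_bitrate_mbps()
print(f"audio total: {audio:.1f} Mbps")

# Compare with the 5 - 20 Mbps quoted per 3D sensor.
for sensor_mbps in (5, 20):
    share = audio / sensor_mbps
    print(f"audio is {share:.0%} of one {sensor_mbps} Mbps 3D sensor stream")
```

Audio therefore adds only a few percent on top of a single 3D sensor stream, so link sizing is dominated by the number of sensors and their settings.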

Situational awareness

How the user is made aware of the remote workspace and robot state.

Stereo audio from the AudioKit's point of reference

Communication Protocol

Communication protocol between robot and AMAS app

RTP transport, RTSP coordination
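For readers unfamiliar with RTP, the sketch below packs the fixed 12-byte RTP header for consecutive Opus frames. The 48 kHz timestamp clock is the one defined for Opus over RTP; the 20 ms frame duration, payload type, and SSRC are assumptions chosen for illustration, since in practice they are negotiated or generated per session.

```python
import struct

OPUS_CLOCK_HZ = 48_000   # RTP timestamp clock used for Opus
FRAME_MS = 20            # assumed Opus frame duration
PAYLOAD_TYPE = 111       # hypothetical dynamic payload type
SSRC = 0x12345678        # hypothetical stream identifier


def rtp_header(seq: int, timestamp: int, ssrc: int = SSRC,
               payload_type: int = PAYLOAD_TYPE, marker: bool = False) -> bytes:
    """Pack the fixed 12-byte RTP header (no CSRC list, no extension)."""
    byte0 = 2 << 6  # version 2, padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


# Each frame advances the timestamp by the number of samples it covers.
samples_per_frame = OPUS_CLOCK_HZ * FRAME_MS // 1000

headers = [rtp_header(seq=n, timestamp=n * samples_per_frame) for n in range(3)]
for header in headers:
    print(header.hex())
```

The sequence number lets the receiver detect loss and reordering, while the timestamp drives playout timing. RTSP, in turn, is used to set up and control the session rather than to carry the audio itself.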