Intel® RealSense™ R200 Camera
The Intel RealSense R200 camera is an active stereo camera with a 70-mm stereo baseline.
Indoors, the Intel RealSense R200 camera uses a class-1 laser device to project additional texture into a scene for better stereo performance. The Intel RealSense R200 camera works in disparity space, where disparity space is defined as the spatial shift of the same 3D point in space between the left and right images. The larger the shift in the horizontal plane, the closer the object (depth is inversely proportional to disparity). You can simulate the same effect by holding your thumb at eye level and looking at it with one eye at a time.
The Intel RealSense R200 camera has a maximum search range of 63 pixels horizontally, which results in a minimum depth distance of 72 cm at the nominal 628×468 resolution. At 320×240, the minimum depth distance is reduced to 32 cm. The laser texture from multiple Intel RealSense R200 cameras produces constructive interference, the reinforcement that occurs when waves of equal frequency and phase combine so that the resulting amplitude equals the sum of the individual amplitudes. As a result, multiple Intel RealSense R200 cameras can be collocated in the same environment. The dual IR cameras use a global shutter (every pixel is exposed simultaneously), while the 1080p RGB imager uses a rolling shutter (each row in a frame is exposed for the same amount of time but begins exposing at a different point in time, allowing the exposures of two frames to overlap). An internal clock triggers all three image sensors as a group, and the Intel® RealSense™ Cross Platform API (librealsense) provides matched frame sets.
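The following is a minimal sketch of how such a matched frame set might be obtained through the librealsense C API. The stream modes chosen here are illustrative (the R200 supports several resolutions and frame rates), and error arguments are passed as NULL for brevity rather than checked.

#include <librealsense/rs.h>
#include <stdio.h>

int main(void)
{
    /* Open the first connected device; errors are ignored (NULL) for brevity. */
    rs_context * ctx = rs_create_context(RS_API_VERSION, NULL);
    if (!ctx || rs_get_device_count(ctx, NULL) == 0) return 1;
    rs_device * dev = rs_get_device(ctx, 0, NULL);

    /* Enable depth, color, and left-infrared streams (illustrative modes). */
    rs_enable_stream(dev, RS_STREAM_DEPTH, 480, 360, RS_FORMAT_Z16, 30, NULL);
    rs_enable_stream(dev, RS_STREAM_COLOR, 640, 480, RS_FORMAT_RGB8, 30, NULL);
    rs_enable_stream(dev, RS_STREAM_INFRARED, 480, 360, RS_FORMAT_Y8, 30, NULL);
    rs_start_device(dev, NULL);

    /* rs_wait_for_frames blocks until a coherent set of frames is available,
       so the three buffers retrieved below belong to the same frame set. */
    rs_wait_for_frames(dev, NULL);
    const void * depth = rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
    const void * color = rs_get_frame_data(dev, RS_STREAM_COLOR, NULL);
    const void * ir    = rs_get_frame_data(dev, RS_STREAM_INFRARED, NULL);
    printf("Matched frame set: depth=%p color=%p ir=%p\n", depth, color, ir);

    rs_stop_device(dev, NULL);
    rs_delete_context(ctx, NULL);
    return 0;
}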
Outdoors, the laser has no effect because its output is overwhelmed by ambient infrared from the sun. Furthermore, at default settings, the IR sensors can become oversaturated in a fully sunlit environment, so gain, exposure, and frame-rate tuning might be required. The recommended indoor depth range is around 3.5 m.
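As one possible sketch of that kind of tuning with the librealsense C API: the option names below come from the legacy rs_option enum and the numeric values are illustrative starting points only, not calibrated settings; verify both against your installed headers and your scene.

#include <librealsense/rs.h>

/* Sketch: reduce stereo (LR) imager gain and exposure for bright, fully sunlit
   scenes. Option names are from the legacy librealsense rs_option enum; the
   values are illustrative starting points, not calibrated settings. */
static void tune_for_sunlight(rs_device * dev)
{
    rs_set_device_option(dev, RS_OPTION_R200_LR_AUTO_EXPOSURE_ENABLED, 0, NULL);
    rs_set_device_option(dev, RS_OPTION_R200_LR_GAIN, 100, NULL);     /* lower analog gain  */
    rs_set_device_option(dev, RS_OPTION_R200_LR_EXPOSURE, 80, NULL);  /* shorter exposure   */
}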
Depth Projections
Mapping from 2D pixel coordinates to 3D point coordinates via the rs_intrinsics structure and the rs_deproject_pixel_to_point(...) function requires knowledge of the depth of that pixel in meters. Certain pixel formats exposed by librealsense contain per-pixel depth information and can be immediately used with this function. Other formats do not contain per-pixel depth information and would therefore typically be projected into, rather than deprojected from (deprojection being the reverse of projecting 3D points in the scene onto the image).
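As a minimal sketch of deprojection (assuming a device whose depth stream has been enabled with RS_FORMAT_Z16, started, and has already produced a frame; the helper name print_center_point is hypothetical and error handling is omitted):

#include <librealsense/rs.h>
#include <librealsense/rsutil.h>
#include <stdint.h>
#include <stdio.h>

/* Deproject the center pixel of a Z16 depth frame into a 3D point, in meters,
   expressed in the depth camera's coordinate system. */
static void print_center_point(rs_device * dev)
{
    rs_intrinsics intrin;
    rs_get_stream_intrinsics(dev, RS_STREAM_DEPTH, &intrin, NULL);

    const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
    const float scale = rs_get_device_depth_scale(dev, NULL);

    int cx = intrin.width / 2, cy = intrin.height / 2;
    uint16_t raw = image[cy * intrin.width + cx];
    if (raw == 0) { printf("No depth available at the center pixel\n"); return; }

    float pixel[2] = { (float)cx, (float)cy };
    float point[3];
    rs_deproject_pixel_to_point(point, &intrin, pixel, raw * scale);
    printf("Center pixel deprojects to (%.3f, %.3f, %.3f) m\n", point[0], point[1], point[2]);
}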
RS_FORMAT_Z16 or rs::format::z16
- Depth is stored as one unsigned 16-bit integer per pixel, mapped linearly to depth in camera-specific units. The distance, in meters, corresponding to one integer increment in depth values can be queried via rs_get_device_depth_scale(...). The following pseudocode shows how to retrieve the depth of a pixel in meters:

const float scale = rs_get_device_depth_scale(dev, NULL);
const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
float depth_in_meters = scale * image[pixel_index];
- If a device fails to determine the depth of a given image pixel, a value of zero will be stored in the depth image. This is a reasonable sentinel for "no depth" because all pixels with a depth of zero would correspond to the same physical location, the location of the imager itself.
- The default scale (the smallest unit of precision attainable by the device) of an Intel RealSense camera (F200) or Intel RealSense camera (SR300) device is 1/32 of a millimeter. With 16 bits per pixel (2^16 = 65,536 units), this translates to a maximum expressive range of about two meters. However, the scale is encoded into the camera's calibration information, potentially allowing long-range models to use a different scaling factor.
- The default scale of an Intel RealSense camera (R200) device is one millimeter, allowing for a maximum expressive range of approximately 65 meters. The depth scale can be modified by calling rs_set_device_option(...) with RS_OPTION_R200_DEPTH_UNITS, which specifies the number of micrometers per one increment of depth. A value of 1000 indicates millimeter scale, 10000 indicates centimeter scale, and 31 roughly approximates the Intel RealSense camera (F200) scale of 1/32 of a millimeter; see the sketch after this list.
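As a brief sketch of changing the R200 depth units (the choice of 100 micrometers, the helper name, and the expectation that the reported scale reflects the new setting are assumptions of this example; the option is typically set before streaming starts):

#include <librealsense/rs.h>
#include <stdio.h>

/* Switch an R200 to 100-micrometer depth units, trading range for precision,
   then confirm the scale reported by the library. */
static void use_finer_depth_units(rs_device * dev)
{
    rs_set_device_option(dev, RS_OPTION_R200_DEPTH_UNITS, 100, NULL);
    float scale = rs_get_device_depth_scale(dev, NULL);
    printf("One depth unit now corresponds to %f meters\n", scale); /* expect ~0.0001 */
}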
RS_FORMAT_DISPARITY16 or rs::format::disparity16
- Depth is stored as one unsigned 16-bit integer per pixel, as a fixed-point representation of pixels of disparity. Stereo disparity is related to depth via an inverse linear relationship, and the distance of a point that registers a disparity of 1 can be queried via rs_get_device_depth_scale(...). The following pseudocode shows how to retrieve the depth of a pixel in meters:

const float scale = rs_get_device_depth_scale(dev, NULL);
const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
float depth_in_meters = scale / image[pixel_index];
- Unlike RS_FORMAT_Z16, a disparity value of zero is meaningful. A stereo match with zero disparity occurs for objects "at infinity," objects that are so far away that the parallax between the two imagers is negligible. By contrast, there is a maximum possible disparity: the Intel RealSense camera (R200) only matches up to 63 pixels of disparity in hardware, and even if a software stereo search were run on an image, you would never see a disparity greater than the total width of the stereo image. Therefore, when the device fails to find a stereo match for a given pixel, a value of 0xFFFF is stored in the depth image as a sentinel (see the sketch after this list for handling both sentinels).
- Disparity is currently only available on the Intel RealSense camera (R200), which by default uses a ratio of 32 units in the disparity map to one pixel of disparity. The ratio of disparity units to pixels of disparity can be modified by calling rs_set_device_option(...) with RS_OPTION_R200_DISPARITY_MULTIPLIER. For instance, setting it to 100 would indicate that 100 units in the disparity map are equivalent to one pixel of disparity.
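Putting the two sentinels together, here is a sketch of converting a single disparity sample to meters, following the same scale/division convention as the pseudocode above (the function name is hypothetical, and the depth stream is assumed to have been enabled with RS_FORMAT_DISPARITY16):

#include <librealsense/rs.h>
#include <stdint.h>
#include <stdio.h>

/* Convert one raw disparity sample to meters, handling both sentinels:
   0 means "at infinity" and 0xFFFF means "no stereo match". */
static void print_disparity_depth(rs_device * dev, int pixel_index)
{
    const float scale = rs_get_device_depth_scale(dev, NULL);
    const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
    const uint16_t d = image[pixel_index];

    if (d == 0xFFFF)
        printf("No stereo match at this pixel\n");
    else if (d == 0)
        printf("Object at infinity (zero disparity)\n");
    else
        printf("Depth at this pixel: %f m\n", scale / d);
}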
Depth Calculation
Since the optical axes of the two imagers are parallel and their focal lengths are the same, the Intel RealSense camera (R200) internal circuitry determines depth from the disparity (d) together with the stereo baseline (B) and the focal length (f). Xl and Xr are the horizontal shifts of the same scene point as seen by the left and right cameras, and (Xl, Yl) and (Xr, Yr) are the corresponding image points; imagine the Y axis as perpendicular to the image, pointing toward you. Disparity is (Xl - Xr).
Using the concept of triangulation, we can now obtain depth:
Depth (Z) = (baseline * focal length)/disparity
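A quick worked example shows how the ~72-cm minimum range quoted earlier falls out of this formula at the 63-pixel maximum disparity. The focal length used here is an assumption of roughly the right magnitude for the 628×468 depth stream; the exact value is reported in rs_intrinsics.fx.

#include <stdio.h>

/* Worked example of Z = (B * f) / d using the R200's 70-mm baseline and an
   assumed focal length of ~630 pixels, at the maximum hardware disparity. */
int main(void)
{
    const float baseline_m   = 0.070f;  /* stereo baseline B, in meters         */
    const float focal_px     = 630.0f;  /* assumed focal length f, in pixels    */
    const float disparity_px = 63.0f;   /* maximum hardware disparity d, pixels */

    const float depth_m = (baseline_m * focal_px) / disparity_px;
    printf("Depth at 63 px of disparity: %.2f m\n", depth_m);  /* prints ~0.70 m */
    return 0;
}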
Conclusion
The Intel RealSense R200 depth camera provides depth data as a 16-bit number per pixel that can be easily converted to canonical distances measured in meters. As such, it is possible to extract scene information using any number of algorithms beyond what RGB data alone provides. It is thus possible to combine the RGB pixels and depth pixels to produce point clouds that represent a 3D sampling of the scene the camera is looking at.
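As a sketch of that kind of 3D sampling (depth only; aligning RGB colors onto the points additionally requires the color stream's intrinsics and the depth-to-color extrinsics, which are left out here, and the helper name is hypothetical):

#include <librealsense/rs.h>
#include <librealsense/rsutil.h>
#include <stdint.h>
#include <stdlib.h>

/* Deproject every valid pixel of a Z16 depth frame into a 3D point (meters,
   depth-camera coordinates). Returns a malloc'd array of x,y,z triples and
   writes the number of points to *out_count. */
static float * build_point_cloud(rs_device * dev, size_t * out_count)
{
    rs_intrinsics intrin;
    rs_get_stream_intrinsics(dev, RS_STREAM_DEPTH, &intrin, NULL);
    const float scale = rs_get_device_depth_scale(dev, NULL);
    const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);

    float * points = (float *)malloc((size_t)intrin.width * intrin.height * 3 * sizeof(float));
    size_t n = 0;
    for (int y = 0; y < intrin.height; ++y)
    {
        for (int x = 0; x < intrin.width; ++x)
        {
            const uint16_t raw = image[y * intrin.width + x];
            if (raw == 0) continue;                /* zero is the "no depth" sentinel */
            const float pixel[2] = { (float)x, (float)y };
            rs_deproject_pixel_to_point(&points[n * 3], &intrin, pixel, raw * scale);
            ++n;
        }
    }
    *out_count = n;
    return points;
}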
References
Projection and Deprojection in librealsense: https://github.com/IntelRealSense/librealsense/blob/master/doc/projection.md
Camera specifications: https://github.com/IntelRealSense/librealsense/blob/master/doc/camera_specs.md
Camera Calibration and 3D Reconstruction: http://docs.opencv.org/3.1.0/d9/d0c/group__calib3d.html#gsc.tab=0