BlenDR Fusion

Detailed Notes | Progress Report Presentation | Final Presentation

(Mar. 2024 ~ Sept. 2024) Improvement on BlenDR, adding multi-angle fusion

Introduction:

After our first rejection at SIGCOMM, we decided to continue the work by reflecting on the reviewers’ comments. The reviewers mostly pointed out the missed opportunity of multi-view fusion, which provides 6DoF, a denser set of points, and thus more content. Although the rejection wasn’t what I wished for, it opened up another opportunity for me to conduct research on my own: I made the plans and decisions myself while Dr. Jaehong Kim guided me in the right direction. This page delineates the contributions I made to the project. The following video is the final outcome, showing the full end-to-end implementation of BlenDR with multi-view fusion.

Review of BlenDR’s Structure (and Its Significance)

Sender Side: The sender side’s jobs (depth filling and packing) are the components that explain why efficient 2.5D/3D streaming is possible with BlenDR. Because we use 2D video codecs (H.26x), it is best to feed them smooth, consecutive data, which avoids unnecessary motion-prediction and compression work spent on noise. Noise usually appears as depth holes in the depth data, which arise from inaccuracies in ToF measurements.

  1. Depth Filling: Responsible for smoothing out the data to remove noise that would otherwise add unnecessary computation in the 2D video codec
  2. Depth Packing (Encoding): Responsible for converting depth data into a universal 2D data structure that can be understood by 2D video codecs
(Left) Before depth filling. (Right) After depth filling, all holes and even outlines are smoothed
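The depth-filling step can be sketched as a simple iterative hole fill. The following is an illustrative stand-in using a naive 4-neighbor average, not BlenDR’s actual filter:

```python
import numpy as np

def fill_depth_holes(depth, iters=8):
    """Replace zero (invalid) depth pixels with the mean of their valid
    4-neighbors, repeated a few times so holes shrink from the border in.
    Illustrative only; BlenDR's actual filling/smoothing filter may differ."""
    d = depth.astype(np.float64).copy()
    for _ in range(iters):
        valid = d > 0
        nb_sum = np.zeros_like(d)
        nb_cnt = np.zeros_like(d)
        for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # np.roll wraps at image borders; acceptable for a sketch
            nb_sum += np.where(np.roll(valid, shift, (0, 1)),
                               np.roll(d, shift, (0, 1)), 0.0)
            nb_cnt += np.roll(valid, shift, (0, 1))
        holes = ~valid & (nb_cnt > 0)   # invalid pixels with >=1 valid neighbor
        d[holes] = nb_sum[holes] / nb_cnt[holes]
    return d
```

Production depth fillers typically use edge-aware or inpainting filters instead, but the effect is the same: holes become smooth regions that compress cheaply.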

Receiver Side: The receiver side’s responsibility is to unpack the data to recover (or at least recreate) the original depth/color data. Depth unpacking is the inverse of the depth packing process and yields an accurate decoding scheme, retrieving 99% of the original data. This, however, is still the depth-filled result and needs post-processing:

  1. Depth Unpacking (Decoding)
  2. Edge Detection
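As a generic illustration of the pack/unpack idea (not BlenDR’s actual packing scheme), a 16-bit depth map can be split into two 8-bit planes that a 2D codec accepts, then losslessly recombined on the receiver:

```python
import numpy as np

def pack_depth(depth16):
    """Split a 16-bit depth map into high and low 8-bit planes.
    A generic illustration, not BlenDR's actual packing scheme."""
    hi = (depth16 >> 8).astype(np.uint8)
    lo = (depth16 & 0xFF).astype(np.uint8)
    return hi, lo

def unpack_depth(hi, lo):
    """Inverse of pack_depth: recombine the two 8-bit planes."""
    return (hi.astype(np.uint16) << 8) | lo.astype(np.uint16)
```

This naive split is fragile under lossy compression (a 1-unit error in the high plane is a 256-unit depth jump), which is exactly the problem schemes like the triangle wave (Pece et al., 2011) are designed to avoid.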

My Contributions to BlenDR (Undergrad Thesis)

  1. Multi-view Fusion (End-to-End)
    • Added two new steps to the initial design: the iterative closest point (ICP) algorithm (Open3D, 2018) to refine the calibrated point clouds, and statistical outlier removal (Open3D, 2018) to further remove flying pixels from the 3D reconstruction
    • Redesigned the system to accommodate the new streams of data, assigning ICP to the sender side (usually the server) to lessen the burden on the receiver and allow accurate measurement of the refinement transform
  2. Flying Pixel Problem (originally handled by the edge detection algorithm)
    • The flying pixel problem was addressed by correcting the edge detection step. Previously, the edge mask was simply subtracted from the depth image, which did not properly remove the flying pixels as expected; the fix is to multiply the depth image by the inverted mask instead. Beyond edge detection, I also found statistical outlier removal to be widely used in 3D reconstruction research (Ma et al., 2021) (Teng et al., 2021).
     # thr_result: binary edge mask (1 = edge pixel, 0 = otherwise)
     inverted_thr_result = 1 - thr_result  # keep only non-edge pixels
     depth_with_edges_removed = decoded_depth * inverted_thr_result
    
    • The results are shown in the following images. The same dataset is used for the Triangle Wave method (Pece et al., 2011), which clearly performs worse than our method.

BlenDR

Triangle Method

(Left) No Post Processing. (Mid) Edge Detection Only. (Right) Edge Detection + Point Removal
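The point-removal step above follows the standard statistical-outlier criterion (the one behind Open3D’s `remove_statistical_outlier`): compute each point’s mean distance to its k nearest neighbors and drop points whose mean distance exceeds the global mean by more than `std_ratio` standard deviations. A minimal brute-force NumPy sketch of that criterion:

```python
import numpy as np

def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is
    more than std_ratio standard deviations above the global mean.
    Brute-force O(n^2) sketch of the criterion Open3D implements;
    points: (n, 3) array."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)        # exclude each point's self-distance
    knn = np.sort(dists, axis=1)[:, :k]    # distances to k nearest neighbors
    mean_d = knn.mean(axis=1)
    keep = mean_d <= mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```

Flying pixels sit far from any surface, so their mean neighbor distance is large and they fall outside the threshold; real pipelines use a spatial index (k-d tree) rather than the full distance matrix.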
  3. Fair Comparison (vs. GROOT) using Hausdorff Distance
    1. Measure: how ‘close’ the reconstructed point cloud is to the ground-truth point cloud
    2. Without fusion, Open3D Hausdorff distance (cm): 3.49 (GROOT), 19.40 (Triangle), 2.28 (ours)
    3. With fusion, Open3D Hausdorff distance (cm): 4.53 (GROOT), 16.35 (Triangle), 2.76 (ours)
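The Hausdorff distance used for this comparison is the maximum, over both directions, of the largest nearest-neighbor distance between the two clouds. A brute-force NumPy sketch (real evaluations use spatial indexes):

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between point sets a and b, each (n, 3):
    for each point take the distance to its nearest neighbor in the other
    set, then take the maximum over both directions. Brute force, for
    illustration only."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(),   # farthest a-point from b
               d.min(axis=0).max())   # farthest b-point from a
```

Because it takes a maximum, the metric is sensitive to any single stray point, which is why it penalizes leftover flying pixels so heavily.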

Conclusion

This undergraduate thesis was important to me as it was my first opportunity to plan, conduct, and execute research independently. Dr. Jaehong Kim helped by nudging me in the right direction and asking guiding questions, making me answer my own questions about the next steps to take in the research.

References

2021

  1. Publication
    Optimization of 3D Point Clouds of Oilseed Rape Plants Based on Time-of-Flight Cameras
    Zhihong Ma, Dawei Sun, Haixia Xu, and 3 more authors
    Sensors, 2021
  2. Publication
    Three-Dimensional Reconstruction Method of Rapeseed Plants in the Whole Growth Period Using RGB-D Camera
    Xiaowen Teng, Guangsheng Zhou, Yuxuan Wu, and 3 more authors
    Sensors, 2021

2018

  1. Online Document
    Colored Point Cloud Registration
    Open3D
    2018
  2. Online Document
    Point Cloud Outlier Removal
    Open3D
    2018

2011

  1. Publication
    Adapting Standard Video Codecs for Depth Streaming
    Fabrizio Pece, Jan Kautz, and Tim Weyrich
    In Joint Virtual Reality Conference of EGVE - EuroVR, 2011