Intel Labs reinforces its commitment to computer vision with VI-Depth 1.0 and MiDaS 3.1

VI-Depth 1.0 and MiDaS 3.1 are two new depth estimation releases for machine vision, and with them Intel Labs has made an important advance in the field. In this article we take a closer look at both and cover their key features.

We start with VI-Depth 1.0, which we can define as a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale.

This method performs a global scale and shift alignment against the sparse metric depth from VIO, followed by a learning-based dense alignment. Depth perception is essential for visual navigation, and correctly estimating distances helps a system plan its movement, avoid obstacles, and prevent accidents.
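
To make the first stage concrete, here is a minimal sketch of a global least-squares scale-and-shift fit against sparse metric depth, assuming NumPy arrays and a boolean validity mask; the function name and the choice to align depth directly (rather than inverse depth) are illustrative assumptions, not the actual VI-Depth code.

```python
import numpy as np

def global_scale_shift_align(relative_depth, sparse_metric_depth, valid_mask):
    """Fit a single scale s and shift t in a least-squares sense so that
    s * relative_depth + t matches the sparse metric depth (e.g. from VIO)
    at the pixels where it is available, then apply the fit densely."""
    d = relative_depth[valid_mask]        # relative depth at the sparse points
    z = sparse_metric_depth[valid_mask]   # metric depth at the same points
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * relative_depth + t         # dense, globally aligned depth map
```

In VI-Depth, this global fit is then refined by the learned dense alignment stage described above.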

The precise depth estimation offered by VI-Depth 1.0 can help in scene reconstruction, mapping and object manipulation. Combining metric accuracy with high generality is a fundamental challenge that must be overcome to achieve effective learning-based depth estimation.

This Intel Labs solution incorporates inertial data into the visual depth estimation process, not through sparse-to-dense depth completion, but through dense-to-dense depth alignment using learned and estimated scale factors.
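
To contrast the two approaches, the sketch below applies dense-to-dense alignment: the already-dense relative depth map is rescaled everywhere by an estimated global scale and shift plus a per-pixel scale map, rather than densifying the sparse VIO points themselves. The function and the made-up values are purely illustrative; in VI-Depth the per-pixel factors come from a learned alignment module.

```python
import numpy as np

def dense_to_dense_align(relative_depth, per_pixel_scale, global_scale, global_shift):
    """Rescale every pixel of an already-dense relative depth map: first with
    an estimated global scale and shift, then with a dense map of per-pixel
    scale factors (learned in VI-Depth). This contrasts with sparse-to-dense
    completion, which would instead densify the sparse metric points."""
    globally_aligned = global_scale * relative_depth + global_shift
    return globally_aligned * per_pixel_scale

# Illustrative call with made-up values and shapes.
relative_depth = np.random.rand(480, 640).astype(np.float32) + 0.1
per_pixel_scale = np.ones((480, 640), dtype=np.float32)  # stands in for a learned map
metric_depth = dense_to_dense_align(relative_depth, per_pixel_scale, 5.0, 0.2)
```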

VI-Depth achieves lower error with its learning-based dense alignment than with global least-squares alignment alone, and it successfully performs zero-shot cross-dataset transfer from synthetic training data to real-world test data.

This modular approach allows the direct integration of existing and future VIO and monocular depth estimation systems. It also resolves the metric scale for metrically ambiguous monocular depth estimates, aiding the implementation of robust and general monocular depth estimation models. VI-Depth is available under an open source MIT license on GitHub.

Intel Labs MiDaS 3.1: Increased Accuracy in Depth Estimation


This update adds new features and enhancements to the open source deep learning model for monocular depth estimation in machine vision. MiDaS has been tested with large and diverse image data sets and is capable of providing relative depth indoors and outdoors, making it a versatile solution full of possibilities.

It offers a good level of performance and operates very efficiently. It is able to estimate the relative depth of each pixel in an input image, which makes it an attractive and viable option for a large number of applications and environments, from robotics to augmented reality (AR), virtual reality (VR) and computer vision.
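
As a quick illustration of that per-pixel estimation, the sketch below follows the torch.hub usage pattern documented in the MiDaS repository; the "DPT_Large" model type and the input file name are placeholder choices, and MiDaS 3.1's newer backbones are exposed under their own model-type names listed in the project's README.

```python
import cv2
import torch

# Load a MiDaS model and its matching input transform via torch.hub.
model_type = "DPT_Large"  # placeholder; 3.1 adds backbone-specific types
midas = torch.hub.load("intel-isl/MiDaS", model_type)
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device).eval()

# Read an image (placeholder path) and convert BGR -> RGB for the transform.
img = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
input_batch = transform(img).to(device)

with torch.no_grad():
    prediction = midas(input_batch)
    # Resize the prediction back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

print(depth.shape)  # one relative (inverse) depth value per pixel
```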

Among the most important new features of MiDaS 3.1, the following stand out:

  • New models based on five different types of transformers (BEiT, Swin2, Swin, Next-ViT and LeViT). These models can offer higher accuracy and performance compared to the models used in previous versions of MiDaS.
  • The training data set has been expanded from 10 to 12 datasets, including the addition of KITTI and NYU Depth V2 using the BTS split. This expansion can improve the generalizability of the model, allowing it to perform effectively across a broader range of tasks and environments.
  • BEiT Large 512, the model with the best results, is on average 28% more accurate than MiDaS 3.0.
  • The latest version includes the ability to perform depth estimation in real time from a camera feed, which could be useful in various machine vision and robotics applications, such as navigation and 3D reconstruction (a minimal webcam sketch follows this list).
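
Below is a minimal sketch of such a live loop, again assuming the documented torch.hub loading pattern; the "MiDaS_small" model type, the camera index, and the display normalization are illustrative choices rather than a reference implementation.

```python
import cv2
import torch

# Small, fast model type for live use; larger 3.1 backbones trade speed
# for accuracy (model-type names are listed in the MiDaS README).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device).eval()

cap = cv2.VideoCapture(0)  # default webcam (illustrative camera index)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(rgb).to(device))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().cpu().numpy()
    # Normalize the relative depth to 0-255 just for on-screen display.
    vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imshow("MiDaS relative depth", vis)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```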

MiDaS 3.1 is available on GitHub, where it has received more than 2,600 stars (positive ratings) from the community.
