Understanding Depth Perception in Computer Vision

Depth perception is the visual ability to perceive the world in three dimensions (3D) and to estimate the distance of an object from the observer. The world we observe is three-dimensional, but the image formed on the human retina is two-dimensional, so the input to our brain is in two dimensions (2D). We are nevertheless able to perceive the world in 3D. This ability of the brain to recover depth is a result of human evolution. It tells us the depth of every object, that is, its relative distance from our eyes. It is crucial to everyday life: it prevents us from bumping into things and helps us judge the relative speed of an object.

In technology, depth perception has many applications, including self-driving cars. LiDAR is one of the many methods used for depth perception there. It measures the distance to an object by illuminating it with laser light and then measuring the reflections with sensors.
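As a rough illustration of the LiDAR idea, the sketch below assumes a pulsed time-of-flight sensor: the distance follows from how long the laser pulse takes to return. The function name and the 200 ns example value are hypothetical, chosen only for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def lidar_distance_m(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface, from the pulse's round-trip time."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the object and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# Hypothetical reading: the reflection arrives 200 nanoseconds after emission.
print(f"{lidar_distance_m(200e-9):.2f} m")
```

A return after 200 ns corresponds to an object roughly 30 m away, which shows why such sensors need very precise timing.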

Depth perception in humans

Artificial intelligence is based on the assumption that human thought processes and abilities can be mechanized. So, to understand how depth perception is used in computer vision, it helps to first understand how we humans perceive depth.

Depth Cues

The details in the environment that allow us to perceive depth are called depth cues. In humans, depth perception relies on both monocular and binocular cues. Here monocular means ‘with one eye’ and binocular means ‘with both eyes’. Because of these cues, we can interpret depth effortlessly even in a 2D image that carries no explicit depth information. The main monocular cues are described first.

a. Relative size

If two objects are known to be the same size, even when their absolute size is unknown, this cue provides information about their depth through the visual angles they subtend on the retina. The larger the visual angle, the closer the object. In practice, when two similar objects are seen at a distance, the one that appears larger is perceived as closer.
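The relation between size, distance, and visual angle can be sketched as follows. This is a minimal example assuming an object viewed head-on and centred on the line of sight; the function name and the sample values are hypothetical.

```python
import math


def visual_angle_deg(object_size_m: float, distance_m: float) -> float:
    """Angle subtended at the eye by an object of the given size and distance."""
    if object_size_m <= 0 or distance_m <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2.0 * math.atan(object_size_m / (2.0 * distance_m)))


# Two objects of the same (hypothetical) 1 m size at different distances.
near_angle = visual_angle_deg(1.0, 5.0)
far_angle = visual_angle_deg(1.0, 20.0)
print(near_angle > far_angle)  # the nearer object subtends the larger angle
```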

b. Interposition


When one object overlaps another, the partially hidden object is perceived as being farther away. This cue gives the depth ordering of objects relative to one another.

c. Aerial Perspective

Objects that are far away tend to look hazy or blurry compared with nearby objects because of the atmosphere between them and the observer. This tells us that the hazier object is farther away.


d. Linear Perspective


Parallel lines, such as railway tracks, appear to converge with distance and meet at infinity. This helps an observer perceive the depth of a scene: the closer together the two lines appear, the farther that part of the scene is from the observer.

e. Lighting and Shading

The way light falls on objects and is reflected, and the shadows the objects cast, provide information about their depth.


f. Monocular movement parallax

As we move, the apparent relative motion of stationary objects against the background gives us an idea of their relative distance. When we drive, nearby objects appear to pass quickly compared with distant ones.
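The driving example can be made concrete with a small sketch. It assumes an idealized pinhole camera that moves sideways, parallel to its image plane, past stationary objects; the function name and all numbers are hypothetical.

```python
def parallax_shift_px(focal_length_px: float, camera_shift_m: float,
                      depth_m: float) -> float:
    """Image-plane shift of a stationary point when the camera moves sideways."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # The shift is inversely proportional to depth: near points move more.
    return focal_length_px * camera_shift_m / depth_m


# Camera moves 0.5 m sideways; compare a roadside post with a distant building.
print(parallax_shift_px(800.0, 0.5, 2.0))   # post 2 m away
print(parallax_shift_px(800.0, 0.5, 50.0))  # building 50 m away
```

Under these assumptions the nearby post sweeps across far more pixels than the distant building, which is the signal motion parallax relies on.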

g. Texture gradient

We can see the fine details of an object that is close, but not of one that is far away. For example, in a grassy field, the texture becomes less and less apparent as the field recedes into the distance.

h. Elevation

Objects that appear closer to the horizon tend to be perceived as farther away, whereas those that appear farther from the horizon are usually seen as closer.

Binocular Cues

Binocular cues provide depth information when we view a scene with both eyes. They give us a three-dimensional interpretation of the world and let us navigate through it effortlessly. Some of the binocular cues are described below.

a. Stereopsis

Stereopsis explains how images of the same scene obtained from slightly different angles help us judge depth. Because our eyes are separated horizontally, each eye receives a slightly different image of the same object, and the difference between the two images is called retinal disparity. The larger the disparity, the closer the object. 3D movies are a familiar example of stereopsis: their scenes are filmed with cameras at slightly different angles.
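The same principle is used with stereo cameras. The sketch below assumes a calibrated, rectified camera pair, where depth is the focal length times the baseline divided by the disparity. The function name and the example values are hypothetical.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair, in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    # Larger disparity means the point is closer to the cameras.
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 10 cm apart.
for disparity in (70.0, 35.0, 7.0):
    depth = depth_from_disparity(700.0, 0.1, disparity)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```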

b. Convergence

When we focus on an object with both eyes, the eyes rotate inward, or converge. Our eye muscles have to contract and relax to focus on objects at different distances, and these muscle movements provide information about the depth of the object. To observe convergence, hold a finger in front of your face, focus on its tip, and slowly bring it closer. You will feel the strain and eventually see the image split into two and blur.

Depth Perception in Computer Vision

In the 21st century, computer vision has become one of the most prominent areas of Artificial Intelligence. Computer vision deals with how computers can gain a high-level understanding of images and videos in order to perform various tasks. The images are captured with electronic cameras, usually in the visible spectrum. Much like the human eye, a camera captures a 2D picture that is a projection of the 3D world onto a medium (film or a sensor), and the depth dimension is lost in the process.
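To see why the projection discards depth, consider a minimal sketch of an idealized pinhole camera. The focal length, image centre, and 3D points are hypothetical values chosen for illustration.

```python
def project_point(point_xyz, focal_length_px=500.0, center_px=(320.0, 240.0)):
    """Project a 3D point in camera coordinates onto the 2D image plane."""
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = focal_length_px * x / z + center_px[0]
    v = focal_length_px * y / z + center_px[1]
    return (u, v)


# Two points on the same viewing ray, one twice as far away as the other.
near = project_point((1.0, 2.0, 4.0))
far = project_point((2.0, 4.0, 8.0))
print(near == far)  # both land on the same pixel, so depth cannot be read back
```

Any point along the same ray maps to the same pixel, so a single image alone does not determine how far away the point is.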

In many applications, 2D image understanding is sufficient, but some, such as autonomous vehicles, need 3D scene understanding. This is a challenge in itself because the depth dimension has to be estimated from 2D images alone.

Monocular cues are the primary source of depth information in such imagery, because most of the time a single camera is used and it captures monocular images. Monocular depth estimation has gained popularity and attracted many researchers. Much of this research has used geometric cues to estimate depth.

Implementing depth estimation

Machine Learning and Deep Learning techniques are heavily used in Computer Vision because they are able to efficiently mimic human-like pattern recognition.

There are many techniques, architectures, and algorithms for depth estimation. Let us look at some of them.

One supervised approach predicts the depth of dynamic objects, such as moving people. It uses motion parallax cues from the static areas of a scene to guide the depth prediction. The training data consists of videos in which people imitate mannequins, freezing in elaborate, natural poses while a hand-held camera tours the scene.

The input to the neural network includes a reference image, a binary mask of the human regions, a depth map estimated from motion parallax, a confidence map C, and an optional human keypoint map K. The network outputs a depth map for the whole image and is trained to match depth maps computed with Multi-View Stereo (MVS).

Unsupervised Learning

We have seen how supervised depth estimation works. It is quite challenging, however, to obtain high-quality depth datasets that account for all possible background conditions in an environment, and the lack of accurate data makes it difficult to improve supervised methods. Unsupervised (and semi-supervised) methods do not require ground-truth depth at training time, so they are not limited by this constraint.

During the training of unsupervised depth estimation models, the relative pose of multiple cameras is used to predict the appearance of a held-out nearby image. This enables a CNN to learn single-image depth estimation in the absence of ground-truth depth data.

Stereo estimation algorithms typically compute the similarity between each pixel in the first image and the pixels in the second image. The unsupervised approach instead treats depth estimation as an image reconstruction problem.
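The pixel-similarity search used in stereo matching can be sketched on a single scanline. This is a simplified illustration, not the method of any particular paper: it assumes a rectified pair in which a left-image pixel at column x matches the right-image pixel at column x - d, and it scores candidates with a sum of absolute differences over a small window. The function name and the toy intensity rows are hypothetical.

```python
def best_disparity(left_row, right_row, x, max_disparity, half_window=1):
    """Return the disparity whose window in right_row best matches left_row at x."""
    width = len(left_row)
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        valid = True
        for offset in range(-half_window, half_window + 1):
            xl = x + offset  # column in the left image
            xr = xl - d      # candidate matching column in the right image
            if not (0 <= xl < width and 0 <= xr < width):
                valid = False  # window falls off the image, skip this candidate
                break
            cost += abs(left_row[xl] - right_row[xr])
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy scanlines: the left row is the right row shifted by two pixels.
right_row = [5, 9, 2, 7, 4, 8, 1, 6]
left_row = [0, 0, 5, 9, 2, 7, 4, 8]
print(best_disparity(left_row, right_row, x=4, max_disparity=3))
```

Real stereo matchers add refinements such as sub-pixel interpolation and smoothness constraints, which are omitted here.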

The intuition behind this approach is that, given a calibrated pair of binocular cameras, if we can learn a function that reconstructs one image from the other, then that function has learned something about the 3D shape of the scene.
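The reconstruction idea can be illustrated on one scanline. The sketch below is a simplification under stated assumptions: it uses integer disparities and nearest-pixel sampling, whereas trainable models need a differentiable sampler, and it measures the error with a plain mean absolute difference. The function names and toy rows are hypothetical.

```python
def reconstruct_left_row(right_row, disparity_row):
    """Rebuild a left-image scanline by sampling the right image at x - d."""
    width = len(right_row)
    rebuilt = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - d, 0), width - 1)  # clamp at the image border
        rebuilt.append(right_row[src])
    return rebuilt


def photometric_l1(row_a, row_b):
    """Mean absolute intensity difference between two scanlines."""
    return sum(abs(a - b) for a, b in zip(row_a, row_b)) / len(row_a)


# Toy data: the true disparity is 2 pixels everywhere.
right_row = [10, 20, 30, 40, 50, 60, 70, 80]
left_row = [10, 10, 10, 20, 30, 40, 50, 60]

good = reconstruct_left_row(right_row, [2] * 8)
bad = reconstruct_left_row(right_row, [0] * 8)
print(photometric_l1(good, left_row), photometric_l1(bad, left_row))
```

The correct disparity reproduces the left row exactly, while a wrong one leaves a reconstruction error. Minimizing that error is what drives the network toward plausible depth without any ground-truth depth labels.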

Applications of Depth Perception

1. Augmented Reality reconstruction of an object

Augmented reality is one of the key applications of depth estimation. It helps us visualize objects in three dimensions and view them from multiple angles and at multiple scales. Depth information is essential for AR/VR devices.

2. Robotics

Many industries use fully automated robots in their production lines, and for such applications depth estimation is a major factor in calculating motion along the third dimension. Depth information also plays a primary role in autonomous vehicles, which must detect the distance of objects from the vehicle.

3. Cameras

Depth information is used in many computer vision applications, such as facial recognition and image classification, to increase efficiency and accuracy. 3D face modeling also adds more features to a face recognition model. Depth estimation is also necessary for adjusting camera focus and for portrait-mode photography.


Depth is an important cue that is lost when an image or video is captured, and this loss limits the performance of computer vision. As automation spreads across sectors, estimating depth becomes very important for estimating motion and relative speed. This article has given a brief overview of depth perception and how it is implemented in computer vision. In the coming years, we are likely to witness dramatic innovations in this field.

Veer is a very passionate individual and has been working in the industry for the past 5 years. During his career, he has taken up different roles as a developer, senior developer, educator, consultant, mentor and team lead for various colleges, clients and projects. His interests include Deep Learning, Reinforcement Learning, Business Analytics, Extended Reality, Autonomous Vehicles and Electric Vehicles. He has trained many ML professionals and students. He has worked with companies such as Infosys and Lenskart.com and is currently working on something really interesting.
