The World Through Robot Perception

Robotics, Robot Perception, Computer Vision · 4 min read

Avnish Gupta

Meet Avnish Gupta

Robotics Engineer @ Addverb

Uttar Pradesh, India

Avnish Gupta is a robotics enthusiast actively working in the fields of computer vision, controls, and deep learning for robotics.

He is engaged in the development of algorithms for perception, manipulation, system integration, and control of mobile robots and industrial manipulators.

He is an engineering professional with a Bachelor of Technology (BTech) in Mechanical Engineering from the Indian Institute of Technology, Mandi, with a strong focus on the fundamentals of computer programming.

He has hands-on experience working with the ABB industrial manipulator robot, the Omron TM 14 cobot, lidar, stereo cameras, depth sensors, servo motors, etc. He is passionate about learning new technologies and stays abreast of developments in the field of robotics.

What inspired you to pursue a career in robotics?

Robotics caught my attention in early childhood when I watched with amazement a television series on robots in manufacturing.

The visuals showing fully automated factories functioning with clockwork precision are still vivid in my mind.

I have left no stone unturned to build upon this fascination, from my backyard projects at home as a child all the way to tinkering with Arduino-based robots during high school.

The early inspiration even served as a guiding light for making it to the prestigious Indian Institute of Technology (IIT) Mandi for my undergraduate studies in mechanical engineering.

This was a significant step, as the interdisciplinary curriculum offered a gamut of theoretical and practical subjects spanning electronics, computer science, and mechanical engineering.

This helped me develop the broad intellectual base necessary for robotics and strengthened my desire to explore this field further.

After my graduation, I finally started working as a Robotics and Perception Engineer at a start-up, Addverb Technologies.

How would you define robot perception and its applications?

Perception means how we as humans see and react to the world around us. Similarly, robot perception poses a novel problem of teaching a robot to perceive, comprehend, and reason about the surrounding environment using sensory data from sources like cameras, lasers, radars, etc.

Typical problems in robot perception include object recognition, face recognition, simultaneous localization and mapping (SLAM), semantic segmentation, etc.

Over the past few years, a significant leap in sensor technology and the development of artificial intelligence techniques have made perception-based robots a top priority for a majority of robotics companies.

Current advancements in robot perception have enabled us to develop unmanned aerial vehicles, self-driving cars, autonomous ground vehicles, humanoids, etc.

More importantly, it is also benefiting healthcare through advances in surgical and medical robotics, where robots are being used to perform complex surgical procedures with high precision and fewer human errors.

What is the difference between human perception and robot perception?

Human perception is a very complex process that we humans carry out unknowingly to sense the world and act in it.

It depends on the input from our five main sense organs and the stimuli carried by nerve cells. This unlabelled data is continuously processed by our brain in real time, taking into account learnings and events from the past, and we act on the result.

In contrast, robot perception is something we model as an analogy to human perception.

We try to teach computers to sense the world just as we humans do, based on input from various sensors.

Researchers in this field are pioneering new algorithms each day, but even today we are still very far from developing robots that can perceive, learn, and work exactly as the human brain does.

Do you have any robotics projects whose experience you would like to share?

I am developing a 3D bin picking project as part of my current employment with Addverb Technologies, a warehouse automation startup based in India.

The primary aim of this project is to pick and place objects lying randomly in a bin using a robotic manipulator arm.

This problem looks very simple from the perspective of a human performing the task, but the main intricacies arise when the task has to be taught to a machine.

This is a core perception problem, in which we teach computers to segment and recognize the object in a random cluttered scene and simultaneously compute its 6 DoF pose to pick it up using a manipulator.
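
As a rough illustration of what a 6 DoF pose means (this is a sketch, not the method used in the project), the pose can be written as three translations plus roll, pitch, and yaw, and turned into a 4x4 homogeneous transform that maps a grasp point from the object's frame into the camera's frame. The function names, the yaw-pitch-roll (ZYX) convention, and the numbers below are assumptions made only for this example.

```python
import math

def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """Build a 4x4 homogeneous transform from a 6 DoF pose.

    Assumes the rotation is R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp, cp * sr, cp * cr, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    px, py, pz = point
    homogeneous = (px, py, pz, 1.0)
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )

# Hypothetical object pose in the camera frame: 0.5 m to the side,
# 1.0 m in front, rotated 90 degrees about the vertical axis.
object_pose = pose_to_matrix(0.5, 0.0, 1.0, 0.0, 0.0, math.pi / 2)

# Hypothetical grasp point, 10 cm along the object's own x-axis.
grasp_in_camera = transform_point(object_pose, (0.1, 0.0, 0.0))
print([round(v, 3) for v in grasp_in_camera])
```

The manipulator then needs this grasp point (and the matching orientation) expressed in its own base frame, which is one more transform of the same kind.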

This is a very interesting and challenging problem, and people from all over the world are trying to solve it as part of the Amazon Picking Challenge.

So far, I have published research titled “A Low-Cost Bin Picking Solution for E-Commerce Warehouse Fulfillment Centers” at the Australasian Conference on Robotics and Automation (ACRA) 2019.

How close are we to having humanoid robots among us?

This is something that does not have an absolute answer, but a Hong Kong-based company called Hanson Robotics has set the first milestone in the development of humanoids.

They crafted Sophia, the world's first robot citizen, which is a hybrid human-AI intelligence.

Another company, the UK-based DeepMind Technologies, is developing advanced artificial intelligence algorithms to formalize human intelligence by understanding the functioning of the human brain and implementing it in machines.

What types of data serve as input and output in robot perception?

The input to the system in perception-based tasks is the sensory data. It can be image, video, or audio data, a laser scan, a point cloud, etc., or a combination of these.

This data is then processed using various algorithms to generate certain output based on the application requirements.

Let’s take the example of a lane detection problem.

In this problem, we first train a neural network using thousands of labeled camera images.

Gradually, the network learns to predict and classify lanes as its output for input images of arbitrary street scenes.
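
To make the input and output concrete, here is a toy sketch of the same idea. It is not a real lane detector: a single-feature logistic regression on pixel brightness stands in for the neural network, and the tiny "images", labels, and function names are all made up for illustration. The input is a grid of brightness values and the output is a mask of the same size, with 1 marking a lane pixel.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pixel_classifier(images, masks, epochs=500, lr=1.0):
    """Fit p(lane | brightness) = sigmoid(w * brightness + b) by gradient descent."""
    # Flatten every labeled image into (brightness, label) pairs.
    samples = [
        (pixel, label)
        for image, mask in zip(images, masks)
        for row, mask_row in zip(image, mask)
        for pixel, label in zip(row, mask_row)
    ]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = sigmoid(w * x + b) - y  # gradient of the log loss
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
    return w, b

def predict_mask(image, w, b, threshold=0.5):
    """Return a mask with 1 where a pixel is classified as lane, else 0."""
    return [
        [1 if sigmoid(w * pixel + b) >= threshold else 0 for pixel in row]
        for row in image
    ]

# Made-up labeled training data: bright pixels are lane markings.
train_images = [
    [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
    [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
]
train_masks = [
    [[0, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
]

w, b = train_pixel_classifier(train_images, train_masks)

# An unseen image: the lane is now in the right-hand column.
new_image = [[0.15, 0.2, 0.85], [0.1, 0.25, 0.9]]
lane_mask = predict_mask(new_image, w, b)
print(lane_mask)
```

A real system would replace the brightness feature with a deep network that looks at color, texture, and context, but the shape of the problem stays the same: labeled images in during training, a per-pixel lane prediction out at run time.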

As a roboticist, which company is your dream to work for, and why?

My vision is to be part of an institution or company where I can put my knowledge and skills to work for the common good.

It is my fervent desire to engage in research and development of perception-based robots that can mimic human behavior and actions.

My long-term goal is to master the field of robotic perception and controls and be at the forefront of innovation by sticking to a research-oriented approach.

I dream of collaborating with scientists who are trying to solve the perception problem at the core by understanding how the human brain works.

To learn more about Avnish Gupta's robotics projects, you can see his portfolio.