An Introduction to Computer Vision

Sep 25, 2020

by Vamshi Kumar Bogoju

Our eyes, among the most important of our five senses, help us perceive the world around us. They take in extremely complex information in the form of shapes, colors, textures, and patterns, and convert it into signals that travel along the optic nerve to the brain. There, the signals are interpreted and compared against a library of memories and associations to derive meaning or a classification. As a simple example, this is how our brain knows that a dog is a dog, regardless of whether we are looking at a Beagle or a Great Dane!

In a similar fashion, a computer can use a camera as its eyes. Typically, we use a camera simply as the eyes, with our human brains as the interpreter. Computer Vision (CV) takes human interpretation out of the equation by training a computer to translate visual information into meaningful outputs so that it can perform tasks.

Computer Vision is a subset of Artificial Intelligence (AI), the broader field in which computers are taught to learn from and interpret data. That data includes visual information, but also other kinds, such as large and complex numerical data sets (“big data”). AI uses mathematics and statistics to understand the environment or the scene. CV usually also employs machine learning, the subset of AI that involves building mathematical models that let computers improve automatically (that is, learn) through experience. Computer vision plays an increasingly important role in our lives, in areas including facial recognition, self-driving cars, medical analysis, and more!

In this article, you will learn how computer vision works, its applications, and key innovations in the field.

How Computer Vision Works

As mentioned, images captured by cameras act as the input to CV algorithms. To humans, an image is a copy of the real world in two dimensions instead of three. To a machine, an image is a matrix of pixel intensities; put simply, a matrix of numbers. A color image is a matrix with the height and width of the image and a depth of 3 (one channel each for red, green, and blue in the RGB color space, for example). A grayscale image has a depth of 1, since it carries no color information. Almost every CV algorithm performs operations on these matrices to achieve its objective: the defined task or problem to be solved.
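To make the "image as a matrix" idea concrete, here is a minimal sketch. It assumes NumPy as the array library and a made-up 4x6 image; the article itself does not prescribe either. The grayscale conversion is a plain average of the three channels, which is one simple choice among several.

```python
import numpy as np

# A hypothetical 4x6 color image: height 4, width 6, depth 3 (R, G, B).
color_image = np.zeros((4, 6, 3), dtype=np.uint8)
color_image[:, :, 0] = 255  # fill the red channel, so every pixel is pure red

# One simple way to drop color: average the three channels of each pixel.
gray_image = color_image.mean(axis=2).astype(np.uint8)

print(color_image.shape)  # (4, 6, 3) -> depth 3
print(gray_image.shape)   # (4, 6)    -> depth 1 (no channel axis)
print(gray_image[0, 0])   # 85, the average of 255, 0 and 0
```

Everything a CV algorithm does downstream, from filtering to neural network inference, is some operation on arrays like these.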

With the growing availability of data, advances in deep learning algorithms, and increases in computing power, the CV field is progressing rapidly. The use of deep neural networks and graphics processing units (GPUs) has enabled CV algorithms to recognize complex patterns and produce commendable results within practical computing times.

Applications in Computer Vision

A few of the basic applications or functionalities in Computer Vision are:
● Object Detection
● Image Segmentation
● 3D Scene Reconstruction
● Generative Adversarial Networks (GANs)

Object Detection:

What it is: This is one of the most explored areas of Computer Vision, where the objects of interest are defined as classes and then detected in images.

How it works: The CV algorithm is taught to draw a “bounding box” around an object in the image. This task involves the classification of objects in the image, and localization of the object among other objects present in the image. For example, for an image that consists of both cats and dogs, the CV algorithm classifies each object — is it a cat or a dog? — then draws a box around that object irrespective of its location in the image. This process is done for all cats and dogs present in the image.
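The output of such a detector can be pictured as a list of labeled boxes. The sketch below is illustrative only: the `Detection` class, the `draw_box` helper, and the hard-coded cat and dog boxes are hypothetical stand-ins for what a trained model would return, and NumPy is an assumed dependency.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    label: str    # predicted class, e.g. "cat" or "dog"
    score: float  # model confidence between 0 and 1
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels, inclusive


def draw_box(image, box, value=255):
    """Return a copy of a grayscale image with the box outline set to value."""
    x_min, y_min, x_max, y_max = box
    out = image.copy()
    out[y_min, x_min:x_max + 1] = value  # top edge
    out[y_max, x_min:x_max + 1] = value  # bottom edge
    out[y_min:y_max + 1, x_min] = value  # left edge
    out[y_min:y_max + 1, x_max] = value  # right edge
    return out


# Hypothetical detector output for an image containing one cat and one dog.
detections = [
    Detection("cat", 0.92, (1, 1, 4, 3)),
    Detection("dog", 0.88, (6, 2, 9, 6)),
]

image = np.zeros((8, 12), dtype=np.uint8)  # blank 8x12 grayscale image
for det in detections:
    image = draw_box(image, det.box)
    print(det.label, det.score, det.box)
```

Each detection carries both parts of the task described above: the class label (classification) and the box coordinates (localization).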

Image Segmentation:

What it is: Instead of drawing a bounding box, image segmentation classifies the image at the pixel level. Every pixel is assigned to one of a set of predefined classes.

How it works: This task has two variants. In semantic segmentation, all objects of the same class are treated as a single region and shown in the same color. In instance segmentation, each individual object is treated as a separate instance and shown in its own color. Segmentation is commonly used in medical analysis, for example to segment cancer cells in a medical image.
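The difference between the two variants shows up directly in the masks they produce. The sketch below uses small hand-written masks rather than model output; the class ids (0 for background, 1 for cell) and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

# Semantic mask: every pixel holds a class id (0 = background, 1 = cell).
# Both cells share the same id, so they cannot be told apart.
semantic_mask = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

# Instance mask: every pixel holds an object id (0 = background).
# The two cells get different ids, 1 and 2.
instance_mask = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
])

cell_pixels = int((semantic_mask == 1).sum())
instance_ids = np.unique(instance_mask[instance_mask > 0])
pixels_per_instance = {int(i): int((instance_mask == i).sum()) for i in instance_ids}

print(cell_pixels)          # 8 pixels belong to the class "cell"
print(pixels_per_instance)  # {1: 4, 2: 4} -> two separate cells of 4 pixels each
```

The semantic mask can answer "how much of the image is cell?", while the instance mask can also answer "how many cells are there, and how big is each one?".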

3D Scene Reconstruction:

What it is: In this task, the goal is to obtain a 3D scene from 2D images of the scene.

How it works: The computer performs what is known as an inverse mapping from 2D to 3D. Slight differences in viewing angle and position across a collection of 2D images are used to map back to, or reconstruct, the 3D scene. This task is prominent in robotics and in augmented reality, for example in virtual room decorators.
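One of the simplest instances of this inverse mapping is depth from a stereo pair. Assuming two identical, rectified pinhole cameras side by side (an assumption added here, not a setup described in the article), a point that appears shifted by a disparity of d pixels between the two images lies at depth Z = f · B / d, where f is the focal length in pixels and B is the distance between the cameras. The function and parameter names below are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by a rectified stereo camera pair.

    focal_length_px: focal length of the cameras, in pixels
    baseline_m:      distance between the two camera centers, in meters
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Illustrative numbers: a larger shift between the views means a closer point.
near = depth_from_disparity(focal_length_px=700, baseline_m=0.1, disparity_px=35)
far = depth_from_disparity(focal_length_px=700, baseline_m=0.1, disparity_px=7)
print(near, far)
```

Full reconstruction pipelines also have to find matching points across the images and estimate the camera positions, which this sketch takes as given.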

Generative Adversarial Networks (GANs):

What it is: This is one of the most active research areas in Computer Vision (I ♥ GANs!). GANs are a type of neural network that generates new data by learning the distribution of the input data.

How it works: A GAN consists of two networks, a generator and a discriminator, trained against each other (adversarial learning). The generator learns to produce new data that resembles the input data, while the discriminator learns to tell generated data from real data. Applications include increasing image resolution and generating new photographs of human faces, for example for facial recognition.
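The adversarial part can be sketched through the two loss functions alone. The code below assumes the discriminator outputs a probability that its input is real and uses the commonly used "non-saturating" generator loss; the function names are hypothetical, NumPy is an assumed dependency, and no actual networks are trained here.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Low when real images score near 1 and generated images score near 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))


def generator_loss(d_fake, eps=1e-12):
    """Low when the discriminator scores generated images near 1 (it is fooled)."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_fake + eps)))


# Discriminator outputs: probability that each input image is real.
confident = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident < confused)  # True: a discriminator that separates well has lower loss

fooled = generator_loss(d_fake=[0.9, 0.95])
caught = generator_loss(d_fake=[0.1, 0.05])
print(fooled < caught)       # True: a generator that fools the discriminator has lower loss
```

Training alternates between lowering the first loss by updating the discriminator and lowering the second by updating the generator, so an improvement in one network raises the bar for the other.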

Now that we have covered a few of the basic applications of CV, it’s important to understand how we teach these algorithms the patterns they need to perform their tasks. That’s where machine learning and deep learning come into play, most commonly in the form of supervised learning or unsupervised learning:

● Supervised learning is employed when the available data consists of images with their corresponding labels. Labels are the ground truths for the algorithm: the known correct answers that it learns to predict for new images in a particular task.

● Unsupervised learning comes into action when the available data is not labeled: when images are available without any ground truths.
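The contrast between the two settings can be sketched on toy data. In the example below, each "image" is reduced to a made-up two-number feature vector. The supervised part is a nearest-centroid classifier and the unsupervised part is a bare-bones k-means clustering; both are simple stand-ins chosen for brevity, not the deep learning models discussed in this article, and all names and values are hypothetical.

```python
import numpy as np


def _distances(features, centroids):
    # Pairwise Euclidean distances, shape (n_samples, n_centroids).
    return np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)


def fit_supervised(features, labels):
    """Supervised: use the labels to compute one mean feature vector per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each sample the class of its nearest class centroid."""
    return classes[_distances(features, centroids).argmin(axis=1)]


def cluster_unsupervised(features, k=2, iterations=10):
    """Unsupervised: group samples into k clusters without using any labels."""
    centroids = features[:k].astype(float)  # naive start: the first k samples
    for _ in range(iterations):
        assignment = _distances(features, centroids).argmin(axis=1)
        for j in range(k):
            members = features[assignment == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignment


# Made-up feature vectors standing in for four images.
features = np.array([[0.1, 0.2], [0.9, 1.0], [0.0, 0.1], [1.0, 0.8]])
labels = np.array(["cat", "dog", "cat", "dog"])  # ground truths (supervised only)

classes, centroids = fit_supervised(features, labels)
predicted = predict(np.array([[0.2, 0.1], [0.8, 0.9]]), classes, centroids)
print(predicted)  # ['cat' 'dog']

clusters = cluster_unsupervised(features)
print(clusters)   # [0 1 0 1]
```

The supervised model can name what it sees because it was given names to learn from. The unsupervised one finds the same two groups but can only number them.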

In the last few years, new learning paradigms have been developed, including:

● Semi-supervised learning, which trains on a mix of labeled and unlabeled data, and self-supervised learning, which derives its training signal from the unlabeled data itself.

● Few-shot learning and zero-shot learning, which are used when only a handful of labeled examples, or none at all, are available for the classes of interest.

Key Innovations in Computer Vision

It is an exciting time for the field of Computer Vision, especially now that deep learning techniques are propelling advances across the field:

● AlexNet[1] used a deep convolutional neural network to win the ImageNet image classification challenge in 2012. Since then, neural networks have become a vital part of most computer vision algorithms.

● Another breakthrough was the introduction of GANs[2] by Ian Goodfellow and his co-authors in 2014.

● The development of single-stage detectors like SSD[3] and YOLO[4], which treat object detection as a regression problem, has made real-time object detection practical.

● StyleGAN[5], which controls the style of generated images, and GauGAN[6], which creates photorealistic images from simple patches of color, are a few of the recent developments in GANs.

As you can see, we are just scratching the surface when it comes to empowering machines to make visual sense of the world around them. At DIG Labs, we can’t wait to apply the latest computer vision principles to better understand our pets’ health, especially since they can’t tell us what’s wrong. Check out our latest research opportunities!

[1] “ImageNet Classification with Deep Convolutional Neural ….” https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[2] “Generative Adversarial Nets — NIPS Proceedings.” https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.

[3] “SSD: Single Shot MultiBox Detector.” https://arxiv.org/abs/1512.02325.

[4] “You Only Look Once: Unified, Real-Time Object Detection.” https://arxiv.org/abs/1506.02640.

[5] “A Style-Based Generator Architecture for Generative ….” https://arxiv.org/abs/1812.04948.

[6] “Semantic Image Synthesis with Spatially-Adaptive Normalization.” https://arxiv.org/abs/1903.07291.


Written by DIG Labs