
An absolute beginner's guide to Deep Learning (Updated)

April 7, 2020

Have you been hearing a lot of discussion about Deep Learning but have no idea where the term even comes from? Well, you are at the right place. In this guide, we will discuss Deep Learning from an absolute novice’s perspective!


  • Introduction to Deep Learning
  • Understanding Deep Learning
  • Applications of Deep Learning
  • The Know-how of Deep Learning
  • Deep Learning and You
  • Conclusion

Introduction to Deep Learning

The Definition: 

While we do not have a specific definition for deep learning, the concept backing it is quite elaborate. 

The Concept:

Deep Learning methods are a subset of the Machine Learning domain. 

Let that sink in. 

Machine Learning: A Quick Recap

Before we move forward, let’s do a quick recap of what machine learning is. We have covered machine learning in detail in a previous post.

In short, machine learning is the process of getting a computer to learn patterns from data instead of following hand-written rules. To do this, we use the following methods of learning:

  • Supervised Learning
  • Semi-supervised Learning
  • Unsupervised Learning
  • Under Supervised Learning, we help a computer learn by feeding it labelled, well-organized data. The computer learns patterns that map each input to its label, and we check its predictions against known answers to correct any wrong correlations it has picked up.
  • Semi-supervised Learning follows more or less the same process, except that the training data sets contain far more unlabelled data than labelled data. This helps when labelling every example is too costly.
  • Unsupervised Learning takes a different shape. The data carries no labels or expected responses at all. The system has to find structure on its own, for example by grouping similar items together. It usually needs more data than the other two methods.

We apply one of these methods depending upon the complexity of the target system. 
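To make the contrast concrete, here is a minimal sketch in plain Python. The data points, the labels and the helper names (`nearest_label`, `two_means`) are invented for illustration; a real project would use a proper library and far more data.

```python
# Supervised: every training example carries a label.
labelled = [(1.0, "small"), (1.2, "small"), (8.9, "large"), (9.4, "large")]

def nearest_label(x, examples):
    """Predict the label of the closest labelled example (1-nearest neighbour)."""
    return min(examples, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised: no labels at all; the algorithm groups the values itself.
unlabelled = [1.0, 1.2, 8.9, 9.4]

def two_means(values, steps=10):
    """Split values into two clusters with a tiny 1-D k-means.

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(steps):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high

print(nearest_label(2.0, labelled))
print(two_means(unlabelled))
```

In the first half the answer comes from labels we supplied; in the second half the two groups emerge from the data alone.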

Ok! Where is Deep Learning in this picture?

Deep learning is a group of algorithms that make a computing system learn in a way loosely inspired by the human brain. It employs neural networks for this job.

These algorithms are based on learning data representations, in contrast to traditional task-specific algorithms. Here, learning can be unsupervised, semi-supervised or supervised.

In simpler words, Deep Learning uses neural networks to help the computer imitate the thinking and analytic procedure of a human brain.

What are Neural Networks?

Also called Artificial Neural Networks (ANNs), neural networks are simple processing elements, combined in layers, which process external inputs and respond with dynamic state outputs.

Quite complex? Let’s break it down then.

Neural networks are computing systems: programs that take an input and produce an output. What makes them different is that a neural network is organized in layers of nodes. Each node has an ‘activation function’. Patterns are presented to the network through the input layer, which communicates with one or more hidden layers where the actual processing is done by a system of weighted connections. The hidden layers then link to an output layer, as represented by the image below:

Neural Network
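The flow from input layer to hidden layer to output layer can be sketched in a few lines of plain Python. All weights and biases below are hypothetical numbers picked for illustration, and the sigmoid used as the activation is just one common choice (activation functions are explained in the next section).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[1.0, -1.0, 0.5]]
output_biases = [0.0]

pattern = [1.0, 0.0]  # the pattern presented to the input layer
hidden = layer_forward(pattern, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print(hidden, output)
```

Each inner list of weights belongs to one node, so adding a node means adding one more list, and adding a layer means one more call to `layer_forward`.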

What is an Activation Function?

An activation function decides how strongly a node responds to its input. In the simplest case it is either off or on: if the input is strong enough, the node fires and passes a signal along its connections; otherwise it stays silent. It can be seen as an abstraction of the ‘action potential’ of a biological neuron.

Action Potential:

In biology, the action potential is the brief electrical spike a neuron fires once its combined input crosses a threshold.
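As a rough illustration, here are three activation functions that are widely used in practice. The threshold value and the sample inputs are arbitrary choices for this sketch.

```python
import math

def step(z, threshold=0.0):
    """Fires (1) only when the input exceeds the threshold, otherwise stays off (0)."""
    return 1 if z > threshold else 0

def sigmoid(z):
    """Smooth version of the step: output rises gradually from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive input through unchanged and blocks negative input."""
    return max(0.0, z)

for z in (-2.0, 0.5, 3.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```

The step function is the literal on/off switch; sigmoid and ReLU are softer variants that let a node respond by degrees.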


Connections link the nodes of a network. Each connection has its own importance rating, or relevancy score, called the weight of the connection. The weights are (generally) derived by analyzing extensive amounts of data.

Weights decide the extent to which a connection affects the end result. 

What is a Learning Rule? How is it Relevant?

Most Artificial Neural Networks (ANNs) have a ‘learning rule’, according to which they modify the weights assigned to connections.

As more data is analyzed, the weights are refined and the model gets better at making decisions. During training, we therefore observe a ‘forward activation flow’ of outputs and a ‘backward propagation’ of errors.
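Here is a minimal sketch of one such learning rule: a gradient-descent style update for a single sigmoid node learning the logical OR function. The learning rate and the number of passes are assumed values for this toy example; full backpropagation extends the same idea through several layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training data for logical OR: (inputs, expected output).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.5  # assumed value, chosen only for this toy example

for _ in range(2000):
    for inputs, target in samples:
        # Forward activation flow: inputs -> weighted sum -> output.
        output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        # Backward error flow: the error decides how each weight changes.
        error = target - output
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

def predict(inputs):
    """Round the node's output to the nearest class (0 or 1)."""
    return round(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias))

print([predict(inputs) for inputs, _ in samples])
```

Nobody tells the node what the weights should be; they drift towards useful values because every mistake nudges them a little.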

How are Neural Networks different from conventional computing solutions? 

Conventional computing systems are sequential and deterministic; that is, they execute a set of instructions in a certain order and save the output in a certain memory location. The CPU does all the work: taking the input, executing the instructions and saving the output. The state or value of any variable in the program can be tracked throughout execution.

In an ANN, by contrast, the system is not necessarily deterministic or sequential. The nodes of a neural network work in parallel to analyze the input, and the weights of connections are updated according to the results that appear at the end of the chain. The network doesn’t maintain states of variables but updates knowledge. In a way, the network is the knowledge.

Learning in ANNs can be supervised, semi-supervised or unsupervised as discussed in the beginning of the tutorial. 

ANNs can be slow to train and may seem like a black box. We call ANNs a black box because, after creating the initial framework and feeding in the initial input, it is hard to tell what is happening inside the network. We can only wait for the output and keep feeding more data to nudge the output towards the accurate decision. Their usefulness, though, overshadows these drawbacks.

We have a lot more to discuss about ANNs but that’s for another day! Today, let’s get you familiarized with Deep Learning in this guide. 

Deep Learning Vs. Machine Learning

Even though it is a daughter discipline of Machine Learning, Deep Learning differs from traditional Machine Learning in several ways. In the following discussion we will differentiate between the two.

Data Intensive?

  • Machine Learning (ML): ML works well with small-scale but well-labelled data. It is not particularly data intensive and shows limited improvement in performance with larger data sets.
  • Deep Learning (DL): DL algorithms learn patterns and hence depend on data for their performance. As the amount of training data goes up, so, generally, does the accuracy of DL algorithms. DL is hence highly data intensive.

‘What to Learn?’

  • Machine Learning (ML): Most traditional ML algorithms can’t decide what to learn by themselves. They need human intervention to know which parameters they should be learning. This is generally implemented through inference functions.
  • Deep Learning (DL): The very beauty of DL algorithms is that they figure out on their own what to learn from a data set, powered by activation functions, weighted connections and a layered architecture.

Solving a problem

  • Machine Learning (ML): Traditional ML algorithms solve problems using an inference function. They break the problem into separately solvable morsels and solve these morsels individually. Finally, they combine the results to get the final answer.
  • Deep Learning (DL): DL algorithms do not break down problems. They take the problem statement and the desired result, and devise the path to the result as they go. This is called the end-to-end problem solving approach.

Turn-around Time

  • Machine Learning (ML): ML algorithms build a learning model when executed for the first time. They create generalizations which are referred to for answering all future questions. Hence, after the first go, the turnaround time is short for most inquiries.
  • Deep Learning (DL): DL algorithms continuously evolve. The weights on connections keep getting updated, and hence better and more accurate results are produced. However, this leads to a higher turnaround time.

Model Interpretation and data paths

  • Machine Learning (ML): Being rule-based models, they can be reverse engineered (decoded and interpreted) with relative ease. The values of variables can also be tracked throughout the system.
  • Deep Learning (DL): Being based on neural networks and a layered architecture, DL systems do not provide much insight into how they function. In simpler words, it is very hard to tell what exactly happens inside a neural network once you have fed it data.

Hardware Requirement

  • Machine Learning (ML): ML systems do not demand very high-end resources. Like many other algorithms, such as genetic algorithms, they run on generic hardware.
  • Deep Learning (DL): DL algorithms need high-end computing resources, typically GPUs. Commodity hardware is enough only in certain cases.

Why exactly is Deep Learning needed?

Let’s say we want to devise a program which will identify spoken words and convert them to text. Now we could provide this program with an extensive database and program it to compare pronunciation input to the stored information and map to the corresponding word. Even when assuming that there is no difference in accent, we will have the issue of homophones. 

For example consider this sentence, 

‘I went to see my friend’

Now, how would the computer know which word to use, ‘sea’ or ‘see’ or ‘C’? 

This is the issue programmers faced while developing AI solutions, and there are many more classes of problems of the same nature. Understanding concepts, recognizing objects, identifying handwriting, lip reading: we don’t know which program to write to perform these functions, because we don’t know how our brain does it. Even if we knew, the programs would be too extensive and complex. The error margin would be enormous and the solution would hardly scale.

Now, if the computer could learn to identify the context and decide which word to use, that would be quite nice, right?

Yes! But how would that happen?

For this to happen, we would need to provide the computer with an extensive database of speech, specific information on homophones and an efficient algorithm.

This system will then analyze provided data and learn patterns and how often certain words are used together. 
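A toy version of this idea fits in a few lines of Python. The six-sentence corpus below is made up and stands in for the extensive database, and `pick_homophone` is a hypothetical helper; a real system would learn from vastly more text and use richer context than neighbouring words.

```python
from collections import Counter

# Tiny made-up corpus standing in for an "extensive database of speech".
corpus = [
    "i went to see my friend",
    "we went to see a movie",
    "come and see my garden",
    "the sea was calm today",
    "we swam in the sea",
    "she got a c in maths",
]

# Count how often each pair of neighbouring words occurs.
pair_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    pair_counts.update(zip(words, words[1:]))

def pick_homophone(previous_word, next_word, candidates=("see", "sea", "c")):
    """Choose the candidate seen most often between its two neighbours."""
    def score(word):
        return pair_counts[(previous_word, word)] + pair_counts[(word, next_word)]
    return max(candidates, key=score)

print(pick_homophone("to", "my"))    # context from 'I went to ___ my friend'
print(pick_homophone("the", "was"))  # context from 'the ___ was calm'
```

The program never receives a rule such as "use ‘see’ after ‘to’"; the preference falls out of the word counts.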

Also, this system would be better at speech-to-text conversion than the traditional program. This is narrow AI at work.

When we want better results, we juice up the algorithms, use more complex data structures and more sophisticated object models.

This is where deep learning comes in the picture. 

How does it function?

In one line: by continuously processing more and more data and updating the weights on connections so as to produce increasingly accurate results.

In the example above, where the Narrow AI solution is discussed, we are providing to our program a huge data set and some rules as to what to look for when crunching the data. This leads to formation of rules based on which, we get outputs when we input unseen data.

Now, we know ANNs use weighted connections, activation function powered nodes and multiple layers to absorb information and organize it themselves. 

Just imagine having a machine to which we can feed unorganized data and which does all the rule development work for us: the more data it processes, the better its results. We don’t write an algorithm exactly, we just develop a framework. With Deep Learning, we do just that. We prepare a framework which learns without us spelling out the rules.

ANNs are the obvious choice for developing this system, but for more advanced and critical real-world problems, we use convolutional networks. They are also a part of the Deep Learning toolkit.

What is a Convolutional Network and why is it important to Deep Learning?

A convolutional neural network (CNN) is a feed-forward artificial neural network that has been successfully applied to analyzing visual imagery. A CNN is designed to take advantage of the 2-D structure of inputs such as images or audio signals. Learning uses local connections and tied (shared) weights, followed by some form of pooling, which results in translation-invariant features. CNNs are easier to train and typically have fewer parameters than fully connected networks with the same number of hidden units.
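The sketch below shows shared weights and pooling at work in plain Python. The 2x2 kernel and the two 4x4 images are hypothetical, and the pooling is a global max over the whole feature map, which is the simplest way to show translation invariance. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide one shared square kernel over the image (valid positions only)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def global_max_pool(feature_map):
    """Keep only the strongest response, wherever it occurred."""
    return max(max(row) for row in feature_map)

# Hypothetical 2x2 kernel that responds to a bright 2x2 blob.
kernel = [[1, 1], [1, 1]]

blob_top_left = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blob_bottom_right = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(global_max_pool(convolve2d(blob_top_left, kernel)))
print(global_max_pool(convolve2d(blob_bottom_right, kernel)))
```

The same four kernel weights are reused at every position, which is why a CNN needs fewer parameters, and the pooled value is identical for both images even though the blob has moved.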

Deep Learning makes use of ANNs and CNNs to build various applications, which we will discuss next. These applications also show the fields where Deep Learning is fast becoming commonplace.

Applications of DL

  • Colorization of Black and White Images.

To perform this feat, very large convolutional neural networks with supervised layers are generally employed to recreate the image with colour added. The same approach is used to colorize still frames of black and white movies.


The networks employ object detection as well as object identification while learning which color is apt for which object. The mode of learning can be both supervised and unsupervised. 

  • Adding Sounds To Silent Movies.

Research is underway to accomplish this monumental task. Using DL-trained algorithms to add sound to otherwise silent movies is as hard as it sounds. In one experiment, researchers tried to make the computer add sound to a drum video by studying the pattern of the stick hitting the drum.

This program not only had to study the frequency of the stick hitting the drum but also figure out the speed and force put into the stick and the kind of sound it would produce. This was a rudimentary example compared to real life.

  • Automatic Machine Translation

Deep Learning has completely rewritten our approach to machine aided translation. The technology used here is ‘Sequence to sequence’ learning. 

Sequence to Sequence Learning?

This powerful technology uses a stateful model of neural networks. As mentioned earlier, neural networks are computing elements loosely modelled on the way the human brain makes a decision.

Stateful neural networks use the output of the previous calculation as an input to the next calculation, in contrast to stateless neural networks, which do not store the previous output.

The benefit of a stateful system is that the output of the previous translation serves as context and helps produce a better translation of the next sentence.

Stateful System
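The difference between stateless and stateful processing can be sketched with a single recurrent node. The two weights are hypothetical fixed values (a real network would learn them), and `tanh` is used as the activation purely for illustration.

```python
import math

W_INPUT, W_STATE = 0.8, 0.5  # hypothetical fixed weights for illustration

def stateless_step(x):
    """Output depends on the current input only."""
    return math.tanh(W_INPUT * x)

def stateful_step(x, state):
    """Output depends on the current input and on the previous output."""
    return math.tanh(W_INPUT * x + W_STATE * state)

sequence = [1.0, 1.0, 1.0]

stateless_outputs = [stateless_step(x) for x in sequence]

state = 0.0
stateful_outputs = []
for x in sequence:
    state = stateful_step(x, state)  # previous output feeds the next step
    stateful_outputs.append(state)

print(stateless_outputs)
print(stateful_outputs)
```

The stateless node gives the same answer three times, while the stateful node answers differently each time because it remembers what came before. That memory is what lets a translation system use earlier words as context.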

The Concept of Parallel Corpora: 

Some systems also use ‘parallel corpora’. Here, we train the system on the same text translated into two or more languages. This way the system learns the semantics of each language.
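To get a feel for why aligned translations are useful, consider this deliberately crude sketch. It is not a neural network: it only counts which words appear together across sentence pairs and scores them with the Dice coefficient. The three English and French sentence pairs and the helper `likely_translation` are invented for illustration.

```python
from collections import Counter
from itertools import product

# Tiny hypothetical parallel corpus: English sentences with French translations.
parallel_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]

source_counts, target_counts, pair_counts = Counter(), Counter(), Counter()
for source, target in parallel_corpus:
    source_words, target_words = source.split(), target.split()
    source_counts.update(source_words)
    target_counts.update(target_words)
    # Every source word co-occurs with every target word of the same pair.
    pair_counts.update(product(source_words, target_words))

def likely_translation(word):
    """Pick the target word that co-occurs with `word` most consistently (Dice score)."""
    def dice(candidate):
        together = pair_counts[(word, candidate)]
        return 2 * together / (source_counts[word] + target_counts[candidate])
    return max(target_counts, key=dice)

for word in ("house", "the", "car", "a"):
    print(word, "->", likely_translation(word))
```

Even without a dictionary, the aligned sentences are enough to pair each word with its counterpart. Deep learning systems exploit the same signal at a much larger scale.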

Google Translate is an excellent example here. It can translate 100+ languages and all with a satisfactory degree of accuracy. One could build their translation system using deep learning and the available libraries but this is still resource-intensive and hence an expensive endeavour to take on. 

  • Object Classification in Photographs

Object detection has been around for quite some time now, so we won’t reinvent the wheel today. Object classification, however, is attracting wide interest.

What is Object Classification?

The concept here is that the machine searches for the object on the internet and, based on the retrieved images which are similar to the object, assigns the object to a category.

The system finds visually similar images that are already labelled and then chooses the classifier from those labels. Quite obviously, it comes as the next step after object detection.

How do we perform this sorcery!?

Neural networks are trained on extensive amounts of data to learn how to look for images and run a probabilistic function to figure out the best classifier and image matches. 
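One common probabilistic function for this final step is softmax, which turns a network's raw scores into probabilities. The labels and scores below are hypothetical; in a real classifier the scores would come from the network's last layer.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "car"]
scores = [2.0, 1.0, -1.0]  # hypothetical outputs of a network's final layer

probabilities = softmax(scores)
best_label = labels[probabilities.index(max(probabilities))]
print(best_label, [round(p, 3) for p in probabilities])
```

The label with the highest probability becomes the classification, and the remaining probabilities show how confident the system is in that choice.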

Facebook’s DeepFace system is eerily accurate and has been called ‘as good as a human brain’ at identifying users in pictures. DeepFace applies deep learning for this job.

  • Automatic Handwriting Generation

Why is it even a problem?

To generate text that looks handwritten, the system needs to study an extensive amount of data. No two people have the same handwriting and generating unique handwritten text is equally complex. 

For languages like Chinese (which has over twenty-seven thousand official characters, which most people write differently), the algorithm needs to examine enormous amounts of data to learn all the ways of writing a single character.

How does DL help?

An algorithm could always use probabilistic functions to determine how close an image is to the standard character. DL algorithms are better at this because they combine these probabilistic and statistical abilities with learned patterns.

In short, DL algorithms learn the pattern of standard text and then run a probabilistic comparison against a database to identify and/or generate handwritten characters.

  • Character Text Generation

What is Character Text Generation?

Character text generation deals with generating text for use in videos like subtitles. 

How do we do this?

To accomplish this goal, deep learning solutions apply speech/audio signal to text generation and lip reading. Lip reading is comparatively a new addition and is being researched as of now. 

Traditional Systems Vs. DL Systems:

  • Traditional systems use a statistical analysis of speech input to find out relevant words. These always had a high degree of misinterpretations and were not equipped to identify if the sentence makes sense. 
  • On the other hand, neural networks learn patterns and stateful neural networks take it a step further by adding context to the observations. 

YouTube currently uses this technique to provide automatic subtitles for videos on the website. The service does falter at times but, more often than not, it works quite well. Further research is underway in this area.

  • Image Caption Generation

Why is it complex?

Generating a caption for an image takes more than just identification of objects. The system should be capable of producing sensible and coherent sentences on its own and should be able to employ creativity. To caption an image, we need text that best describes the image, its contents, or at least the relevance of the image.

How do systems do this?

To help a computer employ creativity, we need to form the very concept of creativity in it. Deep Learning helps the computer scan creative text and learn contexts and vocabulary from there. To recognize the image, object identification and classification techniques are being researched and fine tuned to be employed.

  • Automatic Game Playing

What is Automatic Game Playing?

Automatic game playing means the ability of a computer to learn to play a certain game without being explicitly taught. The computer is, however, given an extensive database to learn from. This database contains game scenarios and the actions that led to winning or losing the game. With the help of neural networks, the system learns its way through the rules and complexities of the game and, when tested, it should be as good as a human at the very least.
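The idea of learning from recorded scenarios can be sketched with a simple lookup table. This is a stand-in, not a neural network: the situations, moves and outcomes are invented, and `best_move` is a hypothetical helper. A neural network would go further and generalize to situations it has never seen.

```python
from collections import defaultdict

# Hypothetical records: (game situation, move played, whether the game was won).
game_records = [
    ("opponent_threatens", "block", True),
    ("opponent_threatens", "block", True),
    ("opponent_threatens", "attack", False),
    ("open_board", "attack", True),
    ("open_board", "block", False),
    ("open_board", "attack", True),
]

# Tally wins and plays for every (situation, move) pair.
stats = defaultdict(lambda: [0, 0])  # [wins, plays]
for situation, move, won in game_records:
    stats[(situation, move)][0] += int(won)
    stats[(situation, move)][1] += 1

def best_move(situation):
    """Choose the move with the highest observed win rate in this situation."""
    options = {
        move: wins / plays
        for (seen, move), (wins, plays) in stats.items()
        if seen == situation
    }
    return max(options, key=options.get)

print(best_move("opponent_threatens"))
print(best_move("open_board"))
```

Nothing in the code says when to block or attack; the preference comes entirely from which moves led to wins in the records.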

In 2016, the AI system AlphaGo (by Google DeepMind) defeated the Go champion Lee Sedol 4-1. Go is an East Asian game famed for its complexity and enormous number of possible moves. The program was not programmed with hand-written Go strategy; instead, it learnt from a database of games. Mr. Lee still pushed AlphaGo into a corner at times and won one game, which shows there is room for improvement in the future.

How to begin with Deep Learning?

There are a number of sources to study deep learning algorithms from, and choosing one can get confusing. If you are comfortable with online tutorials, ‘Keras Tutorial: Deep Learning in Python’, ‘TensorFlow Tutorial for Beginners’ and ‘keras: Deep Learning in R’ are among the best online resources. It’ll be good if you choose a source and stick to it.

If you want to do a certificate course, Coursera and Udacity could be your go-to choices.

Online academic resources are also quite up-to-date and helpful.

We recommend that you develop a good grip on statistics, probability theory, linear algebra, R/Python and basic Machine Learning concepts. Some knowledge of neural networks would also be good.


Much like machine learning, Deep learning has a flourishing community and ecosystem. The applications and models that we discussed earlier in the article have been implemented or are being implemented in one programming language or another. Another thing we want to discuss in this Deep Learning guide is DL libraries:


TensorFlow is an open source library for numerical computation using data flow graphs, by none other than the Google Brain Team (it was made open source after its initial launch).

Advantages:

  • TensorFlow supports distributed computing, particularly across multiple GPUs.
  • It has queues to manage deep learning tasks in a pipeline.
  • Like UNIX systems, TensorFlow provides an event-logging system.
  • TensorFlow has TensorBoard, which helps in mapping and visualizing a project’s progress.

Disadvantages:

  • For most machine learning tasks, TensorFlow is overkill.
  • Computational graphs in TensorFlow are defined entirely in Python code, which can make it slow.


Keras is a high-level deep learning library which works on top of Theano and TensorFlow, among other deep learning frameworks. It provides an interface between the complex computing back end and the user.

Advantages:

  • Modular, simple to use and efficient.
  • Simplifies framework architecture development by adding an abstraction layer between the language and the backend tool.
  • Gradually becoming a must for neural network development.

Disadvantages:

  • Limited support for R.


Caffe is a modular deep learning framework, quite fast and easy to pick up! It is an open source library, written in pure C++/CUDA, which provides Python, MATLAB and command-line interfaces as well. It is not a general-purpose library, as it is focused on computer vision. It switches smoothly between CPUs and GPUs and hence provides a good user experience even on hardware with lower computing power.

Advantages:

  • Caffe is easily programmable due to its expressive architecture.
  • Caffe is everywhere! It has an extensive, active community which keeps improving Caffe and adding new libraries to the source code.
  • It is specifically designed for multimedia. The Caffe binaries take your .prototxt files and train the network. Once training is done, you can use the Caffe binaries, Python or MATLAB to start classifying images.

Disadvantages:

  • Constructing an architecture inside .prototxt files can become complex and tiresome.
  • Tuning hyperparameters programmatically is not an easy job with Caffe.
  • Not the best fit for recurrent networks.
  • One needs to write C++/CUDA code for every new GPU layer when using Caffe.


Torch is a machine learning framework whose API is written in the Lua language. Torch is used by tech giants like Facebook, which have dedicated teams to customize their deep learning platforms.

Generally used from Python as PyTorch, it offers dynamic computation graphs which let you process variable-length inputs and outputs. This feature helps in developing RNNs in particular.

Advantages:

  • Modular code blocks which can be used in various combinations.
  • Makes it easy to write layers and run them on a GPU.
  • Abundant pretrained models.

Disadvantages:

  • Poor documentation.
  • Unlike Keras, plug and play is not a real option with Torch; you will be writing most of your code.


Theano is a library which handles multidimensional arrays, like NumPy. In combination with other libraries, it performs well in data exploration and is intended for research.

Theano has been the standard tool used for developing deep learning networks for quite some time now. Keras with Theano is among the most popular combinations for developing deep learning applications using Python.

Advantages:

  • Computational graphs and NumPy sit very well with Theano, which makes development of RNNs easy.
  • Conjoins well with Keras using Python.
  • High-level wrappers such as Keras and Lasagne make development a breeze.

Disadvantages:

  • Single GPU support.
  • Causes bugs when running on AWS.
  • The community is shrinking because the end of major Theano development was announced in September 2017.

To Do or not to do? 

Prerequisites: Linear Algebra, Statistics, Calculus, proficiency in Java, Python/R, intermediate to advanced competency in IntelliJ IDE and Maven, familiarity with Machine Learning and the know-how of deploying a model.

Deep Learning might be the buzzword of the day but just like every other technology, it demands a strong commitment and extensive skill set. 

To get started with Deep learning, you should not only be aware of mathematical concepts like Linear Algebra, Statistics, Calculus, probability; but also have decent programming skills in at least one of Java, Python or R programming languages. 

You should be comfortable using the IntelliJ IDE and Maven. Knowledge of Tensorflow will also come in handy. Along with it, machine learning concepts should be clear to you to make sense of deep learning algorithms. It is preferable if you have a good idea of how to deploy a model using deep learning algorithms.

Not everyone is good at maths or enjoys it. Some people like coding, some don’t. Before you plan on plunging in, do consider your reasons and your motivations. If you want to switch to DL but find any of the required skills hard, it is advisable to either overcome these hurdles or rethink your decision of switching.


Conclusion

In this beginner’s guide to Deep Learning, we learned about the basics, applications and libraries of DL. Deep Learning is gaining importance by the minute, and with the rising demand for AI, it’s only obvious why. The mismatch between demand and the supply of talent might favour aspirants, but employers will still look for a skill set suiting the job. If you feel Deep Learning is for you, today is when you begin to level up your game!

