The concept of Deep Learning (DL) has been given different interpretations in recent years. Often, DL simply refers to a specific subset of Artificial Neural Networks, or ANNs, a family of Machine Learning (ML) models that can be used for both classification and regression tasks. Specifically, it denotes ANNs with a large number of so-called hidden layers.
An ANN model consists of a set of connected units called neurons, where the output of each neuron is calculated by applying a nonlinear function, called the activation function, to the weighted sum of its inputs. Each connection between neurons has an associated weight or coefficient, so the activations of some neurons have a greater impact on the result than others. Neurons in one layer connect to neurons in the preceding and following layers. The layer that receives external data is the input layer, and the last layer, the one that produces the final result, is the output layer. Between these two there are zero or more hidden layers. When the number of hidden layers is large, we speak of Deep Artificial Neural Networks. An example of this type of model is shown in the following image.
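To make this computation concrete, here is a minimal sketch of a single neuron in Python with NumPy. The input values, weights, bias, and the choice of ReLU as the activation function are illustrative assumptions, not part of any particular network:

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Apply an activation function to the weighted sum of the inputs."""
    z = np.dot(weights, inputs) + bias  # weighted sum plus bias term
    return np.maximum(0.0, z)           # ReLU, a common activation function

# Hypothetical neuron with three inputs
x = np.array([0.5, -1.2, 3.0])   # inputs coming from the previous layer
w = np.array([0.4, 0.1, -0.7])   # weights of the incoming connections
print(neuron_output(x, w, bias=0.2))
```

A full layer simply evaluates many such neurons in parallel, and a network chains layers together.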
However, the name DL has also been used to refer to any Machine Learning framework whose training scheme is built from several layers of optimization, each of which affects the outcome of the final model. A case in point is Deep Belief Networks, a type of Machine Learning model used for unsupervised learning that is also based on multiple layers, but which differs significantly from the standard ANN scheme described above.
Nevertheless, it is true that the link between DL and ANNs is strong and almost ubiquitous today. This has probably been driven by several factors, including the fact that the ANN scheme adapts almost perfectly to the concept of Deep Learning, and that some of the early pioneering developments in DL do indeed correspond to structures of this type.
With the above in mind, we should be clear that, although it is often treated as a separate field, Deep Learning is simply another family of Machine Learning models. It is, however, a family with some extremely relevant properties, of which the following two stand out:
With the rise of Deep Learning frameworks, it has been shown that DL models can outperform classical models such as Support Vector Machines (SVMs) when trained on sufficiently large data sets. This is probably the main reason why this family of models is becoming the preferred choice for solving a wide variety of supervised learning tasks.
Classical Machine Learning models usually require a preliminary preprocessing stage in which relevant features are extracted from the raw data, often by hand. Due to the layered nature of Deep Learning frameworks, these preprocessing steps may no longer be necessary when applying DL models: it is the initial layers that perform this extraction of relevant patterns, which are then employed by the subsequent layers. Put simply, the concept of depth in these structures rests on the idea that the model should consist of multiple layers representing increasing levels of abstraction, each of which is adapted during training. Features from the lower layers of the model are progressively combined to form higher-level features in subsequent layers. This property is often referred to as End-to-End learning, and it allows researchers and data scientists to avoid complex and time-consuming steps that were previously necessary and usually required the help of human experts in the field corresponding to the task at hand.
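As a rough illustration of this layered structure, the sketch below stacks several fully connected layers with the Keras API. The input dimension, layer sizes, and ten-class output are arbitrary assumptions, chosen only to show how lower layers feed progressively higher-level ones:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer transforms the representation produced by the previous one,
# so features become progressively more abstract as depth increases.
model = keras.Sequential([
    keras.Input(shape=(784,)),              # raw input, e.g. a flattened 28x28 image
    layers.Dense(256, activation="relu"),   # low-level features
    layers.Dense(128, activation="relu"),   # intermediate combinations
    layers.Dense(64, activation="relu"),    # higher-level abstractions
    layers.Dense(10, activation="softmax"), # final prediction over 10 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Crucially, none of the intermediate representations are designed by hand: all of the weights are adjusted jointly during training.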
An illustrative example can be found in speech recognition, where the goal is to take an input, such as an audio clip, and map it to an output, a transcription of the audio clip. Traditionally, speech recognition required more than one stage of processing. First, features had to be extracted from the audio with preprocessing methods, such as Mel-Frequency Cepstral Coefficients, or MFCCs, which are coefficients for speech representation based on human auditory perception. Then, once these low-level features were extracted, a Machine Learning algorithm could be applied to find, for example, the basic phonemes of the sound in the audio clip.
When using DL frameworks, this multi-stage process can be replaced by training a single Deep Neural Network that takes the audio clip as input and directly produces the transcript as output.
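The contrast between the two approaches can be sketched as follows. The file name, sampling rate, and choice of 13 coefficients are hypothetical, and the comments stand in for the models each stage would require:

```python
import librosa

# Traditional pipeline, stage 1: hand-crafted feature extraction.
audio, sample_rate = librosa.load("speech.wav", sr=16000)
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)

# Stage 2 would feed `mfccs` into a separate ML model to detect phonemes,
# and a further stage would assemble those phonemes into words.
# An end-to-end DL model removes these stages: a single network maps
# the raw `audio` array directly to the output transcript.
```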
These and other special features have made Deep Learning models one of the most popular techniques of recent years within the field of Machine Learning and, more generally, Artificial Intelligence. In particular, this type of model has clearly become the state of the art for problems involving unstructured data, i.e. data not represented in the form of tables or matrix structures, such as image recognition or natural language processing. DL offers different structures depending on the type of problem to be solved. When working with structured data, the standard option is a Fully Connected Neural Network, or FCNN. For the different types of unstructured data, there are specialized DL structures (a minimal sketch of these architectures follows the list below):
- Convolutional Neural Networks, or CNNs: first introduced in the 1980s by Yann LeCun, then a postdoctoral researcher in computer science. These models are specially designed for optimal performance on spatially structured data, such as images, and variations of these structures exist for use on video.
- Recurrent Neural Networks, or RNNs: based on the work of David Rumelhart in 1986. The famous Long Short-Term Memory architecture, or LSTM, which for many years has been the state of the art among these structures, was invented in 1997. These DL models are specialized for sequential data, such as numerical time series, and for NLP tasks such as translation from one language to another. They can also be combined with a CNN structure for use on video.
- Transformers: introduced in 2017 by a Google Brain team, they are increasingly the model of choice for NLP problems, replacing RNN models. Although they were not initially designed for this purpose, it has been observed that these structures can be adapted to work on images, where they have shown great potential, even surpassing the results obtained by CNNs, the state of the art until then. However, whether they can replace CNNs as the model of choice for this type of task still requires further investigation.
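As a rough illustration of these families, the following Keras sketches define a minimal FCNN, CNN, and RNN. All input shapes and layer sizes are arbitrary assumptions; a production model for any of these tasks would be considerably larger:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Fully connected network for structured (tabular) data
fcnn = keras.Sequential([
    keras.Input(shape=(20,)),                 # e.g. 20 tabular features
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # binary prediction
])

# Convolutional network for spatially structured data such as images
cnn = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),           # e.g. small RGB images
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # prediction over 10 classes
])

# Recurrent network (LSTM) for sequential data such as time series or text
rnn = keras.Sequential([
    keras.Input(shape=(None, 8)),             # variable-length sequences of 8 features
    layers.LSTM(32),
    layers.Dense(1),                          # e.g. next value of a time series
])
```

The structural differences mirror the data: convolutions exploit spatial locality, while recurrence carries information along the time axis.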
In summary, the enormous amount of unstructured data already available, and expected to grow in the future, particularly in the healthcare environment, makes Deep Learning the optimal solution for many Machine Learning tasks. In addition, the computational power now available, especially through cloud computing, provides the platform needed to train Deep Learning models quickly.
If you are interested in learning more about the projects developed by Horus ML using Deep Learning techniques, do not hesitate to contact us.