Currently, the two leading platforms for developing deep learning solutions are Theano and TensorFlow. Powerful as they are, these platforms can be pretty difficult to work with.
This is where Keras comes to our rescue. Today, we will discuss what Keras is, why it is important, and what you should know about working with it. That is exactly what this guide is for.
We will complement the details with a demo project using an MNIST dataset. So, ladies and gentlemen, grab your mug of coffee and read on!
Content:
- What is Keras?
- Prerequisites for using Keras
- Keras with Python
- Keras with R
- Keras Vs. Others
- Basic Concepts: Quick Recap
- Machine Learning
- Deep Learning
- Neural Networks
- CNNs
- Where does Keras fit in the picture?
- Step by Step Guide to Build a project in Keras
- Step 1: Set the Stage
- Step 2: Installing and Gearing Up Keras
- Step 3: Importing Libraries and Modules
- Step 4: Load image data from MNIST
- Step 5: Prepping the Data for Keras
- Step 6: Prepping up Class Labels for Keras
- Step 7: Defining Model Architecture
- Step 8: Compile Model
- Step 9: Fit Model on Training Data
- Step 10: Evaluate Model on Test Data
- Keras Cheat sheet
- Way Ahead
- Endnote
What is Keras?
Keras is a powerful library generally used with Theano and TensorFlow. It provides a high-level neural networks API that makes it easy to develop and evaluate deep learning models such as convolutional neural networks.
About:
This library is written in the simple programming language Python. MXNet, Theano, TensorFlow, Microsoft Cognitive Toolkit and DeepLearning4j are some of the major libraries and deep learning tools which work in conjunction with Keras.
Inception and Growth:
Keras was conceived to be an interface rather than an end-to-end framework. It was developed as part of the ONEIROS project (Open-ended Neuro-Electronic Intelligent Robot Operating System) by François Chollet, a Google engineer.
Microsoft has been developing a CNTK backend to Keras. This functionality is currently in beta release with CNTK v2.0.
What does it do?
Keras presents a higher level, more intuitive set of abstractions that make it easy to configure neural networks regardless of the backend scientific computing library.
In other words, it helps you build the architecture of your model with direct structure calls instead of building the structures from the ground up. You can mix and match these predefined code blocks and develop a solution tailored to your requirements.
Also, Keras code is portable. This means that if you developed a part of your program using Theano as the back-end library, you could run the same code on TensorFlow by simply specifying TensorFlow as the new backend.
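For example, here is a minimal sketch of switching backends, assuming the standard Keras configuration mechanism: the backend is read from the "backend" field of ~/.keras/keras.json, and it can be overridden for a single session with the KERAS_BACKEND environment variable before the first import.
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'  # or 'theano'
import keras                                # prints "Using TensorFlow backend." on import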
Prerequisites
To use Keras, you need to have at least intermediate level proficiency in machine learning concepts and Python/R.
- Keras and Python
Suitable for: Building analytic tools using deep learning networks
We have two viable versions of Python: 2.7 and 3. While 3 is the latest version, 2.7 has more resources at the time of writing, in the form of support and compatible libraries. We recommend Python 2.7, as most existing projects, including the demo projects you will practice on, still use it.
If you have a specific project or a clear vision of what you wish to build using Keras and Python, you should do well with Python 3. Now that Python 3 has been released, version 2.7 will eventually stop receiving updates.
Porting code from Python 2.7 to 3 is also an option for those who intend to use Keras with some already existing Python 2.7 code. However, the process has many pitfalls and would need an expert programmer to do it with minimal to no errors.
- Keras and R
Suitable for: Data Visualization using deep learning networks
With packages like ggplot2, rCharts, and googleVis, R has been commended as a development language for data visualization with deep learning networks. However, only a few available tools assist the development of deep learning solutions using R.
MXNet and TensorFlow tend to be the most commonly used tools in this context. While MXNet offers only limited functionality, TensorFlow code written in R looks wildly different from what a typical R snippet for the same job looks like.
Keras on top of TensorFlow changes a lot of things though. Keras adds the much-needed layer of abstraction on top of TensorFlow and makes it easier to code using R.
Later in the article, you will find a comparison of code written with and without Keras in TensorFlow for R. These snippets should help you appreciate the relevance of Keras further.
Keras and Others:
It rarely happens, even more so in the field of deep learning, that one library rises above all the others. Keras seems to have become that library.
With its modular approach, intuitive APIs and rapidly growing framework, Keras is close to becoming a given when developing neural networks. It has little competition, if any.
To understand these differences more, read this blog.
Basic Concepts: Quick Recap
Before we begin our discussion, there are some concepts that you should get through first.
To work with Keras, you need to have a grip on concepts of machine learning and even more so, concepts of deep learning. Keras is designed to provide a user interface that makes coding easy. Using Keras without deeper understanding will, however, compromise the quality of your deep learning network.
What is Machine Learning?
The field of study and research dealing with making a computer capable of learning, without human interference, is machine learning.
The current growth spurt of Artificial Intelligence systems and the renewed interest in deep learning for real AI have caused a major increase in the demand for deep learning frameworks. These frameworks need to be not only efficient and accurate but also adaptable as the demands mature and increase.
Owing to this, a part of the machine learning talent pool has started pursuing deep learning frameworks and the tooling around them. To know more about Machine Learning and future trends, click here.
What is Deep Learning?
Deep Learning can be understood as a family of algorithms and development practices that aim to produce frameworks or networks equipped to learn and adapt on their own and perform complex tasks that would otherwise need human intervention.
Just as much fun as this sounds, deep learning is also quite complex. To make this concept take a real shape, you need highly advanced and resource intensive neural networks, which can analyze data and learn from it.
For example, let's take a deep learning system that classifies images as dogs or wolves. It could have multiple layers of convolutional neural networks (CNNs, explained further in the article) which process the images sequentially, extracting progressively more complex features to classify an image.
- The first hidden layers might learn local edge patterns.
- Then, each subsequent layer (or filter) learns more complex representations.
- Finally, the last layer can classify the image as a dog or a wolf.
To be able to do this, a system needs to ‘learn’.
Now, a machine can be trained in three ways:
- Supervised Learning
In this mode of learning, we feed labeled data to the machine along with a preset inference function and let the machine learn the patterns in the data. The learning process doesn’t need human interference after the data is fed in, but we need to perform regularization from time to time to ensure that the system doesn’t overfit or underfit.
- Semi-Supervised Learning
In this mode of learning, we provide the system with a mix of labeled and unlabeled data, with the unlabeled data generally in a higher proportion. The system then learns to generalize, classifying and labeling data itself. Further, it can draw inferences and process unseen data based on them. Overfitting is still an issue in this mode of learning, but not as severe as it can be with supervised learning systems.
- Unsupervised learning
In this mode of learning, we provide the system with unlabeled data and no labeled responses. This is different from the previous modes in the sense that the computer is not given the response it is expected to reach; it needs to figure out the structure of the data on its own, typically relying on cluster analysis methods. (A short sketch contrasting supervised and unsupervised learning follows this list.)
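Here is a minimal sketch, using scikit-learn (not covered elsewhere in this guide), of what the supervised and unsupervised modes look like in code; the random data is purely illustrative:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.random((100, 4))        # features
y = np.random.randint(2, size=100)    # labels (only used in the supervised case)

supervised = LogisticRegression().fit(X, y)   # learns from features plus labels
unsupervised = KMeans(n_clusters=2).fit(X)    # finds structure in the features alone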
Again, it's all nice and shiny but how do we create frameworks that learn? Using Neural Networks.
For those who haven’t had a chance to get to know neural networks: these networks are basically frameworks that mimic how neurons, the cells that do the brain’s processing, work together.
Neural Networks:
The simplest neural network is called a perceptron. Quite like biological neurons, which have dendrites and axons, a perceptron is a simple tree structure with input nodes and a single output node, which is connected to each input node. Here’s a visual comparison of the two:
Source: https://www.datacamp.com/community/tutorials/deep-learning-python
A perceptron only works with numerical data. This means that you would need to convert any nominal data into a numerical format to feed it to a perceptron.
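For instance, here is a tiny hedged sketch of converting a nominal feature into numbers; the colour values and the mapping are made up for illustration:
import numpy as np
from keras.utils import np_utils

colors = ['red', 'green', 'blue', 'green']        # nominal data
mapping = {'red': 0, 'green': 1, 'blue': 2}
codes = np.array([mapping[c] for c in colors])    # integer encoding: [0 1 2 1]
one_hot = np_utils.to_categorical(codes, 3)       # one row of 0s and 1s per sample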
Concept of feedforward neural networks
Multilayer perceptron networks are known as ‘feed-forward’ neural networks. As obvious from the name, these networks have layers which forward the input to the layer next to it.
The first layer to process data is called the ‘input layer’, and the layer at the end, which combines the results from all previous layers and produces the output, is called the ‘output layer’.
The intermediate or intermediary layers are not visible or accessible to a user outside the system.
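As a quick illustration (a minimal sketch, not the MNIST model we build later), a small feed-forward network in Keras looks like this:
from keras.models import Sequential
from keras.layers import Dense

mlp = Sequential()
mlp.add(Dense(16, activation='relu', input_dim=8))  # hidden layer; input_dim defines the input layer
mlp.add(Dense(1, activation='sigmoid'))             # output layer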
Convolutional Neural Networks:
A convolutional neural network typically has many layers, often dozens or more, and assumes the input to be images by default. CNNs are feedforward neural networks.
By assuming every input to be an image, CNNs require less fine-tuning and a lower number of parameters need to be processed. Also, these parameters are highly input oriented.
To know more about deep learning frameworks and deep learning, read this. Here, you will find a comprehensive analysis of deep learning from an absolute beginner’s point of view.
Where does Keras fit in the picture?
Keras works as a bridge between powerful scientific development tools and frameworks like Theano and TensorFlow and the code you need to write. It is a library that functions on top of these complex, difficult-to-use tools.
Step By Step Guide to Building a Project in Keras (Using Python)
- Step 1: set up the stage
In the first step, you will ready your system for using Keras. Make sure that your system has:
- Python 3 or Python 2.7 if you already have Python in your system and you want to keep using it in your projects.
- SciPy with NumPy
- Matplotlib (Optional, recommended for exploratory analysis)
- Theano or TensorFlow. We will use Theano because it is simpler; with TensorFlow you have to reshape your data a bit before feeding it to the system
We recommend installing Python, NumPy, SciPy, and matplotlib through the Anaconda Distribution. It comes with all of those packages bound into one.
To verify if everything is installed correctly, please follow these steps:
- Go to your command line prompt or the terminal on a Mac and type:
$python
- You will see the Python interpreter banner, something like this:
Python 3.1.1 |Anaconda 4.0.0 (x86_64)| (default, Jan 22 2018, 17:43:17)
Now you can start importing libraries and print their versions.
>>> import numpy
>>> import theano
>>> print numpy.__version__
1.11.0
>>> print theano.__version__
0.8.2
>>> quit()
- Step 2: Install Keras
If you are using Anaconda Distribution, you already have a nice package management system called pip installed.
To confirm that you have it installed, type $pip in your command line. It should output a list of commands and options. If you do not have pip, get it from here.
Once you have pip, installing Keras is quite easy.
$pip install keras
To verify the installation:
$ python -c "import keras; print keras.__version__"
Using Theano backend.
1.0.4
Oh wait! This version looks like it is an older one! Let’s upgrade it then, right?
$ pip install --upgrade keras
...
$ python -c "import keras; print keras.__version__"
Using Theano backend.
1.1.1
Now, we will create a new file and save it as keras_cnn_example.py
- Step 3: Import Libraries and Modules
Now, we will be importing numpy and setting a seed for the computer's pseudorandom number generator. This lets us reproduce the results from our script:
import numpy as np
np.random.seed(123)  # for reproducibility
Next, we will import the Sequential model type from Keras.
A sequential type model is simply a linear stack of neural network layers, and it's perfect for the type of feed-forward CNN we're building today.
from keras.models import Sequential
Next, we will import the ‘core’ layers from Keras library which are the building blocks of almost every kind of neural network architectures.
from keras.layers import Dense, Dropout, Activation, Flatten
Further, we will import the CNN layers from Keras. These are the convolutional layers which help us train our system more efficiently with Keras.
from keras.layers import Convolution2D, MaxPooling2D
Finally, we will import some utilities. These will be useful when we will remodel our data.
from keras.utils import np_utils
We are well equipped to start building the neural architecture of our project now.
- Step 4: Load Image Data from MNIST
MNIST is a good dataset for getting started with deep learning and computer vision. It’s complex enough a job to require neural networks, but it's manageable on a single computer.
The Keras library includes the MNIST dataset itself. To load it:
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
If you want to look at the dataset’s shape:
print X_train.shape # (60000, 28, 28)
This output means that our dataset contains 60,000 sample images, each 28 pixels by 28 pixels. To confirm this, we will plot the first sample using matplotlib.
from matplotlib import pyplot as plt
plt.imshow(X_train[0])
plt.show()  # displays the plot window when running this as a script
The image output looks like this:
Why did we plot the data? Because in computer vision projects it is usually a good idea to visually inspect the data first. This works as a quick sanity check and helps prevent easily avoidable mistakes like misinterpreting the data dimensions.
- Step 5: Preprocess data for Keras
As we are using Theano as our backend, there is a little tweak we need to take care of.
With Theano, we have to declare a separate dimension to represent the image depth. For example, if we had an RGB image with values for all three color channels, the depth would be 3.
Our sample dataset has images of depth 1 (grayscale), but we still need to declare that dimension explicitly.
In other words, we will transform our dataset from the shape (n, width, height) to (n, depth, width, height).
To perform this,
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
To confirm that the change took effect, we run the following:
print X_train.shape # (60000, 1, 28, 28)
Our final step here is to convert our data type to float32 and normalize our data values to the range [0, 1].
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
This resultant data is ready to be used for training our model.
- Step 6: Preprocess class labels for Keras
You will now look at the shape of class label data in our dataset.
print y_train.shape # (60000,)
Does this look problematic? Well, sure it does. We have a 1-dimensional array where we need ten different classes, one for each digit. Let’s look at the labels for the first 10 training samples:
print y_train[:10] # [5 0 4 1 9 2 1 3 1 4]
This right here is the problem: the y_train and y_test data are not split into 10 distinct class labels; instead, they are represented as a single array holding the class values.
However, we can fix this by:
# Convert 1-dimensional class arrays to 10-dimensional class matrices
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
To check that this worked:
print Y_train.shape # (60000, 10)
Does this make you feel better? It sure makes us feel better.
- Step 7: Define Model Architecture
In real-life projects, defining the model architecture is a humongous task. To stay focused on the task at hand, we will discuss this another day and simply follow one of the model architectures recommended in research papers or academic modules. Here you will find some examples of recommended model architectures.
For this tutorial, we will start by declaring a sequential model:
model = Sequential()
Next, we will declare an input layer for our target CNN:
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,28,28)))
The input shape parameter should be the shape of one sample. In this case, it's (1, 28, 28), corresponding to the (depth, width, height) of each digit image.
The first three parameters correspond to
- the number of convolution filters to use
- the number of rows in each convolution kernel
- the number of columns in each convolution kernel, in order of appearance.
We can tune the step size using ‘subsample’. By default, the step size is (1,1).
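For example, here is a hedged sketch of a layer that is not part of this tutorial's model, using the Convolution2D import from Step 3:
# Hypothetical layer, for illustration only: 3x3 filters that move 2 pixels per step
Convolution2D(32, 3, 3, subsample=(2, 2), activation='relu')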
To see what our first layer produces, we will print the shape of the current model output:
print model.output_shape # (None, 32, 26, 26)
Now, just like we build using legos, we can keep adding layers.
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
The Dropout layer added in the last line of code in the box above is of particular interest because this layer is a method to regularize our model so as to avoid overfitting.
MaxPooling2D reduces the number of parameters in our model by sliding a 2x2 pooling filter across the previous layer and taking the max of the 4 values in the 2x2 filter.
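If you print the model's output shape again at this point, you can see the effect of pooling; the numbers below simply follow from the layers added so far:
print model.output_shape
# (None, 32, 12, 12) -- the 24x24 maps from the second convolution, halved by the 2x2 pooling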
So far, we have added two convolutional layers to our model. Now we will add a fully connected layer and then the output layer.
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
The first parameter is the output size for a dense layer. Keras takes care of connections between layers.
Corresponding to the number of classes being ten, the final layer has an output size of ten.
Also, the weights from a convolutional layer must be made one dimensional (‘flattened’) before passing them to a fully connected dense layer.
Let’s look at the model architecture now:
model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,28,28)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
Now, we just need to add a loss function and an optimizer, and then we will be ready to train the model.
- Step 8: Compile Model
When we compile the model, we add the loss function and an optimizer such as Adam.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Keras offers a good variety of loss-functions and out-of-the-box optimizers which you can select based on your requirement.
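For example, here is a hedged sketch of swapping in plain SGD with a hand-picked learning rate instead of Adam; the values are illustrative, not tuned for MNIST:
from keras.optimizers import SGD

sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)  # illustrative values, not tuned
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])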
- Step 9: Fit Model on Training Data
To begin training, we just need to declare the batch size and the number of epochs for which to train the system.
Then, we just need to pass in the data.
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)
# Epoch 1/10
# 7744/60000 [==>...........................] - ETA: 96s - loss: 0.5806 - acc: 0.8164
You can also use ‘callbacks’ to set early stopping rules, save model weights along the way, or log the history of each training epoch.
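For instance, here is a sketch of two common callbacks; the file name and validation split are illustrative choices, not part of the original script:
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),                 # stop when validation loss stalls
    ModelCheckpoint('mnist_cnn_weights.h5', save_best_only=True)   # hypothetical file name
]
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1,
          validation_split=0.1, callbacks=callbacks)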
- Step 10: Evaluate model on Test Data
In this step, we will evaluate our model on the test data.
score = model.evaluate(X_test, Y_test, verbose=0)
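The score variable holds the test loss followed by the metrics requested at compile time (here, accuracy). A quick way to look at it; the numbers in the comment are illustrative:
print score  # e.g. [0.03, 0.99] -> [test loss, test accuracy]; actual values vary per run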
Congratulations, you have just developed your first project in deep learning using Keras on top of Theano.
Here is the complete code that we have written today:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist

# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

# 7. Define model architecture
model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,28,28)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# 8. Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# 9. Fit model on training data
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)

# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
Keras Cheat Sheet (Using Python)
Following are code snippets to help you build your project in Keras. You can tweak the code as you go, but these are apt for someone just beginning her work:
- A Basic Example:
>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import Dense
>>> data = np.random.random((1000,100))
>>> labels = np.random.randint(2, size=(1000,1))
>>> model = Sequential()
>>> model.add(Dense(32, activation='relu', input_dim=100))
>>> model.add(Dense(1, activation='sigmoid'))
>>> model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
>>> model.fit(data, labels, epochs=10, batch_size=32)
>>> predictions = model.predict(data)
- Data
You will need to store your data as a NumPy array or a list of NumPy arrays. Ideally, the data should be split into a training set and a test set; for this you can use the train_test_split helper from sklearn.model_selection.
- Keras Datasets
>>> from keras.datasets import boston_housing, mnist, cifar10, imdb
>>> (x_train,y_train),(x_test,y_test) = mnist.load_data()
>>> (x_train2,y_train2),(x_test2,y_test2) = boston_housing.load_data()
>>> (x_train3,y_train3),(x_test3,y_test3) = cifar10.load_data()
>>> (x_train4,y_train4),(x_test4,y_test4) = imdb.load_data(num_words=20000)
>>> num_classes = 10
- Others
>>> from urllib.request import urlopen
>>> data = np.loadtxt(urlopen("http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"), delimiter=",")
>>> X = data[:,0:8]
>>> y = data[:,8]
- Train and Test Sets
>>> from sklearn.model_selection import train_test_split >>> X_train5,X_test5,y_train5,y_test5 = train_test_split(X, y, test_size=0.33, random_state=42)
- Standardization/Normalization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(x_train2)
>>> standardized_X = scaler.transform(x_train2)
>>> standardized_X_test = scaler.transform(x_test2)
- Preprocessing:
- Sequence Padding
>>> from keras.preprocessing import sequence
>>> x_train4 = sequence.pad_sequences(x_train4, maxlen=80)
>>> x_test4 = sequence.pad_sequences(x_test4, maxlen=80)
- One-Hot encoding
>>> from keras.utils import to_categorical
>>> Y_train = to_categorical(y_train, num_classes)
>>> Y_test = to_categorical(y_test, num_classes)
>>> Y_train3 = to_categorical(y_train3, num_classes)
>>> Y_test3 = to_categorical(y_test3, num_classes)
- Model Architecture
- Sequential Model
>>> from keras.models import Sequential
>>> model = Sequential()
>>> model2 = Sequential()
>>> model3 = Sequential()
- Multi-layer Perceptron
- Binary Classification
>>> from keras.layers import Dense
>>> model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
>>> model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
>>> model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
- Multi-class Classification
>>> from keras.layers import Dropout
>>> model.add(Dense(512, activation='relu', input_shape=(784,)))
>>> model.add(Dropout(0.2))
>>> model.add(Dense(512, activation='relu'))
>>> model.add(Dropout(0.2))
>>> model.add(Dense(10, activation='softmax'))
- Regression
>>>model.add(Dense(64,activation='relu',input_dim=train_data.shape[1])) >>> model.add(Dense(1))
- Convolutional Networks
>>> from keras.layers import Activation, Conv2D, MaxPooling2D, Flatten
>>> model2.add(Conv2D(32, (3,3), padding='same', input_shape=x_train.shape[1:]))
>>> model2.add(Activation('relu'))
>>> model2.add(Conv2D(32, (3,3)))
>>> model2.add(Activation('relu'))
>>> model2.add(MaxPooling2D(pool_size=(2,2)))
>>> model2.add(Dropout(0.25))
>>> model2.add(Conv2D(64, (3,3), padding='same'))
>>> model2.add(Activation('relu'))
>>> model2.add(Conv2D(64, (3,3)))
>>> model2.add(Activation('relu'))
>>> model2.add(MaxPooling2D(pool_size=(2,2)))
>>> model2.add(Dropout(0.25))
>>> model2.add(Flatten())
>>> model2.add(Dense(512))
>>> model2.add(Activation('relu'))
>>> model2.add(Dropout(0.5))
>>> model2.add(Dense(num_classes))
>>> model2.add(Activation('softmax'))
- Recurrent Neural Networks
>>> from keras.layers import Embedding, LSTM
>>> model3.add(Embedding(20000, 128))
>>> model3.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
>>> model3.add(Dense(1, activation='sigmoid'))
- Inspect Model
>>> model.output_shape    # Model output shape
>>> model.summary()       # Model summary representation
>>> model.get_config()    # Model configuration
>>> model.get_weights()   # List all weight tensors in the model
- Compile Model
- MLP: Binary Classification
>>> model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
- MLP: Multi-Class Classification
>>> model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
- MLP: Regression
>>> model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
- Recurrent Neural Networks
>>> model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
- Model Training
>>> model3.fit(x_train4, y_train4, batch_size=32, epochs=15, verbose=1, validation_data=(x_test4,y_test4))
- Evaluate Your Model’s Performance
>>> score = model3.evaluate(x_test4, y_test4, batch_size=32)
- Prediction
>>> model3.predict(x_test4, batch_size=32)
>>> model3.predict_classes(x_test4, batch_size=32)
- Save-Reload Models
>>> from keras.models import load_model
>>> model3.save('model_file.h5')
>>> my_model = load_model('model_file.h5')
- Model Fine Tuning
- Optimization Parameters
>>> from keras.optimizers import RMSprop
>>> opt = RMSprop(lr=0.0001, decay=1e-6)
>>> model2.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
- Early Stopping
>>> from keras.callbacks import EarlyStopping
>>> early_stopping_monitor = EarlyStopping(patience=2)
>>> model3.fit(x_train4, y_train4, batch_size=32, epochs=15, validation_data=(x_test4,y_test4), callbacks=[early_stopping_monitor])
Using R
There are not many good tools available for developing deep learning solutions using R (MXNet also has some limitations). As discussed earlier, TensorFlow code written in R looks very different from a standard R code snippet and can confuse you for a long time.
But with Keras, things change. Following is a comparison between code for training your model using R with TensorFlow and code for training your model using R with Keras on top of TensorFlow.
Code for Training Your Model
Code using R with TensorFlow:
cross_entropy <- tf$reduce_mean(-tf$reduce_sum(y_ * tf$log(y_conv), reduction_indices=1L))
train_step <- tf$train$AdamOptimizer(1e-4)$minimize(cross_entropy)
correct_prediction <- tf$equal(tf$argmax(y_conv, 1L), tf$argmax(y_, 1L))
accuracy <- tf$reduce_mean(tf$cast(correct_prediction, tf$float32))
sess$run(tf$global_variables_initializer())
for (i in 1:20000) {
  batch <- mnist$train$next_batch(50L)
  if (i %% 100 == 0) {
    train_accuracy <- accuracy$eval(feed_dict = dict(
      x = batch[[1]], y_ = batch[[2]], keep_prob = 1.0))
    cat(sprintf("step %d, training accuracy %g\n", i, train_accuracy))
  }
  train_step$run(feed_dict = dict(
    x = batch[[1]], y_ = batch[[2]], keep_prob = 0.5))
}
test_accuracy <- accuracy$eval(feed_dict = dict(
  x = mnist$test$images, y_ = mnist$test$labels, keep_prob = 1.0))
cat(sprintf("test accuracy %g", test_accuracy))
Code using R with Keras on top of TensorFlow:
model_top %>% fit(
  x = train_x, y = train_y,
  epochs = epochs,
  batch_size = batch_size,
  validation_data = valid)
It doesn’t look like a hard decision to choose Keras on top of Tensorflow, right?
The following code snippets show some basic functions for developing deep networks that perform image recognition using R with Keras on top of TensorFlow:
- To load from a folder:
train_generator <- flow_images_from_directory(
  train_directory,
  generator = image_data_generator(),
  target_size = c(img_width, img_height),
  color_mode = "rgb",
  class_mode = "binary",
  batch_size = batch_size,
  shuffle = TRUE,
  seed = 123)
- To define a simple convolutional neural network:
model <- keras_model_sequential()
model %>%
  layer_conv_2d(filter = 32, kernel_size = c(3,3), input_shape = c(img_width, img_height, 3)) %>%
  layer_activation("relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_conv_2d(filter = 32, kernel_size = c(3,3)) %>%
  layer_activation("relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_conv_2d(filter = 64, kernel_size = c(3,3)) %>%
  layer_activation("relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_flatten() %>%
  layer_dense(64) %>%
  layer_activation("relu") %>%
  layer_dropout(0.5) %>%
  layer_dense(1) %>%
  layer_activation("sigmoid")
- To augment data:
augment <- image_data_generator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=TRUE)
- To load a pretrained network:
A pre-trained network could be a ready-made framework or a standard deployment. This option not only lets us use an existing model but also provides the opportunity to improve upon it without having to rewrite it from scratch.
model_vgg <- application_vgg16(include_top = FALSE, weights = "imagenet")
- To save model weights:
Saving model weights lets us store the weights from the last computation. This helps in fine-tuning the model once we start using it on test data and, later, on real data.
save_model_weights_hdf5(model_ft, 'finetuning_30epochs_vggR.h5', overwrite = TRUE)
Here you will find the complete GitHub repo on image classification using R with Keras on top of TensorFlow. All the data you need and all the code is in one place at this link.
Endnote:
This guide aimed to familiarize you with Keras and its use in deep learning, and it only scratches the surface of a colossal field. You would do well to use Keras extensively, because, let’s admit it, it’s easy, it’s effective and it gets the job done!
Happy Coding!