[1]:

#@title
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Identity recurrent neural network (IRNN)


In this tutorial, we’ll demonstrate that an identity recurrent neural network (IRNN) can efficiently process long temporal sequences, reproducing one of the experiments described in the Identity RNN article.

The experiment tests the IRNN on the MNIST dataset, first transforming its 28 x 28 images into 784-pixel-long sequences. The article claims that IRNN can achieve 0.9+ accuracy in these conditions.

The tutorial includes the following steps:

Download and prepare the dataset
Build the network
Train the network and evaluate the results

Download and prepare the dataset

We will download the MNIST dataset from OpenML using scikit-learn.

[2]:

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

Now we need to normalize the data and convert it to the 32-bit data types that NeoML expects.

[3]:

import numpy as np

# Normalize
X = (255 - X) * 2 / 255 - 1

# Fix data types
X = X.astype(np.float32)
y = y.astype(np.int32)
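To make the normalization concrete, the short check below applies the same formula to a few sample pixel values. It is only an illustration on a made-up array (`pixels` is not part of the tutorial): the formula inverts the intensities and rescales them from [0, 255] to [-1, 1].

```python
import numpy as np

# Hypothetical sample of raw pixel intensities (not taken from MNIST)
pixels = np.array([0.0, 127.5, 255.0])

# Same formula as in the tutorial: invert, then rescale to [-1, 1]
normalized = ((255 - pixels) * 2 / 255 - 1).astype(np.float32)

print(normalized)  # 0 maps to 1, 127.5 to 0, and 255 to -1
```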

Finally, we’ll split the data into subsets used for training and for testing.

[4]:

# Split into train/test
train_size = 60000
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
del X, y

Build the network

Choose the device

We need to create a math engine that will perform all calculations and allocate data for the neural network. The math engine is tied to the processing device.

In this tutorial we’ll use a single-threaded CPU math engine.

[5]:

import neoml

math_engine = neoml.MathEngine.CpuMathEngine()

Create the network and connect layers

Create a neoml.Dnn.Dnn object that represents a neural network (a directed graph of layers). The network requires a math engine to perform its operations; it must be specified at creation and can’t be changed later.

[6]:

dnn = neoml.Dnn.Dnn(math_engine)

A neoml.Dnn.Source layer feeds the data into the network.

[7]:

data = neoml.Dnn.Source(dnn, 'data')  # source for data

Now we need to transpose this data into sequences of 784 pixels each. We can do that using the neoml.Dnn.Transpose layer, which swaps two dimensions of the blob.

The original data will be wrapped into a blob with BatchWidth equal to the batch size and Channels equal to the image size. (The blobs are created right before training the network; see below.) The transposition turns each batch into sequences whose length (BatchLength) equals the image size, with each element of the sequence having size 1.

[8]:

transpose = neoml.Dnn.Transpose(data, first_dim='batch_length',
                                second_dim='channels', name='transpose')
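The effect of this layer on the blob shape can be sketched in NumPy. This is an illustration rather than NeoML code: it assumes the seven blob dimensions are ordered as listed later in this tutorial (BatchLength first, Channels last) and uses `np.swapaxes` as a stand-in for the layer.

```python
import numpy as np

batch_size, image_size = 40, 784

# Blob shape as fed to the network:
# (BatchLength, BatchWidth, ListSize, Height, Width, Depth, Channels)
batch = np.zeros((1, batch_size, 1, 1, 1, 1, image_size), dtype=np.float32)

# Swap BatchLength (axis 0) with Channels (axis 6)
sequences = np.swapaxes(batch, 0, 6)

print(batch.shape)      # (1, 40, 1, 1, 1, 1, 784)
print(sequences.shape)  # (784, 40, 1, 1, 1, 1, 1)
```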

We add the neoml.Dnn.Irnn layer, connecting its input to the output of the transposition layer.

[9]:

hidden_size = 100
irnn = neoml.Dnn.Irnn(transpose, hidden_size, identity_scale=1.,
                      input_weight_std=1e-3, name='irnn')
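For intuition, here is a minimal NumPy sketch of the recurrence that an IRNN computes, following the description in the Identity RNN article: a ReLU recurrent layer whose recurrent matrix starts as a scaled identity and whose input weights start as small Gaussian noise. The function `irnn_forward` and its variable names are hypothetical and only mirror the `identity_scale` and `input_weight_std` arguments above; NeoML's internal implementation may differ.

```python
import numpy as np

def irnn_forward(x, hidden_size, identity_scale=1.0, input_weight_std=1e-3, seed=0):
    """Runs an untrained IRNN over x of shape (seq_len, batch, input_size)."""
    rng = np.random.default_rng(seed)
    seq_len, batch, input_size = x.shape
    # Input weights: small Gaussian noise
    w_in = rng.normal(0.0, input_weight_std, size=(input_size, hidden_size))
    # Recurrent weights: scaled identity matrix
    w_rec = identity_scale * np.eye(hidden_size)
    bias = np.zeros(hidden_size)

    h = np.zeros((batch, hidden_size))
    outputs = []
    for t in range(seq_len):
        # h_t = ReLU(x_t W_in + h_{t-1} W_rec + b)
        h = np.maximum(0.0, x[t] @ w_in + h @ w_rec + bias)
        outputs.append(h)
    return np.stack(outputs)  # (seq_len, batch, hidden_size)

x = np.ones((784, 2, 1))
out = irnn_forward(x, hidden_size=100)
print(out.shape)  # (784, 2, 100)
```

With an identity recurrent matrix and zero bias, a non-negative hidden state is copied to the next step unchanged whenever the input contributes nothing, which is the property the article relies on for long sequences.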

Recurrent layers in NeoML usually return whole sequences, but to reproduce the experiment we only need the last element of each sequence. The neoml.Dnn.SubSequence layer extracts it for us.

[10]:

subseq = neoml.Dnn.SubSequence(irnn, start_pos=-1,
                               length=1, name='subseq')
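In NumPy terms, keeping only the last element corresponds to a slice along the time axis. The snippet below is an analogy, not NeoML code, and assumes the time axis comes first, as in the blob layout.

```python
import numpy as np

seq_len, batch_size, hidden_size = 784, 40, 100
irnn_output = np.zeros((seq_len, batch_size, hidden_size), dtype=np.float32)
irnn_output[-1] = 1.0  # mark the last time step

# Analogue of start_pos=-1, length=1: one element, taken from the last position
last_step = irnn_output[-1:]

print(last_step.shape)  # (1, 40, 100)
```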

Now we use a fully-connected layer to form logits (non-normalized distribution) over MNIST classes.

[11]:

n_classes = 10
fc = neoml.Dnn.FullyConnected(subseq, n_classes, name='fc')

To train the network, we also need to define a loss function to be optimized. In this tutorial we’ll be optimizing cross-entropy loss.

A loss function needs to compare the network output with the correct labels, so we’ll add another source layer to pass the correct labels in.

[12]:

labels = neoml.Dnn.Source(dnn, 'labels')  # Source for labels
loss = neoml.Dnn.CrossEntropyLoss((fc, labels), name='loss')
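As a reference point, softmax cross-entropy over logits with integer class labels can be written in a few lines of NumPy. This is a generic sketch with a hypothetical helper `cross_entropy`; it is not claimed to match the NeoML layer's exact numerics.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in each row
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.zeros((4, 10))          # uniform prediction over 10 classes
labels = np.array([3, 1, 4, 1])
print(cross_entropy(logits, labels))  # equals log(10) for a uniform prediction
```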

NeoML also provides a neoml.Dnn.Accuracy layer to calculate network accuracy. Let’s connect this layer and create an additional neoml.Dnn.Sink layer for extracting its output.

[13]:

# Auxiliary layers for collecting statistics
accuracy = neoml.Dnn.Accuracy((fc, labels), name='accuracy')
# The accuracy layer writes its result to its output,
# so we need an additional sink layer to extract it
accuracy_sink = neoml.Dnn.Sink(accuracy, name='accuracy_sink')
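For reference, classification accuracy over a batch of logits is the fraction of samples whose highest-scoring class matches the label. The NumPy snippet below shows the calculation on made-up values; it does not use the NeoML layer.

```python
import numpy as np

# Hypothetical logits for 4 samples and 3 classes
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.8, 0.7],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 2, 2])

predicted = logits.argmax(axis=1)        # most likely class per sample
accuracy = (predicted == labels).mean()  # fraction of correct predictions

print(predicted)  # [0 1 0 2]
print(accuracy)   # 0.75
```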

Create a solver

A solver is an object that optimizes the weights using gradient values. It is necessary for training the network. In this sample we’ll use a neoml.Dnn.AdaptiveGradient solver, which is the NeoML implementation of Adam.

[14]:

lr = 1e-6

# Create solver
dnn.solver = neoml.Dnn.AdaptiveGradient(math_engine, learning_rate=lr,
                                           l1=0., l2=0.,  # no regularization
                                           max_gradient_norm=1.,  # clip gradients
                                           moment_decay_rate=0.9,
                                           second_moment_decay_rate=0.999)
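The update rule behind Adam can be sketched as follows. This is the textbook formulation with bias correction and a simple norm-based gradient clipping step, using the same hyperparameter values as above; the function `adam_step` is hypothetical, and NeoML's solver may differ in details such as clipping and bias correction.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-6, beta1=0.9, beta2=0.999,
              eps=1e-8, max_grad_norm=1.0):
    """One textbook Adam update with norm-based gradient clipping."""
    # Rescale the gradient if its L2 norm exceeds the limit
    norm = np.linalg.norm(grad)
    if norm > max_grad_norm:
        grad = grad * (max_grad_norm / norm)
    m = beta1 * m + (1 - beta1) * grad         # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = adam_step(w, grad=np.array([3.0, -4.0]), m=m, v=v, t=1)
print(w)  # each weight moves by roughly lr against the sign of its gradient
```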

Train the network and evaluate the results

NeoML networks accept data only as neoml.Blob.Blob.

Blobs are 7-dimensional arrays located in device memory. Each dimension has a specific purpose:

BatchLength - temporal axis (used in recurrent layers)
BatchWidth - classic batch
ListSize - list axis, used when objects are related to the same entity, but without ordering (unlike BatchLength)
Height - height of the image
Width - width of the image
Depth - depth of the 3-dimensional image
Channels - channels of the image (also used when object is a 1-dimensional vector)
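As a quick illustration of this layout, the hypothetical helper below (not part of NeoML) builds a 7-element shape tuple from named dimensions, defaulting every unused dimension to 1.

```python
# Dimension order as listed above
BLOB_DIMS = ('batch_length', 'batch_width', 'list_size',
             'height', 'width', 'depth', 'channels')

def blob_shape(**sizes):
    """Returns a 7-element shape tuple; unspecified dimensions are 1."""
    unknown = set(sizes) - set(BLOB_DIMS)
    if unknown:
        raise ValueError(f'Unknown dimensions: {sorted(unknown)}')
    return tuple(sizes.get(name, 1) for name in BLOB_DIMS)

print(blob_shape(batch_width=40, channels=784))  # (1, 40, 1, 1, 1, 1, 784)
print(blob_shape(batch_width=40))                # (1, 40, 1, 1, 1, 1, 1)
```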

We will use NumPy arrays (ndarray) to split the data into batches, then create blobs from these batches right before feeding them into the network.

[15]:

def irnn_data_iterator(X, y, batch_size, math_engine):
    """Slices numpy arrays into batches and wraps them in blobs"""
    def make_blob(data, math_engine):
        """Wraps numpy data into neoml blob"""
        shape = data.shape
        if len(shape) == 2:  # data
            # Wrap 2-D array into blob of (BatchWidth, Channels) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, shape[1]))
        elif len(shape) == 1:  # dense labels
            # Wrap 1-D array into blob of (BatchWidth,) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, 1))
        else:
            raise ValueError(f'Unsupported data shape: {shape}')

    start = 0
    data_size = y.shape[0]
    while start < data_size:
        yield (make_blob(X[start : start+batch_size], math_engine),
               make_blob(y[start : start+batch_size], math_engine))
        start += batch_size
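The slicing logic can be checked in isolation without NeoML. The sketch below uses a hypothetical `batch_bounds` helper that yields the same `[start, start + batch_size)` ranges, showing that the final batch is shorter when the data size isn't a multiple of the batch size.

```python
def batch_bounds(data_size, batch_size):
    """Yields (start, end) index pairs covering range(data_size)."""
    start = 0
    while start < data_size:
        yield start, min(start + batch_size, data_size)
        start += batch_size

print(list(batch_bounds(100, 40)))         # [(0, 40), (40, 80), (80, 100)]
print(len(list(batch_bounds(60000, 40))))  # 1500
```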

To train the network, call dnn.learn with data as its argument.

To run the network without training, call dnn.run with data as its argument.

The input data is a dict where each key is a neoml.Dnn.Source layer name and the corresponding value is the neoml.Blob.Blob that should be passed in to this layer.

[16]:

import time  # used for timing inside run_net


def run_net(X, y, batch_size, dnn, is_train):
    """Runs dnn on given data"""
    start = time.time()
    total_loss = 0.
    run_iter = dnn.learn if is_train else dnn.run
    math_engine = dnn.math_engine
    layers = dnn.layers
    loss = layers['loss']
    accuracy = layers['accuracy']
    sink = layers['accuracy_sink']

    accuracy.reset = True  # Reset previous statistics
    # Iterate over batches
    for X_batch, y_batch in irnn_data_iterator(X, y, batch_size, math_engine):
        # Run the network on the batch data
        run_iter({'data': X_batch, 'labels': y_batch})
        total_loss += loss.last_loss * y_batch.batch_width  # Update epoch loss
        accuracy.reset = False  # Don't reset statistics within one epoch

    avg_loss = total_loss / y.shape[0]
    avg_acc = sink.get_blob().asarray()[0]
    run_time = time.time() - start
    return avg_loss, avg_acc, run_time
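Multiplying each batch loss by the batch width before averaging matters when the last batch is smaller than the others. Assuming `last_loss` is the mean loss over the batch (which the weighting in `run_net` implies), the arithmetic looks like this on made-up numbers:

```python
# Hypothetical per-batch mean losses and batch sizes
batch_losses = [0.50, 0.40, 0.10]
batch_sizes = [40, 40, 20]

total_loss = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
avg_loss = total_loss / sum(batch_sizes)            # weighted by batch size
naive_avg = sum(batch_losses) / len(batch_losses)   # ignores batch sizes

print(round(avg_loss, 4))   # 0.38
print(round(naive_avg, 4))  # 0.3333
```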

Note: Training takes about 3-4 hours on a CPU. Uncomment the print statements to see per-epoch progress.

[17]:

%%time

import time

batch_size = 40
n_epoch = 200

for epoch in range(n_epoch):
    # Train
    train_loss, train_acc, run_time = run_net(X_train, y_train, batch_size,
                                              dnn, is_train=True)
    # print(f'Train #{epoch}\tLoss: {train_loss:.4f}\t'
    #       f'Accuracy: {train_acc:.4f}\tTime: {run_time:.2f} sec')
    # Test
    test_loss, test_acc, run_time = run_net(X_test, y_test, batch_size,
                                            dnn, is_train=False)
    # print(f'Test  #{epoch}\tLoss: {test_loss:.4f}\t'
    #       f'Accuracy: {test_acc:.4f}\tTime: {run_time:.2f} sec')
print(f'Final test acc: {test_acc:.4f}')

Final test acc: 0.9050
Wall time: 3h 54min 34s

As we can see, the model reached a test accuracy of 0.905 on these 784-step sequences, which is consistent with the 0.9+ accuracy reported in the paper.