
# Identity recurrent neural network (IRNN)

In this tutorial, we’ll demonstrate that an identity recurrent neural network (IRNN) can efficiently process long temporal sequences, reproducing one of the experiments described in the Identity RNN article.

The experiment tests the IRNN on the MNIST dataset, first transforming its 28 x 28 images into 784-pixel-long sequences. The article claims that IRNN can achieve 0.9+ accuracy in these conditions.
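
To make the transformation concrete, here is a minimal NumPy sketch of how a single 28 x 28 image becomes a 784-step sequence with one pixel per step. The random `image` array is only a stand-in for a real MNIST digit and is not part of the experiment code.

```python
import numpy as np

# Stand-in for one MNIST digit (hypothetical random data, not a real image)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Read the image row by row: 784 time steps, one pixel per step
sequence = image.reshape(-1, 1)

print(sequence.shape)  # (784, 1)
```

The network therefore has to carry information across hundreds of steps before it sees the whole digit, which is what makes this task a test of long-range memory.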

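The tutorial includes the following steps: downloading and preparing the dataset, building the network, and training the network and evaluating the results.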

[2]:

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)


Now we need to normalize the data and convert it to 32-bit data types for NeoML.

[3]:

import numpy as np

# Normalize
X = (255 - X) * 2 / 255 - 1

# Fix data types
X = X.astype(np.float32)
y = y.astype(np.int32)
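
As a quick sanity check of the normalization formula, the sketch below applies it to a few sample pixel values. It suggests that the formula both inverts the intensities and rescales them to the range [-1, 1]. The `pixels` array is made up for illustration.

```python
import numpy as np

# Hypothetical sample pixel values covering the full 0..255 range
pixels = np.array([0.0, 127.5, 255.0])

# Same formula as in the cell above
normalized = ((255 - pixels) * 2 / 255 - 1).astype(np.float32)

# 0 maps to 1, 127.5 maps to 0, 255 maps to -1
print(normalized)
```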


Finally, we’ll split the data into subsets used for training and for testing.

[4]:

# Split into train/test
train_size = 60000
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
del X, y


## Build the network

### Choose the device

We need to create a math engine that will perform all calculations and allocate data for the neural network. The math engine is tied to the processing device.

In this tutorial we’ll use a single-threaded CPU math engine.

[5]:

import neoml

math_engine = neoml.MathEngine.CpuMathEngine(1)


### Create the network and connect layers

Create a neoml.Dnn.Dnn object that represents a neural network (a directed graph of layers). The network requires a math engine to perform its operations; it must be specified at creation and can’t be changed later.

[6]:

dnn = neoml.Dnn.Dnn(math_engine)


A neoml.Dnn.Source layer feeds the data into the network.

[7]:

data = neoml.Dnn.Source(dnn, 'data')  # source for data


Now we need to transpose this data into sequences of 784 pixels each. We can do that using the neoml.Dnn.Transpose layer, which swaps 2 dimensions of the blob.

The original data will be wrapped into a 2-dimensional blob with BatchWidth equal to the batch size and Channels equal to the image size (784). (We create the blobs right before training the network; see below.) The transpose layer turns each image into a sequence whose length (BatchLength) equals the image size, where each element of the sequence has size 1.

[8]:

transpose = neoml.Dnn.Transpose(data, first_dim='batch_length',
                                second_dim='channels', name='transpose')
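
The effect of the swap on the blob shape can be sketched with plain NumPy. This is only an illustration of the shape change, assuming the 7-dimensional blob layout (BatchLength, BatchWidth, ListSize, Height, Width, Depth, Channels) described later in this tutorial; it does not call NeoML.

```python
import numpy as np

batch_size, image_size = 40, 784

# Blob layout: (BatchLength, BatchWidth, ListSize, Height, Width, Depth, Channels)
blob = np.zeros((1, batch_size, 1, 1, 1, 1, image_size), dtype=np.float32)

# Swap BatchLength (axis 0) and Channels (axis 6), mimicking the transpose layer
transposed = np.swapaxes(blob, 0, 6)

print(transposed.shape)  # (784, 40, 1, 1, 1, 1, 1)
```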


We add the neoml.Dnn.Irnn layer, connecting its input to the output of the transposition layer.

[9]:

hidden_size = 100
irnn = neoml.Dnn.Irnn(transpose, hidden_size, identity_scale=1.,
                      input_weight_std=1e-3, name='irnn')
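
For intuition, the recurrence behind an IRNN can be sketched in plain NumPy: a ReLU recurrent layer whose recurrent weights start as a scaled identity matrix and whose input weights start as small Gaussian noise. The function `irnn_forward` is a hypothetical helper written for this illustration. It only mirrors the parameter names used above and may differ from NeoML's actual layer in its details.

```python
import numpy as np

def irnn_forward(x, hidden_size, identity_scale=1.0,
                 input_weight_std=1e-3, seed=0):
    """Runs a plain-NumPy IRNN over x of shape (seq_len, batch, input_size)."""
    rng = np.random.default_rng(seed)
    seq_len, batch, input_size = x.shape
    # Input weights: small Gaussian noise
    w_in = rng.normal(0.0, input_weight_std, size=(input_size, hidden_size))
    # Recurrent weights: scaled identity matrix
    w_rec = identity_scale * np.eye(hidden_size)
    bias = np.zeros(hidden_size)

    h = np.zeros((batch, hidden_size))
    outputs = []
    for t in range(seq_len):
        # h_t = ReLU(x_t W_in + h_{t-1} W_rec + b)
        h = np.maximum(0.0, x[t] @ w_in + h @ w_rec + bias)
        outputs.append(h)
    return np.stack(outputs)  # (seq_len, batch, hidden_size)

# Hypothetical input: 4 sequences of 784 one-pixel steps
x = np.random.default_rng(1).uniform(-1, 1, size=(784, 4, 1))
states = irnn_forward(x, hidden_size=100)
print(states.shape)  # (784, 4, 100)
```

With the identity initialization, the hidden state is initially copied from step to step almost unchanged, which is the property that helps gradients survive across long sequences.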


Recurrent layers in NeoML usually return whole sequences, but to reproduce the experiment we only need the last element of each sequence. The neoml.Dnn.SubSequence layer extracts it.

[10]:

subseq = neoml.Dnn.SubSequence(irnn, start_pos=-1,
                               length=1, name='subseq')
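
The selection made by `start_pos=-1, length=1` can be pictured as a slice along the time axis. The NumPy sketch below assumes that a negative start position counts from the end of the sequence and covers only a positive `length`. The `states` array is hypothetical.

```python
import numpy as np

# Hypothetical recurrent output: (seq_len, batch, hidden_size)
states = np.arange(5 * 2 * 3).reshape(5, 2, 3)

start_pos, length = -1, 1
# A negative start position counts from the end of the sequence
start = start_pos if start_pos >= 0 else states.shape[0] + start_pos
last = states[start:start + length]

print(last.shape)  # (1, 2, 3)
```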


Now we use a fully-connected layer to form logits (non-normalized distribution) over MNIST classes.

[11]:

n_classes = 10
fc = neoml.Dnn.FullyConnected(subseq, n_classes, name='fc')


To train the network, we also need to define a loss function to be optimized. In this tutorial we’ll be optimizing cross-entropy loss.

A loss function needs to compare the network output with the correct labels, so we’ll add another source layer to pass the correct labels in.

[12]:

labels = neoml.Dnn.Source(dnn, 'labels')  # Source for labels
loss = neoml.Dnn.CrossEntropyLoss((fc, labels), name='loss')
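
As a reference for what is being optimized, here is a minimal NumPy sketch of cross-entropy computed from logits and integer class labels. It assumes that softmax is applied to the logits inside the loss. The helper `softmax_cross_entropy` is written for this illustration and is not a NeoML function.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for logits (batch, n_classes) and int labels (batch,)."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for every sample
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.zeros((4, 10))  # uniform prediction over 10 classes
labels = np.array([3, 1, 4, 1])
print(softmax_cross_entropy(logits, labels))  # ln(10), about 2.3026
```

A loss close to ln(10) therefore means the network is doing no better than guessing uniformly among the ten digits.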


NeoML also provides a neoml.Dnn.Accuracy layer to calculate network accuracy. Let’s connect this layer and create an additional neoml.Dnn.Sink layer for extracting its output.

[13]:

# Auxiliary layers for collecting statistics
accuracy = neoml.Dnn.Accuracy((fc, labels), name='accuracy')
# The accuracy layer writes its result to its output,
# so we need an additional sink layer to extract it
accuracy_sink = neoml.Dnn.Sink(accuracy, name='accuracy_sink')
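
For a multi-class classifier, accuracy is the share of samples whose highest-scoring class matches the label. The NumPy sketch below illustrates that definition on made-up logits. `batch_accuracy` is a hypothetical helper, not part of NeoML.

```python
import numpy as np

def batch_accuracy(logits, labels):
    """Share of samples whose highest-scoring class matches the label."""
    return float((logits.argmax(axis=1) == labels).mean())

logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 0.1, 1.5],
                   [0.9, 0.8, 0.1]])
labels = np.array([0, 2, 1])
print(batch_accuracy(logits, labels))  # 2 of 3 predictions are correct
```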


### Create a solver

A solver is an object that optimizes the weights using gradient values. It is necessary for training the network. In this sample we’ll use a neoml.Dnn.AdaptiveGradient solver, which is the NeoML implementation of Adam.

[14]:

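lr = 1e-6

# Create solver
solver = neoml.Dnn.AdaptiveGradient(math_engine, learning_rate=lr,
                                    l1=0., l2=0.,  # no regularization
                                    moment_decay_rate=0.9,
                                    second_moment_decay_rate=0.999)
dnn.solver = solver  # attach the solver to the network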
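
To show what these hyperparameters control, here is a textbook Adam update written in NumPy. It assumes that `moment_decay_rate` and `second_moment_decay_rate` play the roles usually called beta1 and beta2. This is a sketch of the standard algorithm, and NeoML's solver may differ in details such as bias correction or the epsilon value.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-6,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running mean of squares)
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
param, m, v = adam_step(param, np.array([2.0]), m, v, t=1, lr=0.1)
print(param)  # close to 0.9: the first step has a size of about lr
```

Because each step is roughly bounded by the learning rate, a value as small as 1e-6 leads to very cautious updates, which fits the long training time of this experiment.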


## Train the network and evaluate the results

NeoML networks accept data only as neoml.Blob.Blob.

Blobs are 7-dimensional arrays located in device memory. Each dimension has a specific purpose:

1. BatchLength - temporal axis (used in recurrent layers)

2. BatchWidth - classic batch

3. ListSize - list axis, used when objects are related to the same entity, but without ordering (unlike BatchLength)

4. Height - height of the image

5. Width - width of the image

6. Depth - depth of the 3-dimensional image

7. Channels - channels of the image (also used when object is a 1-dimensional vector)
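
The dimension order above can be captured in a small helper that builds a 7-element shape tuple, with every unspecified dimension set to 1. `BLOB_DIMS` and `blob_shape` are hypothetical names introduced for this illustration and are not part of NeoML.

```python
# Dimension order as listed above
BLOB_DIMS = ('BatchLength', 'BatchWidth', 'ListSize',
             'Height', 'Width', 'Depth', 'Channels')

def blob_shape(**sizes):
    """Builds a 7-element shape; dimensions that are not given default to 1."""
    unknown = set(sizes) - set(BLOB_DIMS)
    if unknown:
        raise ValueError(f'Unknown dimensions: {sorted(unknown)}')
    return tuple(sizes.get(dim, 1) for dim in BLOB_DIMS)

# A batch of 40 flattened images and the matching batch of labels
print(blob_shape(BatchWidth=40, Channels=784))  # (1, 40, 1, 1, 1, 1, 784)
print(blob_shape(BatchWidth=40))                # (1, 40, 1, 1, 1, 1, 1)
```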

We will slice the NumPy arrays into batches, then create blobs from these batches right before feeding them into the network.

[15]:

def irnn_data_iterator(X, y, batch_size, math_engine):
    """Slices numpy arrays into batches and wraps them in blobs"""
    def make_blob(data, math_engine):
        """Wraps numpy data into neoml blob"""
        shape = data.shape
        if len(shape) == 2:  # data
            # Wrap 2-D array into blob of (BatchWidth, Channels) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, shape[1]))
        elif len(shape) == 1:  # dense labels
            # Wrap 1-D array into blob of (BatchWidth,) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, 1))
        else:
            assert(False)

    start = 0
    data_size = y.shape[0]
    while start < data_size:
        yield (make_blob(X[start : start+batch_size], math_engine),
               make_blob(y[start : start+batch_size], math_engine))
        start += batch_size
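
The slicing logic of the iterator can be checked without NeoML. The sketch below reproduces only the index arithmetic and shows that the last batch may be smaller than `batch_size`. `batch_slices` is a hypothetical helper written for this illustration.

```python
def batch_slices(data_size, batch_size):
    """Yields (start, stop) index pairs the way the iterator walks the data."""
    start = 0
    while start < data_size:
        # Slicing past the end of an array is clipped, hence the min()
        yield start, min(start + batch_size, data_size)
        start += batch_size

# 60000 training samples in batches of 40
print(len(list(batch_slices(60000, 40))))  # 1500
# A small example where the last batch is incomplete
print(list(batch_slices(10, 4)))           # [(0, 4), (4, 8), (8, 10)]
```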


To train the network, call dnn.learn with data as its argument.

To run the network without training, call dnn.run with data as its argument.

The input data is a dict where each key is a neoml.Dnn.Source layer name and the corresponding value is the neoml.Blob.Blob that should be passed in to this layer.

[16]:

import time


def run_net(X, y, batch_size, dnn, is_train):
    """Runs dnn on given data"""
    start = time.time()
    total_loss = 0.
    run_iter = dnn.learn if is_train else dnn.run
    math_engine = dnn.math_engine
    layers = dnn.layers
    loss = layers['loss']
    accuracy = layers['accuracy']
    sink = layers['accuracy_sink']

    accuracy.reset = True  # Reset previous statistics
    # Iterate over batches
    for X_batch, y_batch in irnn_data_iterator(X, y, batch_size, math_engine):
        # Run the network on the batch data
        run_iter({'data': X_batch, 'labels': y_batch})
        total_loss += loss.last_loss * y_batch.batch_width  # Update epoch loss
        accuracy.reset = False  # Don't reset statistics within one epoch

    avg_loss = total_loss / y.shape[0]
    avg_acc = sink.get_blob().asarray()[0]
    run_time = time.time() - start
    return avg_loss, avg_acc, run_time
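
The epoch loss in `run_net` is a size-weighted average of the per-batch losses. The sketch below uses made-up numbers and assumes that `loss.last_loss` holds the mean loss over the batch. It shows why the weighting matters when the last batch is smaller than the others.

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller)
batch_losses = [0.5, 0.5, 2.0]
batch_sizes = [40, 40, 20]

# Weight each batch mean by its size, as run_net does with batch_width
total_loss = sum(l * n for l, n in zip(batch_losses, batch_sizes))
weighted_avg = total_loss / sum(batch_sizes)

# A plain mean of batch means over-weights the small last batch
naive_avg = sum(batch_losses) / len(batch_losses)

print(weighted_avg)  # 0.8
print(naive_avg)     # 1.0
```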


Note: It will take 3-4 hours to train. You may uncomment print statements to see the progress.

[17]:

%%time

import time

batch_size = 40
n_epoch = 200

for epoch in range(n_epoch):
    # Train
    train_loss, train_acc, run_time = run_net(X_train, y_train, batch_size,
                                              dnn, is_train=True)
    # print(f'Train #{epoch}\tLoss: {train_loss:.4f}\t'
    #       f'Accuracy: {train_acc:.4f}\tTime: {run_time:.2f} sec')
    # Test
    test_loss, test_acc, run_time = run_net(X_test, y_test, batch_size,
                                            dnn, is_train=False)
    # print(f'Test  #{epoch}\tLoss: {test_loss:.4f}\t'
    #       f'Accuracy: {test_acc:.4f}\tTime: {run_time:.2f} sec')
print(f'Final test acc: {test_acc:.4f}')

Final test acc: 0.9050
Wall time: 3h 54min 34s


As we can see, the model reached a test accuracy above 0.9 (0.9050) on these long sequences, which is consistent with the article’s claim.