Copyright © 2017-2023 ABBYY
[1]:
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Identity recurrent neural network (IRNN)
In this tutorial, we’ll demonstrate that an identity recurrent neural network (IRNN) can efficiently process long temporal sequences, reproducing one of the experiments described in the Identity RNN article.
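The experiment tests the IRNN on the MNIST dataset, first flattening each 28 x 28 image into a sequence of 784 pixels. The article claims that an IRNN can achieve an accuracy above 0.9 under these conditions.
The tutorial includes the following steps: downloading and preparing the dataset, building the network (choosing the device, creating the network and connecting layers, creating a solver), and training the network and evaluating the results.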
Download and prepare the dataset
We will download the MNIST dataset from OpenML using the fetch_openml function from scikit-learn.
[2]:
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
Now we need to normalize the pixel values to the [-1, 1] range and convert the data to the 32-bit types that NeoML expects.
[3]:
import numpy as np
# Normalize
X = (255 - X) * 2 / 255 - 1
# Fix data types
X = X.astype(np.float32)
y = y.astype(np.int32)
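To see what the normalization formula does, here is a small check on three sample pixel values. The `pixels` array is made up for illustration and is not taken from the dataset. The check suggests that the formula maps the 0..255 range onto [-1, 1] and inverts intensities along the way, so a raw value of 0 becomes 1 and a raw value of 255 becomes -1.

```python
import numpy as np

# Hypothetical sample of raw pixel values (not taken from the dataset)
pixels = np.array([0.0, 128.0, 255.0])

# Same formula as in the normalization cell above
normalized = ((255 - pixels) * 2 / 255 - 1).astype(np.float32)

print(normalized.min(), normalized.max())
```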
Finally, we’ll split the data into subsets used for training and for testing.
[4]:
# Split into train/test
train_size = 60000
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
del X, y
Build the network
Choose the device
We need to create a math engine that will perform all calculations and allocate data for the neural network. The math engine is tied to the processing device.
In this tutorial we’ll use a single-threaded CPU math engine.
[5]:
import neoml
math_engine = neoml.MathEngine.CpuMathEngine()
Create the network and connect layers
Create a neoml.Dnn.Dnn object that represents a neural network (a directed graph of layers). The network requires a math engine to perform its operations; it must be specified at creation and can’t be changed later.
[6]:
dnn = neoml.Dnn.Dnn(math_engine)
A neoml.Dnn.Source layer feeds the data into the network.
[7]:
data = neoml.Dnn.Source(dnn, 'data') # source for data
Now we need to transpose this data into sequences of 784 pixels each. We can do that using the neoml.Dnn.Transpose layer, which swaps 2 dimensions of the blob.
Original data will be wrapped into a 2-dimensional blob with BatchWidth equal to the batch size and Channels equal to the image size. (We create the blobs right before training the network; see below.) This layer will transform it into sequences whose length (BatchLength) equals the image size, where each element of the sequence is of size 1.
[8]:
transpose = neoml.Dnn.Transpose(data, first_dim='batch_length',
                                second_dim='channels', name='transpose')
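The effect of this transposition can be imitated in plain NumPy. The sketch below is only an illustration on a made-up batch; it assumes the 7-dimensional blob layout described later in this tutorial (BatchLength first, Channels last) and does not call NeoML at all.

```python
import numpy as np

batch_size, image_size = 4, 784

# Hypothetical batch: one flattened image per row
batch = np.arange(batch_size * image_size, dtype=np.float32)
batch = batch.reshape(batch_size, image_size)

# Assumed blob layout:
# (BatchLength, BatchWidth, ListSize, Height, Width, Depth, Channels)
blob = batch.reshape(1, batch_size, 1, 1, 1, 1, image_size)

# Swap BatchLength (axis 0) and Channels (axis 6)
sequences = np.swapaxes(blob, 0, 6)

print(sequences.shape)  # (784, 4, 1, 1, 1, 1, 1)
```

After the swap, step `t` of sequence `b` holds pixel `t` of image `b`, which is the one-pixel-per-step input the experiment calls for.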
We add the neoml.Dnn.Irnn layer, connecting its input to the output of the transposition layer.
[9]:
hidden_size = 100
irnn = neoml.Dnn.Irnn(transpose, hidden_size, identity_scale=1.,
                      input_weight_std=1e-3, name='irnn')
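For intuition, the recurrence behind an IRNN can be sketched in NumPy. This sketch assumes the formulation from the IRNN article: a ReLU recurrent layer whose recurrent weights start as a scaled identity matrix, with small Gaussian input weights and zero biases. The function name `irnn_forward` and the weight layout are illustrative choices, and NeoML's internal implementation may differ.

```python
import numpy as np

def irnn_forward(x_seq, w_in, w_rec, bias):
    """Runs a ReLU RNN over x_seq of shape (seq_len, batch, input_size).

    Returns the hidden states for every step,
    shape (seq_len, batch, hidden_size).
    """
    batch = x_seq.shape[1]
    hidden_size = w_rec.shape[0]
    h = np.zeros((batch, hidden_size), dtype=x_seq.dtype)
    outputs = []
    for x_t in x_seq:
        # h_t = ReLU(x_t W_in + h_{t-1} W_rec + b)
        h = np.maximum(0, x_t @ w_in + h @ w_rec + bias)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
hidden_size, input_size = 100, 1
identity_scale, input_weight_std = 1.0, 1e-3

# Assumed initialization: scaled identity for recurrent weights,
# small Gaussian noise for input weights, zero biases
w_in = rng.normal(0, input_weight_std, (input_size, hidden_size)).astype(np.float32)
w_rec = (identity_scale * np.eye(hidden_size)).astype(np.float32)
bias = np.zeros(hidden_size, dtype=np.float32)

# Hypothetical input: 4 sequences of 784 one-pixel steps
x_seq = rng.uniform(-1, 1, (784, 4, 1)).astype(np.float32)
states = irnn_forward(x_seq, w_in, w_rec, bias)
print(states.shape)  # (784, 4, 100)
```

With the identity initialization, the hidden state is carried from step to step unchanged unless the input modifies it, which is the property the article relies on for long sequences.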
Recurrent layers in NeoML usually return whole sequences, but to reproduce the experiment we need only the last element of each sequence. The neoml.Dnn.SubSequence layer extracts it.
[10]:
subseq = neoml.Dnn.SubSequence(irnn, start_pos=-1,
                               length=1, name='subseq')
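In NumPy terms, taking a subsequence of length 1 that starts at the last position is comparable to slicing with `[-1:]` along the time axis. The array below is random placeholder data standing in for the recurrent layer output, with the time axis assumed to come first.

```python
import numpy as np

seq_len, batch_size, hidden_size = 784, 4, 100

# Placeholder for the recurrent layer output: (time, batch, hidden)
outputs = np.random.default_rng(0).random((seq_len, batch_size, hidden_size))

# Start at the last position and keep one element; the time axis is preserved
last = outputs[-1:]
print(last.shape)  # (1, 4, 100)
```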
Now we use a fully-connected layer to form logits (non-normalized distribution) over MNIST classes.
[11]:
n_classes = 10
fc = neoml.Dnn.FullyConnected(subseq, n_classes, name='fc')
To train the network, we also need to define a loss function to be optimized. In this tutorial we’ll be optimizing cross-entropy loss.
A loss function needs to compare the network output with the correct labels, so we’ll add another source layer to pass the correct labels in.
[12]:
labels = neoml.Dnn.Source(dnn, 'labels') # Source for labels
loss = neoml.Dnn.CrossEntropyLoss((fc, labels), name='loss')
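As a reference point, cross-entropy over logits with integer class labels can be written in a few lines of NumPy. This is the textbook definition, assuming softmax is applied to the logits inside the loss and the result is averaged over the batch; the helper name `softmax_cross_entropy` is illustrative and this is not NeoML's code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Uniform logits over 10 classes give a loss of ln(10), about 2.3026
logits = np.zeros((2, 10))
labels = np.array([3, 7])
uniform_loss = softmax_cross_entropy(logits, labels)
print(round(uniform_loss, 4))  # 2.3026
```

A freshly initialized classifier over 10 classes is expected to start near this value, which makes it a handy sanity check for the first training epochs.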
NeoML also provides a neoml.Dnn.Accuracy layer to calculate network accuracy. Let’s connect this layer and create an additional neoml.Dnn.Sink layer for extracting its output.
[13]:
# Auxiliary layers for collecting statistics
accuracy = neoml.Dnn.Accuracy((fc, labels), name='accuracy')
# The accuracy layer writes its result to its output,
# so we need an additional sink layer to extract it
accuracy_sink = neoml.Dnn.Sink(accuracy, name='accuracy_sink')
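For a multi-class classifier, accuracy is conventionally the share of samples whose highest-scoring class matches the label. A minimal NumPy sketch of that definition follows; the logits and labels are made up, and the helper `accuracy_from_logits` is an illustrative name rather than part of NeoML.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax equals the integer label."""
    return float((logits.argmax(axis=1) == labels).mean())

# Hypothetical logits for 3 samples and 3 classes
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.3, 0.2],
                   [0.0, 0.1, 0.9]])
labels = np.array([1, 0, 1])

# Predictions are 1, 0, 2, so two of the three samples are correct
acc = accuracy_from_logits(logits, labels)
```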
Create a solver
A solver is an object that optimizes the weights using gradient values. It is necessary for training the network. In this sample we’ll use a neoml.Dnn.AdaptiveGradient solver, which is the NeoML implementation of Adam.
[14]:
lr = 1e-6
# Create solver
dnn.solver = neoml.Dnn.AdaptiveGradient(math_engine, learning_rate=lr,
                                        l1=0., l2=0.,  # no regularization
                                        max_gradient_norm=1.,  # clip gradients
                                        moment_decay_rate=0.9,
                                        second_moment_decay_rate=0.999)
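To make the solver settings more concrete, here is a sketch of a single textbook Adam step preceded by gradient clipping by norm. It assumes that `moment_decay_rate` and `second_moment_decay_rate` play the roles of Adam's beta1 and beta2 and that `max_gradient_norm` rescales gradients whose norm exceeds the limit. The `eps` value, the bias correction, and both helper functions are assumptions made for illustration; NeoML's solver may differ in these details.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescales grad so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

def adam_step(w, grad, m, v, t, lr=1e-6, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)

# A gradient of norm 5 is clipped to norm 1 before the update
grad = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)
w, m, v = adam_step(w, grad, m, v, t=1)
```

On the first step the update is close to `lr` in magnitude for every weight regardless of the gradient scale, which helps explain why such a small learning rate still makes progress over 200 epochs.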
Train the network and evaluate the results
NeoML networks accept data only as neoml.Blob.Blob.
Blobs are 7-dimensional arrays located in device memory. Each dimension has a specific purpose:
BatchLength - temporal axis (used in recurrent layers)
BatchWidth - classic batch
ListSize - list axis, used when objects are related to the same entity, but without ordering (unlike BatchLength)
Height - height of the image
Width - width of the image
Depth - depth of the 3-dimensional image
Channels - channels of the image (also used when object is a 1-dimensional vector)
We will use ndarray slicing to split the data into batches, then create blobs from these batches right before feeding them into the network.
[15]:
def irnn_data_iterator(X, y, batch_size, math_engine):
    """Slices numpy arrays into batches and wraps them in blobs"""
    def make_blob(data, math_engine):
        """Wraps numpy data into neoml blob"""
        shape = data.shape
        if len(shape) == 2:  # data
            # Wrap 2-D array into blob of (BatchWidth, Channels) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, shape[1]))
        elif len(shape) == 1:  # dense labels
            # Wrap 1-D array into blob of (BatchWidth,) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, 1))
        else:
            assert(False)

    start = 0
    data_size = y.shape[0]
    while start < data_size:
        yield (make_blob(X[start : start+batch_size], math_engine),
               make_blob(y[start : start+batch_size], math_engine))
        start += batch_size
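The slicing logic of this iterator can be tried out without NeoML. The helper below, `batch_bounds`, is a hypothetical stand-in that yields only the index ranges; it shows how many batches an epoch contains and that the last batch may be smaller than the rest.

```python
def batch_bounds(data_size, batch_size):
    """Yields (start, stop) index pairs that cover data_size items."""
    start = 0
    while start < data_size:
        yield start, min(start + batch_size, data_size)
        start += batch_size

# 60000 training samples in batches of 40
print(len(list(batch_bounds(60000, 40))))  # 1500

# When the size is not divisible, the last batch is shorter
print(list(batch_bounds(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```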
To train the network, call dnn.learn with data as its argument.
To run the network without training, call dnn.run with data as its argument.
The input data is a dict where each key is a neoml.Dnn.Source layer name and the corresponding value is the neoml.Blob.Blob that should be passed to this layer.
[16]:
import time

def run_net(X, y, batch_size, dnn, is_train):
    """Runs dnn on given data"""
    start = time.time()
    total_loss = 0.
    run_iter = dnn.learn if is_train else dnn.run
    math_engine = dnn.math_engine
    layers = dnn.layers
    loss = layers['loss']
    accuracy = layers['accuracy']
    sink = layers['accuracy_sink']

    accuracy.reset = True  # Reset previous statistics
    # Iterate over batches
    for X_batch, y_batch in irnn_data_iterator(X, y, batch_size, math_engine):
        # Run the network on the batch data
        run_iter({'data': X_batch, 'labels': y_batch})
        total_loss += loss.last_loss * y_batch.batch_width  # Update epoch loss
        accuracy.reset = False  # Don't reset statistics within one epoch

    avg_loss = total_loss / y.shape[0]
    avg_acc = sink.get_blob().asarray()[0]
    run_time = time.time() - start
    return avg_loss, avg_acc, run_time
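The function above multiplies each batch loss by the batch width before summing. Assuming the loss reported for a batch is the mean over that batch, this weighting recovers the exact per-sample average even when the last batch is smaller. The sketch below illustrates the point with made-up numbers; `epoch_average` is a hypothetical helper.

```python
def epoch_average(batch_means, batch_sizes):
    """Combines per-batch mean losses into a per-sample average."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Hypothetical per-sample losses split into batches of 2, 2 and 1
samples = [0.5, 1.5, 1.0, 2.0, 4.0]
batches = [samples[0:2], samples[2:4], samples[4:5]]
batch_means = [sum(b) / len(b) for b in batches]
batch_sizes = [len(b) for b in batches]

weighted = epoch_average(batch_means, batch_sizes)  # matches the true mean
naive = sum(batch_means) / len(batch_means)         # overweights the short batch
```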
Note: It will take 3-4 hours to train. You may uncomment print statements to see the progress.
[17]:
%%time
import time
batch_size = 40
n_epoch = 200
for epoch in range(n_epoch):
    # Train
    train_loss, train_acc, run_time = run_net(X_train, y_train, batch_size,
                                              dnn, is_train=True)
    # print(f'Train #{epoch}\tLoss: {train_loss:.4f}\t'
    #       f'Accuracy: {train_acc:.4f}\tTime: {run_time:.2f} sec')
    # Test
    test_loss, test_acc, run_time = run_net(X_test, y_test, batch_size,
                                            dnn, is_train=False)
    # print(f'Test #{epoch}\tLoss: {test_loss:.4f}\t'
    #       f'Accuracy: {test_acc:.4f}\tTime: {run_time:.2f} sec')
print(f'Final test acc: {test_acc:.4f}')
Final test acc: 0.9050
Wall time: 3h 54min 34s
As we can see, the model reached a test accuracy of 0.9050 on these 784-step sequences, which is consistent with the 0.9+ accuracy claimed in the article.