Implementing Recurrent Neural Networks (RNNs) in PyTorch
This article walks through PyTorch implementations of vanilla RNNs, stacked RNNs, bidirectional RNNs, and stacked bidirectional RNNs, focusing on the dimensions and meanings of the inputs, outputs, and hidden states.
Types of RNNs
RNNs are mainly used for sequence data such as time series and natural language processing (NLP). Based on how many time steps the input and output span, RNNs can roughly be divided into the following kinds (a small code sketch follows the list):
- For time series data
  - Prediction tasks: many-to-many / many-to-one
  - Classification tasks: many-to-one
- For natural language processing
  - Text classification: many-to-one
  - Text generation: many-to-many
  - Machine translation: many-to-many
  - Named entity recognition: many-to-many
  - Image captioning: one-to-many
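To make the many-to-one / many-to-many distinction concrete, here is a minimal, hypothetical sketch (the sizes and the two linear heads are made up for illustration): the many-to-one case keeps only the last time step of the RNN output, while the many-to-many case keeps every time step.
import torch
import torch.nn as nn
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(32, 10, 8)             # (batch, seq_len, input_size)
out, h_n = rnn(x)                       # out: (32, 10, 16), h_n: (1, 32, 16)
# Many-to-one (e.g. sequence classification): use only the last time step.
cls_head = nn.Linear(16, 3)
logits = cls_head(out[:, -1, :])        # (32, 3)
# Many-to-many (e.g. sequence labelling): use every time step.
tag_head = nn.Linear(16, 5)
tags = tag_head(out)                    # (32, 10, 5)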
Structure of Stacked RNNs
Stacked RNNs are generally used to improve performance.
Structure of Bidirectional RNNs
A bidirectional RNN uses two RNNs: the input sequence is fed into one RNN in its original order and into the other in reverse order.
PyTorch code examples for each kind of RNN:
Import Libraries
import torch
import torch.nn as nn
Task Description
Time series prediction: use data from 5 time steps (Sequence Length = 5) to predict the data at the next two time steps.
Input data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
print("Data: ", data.shape, "\n\n", data)
Output:
Data:
tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])
Data Shape:
torch.Size([20])
Split it into 4 batches:
[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]]
Batch Size = 4
Sequence Length = 5
Input Size = 1 (each value in this example is 1-dimensional)
Hidden Size = 2 (feature dimension of the hidden state)
INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
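As a quick sanity check (a small sketch using the tensor and constants defined above), the reshape used throughout the examples reproduces the 4 x 5 table shown earlier:
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)   # (batch, seq_len, input_size)
print(inputs.shape)        # torch.Size([4, 5, 1])
print(inputs.squeeze(-1))  # the 4 x 5 table shown above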
PyTorch Implementation of Vanilla RNNs
torch.nn.RNN takes two inputs:
- input: the input to the RNN, with shape (seq_len, batch, input_size). If batch_first=True is set, the shape is (batch, seq_len, input_size) instead.
- h_0: the initial hidden state of the RNN, with shape (num_layers * num_directions, batch, hidden_size). num_layers is the number of stacked RNN layers; num_directions is 2 for bidirectional RNNs and 1 for unidirectional RNNs.
torch.nn.RNN returns two outputs:
- out: the output of the last RNN layer at every time step, with shape (seq_len, batch, num_directions * hidden_size). If batch_first=True is set, the shape is (batch, seq_len, num_directions * hidden_size) instead.
- h_n: the hidden state of every RNN layer at the last time step, with shape (num_layers * num_directions, batch, hidden_size). The shape of h_n is not affected by batch_first=True. (A short usage sketch follows below.)
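Note that h_0 is optional and defaults to zeros when omitted. A minimal sketch of passing it explicitly, using the constants defined above (even with batch_first=True, h_0 keeps the layout (num_layers * num_directions, batch, hidden_size)):
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=1, batch_first=True)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# h_0 shape: (num_layers * num_directions, batch, hidden_size) = (1, 4, 2)
h_0 = torch.zeros(1, BATCH_SIZE, HIDDEN_SIZE)
out, h_n = rnn(inputs, h_0)   # equivalent to rnn(inputs) with the default zero initial state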
As an illustration, consider a forward pass through an LSTM network with batch size = 1: an LSTM carries two internal states (h, c), whereas RNN and GRU carry only a single hidden state h.
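A minimal LSTM sketch illustrating the two states (the hyperparameters simply reuse the constants above; this is not part of the original example):
lstm = nn.LSTM(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=1, batch_first=True)
x = torch.randn(1, SEQ_LENGTH, INPUT_SIZE)   # batch size = 1
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([1, 5, 2])
print(h_n.shape)   # torch.Size([1, 1, 2])  hidden state
print(c_n.shape)   # torch.Size([1, 1, 2])  cell state (RNN and GRU have no c)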
To emphasize:
- out is the output of the last RNN layer at all time steps.
- h_n is the hidden state of all RNN layers at the last time step.
# Initialize the RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
The dimensions of the input, output, and hidden state are:
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 2]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 2 is the hidden state dimension, i.e. the feature dimension of the hidden state at each time step.
- h_n shape = [1, 4, 2]: 1 corresponds to num_layers * num_directions (a single layer, single direction, taken at the last time step), 4 is the batch size, and 2 is the hidden state dimension. (This is verified by the check below.)
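For a single-layer, unidirectional RNN, the last time step of out coincides with h_n. A quick check, assuming the out and h_n computed above:
# out[:, -1, :] is the last time step of the only layer: shape (4, 2)
# h_n[0] is the hidden state of that layer at the last time step: shape (4, 2)
print(torch.allclose(out[:, -1, :], h_n[0]))   # True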
PyTorch Implementation of Bidirectional RNNs
Compared with the vanilla RNN, a bidirectional RNN only requires setting bidirectional=True when instantiating the model.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 4]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 4 = 2 * 2 comes from the 2 directions (forward and backward), each with a hidden state dimension of 2 at every time step.
- h_n shape = [2, 4, 2]: 2 = 2 * 1 corresponds to the 2 directions (forward and backward) of the single layer, each taken at the last time step of its own direction; 4 is the batch size and 2 is the hidden state dimension.
Separating the Two Directions of a Bidirectional RNN (BiRNN)
As the code below shows, the forward and backward parts of out and h_n can be separated. When doing so, be careful about whether batch_first=True was set when instantiating the model, and keep the meaning of each dimension consistent.
# separate the two directions of out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
PyTorch Implementation of Stacked Bidirectional RNNs
Simply set bidirectional=True and num_layers=3 when instantiating the model.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 4]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 4 = 2 * 2 comes from the 2 directions (forward and backward) of the last BiRNN layer, each with a hidden state dimension of 2 at every time step.
- h_n shape = [6, 4, 2]: 6 = 3 * 2 corresponds to the 3 BiRNN layers with 2 directions (forward and backward) each, each taken at the last time step of its own direction; 4 is the batch size and 2 is the hidden state dimension.
Separating the Directions of a Stacked Bidirectional RNN (BiRNN)
As the code below shows, the forward and backward parts of out and h_n can be separated. Again, be careful about whether batch_first=True was set when instantiating the model, and keep the meaning of each dimension consistent.
# separate the two directions of out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
Complete Code
#%%
import torch
import torch.nn as nn
data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
#%% RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
print('input size:',inputs.shape)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('out size:',out.shape)
print('h_n size:',h_n.shape)
#%% BiRNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
#%% BiRNN separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
#%% Stacked Bidirectional RNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
#%% Stacked BiRNN Separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
The end!
Link to the original English article:
Appendix: how to take the last-step result of output and hidden_state for each direction of a bidirectional LSTM.
Source: pytorch 中LSTM模型获取最后一层的输出结果,单向或双向 (Getting the last-layer output of an LSTM model in PyTorch, unidirectional or bidirectional)
import torch.nn as nn
import torch
seq_len = 20
batch_size = 64
embedding_dim = 100
num_embeddings = 300
hidden_size = 128
number_layer = 3
input = torch.randint(low=0,high=256,size=[batch_size,seq_len]) #[64,20]
embedding = nn.Embedding(num_embeddings,embedding_dim)
input_embeded = embedding(input) #[64,20,100]
# transpose to swap batch_size and seq_len (not needed here because batch_first=True is used below)
# input_embeded = input_embeded.transpose(0,1)
# input_embeded = input_embeded.permute(1,0,2)
# instantiate the LSTM
lstm = nn.LSTM(input_size=embedding_dim,hidden_size=hidden_size,batch_first=True,num_layers=number_layer,bidirectional=True)
output,(h_n,c_n) = lstm(input_embeded)
print(output.size()) #[64,20,128*2] [batch_size, seq_len, num_directions*hidden_size]
print(h_n.size()) #[3*2,64,128] [num_layers*num_directions, batch_size, hidden_size]
print(c_n.size()) # same shape as h_n
# last output of the backward direction: time step 0, last hidden_size features
output_last = output[:,0,-128:]
# backward h_n of the last layer (layer ordering: l0_fwd, l0_bwd, l1_fwd, l1_bwd, l2_fwd, l2_bwd)
h_n_last = h_n[-1]
print(output_last.size())
print(h_n_last.size())
# the backward direction's last output equals the backward h_n of the last layer
print(output_last.eq(h_n_last))
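Symmetrically, the forward direction's last output sits at the final time step and in the first hidden_size features, and it matches the forward h_n of the last layer (a sketch assuming the tensors above):
# forward direction: last time step, first hidden_size features of output
output_last_fwd = output[:, -1, :hidden_size]
# forward h_n of the last layer: index -2 in the (num_layers * 2) dimension
h_n_last_fwd = h_n[-2]
print(output_last_fwd.eq(h_n_last_fwd))   # all True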