Implementing Recurrent Neural Networks (RNNs) in PyTorch
This article walks through PyTorch implementations of vanilla RNNs, stacked RNNs, bidirectional RNNs, and stacked bidirectional RNNs, focusing on the dimensions and meanings of the inputs, outputs, and hidden states.
Types of RNNs
RNNs are mainly used for sequence data such as time series and natural language processing (NLP). Based on how many time steps the input and output span, RNNs can roughly be divided into the following kinds (a small code sketch follows the list):
- For time series data
  - Prediction tasks: many-to-many / many-to-one
  - Classification tasks: many-to-one
- For natural language processing
  - Text classification: many-to-one
  - Text generation: many-to-many
  - Machine translation: many-to-many
  - Named entity recognition: many-to-many
  - Image captioning: one-to-many
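To make the many-to-one / many-to-many distinction concrete, here is a minimal, hypothetical sketch (the sizes and the two linear heads are made up for illustration): the many-to-one case keeps only the last time step of the RNN output, while the many-to-many case keeps every time step.
import torch
import torch.nn as nn
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(32, 10, 8)             # (batch, seq_len, input_size)
out, h_n = rnn(x)                       # out: (32, 10, 16), h_n: (1, 32, 16)
# Many-to-one (e.g. sequence classification): use only the last time step.
cls_head = nn.Linear(16, 3)
logits = cls_head(out[:, -1, :])        # (32, 3)
# Many-to-many (e.g. sequence labelling): use every time step.
tag_head = nn.Linear(16, 5)
tags = tag_head(out)                    # (32, 10, 5)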
Structure of Stacked RNNs
Stacked RNNs are generally used to improve performance.
Structure of Bidirectional RNNs
A bidirectional RNN uses two RNNs: the input sequence is fed into one RNN in its original order and into the other in reverse order.
PyTorch code examples for each kind of RNN:
Import Libraries
import torch
import torch.nn as nn
Task Description
Time series prediction: use data from 5 time steps (Sequence Length = 5) to predict the data at the next two time steps.
Input data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
print("Data: ", data.shape, "\n\n", data)
Output:
Data:
tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])
Data Shape:
torch.Size([20])
Split it into 4 batches:
[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]]
Batch Size = 4
Sequence Length = 5
Input Size = 1 (each value in this example is 1-dimensional)
Hidden Size = 2 (feature dimension of the hidden state)
INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
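As a quick sanity check (a small sketch using the tensor and constants defined above), the reshape used throughout the examples reproduces the 4 x 5 table shown earlier:
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)   # (batch, seq_len, input_size)
print(inputs.shape)        # torch.Size([4, 5, 1])
print(inputs.squeeze(-1))  # the 4 x 5 table shown above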
PyTorch Implementation of Vanilla RNNs
torch.nn.RNN takes two inputs:
- input: the input to the RNN, with shape (seq_len, batch, input_size). If batch_first=True is set, the shape is (batch, seq_len, input_size) instead.
- h_0: the initial hidden state of the RNN, with shape (num_layers * num_directions, batch, hidden_size). num_layers is the number of stacked RNN layers; num_directions is 2 for bidirectional RNNs and 1 for unidirectional RNNs.
torch.nn.RNN returns two outputs:
- out: the output of the last RNN layer at every time step, with shape (seq_len, batch, num_directions * hidden_size). If batch_first=True is set, the shape is (batch, seq_len, num_directions * hidden_size) instead.
- h_n: the hidden state of every RNN layer at the last time step, with shape (num_layers * num_directions, batch, hidden_size). The shape of h_n is not affected by batch_first=True. (A short usage sketch follows below.)
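Note that h_0 is optional and defaults to zeros when omitted. A minimal sketch of passing it explicitly, using the constants defined above (even with batch_first=True, h_0 keeps the layout (num_layers * num_directions, batch, hidden_size)):
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=1, batch_first=True)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# h_0 shape: (num_layers * num_directions, batch, hidden_size) = (1, 4, 2)
h_0 = torch.zeros(1, BATCH_SIZE, HIDDEN_SIZE)
out, h_n = rnn(inputs, h_0)   # equivalent to rnn(inputs) with the default zero initial state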
As an illustration, consider a forward pass through an LSTM network with batch size = 1: an LSTM carries two internal states (h, c), whereas RNN and GRU carry only a single hidden state h.
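A minimal LSTM sketch illustrating the two states (the hyperparameters simply reuse the constants above; this is not part of the original example):
lstm = nn.LSTM(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=1, batch_first=True)
x = torch.randn(1, SEQ_LENGTH, INPUT_SIZE)   # batch size = 1
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([1, 5, 2])
print(h_n.shape)   # torch.Size([1, 1, 2])  hidden state
print(c_n.shape)   # torch.Size([1, 1, 2])  cell state (RNN and GRU have no c)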
To emphasize:
- out is the output of the last RNN layer at all time steps.
- h_n is the hidden state of all RNN layers at the last time step.
# Initialize the RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
The dimensions of the input, output, and hidden state are:
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 2]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 2 is the hidden state dimension, i.e. the feature dimension of the hidden state at each time step.
- h_n shape = [1, 4, 2]: 1 corresponds to num_layers * num_directions (a single layer, single direction, taken at the last time step), 4 is the batch size, and 2 is the hidden state dimension. (This is verified by the check below.)
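For a single-layer, unidirectional RNN, the last time step of out coincides with h_n. A quick check, assuming the out and h_n computed above:
# out[:, -1, :] is the last time step of the only layer: shape (4, 2)
# h_n[0] is the hidden state of that layer at the last time step: shape (4, 2)
print(torch.allclose(out[:, -1, :], h_n[0]))   # True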
PyTorch Implementation of Bidirectional RNNs
Compared with the vanilla RNN, a bidirectional RNN only requires setting bidirectional=True when instantiating the model.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 4]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 4 = 2 * 2 comes from the 2 directions (forward and backward), each with a hidden state dimension of 2 at every time step.
- h_n shape = [2, 4, 2]: 2 = 2 * 1 corresponds to the 2 directions (forward and backward) of the single layer, each taken at the last time step of its own direction; 4 is the batch size and 2 is the hidden state dimension.
Separating the Two Directions of a Bidirectional RNN (BiRNN)
As the code below shows, the forward and backward parts of out and h_n can be separated. When doing so, be careful about whether batch_first=True was set when instantiating the model, and keep the meaning of each dimension consistent.
# separate the two directions of out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
PyTorch Implementation of Stacked Bidirectional RNNs
Simply set bidirectional=True and num_layers=3 when instantiating the model.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
- input shape = [4, 5, 1]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 1 is the feature dimension of the data at each time step.
- out shape = [4, 5, 4]: 4 is the batch size, 5 is the sequence length (SEQ_LENGTH), and 4 = 2 * 2 comes from the 2 directions (forward and backward) of the last BiRNN layer, each with a hidden state dimension of 2 at every time step.
- h_n shape = [6, 4, 2]: 6 = 3 * 2 corresponds to the 3 BiRNN layers with 2 directions (forward and backward) each, each taken at the last time step of its own direction; 4 is the batch size and 2 is the hidden state dimension.
Separating the Directions of a Stacked Bidirectional RNN (BiRNN)
As the code below shows, the forward and backward parts of out and h_n can be separated. Again, be careful about whether batch_first=True was set when instantiating the model, and keep the meaning of each dimension consistent.
# separate the two directions of out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
Complete Code
#%%
import torch
import torch.nn as nn
data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
#%% RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
print('input size:',inputs.shape)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('out size:',out.shape)
print('h_n size:',h_n.shape)
#%% BiRNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
#%% BiRNN separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
#%% Stacked Bidirectional RNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
#%% Stacked BiRNN Separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
The end!
Link to the original English article:
Appendix: how to take the last-step result of output and hidden_state for each direction of a bidirectional LSTM.
Source: pytorch 中LSTM模型获取最后一层的输出结果,单向或双向 (Getting the last-layer output of an LSTM model in PyTorch, unidirectional or bidirectional)
import torch.nn as nn
import torch
seq_len = 20
batch_size = 64
embedding_dim = 100
num_embeddings = 300
hidden_size = 128
number_layer = 3
input = torch.randint(low=0,high=256,size=[batch_size,seq_len]) #[64,20]
embedding = nn.Embedding(num_embeddings,embedding_dim)
input_embeded = embedding(input) #[64,20,100]
# transpose to swap batch_size and seq_len (not needed here because batch_first=True is used below)
# input_embeded = input_embeded.transpose(0,1)
# input_embeded = input_embeded.permute(1,0,2)
# instantiate the LSTM
lstm = nn.LSTM(input_size=embedding_dim,hidden_size=hidden_size,batch_first=True,num_layers=number_layer,bidirectional=True)
output,(h_n,c_n) = lstm(input_embeded)
print(output.size()) #[64,20,128*2] [batch_size, seq_len, num_directions*hidden_size]
print(h_n.size()) #[3*2,64,128] [num_layers*num_directions, batch_size, hidden_size]
print(c_n.size()) # same shape as h_n
# last output of the backward direction: time step 0, last hidden_size features
output_last = output[:,0,-128:]
# backward h_n of the last layer (layer ordering: l0_fwd, l0_bwd, l1_fwd, l1_bwd, l2_fwd, l2_bwd)
h_n_last = h_n[-1]
print(output_last.size())
print(h_n_last.size())
# the backward direction's last output equals the backward h_n of the last layer
print(output_last.eq(h_n_last))
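Symmetrically, the forward direction's last output sits at the final time step and in the first hidden_size features, and it matches the forward h_n of the last layer (a sketch assuming the tensors above):
# forward direction: last time step, first hidden_size features of output
output_last_fwd = output[:, -1, :hidden_size]
# forward h_n of the last layer: index -2 in the (num_layers * 2) dimension
h_n_last_fwd = h_n[-2]
print(output_last_fwd.eq(h_n_last_fwd))   # all True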