Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
Ask Question
This is my implementation:
class Net(BaseFeaturesExtractor):
def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 256):
super(Net, self).__init__(observation_space, features_dim)
n_input_channels = observation_space.shape[0]
print("Observation space shape:"+str(observation_space.shape))
print("Number of channels:" + str(n_input_channels))
self.cnn = nn.Sequential(
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Flatten(),
nn.Linear(in_features=128,out_features=64),
nn.ReLU(),
nn.Linear(in_features=64,out_features=7),
nn.Sigmoid()
def forward(self, observations: th.Tensor) -> th.Tensor:
print("Observation shape:"+str(observations[0].shape))
return self.cnn(observations)
When I tried to run the code which uses this CNN, I am getting following log:
Observation space shape:(3, 6, 7)
Number of channels:3
Observation shape:torch.Size([3, 6, 7])
Traceback (most recent call last): File "/Users/joe/Documents/JUPYTER/ConnectX/training3.py", line 250, in <module>
learner.learn(total_timesteps=iterations, callback=eval_callback)
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[4, 32, 6, 7] to have 3 channels, but got 32 channels instead
What is the problem here? How can I solve it?
in_channels
of a conv
layer should be equal to out_channels
of the previous layer. In your case, in_channels
of the 2nd and 3rd conv
layers don't have the correct values. They should be like below,
self.cnn = nn.Sequential(
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
Also, you should check in_features
of the 1st Linear
layer. It depends on the input shape and should be equal to last_conv_out_channels * last_conv_output_height * last_conv_output_width
.
For example, for an input=torch.randn(1, 3, 256, 256)
last conv
layer's output shape would be ([1, 32, 64, 64])
, in that case the 1st Linear
layer should be,
nn.Linear(in_features=32*64*64,out_features=64)
---- Update after the comment:
Output shape of a conv
layer is calculated through the formula here (see under "Shape:" section). Using input = torch.randn(1, 3, 256, 256)
as input to the network, here are outputs of each conv
layer (I skipped the ReLU
s since they don't change the shape),
conv1: (1, 3, 256, 256) -> (1, 32, 256, 256)
conv2: (1, 32, 256, 256) -> (1, 32, 128, 128)
conv3: (1, 32, 128, 128) -> (1, 32, 64, 64)
So how did last_conv_output_height
and last_conv_output_width
became 64 ? The last conv
layer is defined as follows,
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
Data is processed as (num_samples, num_channels, height, width)
in PyTorch and the default value for dilation
is stated as 1 in the conv2d doc. So, for the last conv
layer, H_in
is 128, padding[0]
is 1, dilation[0]
is 1, kernel_size[0]
is 3 and stride[0]
is 2. Therefore, height of its output becomes,
H_out = ⌊(128 + 2 * 1 - 1 * (3 - 1) - 1) / 2⌋ + 1
H_out = 64
Since square-size kernels and equal-size stride, padding and dilation are used, W_out
also becomes 64
for the last conv
layer.
I think the easiest way to compute in_features
for the 1st Linear
layer would be run the model for the desired size input until that layer. An example for your architecture,
inp = torch.randn(1, 3, 256, 256)
arch = nn.Sequential(
nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
outp = arch(inp)
print('outp.shape:', outp.shape)
This prints,
outp.shape: torch.Size([1, 32, 64, 64])
Finally, last_conv_out_channels
is out_channels
of the last conv
layer. The last conv
layer in your architecture is nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
. Here out_channels
is the 2nd parameter, so last_conv_out_channels
is 32.
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.