
I am trying to initialize a tensor on Google Colab with GPU enabled.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = torch.tensor([1,2], device=device)

But I am getting this strange error.

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Even setting that environment variable to 1 does not seem to show any further details.
Has anyone ever had this issue?

I tried your code and it did not give me an error, but I can say that the best practice for debugging a CUDA runtime error like yours (device-side assert triggered) is usually to switch Colab to CPU and recreate the error. That will give you a much more useful traceback.

Most of the time, CUDA runtime errors are caused by some kind of index mismatch, for example trying to train a network with 10 output nodes on a dataset with 15 labels. And the thing with this CUDA error is that once you get it, you will receive it for every subsequent operation on torch tensors, which forces you to restart your notebook.
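
To make that concrete, here is a minimal sketch of the index-mismatch case, with made-up sizes (a 10-class classifier fed a label that only exists in a 15-label dataset):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # batch of 4, only 10 output nodes
targets = torch.tensor([1, 3, 14, 2])  # label 14 is out of range for 10 classes

# On CPU this typically raises a readable "Target 14 is out of bounds" error;
# the same mistake on CUDA surfaces as the opaque device-side assert above.
loss = nn.CrossEntropyLoss()(logits, targets)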

I suggest you restart your notebook, get a more accurate traceback by moving to CPU, and check the rest of your code, especially anywhere you train a model on a set of targets.
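
One hedged note on the environment variable mentioned in the question: in my experience CUDA_LAUNCH_BLOCKING is only picked up if it is set before anything initializes CUDA, so set it in the very first cell after restarting the runtime, for example:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must run before the first CUDA call

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = torch.tensor([1, 2], device=device)  # kernel launches now run synchronously, so tracebacks point at the failing call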

I receive this error, what is the problem? File "/home/tf/.virtualenvs/torch/lib/python3.6/site-packages/torch/nn/functional.py", line 2824, in cross_entropy return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index) RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. – fisakhan Oct 4, 2021 at 14:18

I am sorry to resurrect this, but I am facing the same issue, except that when I run on CPU the error does not happen. Is there any other procedure I can try to find out what is happening? – OldMan Nov 30, 2022 at 14:09

Maybe, I mean in some cases:

It can be due to forgetting to add a sigmoid activation before you send the logits to BCELoss.
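
A minimal sketch of what I mean, with made-up shapes: BCELoss expects probabilities in [0, 1], so either apply a sigmoid to the raw logits first or use BCEWithLogitsLoss, which applies it internally.

import torch
import torch.nn as nn

logits = torch.randn(4, 1)                     # raw model outputs, not restricted to [0, 1]
targets = torch.randint(0, 2, (4, 1)).float()  # binary targets

loss = nn.BCELoss()(torch.sigmoid(logits), targets)  # sigmoid first, then BCE
loss = nn.BCEWithLogitsLoss()(logits, targets)       # or let the loss apply the sigmoid itself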

Hope it can help :P

1st time:

Got the same error while using the simpletransformers library to fine-tune a transformer-based model for a multi-class classification problem. simpletransformers is a library written on top of the transformers library.

I changed my labels from string representations to numbers and it worked.

2nd time:

Faced the same error again while training another transformer-based model with the transformers library, for text classification. I had 4 labels in the dataset, named 0, 1, 2, and 3. But in the last layer (a Linear layer) of my model class I had two neurons, nn.Linear(*, 2), which I had to replace with nn.Linear(*, 4) because I had four labels in total.
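
As a sketch of that fix (the input size 768 below is just a placeholder I picked, not the model's real hidden size):

import torch.nn as nn

num_labels = 4                           # labels 0, 1, 2, 3 in the dataset
classifier = nn.Linear(768, num_labels)  # was nn.Linear(768, 2), which triggered the device-side assert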

For example, say I have a sentiment analysis problem with two labels, "Positive" and "Negative". I changed my labels from "Positive" to 1 and from "Negative" to 0 in my data. This is what I mean by "changing labels from string representation to numbers". – Shaida Muhammad Dec 12, 2021 at 4:30
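
A tiny sketch of that conversion, with hypothetical example data:

label_map = {"Negative": 0, "Positive": 1}                   # string labels -> integer class ids
raw_labels = ["Positive", "Negative", "Positive"]            # made-up examples
numeric_labels = [label_map[label] for label in raw_labels]  # -> [1, 0, 1]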

I am a filthy casual coming from the VQGAN+CLIP "ai-art" community. I get this error when I already have a session running in another tab. Killing all sessions from the session manager clears it up and lets you connect with the new tab, which is nice if you have fiddled with a lot of settings you don't want to lose.