CUDA 10 on AWS: NVIDIA GPU unclaimed after restart
I have installed CUDA 10.0 and the matching version of cuDNN on AWS (p2 and p3 instances) running Ubuntu 16. I use the AWS Deep Learning AMI, which ships with several older CUDA versions; I'm not uninstalling any of them. Everything works fine, but if I shut down the instance and come back a day later, the drivers are not loaded properly: lshw reports the GPUs as "UNCLAIMED". I can fix this by reinstalling CUDA 10.0 and cuDNN, but doing that every time would be very annoying.
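A small sketch for capturing the driver state right after a failing boot, so it can be compared with a working one. It assumes a Linux host and only uses tools that are skipped when missing; run it as root (e.g. with sudo) so lshw reports full details. The function name `check_nvidia_driver` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the NVIDIA kernel driver is active for the running kernel.
# Each external tool is optional and skipped if it is not installed.

check_nvidia_driver() {
  echo "Running kernel: $(uname -r)"

  # /proc/modules lists the loaded kernel modules (the same data lsmod prints).
  if grep -q '^nvidia ' /proc/modules 2>/dev/null; then
    echo "nvidia module: loaded"
  else
    echo "nvidia module: NOT loaded"
  fi

  # lshw marks a device UNCLAIMED when no driver is bound to it.
  if command -v lshw >/dev/null 2>&1; then
    if lshw -class display 2>/dev/null | grep -q UNCLAIMED; then
      echo "lshw: display device(s) UNCLAIMED"
    else
      echo "lshw: no UNCLAIMED display devices"
    fi
  fi

  # nvidia-smi only succeeds when the driver is loaded and can talk to the GPU.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: OK"
  else
    echo "nvidia-smi: unavailable or failing"
  fi
}

check_nvidia_driver
```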
Clearly, I'm doing something wrong during installation, but I'm a bit clueless about how it should be done properly. The AWS website provides instructions here, but I couldn't really make sense of them, and copy-pasting ran into all sorts of problems, so I ended up following NVIDIA's instructions.
For the installation, I run the following (I download the cuDNN tarball manually because it is behind a login wall):
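# Download the CUDA 10.0 runfile (the URL has no .run suffix, so rename it)
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run
# Run the installer (bundles the 410.48 driver and the CUDA 10.0 toolkit)
sudo sh cuda_10.0.130_410.48_linux.run
# Unpack the manually downloaded cuDNN archive and copy its header and libraries into the CUDA tree
tar -xzvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
# Make the copied files readable by all users
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*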
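To confirm what the commands above actually put on disk, a quick check could look like the sketch below. It assumes the default /usr/local/cuda prefix (overridable through a CUDA_HOME variable in this script) and cuDNN 7.x, whose version macros live in cudnn.h. It only inspects files; it does not tell you whether the kernel driver is loaded. The function name `verify_cuda_install` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the CUDA toolkit and cuDNN files installed above.
# Assumes the /usr/local/cuda prefix unless CUDA_HOME is set.

CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

verify_cuda_install() {
  # The toolkit's compiler reports the CUDA release on its last output line.
  local nvcc="$CUDA_HOME/bin/nvcc"
  if [ -x "$nvcc" ]; then
    "$nvcc" --version | tail -n 1
  else
    echo "nvcc not found under $CUDA_HOME/bin"
  fi

  # cuDNN 7.x defines CUDNN_MAJOR/MINOR/PATCHLEVEL in cudnn.h.
  local header="$CUDA_HOME/include/cudnn.h"
  if [ -r "$header" ]; then
    grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) ' "$header"
  else
    echo "cudnn.h not found under $CUDA_HOME/include"
  fi

  # List the copied libraries with their permissions.
  ls -l "$CUDA_HOME"/lib64/libcudnn* 2>/dev/null || echo "no libcudnn* in $CUDA_HOME/lib64"
}

verify_cuda_install
```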
[EDIT]
Strangely enough, the problem only occurs when I leave the instance shut down for a long time (several days); if I restart it within a few hours, the drivers load fine.
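One hypothesis worth ruling out (an assumption, not something the post confirms) is that a newer kernel was installed between boots, for example through automatic updates, while the NVIDIA module built by the runfile installer only exists for the older kernel. The sketch below lists which kernels under /lib/modules have an nvidia*.ko file and compares them with the running kernel; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for one hypothesis (an assumption): the running kernel has no
# NVIDIA module because the module was only built for a previous kernel.

# Print each kernel version under the modules root that contains an nvidia*.ko file.
nvidia_module_kernels() {
  local root="${1:-/lib/modules}" dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    if find "$dir" -name 'nvidia*.ko*' -print -quit 2>/dev/null | grep -q .; then
      basename "$dir"
    fi
  done
}

# Compare the running kernel with the kernels that have an NVIDIA module.
check_kernel_mismatch() {
  local running
  running="$(uname -r)"
  echo "Running kernel: $running"
  echo "Kernels with an NVIDIA module:"
  nvidia_module_kernels | sed 's/^/  /'
  if nvidia_module_kernels | grep -qxF "$running"; then
    echo "OK: an NVIDIA module exists for the running kernel"
  else
    echo "MISMATCH: no NVIDIA module for the running kernel"
  fi
}

check_kernel_mismatch
```

Running it once after a good boot and once after a bad one would show whether the running kernel changed between the two.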
drivers cuda amazon-ec2
asked Jan 15 at 13:15 by Scipio, edited Jan 15 at 14:56