CUDA 10 on AWS: NVIDIA GPU unclaimed after restart



























I have installed CUDA 10.0 and the matching version of cuDNN on AWS (p2 and p3 instances) running Ubuntu 16. I use the AWS Deep Learning image, which ships with several older versions of CUDA; I'm not uninstalling any of them. Everything works fine, but if I shut down the instance and come back a day later, the drivers are not loaded properly: lshw reports the GPUs as "UNCLAIMED". I can resolve the issue by reinstalling CUDA 10.0 and cuDNN, but doing that every time would be very annoying.
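To make the symptom quick to check after each restart, something like the following sketch could be used. It assumes that `lshw -C display` marks a device with no driver bound by putting the word UNCLAIMED on its header line, as in the output described above. The function name `gpu_claim_status` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of lshw or CUDA): reads `lshw -C display`
# output on stdin and prints one word describing what it saw.
gpu_claim_status() {
    out=$(cat)                      # slurp all of stdin
    case "$out" in
        *UNCLAIMED*) echo "unclaimed" ;;  # at least one device has no driver bound
        "")          echo "no-device" ;;  # lshw printed nothing at all
        *)           echo "claimed" ;;    # devices listed, none marked UNCLAIMED
    esac
}

# Usage on the instance (lshw needs root for full detail):
#   sudo lshw -C display | gpu_claim_status
#   uname -r               # kernel currently running
#   lsmod | grep nvidia    # prints nothing if the nvidia module is not loaded
```

Running this once right after a working install and again after a failed restart, together with the `uname -r` and `lsmod` output, gives a small before/after record to include when asking for help.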



Clearly, I'm doing something wrong during installation, but I'm a bit clueless as to how this should be done properly. The AWS website provides instructions here, but I couldn't really make sense of them, and copy/pasting the commands ran into all sorts of problems, so I ended up following NVIDIA's instructions.



For installation, I'm running the following (I download the cuDNN tarball manually, as it is behind a login wall):



wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run

tar -xzvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*


[EDIT]



Strangely enough, the problem only occurs when I leave the instance down for a long time (several days); if I restart the instance within a few hours, the drivers load fine.










      drivers cuda amazon-ec2














      asked Jan 15 at 13:15 by Scipio; edited Jan 15 at 14:56 by Scipio





















