lspci returns “Cannot open /sys/bus/pci/devices/xxxxx/resource: No such file or directory”












My Ubuntu 16.10 server VM in MS Azure (NV6 series) suddenly had a hiccup for unknown reasons (none of my doing). I had to restart it, and when it came back online I was no longer able to use the GPU on the machine.

The nvidia-smi application freezes.

The command lspci yields

    lspci: Cannot open /sys/bus/pci/devices/7ec1:00:00.0/resource: No such file or directory

And indeed, that path does not exist (or no longer exists). What does exist is

    $ ls /sys/bus/pci/devices/
    0000:00:00.0/ 0000:00:07.0/ 0000:00:07.1/ 0000:00:07.3/ 0000:00:08.0/ b717ec1:00:00.0/

Some googling turned up a few questions similar to mine, many of which have been asked in the last 24 hours.

This might be due to Ubuntu or Azure; I have no idea which is the source of the problem or how to solve it.

Does anyone have any ideas?










Tags: nvidia gpu-drivers lspci














asked Apr 28 '17 at 17:49 by larslovlie (edited May 2 '17 at 7:50)













  • I'm seeing the same problem right now on a newly deployed NV6 VM on Azure. It also does not detect the Tesla M60 GPU on that machine. It worked fine on a VM I deployed a few days ago.

    – RasmusW
    Apr 30 '17 at 19:38

  • In /sys/bus/pci/devices/ I have a device called 2f36c0b8:00:00.0. Except for the first 4 hex digits, that is the device ID that lspci complains about. I've tried deploying a new instance, and have found that lspci stops working after apt-get dist-upgrade and a reboot. Unfortunately, I have no solution; for this test VM I can skip apt-get dist-upgrade.

    – RasmusW
    Apr 30 '17 at 20:53

  • @RasmusW, check out Chris' answer below. This fixed it for me.

    – larslovlie
    May 2 '17 at 7:51
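
RasmusW's observation suggests that the name in the lspci error is the tail end of a longer name that does exist under /sys/bus/pci/devices/. A minimal sketch for checking this on an affected machine follows. The function name find_pci_candidates is hypothetical, and the suffix match is only a heuristic based on the device names quoted in this thread, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: list sysfs PCI entries whose name ends with the
# device name printed in an lspci "Cannot open" error.
#   $1 = device name from the error, e.g. 7ec1:00:00.0
#   $2 = directory to search (assumed default: /sys/bus/pci/devices)
find_pci_candidates() {
    reported=$1
    dir=${2:-/sys/bus/pci/devices}
    for entry in "$dir"/*; do
        name=${entry##*/}          # strip the directory part
        case $name in
            *"$reported") printf '%s\n' "$name" ;;   # name ends with the reported string
        esac
    done
}
```

Given the directory listing in the question, `find_pci_candidates 7ec1:00:00.0` would print b717ec1:00:00.0, the one entry whose name ends with the string lspci complained about.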



















3 Answers
I was having the same problem (using Azure NC24 instances). After working at it for a few hours I found this post and decided to submit a support request to Microsoft. Here's what they told me:

Canonical appears to have recently released kernel 4.4.0-75 for Ubuntu 16.04 and this is having an adverse effect on Tesla GPUs on NC-series VMs.
Installation of the 4.4.0-75 breaks the 8.0.61-1 version of the NVIDIA CUDA driver that’s currently recommended for use on these systems, resulting in nvidia-smi not showing the adapters and lspci returning an error similar to the following:

    root@pd-nvtest2:~# lspci
    lspci: Cannot open /sys/bus/pci/devices/2baf:00:00.0/resource: No such file or directory

They suggest backing up the OS drive, then running

    apt-get remove linux-image-4.4.0-75-generic

and then

    update-grub

Reboot and it should work! At the very least, doing that fixed the lspci output for me. I still needed to fix some CUDA stuff, but that was left over from earlier debugging attempts.

















answered May 2 '17 at 6:28 by Chris Gorman








  • Thank you very much, this worked for me, although I'm using Ubuntu 16.10 on Azure NV6. I ran sudo apt-get remove linux-image-4.8.0-49-generic, which removed 4.8.0-49 and installed 4.8.0-51; after that, nvidia-smi and lspci worked again!

    – larslovlie
    May 2 '17 at 7:47
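
Before removing a kernel as described in this answer, it may help to check whether the running kernel is one of the releases named in this thread. The sketch below assumes only the two releases mentioned here (4.4.0-75-generic in the answer, 4.8.0-49-generic in the comment). The list and the function names are illustrative, and the script only prints the suggested commands rather than running them.

```shell
#!/bin/sh
# Kernel releases reported as problematic in this thread (not an exhaustive list).
BAD_KERNELS="4.4.0-75-generic 4.8.0-49-generic"

# Returns 0 if the given kernel release appears in BAD_KERNELS, 1 otherwise.
is_reported_bad() {
    for k in $BAD_KERNELS; do
        [ "$k" = "$1" ] && return 0
    done
    return 1
}

# Prints the removal commands from the answer for a given release
# instead of executing them, so they can be reviewed first.
suggest_removal() {
    printf 'sudo apt-get remove linux-image-%s\n' "$1"
    printf 'sudo update-grub\n'
}

running=$(uname -r)
if is_reported_bad "$running"; then
    echo "Running kernel $running was reported as affected; suggested commands:"
    suggest_removal "$running"
else
    echo "Running kernel $running is not one of the releases named in this thread."
fi
```

Printing instead of executing is a deliberate choice here: as the answer notes, the OS drive should be backed up before a kernel package is removed.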
















This may be because you stopped (deallocated) the Azure VM and then started it again. According to [1], the hardware IP (such as GPU, CPU) changes when you stop (deallocate) a VM and then start it again, but the Ubuntu system has not been updated for the new hardware (GPU, CPU) IP address. Hence lspci reports that it cannot open a folder related to some hardware IP address.

[1] https://blogs.technet.microsoft.com/gbanin/2015/04/22/difference-between-the-states-of-azure-virtual-machines-stopped-and-stopped-deallocated/

















answered Apr 30 '17 at 0:27 by Evan













  • This is complete nonsense.

    – larslovlie
    May 2 '17 at 7:47



















On an Azure VM this seems to be an issue with LIS (Linux Integration Services) on RedHat 7.5. Updating Azure LIS on the VM should fix the issue.

    wget https://aka.ms/lis
    tar xvzf lis
    cd LISISO
    sudo ./install.sh
    sudo reboot














answered Mar 11 at 18:47 by Sham SV (edited Mar 11 at 19:58 by cmak.fr)
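
Since this answer mentions RedHat 7.5 while the question concerns Ubuntu, it may be worth confirming which distribution a VM runs before trying the LIS steps. The sketch below reads the ID field from /etc/os-release. The function names are hypothetical, and treating the IDs rhel and centos as the relevant family is an assumption, not something stated in the answer.

```shell
#!/bin/sh
# Prints the ID field of an os-release file (default: /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
os_id() {
    file=${1:-/etc/os-release}
    [ -r "$file" ] || return 1
    ( . "$file" && printf '%s\n' "$ID" )
}

# Returns 0 if the ID looks like a Red Hat family system.
# Assumption: "rhel" and "centos" are the IDs of interest here.
is_redhat_family() {
    case $(os_id "$1") in
        rhel|centos) return 0 ;;
        *) return 1 ;;
    esac
}

if is_redhat_family; then
    echo "Red Hat family system detected; the LIS steps may apply."
else
    echo "Not a Red Hat family system; the LIS steps were written for RedHat 7.5."
fi
```

On the Ubuntu VMs discussed in the question, this check would take the second branch, which is a hint to try the kernel-related answer first.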
