lspci returns “Cannot open /sys/bus/pci/devices/xxxxx/resource: No such file or directory”
My Ubuntu 16.10 server VM on MS Azure (NV6 series) suddenly had a hiccup for unknown reasons (none of my doing). I had to restart it, and when it came back online I could no longer use the GPU on the machine.
The nvidia-smi application freezes.
The command lspci yields
lspci: Cannot open /sys/bus/pci/devices/7ec1:00:00.0/resource: No such file or directory
And indeed, that path does not exist (any longer?). What does exist is:
$ ls /sys/bus/pci/devices/
0000:00:00.0/ 0000:00:07.0/ 0000:00:07.1/ 0000:00:07.3/ 0000:00:08.0/ b717ec1:00:00.0/
Some googling turned up a few questions similar to mine, many of which have been asked in the last 24 hours.
This might be due to Ubuntu or to Azure; I have no idea which is the source of the problem or how to solve it.
Does anyone have any ideas?
nvidia gpu-drivers lspci
asked Apr 28 '17 at 17:49 by larslovlie
edited May 2 '17 at 7:50 by larslovlie
I'm seeing the same problem right now on a newly deployed NV6 VM on Azure. It also does not detect the Tesla M60 GPU on that machine. It worked fine on a VM I deployed a few days ago.
– RasmusW
Apr 30 '17 at 19:38
In /sys/bus/pci/devices/ I have a device called 2f36c0b8:00:00.0. Except for the first 4 hex digits, that is the device ID that lspci complains about. I've tried deploying a new instance, and have found that lspci stops working after apt-get dist-upgrade and a reboot. Unfortunately, I have no solution; for this test VM I can skip apt-get dist-upgrade.
– RasmusW
Apr 30 '17 at 20:53
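The mismatch described in the comment above can be checked with a small script. This is only a sketch: it assumes, as the comment and the question suggest, that the failing lspci keeps just the last four hex digits of the PCI domain. The helper name truncated_name is made up for illustration and is not part of pciutils.

```shell
#!/bin/sh
# Sketch: flag sysfs PCI entries whose domain is wider than 4 hex digits and
# print the shortened name that would match the lspci error message.
# truncated_name is a hypothetical helper, not a pciutils command.
truncated_name() {
    # $1 is a sysfs entry name such as b717ec1:00:00.0
    domain=${1%%:*}          # text before the first colon (the PCI domain)
    rest=${1#*:}             # bus:device.function
    while [ "${#domain}" -gt 4 ]; do
        domain=${domain#?}   # drop the leading hex digit
    done
    printf '%s:%s\n' "$domain" "$rest"
}

for path in /sys/bus/pci/devices/*; do
    [ -e "$path" ] || continue   # skip if the directory is empty or missing
    name=${path##*/}
    short=$(truncated_name "$name")
    if [ "$short" != "$name" ]; then
        echo "$name -> lspci may look for $short"
    fi
done
```

On the VM from the question, the entry b717ec1:00:00.0 would be reported with the shortened name 7ec1:00:00.0, which is the path in the error message.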
@RasmusW, check out Chris' answer below. This fixed it for me.
– larslovlie
May 2 '17 at 7:51
3 Answers
I was having the same problem (on Azure NC24 instances). After working at it for a few hours I found this post and decided to submit a support request to Microsoft. Here's what they told me:
Canonical appears to have recently released kernel 4.4.0-75 for Ubuntu 16.04 and this is having an adverse effect on Tesla GPUs on NC-series VMs.
Installation of the 4.4.0-75 breaks the 8.0.61-1 version of the NVIDIA CUDA driver that’s currently recommended for use on these systems, resulting in nvidia-smi not showing the adapters and lspci returning an error similar to the following:
root@pd-nvtest2:~# lspci
lspci: Cannot open /sys/bus/pci/devices/2baf:00:00.0/resource: No such file or directory
They suggest backing up the OS drive, running
apt-get remove linux-image-4.4.0-75-generic
and then
update-grub
Reboot and it should work! At the very least, doing that fixed the lspci output for me. I still needed to fix some CUDA stuff, but that's from earlier debugging attempts.
answered May 2 '17 at 6:28 by Chris Gorman
Thank you very much, this worked for me, although I'm using Ubuntu 16.10 on Azure NV6. I ran sudo apt-get remove linux-image-4.8.0-49-generic, which removed 4.8.0-49 and installed 4.8.0-51, and that made nvidia-smi and lspci work again!
– larslovlie
May 2 '17 at 7:47
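The steps from the answer and the comment above can be sketched as a dry run. This is a hedged illustration, not an official procedure: the script only prints the commands rather than running them, BAD_KERNEL and print_fix_commands are hypothetical names, and the use of sudo is an assumption taken from the comment (the support reply shows a root prompt). Back up the OS drive before running anything it prints.

```shell
#!/bin/sh
# Dry-run sketch: prints the suggested commands instead of executing them.
# BAD_KERNEL and print_fix_commands are hypothetical names for illustration.
# Default is the kernel named in the support reply; override for other
# releases, e.g. BAD_KERNEL=4.8.0-49-generic on Ubuntu 16.10.
BAD_KERNEL=${BAD_KERNEL:-4.4.0-75-generic}

print_fix_commands() {
    # $1 is the kernel release to remove, e.g. 4.4.0-75-generic
    printf 'sudo apt-get remove linux-image-%s\n' "$1"
    printf 'sudo update-grub\n'
    printf 'sudo reboot\n'
}

running=$(uname -r)
if [ "$running" = "$BAD_KERNEL" ]; then
    echo "Running the affected kernel $running. After backing up the OS drive, consider:"
else
    echo "Running $running, not $BAD_KERNEL. For reference, the commands would be:"
fi
print_fix_commands "$BAD_KERNEL"
```

Printing instead of executing keeps the sketch safe to try on any machine; the printed lines can be reviewed and then run by hand.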
Maybe this happened because you stopped (deallocated) the Azure VM and then started it again. According to [1], the hardware IP (e.g. of the GPU or CPU) changes when you stop (deallocate) a VM and then start it again, but the Ubuntu system has not been updated for the new hardware IP address. Hence lspci reports that it cannot open a folder tied to a hardware IP address.
[1] https://blogs.technet.microsoft.com/gbanin/2015/04/22/difference-between-the-states-of-azure-virtual-machines-stopped-and-stopped-deallocated/
answered Apr 30 '17 at 0:27 by Evan
This is complete nonsense.
– larslovlie
May 2 '17 at 7:47
On an Azure VM this seems to be an issue with LIS (Linux Integration Services) on RedHat 7.5.
Updating Azure LIS for the VM should fix the issue:
wget https://aka.ms/lis
tar xvzf lis
cd LISISO
sudo ./install.sh
sudo reboot
answered Mar 11 at 18:47 by Sham SV (new contributor)
edited Mar 11 at 19:58 by cmak.fr