tonglin0325's personal homepage

Installing CUDA and PyTorch on Ubuntu 16.04

1. Install CUDA

Reference: Installing CUDA on Ubuntu

PyTorch can run without a GPU, but to use an NVIDIA GPU you need to install CUDA.

Check whether CUDA is installed:

lintong@master:~$ nvcc -V
The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt install nvidia-cuda-toolkit
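
The same check can be scripted. Below is a minimal sketch, assuming Python 3 is available; the helper names `find_tool` and `nvcc_version_text` are made up for this example and are not part of any CUDA tooling.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the absolute path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def nvcc_version_text():
    """Return the raw output of `nvcc -V`, or None when nvcc is not on PATH."""
    nvcc = find_tool("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    text = nvcc_version_text()
    if text is None:
        print("nvcc not found on PATH; the CUDA toolkit is probably not installed")
    else:
        print(text)
```

Note that a missing `nvcc` only means the toolkit is not on PATH; it may still be installed under /usr/local.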

Check the GPU model; here it is a GeForce GTX 1050 Ti:

lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
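
If you want to pull the model name out of that output in a script, something like the following should work. It is only a sketch: the function name `nvidia_gpu_models` is hypothetical, and the pattern assumes the GPU is reported as a "VGA compatible controller" or "3D controller" with the model name in square brackets, as in the output above.

```python
import re

# Matches the bracketed model name on an NVIDIA VGA/3D controller line.
_GPU_LINE = re.compile(
    r"(?:VGA compatible controller|3D controller): NVIDIA Corporation .*?\[(.+?)\]"
)


def nvidia_gpu_models(lspci_text):
    """Return the bracketed GPU names found in `lspci` output."""
    matches = (_GPU_LINE.search(line) for line in lspci_text.splitlines())
    return [m.group(1) for m in matches if m]


sample = (
    "01:00.0 VGA compatible controller: NVIDIA Corporation GP107 "
    "[GeForce GTX 1050 Ti] (rev a1)\n"
    "01:00.1 Audio device: NVIDIA Corporation GP107GL "
    "High Definition Audio Controller (rev a1)\n"
)

print(nvidia_gpu_models(sample))  # ['GeForce GTX 1050 Ti']
```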

Check whether the NVIDIA GPU driver is installed. Here the driver version is 430.64, which supports CUDA versions up to 10.1:

nvidia-smi
Sun Oct 23 20:27:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   29C    P8    N/A / 100W |    370MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1106      G   /usr/lib/xorg/Xorg                           259MiB |
|    0     28186      G   compiz                                       106MiB |
|    0     28455      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files     1MiB |
+-----------------------------------------------------------------------------+
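
The two numbers that matter in this output are the driver version and the highest CUDA version it supports. A small sketch for extracting them from the header line (the function name `parse_smi_header` is made up here, and the pattern assumes the header layout shown above):

```python
import re

_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(smi_text):
    """Return (driver_version, max_cuda_version) as strings, or None if absent."""
    match = _HEADER.search(smi_text)
    return match.groups() if match else None


header = "| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |"
print(parse_smi_header(header))  # ('430.64', '10.1')
```

The "CUDA Version" reported by nvidia-smi is the maximum the driver can serve, not the version of any installed toolkit.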

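Download the runfile from the official archive to install CUDA:

https://developer.nvidia.com/cuda-toolkit-archive

Run the installer. At the prompts, choose continue, type accept, deselect the Driver option (a driver, 430.64, is already installed), and then choose Install. When the installation completes, the installer prints a summary:

sudo sh cuda_10.1.243_418.87.00_linux.run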
===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.1/
Samples: Installed in /home/lintong/, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-10.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
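
The warning says CUDA 10.1 needs a driver of at least version 418.00, so the installed 430.64 qualifies. A minimal sketch of that comparison, assuming plain dotted numeric version strings (the helper names and the 390.48 value are only illustrative):

```python
def version_tuple(version):
    """Convert a dotted version string such as '430.64' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(installed, required="418.00"):
    """True when the installed driver version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


print(driver_is_new_enough("430.64"))  # True
print(driver_is_new_enough("390.48"))  # False (hypothetical older driver)
```

Comparing tuples of integers avoids the trap of comparing version strings lexicographically.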

Add the following to ~/.bashrc or /etc/profile, then source the file:

# cuda
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
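
The `${PATH:+:${PATH}}` part appends a colon and the old value only when the variable is already set and non-empty, which avoids leaving a stray trailing colon (an empty entry is normally treated as the current directory). Here is the same logic as a small Python sketch; `prepend_path` is a hypothetical helper written just to illustrate the expansion:

```python
def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + existing only if existing is non-empty."""
    if existing:  # covers both unset (None) and empty ('')
        return new_dir + ":" + existing
    return new_dir


print(prepend_path("/usr/local/cuda-10.1/bin", "/usr/bin:/bin"))
# /usr/local/cuda-10.1/bin:/usr/bin:/bin
print(prepend_path("/usr/local/cuda-10.1/lib64", None))
# /usr/local/cuda-10.1/lib64
```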

Verify that the installation succeeded:

lintong@master:~/下载$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
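
To check the toolkit version from a script, the release line can be parsed. A sketch, assuming the "release X.Y, VX.Y.Z" wording shown above (the function name `parse_nvcc_release` is made up for this example):

```python
import re

_RELEASE = re.compile(r"release (\d+)\.(\d+), V([\d.]+)")


def parse_nvcc_release(nvcc_text):
    """Return ((major, minor), full_version) from `nvcc -V` output, or None."""
    match = _RELEASE.search(nvcc_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2))), match.group(3)


sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(parse_nvcc_release(sample))  # ((10, 1), '10.1.243')
```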

Disable Nouveau: edit /etc/modprobe.d/blacklist.conf and add:

blacklist nouveau
options nouveau modeset=0

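If the following command prints nothing, Nouveau is disabled (the blacklist takes effect only after the initramfs update and reboot that follow):

lsmod | grep nouveau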
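
`lsmod` formats the contents of /proc/modules, so the same check can be done by reading that file. A minimal sketch for Linux; the function names are hypothetical, and unlike `grep` it matches the exact module name rather than a substring:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_is_loaded(path="/proc/modules"):
    """True if the nouveau kernel module is currently loaded (Linux only)."""
    with open(path) as handle:
        return "nouveau" in loaded_modules(handle.read())


# Hypothetical /proc/modules excerpt, used only to illustrate the parsing.
sample = (
    "nvidia_drm 57344 4 - Live 0x0000000000000000\n"
    "nvidia 39051264 213 nvidia_modeset, Live 0x0000000000000000\n"
)
print("nouveau" in loaded_modules(sample))  # False
```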

Update the initramfs and reboot:

sudo update-initramfs -u
sudo reboot

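2. Install the NVIDIA driver

After the reboot, the NVIDIA driver was gone: nvidia-smi no longer worked and the Ubuntu graphical login failed, so the driver had to be reinstalled from a text terminal.

Stop the graphical interface:

sudo service lightdm stop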

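Remove the existing driver (the pattern is quoted so that the shell does not expand it against files in the current directory):

sudo apt-get remove 'nvidia-*'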

Download the latest NVIDIA driver; at the time of writing this was 515.76:

https://www.nvidia.com/Download/index.aspx

Download and install the NVIDIA driver:

wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/515.76/NVIDIA-Linux-x86_64-515.76.run
sudo chmod +x ./NVIDIA-Linux-x86_64-515.76.run
sudo ./NVIDIA-Linux-x86_64-515.76.run -no-x-check -no-nouveau-check -no-opengl-files
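
The download URL above embeds the version number twice. If you need a different version, a sketch like the following builds the URL, assuming other versions follow the same path layout on that mirror (this is an assumption, not something the download site documents; `driver_runfile_url` is a made-up helper):

```python
def driver_runfile_url(version, mirror="https://us.download.nvidia.cn"):
    """Build a runfile URL using the path layout seen in the wget command above."""
    name = "NVIDIA-Linux-x86_64-{}.run".format(version)
    return "{}/XFree86/Linux-x86_64/{}/{}".format(mirror, version, name)


print(driver_runfile_url("515.76"))
# https://us.download.nvidia.cn/XFree86/Linux-x86_64/515.76/NVIDIA-Linux-x86_64-515.76.run
```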

Choices to make during installation:

1. There appears to already be a driver installed on your system (version:
515.76). As part of installing this driver (version: 515.76), the existing
driver will be uninstalled. Are you sure you want to continue?
Continue installation Abort installation
(Choose Continue if this is a reinstall)
2. The distribution-provided pre-install script failed! Are you sure you want
to continue?
Continue installation Abort installation
(Choose Continue)
3. Would you like to register the kernel module sources with DKMS? This will
allow DKMS to automatically build a new module, if you install a different
kernel later.
Yes No
(Choose No here)
4. Install NVIDIA's 32-bit compatibility libraries?
Yes No
(Choose No here)
5. Installation of the kernel module for the NVIDIA Accelerated Graphics Driver
for Linux-x86_64 (version 515.76) is now complete.
OK
6. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
Yes No
(Choose Yes here)

Reboot, or start the graphical interface:

sudo service lightdm start

The installation succeeded and the Ubuntu graphical interface works again:

nvidia-smi
Tue Oct 25 23:51:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   36C    P0    N/A / 100W |    371MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1106      G   /usr/lib/xorg/Xorg                369MiB |
+-----------------------------------------------------------------------------+

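3. Install cuDNN

Reference: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux

cuDNN is a GPU-accelerated library of primitives for deep neural networks. Download it from the official archive; an NVIDIA account is required:

https://developer.nvidia.com/rdp/cudnn-archive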

Downloaded file: cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz

Install:

sudo tar -xvf cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
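
One way to confirm which cuDNN version was copied is to read the version macros from the header. The sketch below assumes the cuDNN 8 layout, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined in cudnn_version.h; the function name and the sample text are illustrative only.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from CUDNN_* macros, or None if any is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + macro + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Illustrative excerpt; on a real system, read
# /usr/local/cuda/include/cudnn_version.h instead.
sample = "#define CUDNN_MAJOR 8\n#define CUDNN_MINOR 5\n#define CUDNN_PATCHLEVEL 0\n"
print(parse_cudnn_version(sample))  # (8, 5, 0)
```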

Downloaded file (Debian local repository package): cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb

Install:

sudo dpkg -i ./cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb

Verify that cuDNN is installed correctly:

python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
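
For a bit more detail than a bare True, the cuDNN version that PyTorch sees can be printed as well. This is a sketch using `torch.backends.cudnn`; the wrapper `cudnn_status` is a made-up helper that also copes with PyTorch not being installed. Keep in mind that PyTorch wheels may bundle their own cuDNN, in which case the result likely reflects that copy rather than the files copied into /usr/local/cuda.

```python
def cudnn_status():
    """Summarise cuDNN visibility from PyTorch; safe to call without torch installed."""
    try:
        import torch  # noqa: F401  (imported to confirm PyTorch is present)
    except ImportError:
        return {"torch_installed": False}
    from torch.backends import cudnn
    return {
        "torch_installed": True,
        "cudnn_available": cudnn.is_available(),
        "cudnn_version": cudnn.version(),  # an integer such as 8500, or None
        "cudnn_enabled": cudnn.enabled,
    }


print(cudnn_status())
```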

4. Install PyTorch

Official PyTorch installation guide:

https://pytorch.org/get-started/locally/

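Use PyTorch to verify that the GPU is available.

If you see the warning below, the NVIDIA driver is too old and a newer version must be installed. Here the cause was the old 430.64 driver; after reinstalling with the latest version, 515.76, the warning no longer appeared.

python3.6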
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/lintong/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
False
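
The "found version 10010" in the warning is not the driver version (430.64) but the highest CUDA version that driver supports, encoded as 1000 * major + 10 * minor. Decoding it gives 10.1, which matches what nvidia-smi reported. A small sketch (`decode_cuda_version` is a hypothetical helper):

```python
def decode_cuda_version(encoded):
    """Split an integer like 10010 into (major, minor) using 1000*major + 10*minor."""
    return encoded // 1000, (encoded % 1000) // 10


print(decode_cuda_version(10010))  # (10, 1)
print(decode_cuda_version(11070))  # (11, 7)
```

So the installed PyTorch build needed a CUDA version newer than 10.1, which is why upgrading the driver fixed it.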

After installing version 515.76:

python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.device_count())
1
>>> print(torch.cuda.get_device_name(0))
NVIDIA GeForce GTX 1050 Ti
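
As a final check, it can help to actually run a small computation on the GPU rather than only querying it. The sketch below falls back to the CPU when CUDA is unavailable and returns None when PyTorch is not installed; `gpu_smoke_test` is a made-up name for this example.

```python
def gpu_smoke_test():
    """Run a tiny matrix product on the GPU if available, otherwise on the CPU."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.ones(2, 3, device=device)
    # x @ x.t() is a 2x2 matrix whose entries are all 3, so the sum is 12.
    total = (x @ x.t()).sum().item()
    return device.type, total


print(gpu_smoke_test())
```

On the machine from this post the first element should be "cuda"; on a machine without a working driver it would be "cpu".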