Start TensorFlow with GPU Support on your Manjaro Linux
When you want to get into deep learning, you cannot avoid learning at least some TensorFlow.
TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.
And today, I’m going to walk through how to run TensorFlow with GPU support on your Manjaro or Arch Linux machine.
Verify your GPU is CUDA Enabled
$ lspci | grep -i nvidia
Then check whether your GPU is in the list of CUDA GPUs. If it is on the list, you are OK to go on to the next steps.
If you see nothing from the last command but you are sure that you have an Nvidia GPU installed on your local machine, you probably have not installed the driver for your GPU.
To install the driver for your Nvidia GPU, go to the Unix Drivers Archive and find your model.
For example, I have a GeForce GTX 970M on my MSI GS60 2QE. It is a CUDA-enabled GeForce product, and the driver version it needs is 390.67.
Then I simply install the right driver version through pacman:
$ sudo pacman -S mhwd-nvidia-390xx
You may also need to install other dependencies like nvidia-utils.
Finally, run the first command again; if the output matches your GPU, you are good to go.
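You can also check which driver version is loaded without leaving Python. This is just a small optional check, and it assumes the nvidia kernel module is actually loaded (on a bumblebee setup it may not be until you switch the GPU on, see the last section):

# Print the version of the loaded Nvidia kernel module; this proc file
# only exists while the nvidia driver module is loaded.
with open('/proc/driver/nvidia/version') as f:
    print(f.read())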
Install CUDA
Currently TensorFlow requires CUDA 8.0 or higher. You can find the details in the CUDA Toolkit and CUDA Toolkit Documentation.
But the regular Arch Linux repository contains CUDA 9.2, which is currently not compatible with TensorFlow 1.4.
There is a user-contributed AUR package for CUDA 8.0, so install it via yaourt:
$ yaourt -S --noconfirm cuda-8.0
And it will be installed to the directory /opt/cuda/.
PS: If you want to install CUDA 9.0, it is not maintained in the official repository anymore, so you may want to download it directly here.
But I found out that to install CUDA 9.0, your gcc version has to be 6.x. Since gcc6 is not maintained in the official Arch Linux repository either, installing it from the AUR and compiling it on your machine may take 4 to 5 hours.
So for convenience, this tutorial will focus on CUDA 8.0.
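Once the package is installed, you can double-check which toolkit version actually landed in /opt/cuda/ by reading the CUDA_VERSION macro from the toolkit headers. This is only a quick sanity-check sketch, not part of the official install steps:

# Confirm which CUDA toolkit version is installed in /opt/cuda/
import re

with open('/opt/cuda/include/cuda.h') as f:
    match = re.search(r'#define\s+CUDA_VERSION\s+(\d+)', f.read())

version = int(match.group(1))                        # e.g. 8000 for CUDA 8.0
print('CUDA toolkit {}.{}'.format(version // 1000, (version % 1000) // 10))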
Install cuDNN
Currently, the latest cuDNN version is 7.1.4, but you cannot find the Linux library for cuDNN v7.1.4 on the cuDNN Download page.
To use cuDNN v7, you need to download v7.0.5 for CUDA 8.0.
Now move to the directory containing your downloaded cuDNN archive and extract it:
$ cd ~/Downloads
$ tar -xvzf cudnn-8.0-linux-x64-v7.tgz
After you have extracted the file, it will create a cuda folder.
Copy the cuDNN files into the CUDA Toolkit directories:
$ sudo cp ~/Downloads/cuda/include/cudnn.h /opt/cuda/include
$ sudo cp ~/Downloads/cuda/lib64/libcudnn* /opt/cuda/lib64
Then change the cuDNN file permissions:
$ sudo chmod a+r /opt/cuda/lib64/libcudnn*
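To make sure the header you just copied really is the v7.0.5 one, you can parse the version macros out of cudnn.h. Again, this is only a small sanity-check sketch:

# Read the cuDNN version macros from the header copied into /opt/cuda/
import re

with open('/opt/cuda/include/cudnn.h') as f:
    header = f.read()

version = [re.search(r'#define CUDNN_{}\s+(\d+)'.format(name), header).group(1)
           for name in ('MAJOR', 'MINOR', 'PATCHLEVEL')]
print('cuDNN version: ' + '.'.join(version))          # expect 7.0.5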
Install TensorFlow GPU
It is very simple to install TensorFlow GPU in your system environment:
$ pip3 install tensorflow-gpu==1.4.1
Here we need to pin the tensorflow-gpu version to 1.4.1, because TensorFlow GPU 1.5 and later only work with CUDA 9.0.
Set Environment Variable
Set the LD_LIBRARY_PATH and CUDA_HOME environment variables by adding the commands below to your ~/.bashrc or ~/.zshrc file, depending on your shell.
Or you can use the following commands to add them temporarily (these assume the CUDA installation is in /opt/cuda/):
$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/cuda/lib64"
$ export CUDA_HOME=/opt/cuda/
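If you want to confirm that the dynamic loader can now find both the CUDA runtime and the cuDNN library you copied, a quick ctypes check works. The library names below (libcudart.so.8.0, libcudnn.so.7) are what a CUDA 8.0 + cuDNN 7 install normally ships; adjust them if yours differ:

# Try to load the CUDA runtime and cuDNN through the dynamic loader;
# this raises OSError if LD_LIBRARY_PATH is not set up correctly.
import ctypes

ctypes.CDLL('libcudart.so.8.0')   # CUDA 8.0 runtime
ctypes.CDLL('libcudnn.so.7')      # the cuDNN library copied above
print('CUDA runtime and cuDNN libraries were found')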
Validate your installation
In the terminal:
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello).decode())
If the system outputs the following, then you are ready to begin writing TensorFlow programs:
Hello, TensorFlow!
PS: If you followed the instructions on the TensorFlow site, you may get output like:
b'Hello, TensorFlow!'
That is because it is a bytestring; the decode method I suggested will return the plain string.
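Besides the hello-world check, you can also ask TensorFlow to enumerate the devices it sees. device_lib is an internal helper rather than a public API, but it works on 1.4:

# List the devices TensorFlow can see; a working GPU setup should show
# a /device:GPU:0 entry next to the CPU.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)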
Switch to Nvidia GPU
Most laptops with an Nvidia GPU on the market usually have an Intel GPU installed, too.
So to run TensorFlow with the GPU, you need to switch to the Nvidia GPU:
$ sudo tee /proc/acpi/bbswitch <<< ON
If you want to switch back:
$ sudo rmmod nvidia_uvm
$ sudo rmmod nvidia
$ sudo tee /proc/acpi/bbswitch <<< OFF
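You can confirm the current switch state by reading the bbswitch proc file, for example from Python:

# Show whether bbswitch currently has the discrete Nvidia GPU powered ON or OFF
with open('/proc/acpi/bbswitch') as f:
    print(f.read().strip())       # e.g. "0000:01:00.0 ON"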
To check that your TensorFlow is actually running with Nvidia GPU support, please download GPU Support and run it in your environment:
$ python tensorflow-gpu.py
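If you prefer not to download it, a minimal sketch that produces equivalent output is below. The matrices are my reconstruction from the result shown next, not necessarily the exact script:

# tensorflow-gpu.py -- a minimal GPU smoke test (reconstruction of the
# linked script, not necessarily identical to it)
import tensorflow as tf

# Two small constant matrices; their product is [[22. 28.] [49. 64.]]
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# log_device_placement=True prints which device (GPU or CPU) runs each op
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))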
You will get a result similar to the following if your GPU is working:
2018-07-15 08:59:50.168749: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-15 08:59:51.499067: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-15 08:59:51.499451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 970M major: 5 minor: 2 memoryClockRate(GHz): 1.038
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.89GiB
2018-07-15 08:59:51.499473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 970M, pci bus id: 0000:01:00.0, compute capability: 5.2)
[[22. 28.]
[49. 64.]]
If you have not switched to the Nvidia GPU, you may get an error.
Other useful dependencies
$ sudo pacman -S --noconfirm wget python-pip
$ pip install wheel
Fix pip after uninstalling TensorFlow
Installing TensorFlow GPU in your local environment is easy but risky.
Your pip may stop working and throw:
TypeError: parse() got an unexpected keyword argument 'transport_encoding'
when trying to install new packages.
This is caused by a version conflict in html5lib after uninstalling TensorFlow.
To fix this error, download https://github.com/html5lib/html5lib-python/tree/master/html5lib
Then overwrite all the files in the html5lib folder at /usr/lib/python3.6/site-packages/html5lib with the files you downloaded.
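If you would rather script the overwrite, here is a small sketch of that copy step. The source path is an assumption based on where you cloned the repository, and you will need to run it with sudo because it writes into the system site-packages:

# fix_html5lib.py -- replace the broken installed html5lib with the files
# from the cloned repository. Run with sudo; both paths are assumptions,
# so adjust them to your clone location and Python version.
import shutil

SRC = '/home/you/Downloads/html5lib-python/html5lib'
DST = '/usr/lib/python3.6/site-packages/html5lib'

shutil.rmtree(DST)           # remove the conflicting installed copy
shutil.copytree(SRC, DST)    # put the upstream files in its place
print('html5lib replaced, try pip again')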