
Training a CNN on the MNIST handwritten digit dataset (CPU or GPU)

Hardware

Processor (CPU): Intel® Core™ i5-10300H CPU @ 2.50GHz
Graphics chip (GPU): NVIDIA® GeForce GTX 1650 Ti

Python environment

  • Tensorflow-gpu-1.15.0
  • Keras 2.3.1

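The MNIST handwritten digit dataset

mnist is a dataset of black-and-white images of the handwritten digits 0-9.
In terms of images and their labels, there are 55,000 training examples and 5,000 validation examples, plus 10,000 test examples (the standard MNIST test set also ships with labels). Each image has shape [1,28,28,1] (count, height, width, channels), which is 784 pixels, and every pixel value is an integer in 0-255 (0 is black, 255 is white).
Each label matches the handwritten digit in its image: an image of a handwritten 5 gets the label 5. The digits 0-9 give 10 possible labels. Labels can be stored in a list; for example, images showing 1, 5, 8, 6, 4 correspond to label=[1,5,8,6,4].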
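To make the shape and pixel-range description concrete, here is a minimal sketch using NumPy. The array is a synthetic stand-in, not real MNIST data, and the variable names are chosen only for illustration.

```python
import numpy as np

# Synthetic stand-in for one MNIST image: (count, height, width, channels).
image = np.zeros((1, 28, 28, 1), dtype=np.uint8)
image[0, 14, 14, 0] = 255  # one white pixel on a black background

# 28 * 28 * 1 values per image.
pixels_per_image = image.shape[1] * image.shape[2] * image.shape[3]
print(pixels_per_image)          # 784
print(image.min(), image.max())  # 0 255

# One integer label per image, each in the range 0-9.
label = [1, 5, 8, 6, 4]
num_classes = len(range(10))
print(num_classes)               # 10
```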

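Pseudocode

  1. Download the mnist dataset and split it into training data and test data
  2. Preprocess the images: convert them to array form and normalize every pixel value into the range 0-1
  3. Convert the labels to one hot encoding
  4. Build the CNN
  5. Compile the model
  6. Train the model
  7. Measure the model's training error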
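Steps 2 and 3 above can be sketched with plain NumPy. This is an illustration under assumptions, not the script used in this post: the helper names `normalize_images` and `one_hot` are hypothetical, and the input is random data standing in for MNIST. In a Keras script, `keras.utils.to_categorical` does the same job as the `one_hot` helper.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0-9


def normalize_images(images):
    """Reshape to (n, 28, 28, 1) and scale 0-255 pixels to floats in 0-1."""
    images = np.asarray(images).reshape(-1, 28, 28, 1)
    return images.astype("float32") / 255.0


def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels into rows with a single 1 at the label index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


# Random stand-in data: 5 images of 28x28 with values in 0-255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

x = normalize_images(fake_images)
y = one_hot([1, 5, 8, 6, 4])
print(x.shape, y.shape)  # (5, 28, 28, 1) (5, 10)
```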

CNN architecture

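Problem

The installed TensorFlow is Tensorflow-gpu-1.15.0, and the matching Keras version is 2.3.1. Running CNN code written with Keras as-is produces the following error (partial excerpt):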

Epoch 1/1
2020-08-14 18:01:49.576948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-14 18:01:49.775817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-14 18:01:50.056149: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-08-14 18:01:50.062654: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "test.py", line 49, in <module>
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)
  File "/home/leodflag5/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/home/leodflag5/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/home/leodflag5/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/home/leodflag5/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[metrics/accuracy/Identity/_91]]
0 successful operations.
0 derived errors ignored.

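Disabling the GPU and computing on the CPU

import os
# Hide all CUDA devices from TensorFlow; set this before importing tensorflow or keras.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"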

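Computing on the GPU

import tensorflow as tf
# Allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# The interactive session installs itself as the default session.
session = tf.InteractiveSession(config=config)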

CPU run time

GPU run time

The two differ by roughly a factor of 4; the GPU run also uses CUDA to accelerate computation.
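One simple way to collect such run times is to wrap the training call in a wall-clock timer. The sketch below is an assumption about how this could be done, not the measurement method used in this post; `time_call` is a hypothetical helper, and `sum` over a range stands in for the `model.fit(...)` call.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the real script this would be the model.fit(...) call.
total, elapsed = time_call(sum, range(1_000_000))
print(f"elapsed: {elapsed:.3f} s")
```

Running the same script once with the GPU hidden and once with it enabled, then comparing the two `elapsed` values, gives the CPU and GPU figures.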