Image Classification with Keras
Classification of the MNIST Digit and Fashion data sets
This notebook is intended to be run in Google Colab (or another GPU-based environment) because training will be painfully slow on a CPU-only machine
Image Processing with Keras
The notes here are based on this YouTube series from Jeff Heaton; further information on CNNs can also be found in this series from Stanford
# Set Colab TF to 2.x
try:
%tensorflow_version 2.x
COLAB = True
print("Note: using Google CoLab")
except:
print("Note: not using Google CoLab")
COLAB = False
Overview
For processing images we'll use PIL (Pillow), which lets us work with images directly in Python
To install PIL use:
pip install pillow
We'll also use the requests package to fetch images from the internet via an HTTP request
IMAGE_URL = 'https://upload.wikimedia.org/wikipedia/commons/9/92/Brookings.jpg'
%matplotlib inline
import pandas as pd
import numpy as np
import requests
from io import BytesIO
from matplotlib.pyplot import imshow
from PIL import Image
We can import an image with:
response = requests.get(IMAGE_URL)
img = Image.open(BytesIO(response.content))
img.load()
img
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1157x744 at 0x7FC6AFFC40F0>
Each image that we import has its pixels in an array, grouped by position, where each entry is a colour list. So for an image that's 3x3 pixels, we have an array like so:
[
[[r, g, b], [r, g, b], [r, g, b]],
[[r, g, b], [r, g, b], [r, g, b]],
[[r, g, b], [r, g, b], [r, g, b]],
]
We can see our data set by converting the image to an array:
np.array(img)
array([[[ 86, 133, 177],
        [ 85, 132, 176],
        [ 84, 133, 176],
        ...,
        [ 94, 128, 153],
        [ 91, 128, 155],
        [ 94, 129, 169]],

       [[ 86, 133, 177],
        [ 88, 135, 179],
        [ 88, 137, 180],
        ...,
        [ 96, 133, 159],
        [ 92, 136, 165],
        [ 99, 141, 183]],

       [[ 83, 130, 174],
        [ 87, 134, 178],
        [ 89, 138, 181],
        ...,
        [108, 150, 175],
        [100, 149, 179],
        [ 97, 144, 186]],

       ...,

       [[127,  77,  76],
        [131,  81,  80],
        [128,  80,  76],
        ...,
        [  4,  10,  10],
        [  2,  11,  10],
        [  2,  11,  10]],

       [[132,  81,  77],
        [129,  80,  75],
        [124,  75,  70],
        ...,
        [  4,  10,  10],
        [  3,  12,  11],
        [  3,  12,  11]],

       [[140,  90,  83],
        [137,  87,  80],
        [130,  81,  74],
        ...,
        [ 11,  17,  17],
        [ 10,  19,  18],
        [ 10,  19,  18]]], dtype=uint8)
Using the PIL library we can also generate images from an array of data. For example, a 64x64 pixel image can be created with the following:
w, h = 64, 64
data = np.zeros((h, w, 3), dtype=np.uint8)
def assign_pixels(rgb, row_start, col_start):
for row in range(32):
for col in range(32):
data[row + row_start, col + col_start] = rgb
# yellow
assign_pixels([255, 255, 0], 0, 0)
# red
assign_pixels([255, 0, 0], 32, 0)
# blue
assign_pixels([0, 0, 255], 0, 32)
#green
assign_pixels([0, 255, 0], 32, 32)
img = Image.fromarray(data, 'RGB')
img
<PIL.Image.Image image mode=RGB size=64x64 at 0x7FC6AFB402B0>
By combining reading, writing, and processing with PIL and numpy we can prepare images for a network. Some preprocessing tasks we may want to do are (a short sketch follows this list):
- Size and shape normalization
- Greyscaling
- Flattening of image data to a 1D array
- Normalizing pixel values from 0 -> 255 to -126 -> 126
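A rough sketch of those steps using PIL and numpy (assuming img is a PIL image such as the one created above; the target size here is an arbitrary example):
from PIL import Image
import numpy as np

resized = img.resize((128, 128))                  # size/shape normalization
grey = resized.convert('L')                       # greyscale (single channel)
flat = np.array(grey).flatten()                   # flatten 2D pixels to a 1D array
print(flat.shape)                                 # (16384,)

# shift the 0 -> 255 range so it is roughly centred on zero
centred = np.array(grey).astype('float32') - 127.5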
Computer Vision
For computer vision tasks we can make use of something like Colab to ensure that we have a GPU to run on, otherwise these tasks can take a very long time. When setting up we'll use the Python with GPU configuration on Colab
When working with image data there are some distinctions compared to when we use NNs for other tasks:
- Usually classification
- Input is now 3 dimensional - height, width, colour
- Data is not transformed, no Z-scores or Dummy Variables
- Processing is much slower
- Different Layer types such as Dense, Convolutional, and Max Pooling
- Data will be in the form of image files and not CSV (TF provides some mechanisms to support with this)
Some common ML datasets are the MNIST Digits and MNIST Fashion data, which share the same data structure, as well as the CIFAR data which is used for ResNet training
Convolutional Neural Networks
A Convolution Layer is a layer type that's able to scan across the previous layer; this allows it to identify features that are positioned relative to other features
In a Convolution Layer some of the things we need to specify are (a small sketch follows this list):
- Number of filters
- Filter size
- Stride
- Padding
- Activation Function
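As a minimal sketch, those options map onto the arguments of a Keras Conv2D layer like this (the specific numbers are arbitrary examples, not values used later in these notes):
from tensorflow.keras.layers import Conv2D

example_conv = Conv2D(
    filters=32,          # number of filters
    kernel_size=(3, 3),  # filter size
    strides=(1, 1),      # stride
    padding='same',      # padding: 'same' keeps the spatial size, 'valid' shrinks it
    activation='relu'    # activation function
)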
Max Pooling
After a convolution we may want to subsample the previous Convolution layer, either to connect to an output based on a Dense layer or to pass it into another Convolution Layer to identify even higher order features
Max pooling layers help us to decrease resolution
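A small sketch of what a 2x2 max pooling layer does to resolution (the input values here are made up):
import numpy as np
import tensorflow as tf

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 2, 8]], dtype='float32').reshape(1, 4, 4, 1)  # batch, height, width, channels

pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(pooled.numpy().reshape(2, 2))
# [[6. 4.]
#  [7. 9.]]  <- each 2x2 block is replaced by its maximum, halving the resolution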
MNIST Digit Dataset
Importing Data
We can import the MNIST dataset from TF to use like so:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
import tensorflow.keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(f"Training: X {X_train.shape} Y {y_train.shape}")
print(f"Testing : X {X_test.shape} Y {y_test.shape}")
Based on the above we can see that we have a set of images with a size of 28x28. We can view the raw data for one of these with:
pd.DataFrame(X_train[0])
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 18 | 18 | 18 | 126 | 136 | 175 | 26 | 166 | 255 | 247 | 127 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 36 | 94 | 154 | 170 | 253 | 253 | 253 | 253 | 253 | 225 | 172 | 253 | 242 | 195 | 64 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 238 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 251 | 93 | 82 | 82 | 56 | 39 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 219 | 253 | 253 | 253 | 253 | 253 | 198 | 182 | 247 | 241 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 156 | 107 | 253 | 253 | 205 | 11 | 0 | 43 | 154 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 1 | 154 | 253 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 139 | 253 | 190 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 190 | 253 | 70 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 35 | 241 | 225 | 160 | 108 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 81 | 240 | 253 | 253 | 119 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45 | 186 | 253 | 253 | 150 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 93 | 252 | 253 | 187 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 249 | 253 | 249 | 64 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 46 | 130 | 183 | 253 | 253 | 207 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 | 148 | 229 | 253 | 253 | 253 | 250 | 182 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 114 | 221 | 253 | 253 | 253 | 253 | 201 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 66 | 213 | 253 | 253 | 253 | 253 | 198 | 81 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
22 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 171 | 219 | 253 | 253 | 253 | 253 | 195 | 80 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23 | 0 | 0 | 0 | 0 | 55 | 172 | 226 | 253 | 253 | 253 | 253 | 244 | 133 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
24 | 0 | 0 | 0 | 0 | 136 | 253 | 253 | 253 | 212 | 135 | 132 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Or as an image using plt.imshow:
plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc62a5c62b0>
<Figure size 432x288 with 1 Axes>
Training a Network
Preprocessing
Before training a network we'll do some preprocessing to format the data into something our network can use directly:
batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28
# the below may be necessary to reshape the data based on the Keras backend
# for example there could be different image format requirements for TF vs
# another library that Keras is compatible with
if K.image_data_format() == 'channels_first':
X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
# normalize the X
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255
# categorize the y
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)
input_shape
(28, 28, 1)
X_train_norm.shape, X_test_norm.shape
((60000, 28, 28, 1), (10000, 28, 28, 1))
y_train_cat.shape, y_test_cat.shape
((60000, 10), (10000, 10))
Train Model
- Define Sequential Model
- Create a few Conv2D layers
- Use a MaxPooling2D layer to reduce the resolution
- Flatten the data to pass to a Dense Layer
- Use a Dense Layer
- Add some Dropout
- Add the output Dense Layer
model = Sequential()
model.add(Conv2D(
64, # number of filters
    (3, 3), # kernel size
activation='relu', # activation function
input_shape=input_shape # input shape
))
model.add(Conv2D(
64,
(3, 3),
activation='relu'
))
model.add(MaxPooling2D(
pool_size=(2, 2)
))
model.add(Dropout(0.25))
model.add(Flatten()) # always need to flatten when moving from Conv Layer
model.add(Dense(
num_classes,
activation='softmax'
))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
model.summary()
Next, we can fit the model. We also include some timing code to see how long the overall training run takes
import time
print(f"Start: {time.ctime()}")
model.fit(
X_train_norm, y_train_cat,
batch_size=batch_size,
epochs=epochs,
verbose=2,
validation_data=(X_test_norm, y_test_cat)
)
print(f"End: {time.ctime()}")
Evaluate Accuracy
Next we'll evaluate the accuracy of the model using our usual method:
score = model.evaluate(
X_test_norm,
y_test_cat,
verbose=0
)
print(f"Loss : {score[0]}")
print(f"Accuracy: {score[1]}")
MNIST Fashion Dataset
Import Data
from tensorflow.keras.datasets import fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
print(f"Training: X {X_train.shape} Y {y_train.shape}")
print(f"Testing : X {X_test.shape} Y {y_test.shape}")
The Fashion dataset works pretty much as a drop-in replacement for the Digits dataset; we can reuse all of the code from above as-is and should be able to train the model
View Data
pd.DataFrame(X_train[0])
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 13 | 73 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 36 | 136 | 127 | 62 | 54 | 0 | 0 | 0 | 1 | 3 | 4 | 0 | 0 | 3 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 102 | 204 | 176 | 134 | 144 | 123 | 23 | 0 | 0 | 0 | 0 | 12 | 10 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 155 | 236 | 207 | 178 | 107 | 156 | 161 | 109 | 64 | 23 | 77 | 130 | 72 | 15 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 69 | 207 | 223 | 218 | 216 | 216 | 163 | 127 | 121 | 122 | 146 | 141 | 88 | 172 | 66 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 200 | 232 | 232 | 233 | 229 | 223 | 223 | 215 | 213 | 164 | 127 | 123 | 196 | 229 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 183 | 225 | 216 | 223 | 228 | 235 | 227 | 224 | 222 | 224 | 221 | 223 | 245 | 173 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 193 | 228 | 218 | 213 | 198 | 180 | 212 | 210 | 211 | 213 | 223 | 220 | 243 | 202 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 12 | 219 | 220 | 212 | 218 | 192 | 169 | 227 | 208 | 218 | 224 | 212 | 226 | 197 | 209 | 52 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 99 | 244 | 222 | 220 | 218 | 203 | 198 | 221 | 215 | 213 | 222 | 220 | 245 | 119 | 167 | 56 |
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 55 | 236 | 228 | 230 | 228 | 240 | 232 | 213 | 218 | 223 | 234 | 217 | 217 | 209 | 92 | 0 |
14 | 0 | 0 | 1 | 4 | 6 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 237 | 226 | 217 | 223 | 222 | 219 | 222 | 221 | 216 | 223 | 229 | 215 | 218 | 255 | 77 | 0 |
15 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 62 | 145 | 204 | 228 | 207 | 213 | 221 | 218 | 208 | 211 | 218 | 224 | 223 | 219 | 215 | 224 | 244 | 159 | 0 |
16 | 0 | 0 | 0 | 0 | 18 | 44 | 82 | 107 | 189 | 228 | 220 | 222 | 217 | 226 | 200 | 205 | 211 | 230 | 224 | 234 | 176 | 188 | 250 | 248 | 233 | 238 | 215 | 0 |
17 | 0 | 57 | 187 | 208 | 224 | 221 | 224 | 208 | 204 | 214 | 208 | 209 | 200 | 159 | 245 | 193 | 206 | 223 | 255 | 255 | 221 | 234 | 221 | 211 | 220 | 232 | 246 | 0 |
18 | 3 | 202 | 228 | 224 | 221 | 211 | 211 | 214 | 205 | 205 | 205 | 220 | 240 | 80 | 150 | 255 | 229 | 221 | 188 | 154 | 191 | 210 | 204 | 209 | 222 | 228 | 225 | 0 |
19 | 98 | 233 | 198 | 210 | 222 | 229 | 229 | 234 | 249 | 220 | 194 | 215 | 217 | 241 | 65 | 73 | 106 | 117 | 168 | 219 | 221 | 215 | 217 | 223 | 223 | 224 | 229 | 29 |
20 | 75 | 204 | 212 | 204 | 193 | 205 | 211 | 225 | 216 | 185 | 197 | 206 | 198 | 213 | 240 | 195 | 227 | 245 | 239 | 223 | 218 | 212 | 209 | 222 | 220 | 221 | 230 | 67 |
21 | 48 | 203 | 183 | 194 | 213 | 197 | 185 | 190 | 194 | 192 | 202 | 214 | 219 | 221 | 220 | 236 | 225 | 216 | 199 | 206 | 186 | 181 | 177 | 172 | 181 | 205 | 206 | 115 |
22 | 0 | 122 | 219 | 193 | 179 | 171 | 183 | 196 | 204 | 210 | 213 | 207 | 211 | 210 | 200 | 196 | 194 | 191 | 195 | 191 | 198 | 192 | 176 | 156 | 167 | 177 | 210 | 92 |
23 | 0 | 0 | 74 | 189 | 212 | 191 | 175 | 172 | 175 | 181 | 185 | 188 | 189 | 188 | 193 | 198 | 204 | 209 | 210 | 210 | 211 | 188 | 188 | 194 | 192 | 216 | 170 | 0 |
24 | 2 | 0 | 0 | 0 | 66 | 200 | 222 | 237 | 239 | 242 | 246 | 243 | 244 | 221 | 220 | 193 | 191 | 179 | 182 | 182 | 181 | 176 | 166 | 168 | 99 | 58 | 0 | 0 |
25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40 | 61 | 44 | 72 | 41 | 35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc607fb3ef0>
<Figure size 432x288 with 1 Axes>
Preprocess Data
batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28
# the below may be necessary to reshape the data based on the Keras backend
# for example there could be different image format requirements for TF vs
# another library that Keras is compatible with
if K.image_data_format() == 'channels_first':
X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
# normalize the X
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255
# categorize the y
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)
input_shape
(28, 28, 1)
X_train_norm.shape, X_test_norm.shape
((60000, 28, 28, 1), (10000, 28, 28, 1))
y_train_cat.shape, y_test_cat.shape
((60000, 10), (10000, 10))
Define Model
model = Sequential()
model.add(Conv2D(
64, # number of filters
    (3, 3), # kernel size
activation='relu', # activation function
input_shape=input_shape # input shape
))
model.add(Conv2D(
64,
(3, 3),
activation='relu'
))
model.add(MaxPooling2D(
pool_size=(2, 2)
))
model.add(Dropout(0.25))
model.add(Flatten()) # always need to flatten when moving from Conv Layer
model.add(Dense(
num_classes,
activation='softmax'
))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
model.summary()
Train Model
import time
print(f"Start: {time.ctime()}")
model.fit(
X_train_norm, y_train_cat,
batch_size=batch_size,
epochs=epochs,
verbose=2,
validation_data=(X_test_norm, y_test_cat)
)
print(f"End: {time.ctime()}")
Evaluate Model
score = model.evaluate(
X_test_norm,
y_test_cat,
verbose=0
)
print(f"Loss : {score[0]}")
print(f"Accuracy: {score[1]}")
ResNets in Keras
A Residual Layer, also known as a skip layer, allows us to add some previous output as an additional input to another layer. This enables our networks to go deeper than they could under normal circumstances while still showing a potential improvement in the output
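The core idea can be sketched in a few lines of the Keras functional API; the earlier output is added back onto the output of the later layers (a toy illustration, not the network built below):
from tensorflow.keras.layers import Input, Conv2D, Activation, add
from tensorflow.keras.models import Model

inputs = Input(shape=(32, 32, 16))
x = Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(16, (3, 3), padding='same')(x)
x = add([x, inputs])        # the skip connection: previous output used as an extra input
x = Activation('relu')(x)
toy_block = Model(inputs=inputs, outputs=x)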
We can look at implementing a ResNet using the CIFAR Dataset
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import os
from six.moves import cPickle
import tensorflow.keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense, Conv2D, BatchNormalization, Activation
from tensorflow.keras.layers import AveragePooling2D, Input, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
Import the Data
from tensorflow.keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
(X_train.shape, X_test.shape), (y_train.shape, y_test.shape)
(((50000, 32, 32, 3), (10000, 32, 32, 3)), ((50000, 1), (10000, 1)))
plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc60719bdd8>
<Figure size 432x288 with 1 Axes>
Constants for Training
# Training parameters
BATCH_SIZE = 32 # orig paper trained all networks with batch_size=128
EPOCHS = 200 # 200
USE_AUGMENTATION = True
NUM_CLASSES = np.unique(y_train).shape[0] # 10
COLORS = X_train.shape[3]
# Subtracting pixel mean improves accuracy
# This centers the pixel values around 0
SUBTRACT_PIXEL_MEAN = True
# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
VERSION = 1
# Computed depth from supplied model parameter n
if VERSION == 1:
DEPTH = COLORS * 6 + 2
elif VERSION == 2:
DEPTH = COLORS * 9 + 2
Defining the ResNet Functions
The different ResNet functions based on the two papers are defined below. They both make use of the common resnet_layer function definition
The papers are:
- ResNet v1: K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385,2015.
- ResNet v2: He, K., Zhang, X., Ren, S., & Sun, J. (2016, October). Identity mappings in deep residual networks. In European conference on computer vision (pp. 630-645). Springer, Cham.
The difference between the two is that V2 makes use of batch normalization before each weight layer
ResNet Layer Definition
def lr_schedule(epoch):
"""Learning Rate Schedule
Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
Called automatically every epoch as part of callbacks during training.
# Arguments
epoch (int): The number of epochs
# Returns
lr (float32): learning rate
"""
lr = 1e-3
if epoch > 180:
lr *= 0.5e-3
elif epoch > 160:
lr *= 1e-3
elif epoch > 120:
lr *= 1e-2
elif epoch > 80:
lr *= 1e-1
print('Learning rate: ', lr)
return lr
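As a quick check, sampling this schedule at a few epochs shows the step-downs (note the function also prints the rate each time it is called):
for e in [0, 81, 121, 161, 181]:
    lr_schedule(e)
# returns 0.001, 0.0001, 1e-05, 1e-06, 5e-07 respectively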
def resnet_layer(inputs,
num_filters=16,
kernel_size=3,
strides=1,
activation='relu',
batch_normalization=True,
conv_first=True):
"""2D Convolution-Batch Normalization-Activation stack builder
# Arguments
inputs (tensor): input tensor from input image or previous layer
num_filters (int): Conv2D number of filters
kernel_size (int): Conv2D square kernel dimensions
strides (int): Conv2D square stride dimensions
activation (string): activation name
batch_normalization (bool): whether to include batch normalization
conv_first (bool): conv-bn-activation (True) or
bn-activation-conv (False)
# Returns
x (tensor): tensor as input to the next layer
"""
conv = Conv2D(num_filters,
kernel_size=kernel_size,
strides=strides,
padding='same',
kernel_initializer='he_normal',
kernel_regularizer=l2(1e-4))
x = inputs
if conv_first:
x = conv(x)
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = Activation(activation)(x)
else:
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = Activation(activation)(x)
x = conv(x)
return x
ResNet v1
def resnet_v1(input_shape, depth, num_classes=10):
"""ResNet Version 1 Model builder [a]
Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
Last ReLU is after the shortcut connection.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and the
    same feature map sizes.
Features maps sizes:
stage 0: 32x32, 16
stage 1: 16x16, 32
stage 2: 8x8, 64
The Number of parameters is approx the same as Table 6 of [a]:
ResNet20 0.27M
ResNet32 0.46M
ResNet44 0.66M
ResNet56 0.85M
ResNet110 1.7M
# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)
# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 6 != 0:
raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
# Start model definition.
num_filters = 16
num_res_blocks = int((depth - 2) / 6)
inputs = Input(shape=input_shape)
x = resnet_layer(inputs=inputs)
# Instantiate the stack of residual units
for stack in range(3):
for res_block in range(num_res_blocks):
strides = 1
if stack > 0 and res_block == 0: # first layer but not first stack
strides = 2 # downsample
y = resnet_layer(inputs=x,
num_filters=num_filters,
strides=strides)
y = resnet_layer(inputs=y,
num_filters=num_filters,
activation=None)
if stack > 0 and res_block == 0: # first layer but not first stack
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = tensorflow.keras.layers.add([x, y])
x = Activation('relu')(x)
num_filters *= 2
# Add classifier on top.
# v1 does not use BN after last shortcut connection-ReLU
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
activation='softmax',
kernel_initializer='he_normal')(y)
# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
ResNet v2
def resnet_v2(input_shape, depth, num_classes=10):
"""ResNet Version 2 Model builder [b]
Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
bottleneck layer
First shortcut connection per layer is 1 x 1 Conv2D.
Second and onwards shortcut connection is identity.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and the
    same feature map sizes.
Features maps sizes:
conv1 : 32x32, 16
stage 0: 32x32, 64
stage 1: 16x16, 128
stage 2: 8x8, 256
# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)
# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 9 != 0:
raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
# Start model definition.
num_filters_in = 16
num_res_blocks = int((depth - 2) / 9)
inputs = Input(shape=input_shape)
# v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
x = resnet_layer(inputs=inputs,
num_filters=num_filters_in,
conv_first=True)
# Instantiate the stack of residual units
for stage in range(3):
for res_block in range(num_res_blocks):
activation = 'relu'
batch_normalization = True
strides = 1
if stage == 0:
num_filters_out = num_filters_in * 4
if res_block == 0: # first layer and first stage
activation = None
batch_normalization = False
else:
num_filters_out = num_filters_in * 2
if res_block == 0: # first layer but not first stage
strides = 2 # downsample
# bottleneck residual unit
y = resnet_layer(inputs=x,
num_filters=num_filters_in,
kernel_size=1,
strides=strides,
activation=activation,
batch_normalization=batch_normalization,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_in,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_out,
kernel_size=1,
conv_first=False)
if res_block == 0:
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters_out,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = tensorflow.keras.layers.add([x, y])
num_filters_in = num_filters_out
# Add classifier on top.
# v2 has BN-ReLU before Pooling
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
activation='softmax',
kernel_initializer='he_normal')(y)
# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
Normalize Data
# Input image dimensions
input_shape = X_train.shape[1:]
# Normalize data
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255
if SUBTRACT_PIXEL_MEAN:
X_train_mean = np.mean(X_train, axis=0)
X_train_norm -= X_train_mean
X_test_norm -= X_train_mean
# Categorize target
y_train_cat = to_categorical(y_train, NUM_CLASSES)
y_test_cat = to_categorical(y_test, NUM_CLASSES)
Define Model Based on Version
if VERSION == 2:
model = resnet_v2(input_shape=input_shape, depth=DEPTH)
else:
model = resnet_v1(input_shape=input_shape, depth=DEPTH)
model.compile(
loss='categorical_crossentropy',
optimizer=Adam(lr=lr_schedule(0)),
metrics=['accuracy']
)
model.summary()
Train Model
# Prepare callbacks for model saving and for learning rate adjustment.
lr_scheduler = LearningRateScheduler(lr_schedule)
lr_reducer = ReduceLROnPlateau(
factor=np.sqrt(0.1),
cooldown=0,
patience=5,
min_lr=0.5e-6
)
callbacks = [lr_reducer, lr_scheduler]
In the section below we have a choice to use image augmentation, which applies random transformations like shifting and flipping the image so the model does not overfit; it's not really doing anything more complicated than that
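To get a feel for what the augmentation does we can preview a few augmented copies of a single training image before running the full loop (a small sketch; the shift and flip settings mirror the ones used in the training code below):
preview_gen = ImageDataGenerator(width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 horizontal_flip=True)
batch = np.repeat(X_train_norm[:1], 4, axis=0)   # four copies of the first image
augmented = next(preview_gen.flow(batch, batch_size=4, shuffle=False))

fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for ax, im in zip(axes, augmented):
    ax.imshow((im - im.min()) / (im.max() - im.min()))  # rescale to 0-1 for display
    ax.axis('off')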
import time
print(f"Start: {time.ctime()}")
# Run training, with or without data augmentation.
if not USE_AUGMENTATION:
print('Not using data augmentation.')
model.fit(
X_train_norm, y_train_cat,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_data=(X_test_norm, y_test_cat),
shuffle=True,
callbacks=callbacks
)
else:
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
# set input mean to 0 over the dataset
featurewise_center=False,
# set each sample mean to 0
samplewise_center=False,
# divide inputs by std of dataset
featurewise_std_normalization=False,
# divide each input by its std
samplewise_std_normalization=False,
# apply ZCA whitening
zca_whitening=False,
# epsilon for ZCA whitening
zca_epsilon=1e-06,
# randomly rotate images in the range (deg 0 to 180)
rotation_range=0,
# randomly shift images horizontally
width_shift_range=0.1,
# randomly shift images vertically
height_shift_range=0.1,
# set range for random shear
shear_range=0.,
# set range for random zoom
zoom_range=0.,
# set range for random channel shifts
channel_shift_range=0.,
# set mode for filling points outside the input boundaries
fill_mode='nearest',
# value used for fill_mode = "constant"
cval=0.,
# randomly flip images
horizontal_flip=True,
# randomly flip images
vertical_flip=False,
# set rescaling factor (applied before any other transformation)
rescale=None,
# set function that will be applied on each input
preprocessing_function=None,
# image data format, either "channels_first" or "channels_last"
data_format=None,
# fraction of images reserved for validation (strictly between 0 and 1)
validation_split=0.0)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(X_train_norm)
model.fit_generator(
datagen.flow(
X_train_norm,
y_train_cat,
batch_size=BATCH_SIZE
),
validation_data=(X_test_norm, y_test_cat),
epochs=EPOCHS,
verbose=0,
workers=1,
callbacks=callbacks,
use_multiprocessing=False
)
print(f"End: {time.ctime()}")
Evaluate the Model
scores = model.evaluate(X_test_norm, y_test_cat, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
Using your Own Images with Keras
When we're using common datasets, e.g. from Keras, we have certain convenience methods for accessing and working with the data, like we see below
from tensorflow.keras.datasets import cifar10
import numpy as np
# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train.shape
(50000, 32, 32, 3)
x_train[0]
array([[[ 59,  62,  63],
        [ 43,  46,  45],
        [ 50,  48,  43],
        ...,
        [158, 132, 108],
        [152, 125, 102],
        [148, 124, 103]],

       [[ 16,  20,  20],
        [  0,   0,   0],
        [ 18,   8,   0],
        ...,
        [123,  88,  55],
        [119,  83,  50],
        [122,  87,  57]],

       [[ 25,  24,  21],
        [ 16,   7,   0],
        [ 49,  27,   8],
        ...,
        [118,  84,  50],
        [120,  84,  50],
        [109,  73,  42]],

       ...,

       [[208, 170,  96],
        [201, 153,  34],
        [198, 161,  26],
        ...,
        [160, 133,  70],
        [ 56,  31,   7],
        [ 53,  34,  20]],

       [[180, 139,  96],
        [173, 123,  42],
        [186, 144,  30],
        ...,
        [184, 148,  94],
        [ 97,  62,  34],
        [ 83,  53,  34]],

       [[177, 144, 116],
        [168, 129,  94],
        [179, 142,  87],
        ...,
        [216, 184, 140],
        [151, 118,  84],
        [123,  92,  72]]], dtype=uint8)
We can see 32x32 images with a colour depth of 3 channels (values 0 - 255)
Usually when training on images we resize/structure them to a standard size so that we can handle the data consistently
We may also want to rescale the RGB values to be between 0 and 1, or -1 and 1, as a preprocessing step
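Both of those rescalings are one-liners on the raw uint8 pixel array (a quick sketch using the CIFAR data loaded above):
sample = x_train[0].astype('float32')      # pixel values 0 - 255
zero_to_one = sample / 255.0               # rescaled to 0 .. 1
minus_one_to_one = sample / 127.5 - 1.0    # rescaled to -1 .. 1
print(zero_to_one.min(), zero_to_one.max())
print(minus_one_to_one.min(), minus_one_to_one.max())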
Transforming Images
We can make use of the make_square function below to convert an image to a square. The version below simply crops off part of the image in whichever direction the image is longer
%matplotlib inline
from PIL import Image, ImageFile
from matplotlib.pyplot import imshow
import requests
import numpy as np
from io import BytesIO
from IPython.display import display, HTML
IMAGE_WIDTH = 200
IMAGE_HEIGHT = 200
IMAGE_CHANNELS = 3
images = [
"https://upload.wikimedia.org/wikipedia/commons/9/92/Brookings.jpg",
"https://upload.wikimedia.org/wikipedia/commons/f/ff/"\
"WashU_Graham_Chapel.JPG",
"https://upload.wikimedia.org/wikipedia/commons/9/9e/SeigleHall.jpg",
"https://upload.wikimedia.org/wikipedia/commons/a/aa/WUSTLKnight.jpg",
"https://upload.wikimedia.org/wikipedia/commons/3/32/WashUABhall.jpg",
"https://upload.wikimedia.org/wikipedia/commons/c/c0/Brown_Hall.jpg",
"https://upload.wikimedia.org/wikipedia/commons/f/f4/South40.jpg"
]
"""
Trim an image's edges in the longer direction to convert it to a square
"""
def make_square(img):
cols,rows = img.size
if rows>cols:
pad = (rows-cols)/2
img = img.crop((pad,0,cols,cols))
else:
pad = (cols-rows)/2
img = img.crop((0,pad,rows,rows))
return img
Next we will download all the images, convert them to squares, and resize them to our set IMAGE_HEIGHT and IMAGE_WIDTH
training_data = []
for url in images:
ImageFile.LOAD_TRUNCATED_IMAGES = False
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img.load()
img = make_square(img)
img = img.resize((IMAGE_WIDTH,IMAGE_HEIGHT),Image.ANTIALIAS)
training_data.append(np.asarray(img))
Once we've resized the images we will have a list of arrays; next we transform the list into a single numpy array of arrays using the np.array function. The training data is then divided by 127.5 and has 1 subtracted from it to normalize the values to between -1 and 1
training_data = np.array(training_data) / 127.5 - 1.
training_data.shape
(7, 200, 200, 3)
training_data[0]
array([[[-0.12156863,  0.2627451 ,  0.59215686],
        [-0.12156863,  0.2627451 ,  0.59215686],
        [-0.10588235,  0.27058824,  0.59215686],
        ...,
        [-0.3254902 , -0.05098039,  0.27058824],
        [-0.61568627, -0.33333333,  0.01960784],
        [-0.40392157, -0.11372549,  0.20784314]],

       [[-0.16862745,  0.23921569,  0.59215686],
        [-0.14509804,  0.2627451 ,  0.61568627],
        [-0.11372549,  0.27058824,  0.59215686],
        ...,
        [-0.41176471, -0.25490196, -0.01960784],
        [-0.4745098 , -0.31764706, -0.02745098],
        [-0.81176471, -0.70196078, -0.42745098]],

       [[-0.15294118,  0.24705882,  0.60784314],
        [-0.1372549 ,  0.2627451 ,  0.62352941],
        [-0.10588235,  0.27058824,  0.6       ],
        ...,
        [-0.35686275, -0.15294118,  0.06666667],
        [-0.60784314, -0.37254902, -0.09803922],
        [-0.05882353,  0.18431373,  0.42745098]],

       ...,

       [[-0.00392157, -0.39607843, -0.43529412],
        [-0.01960784, -0.37254902, -0.45882353],
        [-0.05882353, -0.37254902, -0.49019608],
        ...,
        [-0.4745098 , -0.78039216, -0.7254902 ],
        [-0.56078431, -0.77254902, -0.77254902],
        [-0.56078431, -0.76470588, -0.73333333]],

       [[ 0.05098039, -0.33333333, -0.34117647],
        [ 0.01960784, -0.30980392, -0.38823529],
        [-0.05098039, -0.31764706, -0.42745098],
        ...,
        [-0.64705882, -0.81960784, -0.74901961],
        [-0.70980392, -0.85882353, -0.78039216],
        [-0.79607843, -0.81960784, -0.75686275]],

       [[ 0.00392157, -0.38039216, -0.41960784],
        [-0.00392157, -0.34117647, -0.39607843],
        [-0.05098039, -0.34901961, -0.41176471],
        ...,
        [-0.81960784, -0.89019608, -0.75686275],
        [-0.74901961, -0.84313725, -0.71764706],
        [-0.85098039, -0.86666667, -0.74901961]]])
It can also be useful to save the data object for future use. For high dimensional data CSVs don't work, and for large datasets pickle can be problematic. numpy can save binary data to disk with the np.save method:
print("Saving training image binary...")
np.save("training",training_data) # Saves as "training_data.npy"
print("Done.")
DarkNet and YOLO
YOLO = You Only Look Once
YOLO allows us to recognize multiple objects with a single CNN. The network is trained to output bounding boxes and class labels for several regions of the image, but it only looks at the input data once (hence the name)
DarkNet is the original implementation of YOLO in C; DarkFlow is the version that can be used from Python
Installing DarkFlow
Because we are using Google CoLab we first need to mount our Google Drive so we have a folder to work in:
try:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
COLAB = True
print("Note: using Google CoLab")
%tensorflow_version 2.x
except:
print("Note: not using Google CoLab")
COLAB = False
Next, install the dependency via pip
import sys
!{sys.executable} -m pip install git+https://github.com/zzh8829/yolov3-tf2.git@master
Import the Weights
Since we aren't trying to retrain the YOLO model we can just import the pretrained weights from the following files:
import tensorflow as tf
import os
if COLAB:
ROOT = '/content/drive/My Drive/Colab Notebooks'
else:
ROOT = os.path.join(os.getcwd(),'data')
filename_darknet_weights = tf.keras.utils.get_file(
os.path.join(ROOT,'yolov3.weights'),
origin='https://pjreddie.com/media/files/yolov3.weights')
TINY = False
filename_convert_script = tf.keras.utils.get_file(
os.path.join(os.getcwd(),'convert.py'),
origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/convert.py')
filename_classes = tf.keras.utils.get_file(
os.path.join(ROOT,'coco.names'),
origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/data/coco.names')
filename_converted_weights = os.path.join(ROOT,'yolov3.tf')
Once we have downloaded the weights we need to transform them into a version that can be used by tensorflow:
import sys
!{sys.executable} "{filename_convert_script}" --weights "{filename_darknet_weights}" --output "{filename_converted_weights}"
Delete the Conversion Script
Since we no longer need the conversion script it can be deleted:
import os
os.remove(filename_convert_script)
Running DarkFlow
Prereqs: cython and opencv
To use the DarkFlow library we need to do the following:
- Import all needed packages
- Define the YOLO configuration using Keras flags
- Scan for available devices to selectively use GPU
Import Packages
import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (YoloV3, YoloV3Tiny)
from yolov3_tf2.dataset import transform_images, load_tfrecord_dataset
from yolov3_tf2.utils import draw_outputs
import sys
from PIL import Image, ImageFile
import requests
Set Keras Flags
# Flags are used to define several options for YOLO.
flags.DEFINE_string('classes', filename_classes, 'path to classes file')
flags.DEFINE_string('weights', filename_converted_weights, 'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('tfrecord', None, 'tfrecord instead of image')
flags.DEFINE_integer('num_classes', 80, 'number of classes in the model')
FLAGS([sys.argv[0]])
['/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py']
Scan for Device with GPU
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
tf.config.experimental.set_memory_growth(physical_devices[0], True)
Making Predictions
To make a prediction we can do the following:
- Create an instance of YoloV3
- Load the weights and classes
- Get an image to predict
- Preprocess image
- Make a Prediction
- Preview the Output over the Image
Create Yolo Instance
if FLAGS.tiny:
yolo = YoloV3Tiny(classes=FLAGS.num_classes)
else:
yolo = YoloV3(classes=FLAGS.num_classes)
FLAGS.yolo_score_threshold = 0.5
Load Weights
yolo.load_weights(FLAGS.weights).expect_partial()
class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
Download Image
url = "https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/cook.jpg"
response = requests.get(url)
img_raw = tf.image.decode_image(response.content, channels=3)
Preprocess Image
img = tf.expand_dims(img_raw, 0)
img = transform_images(img, FLAGS.size)
Make Prediction
boxes, scores, classes, nums = yolo(img)
print('detections:')
for i in range(nums[0]):
cls = class_names[int(classes[0][i])]
score = np.array(scores[0][i])
box = np.array(boxes[0][i])
print(f"\t{cls}, {score}, {box}")
Overlay Predictions
img = img_raw.numpy()
img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
#cv2.imwrite(FLAGS.output, img) # Save the image
display(Image.fromarray(img, 'RGB')) # Display the image
<PIL.Image.Image image mode=RGB size=240x320 at 0x7F5F6C63CCC0>
Generative Adversarial Networks (GANs)
GANs are pairs of neural networks in which:
- Generator - One network generates data, starts working with random seed data
- Discriminator - Another tries to guess whether or not the data is real, this is trained on real data
The Generator tries to create data that fools the Discriminator
In general it is easier to train the Generator than the Discriminator
We have to train the two networks independently of each other, each with its own loss and weight updates, rather than training them as a single combined network; doing the latter would lead to each just trying to fool the other and not actually give us anything usable
We pass random seeds into the generator and it outputs images. These images are passed to the Discriminator during new rounds of training as the fake images. When training the Discriminator we pass in images from the training set (real) and images from the generator (fake), and the role of the discriminator is to correctly and confidently differentiate between the real and fake images
The ideal training case is where our generator creates images that are so realistic that our discriminator can no longer figure out what's real or fake
Overall, the distribution of the data the generator produces will begin to resemble the trends in the actual data over time
Implementing a Simple GAN with Keras
Import Packages
import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, Dropout, Dense
from tensorflow.keras.layers import Flatten, BatchNormalization
from tensorflow.keras.layers import Activation, ZeroPadding2D
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import UpSampling2D, Conv2D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
import numpy as np
from PIL import Image
from tqdm import tqdm
import os
import time
import matplotlib.pyplot as plt
Init Constants
Some constants that we're using to train the GAN are GENERATE_RES, which is the resolution factor, and DATA_PATH, which is where the files are stored
# Generation resolution - Must be square
# Training data is also scaled to this.
# Note GENERATE_RES 4 or higher
# will blow Google CoLab's memory and have not
# been tested extensively.
GENERATE_RES = 3 # Generation resolution factor
# (1=32, 2=64, 3=96, 4=128, etc.)
GENERATE_SQUARE = 32 * GENERATE_RES # rows/cols (should be square)
IMAGE_CHANNELS = 3
# Preview image
PREVIEW_ROWS = 4
PREVIEW_COLS = 7
PREVIEW_MARGIN = 16
# Size vector to generate images from
SEED_SIZE = 100
# Configuration
DATA_PATH = '/content/drive/My Drive/Colab Notebooks'
EPOCHS = 50
BATCH_SIZE = 32
BUFFER_SIZE = 60000
Download the Files
Download the files from Kaggle and save them to the data_path/face_images directory
Import the Downloaded Files
def hms_string(sec_elapsed):
h = int(sec_elapsed / (60 * 60))
m = int((sec_elapsed % (60 * 60)) / 60)
s = sec_elapsed % 60
return "{}:{:>02}:{:>05.2f}".format(h, m, s)
training_binary_path = os.path.join(DATA_PATH,
f'training_data_{GENERATE_SQUARE}_{GENERATE_SQUARE}.npy')
print(f"Looking for file: {training_binary_path}")
if not os.path.isfile(training_binary_path):
start = time.time()
print("Loading training images...")
training_data = []
faces_path = os.path.join(DATA_PATH, 'face_images')
if not os.path.exists(faces_path):
        os.mkdir(faces_path)
for filename in tqdm(os.listdir(faces_path)):
path = os.path.join(faces_path,filename)
image = Image.open(path).resize((GENERATE_SQUARE,
GENERATE_SQUARE),Image.ANTIALIAS)
training_data.append(np.asarray(image))
training_data = np.reshape(training_data,(-1,GENERATE_SQUARE,
GENERATE_SQUARE,IMAGE_CHANNELS))
training_data = training_data.astype(np.float32)
training_data = training_data / 127.5 - 1.
print("Saving training image binary...")
np.save(training_binary_path,training_data)
elapsed = time.time()-start
print (f'Image preprocess time: {hms_string(elapsed)}')
else:
print("Loading previous training pickle...")
training_data = np.load(training_binary_path)
training_data
array([[[[-0.19999999,  0.23921573, -0.5294118 ],
         [-0.20784312,  0.22352946, -0.5294118 ],
         [-0.21568626,  0.21568632, -0.52156866],
         ...,
         [-0.10588235,  0.23921573, -0.5058824 ],
         [-0.09019607,  0.2313726 , -0.5058824 ],
         [-0.09803921,  0.2313726 , -0.5058824 ]],

        ...,

        [[-0.3490196 , -0.01960784, -0.6       ],
         [-0.35686272, -0.02745098, -0.6       ],
         [-0.36470586, -0.03529412, -0.6       ],
         ...,
         [-0.5764706 , -0.5686275 , -0.67058825],
         [-0.4823529 , -0.4980392 , -0.6156863 ],
         [-0.29411763, -0.3098039 , -0.42745095]]],

       ...,

       [[[-0.23137254,  0.15294123, -0.54509807],
         [-0.19999999,  0.18431377, -0.5137255 ],
         [-0.18431371,  0.20000005, -0.4980392 ],
         ...,
         [-0.12941176,  0.20784318, -0.5529412 ],
         [-0.12941176,  0.22352946, -0.4823529 ],
         [-0.12941176,  0.22352946, -0.4823529 ]],

        ...,

        [[-0.35686272, -0.03529412, -0.5921569 ],
         [-0.36470586, -0.04313725, -0.60784316],
         [-0.38823527, -0.06666666, -0.60784316],
         ...,
         [-0.5294118 , -0.52156866, -0.5686275 ],
         [-0.60784316, -0.6392157 , -0.70980394],
         [-0.4823529 , -0.5058824 , -0.5764706 ]]]], dtype=float32)
train_dataset = tf.data.Dataset.from_tensor_slices(training_data).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
Define a function to Build the Generator and Discriminator
def build_generator(seed_size, channels):
model = Sequential()
model.add(Dense(4*4*256,activation="relu",input_dim=seed_size))
model.add(Reshape((4,4,256)))
model.add(UpSampling2D())
model.add(Conv2D(256,kernel_size=3,padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
model.add(UpSampling2D())
model.add(Conv2D(256,kernel_size=3,padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
# Output resolution, additional upsampling
model.add(UpSampling2D())
model.add(Conv2D(128,kernel_size=3,padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
if GENERATE_RES>1:
model.add(UpSampling2D(size=(GENERATE_RES,GENERATE_RES)))
model.add(Conv2D(128,kernel_size=3,padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(Activation("relu"))
# Final CNN layer
model.add(Conv2D(channels,kernel_size=3,padding="same"))
model.add(Activation("tanh"))
return model
def build_discriminator(image_shape):
model = Sequential()
model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=image_shape,
padding="same"))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.25))
model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
model.add(ZeroPadding2D(padding=((0,1),(0,1))))
model.add(BatchNormalization(momentum=0.8))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.25))
model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.25))
model.add(Conv2D(512, kernel_size=3, strides=1, padding="same"))
model.add(BatchNormalization(momentum=0.8))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
return model
Also define a function for saving the images that are generated
def save_images(cnt,noise):
image_array = np.full((
PREVIEW_MARGIN + (PREVIEW_ROWS * (GENERATE_SQUARE+PREVIEW_MARGIN)),
PREVIEW_MARGIN + (PREVIEW_COLS * (GENERATE_SQUARE+PREVIEW_MARGIN)), 3),
255, dtype=np.uint8)
generated_images = generator.predict(noise)
generated_images = 0.5 * generated_images + 0.5
image_count = 0
for row in range(PREVIEW_ROWS):
for col in range(PREVIEW_COLS):
r = row * (GENERATE_SQUARE+16) + PREVIEW_MARGIN
c = col * (GENERATE_SQUARE+16) + PREVIEW_MARGIN
image_array[r:r+GENERATE_SQUARE,c:c+GENERATE_SQUARE] = generated_images[image_count] * 255
image_count += 1
output_path = os.path.join(DATA_PATH,'output')
if not os.path.exists(output_path):
os.makedirs(output_path)
filename = os.path.join(output_path,f"train-{cnt}.png")
im = Image.fromarray(image_array)
im.save(filename)
Generate a Test Image using the Noise
generator = build_generator(SEED_SIZE, IMAGE_CHANNELS)
noise = tf.random.normal([1, SEED_SIZE])
generated_image = generator(noise, training=False)
plt.imshow(generated_image[0, :, :, 0])
<matplotlib.image.AxesImage at 0x7f5f6934fe10>
<Figure size 432x288 with 1 Axes>
image_shape = (GENERATE_SQUARE,GENERATE_SQUARE,IMAGE_CHANNELS)
discriminator = build_discriminator(image_shape)
decision = discriminator(generated_image)
print (decision)
# This method returns a helper function to compute cross entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real_output, fake_output):
real_loss = cross_entropy(tf.ones_like(real_output), real_output)
fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
total_loss = real_loss + fake_loss
return total_loss
def generator_loss(fake_output):
return cross_entropy(tf.ones_like(fake_output), fake_output)
Define Optimizers for the two networks
generator_optimizer = tf.keras.optimizers.Adam(1.5e-4,0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(1.5e-4,0.5)
Define a Train Step
Based on the GAN in the Keras Documentation
The train step uses GradientTape to train the two networks at the same time but separately. This allows us to apply the weight updates ourselves, handling them manually instead of having TF apply them automatically to the network
# Notice the use of `tf.function`
# This annotation causes the function to be "compiled".
@tf.function
def train_step(images):
seed = tf.random.normal([BATCH_SIZE, SEED_SIZE])
with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
generated_images = generator(seed, training=True)
real_output = discriminator(images, training=True)
fake_output = discriminator(generated_images, training=True)
gen_loss = generator_loss(fake_output)
disc_loss = discriminator_loss(real_output, fake_output)
gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
return gen_loss,disc_loss
Define the Training Process
def train(dataset, epochs):
fixed_seed = np.random.normal(0, 1, (PREVIEW_ROWS * PREVIEW_COLS, SEED_SIZE))
start = time.time()
for epoch in range(epochs):
epoch_start = time.time()
gen_loss_list = []
disc_loss_list = []
for image_batch in dataset:
t = train_step(image_batch)
gen_loss_list.append(t[0])
disc_loss_list.append(t[1])
g_loss = sum(gen_loss_list) / len(gen_loss_list)
d_loss = sum(disc_loss_list) / len(disc_loss_list)
epoch_elapsed = time.time()-epoch_start
        print(f'Epoch {epoch+1}, gen loss={g_loss}, disc loss={d_loss}, '
              f'{hms_string(epoch_elapsed)}')
save_images(epoch,fixed_seed)
elapsed = time.time()-start
print (f'Training time: {hms_string(elapsed)}')
Train the Model
train(train_dataset, EPOCHS)
generator.save(os.path.join(DATA_PATH,"face_generator.h5"))