WEEK1 : CNN in TensorFlow (Cats and Dogs)

2021. 1. 10. 16:39

1. Load libraries

import os
import zipfile
import random
import tensorflow as tf
import shutil
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from shutil import copyfile
from os import getcwd

2. Reorganize the file structure so ImageDataGenerator can be used

2-1. Unzip the data

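path_cats_and_dogs = f"{getcwd()}/../tmp2/cats-and-dogs.zip"
# Remove leftovers from a previous run (only the folders this notebook creates,
# instead of wiping the whole /tmp directory)
shutil.rmtree('/tmp/PetImages', ignore_errors=True)
shutil.rmtree('/tmp/cats-v-dogs', ignore_errors=True)

# Extract the archive into /tmp (creates /tmp/PetImages/Cat and /tmp/PetImages/Dog)
with zipfile.ZipFile(path_cats_and_dogs, 'r') as zip_ref:
    zip_ref.extractall('/tmp')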

2-2. Check how many cat and dog images each directory contains

print(len(os.listdir('/tmp/PetImages/Cat/')))
print(len(os.listdir('/tmp/PetImages/Dog/')))

# Expected Output:
# 1500
# 1500

2-3. Create the directories

The directory structure is as follows:

cats-v-dogs
- training
  - dogs
  - cats
- testing
  - dogs
  - cats

# Create cats-v-dogs with training and testing subdirectories,
# each of which holds a 'cats' and a 'dogs' subdirectory.
# os.makedirs creates missing parent folders, and exist_ok=True makes
# re-running the cell safe without skipping any folder.
for split in ("training", "testing"):
    for label in ("cats", "dogs"):
        os.makedirs(os.path.join("/tmp/cats-v-dogs", split, label), exist_ok=True)

2-4. Copy the images

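The cat images in the Cat folder are split between the training/cats and testing/cats directories.
The dog images in the Dog folder are split between the training/dogs and testing/dogs directories.
SPLIT_SIZE is the fraction of images that goes to training (0.9 here); the rest goes to testing.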

def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    files = os.listdir(SOURCE)
    # random.sample returns a NEW shuffled list, so the result must be assigned;
    # calling it without assignment leaves `files` in its original order
    files = random.sample(files, len(files))

    # The first SPLIT_SIZE fraction of the shuffled files goes to TRAINING
    num_of_train = int(len(files) * SPLIT_SIZE)

    for file_name in files[:num_of_train]:
        copyfile(os.path.join(SOURCE, file_name), os.path.join(TRAINING, file_name))
    # The remaining files go to TESTING
    for file_name in files[num_of_train:]:
        copyfile(os.path.join(SOURCE, file_name), os.path.join(TESTING, file_name))

CAT_SOURCE_DIR = "/tmp/PetImages/Cat/"
TRAINING_CATS_DIR = "/tmp/cats-v-dogs/training/cats/"
TESTING_CATS_DIR = "/tmp/cats-v-dogs/testing/cats/"
DOG_SOURCE_DIR = "/tmp/PetImages/Dog/"
TRAINING_DOGS_DIR = "/tmp/cats-v-dogs/training/dogs/"
TESTING_DOGS_DIR = "/tmp/cats-v-dogs/testing/dogs/"

split_size = .9
split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)
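To see the shuffle-and-split idea in isolation, here is a minimal pure-Python sketch that works on a list of file names instead of copying files. `split_names` and the generated names are hypothetical stand-ins for `split_data` and `os.listdir(CAT_SOURCE_DIR)`; the optional `seed` is an added assumption that makes the split reproducible.

```python
import random

def split_names(names, split_size, seed=None):
    """Shuffle file names and split them into (train, test) lists.

    Hypothetical helper mirroring the split_data logic, without touching files.
    """
    rng = random.Random(seed)                  # separate RNG so a seed gives a repeatable split
    shuffled = rng.sample(names, len(names))   # new shuffled list; `names` is left unchanged
    num_train = int(len(shuffled) * split_size)
    return shuffled[:num_train], shuffled[num_train:]

# Hypothetical file names standing in for os.listdir(CAT_SOURCE_DIR)
names = [f"{i}.jpg" for i in range(1500)]
train, test = split_names(names, 0.9, seed=42)
print(len(train), len(test))  # 1350 150
```

Because the two slices never overlap, no image ends up in both the training and the testing folder.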

2-5. Check the number of cat and dog images for training / testing

print(len(os.listdir('/tmp/cats-v-dogs/training/cats/')))
print(len(os.listdir('/tmp/cats-v-dogs/training/dogs/')))
print(len(os.listdir('/tmp/cats-v-dogs/testing/cats/')))
print(len(os.listdir('/tmp/cats-v-dogs/testing/dogs/')))

# Expected output:
# 1350
# 1350
# 150
# 150

3. Define the model

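Convolutional layers are used here.
One thing to watch is that input_shape must be specified on the first layer; (300, 300, 3) matches the target_size used by the data generators plus the 3 RGB channels.
Since classifying dogs vs. cats is a binary classification problem, the last layer has a single node with a sigmoid activation function.
For the same reason, binary_crossentropy is used as the loss.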
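As a rough illustration of why sigmoid and binary_crossentropy pair up, the sketch below computes both by hand with the standard library. The helper names are hypothetical, and the clipping `eps` is an assumption added only to avoid `log(0)`; it is not meant to reproduce Keras internals exactly.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(sigmoid(0.0))                      # 0.5: a raw score of 0 means "undecided"
print(binary_crossentropy([1], [0.5]))   # about 0.693 (log 2)
print(binary_crossentropy([1], [0.01]))  # about 4.605: confident and wrong is punished hard
```

The single sigmoid output is read as the probability of label 1, and the loss only looks at that one number per image, which is why one output node is enough.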

# DEFINE A KERAS MODEL TO CLASSIFY CATS V DOGS
# USE AT LEAST 3 CONVOLUTION LAYERS
model = tf.keras.models.Sequential([
    # input_shape must match target_size=(300, 300) plus 3 color channels
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    # Single sigmoid unit: outputs the probability of class 1
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['acc'])
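A quick way to sanity-check the architecture is to trace the feature-map size by hand. This sketch assumes the Conv2D and MaxPool2D defaults (stride 1 and 'valid' padding for the convolutions, strides equal to the 2x2 pool size for the pooling); under those assumptions the numbers it prints should agree with `model.summary()`.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the output shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 and 'valid' padding halves the size (rounded down)
    return size // pool

size, channels = 300, 3
total_params = 0
for filters in (128, 64, 32, 16):
    conv_params = (3 * 3 * channels + 1) * filters  # 3x3 kernel weights per input channel + one bias per filter
    total_params += conv_params
    size = pool_out(conv_out(size))
    channels = filters
    print(f"Conv2D({filters}) + MaxPool2D -> {size}x{size}x{channels}, params={conv_params}")

flat = size * size * channels
dense_params = flat + 1  # Dense(1): one weight per flattened input plus a bias
total_params += dense_params
print("Flatten ->", flat, "| Dense params:", dense_params, "| total:", total_params)
```

The feature map shrinks 300 → 149 → 73 → 35 → 16, so Flatten hands 16 * 16 * 16 = 4096 values to the final Dense layer.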

4. Data generator

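Create the ImageDataGenerators (rescale=1./255 normalizes pixel values from the 0 to 255 range to the 0 to 1 range, which helps training).
The target_size passed to flow_from_directory for the training and validation generators must be identical, and it must match the model's input_shape (300, 300).
class_mode is set to 'binary' because this is a binary classification problem (there are exactly two subfolders, cats and dogs).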

TRAINING_DIR = "/tmp/cats-v-dogs/training"
train_datagen = ImageDataGenerator(rescale=1./255)

# NOTE: YOU MUST USE A BATCH SIZE OF 10 (batch_size=10) FOR THE 
# TRAIN GENERATOR.
train_generator = train_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(300, 300),
    batch_size=10,
    class_mode='binary'
)

VALIDATION_DIR = "/tmp/cats-v-dogs/testing"
validation_datagen = ImageDataGenerator(rescale=1./255)

# NOTE: YOU MUST USE A BATCH SIZE OF 10 (batch_size=10) FOR THE 
# VALIDATION GENERATOR.
validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(300, 300),
    batch_size=10,
    class_mode='binary'
)


# Expected Output:
# Found 2700 images belonging to 2 classes.
# Found 300 images belonging to 2 classes.
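Two behaviors of the generators are worth making concrete. `flow_from_directory` assigns label indices to the class subfolders in alphanumeric order, so `cats` becomes 0 and `dogs` becomes 1, and `rescale` simply multiplies every pixel value by the given factor. The helpers below are hypothetical pure-Python stand-ins; the real mapping can be checked with `train_generator.class_indices`.

```python
def infer_class_indices(subdirs):
    # Mimics flow_from_directory: class folders are sorted, then numbered from 0
    return {name: index for index, name in enumerate(sorted(subdirs))}

def rescale(pixels, factor=1.0 / 255):
    # Mimics ImageDataGenerator(rescale=...): every value is multiplied by the factor
    return [p * factor for p in pixels]

print(infer_class_indices(["dogs", "cats"]))  # {'cats': 0, 'dogs': 1}
print(rescale([0, 128, 255]))
```

With this mapping, the sigmoid output of the model can be read as "probability that the image is a dog".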

5. Train the model

To train the model, pass the training data and the number of epochs.
The validation data is also passed so that validation accuracy and loss are reported after every epoch.

# model.fit accepts generators directly; fit_generator is deprecated in TF 2.x
history = model.fit(train_generator,
                    epochs=2,
                    verbose=1,
                    validation_data=validation_generator)
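With batch_size=10, the number of steps Keras runs per epoch is the number of images divided by the batch size, rounded up. A small sketch of that arithmetic (the helper name is hypothetical), using the image counts reported by the generators:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # The last batch may be smaller than batch_size, so round up
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(2700, 10))  # 270 training steps per epoch
print(steps_per_epoch(300, 10))   # 30 validation steps
```

This is why the progress bar should count up to 270 steps per epoch for this dataset.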

6. Analyze the model

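Use the accuracy and loss on the training data and on the validation data to improve the model.
Comparing these curves shows whether the model is overfitting (training accuracy keeps rising while validation accuracy stalls or validation loss grows) or underfitting (both accuracies stay low).
Finding which class the model struggles to predict requires per-class results on the validation set, since these overall curves average over both classes.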

# PLOT LOSS AND ACCURACY
# Jupyter magic: render plots inside the notebook
%matplotlib inline

import matplotlib.pyplot as plt

#-----------------------------------------------------------
# Retrieve the per-epoch results on the training and
# validation sets
#-----------------------------------------------------------
acc=history.history['acc']
val_acc=history.history['val_acc']
loss=history.history['loss']
val_loss=history.history['val_loss']

epochs=range(len(acc)) # Get number of epochs

#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot(epochs, acc, 'r', label="Training Accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation Accuracy")
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, 'r', label="Training Loss")
plt.plot(epochs, val_loss, 'b', label="Validation Loss")
plt.title('Training and validation loss')
plt.legend()
plt.show()

# Desired output. Charts with training and validation metrics. No crash :)
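The comparison described in this section can also be made numeric. The sketch below assumes you already have per-epoch `acc`/`val_acc` lists (as taken from `history.history`) and, for the per-class check, true labels and sigmoid outputs for the validation images. One way to get aligned arrays is a separate validation generator created with `shuffle=False`, then `model.predict(...)` for the probabilities and the generator's `classes` attribute for the labels. The function names and sample numbers are hypothetical.

```python
def accuracy_gap(acc, val_acc):
    """Per-epoch difference between training and validation accuracy."""
    # A positive, growing gap suggests the model is memorizing the training set
    return [round(a - v, 4) for a, v in zip(acc, val_acc)]

def per_class_accuracy(y_true, y_prob, threshold=0.5):
    """Accuracy for each label (0 = cats, 1 = dogs) from sigmoid outputs."""
    correct = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # threshold the sigmoid probability
        count[y] += 1
        correct[y] += int(pred == y)
    # Skip labels that do not appear, to avoid dividing by zero
    return {label: correct[label] / count[label] for label in count if count[label]}

# Hypothetical numbers, only to show the call shape
print(accuracy_gap([0.60, 0.80], [0.58, 0.65]))
print(per_class_accuracy([0, 0, 0, 1, 1, 1], [0.1, 0.6, 0.2, 0.9, 0.8, 0.7]))
```

A gap that keeps growing from epoch to epoch is the usual sign of overfitting, and a clearly lower accuracy for one label points at the class the model struggles with.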

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
