DL (Deep Learning): Training with Data Augmentation

신강희 2024. 4. 30. 17:53

< Image Augmentation >

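# Data augmentation is a widely used machine learning technique that creates new training data by transforming or otherwise modifying existing data.

# It is used to improve a model's generalization, reduce overfitting, and increase the diversity of the data.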
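
To make the idea concrete, here is a minimal sketch (not the Keras implementation) that treats an image as a small grid of pixel values and derives a second training sample by mirroring it. The helper name `horizontal_flip` and the toy 2x3 grid are assumptions made up for illustration.

```python
# Toy illustration of augmentation: derive a new sample from an existing one.
# The 2x3 "image" and the helper name are hypothetical, for illustration only.

def horizontal_flip(image):
    """Return a left-right mirrored copy of a 2D grid of pixel values."""
    return [list(reversed(row)) for row in image]

original = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(original)

# The label (for example "cat") stays the same, but the pixels differ,
# so the model sees one more distinct training example.
augmented_dataset = [original, flipped]
print(flipped)
```

The original sample is left untouched; augmentation adds variants rather than replacing data.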

<Let's code it through an example>

Using the Cats v Dogs dataset, we build and train a model as follows.

4 convolutional layers with 32, 64, 128 and 128 convolutions

train for 100 epochs (the code in this post sets epochs=20 and adds early stopping)

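# wget is a Linux command for downloading files; here it fetches the image archive
# Google's notebook environment runs on a Linux server, so the command works there

!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \
    -O /tmp/cats_and_dogs_filtered.zip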

--2024-04-18 07:34:00-- https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.207, 2607:f8b0:4023:c0d::cf, 2607:f8b0:4023:c03::cf
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68606236 (65M) [application/zip]
Saving to: ‘/tmp/cats_and_dogs_filtered.zip’
/tmp/cats_and_dogs_ 100%[===================>] 65.43M 33.9MB/s in 1.9s
2024-04-18 07:34:02 (33.9 MB/s) - ‘/tmp/cats_and_dogs_filtered.zip’ saved [68606236/68606236]

# The archive has to be extracted with Python

import zipfile

# extractall('/tmp') extracts every file in the archive into the given path;
# the with-block closes the archive once extraction is done
with zipfile.ZipFile('/tmp/cats_and_dogs_filtered.zip') as file:
    file.extractall('/tmp')

base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = '/tmp/cats_and_dogs_filtered/train'
val_dir = '/tmp/cats_and_dogs_filtered/validation'
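
The extraction step can be tried without downloading anything. The sketch below builds a tiny stand-in archive in a temporary directory and extracts it the same way. The archive name `toy_dataset.zip` and the file names inside it are hypothetical placeholders, and the one-subfolder-per-class layout is an assumption consistent with the "2 classes" reported later in this post.

```python
import os
import tempfile
import zipfile

# Build a tiny stand-in archive. All names here are hypothetical placeholders,
# not the real contents of cats_and_dogs_filtered.zip.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, 'toy_dataset.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    archive.writestr('toy_dataset/train/cats/cat.0.jpg', b'')
    archive.writestr('toy_dataset/train/dogs/dog.0.jpg', b'')
    archive.writestr('toy_dataset/validation/cats/cat.1.jpg', b'')
    archive.writestr('toy_dataset/validation/dogs/dog.1.jpg', b'')

# Extract everything; the context manager closes the file handle afterwards.
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(work_dir)

# Each class gets its own subfolder under train/ and validation/.
toy_train_dir = os.path.join(work_dir, 'toy_dataset', 'train')
class_dirs = sorted(os.listdir(toy_train_dir))
print(class_dirs)
```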

import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Flatten, Dense

def build_model():
    model = Sequential()
    # Four convolution blocks (32, 64, 128, 128 filters), each followed by 2x2 max pooling
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    # A single sigmoid unit for binary classification (cat vs. dog)
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = build_model()
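
As a sanity check on the architecture, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults that the model code relies on (3x3 kernels with 'valid' padding and stride 1, 2x2 pooling that drops any remainder). The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
# Trace how the feature map shrinks through the Conv2D / MaxPooling2D stack
# and count trainable parameters with the standard formulas.

def conv_out(size, kernel=3):
    return size - kernel + 1      # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool           # non-overlapping windows, remainder dropped

size, channels = 150, 3           # input_shape=(150, 150, 3)
total_params = 0
for filters in (32, 64, 128, 128):
    total_params += (3 * 3 * channels + 1) * filters   # weights + one bias per filter
    size = pool_out(conv_out(size))
    channels = filters

flat_units = size * size * channels      # what Flatten() produces
total_params += (flat_units + 1) * 512   # Dense(512)
total_params += (512 + 1) * 1            # Dense(1)

print(size, flat_units, total_params)
```

Under these formulas the last feature map is 7x7 with 128 channels, so Flatten yields 6272 values, and the first Dense layer holds most of the model's parameters. Calling model.summary() on the real model is the way to confirm the numbers.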

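# Prepare the data, which exists as files on disk, in memory

# The ImageDataGenerator class takes care of this.

# This call sets up the conversion of image files into NumPy arrays in memory, with feature scaling applied (rescale=)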

ImageDataGenerator(rescale= 1/255)

<keras.src.preprocessing.image.ImageDataGenerator at 0x792538e75b70>
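
What rescale does to each pixel can be shown in a few lines. This is only a sketch of the arithmetic, assuming 8-bit images whose values run from 0 to 255; the sample pixel values are hypothetical.

```python
# Sketch of what rescale=1/255 does: every pixel value is multiplied by the
# factor, mapping the 0..255 range of an 8-bit image into 0.0..1.0.
rescale = 1 / 255

pixels = [0, 64, 128, 255]            # hypothetical sample pixel values
scaled = [p * rescale for p in pixels]

print(scaled)
```

Keeping inputs in a small, consistent range like this is the feature scaling mentioned in the comment above.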

# To apply image augmentation as the data is generated, add the augmentation parameters

train_datagen = ImageDataGenerator(rescale=1/255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
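
Shifting an image exposes pixels that have no source data, and fill_mode='nearest' fills them by repeating the closest edge pixel. The sketch below mimics that visible effect on a single row of pixels. It is a conceptual illustration, not how the library implements its transforms, and the helper name `shift_row_nearest` is hypothetical.

```python
def shift_row_nearest(row, shift):
    """Shift a row of pixels right by `shift` (left if negative),
    filling the exposed positions with the nearest edge pixel."""
    last = len(row) - 1
    # For each output position, read from the source position `i - shift`,
    # clamped to the valid index range so edges are repeated.
    return [row[min(max(i - shift, 0), last)] for i in range(len(row))]

row = [10, 20, 30, 40, 50]            # hypothetical pixel values
shifted_right = shift_row_nearest(row, 2)
shifted_left = shift_row_nearest(row, -1)
print(shifted_right)
print(shifted_left)
```

With width_shift_range=0.2, the shift is drawn at random from within 20% of the image width in either direction, so for a 150-pixel-wide image it stays within about 30 pixels.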

# target_size must match the input_shape of the model built above. Only the height and width are given; the RGB channel dimension is not part of target_size.

train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(150, 150),
                                                    class_mode='binary')

Found 2000 images belonging to 2 classes.
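
The "2 classes" in this output come from the subfolders of the training directory. As a minimal sketch of how labels are assigned, assuming the subfolders are named `cats` and `dogs`: class names are taken from the subdirectory names in alphanumeric order, and each gets an integer index.

```python
# Assumption: the training directory contains the subfolders 'cats' and 'dogs'.
# Class indices follow the alphanumeric order of the subdirectory names.
subdirectories = ['dogs', 'cats']     # the order found on disk may vary

class_indices = {name: index for index, name in enumerate(sorted(subdirectories))}
print(class_indices)
```

On the real generator, the train_generator.class_indices attribute shows the actual mapping. With class_mode='binary', the sigmoid output of the model is then read as the probability of the class with index 1.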

# X_train, y_train => train_generator holds both: the training images and their labels.

# The validation data is only rescaled, not augmented.
val_datagen = ImageDataGenerator(rescale=1/255)

val_generator = val_datagen.flow_from_directory(val_dir,
                                                target_size=(150, 150),
                                                class_mode='binary')

Found 1000 images belonging to 2 classes.
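
The image counts reported above determine how many batches make up one epoch. The sketch below assumes batch_size is left at the flow_from_directory default of 32, since this post does not pass it explicitly.

```python
import math

# Assumption: batch_size is the flow_from_directory default of 32.
batch_size = 32
train_images, val_images = 2000, 1000   # counts reported in the outputs above

# The last batch may be smaller than batch_size, hence the ceiling.
train_steps = math.ceil(train_images / batch_size)
val_steps = math.ceil(val_images / batch_size)
print(train_steps, val_steps)
```

These values should match len(train_generator) and len(val_generator), and the step count shown in the training progress bar.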

# Create the early-stopping callback that will be passed to fit() so training can stop early

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
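
The meaning of patience=5 is easier to see in plain Python. The following is a simplified sketch of the stopping rule, not the Keras source: training stops once val_loss has failed to improve for `patience` epochs in a row. The function name `epochs_until_stop` and the loss values are made up purely to exercise the logic.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return how many epochs run (1-based) before early stopping triggers,
    given a sequence of per-epoch validation losses."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)   # never triggered: every epoch ran

# Hypothetical validation losses, not results from this post.
losses = [0.70, 0.65, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.50]
stopped_at = epochs_until_stop(losses)
print(stopped_at)
```

In this toy sequence the best loss so far is at epoch 3, and five epochs without improvement end training at epoch 8, before the later improvement is ever seen. That is the trade-off of a small patience value. Note also that, by default, the model keeps the weights of the last epoch it ran; EarlyStopping accepts restore_best_weights=True if the best epoch's weights are wanted instead.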

# train_generator provides both the inputs and the labels, so it is the only training-data argument needed.

epoch_history = model.fit(train_generator,
                          epochs=20,
                          validation_data=val_generator,
                          callbacks=[early_stop])
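
Once training finishes, epoch_history.history is a dictionary that maps each metric name (here loss, accuracy, val_loss, val_accuracy) to a list with one value per epoch. Below is a small sketch of how it might be inspected. The helper `best_epoch` is hypothetical, and the numbers in `toy_history` are invented for illustration; they are not results from this training run.

```python
def best_epoch(history, metric='val_loss'):
    """Return (1-based epoch, value) of the lowest value recorded for `metric`."""
    values = history[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Made-up numbers for illustration only; real values come from
# epoch_history.history after training.
toy_history = {
    'loss': [0.69, 0.66, 0.63],
    'val_loss': [0.68, 0.64, 0.66],
}
result = best_epoch(toy_history)
print(result)
```

With the real object, the call would be best_epoch(epoch_history.history).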

Note: training takes a while to run.

In the next post, we will continue with additional training and then move on to prediction.


Practice data download (Git): https://github.com/sorktjrrb/
