
[FSDL] Pre-Lab 02b: Training a CNN on Synthetic Handwriting Data

김 기승 2023. 6. 26. 14:19

Why convolutions?

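The most basic neural network, the multilayer perceptron (MLP), is built by alternating parameterized linear transformations with nonlinear ones.
This combination can express functions of arbitrary complexity, provided those functions take in fixed-size arrays and return fixed-size arrays.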

def any_function_you_can_imagine(x: torch.Tensor["A"]) -> torch.Tensor["B"]:
    return some_mlp_that_might_be_impractically_huge(x)

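But not every function has that type signature.
For example, an MLP cannot directly solve the problem of identifying the content of images that come in different sizes.
A bigger problem is that MLPs are too general to be efficient.
Each layer applies an unstructured matrix to its input.
Yet most of the data we want to apply them to is highly structured, and exploiting that structure makes models more efficient.
An unstructured model can look appealing because, in principle, it can learn any function.
But most functions depart wildly from common sense.
It is useful to encode some assumptions about the kinds of functions we want to learn from data into the model's architecture.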
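
The fixed-size limitation can be seen directly in a minimal sketch. The layer sizes (28x28 and 32x32 inputs, 10 outputs) are illustrative assumptions, and `torch.nn.Conv2d` is the convolutional layer discussed later in this post.

```python
import torch

# An MLP whose first Linear layer is hard-wired to 28 * 28 = 784 inputs (illustrative size).
mlp = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
# A convolution only fixes the number of channels, not the height or width.
conv = torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, padding=1)

small = torch.randn(1, 1, 28, 28)
large = torch.randn(1, 1, 32, 32)

with torch.no_grad():
    mlp_small_shape = tuple(mlp(small).shape)
    try:
        mlp(large)  # 32 * 32 = 1024 features do not fit a 784-input Linear layer
        mlp_handles_large = True
    except RuntimeError:
        mlp_handles_large = False
    conv_shapes = [tuple(conv(image).shape) for image in (small, large)]

print(mlp_small_shape, mlp_handles_large, conv_shapes)
```

The same convolution weights apply to both image sizes, while the MLP rejects any input whose flattened size differs from the one it was built for.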

Convolutions are the local, translation-equivariant linear transforms

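One of the most common kinds of structure in data is "locality": the information most relevant to understanding or predicting a pixel lies in the pixels around it.
Locality is a basic feature of the physical world, so it shows up in data drawn from physical observations, such as photographs and audio recordings.
Locality means that the most meaningful linear transformations of the input do not put equally large weights on every entry; they put large weights only on a small number of entries that are close to one another.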

import random

import torch

generic_linear_transform = torch.randn(8, 1)
print("generic:", generic_linear_transform, sep="\n")

local_linear_transform = torch.tensor([
    [0, 0, 0] + [random.random(), random.random(), random.random()] + [0, 0]]).T
print("local:", local_linear_transform, sep="\n")

generic:
tensor([[ 1.6399],
        [-0.1946],
        [ 0.1188],
        [-1.3777],
        [-0.0793],
        [ 1.1056],
        [ 0.0333],
        [-0.3400]])
local:
tensor([[0.0000],
        [0.0000],
        [0.0000],
        [0.1917],
        [0.6447],
        [0.7482],
        [0.0000],
        [0.0000]])

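Another commonly observed kind of structure is "translation equivariance": the top-left pixel position is not, in itself, meaningfully different from the bottom-right position or from a position in the middle of the image.
Relative relationships matter more than absolute ones.
Translation equivariance arises in images because there is generally no privileged vantage point from which to take the image.
We could just as well have taken the image while standing a few feet to the left or right, and all of its contents would shift along with the change in perspective.
Translation equivariance means that a linear transformation that is meaningful at one position in the input is likely to be meaningful at every other position too.
We can learn about a linear transformation from a datapoint where it is useful in the bottom-left and then apply it to another datapoint where it is useful in the top-right.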

generic_linear_transform = torch.arange(8)[:, None]
print("generic:", generic_linear_transform, sep="\n")

equivariant_linear_transform = torch.stack([torch.roll(generic_linear_transform[:, 0], ii) for ii in range(8)], dim=1)
print("translation invariant:", equivariant_linear_transform, sep="\n")

generic:
tensor([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7]])
translation invariant:
tensor([[0, 7, 6, 5, 4, 3, 2, 1],
        [1, 0, 7, 6, 5, 4, 3, 2],
        [2, 1, 0, 7, 6, 5, 4, 3],
        [3, 2, 1, 0, 7, 6, 5, 4],
        [4, 3, 2, 1, 0, 7, 6, 5],
        [5, 4, 3, 2, 1, 0, 7, 6],
        [6, 5, 4, 3, 2, 1, 0, 7],
        [7, 6, 5, 4, 3, 2, 1, 0]])

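A linear transformation that is translation equivariant is called a convolution.
For more on translation equivariance and translation invariance, refer to the documentation.
If the weights of that linear transformation are mostly zero except for a few that are close to one another, the convolution is said to have a "kernel".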
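
The equivariance property can be checked numerically. The sketch below rebuilds a matrix of rolled columns like the one above and compares "shift, then transform" against "transform, then shift"; the shift amount and the random comparison matrix are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Matrix whose columns are successively rolled copies of one vector.
column = torch.arange(8, dtype=torch.float32)
rolled_matrix = torch.stack([torch.roll(column, ii) for ii in range(8)], dim=1)
# A generic, unstructured matrix for comparison.
generic_matrix = torch.randn(8, 8)

x = torch.randn(8)
shift = 3  # circular shift of the input by three positions


def commutes_with_shift(matrix):
    """True if shifting the input just shifts the output by the same amount."""
    shift_then_transform = matrix @ torch.roll(x, shift)
    transform_then_shift = torch.roll(matrix @ x, shift)
    return torch.allclose(shift_then_transform, transform_then_shift, atol=1e-4)


rolled_is_equivariant = commutes_with_shift(rolled_matrix)
generic_is_equivariant = commutes_with_shift(generic_matrix)
print(rolled_is_equivariant, generic_is_equivariant)
```

The check uses circular shifts, which matches the wrap-around structure of the rolled matrix.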

# the equivalent of torch.nn.Linear, but for a 1-dimensional convolution
conv_layer = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3)

conv_layer.weight  # aka kernel

# Output :
# Parameter containing:
# tensor([[[-0.4220, -0.5430,  0.1351]]], requires_grad=True)

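Instead of applying the kernel to the input with an ordinary matrix multiplication, we apply it over and over, "sliding" it across the input to produce the output.
Every convolution kernel has an equivalent matrix form, which can be multiplied with the input to produce the output.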

conv_kernel_as_vector = torch.hstack([conv_layer.weight[0][0], torch.zeros(5)])
conv_layer_as_matrix = torch.stack([torch.roll(conv_kernel_as_vector, ii) for ii in range(8)], dim=0)
print("convolution matrix:", conv_layer_as_matrix, sep="\n")

convolution matrix:
tensor([[-0.4220, -0.5430,  0.1351,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000, -0.4220, -0.5430,  0.1351,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000, -0.4220, -0.5430,  0.1351,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -0.4220, -0.5430,  0.1351,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000, -0.4220, -0.5430,  0.1351,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.4220, -0.5430,  0.1351],
        [ 0.1351,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.4220, -0.5430],
        [-0.5430,  0.1351,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.4220]],
       grad_fn=<StackBackward0>)
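
As a hedged check of this equivalence, the sketch below compares the sliding `Conv1d` computation with multiplication by the rolled matrix. It assumes `bias=False` to isolate the kernel, and it compares only the first six rows: the last two rows of the matrix wrap around the end of the input, which `Conv1d` without padding never does.

```python
import torch

torch.manual_seed(0)

# bias=False is an assumption made here so that only the kernel contributes.
conv_layer = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
kernel = conv_layer.weight[0, 0].detach()

# Same construction as above: pad the kernel with zeros, then roll it row by row.
kernel_as_vector = torch.hstack([kernel, torch.zeros(5)])
conv_matrix = torch.stack([torch.roll(kernel_as_vector, ii) for ii in range(8)], dim=0)

x = torch.randn(8)
sliding_output = conv_layer(x[None, None, :])[0, 0].detach()  # 8 - 3 + 1 = 6 outputs
matrix_output = conv_matrix @ x  # 8 outputs; the last two rows wrap around

valid_rows_match = torch.allclose(sliding_output, matrix_output[:6], atol=1e-6)
print(sliding_output.shape, valid_rows_match)
```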

We apply two-dimensional convolutions to images.

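To build a text recognizer, we work with images.
Images have translation equivariance along two dimensions: left/right and up/down.
So we use two-dimensional convolutions, instantiated in torch.nn as nn.Conv2d layers.
Convolutional neural networks for images are so widely used that, in the context of neural networks, the term "convolution" without a qualifier can be taken to mean a two-dimensional convolution.
Where Linear layers take in batches of fixed-size vectors and return batches of fixed-size vectors, Conv2d layers take in batches of two-dimensional stacked feature maps and return batches of two-dimensional stacked feature maps.
A pseudocode type based on [torchtyping](https://github.com/patrick-kidger/torchtyping) looks like this.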

StackedFeatureMapIn = torch.Tensor["batch", "in_channels", "in_height", "in_width"]
StackedFeatureMapOut = torch.Tensor["batch", "out_channels", "out_height", "out_width"]
def same_convolution_2d(x: StackedFeatureMapIn) -> StackedFeatureMapOut:
    ...  # pseudocode: only the type signature matters here

The word "map" is meant to evoke space: a feature map tells us where features are located spatially.
An RGB image is a stack of three feature maps.
The first tells us where the "red" feature is, the second "green", and the third "blue".

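When we apply a convolutional layer to a stacked feature map with some number of channels, we get back a stacked feature map with some number of channels, which need not match the input's.
That output is also a stack of feature maps, so it is a perfectly acceptable input to another convolutional layer.
In other words, convolutional layers can be composed just as ordinary linear layers can.
Once again we weave nonlinear functions between the linear convolutions, producing a convolutional neural network, or CNN.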
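
A minimal sketch of this composition follows. The channel counts (3, 16, 8) and the batch size are illustrative assumptions; the point is that each layer's output channels become the next layer's input channels.

```python
import torch

rgb_batch = torch.randn(4, 3, 28, 28)  # (batch, channels, height, width)

# Illustrative channel counts: 3 -> 16 -> 8.
first_conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
second_conv = torch.nn.Conv2d(in_channels=16, out_channels=8, kernel_size=3, padding=1)

with torch.no_grad():
    hidden = torch.relu(first_conv(rgb_batch))  # a stack of 16 feature maps per image
    output = torch.relu(second_conv(hidden))    # a stack of 8 feature maps per image

print(hidden.shape, output.shape)
```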

Convolutional neural networks build up visual understanding layer by layer.

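Q1. What is the equivalent, for the channels of these feature maps, of the red/green/blue labels?
Q2. What does a high activation at some position in the 32nd channel of a network's 15th layer mean?
There is no guaranteed way to determine the answer automatically, and no guarantee that the result is human-interpretable.
OpenAI's Clarity team spent several years "reverse engineering" state-of-the-art convolutional neural networks trained on photographs and found that many of these channels can be interpreted directly.
For example, they passed images through GoogLeNet, also known as InceptionV1 and the winner of the 2014 ImageNet Large Scale Visual Recognition Challenge, and found that the resulting channel activations could be read directly.
Channels in early layers (left) act as maps of simple concepts such as "high-frequency power" or "45-degree black-white edge", while channels in later layers (right) act as feature maps for increasingly abstract concepts such as "circle", and eventually for increasingly complex ones such as "floppy round ear" or "pointy ear".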

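In the Circuits Thread series of blog posts, the OpenAI Clarity team combined close inspection of the weights with direct experiments to build an understanding of how these higher-level features are constructed in GoogLeNet.
For example, they are able to give reasonable interpretations for almost every channel in the first five layers.
Their "weight explorer" starts, by default, on the 52nd channel of the conv2d1 layer, which constructs a large, phase-invariant Gabor filter from smaller, phase-sensitive filters.
That channel is in turn used to build curve and texture detectors; clicking an image navigates to the weight explorer page for that channel, and the layer and idx arguments can be changed as well.
For details, see the Early Vision in InceptionV1 blog post.
For a richer interactive view that includes activations on sample images, click the "View this neuron in the OpenAI Microscope" link.
The Circuits Thread that accompanies this explorer contains empirical observations, theoretical speculation, and wisdom that are very useful for developing intuition about convolutional networks in particular and visual perception in general.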

Applying convolutions to handwritten characters: `CNN`s on `EMNIST`

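When we load the CNN class from text_recognizer.models, instantiating the model requires a data_config.
This is not a general constraint; it is an implementation detail of the text recognizer library.
But datasets and models are generally coupled, so it is common for them to share configuration information.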

The `EMNIST` Handwritten Character Dataset

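We could use MNIST here, as in the previous lab.
But our eventual goal is to build a handwritten text recognition system, so we need to handle letters and punctuation, not just digits.
So we instead use EMNIST, or Extended MNIST, which includes letters and punctuation.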

import text_recognizer.data

emnist = text_recognizer.data.EMNIST()  # configure
print(emnist.__doc__)

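EMNIST dataset of handwritten characters and digits.
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19
and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.
EMNIST ByClass: 814,255 characters, 62 unbalanced classes.

To get this dataset ready to use, we built a PyTorch Lightning DataModule that encapsulates all the code needed to download it to disk, reformat it for faster loading, and split it into training, validation, and test sets.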

emnist.prepare_data()  # download, save to disk
emnist.setup()  # create torch.utils.data.Datasets, do train/val split

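The dataset is stored on disk inside the repository folder, but it is not tracked by version control.
git works well for versioning source code and other text files, but it is a poor fit for large binary data.
So only the metadata is tracked and versioned.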

!echo {emnist.data_dirname()}
!ls {emnist.data_dirname()}
!ls {emnist.data_dirname() / "raw" / "emnist"}

The class includes a print method for a quick look at this metadata and some basic descriptive statistics.

emnist

EMNIST Dataset
Num classes: 83
Mapping: ['<B>', '<S>', '<E>', '<P>', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '!', '"', '#', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '?']
Dims: (1, 28, 28)
Train/val/test sizes: 260212, 65054, 53988
Batch x stats: (torch.Size([128, 1, 28, 28]), torch.float32, tensor(0.), tensor(0.1730), tensor(0.3309), tensor(1.))
Batch y stats: (torch.Size([128]), torch.int64, tensor(4), tensor(61))

Because we have run .prepare_data and .setup, this DataModule is ready to provide a DataLoader when we call the right method.
As described in Lab 2a, the PyTorch Lightning API offers this kind of convenience even when we are not using the Trainer class itself.

xs, ys = next(iter(emnist.train_dataloader()))

Putting convolutions in a `torch.nn.Module`

With the data and its data_config in hand, we can instantiate the model.

import text_recognizer.models

data_config = emnist.config()

cnn = text_recognizer.models.CNN(data_config)
cnn  # reveals the nn.Modules attached to our nn.Module

CNN(
  (conv1): ConvBlock(
    (conv): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
  )
  (conv2): ConvBlock(
    (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu): ReLU()
  )
  (dropout): Dropout(p=0.25, inplace=False)
  (max_pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=12544, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=83, bias=True)
)

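We can run this network on inputs, but without training we cannot expect it to produce correct outputs.
We can inspect the .forward method to see how the nn.Modules are used.
We apply convolutions followed by nonlinearities, with "pooling" layers that apply "downsampling", similar to the 1989 [LeNet](https://doi.org/10.1162%2Fneco.1989.1.4.541) architecture or the 2012 [AlexNet](https://doi.org/10.1145%2F3065386) architecture.
The final classification is performed by an MLP.
To get a vector to pass to the MLP, we first apply torch.flatten.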

torch.flatten(torch.Tensor([[1, 2], [3, 4]]))
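
To make the data flow concrete, here is a hypothetical stand-in for the model, rebuilt only from the layer sizes in the printed summary above. The class name `SketchCNN`, the order of pooling and dropout, and the ReLU after `fc1` are assumptions, not the library's confirmed implementation.

```python
import torch
import torch.nn.functional as F


class SketchCNN(torch.nn.Module):
    """Hypothetical stand-in that reuses the layer sizes from the printed summary."""

    def __init__(self, num_classes: int = 83):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.dropout = torch.nn.Dropout(0.25)
        self.max_pool = torch.nn.MaxPool2d(2)
        self.fc1 = torch.nn.Linear(64 * 14 * 14, 128)
        self.fc2 = torch.nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv1(x))  # (B, 64, 28, 28)
        x = F.relu(self.conv2(x))  # (B, 64, 28, 28)
        x = self.max_pool(x)       # (B, 64, 14, 14): pooling halves height and width
        x = self.dropout(x)        # assumed position; a no-op in eval mode
        x = torch.flatten(x, 1)    # (B, 12544): keep the batch axis, flatten the rest
        x = F.relu(self.fc1(x))    # (B, 128)
        return self.fc2(x)         # (B, 83) logits


sketch_cnn = SketchCNN().eval()
with torch.no_grad():
    logits = sketch_cnn(torch.randn(2, 1, 28, 28))
param_counts = [p.numel() for p in sketch_cnn.parameters()]
print(logits.shape, param_counts)
```

Note the `start_dim=1` argument to `torch.flatten`: without it, the batch axis would be flattened along with everything else.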

Design considerations for CNNs

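The decade since AlexNet was released has seen intense engineering and innovation in CNNs: dilated convolutions, residual connections, and batch normalization all appeared in 2015 alone, and research continues today.
"The devil is in the details."
Progress in DNNs in general, and CNNs in particular, has been mostly evolutionary.
As a result, it can be very hard to design a new architecture that is as effective as the existing ones.
It is better to modify and adapt an existing architecture than to design one from scratch.
If you do not follow the field closely, then when first looking for an architecture to base your work on, it is best to consult a trusted aggregator, such as Torch IMage Models (timm) on GitHub, or Papers With Code, in particular its computer vision section.
You can also take a more bottom-up approach by checking the leaderboards of recent Kaggle competitions on computer vision.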

Shapes and padding

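The CNN's .forward includes comments after each line that changes the shape, indicating the expected tensor shape.
Tracking shapes and handling them correctly matters for CNNs, especially for architectures like LeNet/AlexNet that include an MLP, which can only operate on tensors of a fixed shape.
If a variety of convolutions is supported, the shape arithmetic gets complicated very quickly.
The easiest way to avoid shape problems is to choose convolution parameters such as padding and stride so that the shape is the same before and after the convolution.
Setting padding=1 with kernel_size=3 and stride=1 keeps the shape the same.
With unit stride and an odd kernel size, the padding that keeps the input the same size is kernel_size // 2.
When shapes change, so does the amount of GPU memory the tensors occupy.
Keeping sizes fixed within a block removes one axis of variation.
If, after applying a pooling layer, we increase the number of kernels by the right factor, the total tensor size, and with it the memory footprint, stays constant.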
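
The shape arithmetic above can be sketched as follows. For a dilation of 1, the output size along one axis is `floor((in + 2 * padding - kernel_size) / stride) + 1`; the helper name `conv_output_size` and the kernel sizes tried are illustrative assumptions.

```python
import torch


def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis, assuming a dilation of 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# With unit stride and an odd kernel size, padding = kernel_size // 2 preserves the shape.
x = torch.randn(1, 1, 28, 28)
same_shapes = {}
for kernel_size in (1, 3, 5, 7):
    conv = torch.nn.Conv2d(1, 1, kernel_size=kernel_size, stride=1, padding=kernel_size // 2)
    with torch.no_grad():
        same_shapes[kernel_size] = tuple(conv(x).shape[-2:])

# 2x2 pooling divides the number of elements by 4; using 4x as many kernels restores it.
elements_before_pool = 64 * 28 * 28
elements_after_pool_with_more_kernels = (64 * 4) * 14 * 14

print(same_shapes, conv_output_size(28, 3), conv_output_size(28, 3, padding=1))
```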

Parameters, computation, and bottlenecks

Looking at the number of elements in each layer's parameters, we see that one layer has far more entries than all the others.

[p.numel() for p in cnn.parameters()]  # conv weight + bias, conv weight + bias, fc weight + bias, fc weight + bias

# Output : [576, 64, 36864, 64, 1605632, 128, 10624, 83]

The largest layer is typically the one that sits between the convolutional component and the MLP component.

biggest_layer = [p for p in cnn.parameters() if p.numel() == max(p.numel() for p in cnn.parameters())][0]
biggest_layer.shape, cnn.fc_input_dim

# Output : (torch.Size([128, 12544]), 12544)

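This layer dominates the cost of storing the network on disk.
That makes it a common target for regularization techniques such as dropout and for performance optimizations such as pruning.
However, having the most parameters does not mean that most of the computation time is spent in this layer.
Convolutions reuse the same parameters over and over, so the total number of FLOPs performed by a convolutional layer can be much higher than that of a layer with more parameters.
- FLOP: floating-point operation; here, the raw count of arithmetic operations such as multiplications and additions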

# for the Linear layers, number of multiplications per input == nparams
cnn.fc1.weight.numel()

# Output : 1605632

# for the Conv2D layers, it's more complicated

def approx_conv_multiplications(kernel_shape, input_size=(64, 28, 28)):  # this is a rough and dirty approximation
    # kernel_shape is (num_kernels, in_channels, kernel_height, kernel_width)
    num_kernels = kernel_shape[0]
    # multiplications for one application of one kernel; this already covers every input channel
    num_kernel_elements = 1
    for dimension in kernel_shape[-3:]:
        num_kernel_elements *= dimension
    # number of positions each kernel visits, ignoring padding
    num_spatial_applications = ((input_size[1] - kernel_shape[-2] + 1) * (input_size[2] - kernel_shape[-1] + 1))
    multiplications_per_kernel = num_spatial_applications * num_kernel_elements
    return multiplications_per_kernel * num_kernels

approx_conv_multiplications(cnn.conv2.conv.weight.shape)

# Output : 24920064

# ratio of multiplications in the convolution to multiplications in the fully-connected layer is large
approx_conv_multiplications(cnn.conv2.conv.weight.shape) // cnn.fc1.weight.numel()

# Output : 15

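Depending on the compute hardware and the nature of the problem, either the MLP component or the convolutional component can become the critical bottleneck.
When memory is the constraint, for example when sending the model "over the wire" to a browser, the MLP component is likely to be the bottleneck.
When compute is the constraint, for example when running the model on a low-power edge device or in an application with strict low-latency requirements, the convolutional component is likely to be the bottleneck.
- Edge device: a device at the point where data originates, ranging from Internet of Things (IoT) sensors that generate or collect data to video and surveillance cameras, internet-connected appliances, and smart devices such as smartphones.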

Training a `CNN` on `EMNIST` with the Lightning `Trainer` and `run_experiment`

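We will use PyTorch Lightning, covered in detail in "Lab 02a".
The repository provides a simple script, training/run_experiment.py, that implements a command-line interface for training the models and datasets in this repository with PyTorch Lightning.
The pl.Trainer arguments come first and there are many of them, so to see what is configurable for the Model or LitModel we need the last few dozen lines of the help message.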

!python training/run_experiment.py --help --model_class CNN --data_class EMNIST  | tail -n 25

2023-06-23 04:02:50.347465: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
                        .. deprecated:: v1.5 Trainer argument
                        ``terminate_on_nan`` was deprecated in v1.5 and will
                        be removed in 1.7. Please use ``detect_anomaly``
                        instead.

Data Args:
  --batch_size BATCH_SIZE
                        Number of examples to operate on per forward step.
                        Default is 128.
  --num_workers NUM_WORKERS
                        Number of additional processes to load data. Default
                        is 4.

Model Args:
  --conv_dim CONV_DIM
  --fc_dim FC_DIM
  --fc_dropout FC_DROPOUT

LitModel Args:
  --optimizer OPTIMIZER
                        optimizer class from torch.optim
  --lr LR
  --one_cycle_max_lr ONE_CYCLE_MAX_LR
  --one_cycle_total_steps ONE_CYCLE_TOTAL_STEPS
  --loss LOSS           loss function from torch.nn.functional

The run_experiment.py file can also be imported as a module, so we can inspect its contents and try out its setup functions in the notebook. (The script itself is not explained here, to keep the focus on the overall logic.)

import training.run_experiment

print(training.run_experiment.main.__doc__)

Run an experiment.

    Sample command:
    ```
    python training/run_experiment.py --max_epochs=3 --gpus='0,' --num_workers=20 --model_class=MLP --data_class=MNIST
    ```

    For basic help documentation, run the command
    ```
    python training/run_experiment.py --help
    ```

    The available command line args differ depending on some of the arguments, including --model_class and --data_class.

    To see which command line args are available and read their documentation, provide values for those arguments
    before invoking --help, like so:
    ```
    python training/run_experiment.py --model_class=MLP --data_class=MNIST --help

Now let's look at a training run for the CNN on EMNIST with the default arguments.

gpus = int(torch.cuda.is_available())  # use GPUs if they're available

%run training/run_experiment.py --model_class CNN --data_class EMNIST --gpus {gpus}

WARNING:pytorch_lightning.loggers.tensorboard:Missing logger folder: training/logs/lightning_logs
INFO:pytorch_lightning.utilities.rank_zero:Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True, used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
| Name           | Type      | Params
---------------------------------------------
0 | model          | CNN       | 1.7 M 
1 | model.conv1    | ConvBlock | 640   
2 | model.conv2    | ConvBlock | 36.9 K
3 | model.dropout  | Dropout   | 0     
4 | model.max_pool | MaxPool2d | 0     
5 | model.fc1      | Linear    | 1.6 M 
6 | model.fc2      | Linear    | 10.7 K
7 | train_acc      | Accuracy  | 0     
8 | val_acc        | Accuracy  | 0     
9 | test_acc       | Accuracy  | 0     
---------------------------------------------
1.7 M     Trainable params
0         Non-trainable params
1.7 M     Total params
6.616     Total estimated model params size (MB)
Epoch 0: 100%
2542/2542 [00:31<00:00, 81.16it/s, loss=0.609, v_num=0, validation/loss=0.574, validation/acc=0.787]
INFO:pytorch_lightning.utilities.rank_zero:Best model saved at: /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_0/epoch=0000-validation.loss=0.574.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_0/epoch=0000-validation.loss=0.574.ckpt
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loaded model weights from checkpoint at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_0/epoch=0000-validation.loss=0.574.ckpt
Testing DataLoader 0: 100%
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc          │    0.7834333777427673     │
│         test/loss         │    0.5791165828704834     │
└───────────────────────────┴───────────────────────────┘

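The first things displayed are a few logger messages from Lightning and some information about the available hardware.
Next comes a model summary, including module names, parameter counts, and the model's size on disk.
torchmetrics are also nn.Modules, so they appear here as well (see "Lab 02a" for details).
We are also tracking accuracy on the training, validation, and test sets.
You may see a quick message in the terminal that mentions a "validation sanity check".
PyTorch Lightning runs a few batches of validation data through the model before the first training epoch.
Without this check, a bug in the validation loop would first surface at the end of the first epoch, which is sometimes hours into training; the check makes such a run crash quickly at the start instead.
To turn the check off, use --num_sanity_val_steps=0.
After that, a progress bar for the training epoch is printed, along with metrics such as throughput and loss.
When the first (and only) epoch ends, the model is run on the validation set and the aggregated loss and accuracy are printed to the console.
When training ends, Trainer.test is called to check performance on the test set.
Test accuracy is typically around 75 to 80%.
During training, PyTorch Lightning saves checkpoints (file extension .ckpt) that can be used to resume training.
The final line of output from run_experiment indicates where the model with the best performance on the validation set was saved.
Checkpointing behavior is configured with a ModelCheckpoint callback.
These checkpoints include the model weights.
With them, we can reload the model in the notebook and experiment with it, even after the in-memory model is gone.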
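
The core idea of a checkpoint, saving weights to disk and restoring them into a freshly built model, can be sketched in plain PyTorch. This is an illustration under assumptions: the toy `Linear` model and the file name `toy.ckpt` are hypothetical, and a real Lightning .ckpt file stores more than the weights (optimizer state, for example).

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 2)  # toy model standing in for the CNN

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "toy.ckpt")  # hypothetical file name
    # Save only the weights, under a "state_dict" key.
    torch.save({"state_dict": model.state_dict()}, checkpoint_path)

    # Rebuilding requires the same architecture; the checkpoint only supplies the weights.
    restored = torch.nn.Linear(4, 2)
    checkpoint = torch.load(checkpoint_path)
    restored.load_state_dict(checkpoint["state_dict"])

x = torch.randn(3, 4)
with torch.no_grad():
    same_outputs = torch.allclose(model(x), restored(x))
print(same_outputs)
```

The need to rebuild the architecture before loading weights is why reloading a model also requires the arguments that were used to construct it.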

# we use a sequence of bash commands to get the latest checkpoint's filename
#  by hand, you can just copy and paste it

list_all_log_files = "find training/logs/lightning_logs"  # find avoids issues with \n in filenames
filter_to_ckpts = r"grep \.ckpt$"  # regex match on end of line; raw string avoids an invalid-escape warning in Python
sort_version_descending = "sort -Vr"  # uses "version" sorting (-V) and reverses (-r)
take_first = "head -n 1"  # the first n elements, n=1

latest_ckpt, = ! {list_all_log_files} | {filter_to_ckpts} | {sort_version_descending} | {take_first}
latest_ckpt

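To rebuild the model, we need to respect a few implementation details of the run_experiment script.
It builds the data and model from the parsed args, then uses all three (checkpoint, args, and model) to build the LightningModule.
Any LightningModule can be re-instantiated from a checkpoint with the load_from_checkpoint method, but to reload our model we need to recreate the args and pass them in.
We will see later how to automate this process.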

import text_recognizer.lit_models
import training.util
from argparse import Namespace

# if you change around model/data args in the command above, add them here
#  tip: define the arguments as variables, like we've done for gpus
#       and then add those variables to this dict so you don't need to
#       remember to update/copy+paste

args = Namespace(**{
    "model_class": "CNN",
    "data_class": "EMNIST"})

_, cnn = training.util.setup_data_and_model_from_args(args)

reloaded_model = text_recognizer.lit_models.BaseLitModel.load_from_checkpoint(
   latest_ckpt, args=args, model=cnn)

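Once the model is reloaded, we can run it on some sample data and look at the results.
We can also continue a promising training run from a checkpoint.
Running the code below trains the already-trained model for another epoch.
Note that the training loss starts out near where it ended in the previous run.
Paired with cloud storage for checkpoints, this makes it possible to use cheaper types of cloud instances.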

latest_ckpt, = ! {list_all_log_files} | {filter_to_ckpts} | {sort_version_descending} | {take_first}

# and we can change the training hyperparameters, like batch size
%run training/run_experiment.py --model_class CNN --data_class EMNIST --gpus {gpus} \
  --batch_size 64 --load_checkpoint {latest_ckpt}

INFO:pytorch_lightning.utilities.rank_zero:Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True, used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type      | Params
---------------------------------------------
0 | model          | CNN       | 1.7 M 
1 | model.conv1    | ConvBlock | 640   
2 | model.conv2    | ConvBlock | 36.9 K
3 | model.dropout  | Dropout   | 0     
4 | model.max_pool | MaxPool2d | 0     
5 | model.fc1      | Linear    | 1.6 M 
6 | model.fc2      | Linear    | 10.7 K
7 | train_acc      | Accuracy  | 0     
8 | val_acc        | Accuracy  | 0     
9 | test_acc       | Accuracy  | 0     
---------------------------------------------
1.7 M     Trainable params
0         Non-trainable params
1.7 M     Total params
6.616     Total estimated model params size (MB)
Epoch 0: 100%
5083/5083 [00:46<00:00, 108.95it/s, loss=0.575, v_num=1, validation/loss=0.540, validation/acc=0.794]
INFO:pytorch_lightning.utilities.rank_zero:Best model saved at: /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_1/epoch=0000-validation.loss=0.540.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_1/epoch=0000-validation.loss=0.540.ckpt
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loaded model weights from checkpoint at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_1/epoch=0000-validation.loss=0.540.ckpt
Testing DataLoader 0: 100%
844/844 [00:05<00:00, 152.79it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc          │    0.7914906740188599     │
│         test/loss         │    0.5436267256736755     │
└───────────────────────────┴───────────────────────────┘

Creating lines of text from handwritten characters: `EMNISTLines`

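To recap: we have a training pipeline for our model and data, and we use it to reduce the loss and perform the task.
But the problem we are solving is only that of handling centered, high-contrast, isolated characters.
To use this in a text recognition application, we would first need a component that extracts characters like these from images.
That task is harder than what we have done so far, and splitting the system into two separate components goes against the spirit of deep learning, which works "end-to-end".
Instead, we add some realism by building lines of text out of individual characters, synthesizing data for our model.
Synthetic data is generally useful for augmenting limited real data.
Because we generated the data ourselves, we also know the labels by construction.
To make fake handwriting, we combine two things: real handwriting and real text.
We generate the text from the Brown Corpus, provided by the [natural language tool kit](https://www.nltk.org/) library.
First, we download that corpus.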

from text_recognizer.data.sentence_generator import SentenceGenerator

sentence_generator = SentenceGenerator()

SentenceGenerator.__doc__

[nltk_data] Downloading package brown to /content/fsdl-text-
[nltk_data]     recognizer-2022-labs/data/downloaded/nltk...
[nltk_data]   Unzipping corpora/brown.zip.
Generate text sentences using the Brown corpus.

With the SentenceGenerator, we can generate short snippets of text from the corpus.

print(*[sentence_generator.generate(max_length=16) for _ in range(4)], sep="\n")

# Output :

# phase
# Within
# either
# the

Another DataModule picks out the handwritten characters needed from EMNIST and pastes them into an image containing the generated text.

emnist_lines = text_recognizer.data.EMNISTLines()  # configure
emnist_lines.__doc__

EMNIST Lines dataset: synthetic handwriting lines dataset made from EMNIST characters.

emnist_lines.prepare_data()  # download, save to disk
emnist_lines.setup()  # create torch.utils.data.Datasets, do train/val split
emnist_lines

EMNISTLinesDataset generating data for train...
EMNISTLinesDataset generating data for val...
EMNISTLinesDataset generating data for test...
EMNISTLinesDataset loading data from HDF5...
EMNIST Lines Dataset
Min overlap: 0
Max overlap: 0.33
Num classes: 83
Dims: (1, 28, 896)
Output dims: (32, 1)
Train/val/test sizes: 10000, 2000, 2000
Batch x stats: (torch.Size([128, 1, 28, 896]), torch.float32, 0.0, 0.07364349067211151, 0.23208004236221313, 1.0)
Batch y stats: (torch.Size([128, 32]), torch.int64, 3, 66)

Because we use the LightningDataModule interface to organize data preparation, we can now fetch a batch and look at some of the data.

line_xs, line_ys = next(iter(emnist_lines.val_dataloader()))
line_xs.shape, line_ys.shape

# Output : (torch.Size([128, 1, 28, 896]), torch.Size([128, 32]))

import random

import wandb


def read_line_labels(labels):
    return [emnist_lines.mapping[label] for label in labels]

idx = random.randint(0, len(line_xs) - 1)

print("-".join(read_line_labels(line_ys[idx])))
wandb.Image(line_xs[idx]).image

# Output : o-n-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>

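The result looks like a ransom note and is not realistic even for a single line: letters do not overlap, and if a character appears more than once in the snippet, the exact same handwriting is repeated.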
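
The construction can be sketched in a few lines. This is a simplified illustration, not the EMNISTLines implementation: the helper `compose_line` and the random "glyphs" are hypothetical, overlap between characters is ignored, and the label indices (54 for "o", 53 for "n", 3 for "<P>") are read off the mapping printed earlier.

```python
import torch


def compose_line(text, glyphs, max_length=32, padding_label=3, char_size=28):
    """Paste one glyph per character into a wide image and pad the labels with <P>."""
    image = torch.zeros(char_size, char_size * max_length)
    labels = torch.full((max_length,), padding_label, dtype=torch.int64)
    for position, char in enumerate(text[:max_length]):
        glyph, label = glyphs[char]
        start = position * char_size
        image[:, start:start + char_size] = glyph  # the same glyph is reused on every occurrence
        labels[position] = label
    return image, labels


# Hypothetical glyphs: random 28x28 images standing in for real EMNIST characters.
toy_glyphs = {"o": (torch.rand(28, 28), 54), "n": (torch.rand(28, 28), 53)}
line_image, line_labels = compose_line("on", toy_glyphs)
print(line_image.shape, line_labels[:4])
```

Because the line is assembled character by character, the label sequence comes for free, which is the sense in which the labels are known by construction.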

Applying CNNs to handwritten text: `LineCNNSimple`

The LineCNNSimple class builds on the CNN class and can be applied to this dataset.

line_cnn = text_recognizer.models.LineCNNSimple(emnist_lines.config())
line_cnn

LineCNNSimple(
  (cnn): CNN(
    (conv1): ConvBlock(
      (conv): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu): ReLU()
    )
    (conv2): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu): ReLU()
    )
    (dropout): Dropout(p=0.25, inplace=False)
    (max_pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (fc1): Linear(in_features=12544, out_features=128, bias=True)
    (fc2): Linear(in_features=128, out_features=83, bias=True)
  )
)

The nn.Modules look almost identical, but they are used differently, as the .forward method shows.

def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the LineCNN to an input image and return logits.

        Parameters
        ----------
        x
            (B, C, H, W) input image with H equal to IMAGE_SIZE

        Returns
        -------
        torch.Tensor
            (B, C, S) logits, where S is the length of the sequence and C is the number of classes
            S can be computed from W and CHAR_WIDTH
            C is self.num_classes
        """
        B, _C, H, W = x.shape
        assert H == IMAGE_SIZE  # Make sure we can use our CNN class

        # Compute number of windows
        S = math.floor((W - self.WW) / self.WS + 1)

        # NOTE: type_as properly sets device
        activations = torch.zeros((B, self.num_classes, S)).type_as(x)
        for s in range(S):
            start_w = self.WS * s
            end_w = start_w + self.WW
            window = x[:, :, :, start_w:end_w]  # -> (B, C, H, self.WW)
            activations[:, :, s] = self.cnn(window)

        if self.limit_output_length:
            # S might not match ground truth, so let's only take enough activations as are expected
            activations = activations[:, :, : self.output_length]
        return activations

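The CNN, which operates on square images, is applied repeatedly to our wide image, sliding over by self.WS pixels each time.
In effect, we convolve the network with the input image.
Like our synthetic data, this is crude, but it is enough to get started.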
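
The window arithmetic in .forward can be tried in isolation. In this sketch the function `sliding_window_logits` and the toy linear classifier are hypothetical, and the window width and stride of 28 are assumptions inferred from the 896-pixel-wide images and the 32 output positions.

```python
import math

import torch


def sliding_window_logits(classifier, x, window_width, window_stride, num_classes):
    """Apply a square-image classifier to successive windows of a wide image."""
    B, _C, _H, W = x.shape
    S = math.floor((W - window_width) / window_stride + 1)  # number of windows
    logits = torch.zeros((B, num_classes, S)).type_as(x)
    for s in range(S):
        start_w = window_stride * s
        window = x[:, :, :, start_w:start_w + window_width]  # (B, C, H, window_width)
        logits[:, :, s] = classifier(window)
    return logits


# Toy classifier standing in for the CNN: flatten each 28x28 window, then one Linear layer.
toy_classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 83))
lines = torch.randn(2, 1, 28, 896)
with torch.no_grad():
    line_logits = sliding_window_logits(toy_classifier, lines, window_width=28, window_stride=28, num_classes=83)
print(line_logits.shape)
```

With a stride equal to the window width the windows tile the image without overlap; a smaller stride would produce more, overlapping windows.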

idx = random.randint(0, len(line_xs) - 1)

outs, = line_cnn(line_xs[idx:idx+1])
preds = torch.argmax(outs, 0)

print("-".join(read_line_labels(preds)))
wandb.Image(line_xs[idx]).image

# Output : N-H-K-N-K-K-N-K-K-H-H-H-N-H-N-H-H-q-N-q-N-N-N-N-H-H-H-H-N-K-q-H

Now let's run training, keeping most parameters at their defaults.

%run training/run_experiment.py --model_class LineCNNSimple --data_class EMNISTLines \
  --batch_size 32 --gpus {gpus} --max_epochs 2

INFO:pytorch_lightning.utilities.rank_zero:Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True, used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
EMNISTLinesDataset loading data from HDF5...
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name      | Type          | Params
--------------------------------------------
0 | model     | LineCNNSimple | 1.7 M 
1 | model.cnn | CNN           | 1.7 M 
2 | train_acc | Accuracy      | 0     
3 | val_acc   | Accuracy      | 0     
4 | test_acc  | Accuracy      | 0     
--------------------------------------------
1.7 M     Trainable params
0         Non-trainable params
1.7 M     Total params
6.616     Total estimated model params size (MB)
Epoch 1: 100%
376/376 [00:26<00:00, 13.95it/s, loss=1.45, v_num=2, validation/loss=1.380, validation/acc=0.689]
INFO:pytorch_lightning.utilities.rank_zero:Best model saved at: /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_2/epoch=0001-validation.loss=1.376.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_2/epoch=0001-validation.loss=1.376.ckpt
INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loaded model weights from checkpoint at /content/fsdl-text-recognizer-2022-labs/lab02/training/logs/lightning_logs/version_2/epoch=0001-validation.loss=1.376.ckpt
EMNISTLinesDataset loading data from HDF5...
Testing DataLoader 0: 100%
63/63 [00:01<00:00, 34.43it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc          │    0.6743437647819519     │
│         test/loss         │    1.4315539598464966     │
└───────────────────────────┴───────────────────────────┘

We can expect test accuracy in the 65 to 70% range.

# if you change around model/data args in the command above, add them here
#  tip: define the arguments as variables, like we've done for gpus
#       and then add those variables to this dict so you don't need to
#       remember to update/copy+paste

args = Namespace(**{
    "model_class": "LineCNNSimple",
    "data_class": "EMNISTLines"})

_, line_cnn = training.util.setup_data_and_model_from_args(args)

latest_ckpt, = ! {list_all_log_files} | {filter_to_ckpts} | {sort_version_descending} | {take_first}
print(latest_ckpt)

reloaded_lines_model = text_recognizer.lit_models.BaseLitModel.load_from_checkpoint(
    latest_ckpt, args=args, model=line_cnn)

# Output : training/logs/lightning_logs/version_2/epoch=0001-validation.loss=1.376.ckpt

idx = random.randint(0, len(line_xs) - 1)

outs, = reloaded_lines_model(line_xs[idx:idx+1])
preds = torch.argmax(outs, 0)

print("-".join(read_line_labels(preds)))
wandb.Image(line_xs[idx]).image

# Output : e-u-e-e-t-t-n-e- -n-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>-<P>

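The predictions are of very low quality: most characters are wrong, and the model tends to favor the most common characters in the dataset (e.g. e).
Note, however, that a large share of the characters in any given line are the padding character <P>.
A model that always predicts <P> would already reach roughly 50% accuracy.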

padding_token = emnist_lines.emnist.inverse_mapping["<P>"]
torch.sum(line_ys == padding_token) / line_ys.numel()

# Output : tensor(0.5359)
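
Because padding makes up more than half of the targets, one way to see how much of an accuracy figure comes from padding is to score only the non-padding positions. The following is a minimal sketch on toy tensors, not code from the lab: `masked_accuracy`, the `PAD` index, and the example values are all hypothetical names and numbers chosen for illustration.

```python
import torch


def masked_accuracy(preds: torch.Tensor, targets: torch.Tensor, padding_index: int) -> float:
    """Fraction of non-padding target positions that were predicted correctly."""
    keep = targets != padding_index  # True wherever the target is a real character
    if keep.sum() == 0:
        return float("nan")  # nothing to score
    correct = (preds == targets) & keep
    return (correct.sum() / keep.sum()).item()


PAD = 3  # hypothetical padding index, for illustration only

# one toy line: three real characters followed by three padding positions
targets = torch.tensor([[5, 7, 9, PAD, PAD, PAD]])
preds = torch.tensor([[5, 8, 8, PAD, PAD, PAD]])

plain = (preds == targets).float().mean().item()  # counts padding positions too
masked = masked_accuracy(preds, targets, PAD)     # ignores padding positions

print(f"plain accuracy:  {plain:.3f}")
print(f"masked accuracy: {masked:.3f}")
```

In this toy case the plain accuracy is inflated by the three correctly predicted padding positions, while the masked accuracy reflects only the one real character out of three that was right.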

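There are ways to adjust classification metrics to handle this particular problem.
In general, it is best to find a metric where baseline performance is 0 and perfect performance is 1, so that the numbers can be interpreted unambiguously.
It is also important, though, to actually look at the model's behavior from time to time.
A metric is a single number, so it necessarily discards a great deal of information about the model's behavior, some of which may be deeply relevant.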
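
One simple way to get a metric with baseline 0 and perfect performance 1 is to rescale accuracy against the accuracy of a trivial baseline, here the model that always predicts <P>. The sketch below is an illustration under assumptions, not something the lab defines: `rescaled_accuracy` is a hypothetical helper, and the two numbers plugged in are the test accuracy and padding fraction printed above, even though the padding fraction was measured on `line_ys` rather than on the test set.

```python
def rescaled_accuracy(accuracy: float, baseline: float) -> float:
    """Map accuracy so that the baseline scores 0 and a perfect model scores 1."""
    if not 0.0 <= baseline < 1.0:
        raise ValueError("baseline must be in [0, 1)")
    return (accuracy - baseline) / (1.0 - baseline)


test_acc = 0.6743437647819519  # test/acc from the run above
always_pad_acc = 0.5359        # padding fraction measured on line_ys (assumed to carry over)

score = rescaled_accuracy(test_acc, always_pad_acc)
print(f"rescaled accuracy: {score:.3f}")
```

Under these assumptions the 67% test accuracy corresponds to a rescaled score of roughly 0.30, which matches the poor predictions seen in the decoded line much better than the raw number does.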

[Source] https://fullstackdeeplearning.com/
