OCR for Korean language

less than 1 minute read

Published: February 16, 2024

What is OCR (Optical Character Recognition)?

A technology that recognizes the characters in an image and converts them into text that a computer can use.

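preprocessing: skew correction, stain removal, restoration of damaged images; example: histogram equalization
text detection: object detection -> confirm that text exists, then output its bounding box and angle
text recognition: output the text and its coordinates, e.g. with CRNN
restructuring: rearrange the recognized characters according to their detected coordinates to produce a structure similar to the original image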
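
The preprocessing step above names histogram equalization as an example. The following is a minimal sketch of that technique in plain Python, assuming an 8-bit grayscale image stored as a list of rows of integers; the function name `equalize_histogram` and the sample pixel values are illustrative and do not come from any of the OCR services listed in this post. In practice a library routine such as OpenCV's `cv2.equalizeHist` would normally be used instead.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels)."""
    # Count how often each intensity occurs.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or single-intensity image: there is nothing to stretch.
        return [list(row) for row in image]

    # Lookup table that spreads the used intensities over the full range.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[px] for px in row] for row in image]


# Hypothetical low-contrast patch: all values sit between 100 and 103.
low_contrast = [
    [100, 100, 101, 101],
    [101, 102, 102, 103],
]
equalized = equalize_histogram(low_contrast)
print(equalized)  # darkest pixel maps to 0, brightest to 255
```

Stretching the intensity range this way tends to make faint strokes stand out from the background before detection, although whether it helps depends on the image.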

Test with 8 OCR services

https://github.com/tesseract-ocr/tesseract
https://github.com/JaidedAI/EasyOCR
https://cloud.google.com/vision/docs?hl=ko
https://aws.amazon.com/ko/pm/textract/?gclid=CjwKCAiA9ourBhAVEiwA3L5RFl6-iGNO26zqRjsiFk_ycVbJf5QiF5aVtYA0bEfB2Ttm5jsROaJkxBoC4BIQAvD_BwE&trk=ba68822c-4d74-4f28-b470-bb363c226519&sc_channel=ps&ef_id=CjwKCAiA9ourBhAVEiwA3L5RFl6-iGNO26zqRjsiFk_ycVbJf5QiF5aVtYA0bEfB2Ttm5jsROaJkxBoC4BIQAvD_BwE:G:s&s_kwcid=AL!4422!3!658520966096!!!g!!!19852661900!149878733980
https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview?view=doc-intel-4.0.0
https://www.ncloud.com/product/aiService/ocr
https://www.upstage.ai/document-ai/overview
https://github.com/PaddlePaddle/PaddleOCR

Comparisons (for Korean text recognition)

(Speed, fastest first) Google Cloud Vision > Tesseract > Upstage > Naver Clova
(Korean recognition, best first) Naver Clova > Upstage > Azure Document Intelligence > Google Cloud Vision
Recommendation: Azure Document Intelligence, Naver Clova

How to evaluate an OCR algorithm

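Accuracy: comparison between the original (ground-truth) text and the recognized text
Character Error Rate (CER): error rate per character, i.e. the character-level edit distance divided by the number of characters in the ground truth
Word Error Rate (WER): the same measure computed per word instead of per character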
check this link post
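
As a rough illustration of the two error rates above, here is a minimal sketch that computes CER and WER from the Levenshtein (edit) distance using only the standard library. The function names and the sample strings are made up for this example, and it assumes both strings use the same Unicode normalization, so that each Hangul syllable is a single character.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / words in the reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical ground truth and OCR output that differ in one character.
truth = "한글 문자 인식"
ocr_output = "한글 문지 인식"
print(cer(truth, ocr_output))  # 1 edit over 8 characters (spaces included)
print(wer(truth, ocr_output))  # 1 wrong word out of 3
```

A single wrong character costs far more in WER than in CER here, which is one reason CER is often the more informative number for Korean text.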

What is CRNN architecture?

layers: Convolution Layers + Recurrent Layers + Transcription Layers
backbone: VGG-16 model
CNN layer -> last 2 FC into CNN -> LSTM(RNN) layer
CRNN parameters: input shape (256, 32, 1), num_classes (87)
output shape: (None, 62, 87), i.e. (batch, time steps, classes)
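
The transcription layer of a CRNN is typically trained and decoded with CTC, which turns the per-time-step class scores (62 time steps over 87 classes in the configuration above) into a string. Below is a minimal sketch of greedy CTC decoding in plain Python. It assumes the blank symbol is the last class index, which is a common convention but is not stated in this post; the tiny character set and the scores are made up for illustration.

```python
def ctc_greedy_decode(scores, charset, blank=None):
    """Greedy CTC decoding.

    scores:  list of T rows, each holding one score per class.
    charset: characters for the non-blank class indices.
    blank:   index of the CTC blank (assumed to be the last class if None).
    """
    if blank is None:
        blank = len(scores[0]) - 1

    # Best class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]

    # Collapse repeated indices, then drop blanks.
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


def one_hot(index, size=4):
    """Helper for the demo: a score row with all mass on one class."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Hypothetical 3-character alphabet plus a blank at index 3.
charset = ["가", "나", "다"]
best_path = [0, 0, 3, 0, 1, 3]  # 가 가 blank 가 나 blank
scores = [one_hot(i) for i in best_path]
decoded = ctc_greedy_decode(scores, charset)
print(decoded)  # 가가나
```

The blank between the two runs of the first class is what lets the decoder keep a doubled character, while the repeated frames inside one run collapse to a single character.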

Data preparation

VGG Image Annotator (https://gitlab.com/vgg/via)


Explain Stable Diffusion

1 minute read

Published: January 07, 2024

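Generative AI includes text generation, which produces sentences; image generation, which produces pictures; and wave generation, which produces speech and music. Within image generation there are text-to-image algorithms, which create an image from a sentence; image-to-image algorithms, which create a new image from an existing one; image-to-text algorithms, which produce a related sentence from a picture; and inpainting algorithms, which generate content to fill in a specific part of a picture. Text-to-image generation in particular is the subject of a great deal of research, and the best-known algorithms and services include Midjourney, DALL-E, and Stable Diffusion. In this post, let's learn about the Stable Diffusion XL algorithm and, by understanding its components, discuss how to write prompts that produce results close to what we want.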

MLOps Engineering: Curriculum for Success

3 minute read

Published: December 30, 2023

MLOps, or Machine Learning Operations, is a crucial field that focuses on streamlining the process of building, deploying, and maintaining machine learning models in production environments. As an MLOps engineer, you’ll need to have a strong understanding of various technologies, tools, and methodologies to be successful in your role. Here’s a comprehensive curriculum that covers all the essential topics for becoming an effective MLOps engineer:

Sung-Cheol Kim