11th October 2024

Optical Character Recognition, or OCR, might be an extremely helpful addition to any system, particularly when built-in with pc imaginative and prescient. OCR and pc imaginative and prescient can work on pictures however could possibly be much more highly effective when used on movies and video streams.

An annotated video of the ultimate results of working OCR on video

On this article, you’ll discover ways to use OCR fashions and mix them with pc imaginative and prescient instruments to construct functions that course of movies

Use OCR on Movies

For static video recordsdata, you possibly can run OCR on all of the picture frames inside the video file. On this information, we are going to determine the IDs of multimodal delivery containers in a delivery yard utilizing this strategy. 

On this instance, we now have a video of a number of delivery containers, every of which comprises a number of figuring out numbers. Utilizing this object detection mannequin from Roboflow Universe, we will determine the numbers of the ID utilizing OCR.

First, let’s arrange a callback perform to make use of later in order that we will run OCR on detections. For this instance, we are going to use EasyOCR. Study different OCR choices and the way they carry out on this weblog publish.

import easyocr
reader = easyocr.Reader(['en']) def run_ocr(body,field): cropped_frame = sv.crop_image(body,field) outcome = reader.readtext(cropped_frame,element=0) textual content = "".be a part of(outcome) textual content = re.sub(r'[^0-9]', '', textual content) return textual content

For our use case, we solely need the numbers that make up the ID, so we substitute something apart from numbers inside our detected textual content.

Subsequent, we use Supervision’s `process_video` perform to detect, then run OCR on our detections in a two-step course of. We will then annotate the body with a view to create an annotated video with the annotated textual content.

import supervision as sv def video_callback(body, i): detections = predict(body) detections = detections[detections.class_id == 2] labels = [] for detection in detections: detected_text = run_ocr(body,detection[0]) labels.append(detected_text) annotated_frame = body.copy() annotated_frame = sv.BoundingBoxAnnotator().annotate(annotated_frame,detections) annotated_frame = sv.LabelAnnotator().annotate(annotated_frame,detections,labels) return annotated_frame sv.process_video(VIDEO_PATH,"cargo_rawocr.mp4",video_callback)

With that, we get an annotated video with the detected textual content from every detection with a label.

Apply Monitoring to Determine Distinctive Gadgets in Movies

Though it will be potential to run OCR on each body, doing so could possibly be unnecessarily inefficient and expensive, in addition to being not particularly helpful in manufacturing use instances. 

Constructing on what we did earlier than, we are going to use the thing detection mannequin and OCR then incorporate object monitoring with customized code to “hyperlink” every recognized container with its corresponding ID and run OCR just a few instances, versus a number of tons of of instances. 

We will additionally use the good thing about each having a number of IDs throughout frames of video to construct consensus logic, making certain accuracy. 

👍

To enhance readability, some code that’s used to supply the ultimate output isn’t on this publish. The total code is offered in this Colab pocket book.

Utilizing the ByteTrack implementation in Supervision, we will observe every object throughout the video frames during which they seem. We additionally created a helper class that retains observe of OCR recognitions, ensuring that the ultimate textual content is the one that’s acknowledged most frequently throughout ten OCR makes an attempt.

import supervision as sv
import cv2
import numpy as np # Initalize ByteTrack
tracker = sv.ByteTrack() # Consensus monitoring utility
container_ids_tracker = Consensus() # Retains observe of the IDs of detected containers
container_ids = {} # A callback perform runs for every video body
def video_callback(body, i): detections = predict(body) detections = tracker.update_with_detections(detections) relevant_detections = detections[(detections.class_id == 1) | (detections.class_id == 2)] container_detections = detections[detections.class_id==1] id_detections = detections[detections.class_id==2] for i_idx, id_detection in enumerate(id_detections): id_box = id_detection[0] for c_idx, container_detection in enumerate(container_detections): # If an ID is inside a container, run OCR on it. if check_within(id_box, container_detection[0]): parent_container_id = container_detection[4] container_id_winner = container_ids_tracker.winner(parent_container_id) if container_id_winner: proceed ocr_result = ocr(body,id_box,id_detection[4]) container_ids_tracker.add_candidate(parent_container_id,ocr_result) # Video annotation label code... annotated_frame = body.copy() annotated_frame = sv.BoundingBoxAnnotator().annotate(annotated_frame,relevant_detections) # Extra video annotation code... return annotated_frame sv.process_video(VIDEO_PATH,"cargo_processed.mp4",video_callback)

Then, after working the code, we get a last annotated video, in addition to textual content knowledge on which containers had been current, which could possibly be useful for yard administration use instances.

An annotated video of the ultimate results of working OCR on video

Conclusion

On this information, we coated methods to use OCR together with pc imaginative and prescient on video, in addition to showcasing a small portion of the potential that OCR, pc imaginative and prescient, and video can have when mixed with further instruments like monitoring. 

If you want extra info on how Roboflow will help your corporation in organising a complete pc imaginative and prescient system, contact our gross sales crew.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.