Parsing resumes saves employers time and reduces the manual effort required to review relevant details about a candidate (e.g., their years of experience, previous employers, and other pertinent information).
However, traditional Optical Character Recognition (OCR) techniques often struggle to accurately recognize text in two-column resumes, a format that is increasingly common.
In this blog post, we’ll show how computer vision and YOLOv5 (You Only Look Once) can be used to efficiently segment two-column resumes, improving OCR accuracy.
Understanding the Problem: Extracting Text from Resumes
The main challenge in this task is the presence of multiple columns in resumes, which can make it difficult for traditional OCR methods to extract text accurately.
In recent years, computer vision techniques have been proposed as a solution to this problem. However, most of these techniques are rule-based and are thus not flexible enough to handle the variability of real-world resumes.
Our approach is based on deep learning, using a neural network to identify different features. In our testing, we have found that our approach accurately segments two-column resumes, regardless of their layout and formatting.
To build a solution to the problem, we collected and annotated a dataset of 3,541 two-column resume images, then trained and evaluated a YOLO model for the task of segmenting resumes into five parts. The results of our experiments show that our approach outperforms existing methods in terms of accuracy and reliability.
Step 1: Collect the Data
First, we needed to collect a dataset of two-column resumes. This dataset is used to train a computer vision model to perform two-column resume segmentation. In our case, we collected 1,000 two-column resume images using web scraping techniques.
Next, we used Roboflow, an end-to-end platform for computer vision, to label our dataset. Roboflow provides tools that help us label images quickly and accurately.
Step 2: Label the Data
Next, we need to label the data so that it can be used to train the computer vision model.
One effective way to label the data is with the Roboflow platform. Roboflow Annotate allows you to upload your images and label them using a graphical user interface. The platform also provides a variety of tools to make the labeling process more efficient, such as automated annotation tools and predefined label categories.

Different columns and resume sections were given their own bounding boxes.
Step 3: Apply Preprocessing and Augmentation Steps
Once we had our labeled dataset, the next step was to add augmented images using a set of data augmentation techniques, such as flipping, rotation, and color jittering. This lets us increase the size of our dataset and make it more diverse, which helps the model learn to identify different features in an image.
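To make the idea concrete, here is a minimal sketch of two of these augmentations written in plain Python. It is not how Roboflow implements them; the `YoloBox` class and the function names are hypothetical, and the image is assumed to be a small row-major grid of grayscale values. The key point is that a geometric augmentation such as a flip has to be applied to the bounding boxes as well as to the pixels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class YoloBox:
    """One label: a class id plus a box with normalized center and size."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def hflip_image(pixels: list[list[int]]) -> list[list[int]]:
    """Mirror a row-major grayscale image left to right."""
    return [row[::-1] for row in pixels]


def hflip_box(box: YoloBox) -> YoloBox:
    """Mirror a normalized box left to right; only the x center moves."""
    return replace(box, x_center=1.0 - box.x_center)


def jitter_brightness(pixels: list[list[int]], factor: float) -> list[list[int]]:
    """Scale pixel intensities by `factor`, clamped to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in pixels]
```

One design caveat: if the label classes distinguish a left column from a right column, a horizontal flip would also need to swap those class IDs, which this sketch does not do.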
In addition to data augmentation, Roboflow provides image preprocessing techniques, such as resizing, normalization, and cropping. We used these techniques to ensure that our images were in a standard format, ready for use in training.
After completing the labeling process and applying our desired preprocessing and augmentation, we could generate a dataset in a format compatible with popular computer vision frameworks, such as TensorFlow and PyTorch.
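For YOLOv5, labels are stored as plain-text files with one line per box: a class ID followed by the box center and size, each normalized to the image dimensions. The sketch below shows that conversion from pixel coordinates. The function name `to_yolo_line` and the example numbers are our own illustration, not part of any export tool.

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float,
                 x_max: float, y_max: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box into one YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w   # normalized box center, x
    y_center = (y_min + y_max) / 2 / img_h   # normalized box center, y
    width = (x_max - x_min) / img_w          # normalized box width
    height = (y_max - y_min) / img_h         # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box covering the left half of an 800x1000 page, labeled as class 0.
print(to_yolo_line(0, 0, 0, 400, 1000, 800, 1000))
# prints "0 0.250000 0.500000 0.500000 1.000000"
```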
Step 4: Train the Model
Once the labeled dataset was generated, we trained a computer vision model. In this case, we used YOLOv5.
The Roboflow platform provides a number of tools to help with the training process, such as a pre-trained model based on the Microsoft COCO dataset that can be fine-tuned on your specific dataset, as well as tools for visualizing the training process and evaluating the model’s performance.
Once the training process is complete, the model will be able to accurately segment two-column resumes into their individual columns.
Step 5: Test and Deploy the Model
The following video demo shows our model in action:
Now that we know our model performs as expected, the final step is to deploy the model in a way that makes it accessible to others.
Roboflow Deploy provides a number of tools to make this process easy, such as a pre-built API that can be used to integrate the model into other applications and a web-based demo that lets users test the model directly from their browser.
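As a rough sketch of what integrating the API into another application might involve, the function below parses a detection response into corner-style boxes. It assumes the response is JSON with a `predictions` list whose entries carry a center point (`x`, `y`), a `width`, a `height`, a `class`, and a `confidence`. Treat that shape, the class names, and the 0.4 threshold as assumptions to check against the API documentation for your own model.

```python
import json


def parse_predictions(response_text: str, min_confidence: float = 0.4) -> list[dict]:
    """Turn a detection response into corner-style boxes, dropping weak detections."""
    payload = json.loads(response_text)
    boxes = []
    for pred in payload.get("predictions", []):
        if pred["confidence"] < min_confidence:
            continue  # skip detections below the chosen threshold
        half_w = pred["width"] / 2
        half_h = pred["height"] / 2
        boxes.append({
            "label": pred["class"],
            "confidence": pred["confidence"],
            "x_min": pred["x"] - half_w,  # x and y are assumed to be the box center
            "y_min": pred["y"] - half_h,
            "x_max": pred["x"] + half_w,
            "y_max": pred["y"] + half_h,
        })
    return boxes


# Hypothetical response with one confident and one weak detection.
sample = json.dumps({"predictions": [
    {"x": 200, "y": 500, "width": 380, "height": 960,
     "class": "left_column", "confidence": 0.91},
    {"x": 600, "y": 500, "width": 380, "height": 960,
     "class": "right_column", "confidence": 0.2},
]})
print(parse_predictions(sample))
```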
Additionally, Roboflow provides a number of tools for monitoring the performance of the deployed model, including performance metrics such as precision, recall, and mean average precision (mAP).
With this deployment, we achieved a mAP of 73.1%, a precision of 86.7%, and a recall of 69.9%. These results demonstrate the effectiveness of using computer vision and YOLOv5 for two-column resume segmentation and show that our approach outperforms existing methods in terms of accuracy and reliability.
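For readers less familiar with these metrics, the sketch below shows how precision and recall follow from counts of true positives, false positives, and false negatives, and how intersection over union (IoU) is typically used to decide whether a predicted box matches a labeled one. The function names and the counts in the example are purely illustrative and are not taken from our evaluation.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two corner-style boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative counts only: 8 correct boxes, 2 spurious boxes, 4 missed boxes.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A precision that is higher than recall, as in our results, generally means that the boxes the model predicts are usually correct, but that it misses some of the labeled regions.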
Now that we have the two separate columns, we can run OCR on each one without worrying about malformed output caused by the OCR model failing to understand the two-column layout.
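A minimal sketch of this last step might look like the following. It assumes each detected column is a dictionary with `x_min` and `y_min` pixel coordinates, and it takes the cropping and OCR steps as caller-supplied functions (for example, an image library’s crop method and an OCR engine’s text extraction call), so the names `crop` and `ocr` here are placeholders rather than a specific library’s API.

```python
from typing import Any, Callable


def columns_in_reading_order(boxes: list[dict]) -> list[dict]:
    """Order column boxes left to right, breaking ties top to bottom."""
    return sorted(boxes, key=lambda b: (b["x_min"], b["y_min"]))


def ocr_by_column(boxes: list[dict],
                  crop: Callable[[dict], Any],
                  ocr: Callable[[Any], str]) -> str:
    """Crop each column, run OCR on it separately, and join the text in reading order."""
    texts = [ocr(crop(box)) for box in columns_in_reading_order(boxes)]
    return "\n\n".join(texts)


# Stand-in crop and OCR functions so the sketch runs without an image or OCR engine.
detections = [
    {"label": "right", "x_min": 420, "y_min": 20},
    {"label": "left", "x_min": 10, "y_min": 20},
]
text = ocr_by_column(detections,
                     crop=lambda box: box["label"],
                     ocr=lambda region: region.upper())
print(text)
```

Because each column is processed on its own, the OCR engine never sees a line of text that spans both columns.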
Conclusion
In this blog post, we have demonstrated that by leveraging the power of computer vision and deep learning, we can overcome the challenges posed by two-column resumes and extract relevant information from candidate resumes with greater ease and efficiency.