Apply OCR to a PDF in Power Automate

Introduction

This post describes how Power Automate actions from Aquaforest can be used to apply OCR (Optical Character Recognition) to a PDF.

The Scenario

Consider a scenario where a paper document (containing a cover page) has been scanned into a PDF (Figure 1) and emailed to a third party for processing. When the third party receives the email, OCR processing is applied to the PDF. The data extracted from the PDF’s cover page is then stored in SharePoint, along with the PDF itself. The next section describes (at a high level) how to implement this process in Power Automate, starting from the point at which the third party receives the email.

(It’s assumed that the scanned PDF doesn’t already contain an OCR layer. That is, the contents of the PDF aren’t searchable.)

Figure 1

The Power Automate solution

The solution consists of two Power Automate flows:

Flow #1 – Extract the PDF from an inbound email

1.) An email is received which triggers the flow (Figure 2)

2.) The flow iterates through the email attachments

Figure 2

3.) The PDF email attachments are stored in the SharePoint folder ‘OCR DropBox Documents’ (Figure 3)

Figure 3
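Flow #1 is built in the Power Automate designer rather than written as code, but the filtering step it performs can be sketched in a few lines. The Python below is only an illustration of the logic, not the flow itself: the Attachment type, its fields and the select_pdf_attachments function are hypothetical names, and the rule used to recognise a PDF (file extension or content type) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attachment:
    """Hypothetical stand-in for one email attachment."""
    name: str
    content_type: str
    content: bytes


def select_pdf_attachments(attachments: List[Attachment]) -> List[Attachment]:
    """Keep only the attachments that look like PDFs.

    Assumption: an attachment counts as a PDF if its file name ends in
    '.pdf' or its content type is 'application/pdf'.
    """
    return [
        a for a in attachments
        if a.name.lower().endswith(".pdf")
        or a.content_type.lower() == "application/pdf"
    ]
```

In the flow, each attachment that passes a check like this would then be written to ‘OCR DropBox Documents’.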

Flow #2 – Apply OCR to the PDF

1.) The record created in ‘OCR DropBox Documents’ (Figure 3) triggers the second flow (Figure 4)

2.) The Aquaforest action ‘Get PDF properties’ (Figure 4) verifies that the PDF is not already indexed

Figure 4

3.) The action ‘OCR PDF or images’ (Figure 5) applies OCR to the PDF (specifically the cover page)

Figure 5

4.) The action ‘Get data from PDF’ (Figure 6) retrieves the name-value pairs from the cover page (Figure 1), namely ‘Customer Reference’, ‘Full Name’ and ‘Business Unit’

Figure 6

5.) The action ‘Create item’ (Figure 7) stores the extracted data in the SharePoint folder ‘OCR Document Staging’ (Figure 8)

Figure 7

6.) The records created in ‘OCR Document Staging’ (Figure 8) are then available for use by other systems (e.g. Dynamics 365)

Figure 8

Note: the step to populate the Name column (Figure 8) isn’t described in this post.
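To make the idea of name-value extraction in step 4 more concrete, here is a minimal Python sketch that pulls the three cover page fields out of OCR'd text. It is not how the ‘Get data from PDF’ action works internally. It assumes the OCR text contains one "Name: value" pair per line, and the function name extract_fields and the sample values are invented for illustration.

```python
import re
from typing import Dict, Iterable, Optional

# Field names taken from the cover page described in this post.
FIELDS = ("Customer Reference", "Full Name", "Business Unit")


def extract_fields(ocr_text: str,
                   fields: Iterable[str] = FIELDS) -> Dict[str, Optional[str]]:
    """Return a dict mapping each field name to its value, or None if absent.

    Assumption: each pair sits on its own line as 'Name: value'.
    """
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        # Match the field name at the start of a line, then a colon,
        # then capture the rest of that line as the value.
        pattern = re.compile(
            rf"^[ \t]*{re.escape(field)}[ \t]*:[ \t]*(.+?)[ \t]*$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(ocr_text)
        result[field] = match.group(1).strip() if match else None
    return result


# Invented sample text, standing in for the OCR output of a cover page.
sample_text = (
    "Customer Reference: CR-1001\n"
    "Full Name: Jane Doe\n"
    "Business Unit: Finance\n"
)
cover_page_data = extract_fields(sample_text)
```

A dictionary like cover_page_data corresponds to the values that ‘Create item’ writes to ‘OCR Document Staging’. A field that can't be found comes back as None, which gives the flow something to check before storing the record.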

Further reading

Extract QR code data in Power Automate

References

https://www.aquaforest.com/wp/aquaforest-pdf-connector-get-data-from-pdf-from-image-only-text-searchable-pdfs/

https://en.wikipedia.org/wiki/Optical_character_recognition