
On-Premise OCR


Eagle-eyed partners would have spotted the on-prem OCR extension pack on the reseller portal – it comes hot on the heels of the Advanced Print Deploy option.

No need to go into what OCR is; if you are here, we can assume you know!

In version 19 of PaperCut MF, the talented development team introduced a locally hosted (on-premise) OCR option:

[Image: On-premise OCR extension pack options]

With the introduction of on-premise OCR, PaperCut caters for businesses that don’t want documents leaving their internal infrastructure and, as always, it’s super easy to set up. The local OCR option gives you the same functionality as PaperCut’s cloud-hosted scanning service, but all OCR processing is completed on your local server.

It’s a perfect choice if latency or large scan sizes are a problem, or if your security policies mean the cloud isn’t for you just yet (and we won’t judge you for that!).

Once installed, you can activate it across your entire device fleet with one click, and it works with both scan to email and any local folder scan actions. Also, because we know you would ask: it does indeed use Tesseract as the OCR engine.

On-premise OCR installation

The installation consists of downloading the application and installing it on the server of your choice. Once you’ve done this, enter the server’s IP address (if it has a static IP) or its hostname, and click add to finish the setup.

If you run into issues, you can check the logs in [install_path]/data/logs. Oh, and you will need to open port 9198 as well (SSL/TLS).
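If the setup doesn't complete, a quick first check is whether the port can be reached from the PaperCut MF server at all. The snippet below is a minimal sketch of such a check using only Python's standard library; the function name `port_is_open` and the hostname in the usage comment are made up for illustration, and the default port is simply the one quoted above. It only tests TCP reachability, not the TLS handshake or the OCR service itself.

```python
import socket


def port_is_open(host: str, port: int = 9198, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False


# Example (hypothetical hostname):
#   port_is_open("ocr-server.example.local")
```

A result of False usually points to a firewall rule or a stopped service rather than an OCR problem, so it is worth ruling out before digging through the logs.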

Top Tip: We don’t recommend putting the OCR software on the same server that runs PaperCut MF. OCR can be fairly CPU-intensive and may have a negative impact on other services.

[Image: on-premise OCR installation options]

The server specifications for running on-premise OCR are as follows:

  • At least 10 GB of available disk space
  • 512 MB of available memory
  • A 64-bit edition of Microsoft Windows

[Image: on-premise OCR installation guide]

Once installed, you can keep an eye on what is happening by going to https://OCRSERVER:9181/api-b/v1/status
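If you'd rather check the status from a script than a browser, something along these lines could work. It is a sketch only: the URL pattern and port are copied as written above, the helper names `status_url` and `fetch_status` are invented for this example, and nothing is assumed about the format of the response, which is returned as plain text. Certificate handling is not covered, so a server with a certificate your machine doesn't trust will raise an error.

```python
import urllib.request


def status_url(host: str, port: int = 9181) -> str:
    """Build the status URL using the pattern quoted in the article."""
    return f"https://{host}:{port}/api-b/v1/status"


def fetch_status(host: str, timeout: float = 5.0) -> tuple[int, str]:
    """Request the status page and return (HTTP status code, response body as text)."""
    with urllib.request.urlopen(status_url(host), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


# Example (replace OCRSERVER with your server's hostname or IP):
#   code, body = fetch_status("OCRSERVER")
#   print(code, body)
```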

Fine-tuning the performance

The approach to tuning the performance of an OCR server depends on whether it is on a standalone system or co-located with other services.

By default, the OCR server processes two jobs in parallel, each at normal CPU priority. You can change the number of parallel jobs by modifying the configuration file at:

[ocr-server-path]/data/config/config.toml

After making changes to the config file, you’ll need to restart the PaperCut OCR Windows service.

Tuning for installation on a standalone system

When installing the OCR server on a standalone system, it’s a good idea to maximise the number of jobs that can be processed in parallel to achieve the best performance.

The ideal number depends on many factors, like the type and size of the documents being processed and the system architecture. A reasonable starting point is to use the total number of virtual CPUs (or cores × threads per core on a “bare metal” system), minus two.

Put another way, if you want to process four OCR jobs in parallel and you are installing OCR on a virtual machine, give it six virtual CPUs. To make this change, in the config.toml file remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

Set the MaxJobsInParallel line to MaxJobsInParallel = 4 and, as before, restart the PaperCut OCR Windows service.
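The sizing rule and the config change can be sketched in a few lines of Python. Treat this as an illustration rather than a supported tool: the helper names are invented, and it assumes the option appears in config.toml as a single line of the form "# MaxJobsInParallel = ..." (commented out or not), which may not match your file exactly. It works on the file's text only, so you would still read and write the file yourself and restart the service afterwards.

```python
import re


def suggested_parallel_jobs(cpu_count: int, reserve: int = 2) -> int:
    """Starting point from the article: total virtual CPUs minus two, never below one."""
    return max(1, cpu_count - reserve)


def set_max_jobs(config_text: str, jobs: int) -> str:
    """Uncomment (if needed) and set the MaxJobsInParallel line in the config text.

    Assumes the option sits on one line, optionally prefixed with '#'.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*MaxJobsInParallel[ \t]*=.*$", re.MULTILINE
    )
    if not pattern.search(config_text):
        # Safer to stop than to guess where the option belongs in the file
        raise ValueError("MaxJobsInParallel line not found in config text")
    return pattern.sub(f"MaxJobsInParallel = {jobs}", config_text, count=1)


# Example: a VM with six virtual CPUs gives four parallel jobs
#   jobs = suggested_parallel_jobs(6)
#   new_text = set_max_jobs(old_text, jobs)
```

Raising an error when the line is missing is deliberate: appending the option blindly could place it in the wrong part of the file.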

Lots of scans in a busy system? No worries! The local OCR option supports multiple servers, offering all the good things like scalability, redundancy, performance gains and load balancing. 

The (near) future

The gift that keeps on giving: the OCR pack will be expanded in the future to split, compress and despeckle jobs, among other things. All the super clever post-processing your customers demand.

Keep an eye out for updates, and if you have any questions, just let us know. Or, if you know a customer who would like to try it out, you can request a trial below!


Written by: Paul Williams

More
img

PaperCut database tips

We're going to delve into a topic that regularly crops up via emails and call...

Written by: Paul Williams

More

img

Fastek Consultancy have worked on many large scale PaperCut projects with Selectec over the last few years. The level of expertise and support from Selectec is of the highest standard. Fastek have been approached by a number of other software companies wanting to work with us on machine management projects and we have turned them all down, purely because of the strength of our relationship with Selectec

Gary Whitaker Fastek Consultancy
Back to Top ↑