The text cleaner cleans up any text that has been produced by an OCR process. It checks for excessive character runs and other configured text content constraints.
Select all the methods you want to use from the Cleaning Process dropdown.
Remove Long Strings - Strings over a certain length will be removed from the text.
Remove Null Characters - Any blank characters will be removed.
Remove Non ASCII Characters - Any characters not in the American Standard Code for Information Interchange will be removed.
OCR Cleanup - Improve the accuracy of your OCR process by defining rules for cleaning.
Remove Blank Lines - Any blank lines will be removed.
When selected, a Maximum Continuous Characters value must be set.
Any text longer than this with no spaces or delimiters will be removed.
Enter any delimiters to be used other than full stops. These will be used to determine the length of a sentence.
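The simpler cleaning methods above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function names are hypothetical, "null character" is assumed to mean the NUL character (\x00), and "delimiters" is assumed to be a string of single characters that split text in addition to whitespace.

```python
import re

def remove_null_characters(text: str) -> str:
    # Assumption: "null characters" means the NUL character (\x00).
    return text.replace("\x00", "")

def remove_non_ascii(text: str) -> str:
    # Drop every character that cannot be encoded as ASCII.
    return text.encode("ascii", errors="ignore").decode("ascii")

def remove_blank_lines(text: str) -> str:
    # Keep only lines that contain something other than whitespace.
    return "\n".join(line for line in text.splitlines() if line.strip())

def remove_long_strings(text: str, max_continuous: int, delimiters: str = ".") -> str:
    # Split on runs of whitespace or delimiter characters. The capture group
    # makes re.split return the separators too (at odd indices), so the
    # remaining text can be joined back together unchanged.
    pattern = "([" + re.escape(delimiters) + r"\s]+)"
    parts = re.split(pattern, text)
    kept = [
        part
        for index, part in enumerate(parts)
        if index % 2 == 1 or len(part) <= max_continuous
    ]
    return "".join(kept)
```

In this sketch, adding a hyphen to the delimiters means a run such as abc-def-ghi is measured as three short pieces rather than one run of eleven characters, so it survives a limit of five.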
In OCR Cleanup Dictionary File Path, enter the path of the dictionary to use when word checking.
Aiimi can provide a dictionary set if required.
Check Ignore Proper Nouns to skip the spelling check for proper nouns in OCR text.
Check Only Clean If OCR Metadata Present to only check documents that have been through OCR.
Within Words to Ignore for OCR Cleanup, enter any words that should be excluded from the spellcheck.
There is no limit to the number of words you can add.
You can remove and edit words in the list using the edit or delete buttons next to the word.
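As a rough illustration of how a dictionary check with an ignore list and a proper-noun option could interact, here is a minimal Python sketch. It is not the product's spellcheck: the function name and parameters are hypothetical, the dictionary is passed in as a set rather than loaded from the configured file path, and a proper noun is crudely assumed to be any word starting with a capital letter.

```python
import re

def find_unknown_words(text, dictionary, words_to_ignore=(), ignore_proper_nouns=True):
    # Compare case-insensitively against both the dictionary and the ignore list.
    known = {word.lower() for word in dictionary}
    ignored = {word.lower() for word in words_to_ignore}
    unknown = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Assumption: a capitalised word is treated as a proper noun.
        if ignore_proper_nouns and word[0].isupper():
            continue
        if word.lower() in known or word.lower() in ignored:
            continue
        unknown.append(word)
    return unknown
```

With a dictionary of "the", "fox" and "met", the text "the qvick fox met Aiimi" yields only "qvick" as unknown; adding "qvick" to the ignore list yields nothing.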
Select Show Advanced Options.
Define the maximum number of items to process concurrently in Bounded Capacity.
Define the maximum number of items that can be queued.
Limiting either of these will reduce the memory use but increase the time taken.
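The trade-off between memory and time can be seen in a small Python sketch of a bounded pipeline. This is a generic pattern under stated assumptions, not how the product is built: max_concurrency and queue_capacity are hypothetical names for the two limits, and the worker function is assumed not to raise.

```python
import queue
import threading

def process_all(items, worker, max_concurrency=2, queue_capacity=4):
    # At most queue_capacity items wait in memory at any one time.
    pending = queue.Queue(maxsize=queue_capacity)
    results = []
    lock = threading.Lock()
    sentinel = object()  # tells a worker thread to stop

    def run():
        while True:
            item = pending.get()
            if item is sentinel:
                break
            output = worker(item)
            with lock:
                results.append(output)

    # At most max_concurrency items are processed at the same time.
    threads = [threading.Thread(target=run) for _ in range(max_concurrency)]
    for thread in threads:
        thread.start()
    for item in items:
        pending.put(item)  # blocks while the queue is full
    for _ in threads:
        pending.put(sentinel)
    for thread in threads:
        thread.join()
    return results  # order is not guaranteed
```

Lowering either limit means fewer items are held or worked on at once, which reduces memory use, while the producer spends more time blocked waiting for space in the queue.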