I’ve been researching family history, following leads in the National Library of Australia’s ‘Trove’ digital newspaper collection. One of the great things about this collection is that, in addition to providing scanned images of the newspapers, the library has converted the pages to text using OCR (optical character recognition).
Anyone who’s scanned text using OCR will know that the results are hit and miss: the accuracy depends on the state of the original document or image, the OCR software, and the settings you use when converting. And so it is with these images: some are good, some are great, some are just woeful. Anyone can correct the resulting text, and with many people doing just that, over time the text becomes more readable and, most importantly, is correctly represented in the indexes used for searching.
As I’m doing family history research, I’m looking for dates of birth, marriage, and death, so some of the entries I read are heartbreaking. Others are just plain funny because the OCR has misread certain characters as others (e.g. F as P, i as l, H as II, S as B, 8 as S).
Although this one was not part of my family, I corrected it anyway. This is before…