My new article, «Content Reconstruction of Parliamentary Questions» (31 July 2017), is now available online.
The Hellenic Parliament stores each parliamentary question as a combination of metadata, extracted manually from the original text, and the scanned document as an image file. Consequently, broad access to and study of the parliamentary questions are limited, as there is no direct access to the original content. A combined process was designed to fully reconstruct the original content of the parliamentary questions using the available metadata, which were extracted during the archiving phase, and a modified mass Optical Character Recognition (OCR) process. Post-correction of the OCR results and quality control of the extracted text are paramount to ensure that the text output matches that of the original document. The results of the OCR process are joined with the metadata to provide a full description of the original document.
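To make the last steps of that process more concrete, here is a minimal Python sketch of post-correction, a quality check, and the join with the metadata. It is not the implementation from the article: the record layout (a `subject` field keyed by a question id), the function names, the clean-up rules, and the 0.8 threshold are all assumptions chosen for illustration, and the OCR step itself is taken as already done.

```python
import re
from dataclasses import dataclass


@dataclass
class ReconstructedQuestion:
    question_id: str
    metadata: dict            # manually extracted metadata (hypothetical layout)
    text: str                 # post-corrected OCR text
    passed_quality_check: bool


def post_correct(raw_text: str) -> str:
    """Apply simple rule-based clean-up to raw OCR output (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "fund-\ning" -> "funding".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw_text)
    # Collapse remaining line breaks and runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def subject_coverage(subject: str, text: str) -> float:
    """Fraction of the words in the metadata subject that also occur in the OCR text."""
    subject_words = set(re.findall(r"\w+", subject.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    if not subject_words:
        return 0.0
    return len(subject_words & text_words) / len(subject_words)


def reconstruct(metadata_by_id: dict, ocr_by_id: dict, min_coverage: float = 0.8) -> list:
    """Join metadata records with post-corrected OCR text on the question id."""
    results = []
    for question_id, metadata in metadata_by_id.items():
        raw_text = ocr_by_id.get(question_id)
        if raw_text is None:
            continue  # no OCR output exists for this record
        text = post_correct(raw_text)
        coverage = subject_coverage(metadata.get("subject", ""), text)
        results.append(
            ReconstructedQuestion(question_id, metadata, text, coverage >= min_coverage)
        )
    return results
```

Records whose OCR text does not cover the manually entered subject are only flagged, not discarded, so that they can be reviewed by hand. The hyphen rule is deliberately naive and would also merge genuine compound words that happen to break across lines.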
Please cite as follows:
Fitsilis, Fotios, Saalfeld, Thomas, Schwemmer, Carsten, Content Reconstruction of Parliamentary Questions, SCIECONF Proceedings, vol. 5, issue 1, pp. 107–112, 2017.