Data


200 tables— Imported from ten large statistical data sites in the US and abroad, most with a geopolitical orientation. The tables vary sufficiently in size, complexity, and configuration to test alternative or complementary methods of table interpretation. Most of the tables were extracted from HTML pages. They are in MS Excel format and include the source, table title, and footnotes. The tables were collected in 2009 by Ramana C. Jandhyala as part of his TANGO-related MS research at RPI.


Kilbarchan book and txt files for each page— The Kilbarchan book was typeset from a transcribed manuscript of original entries in the Kilbarchan Parish Record, which were handwritten by parish vicars. The book was scanned and OCR'd, yielding characters with bounding boxes, which were then rendered as left-justified lines of text with heuristically set spacing between words and between letters within words.


Miller book and txt files for each page— The Miller book is a typewritten copy of the original Miller Funeral Home burial records. The book was scanned and OCR'd, yielding characters with bounding boxes, which were then rendered as left-justified lines of text with heuristically set spacing between words and between letters within words.


Ely book and txt files for each page— The Ely book was typeset by the publisher. The book was scanned and OCR'd, yielding characters with bounding boxes, which were then rendered as left-justified lines of text with heuristically set spacing between words and between letters within words.
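
The rendering step described for the three books (OCR characters with bounding boxes turned into left-justified text lines) might look roughly like the Python sketch below. It is an illustration only, not the procedure actually used to produce the txt files: the `Glyph` record, the `render_lines` function, the `word_gap_ratio` threshold, and the line-grouping rule are all assumptions made for this example, and the sketch only decides where word spaces go rather than reproducing the finer letter spacing.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    """One OCR'd character with its bounding box (hypothetical record layout)."""
    char: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def render_lines(glyphs, word_gap_ratio=0.5):
    """Turn glyphs into left-justified text lines.

    Assumed heuristic: a horizontal gap wider than word_gap_ratio times the
    median glyph width is treated as a word break.
    """
    if not glyphs:
        return []
    height = median(g.y1 - g.y0 for g in glyphs)
    width = median(g.x1 - g.x0 for g in glyphs)

    # Group glyphs into lines by vertical centre, scanning top to bottom.
    # A glyph joins the current line if its centre lies within half a
    # median glyph height of that line's first glyph.
    lines = []
    for g in sorted(glyphs, key=lambda g: (g.y0 + g.y1) / 2):
        cy = (g.y0 + g.y1) / 2
        if lines and abs(cy - lines[-1][0]) <= height / 2:
            lines[-1][1].append(g)
        else:
            lines.append((cy, [g]))

    rendered = []
    for _, row in lines:
        row.sort(key=lambda g: g.x0)
        # Left-justify: each line starts at its first glyph, so any
        # indentation on the page is dropped.
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.x0 - prev.x1 > word_gap_ratio * width:
                text += " "
            text += cur.char
        rendered.append(text)
    return rendered
```

For example, four glyphs on one baseline with a wide gap after the second, followed by two indented glyphs lower on the page, would come out as two lines, each starting in column one.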