Publications and Reports

  • Inter-Generational Family Reconstitution with Enriched Ontologies by David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, and Scott N. Woodfield, July 2019 (.pdf)

  • Green Information Extraction from Family Books by George Nagy, June 2019 (.pdf). Included in GreenBook.zip are the code, the ground truth files, the input and output files, and errata. The Ely, Kilbarchan, and Miller books along with OCR'd text files of all pages are here.

  • Ontological Document Reading: An Experience Report by D.W. Embley, S.W. Liddle, D.W. Lonsdale, and S.N. Woodfield, March 2018 (.pdf)

  • Models and Algorithms for Regularizing Heterogeneous Web Tables by David W. Embley, Mukkai Krishnamoorthy, George Nagy, and Sharad Seth, April 2015 (.docx)

  • Clustering header categories extracted from web tables by George Nagy, David W. Embley, Mukkai Krishnamoorthy, and Sharad Seth, November 2014 (.docx)

  • Transforming web tables to a relational database by David W. Embley, George Nagy, and Sharad Seth, May 2014 (.pdf)

  • End-to-End Conversion of HTML Tables for Populating a Relational Database by George Nagy, Sharad Seth, and David W. Embley, December 2013 (.pdf)

  • Document Analysis Issues in Reading Optical Scan Ballots by Daniel Lopresti, George Nagy, and Elisa Barney Smith, December 2009 (.pdf)

  • Final NSF Report, August 2009. (.pdf)

  • Theoretical Foundations for Enabling a Web of Knowledge by David W. Embley and Andrew Zitzelberger, August 2009 (.pdf)

  • Interactive Conversion of Large Web Tables by Raghav Padmanabhan, Ramana C. Jandhyala, Mukkai Krishnamoorthy, George Nagy, Sharad Seth, and William Silversmith, July 2009 (.pdf)

  • KBB: A Knowledge-Bundle Builder for Research Studies by David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Aaron Stewart, and Cui Tao, May 2009. (.pdf)

  • Table Abstraction Tool by Raghav Krishna Padmanabhan, Masters Thesis, May 2009. (.pdf) Data: (Source Tables), (Data Analysis).

  • From Tessellations to Table Interpretation by Ramana C. Jandhyala, Mukkai Krishnamoorthy, George Nagy, Raghav Padmanabhan, Sharad Seth, and William Silversmith, May 2009. (.pdf)

  • FOCIH: Form-based Ontology Creation and Information Harvesting by Cui Tao, David W. Embley, and Stephen W. Liddle, April 2009. (.pdf)

  • Conceptual Modeling for a Web of Knowledge by David W. Embley, Stephen W. Liddle, and Cui Tao, March 2009. (.pdf)

  • Domain-Independent Data Extraction: Person Names by Carl Christensen and Deryle Lonsdale, March 2009. (.doc)

  • Enabling a Web of Knowledge by Cui Tao, David W. Embley, and Stephen W. Liddle, January 2009. (.pdf)

  • Ontology Generation, Information Harvesting and Semantic Annotation For Machine-Generated Web Pages by Cui Tao, PhD Dissertation, December 2008. (.pdf)

  • Extracting a Largest Redundancy-Free XML Storage Structure from an Acyclic Hypergraph in Polynomial Time by Wai Yin Mok, Joseph Fong, and David W. Embley, November 2008. (revised manuscript) (.pdf)

  • A Conceptual-Model-Based Computational Alembic for a Web of Knowledge by D.W. Embley, S.W. Liddle, D. Lonsdale, G. Nagy, Y. Tijerino, R. Clawson, J. Crabtree, Y. Ding, P. Jha, Z. Lian, S. Lynn, R.K. Padmanabhan, J. Peters, C. Tao, R. Watts, C. Woodbury, and A. Zitzelberger, October 2008. (.pdf)

  • Automatic Hidden-Web Table Interpretation, Conceptualization, and Semantic Annotation by Cui Tao and David W. Embley, October 2008. (revised manuscript) (.pdf)

  • Annual NSF Report, August 2008. (.pdf)

  • Semantically Conceptualizing and Annotating Tables by Stephen Lynn and David W. Embley, July 2008. Proceedings of the 3rd Asian Semantic Web Conference (ASWC2008) (submitted manuscript) (.pdf)

  • Wang Notation Tool: A Layout Independent Representation of Tables by Piyushee Jha, Masters Thesis, May 2008. (.pdf)

  • Query by Table by Raghav K. Padmanabhan and George Nagy, ICPR08, April 2008. (.pdf)

  • Wang Notation Tool: Layout Independent Representation of Tables by Piyushee Jha and George Nagy, ICPR08, April 2008. (.pdf)

  • Foundational Data Modeling and Schema Transformations for XML Data Engineering by Reema Al-Kamha, David W. Embley, and Stephen W. Liddle, UNISCON08, April 2008. (.pdf)

  • Multi-character Field Recognition for Arabic and Chinese Handwriting by Daniel Lopresti, George Nagy, Sharad Seth, and Xiaoli Zhang, April 2008, book chapter. (.pdf)

  • Adaptive and Interactive Approaches to Document Analysis by George Nagy and Sriharsha Veeramachaneni, April 2008, invited book chapter. (.pdf)

  • Automating Mini-Ontology Generation from Canonical Tables by Stephen Lynn, Masters Thesis, April 2008. (.pdf)

  • A Tool to Support Ontology Creation Based on Incremental Mini-Ontology Merging, by Zonghui Lian, Masters Thesis, March 2008. (.pdf)

  • Automatic Generation of Ontologies from Canonicalized Web Tables, by Stephen Lynn and David W. Embley, March 2008, (submitted manuscript). (.pdf)

  • Reusing Ontologies and Language Components for Ontology Generation, by Deryle W. Lonsdale, David W. Embley, Yihong Ding, Li Xu, and Martin Hepp, March 2008, Data & Knowledge Engineering. (.pdf)

  • Automatic Hidden-Web Table Interpretation, Conceptualization, and Semantic Annotation, by Cui Tao and David W. Embley, January 2008, Data & Knowledge Engineering, submitted manuscript. (.pdf)

  • Ontology Aware Software Service Agents: Meeting Ordinary User Needs on the Semantic Web, PhD Dissertation by Muhammed J. Al-Muhammed, July 2007. (.pdf)

  • 2007 BYU Annual NSF Report, July 2007. (.pdf)

  • Generating Ontologies via Language Components and Ontology Reuse by Yihong Ding, Deryle Lonsdale, David W. Embley, Martin Hepp, and Li Xu, June 2007, Proceedings of the 12th International Conference on Applications of Natural Language to Information Systems (NLDB'07). (.pdf)

  • Conceptual XML for Systems Analysis by Reema Al-Kamha, PhD Dissertation, June 2007. (.pdf)

  • 2007 RPI Annual NSF Report, June 2007. (.pdf)

  • Report #2: TANGO by Raghav K. Padmanabhan, June 2007. (.pdf)

  • Report #1: TANGO by Raghav K. Padmanabhan, June 2007. (.pdf)

  • Interactive Wang Notation Tool for Web Tables by Piyushee Jha, May 2007. (.pdf, .doc)

  • Seed-based Generation of Personalized Bio-Ontologies for Information Extraction, by Cui Tao and David W. Embley, May 2007, Proceedings of the First International Conference on Conceptual Modelling for Life Sciences Applications (CMLSA'07). (.pdf)

  • Automatic Hidden-Web Table Interpretation by Sibling Page Comparison, by Cui Tao and David W. Embley, April 2007, Proceedings of the 26th International Conference on Conceptual Modeling (ER'07). (.pdf)

  • Enriching OWL with Instance Recognition Semantics for Automated Semantic Annotation by Yihong Ding, David W. Embley, and Stephen W. Liddle, April 2007, ONISW 2007, submitted manuscript. (.pdf)

  • Bringing Web Principles to Services: Ontology-Based Web Services, by Muhammed J. Al-Muhammed, David W. Embley, Stephen W. Liddle, and Yuri A. Tijerino, April 2007, SWSP 2007. (.pdf)

  • Augmenting Traditional Conceptual Models to Accommodate XML Structural Constructs, by Reema Al-Kamha, David W. Embley, and Stephen W. Liddle, April 2007, ER'07 submitted manuscript. (.pdf)

  • Ontology-Based Constraint Recognition for Free-Form Service Requests, by Muhammed Al-Muhammed and David W. Embley, April 2007, ICDE'07. (.pdf)

  • A Composite Approach to Automating Direct and Indirect Schema Mappings, by Li Xu and David W. Embley, December 2006, Information Systems, submitted version. (.pdf)

  • Toward Making Online Biological Data Machine Understandable, by Cui Tao, November 2006, The 5th International Semantic Web Conference (ISWC 2006) Doctoral Consortium. (.pdf)

  • HTML Table Interpretation by Sibling Page Comparison in the Molecular Biology Domain, by Cui Tao and David W. Embley, October 2006, 3rd Biotechnology and Bioinformatics Symposium (BIOT 2006). (.pdf)

  • Digitizing, Coding, Annotating, Disseminating, and Preserving Documents by George Nagy, December 2006, International Workshop on Research Issues in Digital Libraries (IWRID'06). (.pdf)

  • In Search of Meaning for Time Series Subsequence Clustering: Matching Algorithms Based on a New Distance Measure by Dina Goldin, Ricardo Mardales, and George Nagy, November 2006, ACM Conference on Information Knowledge Management (CIKM'06). (.pdf)

  • Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies, by Yihong Ding, David W. Embley, and Stephen W. Liddle, September 2006, ASWC'06. (.pdf)

  • 2006 RPI Annual NSF Report, August 2006. (.pdf)

  • 2006 BYU Annual NSF Report, August 2006. (.pdf)

  • Resolving Underconstrained and Overconstrained Systems of Conjunctive Constraints for Service Requests, by Muhammed Al-Muhammed and David W. Embley, June 2006, CAiSE'06. (.pdf)

  • Some answers to questions about ontology from the DEG point of view, by Yuri Tijerino, May 2006. (.doc)

  • Interactive Document Processing and Digital Libraries by George Nagy and Daniel Lopresti, January 2006, Conference on Document Image Analysis for Libraries (DIAL'06). (.pdf)

  • Notes on Contemporary Table Recognition by David W. Embley, Daniel Lopresti, and George Nagy, December 2005, Workshop on Document Analysis Systems (DAS2006). (.pdf)

  • Representing Generalization/Specialization in XML Schema by Reema Al-Kamha, David W. Embley, and Stephen W. Liddle, August 2005, EMISA05. (.pdf)

  • Automating the Extraction of Data from HTML Tables with Unknown Structure by David W. Embley, Cui Tao, and Stephen W. Liddle, Data & Knowledge Engineering, May 2005. (.pdf)

  • Table Processing Paradigms: A Research Survey by David W. Embley, Matthew Hurst, Daniel Lopresti, and George Nagy, February 2005, IJDAR. (.pdf)

  • Toward Ontology Generation from Tables by Yuri A. Tijerino, David W. Embley, Deryle W. Lonsdale, Yihong Ding, and George Nagy, April 2004, WWWJ. (.pdf)

  • Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views by Joachim Biskup and David W. Embley, 2003, Information Systems. (.pdf)