Overview
Databases must do more than simply store and process the increasing amount of data in our world. They must also effectively organize and streamline the data to best aid users.
The Infrastructure for Intelligent Information Systems (IIIS) group is at the forefront of research into data analytics, search technologies, and integration. Our research ranges from developing new information extraction techniques to analyzing data translation to optimizing schema mapping, with the ultimate goal of building next-generation, responsive information systems. Throughout our work, we emphasize adaptability, usability, and scalability.
Featured Projects
SystemT
The SystemT project is an amalgam of two major research themes centered around analytics and search over unstructured content. These two themes are represented by two corresponding sub-projects: SystemT-Information Extraction (SystemT-IE) and SystemT-Programmable Search (SPS). Our main project page describes both sub-projects in greater detail.
Project leader: Howard Ho
Content Analytics Platform (CAP)
With the tremendous growth in the volume of semi-structured and unstructured content within enterprises (e.g., email archives and customer support databases), there is increasing interest in harnessing this content to power search and business intelligence applications. Traditional enterprise analytics infrastructure is not designed to meet the demands of large-scale, compute-intensive analytics over semi-structured content. In the CAP project, we are developing an enterprise content analytics platform that leverages the Hadoop MapReduce framework to support this emerging class of analytic workloads.
Project leader: Sriram Raghavan
Gumshoe
In contrast with the radical advances in Web search over the last several years, search over enterprise intranets has remained a difficult and largely unsolved problem. Several critical enterprise-specific factors differentiate the search problem on the intranet from that on the Web.
Project leader: Sriram Raghavan
SystemML
Many small systems support in-memory (in-core) analysis of datasets in the megabyte-to-gigabyte range on a single machine. However, there is a pervasive need to enable machine learning (ML) on big data. In SystemML, we address the challenges of large-scale analytics: handling data at terabyte and petabyte scale, scaling to large clusters with thousands of nodes, improving the productivity of data analysts through a higher-level language, and optimizing execution strategies for varying data sets and system configurations.
Project leader: Berthold Reinwald
Recent Publications
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan: "Domain Adaptation of Rule-based Annotators for Named-Entity Recognition Tasks". To appear in EMNLP 2010
Bin Liu, Laura Chiticariu, Vivian Chu, H.V. Jagadish, Frederick Reiss: "Automatic Rule Refinement for Information Extraction". To appear in PVLDB 2010
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan: "SystemT: An Algebraic Approach to Declarative Information Extraction". ACL 2010
Ronald Fagin, Benny Kimelfeld, Yunyao Li, Sriram Raghavan, Shivakumar Vaithyanathan: "Understanding Queries in a Search Database System". PODS 2010
Laura Chiticariu, Yunyao Li, Sriram Raghavan, Frederick Reiss: "Enterprise Information Extraction: Recent Developments and Open Challenges". SIGMOD 2010 (tutorial)
Barna Saha, Ioana R. Stanoi, Kenneth Clarkson: "Schema Covering: A Step Towards Enabling Reusability in Information Integration". ICDE 2010
Ronald Fagin, Laura M. Haas, Mauricio A. Hernández, Renée J. Miller, Lucian Popa, Yannis Velegrakis: "Clio: Schema Mapping Creation and Data Exchange". Conceptual Modeling: Foundations and Applications 2009: 198-236
David Simmen, Fred Reiss, Yunyao Li, Suresh Thalamati: "Enabling Enterprise Mashups over Unstructured Text". SIGMOD 2009 (demonstration)
Eirinaios Michelakis, Rajasekar Krishnamurthy, Peter J. Haas, Shivakumar Vaithyanathan: "Uncertainty management in rule-based information extraction systems". SIGMOD Conference 2009: 101-114
Kevin S. Beyer, Vuk Ercegovac, Rajasekar Krishnamurthy, Sriram Raghavan, Jun Rao, Frederick Reiss, Eugene J. Shekita, David E. Simmen, Sandeep Tata, Shivakumar Vaithyanathan, Huaiyu Zhu: "Towards a Scalable Enterprise Content Analytics Platform". IEEE Data Eng. Bull. 32(1): 28-35 (2009)
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang Chiew Tan: "Reverse data exchange: coping with nulls". PODS 2009: 23-32
Ahmed Radwan, Lucian Popa, Ioana R. Stanoi, Akmal A. Younis: "Top-k generation of integrated schemas based on directed and weighted correspondences". SIGMOD Conference 2009: 641-654
David E. Simmen, Frederick Reiss, Yunyao Li, Suresh Thalamati: "Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT". SIGMOD Conference 2009: 1123-1126
James Cheney, Laura Chiticariu, Wang Chiew Tan: "Provenance in Databases: Why, How, and Where". Foundations and Trends in Databases 1(4): 379-474 (2009)

![[SystemT logo]](../../projects/systemt/systemt_image.jpg)
