A Data Integration Framework with Full Spectrum Fusion Capabilities

February 11, 2010

August 2009

Suzanne Yoakum-Stover, Ph.D.

Potomac Institute for Policy Studies, Senior Research Fellow

US Army CERDEC I2WD, Information Exploitation Futures Lab, Lead Scientist

Fort Monmouth, NJ

Tatiana Malyuta, Ph.D.

New York City College of Technology, Associate Professor

Data Tactics Corp., Principal Database Architect

Alexandria, VA

Norbert Antunes

US Army CERDEC I2WD, Fusion and Modeling Division, Computer Engineer

Aberdeen Proving Grounds, MD

ABSTRACT

One of the most challenging problems in intelligence gathering and processing is resolving the issues of syntactic and semantic interoperability of numerous intelligence sources and fusing the intelligence data, information, and knowledge to provide for efficient, accurate and comprehensive analysis. The key features of a successful fusion solution are: 1) The ability to rapidly and seamlessly integrate any source while preserving its original data and semantics; 2) Support for powerful data processing capabilities that can utilize the data and semantics of the integrated sources without limitations. Existing data integration approaches require heavy pre-integration processing (schema harmonization and data normalization) and usually entail loss/distortion of original data and semantics. Processing of the integrated data is defined by, and therefore limited by, the integrating schema – discovering data relationships and fusing data beyond the integrating schema is impossible.

We present a unified data and information integration framework that presents absolutely minimal barriers to incorporating new data and semantics into the integrated system (e.g. no heavy pre-processing or data / data-model conditioning), and embraces the full spectrum of data sources, types, models, and modalities (e.g. text, images, audio, signals).

The approach enables rapid integration of ad-hoc data and data-semantics and results in a multi-layered data store that we call a Unified Data Space (UDS). The UDS imposes no restrictions on what data must be or how it is to be used, and it supports the diversity of processing by which structural and semantic barriers are overcome to yield information and knowledge.

In this paper we concentrate on the benefits of the approach for data fusion.

Key words: Data Description Framework (DDF), data fusion, data integration, semantic enrichment, structured data.

Introduction

One of the most challenging problems in intelligence gathering and processing is resolving the issues of syntactic and semantic interoperability of numerous intelligence sources and fusing the intelligence data, information, and knowledge to provide for efficient, accurate and comprehensive analysis. The key features of a successful fusion solution are: 1) The ability to rapidly and seamlessly integrate any source while preserving its original data and semantics; 2) Support for powerful data processing capabilities that can utilize the data and semantics of the integrated sources without limitations. Existing data integration approaches require heavy pre-integration processing (schema harmonization and data normalization) and usually entail loss/distortion of original data and semantics. Processing of the integrated data is defined by, and therefore limited by, the integrating schema – discovering data relationships and fusing data beyond the integrating schema is impossible [1 – 3].

We present a unified data and information integration framework that:

  • Presents absolutely minimal barriers to incorporating new data and semantics into the integrated system (e.g. no heavy pre-processing or data / data-model conditioning)
  • Embraces the full spectrum of data sources, types, models, and modalities (e.g. text, images, audio, signals)

A detailed description of the architecture and the philosophy of the approach can be found in [4 – 7]. In this paper we concentrate on the benefits of the approach for data fusion:

  • The UDS permits “horizontal” data fusion spanning across data from all sources. It can accommodate any number of integration models, without imposing physical or semantic barriers, while facilitating navigation, search, and exploration of the integrated data store. As a result, relationships between data from multiple disparate sources that are difficult or impossible to foresee (and therefore impossible to look for) are revealed.
  • The multi-layered UDS permits the “vertical” fusion of data and data models, allowing one to backtrack to sources on one hand and connect to knowledge models on the other. It also supports the processing by which data is cultivated to produce information, knowledge, and ultimately understanding. Thus, the UDS represents a viable integrated solution that “matures” with time.

Challenge of Data Integration

To be viable within an Ultra-Large Scale (ULS) systems [8] environment consisting of a freely evolving, interdependent collective of human and computational systems, very little of which will ever be under our control, any approach to data and semantic fusion must:

  • Present absolutely minimal barriers to incorporating new data and semantics into the integrated system (e.g. no heavy pre-processing or data / data-model conditioning)
  • Embrace the full spectrum of data sources, types, models, and modalities (e.g. text, images, audio, signals)
  • Impose no restrictions on what data must be or how it is to be used
  • Support the diversity of processing by which structural and semantic barriers are overcome to yield information and knowledge
  • Allow data, information, and knowledge to be re-used according to diverse perspectives

To our knowledge, no traditional approach to data integration, physical or virtual, has all of these characteristics [1, 2, 9, 10, 12]. Consequently, traditional approaches fail to provide viable solutions in the “wild”, i.e. for ULS environments that are characterized by decentralization; inherently conflicting, diverse, and unknowable requirements; heterogeneous, changing, and inconsistent elements; normal failures; continuous operation, evolution, and deployment; and immense scale along many dimensions.

To productively address the challenges of ULS systems environments, we developed the Data Integration and Semantic Enrichment Platform (DISEP) and the Data Description Framework (DDF) based on the following principles:

To enable true data integration and unencumbered semantic enrichment,

  • Data must be perceived objectively, independent of intended use
  • Domain data-models must be considered from a higher level of abstraction

In our approach, these principles are reflected in two essential decouplings: The decoupling of data from domain data-models, and the decoupling of domain data-models from the model of the integrated data store. As a result, the DISEP is able to serve as a universal platform for data integration and semantic enrichment, and within it, the DDF serves as a universal store for structured data.

Data Description Framework

The DDF supports semantic data integration by establishing a domain-neutral unified store for structured data. To achieve this, we consider structure, vocabulary, semantics, and constraints from a higher level of abstraction from which we then distill a minimal set of elements sufficient to capture any data-model. These are illustrated in Figure 1 and defined as follows:

Sign: A sign $g_i$ represents a chunk of data, either physically located within a tangible artifact, or contained within an analyst’s mind. Examples of the former include a string of text in a document; an area of pixels within an image; a segment of an audio stream or other signal. As illustrated in Fig. 1, for tangible artifacts, regardless of the type of medium, signs are always associated with a physical extent or quantifiable span within the artifact, which we call a mention. The set of all signs, $G = \{g_i\}$, spans across all data sources. In the set, each element is unique: $\forall i, j\ (i \neq j) \Rightarrow g_i \neq g_j$. $G$ is the construct by which the DDF represents data. From the text data shown in Fig. 1, signs $G' = \{$‘Suzi’, ‘Tanya’, ‘July 4, 2007’, ‘Bring lunch’, ‘Message1’$\}$ contribute to $G$ (i.e. $G' \subseteq G$), though many more signs may be identified even from this simple example.

Concept: A concept $c_i$ is an abstract idea, defined either explicitly or implicitly by a source data-model. For example, the nodes of an ontology, the tag set in an XML Schema Document (XSD), and the attribute / table names in a relational database all represent concepts. In the set of all concepts $C = \{c_i\}$, each element is unique: $\forall i, j\ (i \neq j) \Rightarrow c_i \neq c_j$. From the text data shown in Fig. 1, concepts $C' = \{$‘Message’, ‘Person’, ‘Body_text’$\}$ contribute to the full set of concepts $C$ (i.e. $C' \subseteq C$).

Predicate: A predicate $p_i$ is an abstract idea used to express a relationship between “things.” Predicates are used in the formation of statements (described below) and may be defined either explicitly or implicitly by a source data-model. For example, the arcs of an ontology, and the attributes of an XML or database schema represent predicates. In the set of all predicates $P = \{p_i\}$, each element is unique: $\forall i, j\ (i \neq j) \Rightarrow p_i \neq p_j$. The text example of Fig. 1 contributes predicates $P' = \{$‘To’, ‘From’, ‘Body’$\}$ to the set of all predicates $P$ (i.e. $P' \subseteq P$). The only predicate that is “built into” (i.e. defined by) our storage model is the IsInstanceOf predicate, which is used to disambiguate signs to form terms as described below. Concepts and predicates are the constructs by which we link to data-models and, thereby, explicitly expose data-semantics.

Term: A term $t_{ij}$ is an ordered pair $\langle g_i, c_j \rangle$ where $g_i \in G$ and $c_j \in C$. Each term represents a disambiguated sign. The process of disambiguation associates a sign with a concept using the ‘IsInstanceOf’ predicate (though not every sign from $G$ is necessarily disambiguated, and not every concept from $C$ is necessarily used for disambiguation). In the set of all terms $T = \{t_{ij}\}$, each element is unique: $\forall i, j, k, l\ (i \neq k \lor j \neq l) \Rightarrow t_{ij} \neq t_{kl}$. The text example of Fig. 1 contributes terms $T' = \{t_1, t_2, t_3, t_4\}$, where $t_1 = \langle \text{‘Suzi’}, \text{Person} \rangle$, $t_2 = \langle \text{‘Tanya’}, \text{Person} \rangle$, $t_3 = \langle \text{‘Bring lunch’}, \text{Body\_text} \rangle$, and $t_4 = \langle \text{‘Message1’}, \text{Message} \rangle$, to the complete set of terms $T$ (i.e. $T' \subseteq T$).

Statement: A statement $s_i$ encodes a binary relationship between a subject and an object mediated by a predicate. A statement is represented by an ordered triple $s_{ijh} = \langle \mathit{subject}_i, \mathit{predicate}_j, \mathit{object}_h \rangle$. Among the set of all statements, each element is unique: $\forall i, j, h, l, m, n\ (i \neq l \lor j \neq m \lor h \neq n) \Rightarrow s_{ijh} \neq s_{lmn}$. In our model, subject and object may each be either a term or a statement. The simplest kind of statement is one in which subject and object are terms: $s^0_{ijh} = \langle t_i, p_j, t_h \rangle$. Statements in which the object is itself another statement represent reifications: $s^1_{klm} = \langle t_k, p_l, s_m \rangle$. Finally, a statement in which both subject and object are other statements represents a relationship between statements: $s^2_{xyz} = \langle s_x, p_y, s_z \rangle$. The set of all statements is $S = \{s^0_{ijh}\} \cup \{s^1_{klm}\} \cup \{s^2_{xyz}\}$. The text example of Fig. 1 shows three statements: $S' = \{\langle t_4, \text{To}, t_1 \rangle, \langle t_4, \text{From}, t_2 \rangle, \langle t_4, \text{Body}, t_3 \rangle\}$, all with the same subject, which is the term corresponding to the message itself. These statements contribute to the set of all statements, i.e. $S' \subseteq S$.

These elementary constructs (sign, concept, predicate, term, and statement) define a data reference model, which we call the Data Description Framework (DDF) [13]. Because it effectively decouples data from data-models, it can encapsulate any sort of data-model. Because it binds knowledge to data, it enables deep data integration and semantic enrichment. By using the DDF as the basis for implementing a stable storage-model, we are able to build a practical data integration platform on commodity database infrastructure.
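As an illustration only, the five constructs can be sketched as immutable Python value types, with set membership standing in for the uniqueness conditions. The class and field names are our own choices for this sketch, not part of any DDF implementation, and the 'Asserted' predicate used for the reification is hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sign:
    value: str          # a chunk of data, e.g. a string found in a document


@dataclass(frozen=True)
class Concept:
    name: str           # an abstract idea defined by a source data-model


@dataclass(frozen=True)
class Predicate:
    name: str           # an abstract idea relating two "things"


@dataclass(frozen=True)
class Term:
    sign: Sign          # a sign disambiguated by a concept (IsInstanceOf)
    concept: Concept


@dataclass(frozen=True)
class Statement:
    subject: Union[Term, "Statement"]
    predicate: Predicate
    obj: Union[Term, "Statement"]


# The message example of Fig. 1.
person, message, body_text = Concept("Person"), Concept("Message"), Concept("Body_text")
t1 = Term(Sign("Suzi"), person)
t2 = Term(Sign("Tanya"), person)
t3 = Term(Sign("Bring lunch"), body_text)
t4 = Term(Sign("Message1"), message)

to_suzi = Statement(t4, Predicate("To"), t1)
statements = {
    to_suzi,
    Statement(t4, Predicate("From"), t2),
    Statement(t4, Predicate("Body"), t3),
    Statement(t4, Predicate("To"), t1),   # same triple again: the set keeps one copy
}

# A reification (term, predicate, statement); 'Asserted' is a hypothetical predicate.
reified = Statement(t2, Predicate("Asserted"), to_suzi)
```

Because the types are frozen and compared by value, two structurally identical terms or statements are the same element, which mirrors the requirement that each element of T and S be unique.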

The reader familiar with the Resource Description Framework (RDF/RDFS) may wonder what is different here. Indeed, RDF and DDF share DNA, so to speak, since both employ a similar level of abstraction and expose semantics. Unlike RDF however, DDF also prescribes the exposure of data as signs which can freely participate in the disambiguations and associations necessary for data integration. In contrast, a datum represented as an RDF literal cannot be explicitly disambiguated or associated. Also, in contrast to DDF signs, which provide a primal level of data integration (to be described below), there is no mechanism in RDF to prevent a single datum from being represented by multiple URIs. This is not a criticism of RDF as these differences reflect the fact that RDF is a meta-model not specifically aimed at data integration. Thus, employing RDF for data integration necessitates building a particular metamodel instance (i.e. a model) in RDF along with rules prescribing the manner of data exposure [3]. In contrast, DDF is a model that makes explicit commitments to support data integration. Because this model represents an abstraction over domain data-models, the DDF can represent data structured by any data-model, and be represented in any metamodel (including RDF).
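One reading of the uniqueness of signs is that the same datum arriving from different sources is stored once and merely acquires additional mentions. The sketch below illustrates that reading under our own assumptions; the `SignRegistry` class, the source identifiers, and the offsets are hypothetical and are not taken from the DDF implementation.

```python
from collections import defaultdict


class SignRegistry:
    """Keeps one entry per distinct sign and records every mention of it."""

    def __init__(self):
        # sign value -> set of (source, start, end) mentions
        self._mentions = defaultdict(set)

    def add_mention(self, sign, source, start, end):
        self._mentions[sign].add((source, start, end))
        return sign

    def sources(self, sign):
        # .get avoids creating an empty entry for an unknown sign
        return {source for source, _, _ in self._mentions.get(sign, set())}

    def __len__(self):
        return len(self._mentions)


registry = SignRegistry()
registry.add_mention("Suzi", "email:Message1", 4, 8)
registry.add_mention("Suzi", "hr_db:employees", 0, 4)   # same datum, second source
registry.add_mention("Tanya", "email:Message1", 15, 20)
```

Here the two occurrences of 'Suzi' share one sign, so any later disambiguation or association made on that sign is visible from both sources.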

As illustrated in Figure 2, the DDF forms a layer of data and semantics (Layer 2) in the DISEP that lies between the indigenous source systems (Layer 1) and their data/knowledge models (Layer 3). A more detailed description of the layered architecture is presented in [13]. Layer 1 feeds the layers above, and Layers 2 and 3 interact: Layer 3 provides the semantic context for Layer 2, and Layer 2 participates in the formation of an overarching knowledge model in Layer 3. Together Layers 2 and 3 form what we call the Unified Data Space (UDS).

Demonstration of Fusing Potential

The DDF fusing potential is demonstrated in Fig. 3:

  • The UDS integrated (Fig. 4 and Fig. 5) selected data from Freebase [16] (structured data), GovTrack [17] (semi-structured data), and the images and articles from Wikipedia [18, 19] (unstructured data).
  • The UDS absorbed the diversity of the data types and modalities: audio, images, unstructured text, and data from the relational database.
  • Accommodating data from the structured and semi-structured stores was performed in an automated fashion with little to no up-front preprocessing that is usually needed in traditional integration approaches.
  • The unstructured text data and images were represented with the help of DDF constructs via manual or automated entity extraction and face-recognition processes.

Figures 4 and 5 illustrate the process of “DDFying” the data sources from Fig. 3: extracting and disambiguating data, and representing disambiguated data as terms (Fig. 4); associating disambiguated data as statements (Fig. 5).
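For a structured source, the two steps can be sketched roughly as follows. This is a minimal illustration, not the authors' ingestion process: terms are written as (sign, concept) tuples, and the mapping convention (table name as the concept of the row's key value, column names as concepts, "has" plus the column name as the predicate) is an assumption made for the example, as are the table and row contents.

```python
def ddfy_row(table, key_column, row):
    """Turn one relational row into DDF-style terms and statements.

    Terms are (sign, concept) pairs; statements are (subject, predicate, object).
    """
    # The key value, disambiguated by the table name, identifies the row.
    subject = (str(row[key_column]), table)
    terms = {subject}
    statements = set()
    for column, value in row.items():
        if column == key_column or value is None:
            continue                          # nothing to assert for NULLs
        obj = (str(value), column)            # value disambiguated by its column
        terms.add(obj)
        statements.add((subject, "has" + column, obj))
    return terms, statements


row = {"name": "Adam", "InventoryID": 1001, "Lab": None}
terms, statements = ddfy_row("Chemist", "name", row)
```

No harmonization against a target schema is required: the source's own table and column names are carried along as concepts and predicates.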


Horizontal Fusion Across Sources

Utilizing data from multiple sources is complicated by their original disconnect – each data store is built on some data scope and on a specific data-model, and over time, neither the scope of data nor the data-model changes much. A traditional integration approach, even a successful one, results in yet another store with its own data and data-models and with limited ability to evolve – often another candidate for future integration efforts. Significant investment into the integration project may result in the “rise” of a new data store but cannot prevent its “fall.” Fig. 6a) demonstrates a progression of integrated stores – isolated silos of data disconnected from each other and from the data sources.

The Unified Data Space that is built and cultivated on the DDF model allows any data source to be incorporated without sacrificing the source’s original data and data-model – DDF unlocks the data source to virtually endless enrichment by different cross-source disambiguations and associations. The DDF UDS is permanently and invariably open to new data sources, and is constantly evolving with its data and data-model expanding. Each participating data store contributes to the expressiveness of the whole Data Space, and in turn benefits from being fused with other data stores by gaining richer semantic and data contents (Fig. 6b).

Using the “fusion” metaphor – a traditional integration approach produces a finished product – an alloy, while the DDF approach produces a melting pot.

The evolving DDF Data Space is representative of the sources and the results of cross-source enrichment. It also takes data utilization and exploitation to a new level and supports navigation, exploration, and querying, without limits.

In the subsequent text, we represent signs, concepts, and predicates using Arial font. Terms are denoted as [sign, concept] (e.g. [Adam, Chemist]) and statements are denoted using an intuitive triple representation, e.g. [Adam, Chemist] hasInventoryID [1001, InventoryID].

Fig. 7 demonstrates these capabilities for the following scenario:

  • A user (or a process) performs a search for the text (i.e. sign) ‘Bush’ and retrieves a number of terms ([George Walker Bush, Name] from Wikipedia, [George W. Bush, President] from Freebase, …), visualized as nodes.
  • The user asserts that [George Walker Bush, Name] from Wikipedia is the same as [George W. Bush, President] from Freebase – a new statement [George Walker Bush, Name] sameAs [George W. Bush, President] is created in the UDS and is shown as an arc connecting the two nodes. The figure shows the results of such assertions: the dashed ellipses show how multiple cross-source assertions fuse data from different sources.
  • Furthermore, the fused data allows us to break the barriers between sources and see the cross-source relationships between data as if they came from the same source. For example, [George Walker Bush, Name] from Wikipedia (ellipse 2) will be associated with [Dick Cheney, Vice_President] from Freebase (ellipse 1).
  • The user can semantically enrich the UDS by introducing new DDF constructs, such as signs, terms, and statements. For example, the user asserts that [BUSH George Walker, Face] from one Wikipedia article is associated with [Bush George Herbert Walker, Face] from another Wikipedia article via the predicate hasFather (green arc).
  • Following the original and asserted relationships between data, the user can navigate across the sources, e.g. from ellipse 1 to ellipse 4 in the figure.
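The scenario can be mimicked on a toy store as follows. This is a sketch under our own assumptions: terms are (sign, concept) tuples, statements are triples, the predicate name hasVicePresident is hypothetical, and the traversal shown is one simple way to follow sameAs assertions rather than the mechanism used in the prototype.

```python
wiki_bush = ("George Walker Bush", "Name")
fb_bush = ("George W. Bush", "President")
fb_cheney = ("Dick Cheney", "Vice_President")

# Which source contributed each term.
source_of = {wiki_bush: "Wikipedia", fb_bush: "Freebase", fb_cheney: "Freebase"}

# A relationship that exists inside Freebase only (hypothetical predicate name).
statements = {(fb_bush, "hasVicePresident", fb_cheney)}


def search(sign_text):
    """Find terms whose sign contains the given text, regardless of source."""
    return sorted(t for t in source_of if sign_text.lower() in t[0].lower())


def fused_cluster(term):
    """All terms reachable from `term` through sameAs statements."""
    seen, stack = {term}, [term]
    while stack:
        current = stack.pop()
        for s, p, o in statements:
            if p != "sameAs":
                continue
            for a, b in ((s, o), (o, s)):        # sameAs is followed both ways
                if a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen


def related(term):
    """Relationships of `term`, including those of terms it is fused with."""
    cluster = fused_cluster(term)
    found = set()
    for s, p, o in statements:
        if p == "sameAs":
            continue
        if s in cluster:
            found.add((p, o))
        if o in cluster:
            found.add((p, s))
    return found


hits = search("Bush")                            # terms from both sources
before = related(wiki_bush)                      # nothing yet: sources are disconnected
statements.add((wiki_bush, "sameAs", fb_bush))   # the user's assertion
after = related(wiki_bush)                       # now reaches the Freebase relationship
```

The assertion is just one more statement in the store; no source is modified, yet the Wikipedia term now leads to [Dick Cheney, Vice_President] from Freebase.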

Vertical Fusion Across Data and Knowledge

The DDF defines the organization of Layer 2 of the DISEP (Fig. 2) and establishes connections with its other layers – the DISEP provides for a contiguous multi-layer store: from sources in the wild, to the structured data store, and to the knowledge representation.

Data is taken from Layer 1 and tied to the semantic elements from Layer 3 to produce the structured data store, on which we can perform efficient data processing and enrichment. However, in the DDF we may lose the rich data context of a data element from Layer 1, so when we need to analyze data in its original context, we can go back to Layer 1. Likewise, we may lose the rich semantic context of a semantic element, so when we need to perform model analysis, we can go to Layer 3. Fig. 8 demonstrates the contiguousness of the UDS and the vertical enrichment processes.

Fig. 9 illustrates these enrichment processes on our example. After [George Walker Bush, Name] from Wikipedia is fused with [George W. Bush, President] from Freebase, a user (or a process) can see in Layer 3 that the concept President is associated with the concept Party, and can return to the Wikipedia source to find out that George Walker Bush is a Republican. A new term [Republican, Party] and a new statement [George Walker Bush, Name] associatedWith [Republican, Party] are created in Layer 2, enriching the integrated data store that supports exploration, navigation and querying.

The enrichment operations preserve the contiguousness of the UDS: from Layer 1 we can “see” how a particular data element is associated with any abstraction of Layer 3, and from Layer 3 we can “see” how any abstract idea is associated with data. We truly break the barriers not only between the sources, but also between the data and knowledge worlds.
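The vertical round trip of the example can be sketched with three small structures, one per layer. Everything below is illustrative: the artifact text is a stand-in rather than the real article, the mention offsets and the party lexicon are hypothetical, and simple substring matching takes the place of whatever extraction process would be used in practice.

```python
# Layer 1: artifacts (stand-in text, not the actual Wikipedia article).
artifacts = {"wikipedia:George_W._Bush": "George Walker Bush is a Republican."}

# Layer 3: model-level association between concepts.
model = {("President", "associatedWith", "Party")}

# Layer 2: terms, their mentions in Layer 1, and statements.
wiki_bush = ("George Walker Bush", "Name")
fb_bush = ("George W. Bush", "President")
mentions = {wiki_bush: ("wikipedia:George_W._Bush", 0, 18)}
statements = {(wiki_bush, "sameAs", fb_bush)}


def backtrack(term):
    """Go down to Layer 1: recover the span of the artifact the term came from."""
    artifact_id, start, end = mentions[term]
    return artifacts[artifact_id][start:end]


def suggested_concepts(term):
    """Go up to Layer 3: concepts associated with the concept of any fused term."""
    fused = {term}
    for s, p, o in statements:
        if p == "sameAs" and term in (s, o):
            fused |= {s, o}
    concepts = {concept for _, concept in fused}
    return {c2 for c1, _, c2 in model if c1 in concepts}


def enrich(term, concept, lexicon):
    """Look in the term's source artifact for signs of `concept` and record them."""
    artifact_id, _, _ = mentions[term]
    text = artifacts[artifact_id]
    added = []
    for sign in lexicon:
        if sign in text:
            statement = (term, "associatedWith", (sign, concept))
            statements.add(statement)
            added.append(statement)
    return added


hints = suggested_concepts(wiki_bush)
added = enrich(wiki_bush, "Party", ["Republican", "Democratic"])
```

The Layer 3 association tells the user what to look for, the Layer 1 mention tells the user where to look, and the result is stored as new terms and statements in Layer 2.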

Summary

It is acknowledged that there is no silver bullet for solving the problem of data integration [7], and that all integration approaches face deep challenges associated with scale, performance, query processing, data conditioning / pre-processing, semantic enrichment, viability, and sustainability. The DISEP and DDF serve to address these challenges as follows:

Scalability. The challenge of scale is common to most integrated stores. The “lossless” data representation of the DDF exacerbates this problem because it generally requires several times more storage space than the original source. Fortunately, distributed database technologies, and cloud computing infrastructure in particular, provide viable means to manage this challenge.

Query Processing and Semantic Exploration. The DDF enables semantic navigation over the DDF Data Space by the action of a series of questions that “surf” across the entire DDF Data Space unimpeded by barriers between source systems. We distinguish two types of navigation and data retrieval on the DDF Data Space:

Exploration. A user navigates the Data Space having limited or no knowledge of the sources and their models. Navigation is data-driven; the result of the user’s initial query is used to generate the next query.

Querying. As with traditional data sources, a user formulates a request assuming knowledge of a data-model. Querying is model-driven; each query is independent of the others.

A very important consequence of the unified representation of data and data-semantics in DDF is the unified structure of the queries. This allows querying patterns to be defined and processing optimized. As a result, the ad-hoc querying and exploration of the data store can be performed naturally [15]. Ad-hoc querying of traditional integrated stores has serious performance issues and is constrained by the integration model. Moreover, some queries, such as search over data values, which are natural in the DDF, cannot be implemented in a practical way using other integration solutions.
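Because every statement has the same subject-predicate-object shape, both kinds of retrieval can be expressed with one matching routine. The sketch below illustrates that idea only; the function names are ours, and it makes no claim about the query patterns or optimizations of the actual implementation [15].

```python
def match(statements, subject=None, predicate=None, obj=None):
    """Return statements matching a triple pattern; None acts as a wildcard."""
    def ok(pattern, value):
        return pattern is None or pattern == value
    return [s for s in statements
            if ok(subject, s[0]) and ok(predicate, s[1]) and ok(obj, s[2])]


def search_value(statements, text):
    """Search over data values: statements whose subject or object sign contains text."""
    return [s for s in statements if text in s[0][0] or text in s[2][0]]


# The message example of Fig. 1, with terms written as (sign, concept).
msg = ("Message1", "Message")
suzi = ("Suzi", "Person")
tanya = ("Tanya", "Person")
body = ("Bring lunch", "Body_text")
statements = [(msg, "To", suzi), (msg, "From", tanya), (msg, "Body", body)]

# Querying (model-driven): the user knows the predicate 'From'.
senders = match(statements, predicate="From")

# Exploration (data-driven): start from a value, then use the result
# of the first query to generate the next one.
first = search_value(statements, "lunch")
next_query = match(statements, subject=first[0][0])
```

The exploration step needs no knowledge of the message model: the value search finds a statement, and its subject becomes the pattern for the following query.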

Semantic enrichment. The scope of semantic enrichment achievable by the DDF is defined by the integration actions that it supports: incorporating new data and data-models, disambiguating data, and building associations between disambiguated data. Because traditional data integration approaches support only some aspects of data and data-model enhancement, their semantic enrichment power is severely limited. In addition, the DDF implementation supports extensive metadata [14] (not discussed in this paper), which may provide additional semantic enrichment by, for example, capturing information quality. Metadata also enable more sophisticated information retrieval processes.

Viability. One of the most important advantages of the DDF over other integration solutions is that it remains viable as sources and requirements change. The DDF accommodates new data and semantics and allows for virtually endless semantic enhancement through new data disambiguations (i.e. term formation) and new semantic associations (i.e. statement formation).

Sustainability. An integration solution must “…offer some services immediately without any setup time, and improve the services as more investment is made into creating semantic relationships.” [7]. The DDF actually achieves this: Data sources can be integrated in the DDF without heavy preprocessing or data-model harmonization. The Data Space can be explored, and semantic relationships discovered, without a-priori understanding of source data-models. Additional refinement and enrichment of the Data Space serves to increase the effectiveness of Data Space services.

Implementation: General Overview

A DDF data store can be implemented in a variety of ways (e.g. objects, relations, triples). We have a prototype in Oracle 10g, and there is an ongoing effort to implement the DDF on the Cloud. The following table gives an overview of the DDF architecture on the Cloud.

Table 1. Architecture of the DDF Integration Solution

| Layer | Description | Content | Content Origin: System/Organization | Content Origin: Process |
| --- | --- | --- | --- | --- |
| 3 – Model Description Framework (MDF) | Universal store for data / knowledge models | Model elements and their relationships (e.g. concept, predicate, super/sub class, part-of, property, …); HBase | Any information system: relational schemas; taxonomies; RDF ontologies. Any project/organization: Cyc project; DARPA, Gov’t labs, DoD, IC, ..; standards bodies | Any process: data model harmonization; data model integration; data model management; data model enrichment |
| 2 – Data Description Framework (DDF) | Universal store for structured data | Data elements, semantic elements, and associations. Includes artifact and process metadata (e.g. source, creator, timestamp, …); HBase | Any system: relational database; object store; XML source; triple store; key-value store. Any organization | Any process: object extraction (text, image, voice, signature, …); natural language processing; link analysis; analyst manual activity; alerting, reporting, …; other tools and applications |
| 1 – Artifact Description Framework (ADF) | Universal store for unstructured artifacts | Indigenous artifacts (documents, images, audio, video, signals, …); Hadoop File System | Any source: external filesystem; Web; document repository | Any process: reporting; alerting; security |

Conclusion

The DISEP and DDF support data fusion and unencumbered semantic enrichment by implementing the following key principles:

  • Data must be perceived objectively, independent of intended use
  • Domain data-models must be considered from a higher level of abstraction

Data fusion approaches, physical and virtual, generally manage to preserve only a portion of the original data and semantics, and present these with yet another single, restrictive data-model. In contrast, the DDF persists and presents the entirety of the source data and semantics by using a higher level abstraction that imposes no particular data-model, yet supports any. The DDF populated with data produces a Unified Data Space that represents the primal integrated data layer of the DISEP. Within this Data Space, the original data and data-models co-exist and may be enriched either through the ingestion and integration of additional data or semantic enhancement.

Over decades of data processing, we have been formalizing our perception of data and then transforming (or even creating) and storing data according to this perception. Unfortunately, there has been very little effort to ensure correctness/durability/objectivity of those perceptions. As a result, we work with numerous models and formats of data, and numerous versions of data buried beneath. The evolution of our perception and understanding of data cannot be reflected in these data stores. New data, which does not conform to the store’s model, also cannot be accommodated. Thus we are trapped in an endless loop of creating and integrating new data stores, each of which deals with only a fraction of the data surrounding us. None of these can be expanded to represent other data, and all are valid for a relatively short time. In DDF, on the other hand, data “lives” alongside data-models, not inside them. This enables loose coupling of data and perceptions (i.e. data-models), and allows multiple perceptions to co-exist in the Data Space.

Without imposing modifications on existing data stores, the DDF can expose their data and semantics for use and re-use, without further increasing data entropy. The DDF Data Space is a live integrated store that evolves with our intentions (i.e. applications) and perceptions (i.e. data-models).

Acknowledgements

The authors thank the following US Army CERDEC I2WD personnel for their continued support: Mr. Anthony Lisuzzo, Director; Mr. Kesny Parent, DCGS-A Branch Chief; and Ms. Virginia Goon, IXFL Manager. The authors also thank Mr. Oscar Wood and Mr. David Salmen of Data Tactics Corporation, as well as Mr. Andrew Eick of MissionFoc.us for many productive and stimulating discussions. This work was funded by US Army CERDEC I2WD under contract number W15P7T-06-D-A401/009.

References

[1] Batini, C. et al. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, (18) 4, 1986.

[2] Bernstein P., and Ho, H. Model Management and Schema Mappings: Theory and Practice. Proceedings of VLDB Conference, 2007.

[3] Booth, D. Why URI Declarations? A comparison of architectural approaches. HP Software, 2008. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-422/irsw2008-submission-9.pdf

[4] Franklin, M., Halevy, A., and Maier, D. From Databases to Dataspaces: A New Abstraction for Information Management. ACM SIGMOD Record, 2005.

[5] Halevy, A. et al. Enterprise information integration: successes, challenges and controversies. Proceedings of 24th International Conference on Management of Data, Baltimore, 2005.

[6] Halevy, A. Franklin, M., and Maier, D. Principles of Dataspace Systems. Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2006.

[7] Halevy, A., Rajaraman, A., and Ordille, J. Data Integration: The Teenage Years. Proceedings of VLDB Conference, 2006.

[8] Northrop, L. et al. Ultra-Large-Scale Systems: The Software Challenge of the Future. Pittsburgh: Carnegie Mellon University, 2007.

[9] Omelayenko, B. and Fensel, D. An Analysis of B2B Catalogue Integration Problems. Proceedings of the International Conference on Enterprise Information Systems (ICEIS-2001), 2001.

[10] Parent, C. and Spaccapietra, S. Issues and approaches of database integration. Communications of the ACM, 41(5), 1998.

[11] Sowa, J. Knowledge Representation. Logical, Philosophical, and Computational Foundations. Brooks/Cole, 2000.

[12] Yero, J. Logical vs. Physical Data Integration: A Practical Decision Guide. The DAMA International Symposium & Wilshire Meta-Data Conference. San-Diego, CA, 2008.

[13] Yoakum-Stover, S. and Malyuta, T. Unified Architecture for Integrating Intelligence Data, Proceedings of MIT Information Quality Industry Symposium, MIT, Cambridge, MA, 2008.

[14] Yoakum-Stover, S. and Malyuta, T. Unified Integration Architecture for Intelligence Data, Proceedings of DAMA International Europe Conference, London, UK, 2008.

[15] Yoakum-Stover, S. and Malyuta, T. Unified Data Integration for Situation Management, Proceedings of the 4th IEEE Workshop on Situation Management (SIMA2008) at MILCOM 2008, San Diego CA, 2008.

Data Sources used in demonstration:

[16] Freebase. http://www.freebase.com/

[17] GovTrack.us: Tracking the U.S. Congress. http://www.govtrack.us/

[18] George W. Bush. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=George_W._Bush&oldid=300819162

[19] Bill Clinton. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Bill_Clinton&oldid=301113520