Saturday March 13th 2010
Data integration
Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. This problem emerges in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories). Data integration appears with increasing frequency as the volume of existing data, and the need to share it, grows rapidly. It has been the focus of extensive theoretical work, and numerous open problems remain to be solved. In practice, data integration is frequently called Enterprise Information Integration.

History

The problem of combining heterogeneous data sources under a single query interface is not a new one. The rapid adoption of databases after the 1960s naturally led to the need to share or merge existing repositories. This merging can be done at several levels in the database architecture[2].

One popular approach is Data Warehousing (see figure 1). Here data from several sources are extracted, transformed, and loaded (ETL) into a single repository that can be queried with a single schema. Architecturally, this can be seen as a tightly coupled approach because the data reside together in a single repository at query time. Problems with tight coupling can arise with the "freshness" of data: when an original data source is updated, the warehouse still contains the older data until the ETL process is executed again. It is also difficult to construct data warehouses when only a query interface to the data sources is available, with no access to the full data. This problem frequently arises when integrating several commercial query services such as travel or classifieds web applications.

The recent trend in data integration has been to loosen the coupling between data. Here the idea is to provide a uniform query interface over a mediated schema (see figure 2).
A query posed over the mediated schema is then transformed into specialized queries over the original databases. This process is also called view-based query answering, because each data source can be considered a view over the (nonexistent) mediated schema. Formally, such an approach is called Local As View (LAV), where "local" refers to the local sources or databases. An alternative model of integration is one where the mediated schema is designed to be a view over the sources. This approach, called Global As View (GAV), is often used because queries issued over the mediated schema are simple to answer. Its obvious drawback is that the view for the mediated schema must be rewritten whenever a new source is integrated or an existing source changes its schema.

Some of the current work in data integration research concerns the semantic integration problem. This problem is not about how to structure the architecture of the integration, but about how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, such as "earnings", inevitably have different meanings. In one database it may mean profits in dollars (a floating point number), while in the other it might be the number of sales (an integer). A common strategy for resolving such problems is the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts.

Example

Consider a web application where a user can query a variety of information about cities, such as crime statistics, weather, hotels, and demographics. Traditionally, the information must exist in a single database with a single schema. Information of this breadth, however, is difficult and expensive for a single enterprise to collect. Even if the resources exist to gather the data, doing so would likely duplicate data in existing crime databases, weather websites, and census data.
A data integration solution addresses this problem by considering these external resources as materialized views over a virtual mediated schema. This means application developers construct a virtual schema, called the mediated schema, to best model the kinds of answers their users want. Next, they design "wrappers" or adapters for each data source, such as the crime database and the weather website. These adapters simply transform the local query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution (see figure 2). When an application user queries the mediated schema, the data integration solution transforms this query into appropriate queries over the respective data sources. Finally, the results of these queries are combined into the answer to the user's query.

A convenience of this solution is that new sources can be added by simply constructing an adapter for them. This contrasts with ETL systems or a single-database solution, where the entire new dataset must be manually integrated into the system.

Copyright 2008 - France BtoB from Wikipédia
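The wrapper-and-mediator flow of the city example can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory sources, their result formats, the sample figures, the mediated attribute names (`crime_incidents`, `temperature_c`, and so on), and the `Mediator` class itself. It shows only the shape of the idea (one wrapper per source, results combined into a single answer), not how any particular integration system is implemented.

```python
from typing import Any, Callable, Dict, List, Tuple

# --- Hypothetical sources, each with its own local result format ---

def crime_source(city: str) -> List[Tuple[str, int, int]]:
    """Stand-in for a crime database: rows of (city, year, incidents)."""
    rows = [("Springfield", 2008, 430), ("Springfield", 2009, 412)]
    return [row for row in rows if row[0] == city]

def weather_source(city: str) -> Dict[str, Any]:
    """Stand-in for a weather website: a dict with Fahrenheit readings."""
    readings = {"Springfield": {"temp_f": 68.0, "sky": "cloudy"}}
    return readings.get(city, {})

# --- Wrappers: translate local results into mediated-schema attributes ---

def crime_wrapper(city: str) -> Dict[str, Any]:
    rows = crime_source(city)
    if not rows:
        return {}
    _, year, incidents = max(rows, key=lambda row: row[1])  # most recent year
    return {"crime_year": year, "crime_incidents": incidents}

def weather_wrapper(city: str) -> Dict[str, Any]:
    raw = weather_source(city)
    if not raw:
        return {}
    celsius = round((raw["temp_f"] - 32) * 5 / 9, 1)
    return {"temperature_c": celsius, "conditions": raw["sky"]}

# --- Mediator: sends one query to every wrapper and merges the results ---

Wrapper = Callable[[str], Dict[str, Any]]

class Mediator:
    def __init__(self) -> None:
        self._wrappers: List[Wrapper] = []

    def register(self, wrapper: Wrapper) -> None:
        """Adding a source only requires registering its wrapper."""
        self._wrappers.append(wrapper)

    def query(self, city: str) -> Dict[str, Any]:
        answer: Dict[str, Any] = {"city": city}
        for wrapper in self._wrappers:
            answer.update(wrapper(city))
        return answer

mediator = Mediator()
mediator.register(crime_wrapper)
mediator.register(weather_wrapper)
print(mediator.query("Springfield"))
```

In this sketch, supporting a further source such as a hotel listing would mean writing one more wrapper and calling `register`. The existing wrappers and the mediator stay unchanged, which is the convenience the example describes.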