Facilitating Business Change with Data Warehousing

Data warehousing involves facilitating change in business processes. In addition to being able to make better, information-driven operational and tactical decisions, you gain insight into key areas that can help you make strategic decisions about the fundamental aspects of your business. Your data warehouse can act as an early-warning system to let you know that you might need to make some major business changes.
Both these decisions are, for the most part, tactical because they involve product-stocking decisions. Neither decision fundamentally changes the way your company does business, but rather helps you decide which products you want to keep in stock. But suppose that trend-line information coming back from the data warehouse shows relatively flat sales of all types of music and dramatically increasing revenues from DVD and video game rentals.
The future of your chain might lie not in sales of music at all, but rather in DVD and video game rentals. Rather than have to rely on a gut feeling, you can make data-driven strategic decisions about how your business should change. Gas guzzler sales are plummeting as fast as the price of gas is increasing.
The highly positive growth curve that sport utility vehicles (SUVs) enjoyed a few years ago is giving way to hybrid and green products. Again, your data warehouse gives you the information to make data-based decisions, rather than having to rely on hunches or, at best, bits and pieces of data gathered from various computer systems. Although a certain amount of analysis is standard across companies, the interpretation and actions you formulate from your data assets can give you a competitive edge in the global marketplace. Leverage the knowledge and experience of other people by studying their experiences with data warehousing products; asking questions about the most difficult problems they encountered during product development and after they put their data warehouse into use; and determining how effective their users have been in making better, information-based business decisions.
Ensuring That Your Implementations Are Unique

A data warehouse is composed of many different components, each of which can be implemented in several (perhaps many) ways. Two companies in the same industry, for example, each might have a sales-and-marketing data warehouse that supports users across four different business organizations, allows access by using the same business intelligence tool set, and uses the same database management system in which to store approximately 50 gigabytes of data.
A subject area is a high-level grouping of data content that relates to a major area of business interests, such as customers, products, sales orders, and contracts. Subject area and data content will drive your user access to this data and the associated presentation through business intelligence tools. Some users require only simple querying or reporting on the data content within a subject area; others might require sophisticated analytics.
These data access requirements assist in classifying your data warehouse. Depending on the class of data warehouse, a personal, departmental, or enterprise database management system may be required. This is where the business rules are put into action to assure that the data content is of the highest possible quality for broad user adoption.

Chapter 3: Have It Your Way: The Structure of a Data Warehouse

The data warehouse lite

A data warehouse lite is a no-frills, bare-bones, low-tech approach to providing data that can help with some of your business decision-making.
No-frills means that you put together, wherever possible, proven capabilities and tools already within your organization to build your system. The term data mart is commonly used to refer to what I call the data warehouse lite. As often happens in information technology (IT) areas in which vendors jockey for market share, however, many people now think of a data mart as little more than a small-scale data warehousing environment.
If you concentrate on the aspects of the environment that drive the overall complexity of the implementation — breadth, database, data content, tools, extraction and movement, and architecture — you can avoid getting confused.

Subject areas and data content

A data warehouse lite is focused on the reporting or analysis of only one (or possibly two) subject areas. Suppose that in your job at a wireless division of a telephone company, you analyze the sales of services such as in-network minutes, out-of-network minutes, text messaging, Internet access, and other mobile usage to consumer households.
If you build a data warehouse lite exclusively for this purpose, you have all the necessary information to support your analysis and reporting for the consumer market. Based on the subject area limitation, a data warehouse lite has just enough data content to satisfy the primary purpose of the environment, but not enough for many unstructured what-if scenarios its users might create.
You must choose carefully, therefore, from among the set of all possible data elements and select a manageable subset — elements that, without a doubt, are important to have.

Data sources

A data warehouse lite has a limited set of data sources — typically, one to a handful. Denormalization is the opposite of the relational database concept of normalization, a somewhat complex set of guidelines that tells you which data elements should be in which tables in a database (see Chapter 5). The figure shows an example of a single-source data warehouse lite built on denormalization.
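To make the denormalization idea concrete, here is a minimal sketch in Python. The tables, column names, and values are all illustrative assumptions, not taken from any specific system; the point is simply that customer attributes get copied onto every order row so that reports need no joins.

```python
# Hypothetical example: flattening (denormalizing) normalized order data
# into a single reporting-friendly structure. All names and values are
# illustrative, not from any specific system.

customers = {1: {"name": "Acme Corp", "region": "West"}}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 1, "amount": 75.5},
]

def denormalize(orders, customers):
    """Join each order with its customer attributes into one flat row."""
    rows = []
    for o in orders:
        c = customers[o["customer_id"]]
        rows.append({
            "order_id": o["order_id"],
            "customer_name": c["name"],   # repeated on every row on purpose
            "region": c["region"],        # no join needed at report time
            "amount": o["amount"],
        })
    return rows

flat = denormalize(orders, customers)
```

Each flat row now carries everything a report needs, at the cost of repeating the customer attributes — exactly the trade a data warehouse lite makes for simpler queries.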
Although you can use externally provided data (see Chapter 19) in a data warehouse lite implementation, the data you use is rarely newly acquired.

(Figure: Denormalizing data from a single application restructures that data to make it more conducive to reporting needs; you reorganize the data by denormalizing it.)

Database

Data warehouse lite solutions are limited by users, data content, and the type of business intelligence tools utilized.
These limitations are the primary reason that a data warehouse lite is usually built on a standard, general-purpose relational database management system (you can find out about RDBMSs in Chapter 5). Create a database backup that contains copies of all tables from the source that provide data to the warehouse, and then reload those tables into a staging area on the system where you plan to locate your data warehouse.
You should ensure that the network bandwidth and time window are adequate to copy all the source tables to the system by using a file-transfer program. Run a series of quality assurance (QA) routines to verify that all data has been loaded properly. Check row counts, numeric totals, and whatever else you can.
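The QA step described above can be sketched as follows; this is a hypothetical check using Python's built-in sqlite3 module, with the table name, columns, and control figures all assumed for the example.

```python
import sqlite3

# Hypothetical QA check: after loading a staging table, verify row counts
# and numeric totals against control figures captured from the source.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged_orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO staged_orders VALUES (?, ?)",
                 [(1, 100.0), (2, 250.5), (3, 49.5)])

# Control figures recorded when the source backup was taken (assumed values).
expected_rows, expected_total = 3, 400.0

actual_rows, actual_total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM staged_orders").fetchone()

assert actual_rows == expected_rows, "row count mismatch - reload the table"
assert abs(actual_total - expected_total) < 0.01, "numeric total mismatch"
print("QA checks passed")
```

In practice you would run one such check per staged table, comparing against counts and totals written out by the extract job itself.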
Architecture

The architecture of a data warehouse lite is composed of the database used to store the data, the front-end business intelligence tools used to access the data, the way the data is moved, and the number of subject areas. The watchword of this environment is minimalist: no bells, no whistles, nothing fancy — just enough technology applied to the environment to give users access to the data they need.
(Figure: The low-tech approach to moving data into a data warehouse lite: database backup tapes carry shared data from a customer order application.)

(Figure: The architecture of a data warehouse lite is built around straight-line movement of data from a customer order application.)

Data warehouse deluxe: Broadening the horizon

(Figure: A data warehouse deluxe has a broader subject-area focus than a data warehouse lite, covering areas such as indirect sales, distributors, contracts, direct sales, territories, credit risk, and inventory levels.)
Subject areas and data content

A data warehouse deluxe contains a broad range of related subject areas — everything or most things that would follow a natural way of thinking about and analyzing information. For example, a user might run reports comparing trends in add-on services for businesses and consumers to see where to concentrate future sales-and-marketing efforts. When you implement a data warehouse deluxe, you almost always need more advanced access capabilities (unlike with a data warehouse lite), in addition to simple results reporting.
Follow these steps to thoroughly understand your source systems:

1. Take a complete inventory of available information.

This inventory is called a source systems analysis, as discussed in Chapter . This step is one of the most severe tests of how well the IT people and business users get along throughout the data warehousing project.
Although the exact number of data sources depends on the specifics of your implementation, data warehouse deluxes tend to have an average of eight to ten applications and external databases that provide data to the warehouse. Although tool vendors increasingly try to provide suites of products to handle as many of these different functions as possible, you do have to deal with different products — and so does your user community.

Database

Data warehouse deluxe implementations are big — and getting bigger all the time. Implementations that use hundreds of gigabytes (a gigabyte equals 1 billion bytes) and even terabytes (1 trillion bytes) are increasingly common.
To manage this volume of data and user access, you need a very robust server and database.

Data extraction, movement, and loading

Prepare for the challenge! Batch windows, the time frames in which updates are made to the warehouse, are complicated by the number of data sources you have to handle.
If you could assign some difficulty factor (an integer, for example) to the process of getting data into the warehouse, the following measures would hold true: you have n data elements that you want to include in the data warehouse, with a difficulty factor of x. To make this difficulty factor easier to understand, assign some numbers to n and x. Say that your data warehouse has n elements and the difficulty factor x is 5.
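One plausible reading of this difficulty-factor idea is a simple multiplicative model, sketched below. The value of n is made up for the illustration; only x = 5 comes from the text, and the multiplicative form is an assumption, not the book's own formula.

```python
# Hypothetical illustration of the difficulty-factor idea: total integration
# effort grows with both the number of data elements (n) and the per-element
# difficulty (x). n and the multiplicative model are assumptions.

n = 100   # data elements to bring into the warehouse (assumed)
x = 5     # difficulty factor per element (from the text)

total_difficulty = n * x   # a simple multiplicative model
```

Doubling either the element count or the per-element difficulty doubles the total under this model, which matches the intuition that more sources and harder transformations both stretch the batch window.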
Data marts are discussed in depth in Chapter 4. Few enterprises have ventured in this direction; due to overall cost and capabilities, it is still rare to find many data warehouse supremes. Instead, only part of your warehouse (probably a small part) is physically located on some data warehouse server; the rest is out there in cyberspace somewhere, accessible through networking capabilities as though it were all part of some physically centralized data warehouse.
With a data warehouse supreme, your warehouse users have an infinite number of subject-area possibilities — anything that could possibly be of interest to them. Think of how you use the Internet today to access Web sites all over the world — sites that someone else creates and maintains. Now, imagine that each of those sites contains information about some specific area of interest to you — rather than advertising, job ads, electronic storefronts, and whatever else you spend your time surfing the Internet trying to find. Also imagine that you can query and run reports by using the contents of one or more of these sites as your input.
Leading-edge corporations are beginning to pursue and deliver seamless convergence of different types of data: narrative documents, video, image, and ordinary data such as numbers and character information. A data warehouse supreme has all this — all the different types of data that you need to support better business decision-making.

Data sources

Because of the wide breadth of subject areas in a data warehouse supreme, it has numerous data sources. The bad news: Someone has to perform those tasks, and you have little or no control over elements such as quality assurance processes or how frequently the data is refreshed.
I have more good news, though: Because the most critical part of a data warehouse supreme is still internally acquired data (the data coming from your internal applications), from that aspect, the things you do today to make the data warehouse-ready will still be done in the future. Because you populate your data warehouse supreme with multimedia information — in addition to traditional data, such as numeric, alphabetic, and date values — the types of data sources broaden from traditional applications to video servers, Web sites, and databases that store documents and text.
Of the four, the most significant advances and improvements during the next few years probably will occur with data mining, as vendors push enhancements into their products. However, these user-access methods will be relegated to providing information that will be visualized in other forms. The business intelligence tools will enable users to pull information from the data warehouse supreme and integrate it with better visualization — for instance, Google Earth or Microsoft Virtual Earth. Such combinations, known as mash-ups, are becoming more prevalent and enable users to see the data from the data warehouse supreme in more realistic forms — not columns on a report, but dots or shadings on a map.
The figure illustrates some of the ways in which intelligent agents can help you make very efficient use of data warehousing.

(Figure: Intelligent agents are an important part of the push-technology architecture of a data warehouse supreme.)

When new data is inserted into the source database (or existing data is modified or deleted), the appropriate instructions and accompanying data are sent to the warehouse.

Architecture

The figure shows an example of what the architecture of a data warehouse supreme might look like.
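The push mechanism described above — changes captured at the source and forwarded with instructions — can be sketched as follows. This is a deliberately simplified in-memory design with assumed names; a real implementation would use database triggers or log-based change data capture.

```python
# A minimal sketch (assumed design) of the push idea: when a source row is
# inserted, updated, or deleted, the change plus an instruction is forwarded
# toward the warehouse instead of waiting for a batch extract.

warehouse_queue = []

def on_source_change(operation, row):
    """Capture a source-side change and queue it for the warehouse."""
    warehouse_queue.append({"op": operation, "row": row})

# Simulated source activity
on_source_change("insert", {"customer_id": 7, "name": "New Customer"})
on_source_change("update", {"customer_id": 7, "name": "Renamed Customer"})

def apply_to_warehouse(queue, table):
    """Replay queued changes against the warehouse-side table."""
    for change in queue:
        key = change["row"]["customer_id"]
        if change["op"] in ("insert", "update"):
            table[key] = change["row"]
        elif change["op"] == "delete":
            table.pop(key, None)

dw_customers = {}
apply_to_warehouse(warehouse_queue, dw_customers)
```

The warehouse-side table ends up reflecting the latest source state without any scheduled batch window, which is the whole point of the push architecture.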
But with all the upcoming technology trends and improvements discussed in the preceding sections, your data warehouse supreme can look like almost anything you want.

(Figure: A data warehouse supreme architecture might include direct data access services; directory, data access, and transaction management services; initial loading services; and restocking and push refresh services spanning multiple sources, some of them DW-enabled.)

To Centralize or Distribute, That Is the Question

Traditional data warehousing — creating a data warehouse lite or data warehouse deluxe — usually involves copying data from one or more different source databases and files into a single warehouse-owned database.
Whether that database is relational or multidimensional is irrelevant. As data warehouse environments become larger and more complex, though, you should consider not funneling all your data into a single database; rather, you could make your data warehouse a collection of databases that together make up your overall information delivery and analytical environment.
(Figure: A data warehouse might consist of more than one database, under the control of the overall warehousing environment, with multiple applications feeding it.)

You can much more easily get in and out quickly with what you need.
This chain used to make fun of the warehouse-size competition by featuring radio ads that had helicopter search parties looking for shoppers lost in a distant department and references to shuttle buses having to take shoppers between departments in the warehouse-size stores. The idea of a data mart is hardly revolutionary, despite what you might read on blogs and in the computer trade press, and what you might hear at conferences or seminars. In an environment like the one shown in the figure, the data sources, data warehouse, data mart, and user interact in this way:

1. The data sources, acting as suppliers of raw materials, send data into the data warehouse.

Which financial accounts are trending high? What are the trends in sales?
(Figure: The retail-outlet approach to data marts: all the data comes from a data warehouse.)

The data warehouse serves as a consolidation and distribution center, collecting the raw materials in much the same way that any data warehouse does. The data marts order data from the warehouse and, after stocking the newly acquired information, make it available to consumers (users). If your data mart is the only one within your company that needs additional data (be sure to ask around), leave the warehouse alone and bring the supplemental data directly into your data mart. If other data marts or other projects served by the data warehouse can use the additional information, add that information to the data warehouse first and then send it, along with the other contents you need, to the appropriate data marts.
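The order-from-the-warehouse pattern can be sketched as a subset load. Everything here — table contents, product and region names, and the load_subset helper — is a hypothetical illustration of a mart pulling only the slices it needs from a larger warehouse table.

```python
# Hypothetical sketch: a data mart "orders" only the subset it needs
# (certain products, certain regions) from a warehouse table. All data
# and names are illustrative.

sales_history = [
    {"product": "widget", "region": "AZ", "amount": 120.0},
    {"product": "gadget", "region": "CO", "amount": 90.0},
    {"product": "widget", "region": "NY", "amount": 300.0},
]

def load_subset(rows, products=None, regions=None):
    """Copy into the mart only the rows matching the requested subset."""
    mart = []
    for r in rows:
        if products and r["product"] not in products:
            continue
        if regions and r["region"] not in regions:
            continue
        mart.append(dict(r))  # copy, so the mart owns its own data
    return mart

mart_rows = load_subset(sales_history,
                        products={"widget"}, regions={"AZ", "NY"})
```

The mart receives copies rather than references, mirroring the retail-outlet idea: stock is shipped from the distribution center, and the outlet manages its own shelves afterward.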
In many (probably most) of these situations, you create a quick-strike data mart — in effect, a miniature data warehouse, but built to meet the demands of a set of users who need the data content now. You follow the same methodology and complete the same processes of data extraction, transformation, quality assurance, and loading as described in Chapter 3.
The difference is that you use this methodology on a smaller scale than you do with a full-blown data warehouse. As shown in the figure, you often need to bring data into a top-down, quick-strike data mart to answer a specific set of business questions within relatively narrow confines.

Chapter 4: Data Marts: Your Retail Data Outlet

(Figure: A top-down, quick-strike data mart is a subset of what can be built if you pursue full-scale data warehousing instead. From the sources, you load some sales history for specific products from the sales history database, some product development data from the product development database, and some customer data for a specific geography from the customer database.)

Bottom-up, integration-oriented data marts

If pressing business needs steer you toward a quick-strike data mart but you have a longer-term vision of integrating its contents with other data, what can you do?
Have you created an architectural dead end in your data mart? Will Harry Potter save his mates from Lord Voldemort? Sorry about that — the numerous releases are getting to me. You might be able to successfully use this approach. The answer to this question is typically that the data will be a subset of the overall enterprise data. The following sections describe some ways that you can select subsets of information for a data mart and the circumstances under which you might want to try each approach.

Geography-bounded data

A data mart might contain only the information relevant to a certain geographical area, such as a region or territory within your company.
The figure illustrates an example of geography-bounded data.

(Figure: A geography-bounded data mart contains only the sales information relevant to a given territory. The Rocky Mountain Region data mart, for example, loads history for Arizona and Colorado sales from the sales history database.)

When you create separate data marts along geographical lines, these types of comparisons become much more difficult to make. As shown in the figure, a bank might create one data mart for consumer checking-account analysis and another data mart for commercial checking accounts.
(Figure: Each organization in this bank gets its own data mart, tuned to its specific needs: one for consumer banking and one for commercial banking.)

For example, the commercial checking group has no need whatsoever to analyze consumer checking accounts, and vice versa.
It pays to dig into the business needs during the scope phase of a data warehousing or data mart project. A multinational chemical company, for example, might create a data mart exclusively for the sales and marketing functions across all organizations and across all product lines, as shown in the figure.

(Figure: A sales-and-marketing data mart built from sales history and marketing data, including South American product development and sales.)

Market-bounded data

A company might occasionally be so focused on a specific market and the associated competitors that it makes sense to create a data mart oriented toward that particular focus.
To truly provide the business intelligence that a company needs in a competitor-driven situation, construct the data mart to include multimedia information, in addition to the traditional data types typically found in a data warehouse. Chapter 25 describes multimedia data and data warehousing.

(Figure: The data mart as a weapon of war: collect information about your market and competitors, and see how you stack up. An our-company-versus-Competitor-A data mart combines external data about Competitor A, such as stock analyst reports, consumer and industrial product sales history, and financial data, with our own consumer and industrial product sales, financial data, and stock analyst reports about us.)
Based on the answers, a company might speed up or slow down production lines, start up extra shifts to increase production or initiate layoffs, or decide whether to acquire other companies. Business questions that have this degree of weighty importance traditionally cause nightmares for the in-house employees charged with digging out data and reports, consolidating and checking the information, and reporting the results to executive management. Sounds like a job for a data warehouse, you say?
Later, this type of environment might grow into a larger-scale data warehouse. It often makes more sense, however, to concentrate your efforts on supporting a data mart that has known business value, instead of supplementing it with volumes of additional data that might provide business value but can also slow response time or significantly complicate the end-to-end architecture. Again, the job you do in the early phases of your project makes a big difference in the direction you take and your level of success. Some approaches take you into an architectural dead end because you get only limited value and have to start all over to expand your capabilities.
Data mart or data warehouse?

Concentrate instead on your business problem and its possible solution. What data do you need in order to perform certain informational and analytical functions; where is that data now and in what form; and what do you have to do to make it available to your users? Leave the terminology wars to the vendors and analysts.

Implementing a Data Mart — Quickly

No matter how you decide to divide the universe of possible contents into some subset for your data mart, remember that in order to obtain maximum business value from your data mart, you must implement it quickly.
As described in Chapter 13, you spend the majority of your upfront time on the project focusing on the specific business value the end user wants and then, over several iterations, build the solution toward that vision. If you set aside two weeks for your scope, for example, stick to that window. Although costly and dangerous in any project (data warehousing or otherwise), scope creep (when additional feature requests keep creeping in long past the cutoff point) can devastate a data mart effort. By adding these last-minute features, you probably add complexity to your data mart with only marginal incremental business value (if any), so you do little other than put your project at risk.
But what kind of database do you need for your data warehouse? In nearly all situations, your project benefits significantly from the use of a relational database management system (RDBMS). Chapter 6 discusses alternative technologies that you can use to store and manage your data warehouse.
The 1990s generation of data warehousing implementations grew up on multidimensional databases, but the current marketplace is experiencing a clear-cut trend toward using relational databases, particularly for large-volume data warehouses.

The Old Way of Thinking

Jump back to the mid-1990s. The data warehousing revolution was picking up steam, and companies all over the U.S. were choosing sides. This relational-versus-multidimensional war ranked up there with the all-time champion of polarized technology proponents: the Ethernet-versus-token-ring LAN (local area network) wars of the mid-1980s.
Proponents of either LAN technology fervently — and sometimes savagely — argued that their approach was far superior to the other and that only fools would attempt to implement a LAN by using the other technology. Relational-versus-multidimensional was a 1990s version of this memorable classic!

A technology-based discussion: The roots of relational database technology

Relational database management systems (RDBMSs) have their roots in the relational data model developed by Dr. E. F. Codd, then with IBM, back in 1970. You can check out the paper on the Web, at www.
Throughout the 1970s, in an attempt to commercialize relational technology, IBM and a few other organizations worked on prototypes, proof-of-concept systems, and early starts at product development. When you join tables in a relational database, problems can start occurring. Back in the early, pre-relational days of databases, data usually was linked together by using pointers, which told the DBMS software the location of the next logical piece of data (or the previous piece of data, or some other path that made sense). Forget all about the mathematical foundations of the relational model, the principles of normalization, and other highly technical aspects of RDBMSs.
In a typical spreadsheet program, columns and rows form a series of cells. You can combine information from across the various tables by joining those tables (making a match between tables, usually by looking for columns in two or more tables that are the same). The power of the relational database becomes evident when you join tables together. Back when databases used pointers, accessing logical sequences of data was fairly quick, even on older hardware. Data access went something like this:

1. Go directly to the first record you want (usually from some type of index).
2. After reading the record, read the pointer attached to it that indicates where to go next.

3. Go directly to the record indicated by that pointer.

4. Read that second record, and then read its pointer.

5. Repeat this process until you reach the end of the list or until some other criterion tells your program to stop.
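The pointer-chasing steps above can be sketched with an in-memory chain of records, where each record's "pointer" is simply the index of the next record. This is an illustrative model, not how any particular DBMS stores data.

```python
# Sketch of pointer-based data access: each record carries a "pointer"
# (here, a list index) to the next logical record in the chain.

records = [
    {"name": "order-1", "next": 1},
    {"name": "order-2", "next": 2},
    {"name": "order-3", "next": None},  # end of the chain
]

def traverse(records, start):
    """Follow pointers from record to record until the chain ends."""
    visited = []
    pos = start
    while pos is not None:
        rec = records[pos]
        visited.append(rec["name"])
        pos = rec["next"]          # read the pointer, go to the next record
    return visited

chain = traverse(records, 0)
```

Each hop is cheap, which is why sequential access was fast; the weakness, as the chapter goes on to explain, is answering questions that do not follow the prebuilt chains.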
To keep this example simple, assume that no two customers have the same name. Back to the history of relational databases (this discussion is leading somewhere, I swear!). Finally, near the end of the 1980s, the corporate IT world (most of it, anyway) became convinced that you could use RDBMS products for more than just play. And along comes data warehousing. Here it is, summed up in a few sentences: OLTP applications typically access a small number of tables (preferably one, but usually only two or three) during a given transaction, a series of steps that either access data from a database or put data into it.
Data warehousing applications — the business analytics, or online analytical processing (OLAP) ones, as described in the following section — rarely access one, two, or three tables at a time. Rather, the nature of data warehousing (bringing together a lot of information that, when the pieces of data are related to one another, provides business intelligence capabilities) usually means that a single query must access a large number of tables. The way you design your relational database plays a large part in the number-of-tables situation you have to deal with.
To use an analogy, a four-wheel-drive, off-road vehicle and a sports car both have similar characteristics: an engine, four wheels with tires, and a steering wheel, for example. A sports car, however, has been developed to use the basic common framework of an automobile to go fast; in off-road situations out in the wilderness, the driver of a sports car is probably in trouble.
Part III of this book discusses the area of business intelligence, which is the reason you build a data warehouse. Go ahead — take a peek at the Table of Contents or jump ahead to those chapters. OLAP business analysis capabilities are only one of four different classes of business intelligence. To make a blanket statement that RDBMS technology is unsuitable for data warehousing means unequivocally that the only thing you want to do with a data warehouse is OLAP processing.
Again, that is incorrect. You will find the data warehouse valuable for the data quality work necessary to integrate your run-the-business applications, as well as for other business intelligence capabilities, such as data mining and historical consolidated reporting.

Fine-tuning databases for data warehousing

Smart people run RDBMS vendors — at least, the vendors still in business after surviving the Great Database Wars of the 1980s and 1990s.
We have to do something about this situation. This inefficient means of joining tables caused these types of queries to run slowly, so for the first couple of decades in the history of RDBMSs, the vendors focused on optimizing their products for run-the-business capabilities.

Optimizing data access

As described in the preceding section, the challenges that the RDBMS vendors had to address involved the method in which users accessed data. Business analysis queries tend to pull two forms of data: descriptive data, often called dimensions; and measurement data, often called facts.
When a database was designed according to the principles of a star schema, the problems that occurred because of the large number of tables involved in a single query were dramatically reduced by using new data access optimizations, such as a star join — a different, more efficient way of doing joins when you have a large number of tables involved in a query.
Avoiding scanning unnecessary data

When users performed their business analysis, they commonly ran into problems when they needed to scan large volumes of data. This type of question forces every row in the table to be read. Just think of a user asking this question of Wal-Mart, whose staff would have to read all the cash register receipts from all their stores worldwide.
In general terms, the old way of thinking forced the database to read every row in the table and then aggregate the results. Once again, envision this happening at Wal-Mart! So, RDBMS vendors came up with creative indexing schemes to assist in directly finding data in columns that have high cardinality — basically, a large number of unique values in a database column. Traditional indexing technology supported only low-cardinality indexing.
The new way of thinking introduced the concept of high-cardinality indexing.

Handling large data volume

In the early days of data warehousing, most data warehouses were rather small when compared to current data warehouses, measuring about 50 gigabytes of data or smaller. As technologies mature and IT professionals become more comfortable with data warehousing (and more daring after getting a few successful projects behind them), those IT folks commonly need or want increasingly larger amounts of data in a warehouse.
They want not only larger amounts of data, but also increasing levels of detail (not just summarized information). For years, the DBMS world has had a term that applies to very large databases: very large database, or VLDB. VLDB has become increasingly synonymous with data warehousing. Parallel processing in the RDBMS world is a relatively complex proposition and a vendor-versus-vendor battlefield because of different ways of doing it. Furthermore, queries against a partitioned database table run in parallel against each of the partitions, effectively reducing the time it takes to get an answer to a query, because each parallel query operates against a smaller amount of data than in the non-partitioned entire table.
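The partitioning idea can be sketched as follows. This pure-Python version only illustrates the pattern — splitting a table and scanning the partitions concurrently — whereas a real RDBMS parallelizes the physical I/O and query operators. All data and helper names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of partition-and-parallel-scan: split one large "table" into
# partitions so each worker touches less data, then combine the results.
# All data and names are illustrative.

table = [{"amount": float(i % 100)} for i in range(10_000)]

def partition(rows, n_parts):
    """Split rows into n roughly equal partitions."""
    size = (len(rows) + n_parts - 1) // n_parts
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def scan(part):
    """The per-partition work: aggregate amounts in one partition."""
    return sum(r["amount"] for r in part)

parts = partition(table, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(scan, parts))

total = sum(partial_sums)  # combine per-partition results
```

Because each worker scans only a quarter of the rows, the elapsed time for the whole aggregate is bounded by the slowest partition rather than by one scan of the entire table — the same logic an RDBMS applies to a partitioned table.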
Chapter 5: Relational Databases and Data Warehousing

Designing Your Relational Database for Data Warehouse Usage

The traditional usage of relational databases, supporting run-the-business transaction-processing applications, has meant that you had to follow certain design principles. If you deliberately violated one of those principles, you had to handle your own work-arounds.
The following sections give you the story behind relational database design and data warehousing and, without getting into all the mathematical formalities, explain what normalization means. The official way to handle repeating groups, known in the world of conceptual data modeling as multivalued attributes, is to create a separate table that has a primary-key join column.
You go through this seemingly ridiculous, overly complex set of steps to ensure that a relational database is in first normal form, meaning it has no repeating groups of data.
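As a minimal sqlite3 sketch of that step (the table and column names are invented for illustration), a repeating group of phone numbers moves into a child table joined back by the primary key:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- The parent table holds the single-valued attributes...
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- ...and the repeating group becomes its own table with a join column.
    CREATE TABLE customer_phone (
        customer_id INTEGER REFERENCES customer(customer_id),
        phone TEXT
    );
""")
db.execute("INSERT INTO customer VALUES (1, 'Acme Music')")
db.executemany("INSERT INTO customer_phone VALUES (1, ?)",
               [("555-0100",), ("555-0101",)])
rows = db.execute("""
    SELECT c.name, p.phone
    FROM customer c JOIN customer_phone p ON p.customer_id = c.customer_id
    ORDER BY p.phone
""").fetchall()
print(rows)  # -> [('Acme Music', '555-0100'), ('Acme Music', '555-0101')]
```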
Data Warehousing For Dummies
You can also design a relational database to include other normal forms. Second normal form (which has no partial-key dependencies) and third normal form (which has no non-key dependencies) are the most commonly used forms in relational database design, though the seldom-used fourth normal form, fifth normal form, and Boyce-Codd normal form are also options. A database modeling book can give you more information about these terms.
To get to the point: A relational database is normalized when the rules of normalization are followed, or mostly followed, in the way it is designed for and accessed during typical OLTP functions. Just remember that normalization is important for update operations; more specifically, it optimizes the run-the-business database design.
A data warehouse almost never permits update operations because data is modified by bulk reloading operations, rather than by single in-place updates as part of a transaction. One of the side effects of a highly normalized database is that it has many tables in it; to create data warehousing facts of information (sales by quarter by territory, for example), many multi-table join operations must occur. Data warehousing architectures address this problem with the operational data store (ODS), data warehouse, and data marts, along with various interfaces for applications and the operational environment.
The star schema design technique leverages highly denormalized structures.
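A minimal sqlite3 sketch of a star layout (all table names, columns, and values are illustrative) shows how a question like sales by quarter by territory becomes a short, predictable join against one fact table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Dimension tables carry the descriptive, denormalized attributes...
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_territory (territory_key INTEGER PRIMARY KEY, name TEXT);
    -- ...and the fact table holds the measures, keyed by the dimensions.
    CREATE TABLE fact_sales (date_key INTEGER, territory_key INTEGER, amount REAL);
""")
db.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, "Q1"), (2, "Q2")])
db.executemany("INSERT INTO dim_territory VALUES (?, ?)", [(1, "East"), (2, "West")])
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
               [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0)])
rows = db.execute("""
    SELECT d.quarter, t.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_territory t ON t.territory_key = f.territory_key
    GROUP BY d.quarter, t.name
    ORDER BY d.quarter, t.name
""").fetchall()
print(rows)  # -> [('Q1', 'East', 100.0), ('Q1', 'West', 50.0), ('Q2', 'East', 75.0)]
```

Every analytical question follows the same shape: join the fact table to the dimensions you care about, then group and aggregate.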
You gleefully throw away the rules of normalization and put data where it makes the most sense, based not on update-oriented restrictions but on query patterns. Ralph Kimball developed the star schema design technique. Another technique, the data vault, is a hybrid approach that encompasses the best of breed between third normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise.
This data model is architected specifically to meet the needs of enterprise data warehouses. Inside the data vault model, you can find familiar structures that match traditional definitions of star schema and 3NF, including dimensions, many-to-many linkages, and standard table structures. The differences lie in relationship representations, field structuring, and granular time-based data storage. The modeling techniques built into the data vault have undergone years of design and testing across many different scenarios, giving them a solid foundational approach to data warehousing.
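A rough sqlite3 sketch of those structures follows; the table layouts and names are simplifying assumptions for illustration, not Linstedt's exact conventions. Hubs carry business keys, links record many-to-many relationships between hubs, and satellites accumulate time-stamped attribute history:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hub_customer (customer_key INTEGER PRIMARY KEY, business_key TEXT);
    CREATE TABLE hub_product  (product_key INTEGER PRIMARY KEY, business_key TEXT);
    CREATE TABLE link_order   (customer_key INTEGER, product_key INTEGER);
    CREATE TABLE sat_customer (customer_key INTEGER, load_date TEXT, city TEXT);
""")
db.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001')")
db.execute("INSERT INTO hub_product VALUES (1, 'PROD-042')")
db.execute("INSERT INTO link_order VALUES (1, 1)")
# Two satellite rows for the same hub key: history is appended, not updated,
# which is the granular time-based storage described above.
db.executemany("INSERT INTO sat_customer VALUES (1, ?, ?)",
               [("2023-01-01", "Boston"), ("2024-06-01", "Denver")])
latest = db.execute("""
    SELECT city FROM sat_customer WHERE customer_key = 1
    ORDER BY load_date DESC LIMIT 1
""").fetchone()
print(latest)  # -> ('Denver',)
```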
Dan Linstedt is the creator and inventor of the data vault method. Choose your relational design approach carefully. Although people have a tendency to choose one method over another, the technology and tools that you leverage impact your design approach. Based on your needs and tool selection, one technique might work better than the others.

Relational Products and Data Warehousing

The following sections discuss some leading relational database products that you might want to use for your data warehouse. Almost all these vendors have, during the past few years, acquired additional products, including OLAP or multidimensional-oriented technology and other RDBMSs, to support very diversified platforms and to integrate into their product lines and architectures.
You might want to keep an eye on the whole picture because a data warehousing environment might well have both relational and multidimensional servers. Leading products include the IBM Data Management family, Microsoft SQL Server, and Oracle. Over the years, Microsoft has added features to enable organizations to expand capacity for data warehousing, changes which have made SQL Server a preferred platform for data marts. The model of facts and dimensions can also be understood as a data cube, where the dimensions are the categorical coordinates in a multi-dimensional cube and the fact is a value corresponding to the coordinates.
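A tiny Python sketch of that cube view (the coordinates and values are made up): each fact is addressed by one categorical coordinate per dimension, and slicing fixes one coordinate while keeping the rest.

```python
# Each cell of the cube is a fact value addressed by its coordinates:
# (quarter, territory, product category).
cube = {
    ("Q1", "East", "DVD"): 100.0,
    ("Q1", "West", "DVD"): 50.0,
    ("Q2", "East", "CD"): 75.0,
}

def slice_cube(cube, dim_index, value):
    # Fix one dimension to a value; the result is a lower-dimensional slice.
    return {k: v for k, v in cube.items() if k[dim_index] == value}

print(slice_cube(cube, 1, "East"))
# -> {('Q1', 'East', 'DVD'): 100.0, ('Q2', 'East', 'CD'): 75.0}
```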
In the normalized approach, the data in the data warehouse is stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories. The normalized structure divides data into entities, which creates several tables in a relational database.
When applied in large enterprises, the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into a separate physical table when the database is implemented (Kimball, Ralph). The main advantage of this approach is that it is straightforward to add information into the database. Some disadvantages are that, because of the number of tables involved, it can be difficult for users to join data from different sources into meaningful information and to access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.
Both normalized and dimensional models can be represented in entity-relationship diagrams, as both contain joined relational tables. The difference between the two models is the degree of normalization (also known as normal forms). These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph). In Information-Driven Business, Robert Hillard proposes an approach to comparing the two models based on the information needs of the business problem. The technique shows that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models), but this extra information comes at the cost of usability.
The technique measures information quantity in terms of information entropy and usability in terms of the Small Worlds data transformation measure. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. These data marts can then be integrated to create a comprehensive data warehouse. The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared in a specific way between facts in two or more data marts.
The top-down approach is designed using a normalized enterprise data model. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. Data warehouses (DW) often resemble the hub-and-spokes architecture.
Legacy systems feeding the warehouse often include customer relationship management and enterprise resource planning systems, generating large amounts of data. To consolidate these various data models and facilitate the extract, transform, load (ETL) process, data warehouses often make use of an operational data store, the information from which is parsed into the actual DW. To reduce data redundancy, larger systems often store the data in a normalized way. Data marts for specific reports can then be built on top of the data warehouse. A hybrid DW database is kept in third normal form to eliminate data redundancy.
A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required.
The DW provides a single source of information from which the data marts can read, providing a wide range of business information. The hybrid architecture allows a DW to be replaced with a master data management repository where operational (not static) information could reside. The data vault modeling components follow hub-and-spokes architecture. This modeling style is a hybrid design, consisting of the best practices from both third normal form and star schema. The data vault model is not a true third normal form, and breaks some of its rules, but it is a top-down architecture with a bottom-up design.
The data vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible and, when built, still requires the use of a data mart or star-schema-based release area for business purposes. Basic features that define the data in the data warehouse include subject orientation, data integration, time variance, nonvolatility, and data granularity.
Unlike the data in operational systems, the data in the data warehouse revolves around the subjects of the enterprise. Subject orientation can be really useful for decision making; gathering the required objects around a subject is what makes the warehouse subject oriented. The data found within the data warehouse is integrated: since it comes from several operational systems, all inconsistencies must be removed. Consistency covers naming conventions, measurement of variables, encoding structures, physical attributes of data, and so forth.
While operational systems reflect current values as they support day-to-day operations, data warehouse data represents data over a long time horizon (up to 10 years), which means it stores historical data. It is mainly meant for data mining and forecasting. If a user is searching for the buying pattern of a specific customer, the user needs to look at data on current and past purchases.
The data in the data warehouse is read-only, which means it cannot be updated, created, or deleted. In the data warehouse, data is summarized at different levels. The user may start by looking at the total sale units of a product in an entire region. Then the user looks at the states in that region.
Finally, they may examine the individual stores in a certain state. Therefore, typically, the analysis starts at a higher level and moves down to lower levels of detail. Today, companies have a much greater collection of more relevant data than ever before, drawn from a diverse range of internal and external sources, including data marts, cloud-based applications, and machine-generated data.
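The region-to-state-to-store drill-down described above can be sketched with a small roll-up helper (the data and names are illustrative):

```python
from collections import defaultdict

# A handful of sales rows at the most detailed (store) grain.
sales = [
    {"region": "West", "state": "CA", "store": "S1", "units": 10},
    {"region": "West", "state": "CA", "store": "S2", "units": 5},
    {"region": "West", "state": "OR", "store": "S3", "units": 7},
]

def roll_up(rows, *levels):
    # Summarize the same rows at whatever level of detail is requested.
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["units"]
    return dict(totals)

print(roll_up(sales, "region"))           # -> {('West',): 22}
print(roll_up(sales, "region", "state"))  # -> {('West', 'CA'): 15, ('West', 'OR'): 7}
print(roll_up(sales, "region", "state", "store"))
```

Each call uses the same detailed rows; only the grouping levels change, which is exactly how analysis moves from a higher level down to lower levels of detail.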