Data Lake, Data Warehouse and Data Mart are three data management architectures, that is, three different ways of storing and analyzing data, each helping a business answer different kinds of questions from its data.
A Data Lake is an enormous repository that stores raw, unprocessed data in its native format (photos, emails, audio, videos, documents, and so on). This raw data can later be refined and processed to feed a Data Warehouse, but in the Data Lake itself it remains untouched. Data in a Data Lake comes from diverse sources and is schema-on-read: a structure is applied only when the data is read, which can make queries slower. It can be used for big data processing, data science and machine learning (exploratory analysis and model development), and real-time analytics. Because storage in a Data Lake is cost-effective, it can also retain historical data that may be valuable for future analysis. Scalability is another key characteristic: a Data Lake is designed to handle large volumes of data.
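To make schema-on-read concrete, here is a minimal Python sketch, assuming a toy lake represented as a list of raw JSON strings. The source names, field names and the `read_sales` helper are hypothetical. Nothing is validated when records are stored; the structure (customer, amount, day) is imposed only when the data is read, and records that do not fit this particular question, such as the email, are simply skipped.

```python
import json
from typing import Any

# A toy "data lake": raw records kept exactly as they arrived (JSON strings)
# from different hypothetical sources, with no schema enforced on write.
raw_lake: list[str] = [
    '{"source": "web", "user": "ana", "amount": "19.90", "ts": "2024-01-05"}',
    '{"source": "pos", "customer": "rui", "total": 42, "date": "2024-01-06"}',
    '{"source": "email", "from": "ana@example.com", "body": "Where is my order?"}',
]


def read_sales(lake: list[str]) -> list[dict[str, Any]]:
    """Schema-on-read: interpret raw records as sales only at query time.

    The mapping from source-specific fields to (customer, amount, day) is an
    illustrative assumption; records that do not fit are skipped, not rejected.
    """
    sales = []
    for line in lake:
        rec = json.loads(line)  # parsing happens on every read, not on write
        if rec.get("source") == "web":
            sales.append({"customer": rec["user"],
                          "amount": float(rec["amount"]),
                          "day": rec["ts"]})
        elif rec.get("source") == "pos":
            sales.append({"customer": rec["customer"],
                          "amount": float(rec["total"]),
                          "day": rec["date"]})
    return sales


if __name__ == "__main__":
    for row in read_sales(raw_lake):
        print(row)
```

A different question could use a different reader over the same raw records, which is the flexibility a Data Lake offers; the price is that parsing and interpretation are repeated every time the data is read.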
A Data Warehouse is a large, centralized and unified storage system that holds structured, processed data (cleaned, transformed and organized into star or snowflake schemas) from multiple sources. It provides cohesive datasets for business analysis (including historical trend analysis over time), reporting and strategic decision-making, and it is the core analytics system of an organization. To help ensure consistency and reliability, the structure is defined before data is loaded (schema-on-write). A Data Warehouse also delivers higher query performance, as it is optimized for complex queries over large datasets and for analytical workloads.
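Schema-on-write and the star schema can be sketched with Python's built-in `sqlite3` module standing in for a warehouse engine (an assumption for illustration only; production warehouses run on dedicated platforms). The tables `dim_customer`, `dim_date` and `fact_sales` form a hypothetical, minimal star schema: one fact table referencing two dimension tables. The schema and its constraints exist before any row is loaded, so a row that violates them is rejected at load time.

```python
import sqlite3

# Schema-on-write: the (hypothetical) star schema is created BEFORE loading.
DDL = """
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     TEXT NOT NULL,   -- ISO date, e.g. '2024-01-05'
    month   TEXT NOT NULL    -- e.g. '2024-01'
);
CREATE TABLE fact_sales (
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    date_id     INTEGER NOT NULL REFERENCES dim_date(date_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
"""


def build_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce references on load
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "ana"), (2, "rui")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, "2024-01-05", "2024-01"), (2, "2024-02-06", "2024-02")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 19.9), (2, 1, 42.0), (1, 2, 10.0)])
    conn.commit()
    return conn


def monthly_revenue(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    # A typical analytical query: join the fact table to a dimension, aggregate.
    return conn.execute(
        """
        SELECT d.month, ROUND(SUM(f.amount), 2)
        FROM fact_sales AS f JOIN dim_date AS d ON f.date_id = d.date_id
        GROUP BY d.month
        ORDER BY d.month
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(monthly_revenue(conn))
    try:
        # Rejected at load time: a negative amount violates the CHECK constraint.
        conn.execute("INSERT INTO fact_sales VALUES (1, 1, -5.0)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

Because the data was shaped on the way in, analytical queries such as `monthly_revenue` are plain joins and aggregations, with no per-query parsing of raw records.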
A Data Mart is a focused subset of a Data Warehouse, designed to serve the specific needs of a business unit, a department or a group of users. By concentrating on a single subject area (such as sales, finance or marketing), it provides faster access to relevant data, better performance (while reducing the load on the central Data Warehouse) and user autonomy (letting users manage their data according to their own requirements). As a subset of a Data Warehouse, it keeps a simplified design: it is smaller, less complex and easier to manage.
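A Data Mart can be illustrated in the same hedged way. This sketch, again using `sqlite3` as a stand-in, attaches a second in-memory database named `sales_mart` to mimic a separate departmental store and fills it with a pre-aggregated subset of the warehouse's sales data only. All table names are hypothetical; in practice a mart may also be a separate schema, a set of views, or a dedicated database.

```python
import sqlite3

# Hypothetical central warehouse with two subject areas: sales and finance.
WAREHOUSE_SQL = """
CREATE TABLE fact_sales    (region TEXT, month TEXT, amount REAL);
CREATE TABLE fact_expenses (cost_center TEXT, month TEXT, amount REAL);
INSERT INTO fact_sales VALUES
    ('north', '2024-01', 100.0), ('north', '2024-01', 50.0),
    ('south', '2024-01', 80.0),  ('north', '2024-02', 70.0);
INSERT INTO fact_expenses VALUES ('it', '2024-01', 30.0);
"""

# The sales Data Mart: a separate (attached) database holding only sales
# data, pre-aggregated to the grain the sales team needs.
SALES_MART_SQL = """
ATTACH DATABASE ':memory:' AS sales_mart;
CREATE TABLE sales_mart.monthly_revenue AS
    SELECT region, month, SUM(amount) AS revenue
    FROM main.fact_sales
    GROUP BY region, month;
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(WAREHOUSE_SQL)   # central warehouse
    conn.executescript(SALES_MART_SQL)  # departmental subset
    return conn


def sales_report(conn: sqlite3.Connection) -> list[tuple[str, str, float]]:
    # The sales team queries only its small mart, not the full warehouse.
    return conn.execute(
        "SELECT region, month, revenue FROM sales_mart.monthly_revenue "
        "ORDER BY region, month"
    ).fetchall()


if __name__ == "__main__":
    for row in sales_report(build()):
        print(row)
```

The mart contains no finance data at all, which keeps it small and simple; the trade-off is that it must be refreshed from the warehouse whenever the underlying sales data changes.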