By: Alex Infanzon
Pure Storage Solutions Manager
For a number of years, market analysts have been writing about the evolution of traditional data warehouses. It started with the development of scalable distributed file systems for large, data-intensive distributed applications (e.g., Google File System). It continued with new programming frameworks for processing and storing extremely large data sets in a distributed computing environment (Hadoop, Spark, et cetera). And today (2017), we have large data warehouses running on all-flash storage designed to replace the spinning disks commonly used in enterprise storage arrays.
Even with these new developments, most market analysts agree that the data warehouse, built on a relational database, will remain the primary analytic database for much of a company's core transactional data, such as sales transactions, customer data and financial records. These data warehouses will be augmented by big-data systems (data lakes). Data lakes are repositories for new sources of large volumes of machine-generated data, such as log files, social media data, videos and images. The data lake will also serve as a repository for more granular or older transactional data that is not stored in the relational data warehouse.
Even though this new information architecture consists of multiple physical data repositories and formats, its logical architecture is a single integrated data platform spanning the relational data warehouse and the data lake.
How much is all this data worth? It seems like a silly question until you consider the valuations of companies built on data, such as LinkedIn, Uber or Twitter. The value hidden in the data comes from being able to perform real-time, ad-hoc analytics, correlate data across various internal and external sources, and bridge your data warehouse and data lake stores.
Data is the new business currency. Organizations continue to generate and store large volumes of digital data. By some estimates, the global volume of digital data will multiply another 40 times or more between now and 2020. Much of that new information will consist of personal details: where people have been, what products they've bought, what movies they like, which candidates they support. The list is nearly endless.
The volume, velocity and variety of data pose management and processing challenges, starting with the flexibility and scalability of the environment that hosts it. Volume and velocity make it difficult to ingest and store data, and to respond optimally to both traditional transactional reporting and newer Big Data workloads. Data growth in source systems also affects loading and query performance.
Fortunately, new architectures and technologies are changing the landscape of the modern data warehouse. More powerful computing and all-flash storage platforms are available at ever-lower cost. At the same time, relational database management systems (e.g., Oracle 12c, PostgreSQL and others) continue to evolve and add features that address some of the challenges mentioned above.
Modern hardware data platforms built on all-flash arrays also let businesses obtain the information they need in more agile ways. They also enable the analytical capabilities that are a critical part of an organization's digital transformation and competitive strategy.
Modernizing your data warehouse with this approach unifies data and its processing, even when the data is strewn across multiple platforms. Users can choose the best approach for a given data workload or analytic goal, and offload certain workloads from the data warehouse to the data lake and vice versa. FlashBlade is uniquely powered to help solve the challenges of data warehousing, management, and analysis. To learn more, visit www.purestorage.com/flashblade.