If your company is anything like some I’ve worked in, data is segregated, siloed and unstructured. Finance and Sales can’t agree on last month’s customer acquisitions or, worse yet, net revenue. You’ve hit Excel’s row limit trying to join data from disparate sources, and your laptop’s processor is burning a hole in your desk (or, in my case, my sweatpants). If any of this sounds familiar, it’s time to elevate your company’s data stack with a data warehouse.
In the last few years alone, there’s been tremendous growth in data infrastructure technologies and in the best practices that support them. We’re now seeing a rapid shift away from clunky on-premise data warehouses, expensive and brittle ETL tooling and a heavy dependency on highly skilled Data Engineers, toward cloud-based SaaS data warehouses, flexible EL (extract-load) pipelines and self-serve analytics tools for non-technical users.
In this blog post, I’ll cover the telltale signs that your business has outgrown its current (and in some cases, non-existent) data capabilities, and how adopting a modern data stack can, among other things, increase the speed and efficiency with which you arrive at insights and tackle seemingly complex business problems.
Are you ready for a modern data stack?
The critical question to ask yourself before embarking on your data stack journey is: how do I know I’m ready? Simply put, when your ability to extract meaningful business insight can no longer keep up with your needs, and your productivity suffers as a result, it’s likely time to move forward with a data warehouse.
Here are my top four indicators that you’ve outgrown your current analytical capabilities:
- Inefficient use of your time. What’s more precious than your time, and how do you keep your productivity high? If you spend most of your time wrangling heterogeneous data into a spreadsheet, only to return to that same spreadsheet every month or quarter, it’s likely time to elevate your data infrastructure. Without the proper infrastructure, simple, ad hoc questions about sales and revenue become time-consuming and, oftentimes, frustrating to answer.
- High dependency on spreadsheet tools. There’s no denying that spreadsheets are essential tools for any business, but once you’re pushing 20 columns and upwards of 100,000 rows, spreadsheet applications tend to become sluggish and error-prone. Over time, these problems worsen as you add more data and apply additional business logic. When freezing panes and creating pivot charts and graphs no longer cut it, it may be time to consider provisioning a data warehouse.
- Single source of truth. Is there misalignment between Marketing and Finance on last month’s churn? Do Sales and Revenue disagree on forecasted sales metrics? More often than not, misalignment between people and departments is a product of siloed data collection and reporting, and an inability to gather and retain data centrally. Different departments also tend to have distinct approaches to calculating the same metric. Ensuring your entire organization works from a single source of truth, often formalized through a data governance framework, is paramount for eliciting business insight effectively and efficiently. A modern data stack centralizes your organization’s data and ensures the same data and the same business logic are applied everywhere.
- Data security. Do you struggle to limit reporting access to specific users? Should everyone have access to your company’s financials, or are these metrics reserved for leadership? Are you sharing confidential spreadsheets by email, where they can be exposed to security threats? How do you ensure your company’s data doesn’t traverse the public internet? Modern data stacks have made significant strides in data security and user access, particularly cloud-based data warehouses, which encrypt data in transit and at rest. For example, cloud data warehouse admins can manage user authentication, limit access rights through role grants and apply column masking according to the organization’s security protocols.
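To make role grants and column masking a little more concrete, here is a minimal Python sketch of the logic. It is illustrative only: the roles, table names, columns and the `can_read`/`mask_row` helpers are hypothetical, and in a real cloud data warehouse this policy would be defined inside the warehouse itself and enforced on every query, not in application code.

```python
# Hypothetical role-based access policy, for illustration only.
# In a real cloud warehouse this lives in the warehouse (role grants,
# masking policies) and is enforced on every query.

# Which tables each role may query at all (table-level grants).
TABLE_GRANTS = {
    "leadership": {"financials", "accounts"},
    "finance": {"financials", "accounts"},
    "sales": {"accounts"},
    "customer_success": {"accounts"},
}

# Sensitive columns, and the roles allowed to see them unmasked.
UNMASKED_ROLES = {
    "net_revenue": {"finance", "leadership"},
    "customer_email": {"customer_success"},
}

MASK = "****"


def can_read(role: str, table: str) -> bool:
    """Return True if the role has been granted access to the table."""
    return table in TABLE_GRANTS.get(role, set())


def mask_row(row: dict, role: str) -> dict:
    """Return a copy of the row with sensitive columns masked for this role."""
    masked = {}
    for column, value in row.items():
        allowed = UNMASKED_ROLES.get(column)
        if allowed is None or role in allowed:
            masked[column] = value  # not sensitive, or this role may see it
        else:
            masked[column] = MASK
    return masked


sample_row = {"region": "EMEA", "net_revenue": 125000, "customer_email": "ops@example.com"}
print(can_read("sales", "financials"))  # False
print(mask_row(sample_row, "finance"))
# {'region': 'EMEA', 'net_revenue': 125000, 'customer_email': '****'}
```

The same row yields different views depending on who asks, which is the point: access is decided centrally by role rather than by who happened to receive a spreadsheet by email.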
In today’s data-centric world, one thing is certain: to stay competitive, you must embrace a culture in which business decisions are supported by data. We’ve seen that as the volume of your data and the complexity of your use cases increase, managing your data becomes more difficult, and processes that once worked may no longer be useful. Combating these symptoms requires robust infrastructure that makes data available efficiently, and a framework to guide the management of your data as your business scales. Enter the Modern Data Stack.
The Modern Data Stack
Increasingly, I’m seeing companies with relatively small data teams and operating budgets adopt cloud-native data warehousing and business intelligence infrastructure that embraces the flexibility and scale of the cloud. With the rise of free, self-serve, enterprise-grade data vendors and open-source tooling, traditional barriers to entry are being dismantled, and companies of all sizes can build infrastructure that significantly speeds up time-to-insight. Provisioning this stack is relatively simple, scales with your business and requires minimal oversight from Data Engineers. Let’s now take a look at the common ingredients of a modern data stack.
Blueprint of the Modern Data Stack
The core components of a modern data stack are typically a cloud-based data warehouse, data pipelines and connectors, a business intelligence platform, and a workflow orchestrator that manages how data is transformed and propagated through the stack.
Cloud-based Data Warehouse: to ensure a single source of truth, a data warehouse serves as the central location to collect, store and integrate data from all your disparate sources. Unlike traditional on-premise data warehouses, SaaS cloud data warehouses run on storage and compute resources provisioned by a cloud provider, so your data infrastructure is available, scalable and secure. Popular service providers include Snowflake, Amazon Redshift and Google’s BigQuery.
Data Pipelines/Ingestion: a connector service that offers flexible solutions to load diverse data streams into your data warehouse with minimal engineering effort. Choosing a data pipeline tool depends on the variety and nature of the source systems, data structures and data types you plan to collect in your warehouse. Common data source types include:
- Analytics (Google Analytics, Amplitude)
- CRM (Salesforce, HubSpot)
- Advertising (Google’s AM360, Facebook)
- Customer Success (Zendesk)
- E-commerce (Shopify)
- ERPs (SAP ECC, Oracle’s NetSuite, Microsoft Dynamics)
- Email Marketing (Mailchimp)
Connections to third-party data sources are typically configured once, and as your data sources grow or change, you can reuse existing connector infrastructure to funnel data into your data warehouse. Popular service providers include Stitch and Fivetran.
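The EL pattern these connectors automate can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Stitch or Fivetran work internally: `SOURCE_RECORDS` is a hypothetical stand-in for a CRM API, SQLite stands in for the warehouse, and the incremental sync uses a simple high-water-mark cursor on an `updated_at` field. Records are loaded as-is; transformation is deferred to a later layer.

```python
import sqlite3

# Hypothetical stand-in for a SaaS source such as a CRM API. A managed
# connector would page through the vendor's API instead.
SOURCE_RECORDS = [
    {"id": 1, "name": "Acme", "updated_at": "2021-03-01"},
    {"id": 2, "name": "Globex", "updated_at": "2021-03-05"},
    {"id": 3, "name": "Initech", "updated_at": "2021-03-09"},
]


def extract(since):
    """Return source records changed after the `since` cursor (None = full load).

    ISO-8601 date strings compare correctly as plain strings.
    """
    return [r for r in SOURCE_RECORDS if since is None or r["updated_at"] > since]


def ensure_table(conn):
    """Create the raw landing table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_crm_accounts "
        "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )


def current_cursor(conn):
    """High-water mark: the latest updated_at already loaded (None if empty)."""
    return conn.execute("SELECT MAX(updated_at) FROM raw_crm_accounts").fetchone()[0]


def load(conn, records):
    """Upsert records untransformed; modeling happens later, in the warehouse."""
    conn.executemany(
        "INSERT OR REPLACE INTO raw_crm_accounts (id, name, updated_at) "
        "VALUES (:id, :name, :updated_at)",
        records,
    )
    conn.commit()


def sync(conn):
    """One incremental EL run; returns the number of records loaded."""
    ensure_table(conn)
    records = extract(current_cursor(conn))
    load(conn, records)
    return len(records)


warehouse = sqlite3.connect(":memory:")  # SQLite standing in for the warehouse
print(sync(warehouse))  # 3: first run is a full load
print(sync(warehouse))  # 0: nothing changed since the cursor
```

The strict `>` comparison on a date-only cursor is a simplification: it would skip a record updated later on the same day as the last sync, which production connectors guard against with finer-grained timestamps or overlapping sync windows.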
BI and Analytics Platform: a visualization and data science platform that takes advantage of the consolidated data warehouse to offer agile exploratory analysis and richer business insights. Once data is in your warehouse, you need a BI and analytics platform to make sense of it. The primary goal of this layer is to make data actionable: with modern BI tools, we can look beyond the bar chart to automate marketing propensity models, monitor and detect security and intrusion threats, and, in general, automate what were once rudimentary, manual reporting tasks. Popular service providers include Narrator, Tableau and Looker.
Data Orchestration and Transformation: the final element in a typical modern data stack, orchestration and transformation tools add a modeling layer for your business logic. Although your data warehouse and BI applications may offer this functionality, it’s recommended that you decouple your transformations and attributions from those layers so that they are accessible to other tools and applications that want to use them. These tools can also coordinate and schedule the sequence and dependencies of all your data pipelines. Popular service providers include dbt, Airflow and Narrator.
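The dependency-ordering idea behind a transformation layer can be sketched in a few lines of Python. This is not how dbt or Airflow is implemented: the model names, SQL and the `run_models` helper are hypothetical, SQLite stands in for the warehouse, and the standard library’s `graphlib` (Python 3.9+) provides the topological sort that guarantees each model is built only after the models it reads from.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: each is a SELECT plus the upstream models it reads from.
MODELS = {
    "stg_orders": {
        "sql": "SELECT id, customer_id, amount, refunded FROM raw_orders "
               "WHERE amount IS NOT NULL",
        "depends_on": [],
    },
    "fct_net_revenue": {
        "sql": "SELECT SUM(CASE WHEN refunded THEN 0 ELSE amount END) AS net_revenue "
               "FROM stg_orders",
        "depends_on": ["stg_orders"],
    },
    "dim_customers": {
        "sql": "SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS gross_revenue "
               "FROM stg_orders GROUP BY customer_id",
        "depends_on": ["stg_orders"],
    },
}


def run_models(conn, models):
    """Materialize each model as a table, always after its dependencies."""
    graph = {name: set(spec["depends_on"]) for name, spec in models.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        # Model names come from the trusted MODELS dict above, not user input.
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {models[name]['sql']}")
    conn.commit()
    return order


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL, refunded INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [(1, 10, 100.0, 0), (2, 10, 50.0, 1), (3, 11, 75.0, 0), (4, 11, None, 0)],
)
order = run_models(conn, MODELS)
print(order[0])  # stg_orders
print(conn.execute("SELECT net_revenue FROM fct_net_revenue").fetchone()[0])  # 175.0
```

Because net revenue is defined once in this layer, every BI tool or notebook that queries `fct_net_revenue` gets the same number, which is the single-source-of-truth argument in practical form.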
The strength of this architecture lies in the availability and scalability of modern cloud-based applications, which can handle large, diverse datasets with limited oversight from engineering resources.
Additional benefits of this approach, particularly for smaller shops, include the low costs to provision and maintain this stack, the speed and ease of getting started, and the wide variety of data applications and features to select from.
Only a few years ago, this architecture would not have been possible. With the advancement of cloud-based solutions, even early-stage businesses can deploy flexible infrastructure, allowing them to spend less time engineering their data management applications and more time analyzing their data.
For more inspiration, Jason Harris of Fivetran lists his Five Reasons to Consider a Modern Data Stack, and Tristan Handy, founder of dbt, offers thoughtful predictions in The Future of the Modern Data Stack.
A great place to get started is Mode's guide to building a modern data stack in 30 minutes.