Data Stack Modernization - Backcountry.com

 
Project Type:  Data Warehouse  |  Data Pipelines  |  Machine Learning

 
 

Backcountry.com is an online specialty retailer that sells clothing and outdoor recreation gear. Backcountry.com also operates online brands such as Competitive Cyclist, MotoSport, BergFreunde.de, and Steep and Cheap.

 
 
 

Overview

Starting in early 2020, Backcountry and Ternary Data embarked on a mission to modernize Backcountry’s data architecture. Originally on AWS, Backcountry evaluated many options and settled on Google Cloud and Looker. This involved a complete overhaul of Backcountry’s data warehouse and machine learning infrastructure and practices. Ternary Data helped Backcountry’s data team evaluate several architectures on Google Cloud Platform, before proving out a final configuration for modernization and building this workload in production.

 

Summary of Results

  • Data is available and useful across the business. Actionable data is readily available for the business and machine learning projects, using BigQuery and Looker.

  • Reduction in query times for key data. With BigQuery as the data warehouse, query times for historical sales went from taking 10 hours to under 3 minutes.

  • Data science has quick time to value. Data scientists are able to experiment and iterate very quickly using Google Cloud’s AI Platform, BigQuery ML, and AutoML.

 

The Challenges

When Backcountry engaged Ternary Data for data architecture strategy, it was clear that they needed an overhaul of their legacy data infrastructure. Some challenges included:

  • Analytics and Reporting - The business and data scientists struggled to access data within their oversubscribed and cost-prohibitive Oracle on-premises data warehouse. Queries were slow at an average return time of more than 97 seconds, often timing out after several hours. Storage was extremely limited due to the cost of scaling databases that combine storage with compute, forcing pruning of valuable data. As a result, Backcountry could only store 1 year's worth of web data, alongside its historical transaction data. This meant over 90% of the company's available useful data, such as Google Ads, Facebook, historical clickstream, machine learning outputs, or large cost centers like hourly employee costs, was not available or accessible to analysts and data scientists. Additionally, Backcountry used OBIEE for its business intelligence, which was cumbersome and lacked both the basic and advanced functionality the team needed to improve data governance and reporting automation.

  • Overly complex infrastructure - Backcountry was maintaining both a data lake (Amazon S3 with Databricks) and a data warehouse (Oracle). Managing data and schemas in the data lake was a significant labor cost, and keeping data in sync across Oracle, S3 and other systems was a constant struggle. They also use some legacy Microsoft SQL Server and Postgres databases for reporting right now, which will eventually be deprecated.

  • Data pipelines - Backcountry used Talend, which ran on a purely time-based schedule in an on-prem stack. As the company grew, it increasingly required event-based pipelines (Airflow) and cloud-native partners.

  • Machine learning - Data science initiatives were slow to deliver value due to the limitations of the Oracle data warehouse and incoherent data pipelines. While the team had shipped demonstrably high-value machine learning models, the stack made it impossible to scale beyond basic machine learning or beyond a single department. Even a 10% downsampled training dataset often took over 10 hours to pull from the database, and most of the time these larger analytic queries would simply fail after a few hours of running. Additionally, the lack of data governance and of clear, business-defined, democratized data definitions and metrics meant that even when models could be built, they could be optimizing toward incorrect metrics.

 
 
In our data journey we spoke with HBR authors, CTOs, and many consultants. Ternary shined through in each stage of our journey, helping us to disseminate often contradictory advice. Ternary was the partner we needed to see us through the entire process, bringing practical context and knowledge from the research stage all the way to reflecting on a successful cloud migration and data modernization initiative. When we hit architectural cruxes, or lacked internal experience with a problem, Ternary “taught us to fish.”
— Taylor Bentz, Director of Data at Backcountry.com
 
 
 

The Requirements

  • Analytics and Reporting - The business must get fast and actionable insights, with an SLA of 30 seconds for query response. 95% of queries must return within the SLA.

  • Simple Infrastructure - The system should be cost-effective and serverless wherever possible, avoiding undifferentiated heavy lifting and allowing data engineers to focus on data rather than operations.

  • Data pipelines - The new data pipelines must scale as data and analytics requirements grow in the future. The tool must seamlessly integrate with BigQuery.

  • Machine learning - Data scientists need a platform for rapid experimentation and the ability to put their models into production.

  • Marketing - Backcountry wants to integrate Google Analytics, Ads, and YouTube for marketing performance analytics. This is a big deal for Backcountry’s marketing department.
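
As an illustration of the Analytics and Reporting requirement above, the following is a minimal Python sketch of how a 30-second SLA with a 95% target could be checked against a list of query durations. The function names and the sample durations are hypothetical and are not taken from Backcountry's or Ternary Data's actual tooling; only the 30-second and 95% figures come from the requirement itself.

```python
from typing import Sequence

SLA_SECONDS = 30.0       # SLA for query response, from the requirement
TARGET_FRACTION = 0.95   # share of queries that must return within the SLA


def sla_compliance(durations_s: Sequence[float], sla_s: float = SLA_SECONDS) -> float:
    """Return the fraction of queries that finished within the SLA."""
    if not durations_s:
        raise ValueError("need at least one query duration")
    within = sum(1 for d in durations_s if d <= sla_s)
    return within / len(durations_s)


def meets_requirement(
    durations_s: Sequence[float],
    sla_s: float = SLA_SECONDS,
    target: float = TARGET_FRACTION,
) -> bool:
    """True when the share of in-SLA queries reaches the target."""
    return sla_compliance(durations_s, sla_s) >= target


# Hypothetical sample: 19 fast queries and one slow one (durations in seconds).
sample = [2.0] * 19 + [97.0]
print(sla_compliance(sample))     # 0.95
print(meets_requirement(sample))  # True
```

In practice the durations would come from the warehouse's own query logs rather than a hard-coded list; the check itself stays the same.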

 

The Solution

Given the requirements, Ternary Data architected a cost-effective and low maintenance solution.

  1. Analytics and Reporting

    • BigQuery increased the speed of Backcountry’s decisions and reporting. As a benchmark, querying all of their historical sales used to take over 10 hours. Now, the same queries take under 3 minutes.

    • Actionable data delivered through BigQuery is readily available for the business and machine learning projects. Backcountry leverages BigQuery to quickly analyze and model customer lifetime value, personalization, clickstream, marketing data, and much more.

    • Looker is used for actionable analytics and visualization. Looker’s LookML provides sound data governance and consistency. 

  2. Simple Infrastructure 

    • The Google Cloud Platform Marketplace significantly reduces the complexity and cost of Backcountry’s data stack. Backcountry now interfaces with one main vendor, and pays one bill.

    • Backcountry’s data team builds expertise in one cloud platform, on Google Cloud’s best of breed data technologies.

  3. Data Pipelines

    • Cloud Composer is based on the open source Apache Airflow project and the Python programming language, making it a good fit for the team’s skillset. It provides deep integration with the entire GCP ecosystem, combined with the power to connect to nearly any data source through custom code.

    • Fivetran’s hands-off and seamless data pipelines into BigQuery greatly reduced the operational burden on Backcountry’s data team.

  4. Machine learning

    • Data scientists are able to experiment and iterate very quickly using Google Cloud’s AI Platform, BigQuery ML, and AutoML.

    • Backcountry can rapidly incorporate machine learning models into production, generating a lot of value for both Backcountry and its customers.

 
Backcountry.com - new data stack architecture


 
 
 

 

About Ternary Data

Ternary Data is a specialty data architecture and consulting firm based in Salt Lake City, Utah. Ternary Data advises and coaches companies on discovering the value in their data through cloud services and best practices.

Ternary Data is partnered with Google Cloud, Looker, Fivetran, and many more best-in-class data technology companies.