As the Director of Data Science at Paytronix, Jesse Marshall leads a team of seven: two data scientists and five data engineers. The team works closely with the larger strategy and analytics (S&A) team, which provides clients with the insights they need to truly engage with guests. That’s why, explains Marshall, his own team’s No. 1 priority is to make data actionable for other departments: “In this day and age, there’s no shortage of data—but to have data and to have data that is actionable are two very different things.”
Marshall’s team was challenged with collecting, organizing, and deriving insight from data coming from a multitude of sources, running on multiple databases, and stored in disparate formats. “We see data in so many different forms so we have to be flexible with how we can ingest the data,” he says. “All that data comes into Snowflake, and if data is not actionable, it’s not very valuable.” According to Marshall, getting data into Snowflake was the easy part, since the platform is so flexible with regard to data types. But the team still faced the question of how to make that data actionable. “When we’re doing an ETL, how do we make it so that it’s not a weeks-long project to get the data into the different departments’ hands?”
Watch the on-demand webinar: How Paytronix uses Coalesce to eliminate data delays, inefficiencies, and the high costs of performing data transformations in Looker.
Back then, the company was using a mix of custom, hand-written Scala and PySpark jobs for data transformation. This structure served analytics well at the time, but it became clear it could not keep up with the growing demands of the business. Marshall wanted to get ahead of the game and be ready for increased sales and customer demand. The increasing scale was putting pressure on the platform, and a lot of time was spent on maintenance and break-fix support. In addition, the technology had such a steep learning curve that only a small number of people knew it well enough to work on it.
“I was frustrated with how long it took from requesting a certain table or a pipeline to it getting built. I wanted to completely rethink that process,” Marshall says. “Every time you had a change, it was like starting from scratch with the pipeline again. Say you had four big projects a year, and those four projects all had to be good ideas that you fully delivered on. If one of those four projects didn’t work out for whatever reason, it was a really big hit to your overall contribution.”
Marshall knew this type of approach didn’t work well when it came to data science projects, which usually began with just an idea and a rough set of requirements. “When you start a [data science] project, you never start with 100% clear requirements,” he explains. “As the idea becomes clear, as you learn things and train the models, you need to make tweaks to the pipeline to add some features and take some away. You’re trying a lot of things; some will work and some won’t.”
This was a completely different approach from the one the company took to maintaining its core business platform, which needed to run without interruption and left no room for experimentation. But Marshall believed the data science side of the business should function more like R&D. He wanted to change the dynamic entirely, so that his team could come up with a pipeline or an idea with little effort, get it to a proof-of-concept phase right away, and then test it quickly. Or, as he puts it, he wanted to take a “Moneyball” approach: “With the data team, instead of say four ideas (and these numbers are arbitrary), I wanted it to be five times that, so say 20 ideas. And instead of a 25% failure rate, I wanted it to be 50% or 75%. I wanted the team to try a lot of things and fail very quickly.”