Redwood Logistics Ships a Smarter Data Architecture to Power Faster Supply Chain Insights

This fourth-party logistics provider delivered a new approach to its Data Vault setup, with Coalesce a core part of the payload

Company: Redwood Logistics
HQ: Chicago, IL
Industry: Supply chain and logistics technology
Employees: 1,000
Top Results:
15–20 minutes to propagate changes to production, versus 4 days previously
15x faster data refresh, down to just 2 minutes from 30 minutes
83% reduction in time to build a new financial workflow model

“The functionality Coalesce provides—not just the UI element, but also the raw engineering capabilities to break open node types and write your own code with Jinja and YAML—has really elevated us to a new type of engineering.”

Paul Nogle
Lead Data Engineer, Redwood Logistics

Redwood Logistics is a fourth-party logistics (4PL) provider, delivering end-to-end supply chain management along with the technology and integrations to support it. The company offers a logistics platform as a service (LPaaS), which gives clients a flexible framework for managing supply chains. At its core is RedwoodConnect, a cloud-native integration platform as a service (iPaaS) that helps clients streamline their operations, reduce costs, and enhance visibility across their entire supply chain.

The company was originally founded in 2001 as a brokerage and has grown quickly, both organically and through strategic acquisitions. Redwood serves businesses large and small across numerous industries, including consumer packaged goods (CPG), retail, manufacturing, and more—any customer who needs to move freight. The 4PL services Redwood offers include systems management, integration, and implementation; transportation optimization; warehousing and distribution; and data analytics and reporting. Gaining comprehensive insights into every aspect of the supply chain empowers Redwood’s customers to manage their inventory proactively and respond quickly to any supply chain disruptions.

Heavy load in the slow lane

Challenges

Internal teams lacked confidence in dashboard data due to unclear data lineage
Even simple tasks took days due to a complex deployment pipeline
Fragmented, convoluted architecture built on stored procedures and multiple transformation systems was hard to maintain
Engineers had to write repetitive code even for similar data structures, slowing down delivery

The Redwood Logistics data organization is made up of three different teams: Data Engineering, Data Analytics, and Data Science. On the Data Engineering team, Matt Norris serves as the Principal Data Engineer and Paul Nogle as Lead Data Engineer. Along with four additional engineers, they oversee any type of ETL process involving raw data that comes in from a variety of sources. “Redwood uses several transportation management systems, so we have many source databases that we use operationally,” explains Nogle. “We pull all that data into Snowflake, model it, and then put it out to data marts for other teams to consume. We also federate some external data sources into our models alongside our operational data.”

The Data Analytics team uses this cleaned-up data to build and power various Power BI and Tableau dashboards used throughout the organization. A number of groups across Redwood rely on these dashboards to do their work: the finance team doing their month-end reporting, operational leaders keeping track of various performance metrics, and executives monitoring the current state of the business across the entire enterprise. But early on, Norris and Nogle realized they needed to address the lack of confidence some of these internal teams had in the data they were seeing in their dashboards.

“Our end users would come to us to say they didn’t really understand a number they were seeing,” says Nogle. “They’d frequently ask us to walk them through how the data in the operational system translated to what they saw in the dashboard.” Because they didn’t have any data lineage in their previous state, illustrating the full flow of how data got from point A to point B was a time-consuming, manual process that ate up a lot of the team’s time.

Another major challenge they faced was simply their unsustainable workload, which had grown to be too much to effectively manage for six data engineers who were all writing their own code. “To give you an idea of our process, let’s say someone asks us to add a new column to a table,” Nogle says. “While it takes one engineer no time at all to write the code in our local environment, it will take two days to propagate it into our QA environment and then into our testing layer, and then they will have to prepare all the scripts to submit for our change approval process. It will eventually get deployed three or four days later.”

For the past six years, the team’s data architecture has been centered on a Data Vault implementation in Snowflake. “We originally built it using stored procedures and tasks, which required a lot of manual coding and lacked built-in lineage,” says Nogle. “When we first rolled out Data Vault, we relied on views, but the complexity of our transformations at that time was too heavy for views—they consumed too much compute and became cost prohibitive.”

Eventually, the team switched to stored procedures to gain better control over temporary tables rather than trying to hold everything in memory. While that improved performance, they lost visibility into data lineage and introduced challenges around code maintainability. Today, their Data Vault spans four or five different transportation management systems. “On top of that, we’ve built our data warehouse layer—composed of dimension and fact tables—and then two data marts that provide business-specific semantic layers to end users,” explains Nogle.

Setting up a new data architecture initially presented a huge challenge: with the team’s original setup, they were forced to manually model everything from the raw data layer up. “Once that was done, if the new architecture was similar to something we’d already built, we could copy and paste parts of it and tweak as needed—but even then, we’d still be writing a lot of repetitive code by hand,” he says. “Whether we were building a satellite, a link, or a hub, it didn’t really change what we were doing conceptually. It was just a matter of selecting the right fields from the right source tables.” But that whole process was highly manual and time-consuming, and the team wanted to be able to move faster and deliver to their users more quickly.
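To illustrate the kind of boilerplate the team describes, here is a minimal sketch of a typical Data Vault 2.0 hub load in Snowflake SQL. The table and column names are hypothetical, not Redwood’s actual schema; every hub, link, and satellite load repeats essentially this same pattern, just with different keys and source tables.

```sql
-- Minimal sketch of a Data Vault 2.0 hub load (hypothetical names, not Redwood's schema).
-- Each hub, link, and satellite repeats this pattern with different keys and sources.
INSERT INTO dv.hub_shipment (shipment_hk, shipment_id, load_date, record_source)
SELECT DISTINCT
    MD5(UPPER(TRIM(src.shipment_id))) AS shipment_hk,   -- hash key derived from the business key
    src.shipment_id,                                     -- business key from the source system
    CURRENT_TIMESTAMP()               AS load_date,
    'TMS_A'                           AS record_source
FROM staging.tms_a_shipments AS src
WHERE NOT EXISTS (                                       -- insert only business keys not yet in the hub
    SELECT 1
    FROM dv.hub_shipment AS hub
    WHERE hub.shipment_hk = MD5(UPPER(TRIM(src.shipment_id)))
);
```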

Clearing the roadblocks

Solution

Rearchitected data environment to modernize use of Data Vault and reduce redundant compute
Adopted Coalesce to automate CI/CD and orchestration, replacing manual deployments and Snowflake tasks
Introduced embedded data quality checks to proactively detect issues and increase trust in the data

Instead of ripping out their existing tech stack, the team has launched a project to rearchitect their environment to better align with modern data engineering practices. “The new data architecture will largely mirror what we have today,” explains Nogle. “We’ll still have all our raw data sources, a Data Vault 2.0-ish layer, and a data warehouse and data marts.” Currently, he explains, the data warehouse and the data marts run as separate Snowflake tasks, each executing the same code and unnecessarily doubling compute: “Moving forward, we want to consolidate the architecture so the data warehouse serves as our ‘golden’ layer, and the data marts just pull from that.”

Because they are building everything modularly, Nogle explains, they will be able to refresh only the parts of the code they need, and at the frequency their stakeholders require, without having to double-dip into compute. “So while our architecture will remain similar, we’re taking a more granular approach to our Data Vault architecture,” he says. “And we’re redesigning the data warehouse and data mart setup—reevaluating tables versus views, stored procedures versus views and tables.”

A significant part of their data architecture evolution is the adoption of Coalesce. Norris explains that a lot of the major changes will come in how they handle continuous integration and delivery (CI/CD) and orchestration. “Right now, all our orchestration is done natively through Snowflake using tasks—it’s just cron definitions on stored procedure calls. But that’s all moving into Coalesce,” he says. “Our deployment process today is largely manual. Moving all this stuff to Coalesce is going to give us built-in CI/CD, which I think will be huge.”
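For context, the Snowflake-native orchestration being replaced typically looks like the sketch below: a task with a cron schedule wrapped around a stored procedure call. The object names here are illustrative, not taken from Redwood’s environment.

```sql
-- Illustrative Snowflake task: a cron definition on a stored procedure call.
-- Names are hypothetical; this is the style of orchestration moving into Coalesce.
CREATE OR REPLACE TASK refresh_data_warehouse
  WAREHOUSE = transform_wh
  SCHEDULE  = 'USING CRON 0 6 * * * America/Chicago'  -- daily at 6:00 AM Central
AS
  CALL dwh.sp_refresh_warehouse();

-- Tasks are created suspended; they must be resumed before they run on schedule.
ALTER TASK refresh_data_warehouse RESUME;
```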

“Another big improvement with Coalesce will be the addition of embedded data quality checks,” adds Nogle. “Right now, we collect metadata about our tables, but we’re not really surfacing that in any practical way where data consumers can assess if the data is trustworthy. We only hear about issues when someone notices something’s wrong.” He says that by using Coalesce, they will move toward actively checking that the data looks as expected at each step. “That way, we’ll be alerted to issues so we can react faster. Long term, the goal is to make that quality information visible—whether that’s through a cataloging tool, such as Coalesce Catalog, or a custom dashboard.”
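As a hedged example of the kind of step-level check Nogle describes, a simple assertion might verify that a load produced rows, introduced no null hash keys, and is fresh. The table name and one-day window below are illustrative only.

```sql
-- Illustrative step-level data quality check (hypothetical table and thresholds).
-- A zero row_count or a non-zero null_hash_keys would trigger an alert rather than
-- waiting for a data consumer to notice a problem downstream.
SELECT
    COUNT(*)                      AS row_count,
    COUNT_IF(shipment_hk IS NULL) AS null_hash_keys,
    MAX(load_date)                AS latest_load
FROM dv.sat_shipment_details
WHERE load_date >= DATEADD(day, -1, CURRENT_TIMESTAMP());
```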

In addition to rethinking orchestration and data quality, the team is also evaluating their broader tooling strategy. As part of the redesign, they made decisions about how to use Azure Data Factory (ADF), which isn’t designed to support transformations at the scale and complexity they require. “We use ADF mainly for data ingress and egress—API ingestion and SFTP transfers where Snowflake external stages don’t work,” says Nogle. “We don’t use ADF for transformation, as our entire transformation layer is native to Snowflake. Once our Data Vault rebuild is complete, we plan to centralize any remaining niche transformations in Coalesce.”

“Coalesce has completely rewired our brains as engineers in terms of how we think about coding, process, and compute.” —Paul Nogle, Lead Data Engineer, Redwood Logistics

Supercharging the supply chain

Results

Tasks that previously took four days, such as propagating changes to production, now take as little as 15–20 minutes
Cut data refresh time from 30 minutes to under 2 minutes by rewriting Coalesce node types to better fit new Data Vault approach
83% reduction in time to build a new financial workflow model

Within just a few months of becoming a customer, the team is already seeing Coalesce’s significant positive impact on their work. “Our original goal was to be able to handle more work items each sprint,” says Nogle. “Today, we’re no longer spending so much time writing boilerplate code—our developers are just clicking a button or making a simple change to a table definition. That was one of the things that really appealed to us about Coalesce.” Another is the ease with which they can now track data lineage. “We can give our users more visibility and trust into how we’re moving data from point A to point B,” he says.

Those seemingly small requests that used to take up to four days to accomplish—such as propagating a change and deploying it through to production—now take just 15 to 20 minutes to complete. “With Coalesce, we can just click a button and get it all done,” says Nogle.

On top of providing improved visibility into data lineage, Nogle is excited that Coalesce will help them tackle another big item that has been on their to-do list for a while now: developing a data dictionary. “It’s something we’ve always wanted to do but weren’t previously able to deliver,” he says. “Everybody wants to document as you go, but documentation can often fall by the wayside. There’s always that trade-off: do you want the actual product or documentation about the product? What’s great about Coalesce is that automated documentation is built into the process—no need to do it manually. That’s incredibly useful, and I know our end users are going to appreciate it.”

Nogle predicts their next big win with Coalesce will be improvements to how they process raw data, particularly the rearchitecture of a cumbersome, expensive ingestion pipeline. “Our initial process was very cumbersome—it took a long time and cost a lot of money, running across three different source systems at 15-minute intervals on a large warehouse every day,” he says. Most of this is operational transportation data, such as the costs associated with moving a truck from point A to point B, as well as the type of truck being used and the commodities it is transporting.

Using Coalesce, one of their engineers rearchitected this process to run on a smaller warehouse in the same amount of time. “He rewrote that entire segment of code just once and applied it to all 25 tables—it’s much more efficient than having to write that code table by table.” The new pipeline is already delivering results, with a 40–60% reduction in daily compute. Two more iterations of the same pipeline—tailored for slightly different source instances—are planned, promising even greater savings. Says Nogle, “One of our engineers was able to complete both in about a week, a task that would likely have taken 3–4 weeks without Coalesce.”

The team is now taking a step back and completely rethinking how they approach Data Vault, writing custom node types that best fit their specific use case. In a preliminary test, Nogle says they were able to stand up an entire slice of data, from raw into the Data Vault and then into the data warehouse—running on a smaller instance—in just 1.5 to 2 minutes. He explains that historically there were two segments that took 30 minutes to run on medium or large warehouses: “So we’re drastically decreasing our actual time to refresh that data—about 30 minutes down to just 2 minutes max—not to mention significantly reducing compute resources by going to an extra small warehouse.”

“There’s a lot of flexibility with Coalesce if you’re willing to take advantage of it, and we’ve been able to easily make it work the way we want it to.” —Matt Norris, Principal Data Engineer, Redwood Logistics

With Coalesce becoming a regular part of their workflow, the team is already much more productive. “We’ve often heard Coalesce described as a ‘force multiplier,’ and we completely agree,” says Nogle. “Just the process of rebuilding our Data Vault is a good example. The original project took three full-time engineers over a year to complete, with a team of consultants initially writing much of the code. But now, it’s taken a junior developer just four weeks to rebuild one of our source systems. Yes, they’re working from existing code, but the speed of delivery has increased dramatically—that’s a massive time savings compared to before we had Coalesce.” Nogle notes that the slowest part of the rebuild is just taking a step back and making sure they are making the right design decisions: “Once we have a pattern, the team is able to absolutely blaze through node creation and stand objects up super quickly.”

In addition to the larger rearchitecting project, Nogle is starting to use Coalesce for smaller side projects that promise to benefit the business. One example is a newly built key financial workflow model that replaces a third-party solution. “I completely rebuilt the model and can now run calculations for the entire company in under a minute, with full data lineage and calculation documentation to support auditability and transparency,” says Nogle. The new system took 10 hours to build—a task that would normally have required 40–60 hours of programming time.

The Coalesce-native build empowers business users to understand and trust the output. “Users can now go into Coalesce, see how the amount was calculated, and suggest changes based on what we’re displaying through the Coalesce docs,” he says. “It’s powerful to be able to give that level of visibility into something as important as financial information.” He adds that Coalesce features such as column propagation, drag-and-drop columns, and the editor’s create/run-all tools make ongoing updates remarkably easy. “Coalesce Transform’s API endpoint also lets us embed the process into the Streamlit in Snowflake (SiS) app we built for the team—giving them true ownership of the workflow, while we rely on Coalesce for the transformation layer.”

As for the future, Nogle says that once Coalesce is fully implemented and they have a solid foundation in place, they will be able to move faster and start tackling some new projects. One area he hopes to improve is visibility into tracking, such as where trucks are in the process and what the full lifecycle of a truck’s load looks like. “We’re also working on bringing in item-level detail for internal reporting,” he adds. “Historically, the answer has been to tell people to look it up in the operational TMS (transportation management system), but we want to provide a more comprehensive, enterprise-wide view for our internal stakeholders.”

While a big part of Coalesce’s value is the ease of using its visual interface, both Nogle and Norris appreciate how flexible they have found the platform to be. “The functionality Coalesce provides—not just the UI element, but also the raw engineering capabilities to break open node types and write your own code with Jinja and YAML—has really elevated us to a new type of engineering,” says Nogle. “It has rewired how we approach problems.” Norris adds that the ability to easily customize Coalesce has been key: “There’s a lot of flexibility with Coalesce if you’re willing to take advantage of it, and we’ve been able to easily make it work the way we need it to. Every member of our team has positive things to say about the platform.”

Nogle recalls that, in the beginning, there was some hesitancy from his developers, who feared that using a low-code, UI-driven solution meant giving up control. “But once our engineers saw how Coalesce automates the tedious parts of programming while still giving them full control—especially when it comes to creating our own node definitions—their perspective shifted,” he says. “It was very cool to watch that transition from uncertainty to excitement. We’re all super happy with Coalesce and want to tell the world how good it is!”

“Once our engineers saw how Coalesce automates the tedious parts of programming while still giving them full control—especially when it comes to creating our own node definitions—their perspective [about using a low-code, UI-driven solution] shifted. It was very cool to watch that transition from uncertainty to excitement.”

Paul Nogle
Lead Data Engineer, Redwood Logistics