Case Study: Forensics and Rightsizing Combined

76% faster builds, overnight turnaround restored

Scope note

This case concerns a single pipeline at one client, a large monolithic compliance build typical of regulated enterprise environments. The headline numbers are for that one pipeline, not a fleet average. Results on this scale are not typical; most pipelines do not start at 67 hours. I include this case because it shows how the two optimizations compound when applied in sequence on the same target.

Starting point

The workload originated at one company and was being migrated into another company's cloud infrastructure. The two organizations had different network topologies, different naming conventions, and different operational baselines. On the originating company's on-premise hardware the compliance-critical build had run in 35 to 42 hours. After the handover into the receiving company's Azure environment, the same build took 67 hours. The regression was reproducible across runs. The pipeline was called "nightly" but no longer finished within a night, so it had to be rescheduled to run every second day to prevent consecutive runs from overlapping.

Step one: Forensics

I instrumented the longest pipeline phase and correlated its timings against network, name-resolution, and infrastructure metrics. The build agents themselves were healthy. I traced the added wall-clock time to operations that performed repeated outbound lookups against endpoints that had not been re-homed cleanly during the cross-company migration.
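As an illustration of this kind of instrumentation, here is a minimal sketch of per-phase timing in Python. It is not the tooling used in this engagement; the names `timed_phase` and `slowest_phase` and the phase labels are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_phase(name, timings):
    """Accumulate the wall-clock seconds spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the phase raises.
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def slowest_phase(timings):
    """Return the name of the phase with the largest accumulated duration."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    timings = {}
    with timed_phase("fetch-dependencies", timings):  # hypothetical phase name
        time.sleep(0.01)
    with timed_phase("compile", timings):  # hypothetical phase name
        time.sleep(0.05)
    print(slowest_phase(timings))
```

Timestamps collected this way can then be lined up against network and name-resolution metrics for the same time window.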

The root cause was a DNS misconfiguration on the receiving company's side that did not exist in the originating company's environment. Each affected lookup incurred a timeout-and-retry cycle, so the delay scaled with the number of lookups per build stage and accumulated across the long monolithic build.
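The effect can be sketched in two parts: timing a single name resolution, and estimating how per-lookup delays add up over a build. This is a rough model under stated assumptions, not the diagnostic procedure from this case. The function names, the default port, and every input value are illustrative.

```python
import socket
import time


def time_lookup(hostname, port=443):
    """Time one name resolution; return (elapsed_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, port)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.perf_counter() - start, ok


def added_delay_hours(lookups_per_build, timeout_s, timeouts_per_lookup):
    """Hours lost if every lookup waits out `timeouts_per_lookup` timeouts.

    Assumes lookups run sequentially and all hit the faulty path, which is
    a simplification.
    """
    return lookups_per_build * timeout_s * timeouts_per_lookup / 3600


if __name__ == "__main__":
    elapsed, ok = time_lookup("localhost")
    print(f"localhost resolved={ok} in {elapsed:.4f}s")
    # Illustrative inputs only, not measurements from this pipeline.
    print(added_delay_hours(3600, 5, 2))
```

With these purely illustrative inputs, 3,600 lookups that each wait out two 5-second timeouts would add 10 hours to a build.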

After the DNS correction, pipeline duration dropped from 67 hours to 29 hours, already faster than the original on-premise baseline at the originating company.

Step two: Rightsizing

With the build now healthy, I profiled resource utilization across the remaining 29 hours. The workload was heavily single-threaded and bound by clock speed, not core count. The provisioned hardware was oversized on cores and under-optimized on frequency.
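One way to spot a single-threaded workload is to average per-core utilization samples into a count of "busy core equivalents". The sketch below assumes the samples have already been collected, for example from a tool such as `mpstat` or a monitoring agent. The function names and the 1.5-core threshold are assumptions for illustration, not values from this engagement.

```python
def busy_core_equivalents(samples):
    """Average number of fully busy cores across all samples.

    Each sample is a list of per-core utilization percentages (0 to 100).
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(sum(sample) / 100.0 for sample in samples) / len(samples)


def looks_single_threaded(samples, threshold=1.5):
    """Heuristic: the workload rarely keeps more than about one core busy."""
    return busy_core_equivalents(samples) <= threshold


if __name__ == "__main__":
    # Illustrative samples from a 4-core machine: one core saturated at a
    # time, with the scheduler moving the work between cores.
    samples = [[100, 0, 0, 0], [0, 100, 0, 0], [0, 0, 100, 0]]
    print(busy_core_equivalents(samples))
    print(looks_single_threaded(samples))
```

A result near 1.0 on a machine with many cores suggests that adding cores will not help and that per-core speed is the constraint.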

I changed the hardware profile to one with fewer cores but a higher clock speed, matching the shape of the workload. This is the same rightsizing playbook described in the separate rightsizing case study, reapplied here to the now-healthy pipeline.

Pipeline duration dropped from 29 hours to 16 hours.

Result

End-to-end: 67 hours to 16 hours, 76% less wall-clock time overall (roughly a 4.2x speedup)
Forensics alone: 67 hours to 29 hours, a 57% reduction
Rightsizing on top: 29 hours to 16 hours, a further 45% reduction
Compared against the pre-migration on-premise baseline of 35 to 42 hours at the originating company: 54% to 62% shorter than before the cloud move
Cloud cost per run: reduced both by the shorter runtime and by a lower hourly hardware rate after the rightsizing change
Scope: one pipeline, one client, two optimization stages applied sequentially
Sequence: root-cause investigation first, hardware rightsizing on the now-healthy build afterwards
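The percentages above are reductions in build duration, and the two stages compound multiplicatively rather than additively. A short sketch of the arithmetic, using only the durations reported in this case (the helper names are mine):

```python
def reduction_pct(before_h, after_h):
    """Percentage reduction in duration going from before_h to after_h."""
    return (before_h - after_h) / before_h * 100


def speedup(before_h, after_h):
    """How many times faster the run is after the change."""
    return before_h / after_h


if __name__ == "__main__":
    print(round(reduction_pct(67, 29)))  # forensics alone
    print(round(reduction_pct(29, 16)))  # rightsizing on top
    print(round(reduction_pct(67, 16)))  # end-to-end
    print(round(speedup(67, 16), 1))     # end-to-end speedup factor

    # Compounding: the remaining fractions multiply, 29/67 * 16/29 = 16/67.
    combined = (1 - (29 / 67) * (16 / 29)) * 100
    print(round(combined))

    # Against the pre-migration baseline range of 35 to 42 hours.
    print(round(reduction_pct(35, 16)), round(reduction_pct(42, 16)))
```

This is why a 57% reduction followed by a 45% reduction yields 76% overall and not 102%: the second stage only acts on the 29 hours that remained after the first.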

Ready to find out what is slowing you down?

Book a 30-minute diagnostic call. No pitch, just questions about your setup and an honest assessment of whether I can help.
