Hackday XIX: Ship Happens! Winning Project - GitLuigi
Every year at Caplin Systems, there’s a day (two days, technically) when the usual rhythm of tickets, releases, and sprints is paused. Instead, engineers, designers, product managers, and anyone else who wants to join, participate in one of the company’s most anticipated traditions: Hackday.
An Annual Caplin Tradition
Hackdays have been running for years now, and always have a theme, are always playful, and are always a little chaotic in the best way. Past editions have had iconic titles like “The Fast and the Furious” - focusing obviously on the speed of project delivery, builds and tests; “Bringing It All Together” - focusing on integration as Caplin products are pieces in a vast ecosystem; or “Smarter Caplin” - aiming to develop ways of anticipating changes in clients and technology or software.
This year’s theme was Hackday XIX: Ship Happens! - a decisive shift in perspective towards delivery, deployment, and every potential improvement to how we ship Caplin software to our clients. The ideas generated were ambitious given the timeline, but still very creative, ranging from turbocharging our CI pipelines to AI-powered debugging tools.
The premise is simple: 24 hours, solution freedom (within the theme topic), and teams experimenting, building, and breaking things in pursuit of the Hackday Glory!!!
But Hackday is about much more than technical experiments…
It’s about breaking out of your usual roles, from junior developers with well-scoped day-to-day work to seniors who usually carefully refine and architect our tools. Hackday flips all roles: suddenly, every engineer, QA, product owner or SLT member is at the same level, starting from scratch with nothing but an idea and a ticking clock. Hackday is not at all about following a Jira ticket or even a plan; it’s about asking, “What could we build that we wish existed to help our daily work?”
And that freedom is exactly what makes Caplin’s Hackday so powerful. On paper, there’s little direct incentive for a company to pause deadlines and dedicate two days to experiments that might never see production. But Caplin recognises the indirect value: the team building, the creativity, the sense of ownership, and yes, the sheer pleasure of passion-fuelled development. Even when projects don’t ship (pun intended), they still have value: they start conversations, prove concepts, and sometimes plant the seed that later grows into an internal tool.
Hackday isn’t the only way Caplin invests in this culture, either. Every month, we have a Dev Day, where engineers step away from sprint work to focus on process improvements, proof-of-concepts, or anything that could improve their skillset, their day-to-day experience, or Caplin as a whole. Mostly, these are small, solo projects, though sometimes they grow into collaborative experiments. Together, the small-scale Dev Days and annual Hackdays send a clear message:
Caplin trusts its engineers to do what we’re best at: solving problems and finding creative solutions.
Hackday XIX: Ship Happens!
This year’s Hackday was no exception. We had a wide variety of ideas: from Kubernetes deployment tools to AI release notes; from speeding up test pipelines to taming the Cypress-to-Playwright test suite migration. Each of the projects tackled real pain points with a sprinkle of fun. And of course, this culminated in a showcase of everything teams built in the 24 hours, complete with prizes and bragging rights (conditional on writing a blog post).

Team - Super Pipeline Plumbers
Our Hackday team consisted of three engineers with well-balanced talents. Lewis, our senior engineer, leads many of the CI improvements in his daily work and faces the struggles of diagnosing pipelines firsthand. Velina often engineers solutions for GitLab CI initiatives such as vulnerability scanning and has long felt the need for better diagnostic tools. And Akesh is an amazing fellow developer with a talent for untangling complex problems and driving change with persistence and enthusiasm.
Ordinarily, we are part of different teams, but Hackday gave us the chance to come together in solving a common problem with fresh perspectives and shared enthusiasm. Right from the start, the energy was high. We initially decided that we would be reasonable and limit ourselves to working hours… but soon enough we got hooked and threw ourselves into the project completely. We ended up treating the event as a proper hackathon - with long hours, late-night coding, plenty of coffee and snacks to boot.
Identifying a Problem Niche
When choosing our problem space, we didn’t have to look far. Working with our extensive test suite, one of the constant challenges is dealing with flaky tests, failed jobs and pipelines in GitLab. Every night, scheduled pipelines run comprehensive tests against the latest master branch. For engineers, each morning starts with checking the build-problems channel for the latest status update. At Caplin, the development teams take turns in the BuildCop role, turning any red failures to green and investigating some core questions: why did the jobs fail, is there a flake we need to resolve, was it another resource dependency, and how do we remedy the causes?
GitLab gives us a pipeline-level view, which is useful, but a little limited. What it doesn’t provide is a way to aggregate data across pipelines at the job level. That means if you want to track the behaviour of a particular job, for example a test that fails at random, you can’t easily see how often it fails, when it started failing, or how its performance compares to other runs of the same job over time. Instead, you have to dig through individual pipelines one by one, which is both time-consuming and frustrating. Depending on the running conditions of a certain job, one might struggle to even find other pipelines that contain it.
An Aggregate Pipeline
To fill that gap, our idea started as an aggregated heat map view of jobs across multiple pipelines. Instead of thinking about pipelines as isolated job runs, we would combine many pipelines over a pre-selected time period and flatten these into a view containing just distinct job types with performance details.
How does this work?
An aggregated pipeline should contain all occurrences of jobs across all the selected pipelines.
So if we have Pipelines A and B (in Figure 1) with a selection of test, build and deploy jobs, the aggregate pipeline should contain one unique entry per job type by aggregating multiple instances’ metrics into one. As an example, the average duration of the deploy job type is 3 minutes across two runs with durations 1 and 5 minutes respectively. Other metrics would include run frequency and various performance indicators.
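To make the Figure 1 arithmetic concrete, here is a minimal TypeScript sketch of that flattening step. The `JobRun` and `JobTypeSummary` shapes and the `aggregateByJobType` name are hypothetical, chosen for illustration rather than taken from GitLuigi’s code. Only the two deploy durations come from the example above; the build duration is made up.

```typescript
// Hypothetical, simplified shape of a single job run inside a pipeline.
interface JobRun {
  pipeline: string; // e.g. "A" or "B"
  name: string; // job type, e.g. "deploy"
  durationMinutes: number;
}

// One entry per job type in the aggregate pipeline.
interface JobTypeSummary {
  name: string;
  runs: number; // how many times this job type ran across the pipelines
  meanDurationMinutes: number;
}

// Collapse runs from many pipelines into one summary per job type.
function aggregateByJobType(jobRuns: JobRun[]): JobTypeSummary[] {
  const totals = new Map<string, { runs: number; totalMinutes: number }>();
  for (const run of jobRuns) {
    const entry = totals.get(run.name) ?? { runs: 0, totalMinutes: 0 };
    entry.runs += 1;
    entry.totalMinutes += run.durationMinutes;
    totals.set(run.name, entry);
  }
  const summaries: JobTypeSummary[] = [];
  totals.forEach((entry, name) => {
    summaries.push({
      name,
      runs: entry.runs,
      meanDurationMinutes: entry.totalMinutes / entry.runs,
    });
  });
  return summaries;
}

// Deploy durations (1 and 5 minutes) are from the text; the build run is invented.
const exampleRuns: JobRun[] = [
  { pipeline: "A", name: "build", durationMinutes: 2 },
  { pipeline: "A", name: "deploy", durationMinutes: 1 },
  { pipeline: "B", name: "deploy", durationMinutes: 5 },
];
const exampleSummary = aggregateByJobType(exampleRuns);
```

For the two deploy runs, the summary reports 2 runs with a mean duration of 3 minutes, matching the worked example.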
From there, we planned to apply a heat map colour scheme to make patterns instantly visible. Metrics like success rate, mean and standard deviation of duration or flakiness score could be represented as colour intensities, so BuildCop engineers could immediately spot problem areas without digging into endless logs.
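As a rough sketch of how those metrics could be computed for one job type, the snippet below derives success rate, mean duration and standard deviation from a list of runs. The `RunOutcome` shape and the `computeMetrics` name are assumptions for illustration, the status is narrowed to two values for brevity (GitLab has more), and the population form of the standard deviation is a choice made here, not something the project specifies.

```typescript
// Hypothetical per-run record for a single job type.
interface RunOutcome {
  status: "success" | "failed"; // simplified; GitLab also has canceled, skipped, etc.
  durationSeconds: number;
}

interface JobMetrics {
  successRate: number; // fraction between 0 and 1
  meanDuration: number; // seconds
  stdDevDuration: number; // population standard deviation, seconds
}

function computeMetrics(runs: RunOutcome[]): JobMetrics {
  if (runs.length === 0) {
    throw new Error("computeMetrics needs at least one run");
  }
  const n = runs.length;
  const successes = runs.filter((r) => r.status === "success").length;
  const mean = runs.reduce((sum, r) => sum + r.durationSeconds, 0) / n;
  // Average squared distance from the mean (population variance).
  const variance =
    runs.reduce((sum, r) => sum + (r.durationSeconds - mean) ** 2, 0) / n;
  return {
    successRate: successes / n,
    meanDuration: mean,
    stdDevDuration: Math.sqrt(variance),
  };
}

// Made-up runs: three passes and one failure.
const sampleMetrics = computeMetrics([
  { status: "success", durationSeconds: 60 },
  { status: "failed", durationSeconds: 120 },
  { status: "success", durationSeconds: 60 },
  { status: "success", durationSeconds: 120 },
]);
```

With these made-up runs the success rate is 0.75, the mean duration is 90 seconds and the standard deviation is 30 seconds. A large standard deviation relative to the mean is one possible hint that a job behaves inconsistently.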
To make sure we were all aligned, we sketched some rough designs up-front, mapping out what the aggregated dashboard might look like and how interactions between pages or back to GitLab would flow. This gave us time to iterate over our vision and anticipate flaws early on, even if the final version evolved into something wildly different as we hacked away.
Introducing GitLuigi
Our name was inspired by the infamous, taller Mario brother, whose pivotal role in fixing green pipelines cannot be overstated. And our goal, in short, was simple but impactful: turn the noisy, pipeline-centric GitLab data into a clear, aggregated, job-focused view that engineers can actually use to debug, track, and prioritise test improvements. We focused our efforts on three key features:
- Aggregated Pipelines - Instead of showing one pipeline at a time, GitLuigi pulls data from the last 30 runs of the master branch and merges them into a single, unified entry. This scope gave us a manageable dataset while still revealing meaningful patterns over time for a PoC.
- Heat Map Visualisation - Jobs are displayed per stage, colour-coded by one selected metric, such as success rate, mean duration, or standard deviation of duration. The colour scheme provides an immediate visual signal of where problems exist, while all three metrics’ values are always visible on the job type cards as well. To keep things consistent, we added a legend that splits values into top and bottom third bands, so you can quickly tell which jobs are “hotspots.”
- Job-Type Statistics Pages - Clicking on a job type card brings you to a dedicated page showing its aggregated statistics across multiple pipelines. Better yet, we included a list of the job’s historic occurrences across all pipelines. This was the missing view we had been craving in GitLab, and it turned out to be one of the most powerful aspects of the tool.
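The legend’s banding can be sketched in a few lines. This version assumes the bands are equal thirds of the value range between the smallest and largest value of the selected metric; splitting the ranked jobs into thirds would be an equally plausible reading. The `Band` type and the function names are hypothetical.

```typescript
type Band = "low" | "middle" | "high";

// Place one value into the bottom, middle or top third of the [min, max] range.
function bandFor(value: number, min: number, max: number): Band {
  if (max === min) return "middle"; // all values equal, nothing to highlight
  const position = (value - min) / (max - min); // 0 at min, 1 at max
  if (position < 1 / 3) return "low";
  if (position > 2 / 3) return "high";
  return "middle";
}

// Band every job's value for the currently selected metric.
function bandAll(values: number[]): Band[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => bandFor(v, min, max));
}

// Made-up mean durations (seconds) for five job types.
const exampleBands = bandAll([0, 50, 100, 20, 90]);
```

Which band counts as a hotspot depends on the metric: for duration the top third is the worrying one, while for success rate it is the bottom third.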

Beyond the feature set, we wanted to make sure the app was intuitive and usable, and thus incorporated the following aspects:
- Navigation Breadcrumbs - making sure users can always orient themselves, seeing the bigger picture after diving deep
- GitLab Coded - mirroring GitLab’s naming and URL format and directly linking each job source for seamless integration
- Stage Groupings - easy navigation, and logical job order to contextualise while surfacing new insights
- Badges - displaying metrics and the number of job instances aggregated on each job type card
- Dark theme - satisfying most developers’ theme preference, with a dark background with bold colour-coded cards
Our tech stack and implementation were straightforward: we built GitLuigi using Next.js, taking advantage of its easy spin-up, speed and flexibility. We integrated directly with the GitLab API (/pipelines and /jobs) to pull the data we needed, and implemented a simple caching layer to keep client responses fast while still working with the latest data.
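For readers who want to try something similar, here is a hedged sketch of the two pieces described above: building the GitLab REST URLs and wrapping a fetcher in a small in-memory cache. The URL paths follow GitLab’s v4 API (`/projects/:id/pipelines` and `/projects/:id/pipelines/:pipeline_id/jobs`), but the use of the pipeline-level jobs endpoint, the time-to-live strategy, and every function name here are assumptions, not a description of GitLuigi’s actual code.

```typescript
// Any async function that turns a URL into parsed JSON.
// In a real app this would wrap fetch() and send an access token header.
type JsonFetcher = (url: string) => Promise<unknown>;

// URL for the most recent pipelines on a branch.
function pipelinesUrl(baseUrl: string, projectId: number, ref: string, count: number): string {
  return `${baseUrl}/api/v4/projects/${projectId}/pipelines?ref=${encodeURIComponent(ref)}&per_page=${count}`;
}

// URL for the jobs that belong to one pipeline.
function pipelineJobsUrl(baseUrl: string, projectId: number, pipelineId: number): string {
  return `${baseUrl}/api/v4/projects/${projectId}/pipelines/${pipelineId}/jobs`;
}

// Wrap a fetcher so repeated requests for the same URL within ttlMs reuse the first result.
function withTtlCache(
  fetcher: JsonFetcher,
  ttlMs: number,
  now: () => number = Date.now,
): JsonFetcher {
  const cache = new Map<string, { expiresAt: number; value: Promise<unknown> }>();
  return (url) => {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > now()) return hit.value;
    const value = fetcher(url);
    cache.set(url, { expiresAt: now() + ttlMs, value });
    // Drop failed requests so the next call retries instead of reusing the error.
    value.catch(() => {
      if (cache.get(url)?.value === value) cache.delete(url);
    });
    return value;
  };
}

// Example URL for the last 30 master pipelines of a hypothetical project 42.
const examplePipelinesUrl = pipelinesUrl("https://gitlab.example.com", 42, "master", 30);
```

Caching the promise rather than the resolved value means concurrent requests for the same URL share a single call to GitLab. The trade-off of a time-to-live cache is that data can be stale for up to `ttlMs`.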
For the core functionality of the tool, we wrote our own metrics and job type aggregation logic, reshaping the raw API responses, arranged by pipelines, into meaningful structures which were instead arranged by job type with their corresponding statistics.
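The reshaping step might look roughly like the sketch below. The `ApiJob` fields (`name`, `stage`, `status`, `duration`, `web_url`) are a subset of what GitLab returns for a job. The `PipelineWithJobs` and `JobTypeEntry` shapes and the `regroupByJobType` function are hypothetical, and treating the stage plus job name as the identity of a job type is an assumption.

```typescript
// Subset of the fields GitLab returns for a job.
interface ApiJob {
  name: string;
  stage: string;
  status: string;
  duration: number | null; // seconds; null when the job never ran
  web_url: string;
}

// Input: data arranged by pipeline (hypothetical wrapper shape).
interface PipelineWithJobs {
  pipelineId: number;
  jobs: ApiJob[];
}

// Output: data arranged by job type, keeping every historic occurrence.
interface JobTypeEntry {
  name: string;
  stage: string;
  occurrences: { pipelineId: number; status: string; duration: number | null; webUrl: string }[];
}

function regroupByJobType(pipelines: PipelineWithJobs[]): JobTypeEntry[] {
  const byKey = new Map<string, JobTypeEntry>();
  for (const pipeline of pipelines) {
    for (const job of pipeline.jobs) {
      // Stage plus name identifies a job type; JSON keeps the key unambiguous.
      const key = JSON.stringify([job.stage, job.name]);
      let entry = byKey.get(key);
      if (!entry) {
        entry = { name: job.name, stage: job.stage, occurrences: [] };
        byKey.set(key, entry);
      }
      entry.occurrences.push({
        pipelineId: pipeline.pipelineId,
        status: job.status,
        duration: job.duration,
        webUrl: job.web_url,
      });
    }
  }
  return Array.from(byKey.values());
}

// Made-up input: the same test job appears in two pipelines.
const regrouped = regroupByJobType([
  {
    pipelineId: 101,
    jobs: [
      { name: "unit-tests", stage: "test", status: "success", duration: 120, web_url: "https://gitlab.example.com/jobs/1" },
      { name: "deploy", stage: "deploy", status: "success", duration: 60, web_url: "https://gitlab.example.com/jobs/2" },
    ],
  },
  {
    pipelineId: 102,
    jobs: [
      { name: "unit-tests", stage: "test", status: "failed", duration: 95, web_url: "https://gitlab.example.com/jobs/3" },
    ],
  },
]);
```

Keeping the per-occurrence `webUrl` is what allows a job-type page to link each historic run straight back to GitLab.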

Our biggest challenge was, unsurprisingly, time. With less than 24 hours, some of which was inevitably lost to food, commute, and the small matter of sleep, we had to cut corners, simplify our planned architecture, and take shortcuts we’d never allow in production. Features like the flakiness score or job log AI summary stayed as stretch goals, while we focused on delivering the essentials.
In the end, we had produced a fully functioning prototype for exploring metrics over an aggregated pipeline and a job-type view that GitLab simply doesn’t offer.
Stretch Goals or What’s Next?
Even though we managed to get GitLuigi working for the demo, we knew we were only scratching the surface of what it could become. Most of our time went into building the Heat Map home page, while the Job Type and Job Instance pages ended up as simple proof-of-concepts. Naturally, these would be the first features we’d want to improve and expand.
For example, here’s the barebones Corporate-Version Job Type page that leaves something to be desired.

Some of the ideas we didn’t manage to implement, which are now future plans, include:
- AI summaries of job logs using Gemini to get quick, natural-language explanations of steps and failures
- Live updates for GitLab data
- Filtering which pipelines to aggregate
- Caching for faster and more scalable persistence (replacing our simple cache)
- UI improvements to make the dashboards more intuitive
- More statistical metrics
Our shortest-term priority was deploying GitLuigi in a stable state to make it a useful internal tool. Riding on the adrenaline of the Hackday, Akesh actually got a minimal deployment functioning right after the Hackday demos.
What’s next? Our next priority is actually not more features, but a complete architectural refactor. The Hackday prototype was built quickly, with plenty of shortcuts, and now we want to revisit it with maintainability, scalability, and performance in mind. That means cleaning up the rough edges, rethinking our data flow, and making proper use of Next.js’ server-client rendering model. In short, we want to replace quick hacks with best practices so the tool truly has a solid foundation to grow on.
Demos and Awards Ceremony
Fast-forward to 12pm on the final day: the coding had just concluded and teams were in the midst of preparing their demos. Each team had only ten minutes for presenting and questions. The energy was electric, the creativity on display was inspiring and the senior leadership team took on the difficult role of judging each innovative entry.
The teams had focused on different parts of the release process. To name just a few, projects included a vibe-coded automation tool for upgrading environment image versions, a Playwright-based generator for documentation screenshots, and an exploration of cloud-native GitLab runners to parallelise test suites. The performance improvements illustrated were remarkable even as proof-of-concepts.
Two prizes were awarded: Overall Winner, and Community Award by company vote. The entries were judged on relevance to the theme, the real-world problem being solved, and how much was achieved in just 24 hours. Standing out and winning the Community Award was a one-woman-army colleague who built an in-house LLM tailored specifically to trading FX and Commodities within our application. As for the Overall Winner, we were beyond thrilled that GitLuigi impressed the judges and was awarded this prize - especially exciting because it was the very first Caplin Hackday for two of us.
Thank you all again!