21 hours of hell cost this vendor $20m
One vendor's trash is another vendor's treasure
I was driving a mile from my house.
I heard a strange, mechanical SCREECH sound interrupt the Spotify playlist. A sound that a car should not make. A very concerning sound.
Bad. Sound.
My car started veering ever so slightly to the left with my centre of gravity tilting like Jupiter, and then a low grumble joined in the chorus from my back left wheel, which built into a raucous crescendo, like a really shit Hans Zimmer soundtrack. And then - BANG.
There was no four-dimensional celestial realisation brought together by 30 musicians. Just a blown-out tire.
And £350 flying out of my bank account.
If you drive, you may have experienced:
A spluttering engine?
Random warning lights?
A strange sound from the exhaust?
A breakdown in an extremely inconvenient location?
Why am I talking about this? Well…
Buying and maintaining a car has similarities to vendors & client partnerships:
| Buying a car | Partnerships |
| --- | --- |
| Browsing → | A buyer researching reputable vendors via market reports. |
| Test drive → | Running a small pilot with a vendor to build confidence. |
| Getting insurance → | A documented service outage plan. |
| Signing contract → | Onboarding a new vendor. |
| Servicing → | Ongoing upgrades to the service the vendor provides. |
| Accidents & repair cover → | Actioning the disaster recovery plan (for the blown tire). |
History time!
In 2010, Navitaire, an IT outsourcing vendor, failed on 2 counts: insurance and repair cover.
The flight booking system they provided to Virgin Blue Airlines in Australia went down:
116 flights cancelled
11 days of disruption
$20m revenue loss
I'll now deep dive into this, so you can avoid the same tragic mistakes and costly bills, and build ultra-strong partnerships with your buyers.
11 days of disruptions, 116 cancelled flights, and a big fat $20m settlement
What happened?
On September 26, 2010, a failure in Navitaire’s reservation system led to a 21-hour outage, impacting Virgin Blue's operations. The airline was forced to revert to manual check-in processes, resulting in flight cancellations and delays for jet setters.
Where did it happen?
Australia. 🐨
Why did it happen?
Entering technical jargon mode -
It was caused by a failure in the solid-state disk server infrastructure (?!) used by Navitaire to host Virgin Blue's reservation system, the system pivotal to flight bookings.
This disrupted Virgin Blue's internet booking, reservations, check-in, and boarding systems. The lack of an immediate failover system led to the cancellation of at least 116 flights and delays for thousands of passengers.
(English translation further below!)
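If you'd like the idea in something concrete, here's a minimal sketch (Python, with made-up endpoint URLs; nothing here reflects Navitaire's actual architecture) of what an "immediate failover" check could look like: if the primary booking system stops answering its health check, traffic moves to a standby instead of staff falling back to manual check-in.

```python
import time
import urllib.request

# Hypothetical endpoints for illustration only - not anyone's real setup.
PRIMARY = "https://reservations-primary.example.com/health"
STANDBY = "https://reservations-standby.example.com/health"

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers its health check in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

def choose_active_endpoint(max_failures: int = 3, interval: float = 5.0) -> str:
    """Probe the primary; after repeated failures, fail over to the standby."""
    failures = 0
    while failures < max_failures:
        if is_healthy(PRIMARY):
            return PRIMARY
        failures += 1
        time.sleep(interval)  # back off before re-checking
    # Primary looks dead - route bookings to the standby instead of going manual.
    return STANDBY

if __name__ == "__main__":
    print("Active reservation endpoint:", choose_active_endpoint())
```

Real failover also needs replicated data and load balancers doing this automatically, but the principle is the same: decide in advance, and in code, what happens when the primary dies.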
Who was involved?
Client: Virgin Blue Airlines, an Australian airline.
Vendor: Navitaire, providing reservation and distribution systems to airlines.
What were the consequences?
Financial Impact: Virgin Blue estimated a pre-tax profit impact of between $15 million and $20 million due to the disruptions.
Settlement: In April 2011, Virgin Blue reached a "mutually satisfactory agreement" with Navitaire regarding the outages, with reports suggesting a settlement amount of up to $20 million.
Reputational Damage: The incident highlighted vulnerabilities in Navitaire's systems, potentially affecting its standing with current and prospective clients.
No but really, why did this happen?
Let’s brainstorm by asking WHY, 5 times:
Why did this incident occur?
The repair of the system took (a lot) longer than anticipated
This caused the booking system used by Virgin Blue to fail, resulting in the flight cancellation balls-up
Why did the repair take longer than expected?
A lack of proper disaster scenario planning to test repair times and the extra teams needed to fix failures
Why was there a lack of planning?
A few theories:
The vendor was unaware a system outage could happen (unlikely)
They lacked expertise to run disaster testing (unlikely)
They lacked resources for disaster testing (slightly more likely!)
The vendor did plan for system outages, documented and tested it, but didn't account for this specific scenario (more likely!!)
The client and vendor prioritised 'shiny toys' in their partnership (ROI, timelines, scaling) over planning for a seemingly unlikely outage (likely!!!)
Why did they not account for this scenario? Why did they not prioritise testing more thoroughly?
They had not experienced it before. ‘You don’t know what you don’t know’
An outage of this nature seemed so unlikely, they decided not to spend significant time and money testing & preparing for it
The perceived minimal chance of experiencing this outage was far outweighed by deploying fast, scaling, and handling increasing passenger demand, using tech provided by the vendor
This leaves us with 2 final questions..
#1: Why did they not see the huge downside of leaving this scenario unaddressed versus the benefit of hitting immediate goals?
Humans are hilariously biased when it comes to assessing risk vs short-term gains - a phenomenon psychologists refer to as present bias. Immediate rewards just feel more tangible and emotionally gratifying than addressing abstract risks.
Like a recurring warning light on your car's dash.
Ohhh, it’ll sort itself out :)
#2: Why did they not consult with experts who could advise of all possible outage scenarios and required plans of action?
Humans are trusting, especially when working with perceived authorities or experts - aka authority bias.
They probably believed the vendor had it all covered. They are the experts, after all?
One vendor's trash is another vendor's treasure
Big scary public service failures cost more than money.
Reputations. Disruption. Alienation.
But you can look at the past to skyrocket your team’s future.
Vendors who proactively adopt the below can stand out to enterprise decision makers:
Clear & honest mindset on worst case scenarios.
Putting value on a neutral opinion or consultation about the risks your services could pose to clients.
Proper, clear documentation so that in the event of service failure, your company and the client know exactly what will happen, when it will happen, and by whom.
How does this look in real life? Let’s test it in the..
🔥 Practice Corner! 🔥
You’re in a discovery call…
You make the topic of service failure planning a core topic
You ask the prospect what their internal process is for these events
You get an intro to their Head of Risk and Regulatory Compliance
You’re in a scoping call…
You bring an expert from your team to talk through information security & control, and welcome additional questions
You don't just present an arbitrary framework on a slide. You actively talk through times it has been used in the past, and make clear you want to build this with them
You’re preparing documentation.
People forget stuff - you know this, they know this. So you solve it.
You send a first draft of a contingency planning document with roles & responsibilities (R&Rs) and a clear plan of action - you want feedback from their internal risk leaders
Now let's apply this to your outsourcing business.
#1: Talent Staffing leaders who plan for staff shortages..
Backup Talent Pipelines: Build pre-vetted databases for quick hiring.
Cross-Training: Prepare teams to handle multiple roles in emergencies.
Tech Integration: Use AI tools to predict gaps and streamline recruitment.
#2: Business Process Outsourcing leaders who prioritise data security..
Verification Systems: Automate double-checking for data transfers (see the sketch after this list).
Access Control: Restrict sensitive data access based on roles.
Staff Training: Educate teams on secure data handling practices.
Incident Response: Have clear plans for data breaches and quick recovery.
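To make the "verification systems" point concrete, here's a minimal sketch (Python; the file names are purely hypothetical) of automated double-checking for a data transfer: hash the source and the received copy, and refuse to process anything that doesn't match.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large transfers don't blow up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_transfer(source: Path, received: Path) -> bool:
    """Double-check a data transfer: the received copy must match the source byte for byte."""
    return sha256_of(source) == sha256_of(received)

if __name__ == "__main__":
    # Hypothetical file names for illustration.
    ok = verify_transfer(Path("export_from_client.csv"), Path("copy_loaded_into_bpo_system.csv"))
    print("Transfer verified" if ok else "MISMATCH - stop and investigate before processing")
```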
#3: System Outsourcing leaders who want to prepare for all possibilities
Disaster Testing: Run quarterly mock outages to identify risks (toy drill sketch after this list).
Backup Systems: Maintain redundancy to ensure uptime during failures.
Real-Time Monitoring: Use tools to detect and resolve issues instantly.
Proactive Communication: Notify clients of risks or maintenance in advance.
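And to ground the "disaster testing" point, here's a toy sketch (Python; the component names and the recovery routine are placeholders, not anyone's real runbook) of a quarterly mock-outage drill that times recovery against the target you've documented for the client.

```python
import time

# A toy "game day" drill script, not a production chaos-testing tool.
RECOVERY_TARGET_SECONDS = 30 * 60  # the repair time promised in the documented outage plan

def recover(component: str) -> None:
    """Stand-in for the real recovery steps: restoring a backup, failing over,
    re-running smoke tests. Replace with the actual runbook for each component."""
    time.sleep(1)  # placeholder so the script runs end to end

def run_drill(component: str) -> bool:
    """Time how long recovery takes and compare it against the documented target."""
    start = time.monotonic()
    recover(component)
    elapsed = time.monotonic() - start
    passed = elapsed <= RECOVERY_TARGET_SECONDS
    print(f"[drill] {component}: recovered in {elapsed:.0f}s "
          f"(target {RECOVERY_TARGET_SECONDS}s) -> {'PASS' if passed else 'FAIL - fix the plan'}")
    return passed

if __name__ == "__main__":
    results = [run_drill(c) for c in ("reservation-db", "check-in-api", "boarding-system")]
    print("Drill complete:", "all within target" if all(results) else "some components missed target")
```

The point isn't the script itself; it's that repair times get measured on a calm Tuesday, not discovered during a 21-hour outage.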
I’ll leave you with this -
The #1 question I ask myself before working with people: are we super clear on what an excellent, secure, and future-proofed partnership looks like?
That’s me done. Enjoy your Tuesday.
Want to brainstorm some more? Book a call!
Oh, also - check out the sponsor of today’s newsletter 👇🏻
Learn AI in 5 minutes a day
This is the easiest way for a busy person wanting to learn AI in as little time as possible:
Sign up for The Rundown AI newsletter
They send you 5-minute email updates on the latest AI news and how to use it
You learn how to become 2x more productive by leveraging AI