Instead of Dreading a System Crash, Schedule One and Learn to Avoid Them

The best defense against outages is to rehearse for the worst and accept real incidents as an opportunity to improve.

By Nisha Ahluwalia

Opinions expressed by Entrepreneur contributors are their own.

According to a survey by CA Technologies, companies in North America and Europe lost more than $26.5 billion in revenue due to downtime, and that figure dates back to 2010.

There are various ways to calculate the monetary cost of system outages, but the damage to a company's reputation is immeasurable. When Microsoft's Azure cloud-computing service experienced a major outage recently, experts speculated that it could be a major blow to the software giant's attempt to compete against rivals Google and Amazon.

Good CEOs and CIOs refuse to accept excuses for even small levels of downtime, but it's not easy to hit five nines of reliability (99.999 percent availability). Nonetheless, no matter how complex a company's systems and business, there are always ways to engineer and deliver higher reliability and quality of service. Below are the actions that CEOs need to take to boost their company's reliability:

1. Stop waiting for an outage. Create one.

If you wait for a customer to do something that causes a failure, you're too late. Netflix, for example, has tackled unexpected outages using its "Simian Army," a set of automated tools that test applications for failure resilience. For most companies, however, the best approach is to keep it simple.

Encourage your ops and dev teams to schedule a recurring meeting and create outages manually. Injecting failure reveals implementation issues that reduce resiliency while proactively uncovering deficiencies that would otherwise be the root cause of an outage.

Scheduled outages build a strong collaborative culture simply by bringing teams together on a regular basis. Working together to fix artificial failures will combat the idea that an actual failure can be ignored or justified with explanations.
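A manual failure drill of the kind described above can be rehearsed in miniature. The sketch below is an illustration only, not Netflix's tooling or any particular product: the names `FlakyDependency`, `fetch_recommendations` and `homepage` are hypothetical stand-ins. It shows a dependency that can be switched into an "outage" on demand, so a team can check whether the calling code degrades gracefully.

```python
class FlakyDependency:
    """Wraps a callable and fails on demand to simulate an outage (hypothetical helper)."""

    def __init__(self, func):
        self._func = func
        self.outage = False  # flip to True during a scheduled drill

    def __call__(self, *args, **kwargs):
        if self.outage:
            # Simulate the dependency being unreachable.
            raise ConnectionError("injected failure (drill)")
        return self._func(*args, **kwargs)


def fetch_recommendations(user_id):
    # Hypothetical stand-in for a real downstream service call.
    return [f"item-{user_id}-1", f"item-{user_id}-2"]


def homepage(user_id, recommender):
    """Build the page content, falling back to a static list if the recommender is down."""
    try:
        return recommender(user_id)
    except ConnectionError:
        return ["bestseller-1", "bestseller-2"]


recommender = FlakyDependency(fetch_recommendations)

if __name__ == "__main__":
    print(homepage(7, recommender))   # normal operation
    recommender.outage = True         # start the drill
    print(homepage(7, recommender))   # the fallback path is exercised
```

If the second call raised an error instead of returning the fallback list, the drill would have uncovered a resiliency gap before a customer did.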

2. Create (and protect) time for learning

No good engineer fixes the same problems without learning in the process. Make sure the teams responsible for resolving incidents have time to work through comprehensive postmortems.

Empower your teams to analyze what worked and what didn't, without forcing them to determine a root cause. All too often, human error is the focus of these conversations, but that just isn't healthy. Blameless retrospectives allow teams to uncover the real issues and make proactive adjustments.

Businesses want to move fast, but resist the temptation to move on to other issues as soon as systems resume running or everyone agrees on a "root cause." Invest the time needed to understand how your systems and teams work. See it as an opportunity for the contextual learning needed to make real-time decisions that will improve your company's mean-time-to-resolution.
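Mean-time-to-resolution is simple to track once incidents are recorded with an open and a resolve time. The sketch below assumes each incident is an (opened, resolved) pair of timestamps; the sample data and the function name `mean_time_to_resolution` are invented for illustration and do not come from any specific tool.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - opened) across a list of (opened, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: three incidents lasting 45, 90 and 15 minutes.
SAMPLE_INCIDENTS = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 45)),
    (datetime(2024, 1, 12, 10, 0), datetime(2024, 1, 12, 11, 30)),
    (datetime(2024, 1, 20, 14, 0), datetime(2024, 1, 20, 14, 15)),
]

if __name__ == "__main__":
    print(mean_time_to_resolution(SAMPLE_INCIDENTS))  # 0:50:00
```

Comparing this number before and after a series of postmortems is one way to see whether the learning time is paying off.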

3. Treat your ops and dev teams like sales and marketing. They drive revenue.

If you didn't support your sales teams with tools, training and incentives to hit their goals, people would think you were nuts. Despite their critical role in ensuring your customers are getting value from your company, ops and dev teams often get less attention than their customer-facing counterparts.

Give these employees the infrastructure and tools to achieve peak performance. That includes the latest operations management tools, time and resources for training, and goals with incentives to meet them. If you don't provide them with the necessary support and recognition, how can you expect them to deliver a high-value product with high availability?

4. Set a high bar for uptime

Even short periods of downtime have a material impact on your bottom line and market perception. Once you're committed to supporting your engineering teams, you're in a much better position to set a higher bar for uptime. Build, buy or partner to get the technology and skill sets you need.

Unfortunately, many companies still use homegrown operations management systems without redundancy, and still use disparate tools and manual processes to meander through the incident lifecycle. A focus on reducing ops team costs instead of setting the right culture from the start simply doesn't make sense. The time spent on fixes alone will quickly become a greater cost for your company. Your product and services will suffer as a result.
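To make an uptime target concrete, it helps to translate it into a yearly downtime budget. The sketch below assumes a 365-day year and treats "N nines" as an availability of 1 minus 10 to the power of minus N (so five nines is 99.999 percent); the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** -n)
        budget = downtime_budget_minutes(n)
        print(f"{n} nines ({availability:.3f}%): {budget:,.1f} minutes per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime for the entire year, which is why the target is hard to hit without deliberate investment.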

CEOs who understand the importance of reliability in today's always-on world don't wait until there's an outage to improve operations. They don't ignore the rich learning that comes from resolving incidents. They don't treat operations and development teams like the "back office." The CEOs of highly reliable companies invest in their operations infrastructure, processes and people because they care about the growth of their business and the loyalty of their customers.

Nisha Ahluwalia

Vice President of Marketing at PagerDuty

Nisha is vice president of marketing, responsible for all things marketing including generating demand, building the PagerDuty brand and our community activities. She comes to PagerDuty with strong software-as-a-service experience, having built and managed several marketing functions at RingCentral and Cisco WebEx. Before she got into marketing, Nisha earned her Bachelor of Science in computer science from San Jose State University.
