The Challenge of Identifying and Resolving Outages

You're reading Entrepreneur India, an international franchise of Entrepreneur Media.

Businesses well understand the importance of providing seamless customer journeys, but we've seen a growing spate of digital service outages and software performance problems in recent months. While some of these problems have been minor inconveniences, with the likes of online video streaming services or social media sites going down, others have caused far more serious concern. There have been online banking outages that have left customers unable to pay bills on time. Problems with major payment systems have left shoppers unable to use their bank cards at the checkout. Even the daily commute has been affected, with rail ticketing website outages leaving people unable to buy a ticket to travel. These problems all seriously disrupt people's ability to live their day-to-day lives, so they're becoming a growing concern for businesses and consumers alike. So, if businesses understand the importance of preventing these scenarios so clearly, why are they happening more often?

Converging complexity

The soaring complexity of technology ecosystems is the biggest contributor to the rise in service outages and software performance problems. Modern digital services reside in complex hybrid multi-cloud environments, spanning multiple platforms and technologies. They're powered by applications running in dynamic microservices and containers, creating constant change. A single web or mobile transaction now crosses an average of 35 different technology systems or components, compared to 22 just five years ago. With digital transactions crossing such a diversity of components in a dynamic technology stack, managing performance effectively has gone beyond human capability. IT teams struggle to maintain visibility into everything that's happening in their environment and to quickly find the root cause of any performance problems that arise. It's moved beyond finding a needle in a haystack, to finding a needle in a thousand haystacks in a hurricane.

Unfortunately, this trend is showing no signs of reversing or even slowing. Digital ecosystems are becoming even more complex, and IT teams are under more pressure than ever to quickly identify and resolve the root cause of any problems that arise, before customers feel any impact. If they fail to do so, the spate of digital performance problems and service outages that we've seen recently will only become more pronounced and occur more often. This will become increasingly critical with the advent of driverless cars and connected medical devices, which could wreak major damage if they're impacted by performance problems.

Overcoming the outage obstacles

There are a number of reasons why it's become impossible for businesses to manage the complexity of their digital ecosystems manually. Firstly, new technologies, infrastructure and platforms are constantly being layered onto IT stacks, requiring more monitoring tools to provide visibility and enable IT teams to manage performance. However, the digital ecosystems that have arisen around these IT stacks are also highly dynamic. Whilst this creates the agility that businesses need to thrive, it also makes it impossible for humans to stay on top of performance using traditional monitoring tools, which were built for static environments.

On top of this, these traditional monitoring tools are bombarding teams with alerts, most of which are just white noise. But working out what's white noise and what's important is time-consuming, and that is time most organisations simply don't have. Given that it's impossible for humans to overcome this challenge manually, organisations need to be able to automate as many IT operations processes as possible. They need the ability to automatically detect issues in real time and, most importantly, use AI to pinpoint the root cause with precision. These capabilities can also set organisations on the path to auto-remediation, so their monitoring system can detect problems and apply fixes to prevent or resolve the issue before it escalates into a full-blown outage. This, in turn, will take the pressure off IT teams, enabling them to focus on driving innovation rather than spending endless hours in war rooms to determine where a performance problem is stemming from.
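To make the idea of automatic detection concrete, here is a minimal sketch of one simple approach: flagging a response-time reading only when it deviates sharply from its own recent baseline, rather than alerting on every fluctuation. The function name, the window size, the threshold and the sample figures are all assumptions made for illustration. They do not describe any particular monitoring product, and real AI-driven root cause analysis goes well beyond this kind of statistical check.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of readings that deviate sharply from the recent baseline.

    Hypothetical helper: each reading is compared with the mean and standard
    deviation of the `window` readings that came before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change at all as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative response times in milliseconds (made-up data).
response_times_ms = [100, 102, 98, 101, 99, 100, 103, 450, 101, 100]
flagged = detect_anomalies(response_times_ms)
print(flagged)  # [7]: only the 450 ms spike is flagged
```

In this sketch the small wobbles around 100 ms are treated as white noise and only the spike is surfaced. An auto-remediation step would hang off that signal, for example by restarting a service or rolling back a release, which is the kind of action a monitoring system might apply before a problem escalates.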

No turning back

While moving to the cloud has made businesses far more agile, it's added exponential complexity to their digital ecosystems. This has had a huge impact on organisations' ability to successfully monitor performance and rectify any issues quickly and efficiently. We've already seen an increase in the regularity of digital performance problems and service outages affecting businesses and their customers. AI is crucial to combating the problem. It can make the process of detecting and rectifying software performance problems much faster and more effective. Ultimately, this will enable IT teams to provide more consistent and positive user experiences, relegating the nightmare of major outages and late-night war rooms to the past.

Michael Allen, Dynatrace's EMEA Sales VP and regional Chief Technology Officer, has successfully tackled the challenges of building and operating a pan-European, product-line-focused sales organization for over 17 years, with a proven track record in the development and management of profitable revenue growth.

At Compuware, Michael is responsible for pan-EMEA sales and technical sales teams for both direct and partner-led business. In addition, Michael has been involved in strategic development and direction setting of Dynatrace's APM offering and directly involved in many of Compuware's historical key acquisitions (Dynatrace was formerly Compuware's APM business unit).

Under his 17 years of stewardship, Compuware's European Application Performance Management business unit (now Dynatrace) has delivered high double-digit growth year on year.

Michael is a well-known and talented industry keynote speaker and a regular commentator in IT and business publications. He has presented at over 480 industry events in the last decade, and regularly addresses audiences in excess of 200 people.

Over the years, Michael has built and enabled numerous strategic commercial relationships with OEM partners, service providers (e.g. Infonet / BT; CSC; Fujitsu; Easynet; Orange Business Services; ATOS) and with value-added resellers and distributors across EMEA.

Before joining Compuware in 1998, Michael was the manager of the global high-touch premier customer engagement program at Madge Networks, and also maintained Madge's technology alliances with the likes of Dell, Microsoft and Compaq.

The Anatomy of an Outage: How Software Performance Problems are Affecting the Digital World

AI is crucial to combating the problem, as it can make the process of detecting and rectifying software performance problems much faster.
