Across the span of my career, I’ve noticed new buzzwords entering the collective lexicon every decade or so. The idea of ‘cyber-resilience’ has gained momentum: it describes how organizations should protect their IT systems to avoid costly downtime and limit disruption to critical services. A recent report from Splunk estimates that downtime costs Global 2000 companies US$400 billion annually, or about 9 percent of profits. While larger enterprises may have the resources to recover from the significant losses an IT outage can cause, small to mid-sized companies often do not. Even beyond cybersecurity, operational resilience is key for organizations that want to stand the test of time.
Although more industries are embracing the concept of resilience in their IT systems, the approach must continue to evolve in response to the changing threat landscape. It’s no longer enough to deploy technology or best practices that help secure the IT infrastructure. Instead, organizations must develop a comprehensive, proactive approach that places observability at the foundation of a resilient, layered IT system.
A Refresher on Observability
Although the term is defined in slightly different ways, I consider observability to be the capacity to gather insights, analytics, and actionable information from both real-time and historical metrics, logs, and trace data. A modern observability function should be able to collect these insights using multi-domain data correlation, machine learning (ML), and AIOps. The ultimate goal is the clearest possible picture of your IT systems.
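To make those three data types concrete, here is a minimal sketch using the open-source OpenTelemetry Python API. The article doesn’t prescribe a particular tool, so treat this as one common choice rather than a recommendation; the service, span, and metric names are hypothetical. Without an exporter configured, the calls are harmless no-ops, but they show how a single request can emit a correlated trace, metric, and log line:

```python
# A minimal sketch of the three telemetry types: traces, metrics, and logs.
# Names like "checkout-service" and "handle_request" are placeholders.
import logging

from opentelemetry import metrics, trace

tracer = trace.get_tracer("checkout-service")    # trace data
meter = metrics.get_meter("checkout-service")    # metrics
logger = logging.getLogger("checkout-service")   # logs

request_counter = meter.create_counter(
    "http.requests", description="Total HTTP requests handled"
)

def handle_request(order_id: str) -> None:
    # Each request produces a span, a metric increment, and a log line,
    # all tagged with the same context so they can be correlated later.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("order.id", order_id)
        request_counter.add(1)
        logger.info("processed order %s", order_id)

handle_request("A-1001")
```

The value comes from correlation: when all three signals describe the same request, an operator can move from a spiking metric to the exact trace and log lines behind it.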
Observability and Today’s IT Landscape
So, what does this have to do with today’s IT threats? For many businesses, the current IT environment is more complex than it has ever been. The world’s growing dependence on digital solutions and workflows has made IT environments larger than ever, and we’ve come a long way from the massive migration to the cloud during the height of the COVID-19 era. In a June 2024 International Data Corporation (IDC) report, about 80% of respondents said their companies were planning some level of repatriation, or moving workloads from public clouds back to on-premises data centers, within a year. This suggests many companies are now deploying a hybrid or multi-cloud strategy, which makes it harder to monitor each area of an IT environment.
Organizations’ IT environments are also leveraging AI more than ever before. A McKinsey survey from March 2025 indicates that 71% of companies use generative AI in at least one business function, up from 65% in 2024. In other words, companies are running more automated workflows in their IT environments than ever.
While the growing scale, complexity, and automation of IT systems is a boon for innovation, it arrives just as the threat landscape is evolving.
More AI tools have been democratized, lowering the barriers to entry for today’s threat actors. Phishing scams and social engineering have become increasingly sophisticated, putting more organizations at risk of unauthorized access to their systems, which in turn raises the odds of system downtime, lost revenue, or even damage to the brand’s reputation. Data from a recent SolarWinds public sector survey shows that public sector organizations are concerned about both external threats and internal security practices: 58% of respondents expressed concern about cybersecurity mistakes made by ‘untrained insiders.’
Every additional entry point makes your IT environment more vulnerable. You need an observability approach that can respond quickly and help mitigate breaches, because that is what builds the resilient systems today’s businesses need.
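What might "responding quickly" look like in practice? Below is a minimal, self-contained Python sketch of one simple detection pattern: flagging a metric sample (say, failed logins per minute) that deviates sharply from recent history. The window size and z-score threshold are illustrative assumptions, not tuned recommendations:

```python
# Flag a metric sample that deviates sharply from recent history.
from collections import deque
from statistics import mean, stdev

WINDOW = 60        # how many recent samples to keep
THRESHOLD = 3.0    # z-score above which we raise an alert

history: deque[float] = deque(maxlen=WINDOW)

def check_sample(value: float) -> bool:
    """Return True if this sample looks anomalous against recent history."""
    anomalous = False
    if len(history) >= 10:  # need enough data to judge
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(value - mu) / sigma > THRESHOLD:
            anomalous = True
    history.append(value)
    return anomalous

# Example: a sudden spike in failed-login counts trips the alert.
for sample in [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 90]:
    if check_sample(sample):
        print(f"ALERT: anomalous value {sample}")
```

Real platforms use far more sophisticated ML-driven baselining, but the principle is the same: the faster an abnormal signal is surfaced, the smaller the window an attacker or outage has to do damage.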
The Right Approach to Observability and Cyber-Resilience
When done correctly, your approach to observability should look like a well-run hospital. Think about the halls of a busy emergency room (ER). When a patient comes into the ER, it’s not enough for the doctors and nurses to diagnose the issue. They must respond quickly and accurately: triage whether the patient needs to be seen immediately, determine if an operating room is available, and assess how many personnel are required for treatment. The way an ER works, quickly and with purpose, exemplifies a resilient system that can handle each problem as it arrives.
Some organizations’ approach to observability is disjointed, with one observability solution used to diagnose unusual activity and another used to address it. This is like someone with a common cold visiting the ER, being diagnosed with pneumonia, and then being sent to a hospital two blocks away for treatment. Hampering resiliency further, an organization may run multiple, disconnected observability tools across its on-premises and cloud environments, compounding the confusion.
By taking a comprehensive approach to observability instead, you can limit the mean time to remediate (MTTR) and quickly improve the health of your IT system. The right observability solution will integrate with both on-prem data centers and cloud services, along with the remediation capabilities necessary to resolve IT issues. This also helps prevent silos in incident remediation, which can lead to an uncoordinated response and a worse outcome from an attack.
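As a sketch of what that connected approach could look like, the following Python shows one pipeline that triages and remediates alerts from both on-prem and cloud sources. Every function name here is a hypothetical stand-in for whatever containment, remediation, and ticketing hooks your own tooling provides:

```python
# A hedged sketch of a single detect-triage-remediate pipeline.
# All function names below are illustrative stubs, not a real product API.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str    # e.g. "on-prem" or "cloud"; both feed one pipeline
    service: str
    severity: int  # 1 (low) .. 5 (critical)

def isolate_host(service: str) -> None:
    print(f"[contain] isolating {service}")                 # stub

def restart_service(service: str) -> None:
    print(f"[remediate] restarting {service}")              # stub

def open_incident(alert: Alert) -> None:
    print(f"[track] incident opened for {alert.service}")   # stub

def handle(alert: Alert) -> None:
    # Containment, remediation, and tracking live in one workflow,
    # so the response stays coordinated instead of split across tools.
    if alert.severity >= 4:
        isolate_host(alert.service)
    restart_service(alert.service)
    open_incident(alert)

# ER-style triage: handle the most severe alerts first, regardless of
# whether they originated on-prem or in the cloud.
alerts = [Alert("cloud", "auth-api", 5), Alert("on-prem", "db-01", 2)]
for alert in sorted(alerts, key=lambda a: a.severity, reverse=True):
    handle(alert)
```

The point of the sketch is the shape, not the stubs: detection, triage, and remediation share one data model and one queue, so nothing falls into the gap between disconnected tools.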
Proper observability also works hand in hand with best practices such as multi-factor authentication, encryption, and employee training to mitigate phishing emails. When you establish a comprehensive observability function, you can quickly identify and address system issues, minimizing the time it takes to recover from operational disruptions. The true test of cyber and operational resilience is how quickly you can recover and how far you can limit the impact of an incident.