Nearly everyone has spent some dreadful time on the phone with a technical support engineer, trying to determine why a computer product that once worked perfectly is now failing for no apparent reason. The engineer typically walks you through the same basic steps you have already tried, such as turning the system off and then on again. If that does not fix it, you are passed through an endless loop of departments and support staff. When that does not work either, you are asked to ship the product back for repair.
If you think troubleshooting is difficult for a single piece of software, diagnosing issues with your network is far more complicated. Network problems need to be isolated so that their root cause can be determined, along with their impact on the network’s users.
Thanks to the rapid proliferation of IP-based security equipment, security professionals are being thrust into the dual role of physical and IT security manager — which should make troubleshooting strategies essential learning material. Even with the most complex networks, there are troubleshooting strategies that can make your life easier.
Being proactive about knowing how your network is configured is important, and nearly every IT professional’s toolkit includes tools that can do that for you. One is a baseline analyzer known as the Belarc Advisor (http://belarc.com), which performs a deep interrogation of a computer and reports an inventory of its hardware, software and firmware, along with detailed configuration parameters such as system name, IP addresses and available hard disk space. The tool can also report which patches have been installed and which license keys were used to activate software. It can be set to run on machines at periodic intervals, or run once to capture a single point in time if the configurations are not expected to change.
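To make the idea of a baseline concrete, here is a minimal sketch in Python of the kind of snapshot such a tool collects. It is not how Belarc Advisor works internally; the function name collect_baseline and the field names are illustrative assumptions, and it records only a few of the parameters mentioned above (system name, IP addresses, disk space).

```python
import json
import platform
import shutil
import socket
from datetime import datetime, timezone


def collect_baseline(path="/"):
    """Return a small inventory snapshot of the local machine (illustrative only)."""
    hostname = socket.gethostname()
    try:
        # Resolve the host name to IPv4 addresses; this can fail on hosts
        # whose name is not in DNS or the hosts file.
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    usage = shutil.disk_usage(path)
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "system_name": hostname,
        "ip_addresses": addresses,
        "os": platform.platform(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved and compared later.
    print(json.dumps(collect_baseline(), indent=2))
```

Saving the output with a date in the file name gives you a record to compare against when something stops working.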
Another tool, this one free, is Nmap (http://nmap.org), which performs electronic discovery of a network. Its graphical front end, Zenmap, can draw a map of the discovered hosts, giving a detailed representation of how the network looks in an easy-to-read format. The software also supports security analysis, and can be customized with additional fields and custom descriptions that help identify how one system interacts with others.
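For readers who want to see what network discovery means at its simplest, the sketch below tries a TCP connection to each host and port and records which ones answer. This is a deliberately basic stand-in written with Python's standard library, not Nmap's own scanning techniques; the function names is_port_open and discover are assumptions made for illustration. Only scan networks you are authorized to test.

```python
import socket


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def discover(hosts, ports, timeout=0.5):
    """Map each responsive host to the list of ports that accepted a connection."""
    found = {}
    for host in hosts:
        open_ports = [p for p in ports if is_port_open(host, p, timeout)]
        if open_ports:
            found[host] = open_ports
    return found
```

A call such as discover(["192.0.2.10", "192.0.2.11"], [22, 80, 443]) would return a dictionary of the hosts that answered and the ports they answered on; the addresses here are placeholders from the documentation range.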
Network Monitoring and Management
Network monitoring can be managed from an easy-to-use dashboard that shows indicators of server health and network communications errors, and that collects logs from the various devices on your network.
One such tool is WhatsUp Gold by Ipswitch (www.ipswitch.com), which can send alerts via email, SMS and other channels if communications between devices fail. In some instances, such as a disk nearing full, the software can provide a pre-failure notification based on user-configurable options.
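The disk near-full example can be sketched in a few lines of Python. This illustrates threshold-based pre-failure alerting in general, not WhatsUp Gold's implementation; the 80 and 90 percent thresholds and the names classify and disk_status are assumptions chosen for the example.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Turn a usage percentage into a status using configurable thresholds."""
    if used_pct >= crit_pct:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


def disk_status(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (status, used percentage) for the file system containing path."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct), used_pct
```

A monitoring job could call disk_status on a schedule and send a notification whenever the status is anything other than "ok", which gives the administrator time to act before the disk actually fills.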
Isolating Network Failures
When a network problem occurs, how does the IT security professional hear about it? In small networks, the notification usually comes from an end user who either calls IT directly or enters a request into a help desk system, which generates a trouble ticket. In larger network environments, it is usually an automated alert, accompanied by advisory notifications and electronic maps that resemble traffic control rooms, with a series of green lights that turn yellow or red when outage conditions occur.
Regardless of how IT learns of a problem, the first job is to restore the system to its normal state as quickly as possible, so that the user community is affected as little as possible. Some of the questions IT will need to answer include:
• When did the problem start?
• Were there recent configuration or state changes that could have impacted the system?
• What were the baseline configurations of the system prior to the error?
• Is there a means to roll back the system to its pre-error configuration?
If IT reviews these questions and still cannot determine the cause of the problem and restore the system, it must then resort to problem analysis to determine the root cause of the event. It is valuable for the IT department to document issues and their resolutions, so that if they ever occur again, staff can follow a systematic process to reach a resolution quickly.
IT Security Network Troubleshooting: Six Steps
Because there are many users on the network, the strategy for troubleshooting it often has to weigh which services take priority and which do not. The first step is to define the severity of the incident, usually classified as major, moderate or minor. These classifications range from a “major” service interruption that disrupts core operations to a “minor” issue that may affect only a single user.
Many aspects of network troubleshooting are governed by Service Level Agreements (SLAs), which determine what actions must be taken if a solution cannot be achieved within a specific time period, and at what point the issue must be escalated and upper management notified.
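Severity classification and SLA-driven escalation can be expressed as a simple rule set. The Python sketch below is one possible reading of the scheme described above: the resolution windows in SLA_WINDOWS are hypothetical values, the rule for "moderate" (more than one user affected, core operations still running) is an assumption, and real thresholds would come from your own agreements.

```python
from datetime import datetime, timedelta

# Hypothetical resolution windows; real values come from your SLA.
SLA_WINDOWS = {
    "major": timedelta(hours=1),
    "moderate": timedelta(hours=4),
    "minor": timedelta(hours=24),
}


def classify_severity(core_operations_down, users_affected):
    """Classify an incident as major, moderate or minor."""
    if core_operations_down:
        return "major"
    if users_affected > 1:
        # Assumed rule: several users affected, core operations still running.
        return "moderate"
    return "minor"


def needs_escalation(severity, opened_at, now):
    """Return True once the incident has been open longer than its SLA window."""
    return now - opened_at > SLA_WINDOWS[severity]
```

For example, a single-user printing problem opened two hours ago would be classified as minor and would not yet need escalation, while a core switch outage of the same age would be major and already past its assumed one-hour window.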
There are six fundamental steps to troubleshooting:
Step One: Physically inspect the problem devices, if possible, to make sure that they are powered and that all cables and connections are properly seated. This is often the most basic and easiest problem to resolve. If you cannot see any cabling, connection or power issues, the next action is to restart the devices. Be sure to watch the system carefully, as quite often the source of the error will be displayed during the device startup process.
Step Two: Analyze the symptoms so that you can associate the problem with what is causing it. Questions to ask may include: Is the device too hot or too cold? Does it exhibit symptoms that differ from those of other systems? Can the problem be isolated?
Step Three: Now you must delve deep into your troubleshooting ninja skills. Use all available information, including network logs and documentation, to form a theory about what caused the issue. The best way to prove a root cause is to test the theory by duplicating or imitating the conditions that started the problem. If this can be done without creating additional issues on your network, it is a great practice. Having a lab in which to recreate the scenarios that cause IT problems is invaluable.
Step Four: Isolate the troubled components by grouping them into categories (hardware, software, peripheral or configuration issues), then develop theories and perform testing for each. This phase is usually time-consuming and frustrating, which is why it is so important to have documentation available, such as network inventory mapping and baseline configuration information. Run the tools again to see if there has been a change in configuration or state since the system was last working. These details can provide a deeper understanding of which areas are problem areas and which are working properly.
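Comparing a saved configuration with a fresh one is easy to automate. The following Python sketch assumes both snapshots have been reduced to flat key/value dictionaries; the function name diff_config and the sample keys and values are hypothetical.

```python
def diff_config(baseline, current):
    """Return the keys added, removed and changed between two flat config dicts."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after the problem appeared.
    before = {"ip": "192.0.2.10", "firmware": "1.2", "dns": "192.0.2.1"}
    after = {"ip": "192.0.2.10", "firmware": "1.3", "ntp": "192.0.2.2"}
    print(diff_config(before, after))
```

In this example the report would show a firmware change from 1.2 to 1.3, a DNS entry that has disappeared and an NTP entry that is new, which narrows down where to look first.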
Step Five: While isolating components, you may find that you need to repair or replace something. Start with a known good baseline and insert the replacement product into your network environment. Be sure to follow appropriate security guidelines, and do not use default passwords, which can expose your network to hackers or other unauthorized access. Configure the devices according to information security policies, procedures, standards and industry best practices. You may need to replace and configure multiple devices until you have “rooted” or “weeded” out the bad components in your system.
Step Six: If your basic and moderate IT troubleshooting efforts do not prove successful, you may need to deploy advanced technical tools such as port scanners, sniffers and network traffic management devices. Many of these tools may require advanced support from industry professionals, which can mean placing a support call to a third-party provider. One of the major differences you will see with vendors that provide advanced network diagnostics is that they quite often employ highly trained experts who can connect securely to your system remotely (if your company permits) and perform diagnostics directly on it.
Darnell Washington, CISSP, is President and CEO of SecureXperts Inc. (www.securexperts.com). He is responsible for implementing secure integrated physical and logical enterprise infrastructures for federal, state and enterprise commercial environments, and his blog can be found on SecurityInfoWatch.com.