Redundancy through Clusters
The concept of redundancy is frequently extended to networked application servers, such as network video recorders, access control servers and other security servers. In many cases, servers and recorders use fault-tolerant RAID (redundant array of inexpensive disks) techniques to store data redundantly. RAID is offered by many vendors at various levels, each with a different degree of availability, and several of those levels ensure that data is not lost even if a hard drive fails. For the sake of brevity, I am assuming you are familiar with RAID storage, so this article will not go into more detail on that topic. In any case, fault-tolerant redundancy may also be extended across multiple servers through “clustering.”
Clustering enables two separate servers to appear as one, generally with one acting as the primary application server and the second as a back-up. A software service (such as Microsoft Cluster Services), which may be a separate product or part of the server software itself, monitors the health of the primary server and its back-up. If a problem is detected, the service can signal the back-up server to take over primary operation. The service also controls the IP addresses of the server cluster, so that other networked devices and clients are unaffected when the back-up server takes over after a server failure.
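The monitoring and takeover behavior described above can be illustrated with a minimal Python sketch. This is a toy model and not how Microsoft Cluster Services is implemented; the class names, the missed-check threshold of three and the example address 192.0.2.10 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class ClusterMonitor:
    """Toy stand-in for a cluster service (hypothetical, for illustration only)."""

    def __init__(self, primary, backup, virtual_ip, max_missed=3):
        self.active = primary
        self.standby = backup
        self.virtual_ip = virtual_ip   # the single address that clients connect to
        self.max_missed = max_missed   # failed checks tolerated before failover
        self.missed = 0

    def check(self):
        """Run one health check; return the name of the server that owns the IP."""
        if self.active.healthy:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.standby.healthy:
                # Swap roles: the virtual IP now points at the former back-up.
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active.name


primary = Server("primary")
backup = Server("backup")
monitor = ClusterMonitor(primary, backup, virtual_ip="192.0.2.10")

owners = [monitor.check()]                      # primary is healthy
primary.healthy = False                         # simulate a failure
owners += [monitor.check() for _ in range(3)]   # third missed check triggers failover
print(owners)
```

Because clients only ever address the virtual IP, they do not need to know which physical server answered. A real cluster service would also have to handle cases this sketch ignores, such as both servers believing they are active.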
There are several ways to configure redundant servers to deliver a “highly available” system. In the most basic configuration, sometimes referred to as “cold-standby,” only the primary server is working, or active, and the back-up server is “offline.” In this configuration, the standby server is not supporting any transactions or operations. In the event of a primary server failure, the standby server may require additional configuration before taking over. This may include restoring the data from the primary server’s database. Any computation or transaction that the primary server was handling when it failed may have to be re-initiated once the standby server becomes operational, depending on how frequently the database was being replicated.
In “warm-standby” configurations, the back-up server may have been partially configured, but some parameters may require updating before the application can resume normal operation. This scenario assumes that any relevant existing database was uncorrupted or that database replication minimized any data loss. The benefit of a warm-standby configuration over cold-standby is generally the time savings for the system to return to an operational state.
In some scenarios, both servers can be configured to operate concurrently (sometimes called an active-active configuration, or “hot-standby”) with each server acting as the back-up for the other. In this case, the servers may be running the same computation or function, so the failover is effectively invisible to users. Each server has a current version of the database (data in each of the servers is completely synchronized), so normal operation continues even if one of the servers fails.
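To compare the three configurations side by side, the following Python sketch lists the work a back-up server must finish before it can take over in each mode. The steps are paraphrased from the descriptions above; the names Standby, RECOVERY_STEPS and takeover_plan are hypothetical, and a real deployment would have vendor-specific steps.

```python
from enum import Enum


class Standby(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


# Work the back-up must complete before taking over, per the text above.
RECOVERY_STEPS = {
    Standby.COLD: [
        "apply additional configuration",
        "restore data from the primary server's database",
        "re-initiate transactions that were in progress",
    ],
    Standby.WARM: ["update remaining parameters"],
    Standby.HOT: [],  # already running with a synchronized database
}


def takeover_plan(mode):
    """Return the ordered steps for the back-up server in the given mode."""
    steps = list(RECOVERY_STEPS[mode])
    steps.append("resume normal operation")
    return steps


for mode in Standby:
    print(mode.value, "->", takeover_plan(mode))
```

The shorter the list, the faster the system returns to service, which is the trade-off between the three modes: hot-standby minimizes recovery time at the cost of running and synchronizing two live servers.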
For an additional measure of high availability, the servers in any of the configurations could be operating in different locations. As such, a catastrophic event or failure at one server location would not affect the operation at the backup server’s location.
For an access control application using a Microsoft-based server, cluster services are supported in Microsoft Windows Server 2003. From a database perspective, the servers would also require a version of Microsoft SQL Server that supports “cluster awareness.” Finally, the access control system vendor’s application software may also need to be configured to be aware of the back-up server’s failover capability. In some cases, a single vendor license for the application may be all that is required, as only one access control system is in operation at any given time. But in other cases, it may be necessary to have a second license for the back-up server.
Resiliency Through Message Queues
Many network protocols and application communications assume that a direct connection exists between hosts (i.e., servers and/or edge devices) at all times. Unfortunately, if the link fails, a message or alert may never be received by the intended host. This is where other resiliency features can complement the high-availability features discussed thus far. With a messaging technology such as Microsoft Message Queuing (MSMQ), applications on disparate servers keep a list (or queue) of recent events, alarms or other alerts so that they can be sent to another application or device once the communication link is restored. As a result, MSMQ provides reliable and resilient (but not necessarily timely) delivery of messages between hosts and applications.
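The store-and-forward idea behind message queuing can be shown in a few lines of Python. This sketch does not use the MSMQ API; the Link and StoreAndForwardQueue classes and the sample alarm messages are assumptions chosen only to show events being held while a link is down and delivered in order once it returns.

```python
from collections import deque


class Link:
    """Hypothetical network link that can be up or down."""

    def __init__(self):
        self.up = False
        self.received = []   # messages that reached the remote host

    def send(self, message):
        if not self.up:
            raise ConnectionError("link is down")
        self.received.append(message)


class StoreAndForwardQueue:
    """Holds messages locally until the link can deliver them."""

    def __init__(self, link):
        self.link = link
        self.pending = deque()

    def publish(self, message):
        self.pending.append(message)
        self.flush()

    def flush(self):
        """Deliver queued messages in order; stop at the first failure."""
        delivered = 0
        while self.pending:
            try:
                self.link.send(self.pending[0])
            except ConnectionError:
                break                    # keep the message for a later retry
            self.pending.popleft()       # remove only after a successful send
            delivered += 1
        return delivered


link = Link()
queue = StoreAndForwardQueue(link)
queue.publish("door forced open")    # link is down, so the alarm is held
queue.publish("badge denied")
held = len(queue.pending)

link.up = True                       # communication link is restored
sent = queue.flush()
print(held, sent, link.received)
```

Messages are removed from the queue only after a successful send, which is what makes delivery reliable. Nothing in the sketch bounds how long a message may wait, which mirrors the point that delivery is resilient but not necessarily timely.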
Today’s IT networks are well suited to support a wide range of applications, from financial markets transacting equity trades in the billions of dollars, to IP telephony, to physical security systems. They have proven their availability in times of crisis such as the Sept. 11 attacks or Hurricane Katrina, when they were either the only systems to remain up, or the last to go down and the first to come back up.
Implementing a high-availability solution should take into consideration the criticality of security to a given organization. A portion of system down-time risk can be mitigated simply by selecting vendors whose systems are more reliable and by following installation and maintenance best practices. At a minimum, the application and its database should be backed up regularly and religiously.
You may find that your IT group has already implemented some of these features and capabilities. If not, some physical security vendors and systems integrators offer technical/professional services and support to handle the system configuration for you. As a result, you may be able to enjoy the benefits of a highly available security system for a relatively modest incremental investment.
Bob Beliles is vice president of enterprise business development for Hirsch Electronics (www.HirschElectronics.com), a manufacturer of IP-based access control and identity management systems. Prior to joining Hirsch, Mr. Beliles co-founded Cisco Systems’ physical security initiative and led a number of product development efforts. He can be reached at firstname.lastname@example.org.