RSF-1, Resilient Server Facility, makes applications and services highly available by automatically
switching between servers (2-32) should a server or service fail. RSF-1 provides a multi-directional
redundant facility, allowing servers to constantly monitor and "shadow" each other.
No Redundant Investment
Rather than keeping a standby server idle until a failover occurs, RSF-1 allows operational systems to
act as standby servers for each other within the cluster. This assures the customer that the
investment in additional servers is not wasted. In the event that a server fails, its shadow will
initialise the failed service(s), thus ensuring continued availability.
Highly Available Services
RSF-1 not only provides a highly available platform, but also offers highly available services. This
allows individual applications to be stopped, started, and migrated between nodes without affecting
other applications' execution environments or requiring shadow systems to be rebooted.
Automatic or Manual Switch-over
When a server or service fails, RSF-1 detects the breakdown. RSF-1 can either (1) restart all associated
services on an available cluster node, or (2) signal an operator. RSF-1 also monitors the health of
individual services on a node via service agents. Should RSF-1 detect a failure, it can either restart
the service or fail the service over to another node in the cluster.
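The choice between restarting, failing over, and signalling an operator could be sketched roughly as follows. This is only an illustration in Python, not RSF-1's actual logic: the `Action` names, the `choose_action` function, the `max_restarts` threshold, and the idea of trying a local restart before a failover are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    RESTART_LOCALLY = "restart the service on the same node"
    FAIL_OVER = "start the service on another cluster node"
    NOTIFY_OPERATOR = "signal an operator"


def choose_action(node_alive, restart_attempts, max_restarts, automatic, standby_nodes):
    """Pick a response to a failed service (hypothetical policy, for illustration)."""
    if not automatic:
        # Manual mode: leave the decision to a person.
        return Action.NOTIFY_OPERATOR
    if node_alive and restart_attempts < max_restarts:
        # Only the service failed; the node itself is still healthy.
        return Action.RESTART_LOCALLY
    if standby_nodes:
        # The node is down, or local restarts are exhausted.
        return Action.FAIL_OVER
    # Nothing left to fail over to.
    return Action.NOTIFY_OPERATOR


if __name__ == "__main__":
    print(choose_action(True, 0, 2, True, ["node-b"]).value)
    print(choose_action(False, 0, 2, True, ["node-b"]).value)
```

Keeping the policy in one small function makes the automatic and manual modes easy to compare side by side.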
Monitoring and Control
RSF-1 includes a Java-based system administration tool, RSF-1 Cluster Manager, which allows the
monitoring of RSF-1 heartbeats in real time. The tool displays the status of RSF-1
instances available on the network and provides manual switch-over functionality. An SNMP agent sends
RSF-1 status information to management consoles (such as Sun Net Manager and HP OpenView). Standard
system logging allows log messages to be intercepted and redirected via email or pager using
additional software.
Multi-protocol Heartbeat
RSF-1 sends heartbeats over disk, TCP/IP and RS232 connections, giving it several independent
monitoring channels. All connections must fail before a machine is deemed "unavailable". The RS232 component
uses a custom protocol and is completely independent from network service daemons.
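The rule that every heartbeat connection must fail before a machine is considered unavailable can be illustrated with a short Python sketch. The channel names, the `HEARTBEAT_TIMEOUT` value, and the `node_available` function are assumptions made for the example; they do not describe RSF-1's real protocol or timings.

```python
HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value chosen for the example


def node_available(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True if at least one heartbeat channel is still fresh.

    last_seen maps a channel name (for example "disk", "tcpip", "rs232")
    to the time, in seconds, at which the last heartbeat arrived on it.
    """
    return any(now - stamp <= timeout for stamp in last_seen.values())


if __name__ == "__main__":
    # The network heartbeat is stale, but the serial link is still alive,
    # so the peer is not declared unavailable.
    peer = {"disk": 90.0, "tcpip": 80.0, "rs232": 98.0}
    print(node_available(peer, now=100.0))
```

Requiring all channels to go silent guards against treating a single failed cable or network outage as a failed server.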
Secondary IP addressing
Floating host names and IP addresses are used to alleviate the need for any changes to client
configurations after failovers. The node in the cluster that restarts the services simply assumes
the identity of the failed machine.
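As a rough illustration of why clients need no reconfiguration, the Python sketch below models floating addresses as a table that maps each address to the node currently holding it. The `take_over` function, the host names, and the node names are hypothetical; how RSF-1 actually moves an address between machines is not described here.

```python
def take_over(bindings, failed_node, surviving_node):
    """Return new bindings with the failed node's floating addresses moved.

    bindings maps a floating host name or IP address to the node that
    currently answers for it. The input mapping is left unchanged.
    """
    return {
        address: (surviving_node if node == failed_node else node)
        for address, node in bindings.items()
    }


if __name__ == "__main__":
    bindings = {"payroll.example.com": "node-a", "mail.example.com": "node-b"}
    after = take_over(bindings, "node-a", "node-b")
    # Clients keep using the same names; only the node behind them changes.
    print(after)
```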
Load Sharing
RSF-1 treats applications as individual process units and executes them on the servers defined by the
RSF-1 configuration. Typically, each server acts as the primary node for a group of services, and each
group is initiated on one server at system start-up. Once the services are running, the RSF-1 administrator can switch individual
services to other nodes in the cluster. This can be used for server optimisation, load balancing,
or to grant more resources to certain applications. For example, a Payroll application may require
more system resources on certain days. By using this RSF-1 feature, the administrator can move all or
some process units to a shadow server to achieve the desired effect.
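The Payroll example can be pictured with a small Python sketch in which a placement table records which node runs each service. The service names, node names, and the `move_services` helper are invented for the illustration and are not part of RSF-1's configuration format or tools.

```python
def move_services(placement, services, target, cluster_nodes):
    """Return a new placement with the given services moved to target.

    placement maps a service name to the node that currently runs it.
    """
    if target not in cluster_nodes:
        raise ValueError(f"{target!r} is not a node in this cluster")
    unknown = [name for name in services if name not in placement]
    if unknown:
        raise KeyError(f"unknown services: {unknown}")
    updated = dict(placement)  # copy, so the original table is untouched
    for name in services:
        updated[name] = target
    return updated


if __name__ == "__main__":
    nodes = {"node-a", "node-b"}
    placement = {"payroll": "node-a", "reports": "node-a", "mail": "node-b"}
    # On a busy payroll day, move "reports" away so payroll has node-a to itself.
    print(move_services(placement, ["reports"], "node-b", nodes))
```

Moving only some process units, as here, frees resources on the primary node without relocating the demanding application itself.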