Sun Cluster 3.1 Overview

Sun Cluster Features

Data Service Support

Many data service agents are available for Sun Cluster.

CMM

The cluster membership monitor (CMM) is kernel-resident on each node and detects major changes in cluster status, such as loss of communication between nodes. Heartbeat messages are transmitted across the private network shared by all nodes within the cluster. If heartbeats from a node are not received within the timeout period, the node is declared failed and must renegotiate its cluster membership before it can rejoin the cluster.
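
As an illustrative check (a minimal sketch using standard Sun Cluster 3.1 status commands), the membership the CMM currently reports can be displayed from any node:

  # Display the cluster nodes and whether the CMM considers them online
  scstat -n

  # Display quorum status, which the CMM consults when membership changes
  scstat -q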

Network Fault Monitoring

Public network management (PNM) uses the PNM daemon, pnmd, to monitor the functionality of the IPMP groups on each node.

Cluster transport interfaces are monitored on each node. If the active interface on any node is determined to be inoperative, all nodes switch to the backup interface and attempt to re-establish communications.
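
For illustration (again a sketch based on the standard status commands), the health of both the public network groups and the private cluster transport can be checked with scstat:

  # Show the status of the IPMP groups monitored by pnmd
  scstat -i

  # Show the status of the cluster transport (interconnect) paths
  scstat -W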

Data Service Monitoring

Each data service has predefined fault monitoring routines associated with it. When the resource group is brought online, the Resource Group Manager (RGM) automatically starts the appropriate fault monitoring processes. The data service fault monitors are referred to as probes.

Fault monitoring performs two functions: monitoring for the abnormal exit of data service processes and checking the health of the data service.
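
As a hedged example, the state that the RGM and its probes report for each resource group and resource can be viewed with scstat:

  # Show resource group and resource status, including the state
  # reported by each data service fault monitor (probe)
  scstat -g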

Cluster Configuration Repository

General cluster configuration information is stored in global configuration files collectively referred to as the cluster configuration repository (CCR). The CCR must be kept consistent across all nodes and is a critical element that enables each node to be aware of its potential role as a designated backup system. The files are timestamped and carry checksum information to preserve their integrity across the cluster. The repository is updated whenever the cluster configuration changes.
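
As an illustration (assuming a standard Sun Cluster 3.1 installation, where the CCR files typically reside under /etc/cluster/ccr), the configuration held in the CCR is viewed and changed only through the cluster commands, never by editing the files directly:

  # Print the cluster configuration held in the CCR
  scconf -p | more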

The CCR structures contain the following types of information:

Scalable Services

The Sun Cluster scalable services rely on the following components:

A scalable data service is designed to distribute the application workload across two or more cluster nodes, as illustrated by the example below.
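
As a minimal sketch (the resource group name web-rg and the node count are hypothetical), a scalable resource group is created with the Maximum_primaries and Desired_primaries properties set to the number of nodes that should run the service simultaneously:

  # Create a scalable resource group allowed to run on two nodes at once
  # (web-rg is a hypothetical name)
  scrgadm -a -g web-rg -y Maximum_primaries=2 -y Desired_primaries=2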

Disk ID Devices

During installation, the DID driver probes the devices on each node and creates a unique DID device name for each disk or tape device. See web page for global device commands.
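
For illustration, the mapping between DID instances and the local device paths on each node can be listed with scdidadm:

  # List all DID instances and the full local device paths they map to
  scdidadm -L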

Global Devices

Sun Cluster uses global devices to provide cluster-wide, highly available access to any device in the cluster, from any node, without regard to where the device is physically attached. If a path fails, the cluster automatically discovers another path to the device and redirects access to it.

The Sun Cluster mechanism that enables global devices is the global namespace, which includes the /dev/global hierarchy as well as the volume manager namespaces. The global namespace maps to the local /devices and /dev filesystems on each node; because the global namespace is visible from every node, every device is visible from every node as well. See web page for global device commands.
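
As a hedged illustration, the global device namespace is populated with the scgdevs command after new devices are added, and the resulting global devices can then be browsed from any node:

  # Populate the global device namespace with any newly added devices
  scgdevs

  # Global disk devices are visible from every node under /dev/global
  ls /dev/global/dsk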

Cluster Filesystem

Cluster filesystems are dependent on global devices with physical connections to one or more nodes. To make a filesystem global, the mount point is placed under /global and the filesystem is mounted globally, for example:

  mount -g /dev/vx/dsk/nfsdg/vol01 /global/nfs

The filesystem is then available to all nodes within the cluster.
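
To make the global mount persistent across reboots (a sketch assuming the volume from the example above and an underlying UFS filesystem), an entry with the global mount option is added to /etc/vfstab on every node:

  /dev/vx/dsk/nfsdg/vol01 /dev/vx/rdsk/nfsdg/vol01 /global/nfs ufs 2 yes global,logging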

The cluster filesystem is based on the proxy filesystem (PXFS), which has the following features:

Note: PXFS is not a distinct filesystem type; clients see the underlying filesystem type, such as UFS or VxFS.

Resource Groups

At the heart of any Sun Cluster is the concept of a resource group. Resource group definitions are created and associated with a particular data service.

The resource group definition provides all the necessary information for a designated backup system to take over the data service of a failed node.

Resource group information is maintained in the globally available CCR database. See web page for resource group commands.
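
As a minimal sketch (the group name nfs-rg and the node names are hypothetical), a failover resource group is typically created, brought online, and manually switched between nodes with the scrgadm and scswitch commands:

  # Create a failover resource group that can run on two nodes
  scrgadm -a -g nfs-rg -h node1,node2

  # Bring the resource group and its resources online
  scswitch -Z -g nfs-rg

  # Manually switch the resource group to the other node
  scswitch -z -g nfs-rg -h node2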