Several conditions can cause a failover, as described below. The following sections assume that the two interface copies are in a steady state and that a change in state forces a failover.
Primary Interface Shutdown
This scenario refers to a graceful shutdown of the primary interface. The chain of events is as follows.
The primary interface is stopped.
On exit, the primary interface writes a value of 0 to its heartbeat point and to the active ID point.
The backup interface reads the ActiveID. As soon as it detects a value of 0, it assumes the role of the primary interface.
The primary interface may stop just after the backup has checked the value of the control points on the data source. In that case, it takes an additional failover update interval before the backup realizes that the primary has shut down. This does not result in a loss of data, because the backup holds two failover update intervals of data in its queue. When the backup transitions to the primary state, it sends its two intervals' worth of queued data. This results in between one and two failover update intervals' worth of overlapping data (e.g., for the default failover update interval of 1 second, there will be up to 2 seconds of overlapping data).
The interface in backup mode writes its failover ID to the active ID point.
It waits two failover update intervals and then verifies that the ActiveID is the same as its failover ID.
If the ActiveID is equal to the other copy’s failover ID, the interface remains in backup mode. If the ActiveID is neither the other copy’s failover ID nor this copy’s failover ID, the interface sets the ActiveID to its own failover ID and waits another two failover update intervals. If the ActiveID still equals this copy’s failover ID after the wait, the backup interface sends its queued data to PI and starts operating as the primary interface.
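The takeover steps above can be sketched as a small Python function. This is an illustration only, not the interface's actual implementation: the names `try_become_primary`, `read_active_id`, `write_active_id`, and the `max_attempts` limit are hypothetical, and the source does not say how many times the backup retries.

```python
import time
from typing import Callable


def try_become_primary(
    read_active_id: Callable[[], int],
    write_active_id: Callable[[int], None],
    my_id: int,
    other_id: int,
    update_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 5,  # assumption: the source states no retry limit
) -> bool:
    """Return True if this copy should start operating as primary."""
    # Claim the active ID point with this copy's failover ID.
    write_active_id(my_id)
    for _ in range(max_attempts):
        # Wait two failover update intervals before verifying the claim.
        sleep(2 * update_interval)
        active = read_active_id()
        if active == my_id:
            # The claim held: send queued data and act as primary.
            return True
        if active == other_id:
            # The other copy owns the active ID: remain in backup mode.
            return False
        # Neither copy's ID is present: claim again and wait once more.
        write_active_id(my_id)
    return False
```

Passing the read, write, and sleep operations as callables keeps the sketch independent of any particular data source and makes it possible to exercise the logic without real delays.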
The timing chart in Figure 2 shows the primary interface gracefully shutting down and the backup interface failing over to assume the role as primary. Time increases from left to right according to the scale at the bottom of the figure. Every number along the horizontal axis represents one failover update interval. In the explanations that follow the charts, the failover update interval is the default of one second. Key events in time are denoted by the labeled arrows A, B, C, and D.
Time     Action
T+0      Both interfaces are running with IF1 in the primary role.
T+2.5    IF2 reads the active ID point and IF1’s heartbeat point. Both are good so IF2 discards data in its queue older than time 0.5.
T+2.8    Event A: IF1 shuts down gracefully, setting the active ID point and its heartbeat point to 0.
T+3.5    Event B: IF2 notices the active ID point is 0 and immediately transitions to primary, sending the data in its queue to PI. Overlap data from time 0.5 to 2.8.
Figure 2: Timing chart for when the primary interface gracefully shuts down and the backup interface assumes the role of primary.
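The arithmetic behind the timing chart can be reproduced with a short Python sketch. It assumes that the backup checks the control points once per failover update interval and that each successful check trims the queue to the last two intervals, as the T+2.5 row suggests. The function name and parameters are hypothetical and are not part of the interface.

```python
import math


def graceful_failover_timeline(
    shutdown_time: float,
    first_check: float,
    update_interval: float = 1.0,
    queue_intervals: int = 2,
) -> tuple[float, float, float, float]:
    """Return (last_good_check, detect_time, overlap_start, overlap_end).

    Assumes shutdown_time >= first_check and that a check falling exactly
    on the shutdown time still sees the primary as healthy.
    """
    # Number of whole intervals between the first check and the shutdown.
    n = math.floor((shutdown_time - first_check) / update_interval)
    # Last check at which the backup still saw a healthy primary.
    last_good_check = first_check + n * update_interval
    # The next check sees the active ID point at 0 and triggers the failover.
    detect_time = last_good_check + update_interval
    # The queue was last trimmed at the good check, keeping two intervals.
    overlap_start = last_good_check - queue_intervals * update_interval
    # The old primary sent its own data up to the moment it shut down.
    overlap_end = shutdown_time
    return last_good_check, detect_time, overlap_start, overlap_end


# Values from the chart: IF2 checks at 0.5, 1.5, 2.5, ...; IF1 stops at 2.8.
print(graceful_failover_timeline(shutdown_time=2.8, first_check=0.5))
# prints (2.5, 3.5, 0.5, 2.8)
```

With the chart's values, the last good check is at 2.5, the failover is detected at 3.5, and the overlapping data runs from 0.5 to 2.8.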