with using dataguard (Oracle’s hot/cold standby database solution) you have the challenge to make a failover or a roleswitch between the primary and the standby database transparent to the clients. If you don’t, you will have a (partially) loss of service: Clients tnsnames.ora’s will need reconfiguration or the application has to use another TNS connection name now. This paper shows a different, very smart way. If you don’t want to read the whole thing and just have OCI and sql*plus clients to service, have a look at my quick summary. I tested it and use it for production now.
- a working dataguard setup with two database servers, one is primary, the other one is (physical) standby
- one or more client(s) with oracle client installed – I used 10.2.0.3, don’t know for sure if older versions are possible, too.
Use this as service entry in your client’s tnsnames.ora:
TESTDG = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1521)) (LOAD_BALANCE = no) ) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = HA-SERVICE) ) )
With this syntax, you make sure that the client always uses the ADDRESS entry that provides the specified SERVICE_NAME. If not, the entry will be left out.
Now we have to specify the SERVICE_NAME parameter on the database dynamically: If the system is the primary, please add “HA-SERVICE” to the SERVICE_NAME parameter, and if it is the standby, remove the entry from there. Since I don’t like to do that by myself, I talked Oracle into doing it for me. So first, we create a service – why? Because if a service is started, its network name is added to the SERVICE_NAME parameter list!🙂
Execute this on your primary DB (you might need to execute
first to create DBMS_SERVICE):
exec DBMS_SERVICE.CREATE_SERVICE( service_name => 'HA-SERVICE', network_name => 'HA-SERVICE', aq_ha_notifications => true, failover_type => 'SELECT', failover_retries => 180, failover_delay=> 1);
The procedure should work out-of-the-box. Now check, if you can start and stop the service, and if its NETWORK_NAME shows up in the SERVICE_NAME parameter of your primary DB:
exec DBMS_SERVICE.START_SERVICE('HA-SERVICE'); show parameter service_name; exec DBMS_SERVICE.STOP_SERVICE('HA-SERVICE');
If you made a mistake, DBMS_SERVICE.DELETE_SERVICE or DBMS_SERVICE.MODIFY_SERVICE might be your friend. But if all went well, let’s proceed. Now, the joke continues: You can set up a trigger “after startup on database” in schema SYS – do you hear me?
create or replace trigger manage_HASERVICE after startup on database DECLARE role VARCHAR(30); BEGIN select database_role into role from v$database; if role = 'PRIMARY' then DBMS_SERVICE.START_SERVICE('HA-SERVICE'); else DBMS_SERVICE.STOP_SERVICE('HA-SERVICE'); end if; END;
We are asking V$DATABASE for the instance’s role, and decide whether to start the freshly created service or not. We are done – I love it. Don’t forget to fire a log switch if you don’t use real time apply – it’s no harm if you do it anyway:
alter system switch logfile; alter system checkpoint;
Now test the scenario with a graceful switchover and check if the SERVICE_NAME parameter is ste as expected. If yes, perform the end to end test with your client(s). With a recent client and
failover_type => 'SELECT'
specified (it is if you did what I suggested above), you can start a SQL select on node1 as primary, perform a dataguard switchover while it runs, and get your statement finished on the new primary node2.
Additionally just a litte backgroud to the TNS (Transparent Network Substrate): It’s a classical layer 5 protocol. That’s good, because during the switchover/failover process your TCP connections will be terminated by option
aq_ha_notifications => true
mentioned above. But your SQL session will persist as long as your client-specific timeouts in SQLNET.ORA are allowing that. So TNS makes a session failover possible at all: The client does a reconnect on layer 4 (tcp), finds HA-SERVICE not provided by node1 and continues to node2. Smart thing, all you need to do is waiting for the new primary machine. By the way, this is exactly how client failover in RAC environments works.