http://blog.perftuning.com/oracle-goldengate-12-2-heartbeat-table-issue-resolved/
Oracle GoldenGate 12.2 Heartbeat Table Issue Resolved
Monitoring lag in GoldenGate has always been an important part of monitoring GoldenGate. Lags are reported in several ways. When using the ggsci command lag only the latest lag is reported with a 1 second resolution providing the last reported lag. This isn’t accurate and does not provide a history for the Oracle GoldenGate 12.2 Heartbeat. In the past, GoldenGate implementers have created heartbeat tables manually.
In Oracle GoldenGate 12.2 a built-in heartbeat table feature has been added. This heartbeat table allows for more accurate heartbeats and heartbeat history. It works by creating an artificial transaction every minute that contains timing information that is used for heartbeats. The heartbeat tables are accessed via views that provide accurate lag data and lag history.
Recently I setup a pair of Oracle databases with Oracle GoldenGate 12.2. I setup these systems in hope of testing the new Heartbeattable feature that was introduced in GoldenGate version 12.2. The internal heartbeat mechanism is a great improvement in that it provides automatic and accurate lag times and includes a lag history.
I implemented GoldenGate 12.2 between two Oracle 11.2.0.4 databases using standard parameter files that I typically use as a starting point for GoldenGate projects or testing. In addition, database connectivity was set up using the GoldenGate credentialstore. In this case, for testing I have set up GoldenGate to replicate the HR example tables. Unfortunately the heartbeat mechanism failed to work. This blog entry describes the issue that I had and potential solutions.
Configuring GoldenGate
In order to configure replication I used the following GLOBALS and extract parameter files:
GLOBALS
GGSCHEMA ggadmin
EXT1HR
———————————-
— Local extract for HR schema
————————————
Extract EXT1HR
SETENV (NLS_LANG = AMERICAN_AMERICA.AL32UTF8
USERIDALIAS ggadm
ReportCount Every 30 Minutes, Rate
Report at 01:00
ReportRollover at 01:15
DiscardFile dirrpt/EXT1HR.dsc, Append
DiscardRollover at 02:00 On Sunday
DDL INCLUDE MAPPED
TRANLOGOPTIONS EXCLUDEUSER ggadmin
GETTRUNCATES
ExtTrail dirdat/la
Table HR.*;
At first look I thought that this should work. The GLOBALS parameter GGSCHEMA ggadmin is required for both DDL replication and the heartbeattable. Unfortunately this turned out to be part of the problem.
The pump parameter file was configured for passthrough as shown here:
PUMP1HR
————————————
— Pump extract for HR schema
————————————
Extract PUMP1HR
PASSTHRU
USERIDALIAS ggadm
ReportCount Every 1000 Records, Rate
Report at 01:00
ReportRollover at 01:15
DiscardFile dirrpt/PUMP1HR.dsc, Append
DiscardRollover at 02:00 ON SUNDAY
RmtHost target, MgrPort 7809
RmtTrail dirdat/ra
Table HR.*;
The initial extract and pump were registered using an obey file containing the following commands:
dblogin useridalias ggadm
add extract EXT1HR, tranlog, begin now
add exttrail dirdat/la, extract EXT1HR, megabytes 100
add extract PUMP1HR, exttrailsource dirdat/la
add rmttrail dirdat/ra, extract PUMP1HR, megabytes 100
On the target GoldenGate 12.2 was setup using the following GLOBALS and replciat parameter files:
GLOBALS
GGSCHEMA ggadmin
REP1HR
————————————
— Replicat for HR schema
————————————
replicat REP1HR
SETENV (NLS_LANG = AMERICAN_AMERICA.AL32UTF8)
USERIDALIAS ggadm
BATCHSQL
HandleCollisions
— Only one of these is used at a time
AssumeTargetDefs
ReportCount Every 30 Minutes, Rate
Report at 01:00
ReportRollover at 01:15
DiscardFile dirrpt/REP1HR.dsc, Append
DiscardRollover at 02:00 ON SUNDAY
Map HR.COUNTRIES, Target HR.COUNTRIES ;
Map HR.DEPARTMENTS, Target HR.DEPARTMENTS ;
Map HR.EMPLOYEES, Target HR.EMPLOYEES ;
Map HR.JOBS, Target HR.JOBS ;
Map HR.JOB_HISTORY, Target HR.JOB_HISTORY ;
Map HR.LOCATIONS, Target HR.LOCATIONS ;
Map HR.REGIONS, Target HR.REGIONS ;
This replicat was registered using an obey file that contained the following commands:
dblogin USERIDALIAS ggadm
add checkpointtable ggadmin.rep1hr_chkpt
add replicat REP1HR, exttrail dirdat/ra, checkpointtable ggadmin.rep1hr_chkpt
All of the GoldenGate processes were started and test transactions run. Once GoldenGate was verified to be running correctly it was time to set up the 12.2 heartbeat table and see if it worked.
Setting up the Built-In Heartbeat Table
The heartbeat table was setup first on the target system and then on the source and the results of that are shown here:
ADD HEARTBEATTABE on target
GGSCI > add heartbeattable
2016-02-18 10:00:13 INFO OGG-14001 Successfully created heartbeat seed table [“GG_HEARTBEAT_SEED”].
2016-02-18 10:00:13 INFO OGG-14032 Successfully added supplemental logging for heartbeat seed table [“GG_HEARTBEAT_SEED”].
2016-02-18 10:00:13 INFO OGG-14000 Successfully created heartbeat table [“GG_HEARTBEAT”].
2016-02-18 10:00:13 INFO OGG-14033 Successfully added supplemental logging for heartbeat table [“GG_HEARTBEAT”].
2016-02-18 10:00:13 INFO OGG-14016 Successfully created heartbeat history table [“GG_HEARTBEAT_HISTORY”].
2016-02-18 10:00:13 INFO OGG-14023 Successfully created heartbeat lag view [“GG_LAG”].
2016-02-18 10:00:13 INFO OGG-14024 Successfully created heartbeat lag history view [“GG_LAG_HISTORY”].
2016-02-18 10:00:13 INFO OGG-14003 Successfully populated heartbeat seed table with [ORCLS].
2016-02-18 10:00:13 INFO OGG-14004 Successfully created procedure [“GG_UPDATE_HB_TAB”] to update the heartbeat tables.
2016-02-18 10:00:13 INFO OGG-14017 Successfully created procedure [“GG_PURGE_HB_TAB”] to purge the heartbeat history table.
2016-02-18 10:00:13 INFO OGG-14005 Successfully created scheduler job [“GG_UPDATE_HEARTBEATS”] to update the heartbeat tables.
2016-02-18 10:00:13 INFO OGG-14018 Successfully created scheduler job [“GG_PURGE_HEARTBEATS”] to purge the heartbeat history table.
ADD HEARTBEATTABLE on source
GGSCI > add heartbeattable
2016-02-18 10:02:36 INFO OGG-14001 Successfully created heartbeat seed table [“GG_HEARTBEAT_SEED”].
2016-02-18 10:02:37 INFO OGG-14032 Successfully added supplemental logging for heartbeat seed table [“GG_HEARTBEAT_SEED”].
2016-02-18 10:02:37 INFO OGG-14000 Successfully created heartbeat table [“GG_HEARTBEAT”].
2016-02-18 10:02:37 INFO OGG-14033 Successfully added supplemental logging for heartbeat table [“GG_HEARTBEAT”].
2016-02-18 10:02:37 INFO OGG-14016 Successfully created heartbeat history table [“GG_HEARTBEAT_HISTORY”].
2016-02-18 10:02:37 INFO OGG-14023 Successfully created heartbeat lag view [“GG_LAG”].
2016-02-18 10:02:37 INFO OGG-14024 Successfully created heartbeat lag history view [“GG_LAG_HISTORY”].
2016-02-18 10:02:37 INFO OGG-14003 Successfully populated heartbeat seed table with [ORCLP].
2016-02-18 10:02:37 INFO OGG-14004 Successfully created procedure [“GG_UPDATE_HB_TAB”] to update the heartbeat tables.
2016-02-18 10:02:37 INFO OGG-14017 Successfully created procedure [“GG_PURGE_HB_TAB”] to purge the heartbeat history table.
2016-02-18 10:02:37 INFO OGG-14005 Successfully created scheduler job [“GG_UPDATE_HEARTBEATS”] to update the heartbeat tables.
2016-02-18 10:02:37 INFO OGG-14018 Successfully created scheduler job [“GG_PURGE_HEARTBEATS”] to purge the heartbeat history table.
Once the heartbeat table was created it should have been an easy matter to go to the target system and query the ggadmin.gg_heartbeat and ggadmin.gg_heartbeat_history tables to see the automated heartbeats. Unfortunately at this point there were no rows in these tables. It would take a little bit of investigation in order to determine what the issue was.
Debugging the Issue
Since there are a number of parts to the heartbeat table including the extract, pump and replicat I had to decide where to start. I knew that the heartbeat mechanism took advantage of the replication that was currently configured in order to move its heartbeat information from the source to the target. I also believed that it was using GoldenGate replication.
In order to see if anything was moving in the trail file I used the GoldenGate stats command against the extract. I only saw the transactions that I had run for my own testing of the replication. This led me to believe that it was a problem at the source side. I also ran logdump and looked at the trail files and I saw no “heartbeat” records in the trail.
I then looked in the database at the database schedule to see if the GG_UPDATE_HEARTBEATS job was running. It was and it was updating the GG_HEARTBEAT_SEED HEARTBEAT_TIMESTAMP column. So, the scheduler job was running and the column was being updated so there was nothing in the trail file, so it was most likely an issue with replication.
Looking back at the extract parameter file it became apparent that this might be related to the parameter TRANLOGOPTIONS EXCLUDEUSER ggadmin parameter. So, I commented out that parameter and suddenly the GG_HEARTBEAT and GG_HEARTBEAT_HISTORY table began populating on the target side. In addition, after a while a stats command against the extract showed updates to the GG_HEARTBEAT_SEED table.
*** Daily statistics since 2016-02-18 00:00:00 ***
Total inserts 0.00
Total updates 603.00
Total deletes 0.00
Total discards 0.00
Total operations 603.00
In addition the gg_lag_history view showed the data that I was looking for:
column heartbeat_received_ts format a30
column incoming_path format a40
column incoming_timestamp format 9.99999999
select heartbeat_received_ts, incoming_path, incoming_lag from gg_lag_history;
18-FEB-16 11.41.41.893670 AM EXT1HR ==> PUMP1HR ==> REP1HR 4.868161000
18-FEB-16 11.42.42.944752 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.925239000
18-FEB-16 11.43.41.993427 AM EXT1HR ==> PUMP1HR ==> REP1HR 4.970112000
18-FEB-16 11.44.42.041501 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.021219000
18-FEB-16 11.45.43.091402 AM EXT1HR ==> PUMP1HR ==> REP1HR 6.065255000
18-FEB-16 11.46.42.140396 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.121159000
18-FEB-16 11.47.41.189519 AM EXT1HR ==> PUMP1HR ==> REP1HR 4.136698000
18-FEB-16 11.48.43.240700 AM EXT1HR ==> PUMP1HR ==> REP1HR 6.215541000
18-FEB-16 11.49.43.289275 AM EXT1HR ==> PUMP1HR ==> REP1HR 6.268965000
18-FEB-16 11.50.42.338042 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.312789000
18-FEB-16 11.51.42.386426 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.368242000
18-FEB-16 11.52.42.435722 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.412282000
18-FEB-16 11.53.43.487633 AM EXT1HR ==> PUMP1HR ==> REP1HR 6.470509000
18-FEB-16 11.54.42.538333 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.523112000
18-FEB-16 11.55.42.588896 AM EXT1HR ==> PUMP1HR ==> REP1HR 5.563893000
This was now successful.
Problem Recap
Because I was setting up for bi-directional replication and using the ggadmin user for both source and target I had configured the excludeuser parameter. This keeps GoldenGate from re-replicating replicated transactions by ignoring transactions submitted by the ggadmin user. This was fine for normal transactions but I didn’t expect that the heartbeat transactions would be excluded as well.
The excludeuser ggadmin caused updates to the GG_HEARTBEAT_SEED table to not be replicated. You cannot exclude the ggadmin user in the extract. The heartbeat schema is defined by the GGSCHEMA parameter in the GLOBALS file. In addition, the GGSCHEMA parameter defines the schema for DDL replication. This causes a bit of a problem for bi-directional replication when you want to use the ggadmin user for both extract and replicat.
Solutions
I thought about a number of different solutions to this problem and consulted some of my colleagues. We decided that the best approach to this problem was to simply use a different Oracle database user for extract and replicat. This would allow us to still maintain the same GGSCHEMA user for heartbeat and DDL replication. The new user account used for the replicat would be excluded in the TRANLOGOPTIONS EXCLUDEUSER parameter and everything should work well.
This would allow the ggadmin user at the source to submit both DDL and heartbeats, since you can only have one setting for GGSCHEMA which both must use. The different user that is used for replicating back to the source will be excluded via TRANLOGOPTIONS EXCLUDEUSER .
I really would like to see the GoldenGate developers take a look at this and internally allow replication of the GoldenGate lag tables to be excluded from the excludeuser option.
Managing Heartbeat Data
As seen above the heartbeat table is created via the ADD HEARTBEATTABLE command within ggsci. By default a heartbeat is generated every minute, retained for 30 days then purged. The frequency of the heartbeat, the history retention and how often the purge process runs is configurable. This is done via the ALTER HEARTBEATTABLE command.
Viewing Heartbeat Data
Viewing the heartbeat table is done via the two heartbeat views; GG_LAG and GG_LAG history. These views provide information on lags for each set of ext -> pump -> replicat that is configured. This information as well as the history is valuable for monitoring the performance of the GoldenGate configuration.
I have implemented viewing these tables via the following scripts. The scripts and output are shown here:
lag.sql
col local_database format a10
col current_local_ts format a30
col remote_database format a10
col incoming_path format a30
col incoming_lag format 999,999.999999
select local_database, current_local_ts, remote_database, incoming_path, incoming_lag from gg_lag;
Output of lag.sql
TARGET on GG16B:ggadmin > @lag
LOCAL_DATA CURRENT_LOCAL_TS REMOTE_DAT INCOMING_PATH INCOMING_LAG
———- —————————— ———- ——————————
ORCLS 19-FEB-16 08.23.40.715171 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 5.848900
Lag_history.sql
set pagesize 100
col local_database format a10
col heartbeat_received_ts format a30
col remote_database format a10
col incoming_path format a32
col incoming_lag format 999,999.999999
select local_database, heartbeat_received_ts, remote_database, incoming_path, incoming_lag from gg_lag_history;
Output of lag_history.sql
ORCLS 19-FEB-16 08.30.40.678817 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 3.656291
ORCLS 19-FEB-16 08.31.41.702019 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.681604
ORCLS 19-FEB-16 08.32.41.724873 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.700153
ORCLS 19-FEB-16 08.33.41.747616 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.729306
ORCLS 19-FEB-16 08.34.41.770055 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.748563
ORCLS 19-FEB-16 08.35.41.793370 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.769183
ORCLS 19-FEB-16 08.36.41.816100 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.798405
ORCLS 19-FEB-16 08.37.41.839139 PM ORCLP EXT1HR ==> PUMP1HR ==> REP1HR 4.817937
The output of the lag history can be used to monitor lags over long periods of time and be used for alerting and monitoring. The lag history can be imported into a spreadsheet and graphed as well.
GGLagGraph