If a column is used to calculate the range, why won’t the downstream trails be divided exactly by the column value

Goal

If a column is used to calculate the range, why won’t the downstream trails be divided exactly by the column value?

 

Here is a customer case that demonstrates this configuration. A customer wanted to split the load on a heavily used table across more downstream Replicats. The table had 8 partitions which were exactly equal in size and contained the same amount of rows. There was a Column PART_ID in the primary key which identified the partition. The application filled the partition round robin, so the data was evenly distributed across the 8 partitions. The FILE statement was specified as:

 

Exttrail \SYS.$GGATE.GGSDAT1.ET

file $DATA.APPDATA.TABLE,

filter (@range (1,8 , PART_ID));

 

Corresponding FILE statements were specified for the remaining 7 target trails.

 

The customer expected that each downstream trail file would be filled with the data of a single partition, so that he then could have 8 Replicats reading the trail files and processing the same number of records. But the data was distributed as follows:

 

•Trail 1 contained the data from partition 1

•Trail 2 contained no data

•Trail 3 contained the data from partition 2

•Trail 4 contained the data from partitions 3, 4 and 5

•Trail 5 contained no data

•Trail 6 contained the data from partition 6

•Trail 7 contained the data from partition 7

•Trail 8 contained the data from partition 8

Is there a way to direct the @RANGE function to split the data evenly across the target trail files?

 

Solution

We set up a test to provide a basis for evaluating this question. We created a table where the primary key was comprised of two columns, the first of which contained the value of the table partition. We inserted 358 rows into each partition of the table. The Extract parameters were:

 

exttrail $data04.mcsdat.z1

file $data04.mcsdat.parttest, filter (@range (1, 4, partition_id));

 

exttrail $data04.mcsdat.z2

file $data04.mcsdat.parttest, filter (@range (2, 4, partition_id));

 

exttrail $data04.mcsdat.z3

file $data04.mcsdat.parttest, filter (@range (3, 4, partition_id));

 

exttrail $data04.mcsdat.z4

file $data04.mcsdat.parttest, filter (@range (4, 4, partition_id));

Similar to the customer’s results, the breakdown of data into the target trails was:

 

Total # records written to \ZEUS.$DATA04.MCSDAT.Z1 358

\ZEUS.$DATA04.MCSDAT.PARTTEST # inserts: 358

 

Total # records written to \ZEUS.$DATA04.MCSDAT.Z2 0

 

Total # records written to \ZEUS.$DATA04.MCSDAT.Z3 358

\ZEUS.$DATA04.MCSDAT.PARTTEST # inserts: 358

 

Total # records written to \ZEUS.$DATA04.MCSDAT.Z4 716

\ZEUS.$DATA04.MCSDAT.PARTTEST # inserts: 716

 

 

Here is an explanation of how the @RANGE function operates, and why we saw these results from this particular configuration. @RANGE computes a hash value of all the columns specified, or if no columns are specified, the primary key columns of the source table. In this case, there are only 4 unique values in that table for the partition_id column, (0,1,2,3), which means there will only be 4 possible hash values.

 

The hash value modulo num ranges gives an exttrail index for the record. Modulo 4 means only the lower 2 bits of the hash value are used to determine the index.

 

value hash low bits ranges trail(zero based index)

——- ———— —— ——- ——

0 4052366646 2 MOD 4 = 2

1 1132341824 0 MOD 4 = 0

2 3647185055 3 MOD 4 = 3

3 3440395251 3 MOD 4 = 3

 

Total # records written to $DATA04.MCSDAT.Z1 358

Total # records written to $DATA04.MCSDAT.Z2 0

Total # records written to $DATA04.MCSDAT.Z3 358

Total # records written to $DATA04.MCSDAT.Z4 716

 

So the @RANGE function is not going to be appropriate for splitting the data exactly across partition lines. When you want to split the data precisely by a particular column value, it would be better to use a WHERE clause. However, you do need to make sure that if you use a WHERE clause that all possible key values are included in the set of WHERE criteria; otherwise data may be skipped. This is not an issue with @RANGE, since it will automatically include every source record in one of the downstream targets.

 

Regarding performance impact, there is no performance benefit to using @RANGE over WHERE, so either is a viable option when you need to split data across multiple targets. Both @RANGE and WHERE have to evaluate every TABLE statement for every record to determine if it passes, so the amount of resources it takes Extract to process either RANGE or WHERE is comparable.

Sobre Alexandre Pires

ORACLE OCS Goldengate Specialist, OCE RAC 10g R2, OCP 12C, 11g, 10g , 9i e 8i - Mais de 25 anos de experiência na área de TI. Participei de projetos na G&P alocado na TOK STOK, EDINFOR alocado na TV CIDADE "NET", 3CON Alocado no PÃO DE AÇUCAR, DISCOVER alocado na VIVO, BANCO IBI e TIVIT, SPC BRASIL, UOLDIVEO alocado no CARREFOUR e atualmente na ORACLE ACS atendendo os seguintes projetos: VIVO, CLARO, TIM, CIELO, CAIXA SEGUROS, MAPFRE, PORTO SEGURO, SULAMERICA, BRADESCO SEGUROS, BANCO BRADESCO, BASA, SANTANDER, CNJ, TSE, ELETROPAULO, EDP, SKY, NATURA, ODEBRESHT, NISSEI, SICREDI, CELEPAR, TAM, TIVIT, IBM, SMILES, CELEPAR, SERPRO,OKI,BANCO PAN, etc
Esse post foi publicado em GOLDENGATE. Bookmark o link permanente.

Deixe uma resposta

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s