Pre-Loading Oracle Text indexes into Memory

Introduction
Oracle Text indexes are stored in Oracle Database relational tables. These tables normally reside on disk. Systems with large memory configurations generally benefit greatly from Oracle caching – a second or subsequent search for a particular term is generally much faster than the first search.
But what about the first search? Can we pre-load the database cache in order to make sure that most of what we need is already in memory before the first search?

This paper explores some techniques which make this possible.

Prerequisites
This paper assumes a good understanding of the various underlying tables of an Oracle Text index. If you are not aware of the functions of the various tables ($I, $K, etc.), then the paper should still offer some useful techniques, but to gain full benefit you should first review the Oracle Text documentation describing the internal index tables.
The Keep Pool
Normally the database buffer cache size is set with the parameter DB_CACHE_SIZE. If our intention is to load all of our indexes into memory and keep them there, we would probably be better off putting them into the “keep pool”. This is a buffer cache from which objects will never be aged out. The size of the keep pool is determined by the parameter DB_KEEP_CACHE_SIZE.
A table or index may be forced to reside in the keep cache using commands such as the following (the object names are placeholders):

ALTER TABLE mytable STORAGE (BUFFER_POOL KEEP);
ALTER INDEX myindex STORAGE (BUFFER_POOL KEEP);
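
Whether a keep pool has actually been allocated can be checked from the V$BUFFER_POOL view. This is just a quick sketch; the view and column names are standard, but your pool names and sizes will of course differ:

```sql
-- List each configured buffer pool with its block size and current size.
-- A row with NAME = 'KEEP' confirms the keep pool exists.
SELECT name, block_size, current_size
FROM   v$buffer_pool;
```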
What to cache?
If we have unlimited memory available, then we would want to put all of the following into the keep pool:

$I token table
$X index on $I table
$R table
The base table itself (assuming we are selecting more than just ROWID, or use a sort on a real column).
If our table uses local partition indexes, we would need to store these tables for each partition. Additionally, if our application uses functional lookups in the text table we would need the $K table, and if we’re using prefix indexing we would need the $P table.
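
To see exactly which underlying tables exist for a given index, the data dictionary can be queried directly. The sketch below assumes the DR#/DR$ naming convention and an index called TEST_IDX, as used later in this paper; adjust the pattern for your own index name (note that $X is an index, not a table, so it appears in USER_INDEXES instead):

```sql
-- List the internal Oracle Text tables ($I, $K, $R, $P ...) for index TEST_IDX
SELECT table_name
FROM   user_tables
WHERE  table_name LIKE 'DR%TEST_IDX%$%'
ORDER  BY table_name;
```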

However, in real life it is very likely that all of these tables together will exceed the amount of memory we can afford to allocate to the keep pool. We would therefore need to select only certain tables. If we are using local partition indexes, and our queries are normally sorted by the partition key, it might make sense to just cache the tables for the first few partitions.

Alternatively, if our table isn’t partitioned, it might be better to cache certain tables and not others.

In decreasing order of importance, we could rank them as

$X
$R
$I, $K and the base table (roughly equivalent)
$P (but way below the others)
Now usually, $R is very small, and $X is small compared to the size of the base table and $I, so it is suggested that these two ($R and $X) always be loaded.
$I, $K and the base table are perhaps equally important, though their sizes are clearly not comparable, and different usage scenarios would tend to favor different approaches. In the normal case (with little or no functional invocation), we would suggest base table, $I, $K in that order. If the index is badly fragmented, but we’re still not using functional invocation, then we’d flip the base table and $I. If there’s a large amount of functional invocation being used, $K goes to the head of the list.
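
When weighing these trade-offs it helps to compare the actual segment sizes. A quick sketch against USER_SEGMENTS (the LIKE pattern assumes the DR#TEST_IDX naming used in this paper):

```sql
-- Size of each Text index segment, largest first
SELECT segment_name, segment_type, bytes/1024/1024 AS megabytes
FROM   user_segments
WHERE  segment_name LIKE 'DR%TEST_IDX%'
ORDER  BY bytes DESC;
```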

Determining the size of an index
The CTX_REPORT package can be used to determine the size of an index, which will aid us in planning how much we can cache.
For testing purposes, I built a local partition index on a table called TEST with one million rows – each row consisting of just the unique key ID, and a filename pointing to an external XML file.

The output of CTX_REPORT for this table is as follows:

SQL> set long 50000
SQL> select ctx_report.index_size('test_idx') from dual;

CTX_REPORT.INDEX_SIZE('TEST_IDX')
--------------------------------------------------------------------------------
===========================================================================
                     INDEX SIZE FOR TESTUSER.TEST_IDX
===========================================================================

TOTALS FOR INDEX TESTUSER.TEST_IDX
---------------------------------------------------------------------------
TOTAL BLOCKS ALLOCATED:                                              206608
TOTAL BLOCKS USED:                                                   198235
TOTAL BYTES ALLOCATED:                         1,692,532,736 (1,614.13 MB)
TOTAL BYTES USED:                              1,623,941,120 (1,548.71 MB)

So we can see that the total index size is about 1.6GB. Since the machine has “only” 2GB of memory, we can’t really afford to put all of that into memory. So we’re going to cache a subset.
Determining what is in the buffer pool
One way to find out what we need to cache is to make the normal buffer cache as large as possible, bounce the database, run a representative query, and find out what ended up in the buffer cache.

 SQL> connect sys as sysdba
 SQL> alter system set db_cache_size = 1G;
 SQL> shutdown
 SQL> startup
 SQL> connect testuser/testuser
 SQL> select /*+ FIRST_ROWS */ * from ( select id, filename from test
      where contains (filename, 'district and region') > 0
      order by id desc )
      where rownum <= 10;

There are a couple of points to note about the pre-loading queries (the full set appears in the script in the Finishing Up section below). The final select from the base table fetches specific rows that I know are in the last two partitions; you would need to vary this according to your system design.
Also note that the $R loading is incomplete. This is related to those system LOB objects we saw earlier – and we’ll be coming back to that later.

Running these queries on my system took just over a minute. That’s to load one-fifth of the partitions for a million-row table. Here’s the output after rebooting the machine to avoid any effects of disk caching:

SUM(TOKEN_COUNT) SUM(LENGTH(TOKEN_INFO))
 ---------------- -----------------------
 6963707 37472389
 Elapsed: 00:00:27.31
 SUM(TOKEN_COUNT) SUM(LENGTH(TOKEN_INFO))
 ---------------- -----------------------
 6587719 35349528
 Elapsed: 00:00:27.77
 SUM(LENGTH(TOKEN_TEXT))
 -----------------------
 8994078
 Elapsed: 00:00:03.74
 SUM(LENGTH(TOKEN_TEXT))
 -----------------------
 8855116Elapsed: 00:00:03.21
 SUM(ID)
 ----------
 9.5001E+10Elapsed: 00:00:01.74
 SUM(ROW_NO)
 -----------
 231
 Elapsed: 00:00:00.12
 SUM(ROW_NO)
 -----------
 231
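
The check against V$BH used throughout this paper can be sketched as follows. It joins DBA_OBJECTS to V$BH to count cached blocks per object, so it must be run as a suitably privileged user; the owner filter assumes our TESTUSER schema:

```sql
-- Count buffered blocks per object belonging to TESTUSER
SELECT o.object_name, o.object_type, o.owner, COUNT(*) AS number_of_blocks
FROM   dba_objects o, v$bh bh
WHERE  o.data_object_id = bh.objd
AND    o.owner = 'TESTUSER'
GROUP  BY o.object_name, o.object_type, o.owner
ORDER  BY COUNT(*);
```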


Now we’ll run our query against V$BH again to see what’s been cached. Remember, this is after pre-loading the various tables rather than running a simple query.

 OBJECT_NAME OBJECT_TYPE OWNER NUMBER_OF_BLOCKS
 ------------------------------ --------------- --------------- ----------------
 DR#TEST_IDX0009$R TABLE TESTUSER 2
 DR#TEST_IDX0010$R TABLE TESTUSER 2
 TEST TABLE PARTITION TESTUSER 612
 DR#TEST_IDX0009$X INDEX TESTUSER 5,905
 DR#TEST_IDX0010$X INDEX TESTUSER 6,018
 DR#TEST_IDX0009$I TABLE TESTUSER 14,607
 DR#TEST_IDX0010$I TABLE TESTUSER 15,090
 7 rows selected.


Now this looks similar to our previous output, except that the number of blocks is greater for most of the tables (because we’ve loaded ALL blocks, rather than just the ones needed by the query), and we don’t have those mysterious SYS_LOB objects (again we’ll come back to those later).
We’re going to try to put all those tables into the keep pool, so we need to know how much space they’re taking to decide how big to make our keep pool. We can sum up the NUMBER_OF_BLOCKS column above, or use:

SELECT COUNT(*) TOTAL_BLOCKS, COUNT(*)*8/1024 MEGABYTES  -- assumes an 8K block size
FROM DBA_OBJECTS o, V$BH bh
WHERE o.DATA_OBJECT_ID = bh.OBJD
AND o.OWNER NOT IN ('SYS', 'SYSTEM', 'SYSMAN', 'XDB', 'IX', 'WMSYS', 'CTXSYS');
TOTAL_BLOCKS MEGABYTES
------------ ----------
 49503       386.742188

So we need just under 400MB of keep pool for these tables. We’ll adjust this via the DB_KEEP_CACHE_SIZE parameter. I’m going to run out of memory if I just bump that up without reducing my large DB_CACHE_SIZE, so I’ll do that at the same time. I’m using an spfile rather than init.ora, so I’ll do:
ALTER SYSTEM SET DB_CACHE_SIZE = 500M SCOPE=SPFILE;       -- down from 1G before
ALTER SYSTEM SET DB_KEEP_CACHE_SIZE = 400M SCOPE=SPFILE;
And then of course I’ll bounce the database.
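
After the restart, it is worth confirming the new settings took effect. From SQL*Plus:

```sql
-- Verify the resized pools (run from SQL*Plus as a privileged user)
show parameter db_cache_size
show parameter db_keep_cache_size
```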
Forcing Objects to the KEEP POOL
Next we need to ensure that our tables go into the Keep Pool rather than the normal buffer cache. We do that with ALTER TABLE or ALTER INDEX statements as appropriate:

 alter table DR#TEST_IDX0010$I storage (buffer_pool keep);
alter table DR#TEST_IDX0009$I storage (buffer_pool keep);
alter index DR#TEST_IDX0010$X storage (buffer_pool keep);
alter index DR#TEST_IDX0009$X storage (buffer_pool keep);
alter table DR#TEST_IDX0010$R storage (buffer_pool keep);
alter table DR#TEST_IDX0009$R storage (buffer_pool keep);
alter table DR#TEST_IDX0010$R modify lob (data) (storage (buffer_pool keep));
alter table DR#TEST_IDX0009$R modify lob (data) (storage (buffer_pool keep));
alter table TEST modify partition p10 storage (buffer_pool keep);
alter table TEST modify partition p9 storage (buffer_pool keep);

We’ll run these statements, bounce our database, then run our pre-loading queries again. It would be handy if we could use the query against V$BH to make sure that the tables ARE in fact in the keep pool, but this information is not stored in V$BH – the various buffer caches are considered together.
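
Although V$BH does not record which pool a buffer lives in, V$BUFFER_POOL_STATISTICS can at least confirm that the keep pool is seeing activity. A sketch:

```sql
-- Logical and physical I/O per buffer pool; activity on the KEEP row
-- confirms the keep pool is actually being used
SELECT name, physical_reads, db_block_gets, consistent_gets
FROM   v$buffer_pool_statistics;
```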
Using TKPROF to monitor disk reads
Now we’re going to run our query again and see what effect it has. This time, we’re going to generate a trace file and examine it to see the number of physical I/Os. In SQL*Plus:

 @?/rdbms/admin/utlxplan.sql -- avoid tkprof plan table error
alter session set sql_trace=true;
-- Use a bind variable to cut down on recursive SQL from the optimizer:
variable bnd varchar2(4000)
exec :bnd := 'district and region'
select /*+ FIRST_ROWS */ *
from ( select id, filename
       from test
       where contains (filename, :bnd) > 0 order by id desc )
where rownum <= 10;

As noted earlier, the $R loading is incomplete: most of the $R data is held in BLOB columns (the SYS_LOB objects we saw in V$BH), which are only read on demand. We can force those LOBs into the cache with a procedure that reads through every LOB in each $R table in 4K chunks. The declarations below are a reconstruction around the published loop body; the table-name pattern assumes the standard DR#/DR$ naming:

 create or replace procedure LoadAllDollarR (idx_name varchar2) is
   v_idx_name varchar2(30) := upper(idx_name);
   type c_type is ref cursor;
   c2   c_type;
   b    blob;
   buff raw(100);
   siz  integer;
   off  integer;
   cntr integer;
 begin
   -- loop over each $R table for the index (one per partition)
   for c1 in (select table_name t from user_tables
              where table_name like 'DR_'||v_idx_name||'%$R') loop
     dbms_output.put_line('loading from table '||c1.t);
     open c2 for 'select data from '||c1.t;
     loop
       fetch c2 into b;
       exit when c2%notfound;
       siz  := 10;
       off  := 1;
       cntr := 0;
       if dbms_lob.getlength(b) > 0 then
         begin
           -- read a few bytes from each 4K chunk until we run off the end,
           -- touching (and therefore caching) every block of the LOB
           loop
             dbms_lob.read(b, siz, off, buff);
             cntr := cntr + 1;
             off := off + 4096;
           end loop;
         exception when no_data_found then
           if cntr > 0 then
             dbms_output.put_line('4K chunks fetched: '||cntr);
           end if;
         end;
       end if;
     end loop;
     close c2;
   end loop;
 end;
 /


You would call this procedure from SQL*Plus like this:
exec LoadAllDollarR('test_idx')
So now we have everything we need to preload the first two partitions of our index.
Finishing Up
Let’s run back over our full set of commands for moving tables to the keep pool, and then loading them into memory. I’m going to do all of this each time I restart the database (the ALTER TABLE commands only really need to be run once, but they’re quick, and this will catch any situation where the index – or individual tables – has been rebuilt).
 -- This script preloads the last two partitions from a 10-way
 -- partitioned table and its text index. The table is called TEST,
 -- the index is called TEST_IDX. It is owned by user TESTUSER.
 connect testuser/testuser
 ----------------------------------------------------------
 -- First make sure all tables go into the keep buffer
 ----------------------------------------------------------
 alter table DR#TEST_IDX0010$I storage (buffer_pool keep);
 alter table DR#TEST_IDX0009$I storage (buffer_pool keep);
 alter index DR#TEST_IDX0010$X storage (buffer_pool keep);
 alter index DR#TEST_IDX0009$X storage (buffer_pool keep);
 alter table DR#TEST_IDX0010$R storage (buffer_pool keep);
 alter table DR#TEST_IDX0009$R storage (buffer_pool keep);
 alter table DR#TEST_IDX0010$R modify lob (data) (storage (buffer_pool keep));
 alter table DR#TEST_IDX0009$R modify lob (data) (storage (buffer_pool keep));
 alter table TEST storage (buffer_pool keep);
 set timing on
 ----------------------------------------------------------------
 -- Then perform the necessary queries to preload the tables
 ----------------------------------------------------------------
 SELECT /*+ FULL(ITAB) */
 SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#TEST_IDX0010$I ITAB;
 SELECT /*+ FULL(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO))
 FROM DR#TEST_IDX0009$I ITAB;
 SELECT /*+ INDEX(ITAB) */ SUM(LENGTH(TOKEN_TEXT))
 FROM DR#TEST_IDX0010$I ITAB;
 SELECT /*+ INDEX(ITAB)*/ SUM(LENGTH(TOKEN_TEXT))
 FROM DR#TEST_IDX0009$I ITAB;
 SELECT /*+ FULL(BTAB) */ SUM(ID)
 FROM TEST BTAB
 WHERE ID >= 900000;
 SELECT SUM(ROW_NO) FROM DR#TEST_IDX0010$R;
 SELECT SUM(ROW_NO) FROM DR#TEST_IDX0009$R;
 exec LoadAllDollarR('test_idx')
If we run this after bouncing the database and before running our test query, we should find that there are now NO reads from the $R tables at all. In fact, in my test case I found the system reads went away as well, and my query completed without any physical I/O at all.
Conclusion
Hopefully this paper will have provided you with a better understanding of the issues around caching tables in memory, together with some diagnostic tools to help figure out where disk reads are occurring, and why.

