How do I troubleshoot a RAC 11gR2 cluster that will not start?

We found one of our 7-node cluster environments down, and we could not bring it back up.
  -bash-3.2$ crsctl status res -t
  CRS-4535: Cannot communicate with Cluster Ready Services
  CRS-4000: Command Status failed, or completed with errors.
  -bash-3.2$

  [root@srvdd01 ~]# cd /opt/11.2.0/grid/bin/
  [root@srvdd01 bin]# ./crsctl start cluster
  CRS-5702: Resource 'ora.evmd' is already running on 'srvdd01'
  CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'srvdd01'
  CRS-4000: Command Start failed, or completed with errors.
  [root@srvdd01 bin]#

From the above output, it looks like there is an issue with ASM.
This means we need to troubleshoot the clusterware log files, not the database log files. Let's check the log files located under $GRID_HOME/log.
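The clusterware logs are spread over several subdirectories, so a quick first pass is to list only the files that mention an error code. The sketch below is one way to do that. The function name logs_with_errors is hypothetical, and the pattern (PROC-, ORA- or CRS- followed by digits) is an assumption about which codes are worth looking at first. CRS- codes also appear in informational messages, so treat the result as a coarse filter.

```shell
# List the files under a log directory that contain at least one
# PROC-nnn, ORA-nnn or CRS-nnn code (hypothetical helper, not an Oracle tool).
# Usage: logs_with_errors "$GRID_HOME/log"
logs_with_errors() {
    grep -rlE '(PROC|ORA|CRS)-[0-9]+' "$1" 2>/dev/null | LC_ALL=C sort
}
```

Running logs_with_errors "$GRID_HOME/log" prints one file path per line, which narrows down where to start reading.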

In the client directory under $GRID_HOME/log (here /opt/11.2.0/grid/log/srvdd01/client), I started with the ocrcheck log files:

  /opt/11.2.0/grid/log/srvdd01/client/ocrcheck_24461.log file:
  Oracle Database 11g Clusterware Release - Production Copyright 1996, 2010 Oracle. All rights reserved.
  2011-10-11 12:45:15.126: [OCRCHECK][258170240]ocrcheck starts...
  2011-10-11 12:45:15.246: [ OCRASM][258170240]proprasmo: kgfoCheckMount return [6]. Cannot proceed with dirty open.
  2011-10-11 12:45:15.246: [ OCRASM][258170240]proprasmo: Error in open/create file in dg [OCRVOTE]
  [ OCRASM][258170240]SLOS : SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
  2011-10-11 12:45:15.246: [ OCRASM][258170240]ASM Error Stack :
  2011-10-11 12:45:15.286: [ OCRASM][258170240]proprasmo: kgfoCheckMount returned [6]
  2011-10-11 12:45:15.286: [ OCRASM][258170240]proprasmo: The ASM disk group OCRVOTE is not found or not mounted
  2011-10-11 12:45:15.286: [ OCRRAW][258170240]proprioo: Failed to open [+OCRVOTE]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
  2011-10-11 12:45:15.286: [ OCRRAW][258170240]proprioo: No OCR/OLR devices are usable
  2011-10-11 12:45:15.287: [ OCRASM][258170240]proprasmcl: asmhandle is NULL
  2011-10-11 12:45:15.287: [ OCRRAW][258170240]proprinit: Could not open raw device
  2011-10-11 12:45:15.287: [ OCRASM][258170240]proprasmcl: asmhandle is NULL
  2011-10-11 12:45:15.287: [ default][258170240]a_init:7!: Backend init unsuccessful : [26]
  2011-10-11 12:45:15.287: [OCRCHECK][258170240]initreboot: Failed to initialize OCR in REBOOT level. Retval:[26] Error:[PROC-26: Error while accessing the physical storage
  ]
  [OCRCHECK][258170240]initreboot: Attempting to initialize OCR in DEFAULT level and update a key so that vote information is updated.
  2011-10-11 12:45:15.288: [ OCRMSG][258170240]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
  2011-10-11 12:45:15.288: [ OCRMSG][258170240]GIPC error [29] msg [gipcretConnectionRefused]
  2011-10-11 12:45:15.288: [ OCRMSG][258170240]prom_connect: error while waiting for connection complete [24]
  2011-10-11 12:45:15.289: [OCRCHECK][258170240]initreboot: Failed to initialize OCR in DEFAULT level. Retval:[32] Error:[PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]]
  2011-10-11 12:45:15.289: [OCRCHECK][258170240]Failed to access OCR repository. Retval [26] Error [PROC-26: Error while accessing the physical storage
  ]
  2011-10-11 12:45:15.289: [OCRCHECK][258170240]Failed to initialize ocrchek2
  2011-10-11 12:45:15.289: [OCRCHECK][258170240]Exiting [status=failed]...
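A log like this is easier to read once it is reduced to the distinct error codes and the diskgroup it complains about. The following is a minimal sketch, assuming the log keeps the message format shown above. The function name summarize_ocrcheck is hypothetical and is not part of any Oracle tool.

```shell
# Summarize an ocrcheck log read from stdin (hypothetical helper):
# print each distinct PROC-nn code found, plus every diskgroup that the
# log reports as "not found or not mounted".
summarize_ocrcheck() {
    awk '
        # Remember the first PROC-nn code on each line.
        match($0, /PROC-[0-9]+/) { codes[substr($0, RSTART, RLENGTH)] = 1 }
        # The diskgroup name is the word that follows "group".
        /ASM disk group .* is not found or not mounted/ {
            for (i = 1; i < NF; i++) if ($i == "group") dg[$(i + 1)] = 1
        }
        END {
            for (c in codes) print "error: " c
            for (d in dg) print "diskgroup: " d
        }
    ' | LC_ALL=C sort
}
```

For example: summarize_ocrcheck < /opt/11.2.0/grid/log/srvdd01/client/ocrcheck_24461.log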

The log reports that the OCRVOTE diskgroup is not found or not mounted (PROC-26), so the problem is in the storage. The easiest way to confirm this is to connect to the ASM instance and check the diskgroups:

As the Grid Infrastructure owner:

  sqlplus / as sysasm
  [...]
  ------------------------------ -----------
  [...]
  SQL> exit
  Disconnected from Oracle Database 11g Enterprise Edition Release - 64bit Production
  With the Real Application Clusters and Automatic Storage Management options
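The query itself did not survive in the listing above. One common way to check diskgroup state is to read the NAME and STATE columns of the V$ASM_DISKGROUP view. The sketch below assumes that approach and is not necessarily the statement that was run. The function names asm_diskgroup_states and not_mounted are hypothetical, and the environment (ORACLE_HOME and ORACLE_SID pointing at the local ASM instance) is assumed to be set already.

```shell
# Print "NAME STATE" for every ASM diskgroup (hypothetical helper).
# Assumes the caller is the Grid Infrastructure owner and that the
# environment points at the local ASM instance.
asm_diskgroup_states() {
    sqlplus -S / as sysasm <<'EOF'
set heading off feedback off pagesize 0
select name || ' ' || state from v$asm_diskgroup;
exit
EOF
}

# Read "NAME STATE" lines on stdin and print the names of the
# diskgroups whose state is anything other than MOUNTED.
not_mounted() {
    awk 'NF == 2 && $2 != "MOUNTED" { print $1 }'
}
```

Chained together as asm_diskgroup_states | not_mounted, this prints only the diskgroups that need attention.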

The OCRVOTE diskgroup is dismounted, so let's try to mount it manually with the asmcmd command-line tool:

  -bash-3.2$ asmcmd mount OCRVOTE
  ORA-15032: not all alterations performed
  ORA-15040: diskgroup is incomplete
  ORA-15042: ASM disk "0" is missing from group number "1" (DBD ERROR: OCIStmtExecute)
  -bash-3.2$

It looks like disk number "0" is missing. Let's verify that against the voting disks:

  -bash-3.2$ crsctl query css votedisk
  ##  STATE    File Universal Id                File Name Disk group
  --  -----    -----------------                --------- ---------
   1. OFFLINE  9b8dfd5843e44ff7bf80a3d48b47adb5 () []
   2. ONLINE   a43b6f1d945f4feebffe1945f292b5b1 (/dev/mapper/asm_ocrvote02_part1p1) [OCRVOTE]
   3. ONLINE   5cc7668e23d24f24bf21bce3662e7c3d (/dev/mapper/asm_ocrvote03_part1p1) [OCRVOTE]
  Located 3 voting disk(s).
  -bash-3.2$

The crsctl query css votedisk command displays the voting disks used by Cluster Synchronization Services, along with the status and location of each disk.
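On a cluster with many voting disks, the rows that matter are the ones that are not ONLINE. A minimal sketch of a filter for that, assuming the column layout shown above (row number, STATE, File Universal Id); the function name offline_votedisks is hypothetical:

```shell
# Read `crsctl query css votedisk` output on stdin and print the
# File Universal Id of every voting disk whose STATE is not ONLINE.
# Header, separator and "Located ..." lines are skipped because their
# first field is not a row number such as "1.".
offline_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 != "ONLINE" { print $3 }'
}
```

Usage: crsctl query css votedisk | offline_votedisks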

The missing device needs to be brought back online:

   1. OFFLINE  9b8dfd5843e44ff7bf80a3d48b47adb5 () []

crsd cannot start until OCRVOTE is mounted, because OCRVOTE is the diskgroup that contains the OCR and the voting disk information.
The OCR is the repository that holds the cluster node list, the services, the database instances, and the instance-to-node mapping information. Oracle Clusterware uses the voting disk to verify cluster node membership and status.

Voting disks and the OCR must be placed on shared storage. In my case, the OCR and voting disks are stored in the OCRVOTE ASM diskgroup with normal redundancy (3 copies).

To fix the issue, the storage administrator moved the missing ocr_vote01 volume from LUN0 to LUN5. After the 7 nodes were rebooted, the cluster came up again.


About Alexandre Pires

ORACLE OCS Goldengate Specialist, OCE RAC 10g R2, OCP 12C, 11g, 10g, 9i and 8i. More than 25 years of experience in IT. I have worked on projects at G&P assigned to TOK STOK, EDINFOR assigned to TV CIDADE "NET", 3CON assigned to PÃO DE AÇUCAR, DISCOVER assigned to VIVO, BANCO IBI and TIVIT, SPC BRASIL, UOLDIVEO assigned to CARREFOUR, and currently at ORACLE ACS serving the following projects: VIVO, CLARO, TIM, CIELO, CAIXA SEGUROS, MAPFRE, PORTO SEGURO, SULAMERICA, BRADESCO SEGUROS, BANCO BRADESCO, BASA, SANTANDER, CNJ, TSE, ELETROPAULO, EDP, SKY, NATURA, ODEBRESHT, NISSEI, SICREDI, CELEPAR, TAM, TIVIT, IBM, SMILES, CELEPAR, SERPRO, OKI, BANCO PAN, etc.
This post was published in ORACLE 11gR2, RAC.
