HealthCheck Exadata

HealthCheck_user_guide.txt
Version 1.2.2 – Release Date 05/16/2011

ALWAYS review My Oracle Support note 1070954.1 and download the latest scripting for this
HealthCheck before executing any of the scripting.

Current HealthCheck Version
—————————
As of 05/16/2011, the current HealthCheck version is 1.2.2, reflected in the filename ‘HealthCheck_1_2_2_tar_gz’.
Version numbers are also contained in the headers of the uncompressed files.

Target Oracle Database Machine Impact
————————————-
The Oracle Database Machine HealthCheck consists of read only commands.  Other than the writing of the output files
and an empty locking file to help guard against more than one HealthCheck executing at a time, the impact to the
target machine is minimal.

The operating system, hardware, and firmware checks running all options take about 4 minutes on an HP quarter rack
and about 3 minutes 30 seconds.

The ASM checks take less than 10 seconds.

The manual InfiniBand switch commands execution time varies with typing skill.

Note Well:
==========
Execute only one HealthCheck at a time in a database machine.

For example, if you have a full rack configured with one cluster, then run one HealthCheck
on the first database server in the cluster.

For another example, if you have a full rack divided into two clusters, run one HealthCheck on the first database
server in the first cluster, wait for it to complete, then run one HealthCheck on the first database server
in the second cluster.

Environment and Configuration Settings
————————————–
This HealthCheck assumes a deployment according to standard Oracle Database Machine naming and location conventions.
This section details some of those conventions, and other information regarding the command syntax and structures
in this HealthCheck.

DCLI Group Files
—————-
This HealthCheck requires the “root” userid to have the following dcli group files present in its home directory:

dbs_group, contains the Ethernet host names of the Oracle Database servers.
cell_group, contains the Ethernet host names of the Exadata Cells.
all_group, contains the Ethernet host names of both the Exadata Cells and the Oracle Database servers.
all_ib_group, contains the private InfiniBand host names of both the Exadata Cells and the Oracle Database servers.
cell_ib_group, contains the private InfiniBand host names of the Exadata Cells.
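
The presence of the five group files listed above can be confirmed before a run. The following is a minimal
sketch only; check_group_files is a hypothetical helper that is not part of the HealthCheck bundle, and it assumes
that an empty group file is as much of a problem as a missing one.

```shell
# Hypothetical helper (not shipped with HealthCheck): report any dcli group
# file that is missing or empty in the given directory (default: $HOME).
check_group_files() {
    gf_dir=${1:-$HOME}
    gf_rc=0
    for gf_name in dbs_group cell_group all_group all_ib_group cell_ib_group; do
        if [ ! -s "$gf_dir/$gf_name" ]; then
            echo "missing or empty: $gf_dir/$gf_name"
            gf_rc=1
        fi
    done
    # Returns 0 only when all five files exist and are non-empty.
    return "$gf_rc"
}

# Example, run as root:
#   check_group_files ~ && echo "all group files present"
```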

Linux Convention
—————-
The Linux ~/ convention is used to indicate the home directory of the current user.

Parameters
———-
This HealthCheck uses the following input parameters to simplify some of the command structure:

run_os_commands_as_root.sh:
—————————
-a <the location of the HealthCheck source files. eg: /home/oracle/HealthCheck>
-b <the location of the CRS home. eg: /u01/app/11.1.0/crs>
-c <the location of the ASM home. eg: /u01/app/oracle/product/11.1.0/asm_1>
-d <the location of the DB home. eg: /u01/app/oracle/product/11.1.0/db_1>
-e <>
-f <>
-g <>

Note: for 11.2.x deployments, enter the grid home location for both parameters -b and -c.

Note: the -e parameter takes no arguments.  When -e is added to the parameters, the scripting
adds -s -q to each dcli command to attempt to suppress SSH login banners.

Note: the -f parameter takes no arguments.

HealthCheck by default on HP hardware does not stop the MS Server on the storage cells in order
to run CheckHWnFWProfile or execute hpaducli commands for
“Determining SAS Backplane Version on storage cells:” and “Verifying disk health on storage cells:”.

If you wish to execute those HealthCheck sections and CheckHWnFWProfile on the storage cells,
it is recommended that you:

1) Schedule an outage.
2) Shut down the entire Oracle stack running in the cluster.
3) Re-execute the HealthCheck with the “-f” input parameter.
4) Restart the entire Oracle stack running in the cluster.

Note: the -g parameter takes no arguments.

HealthCheck by default does not execute either CheckHWnFWProfile or CheckSWProfile.sh on the
database servers. They should only be executed immediately after the first build of the database
machine or a fresh image. Specify the “-g” parameter to execute CheckHWnFWProfile and
CheckSWProfile.sh on the database servers.

run_asm_commands_as_oracle.sh:
——————————
-a <the location of the HealthCheck source files. eg: /home/oracle/HealthCheck>

-b <the asm instance SID. eg: +ASM1>

Path
—-
This HealthCheck requires the root user and the oracle user to include
/usr/local/bin:/usr/bin:/usr/sbin:/bin in the $PATH environment variable.
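
One way to confirm the PATH requirement for each user is sketched below. path_missing_dirs is a hypothetical
helper, not part of the HealthCheck scripting; it only compares the four directories named above against a
PATH string.

```shell
# Hypothetical helper: list which of the four required directories are
# absent from a PATH string (default: the current $PATH).
path_missing_dirs() {
    pm_path=${1:-$PATH}
    pm_rc=0
    for pm_dir in /usr/local/bin /usr/bin /usr/sbin /bin; do
        # Wrap both sides in colons so only whole PATH entries match.
        case ":$pm_path:" in
            *":$pm_dir:"*) ;;
            *) echo "not in PATH: $pm_dir"; pm_rc=1 ;;
        esac
    done
    return "$pm_rc"
}

# Example, run once as root and once as oracle:
#   path_missing_dirs || echo "fix PATH before running HealthCheck"
```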

Command Execution Location
————————–
Run the Automatic Storage Management HealthCheck script on the database server where the +ASM1 instance exists,
typically the first node in the target cluster.  Unless stated otherwise, the HealthCheck assumes all commands
are executed on that node.

Command Line Prompts
——————–
A command run by the “root” userid is indicated by a “#” prompt, or may be explicitly stated in the directions.

A command run by the “oracle” userid is indicated by a “$” prompt, or may be explicitly stated in the directions.

Note: When constructing commands, do not copy the “#” or “$” used in these examples.

Secure Shell Equivalence
————————
This HealthCheck requires that there is Secure Shell (SSH) equivalence configured for the “root” userid
between the first database server and all other database servers, and between the first database server
and the storage servers.  The scripts will not work without the required SSH equivalence.

Pre-execution Steps
——————-
1) Verify ssh equivalence for the root user:
For a standard Oracle Database Machine, the required equivalence was created during the onsite deployment
and may have been left in place, if requested. If you are uncertain of your configuration, you can execute
the following two commands:
# dcli -g ~/all_group -l root hostname
# dcli -g ~/all_ib_group -l root hostname

If you are challenged to authenticate, ctrl-c out of the commands and establish the required SSH connectivity
as follows:

1.1) Create a private/public key file using the following command:
# ssh-keygen -t dsa

The output will be similar to the following:

Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
09:79:e9:78:14:bf:29:40:66:ec:94:25:a9:7d:93:3e

The passphrase has been left empty so that when an SSH connection is established, a passphrase is not required.

Note:  OpenSSH supports more than one key type, including RSA and DSA.  This HealthCheck uses DSA keys.

1.2) Use the dcli utility to distribute the public key in a reliable manner using the following commands:
# dcli -g ~/all_group -l root -k -s '-o StrictHostKeyChecking=no'
# dcli -g ~/all_ib_group -l root -k -s '-o StrictHostKeyChecking=no'

The output will be similar to the following:

dataab01s: Warning: Permanently added 'dataab01,152.68.120.251' (RSA) to the list of known hosts.
dataab01s: ssh key added
dataab02s: Warning: Permanently added 'dataab02,152.68.120.252' (RSA) to the list of known hosts.
dataab02s: ssh key added
dataab03s: Warning: Permanently added 'dataab03,152.68.120.253' (RSA) to the list of known hosts.
dataab03s: ssh key added

Note:  If you are prompted for a password on the remote server, enter the appropriate value.

1.3) Test SSH equivalence using the following commands:
# dcli -g ~/all_group -l root hostname
# dcli -g ~/all_ib_group -l root hostname

The output will be similar to the following:

dataas01s-priv: dataas01s.us.oracle.com
dataas02s-priv: dataas02s.us.oracle.com
dataas03s-priv: dataas03s.us.oracle.com
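
The dcli test above stops to prompt when equivalence is missing. As an alternative, the sketch below checks each
host in a group file without prompting. check_ssh_equivalence is a hypothetical helper, not part of HealthCheck;
it assumes a group file with one host name per line and uses the standard OpenSSH BatchMode option so that a host
requiring a password is reported as a failure instead of waiting for input.

```shell
# Hypothetical helper: try a non-interactive root login to every host
# named in a dcli group file and report the result per host.
check_ssh_equivalence() {
    se_group=$1
    se_rc=0
    while IFS= read -r se_host; do
        # Skip blank lines.
        [ -n "$se_host" ] || continue
        # -n keeps ssh from consuming the rest of the group file on stdin;
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 -l root "$se_host" hostname >/dev/null 2>&1; then
            echo "ok: $se_host"
        else
            echo "FAILED: $se_host"
            se_rc=1
        fi
    done < "$se_group"
    return "$se_rc"
}

# Example, run as root:
#   check_ssh_equivalence ~/all_group
#   check_ssh_equivalence ~/all_ib_group
```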

2) Download the “HealthCheck_bundle.zip” file from MOS note 1070954.1 and uncompress it on your desktop or
laptop computer, where the bundled example files and spreadsheet can be viewed.

2.1) Transfer the HealthCheck_1_2_2_tar_gz file to the /home/oracle directory on the first database server
in the cluster where this HealthCheck is to be executed.

Note: Do not decompress the files onto a Windows environment, read the files in an editor, and then transfer
the decompressed files to your Linux environment.  This activity may insert stray characters into the scripts.
It is strongly recommended to decompress the tar file only in your Linux environment and read the files in vi,
if so desired.

Note: When you uncompress the files to your desktop / laptop, check to see if the file receives an extra “.gz”
file extension (e.g: HealthCheck_1_2_2_tar_gz.gz).  If it did, then rename the file to remove the extra “.gz”
file extension.

2.2) Extract the files.

Note: These instructions assume the HealthCheck is being installed for the first time. If you have been running
HealthCheck on your system, save both the prior scripting and the output files before you install a newer version
of HealthCheck; otherwise the older files will be overwritten. For example, to retain the prior HealthCheck
installation online for reference, use the mv command in the /home/oracle directory to rename the existing
installation. Eg: mv HealthCheck HealthCheck_03182010

Extract the files using the following command:
$ tar -zpxvf HealthCheck_1_2_2_tar_gz
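
The rename suggested in the note for step 2.2 can be sketched as follows. preserve_prior_install is a hypothetical
helper, not part of HealthCheck; it assumes the mmddyyyy suffix used in the example above and refuses to overwrite
a copy that was already preserved on the same day.

```shell
# Hypothetical helper: rename an existing HealthCheck directory to
# HealthCheck_<mmddyyyy> so a new extraction does not overwrite it.
preserve_prior_install() {
    pi_base=${1:-/home/oracle}
    pi_old="$pi_base/HealthCheck"
    pi_new="$pi_base/HealthCheck_$(date +%m%d%Y)"
    # Nothing to preserve on a first-time installation.
    [ -d "$pi_old" ] || return 0
    if [ -e "$pi_new" ]; then
        echo "refusing to overwrite $pi_new" >&2
        return 1
    fi
    # Print the new location on success.
    mv "$pi_old" "$pi_new" && echo "$pi_new"
}

# Example, run as oracle before the tar command:
#   preserve_prior_install /home/oracle
```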

2.3) Verify file creation using this command in the /home/oracle directory:
$ ls -ltr | grep HealthCheck

The output should look similar to this (date and timestamp will vary):

drwxr-xr-x 3 oracle oinstall 4096 Mar 11 12:05 HealthCheck

HealthCheck is the base directory that contains the command files and the output_files subdirectory.

Note:  The operating system and Automatic Storage Management scripts, as well as the Voltaire commands screen
capture write their output to the /home/oracle/HealthCheck/output_files directory with a date and timestamp
embedded in the file names so that an output history can be easily maintained.

Note: Files written to the /home/oracle/HealthCheck/output_files directory by the root user are owned by the root user.  If file cleanup is desired, the root user will have to perform the actions.
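
To see which output files would need the root user for cleanup, a listing such as the following may help. This is
a sketch only; list_output_files_owned_by is a hypothetical helper and uses nothing beyond the standard find
command.

```shell
# Hypothetical helper: list regular files in the output directory that
# are owned by the given user (default: root).
list_output_files_owned_by() {
    lo_dir=${1:-/home/oracle/HealthCheck/output_files}
    lo_user=${2:-root}
    find "$lo_dir" -type f -user "$lo_user"
}

# Example:
#   list_output_files_owned_by /home/oracle/HealthCheck/output_files root
```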

Operating System Healthcheck
—————————-
Execute the following command as the root user from the /home/oracle/HealthCheck directory on the first database
server in the cluster from which this HealthCheck is being driven:

Note: The following command is a sample, and you must substitute the correct parameter values as discussed earlier
in the “Parameters” section.  If you try this verbatim on your system, it may not work!

# ./run_os_commands_as_root.sh -a /home/oracle/HealthCheck -b /u01/app/11.2.0/grid -c /u01/app/11.2.0/grid -d /u01/app/oracle/product/11.2.0/dbhome_1

The output will scroll by on your screen as the scripting executes, and an output file will be written to the /home/oracle/HealthCheck/output_files directory.
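
Because the sample command must be adapted to each system, it can help to validate the directory values before
running it. The sketch below is for 11.2.x deployments, where the grid home is passed for both -b and -c as noted
in the “Parameters” section. build_os_check_cmd is a hypothetical helper, not part of HealthCheck; it only prints
the command line and does not execute it.

```shell
# Hypothetical helper for 11.2.x: verify that the HealthCheck, grid home,
# and DB home directories exist, then print the command line to run.
build_os_check_cmd() {
    bc_hc=$1
    bc_grid=$2
    bc_db=$3
    for bc_dir in "$bc_hc" "$bc_grid" "$bc_db"; do
        if [ ! -d "$bc_dir" ]; then
            echo "not a directory: $bc_dir" >&2
            return 1
        fi
    done
    # The grid home is used for both -b (CRS home) and -c (ASM home).
    echo "./run_os_commands_as_root.sh -a $bc_hc -b $bc_grid -c $bc_grid -d $bc_db"
}

# Example, with the values from the sample command above:
#   build_os_check_cmd /home/oracle/HealthCheck /u01/app/11.2.0/grid /u01/app/oracle/product/11.2.0/dbhome_1
```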

Automatic Storage Management Healthcheck
—————————————-
Execute the following command as the oracle user from the /home/oracle/HealthCheck directory on the driving node
(typically the first database server in the cluster):

$ ./run_asm_commands_as_oracle.sh -a /home/oracle/HealthCheck -b +ASM1

The output will scroll by on your screen as the scripting executes, and an output file will be written to the /home/oracle/HealthCheck/output_files directory.

InfiniBand Switch Healthcheck
—————————–
It is not possible to script the execution of the InfiniBand switch commands.

To execute the InfiniBand switch commands and capture the output, perform the following steps on the driving node:

1) Create a log file of your terminal activity, with a date and timestamp in its name, in the
/home/oracle/HealthCheck/output_files directory:

# script -a -q /home/oracle/HealthCheck/output_files/IB_switch_commands_`date +%m%d%y_%H%M%S`.lst

2) For each of the managed switches, connect by ip address and perform the following commands
(output deleted here for clarity):

For HP Oracle Database Machine:
# ssh 10.204.72.90 -l enable
enable@10.204.72.90’s password: <default should be voltaire>
ISR9024D-36d6# version show
ISR9024D-36d6# config
ISR9024D-36d6(config)# sm
ISR9024D-36d6(config-sm)# sm-info show
ISR9024D-36d6(config-sm)# exit
ISR9024D-36d6(config)# ntp
ISR9024D-36d6(config-ntp)# ntp show
ISR9024D-36d6(config-ntp)# clock show
ISR9024D-36d6(config-ntp)# exit
ISR9024D-36d6(config)# exit
ISR9024D-36d6# exit
ISR9024D-36d6> exit
Connection to 10.204.72.90 closed.

3) When you have processed all of the managed switches, stop logging the terminal output and close
the output file using this command:

# exit

Output File Analysis
——————–
In the output files, after the output from each individual command, there will be one of two types
of expected result provided:

Direct text
A link to another My Oracle Support note

The direct text is used when the expected output is fixed or simple, and the link is used if the
expected output interpretation is complex or varies over time (e.g. firmware versions for
different Exadata Storage Cell Software versions).

If you discover variances between the current values reported for your Oracle Database Machine and
the expected values detailed in the output or referenced files, and are uncertain of how to proceed,
contact Oracle Support for assistance.

The file “sample_output_files.zip” contains a sampling of output files for the operating
system scripts and the Automatic Storage Management scripts.

The file “HealthCheck_command_table.xls” is a spreadsheet listing the included checks.

History
V. Wagman 05/16/11 incorporated all directions here instead of the
MOS note.
V. Wagman 05/12/2011 Set version to 1.2.2 in file header section

