Friday, March 6, 2015

Oracle RAC: Node evictions

Description

Oracle RAC cluster integrity must be maintained at all times. When certain failure conditions are detected, the RAC cluster master daemon can trigger the eviction of the affected node.

Sample Error

Java application servers connected directly to the affected Oracle RAC node can receive the following errors after the node has been evicted:

Ex: Oracle RAC installed on Linux OS and accessed through Oracle WebLogic 11g

<Warning> <JDBC> <BEA-001129> <Received exception while creating connection for pool "JDBCPool":

  • ORA-27101: shared memory realm does not exist
  • ORA-01034: ORACLE not available
  • Linux-x86_64 Error: 2: No such file or directory
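On the application side, it can help to recognise these two ORA codes explicitly instead of treating them as generic connection failures. The following is a minimal sketch, not WebLogic's own handling: the class and method names are hypothetical, and it assumes the JDBC driver reports the ORA number as the vendor code returned by SQLException.getErrorCode(), with a fallback on the message text in case the exception has been wrapped.

```java
import java.sql.SQLException;

// Hypothetical helper: detects the "instance not available" errors shown above.
class NodeEvictionErrors {

    // ORA-01034 (ORACLE not available) and ORA-27101 (shared memory realm does not exist).
    static boolean isInstanceDownCode(int vendorCode) {
        return vendorCode == 1034 || vendorCode == 27101;
    }

    // Walks the cause chain (bounded, to avoid looping on cyclic causes).
    static boolean isInstanceUnavailable(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            if (t instanceof SQLException
                    && isInstanceDownCode(((SQLException) t).getErrorCode())) {
                return true;
            }
            // Fallback: the ORA code may only survive in the message of a wrapper.
            String msg = t.getMessage();
            if (msg != null && (msg.contains("ORA-01034") || msg.contains("ORA-27101"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SQLException down = new SQLException("ORA-01034: ORACLE not available", null, 1034);
        SQLException other = new SQLException("ORA-00942: table or view does not exist", null, 942);
        System.out.println(isInstanceUnavailable(new RuntimeException("pool error", down)));
        System.out.println(isInstanceUnavailable(other));
    }
}
```

A caller could use such a check to decide whether to retry against a surviving RAC node or to raise an alert, since these errors indicate that the instance itself is gone, not that a single statement failed.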

Possible causes

  • Hardware failure (disk, defective RAM...) of the server hosting the node.
  • Various scenarios that trigger heartbeat failures, such as an Oracle DB node hang or OS kernel locks/bugs causing kernel "panics".
  • A wrongly enabled USB network device can also lead to frequent, intermittent Oracle RAC node evictions. The problem arises when the device's IP is started automatically, which disrupts HAIP (Highly Available IP). The solution normally involves disabling the USB device to prevent the problem upfront.
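For the USB network device scenario, a quick inventory of interfaces holding link-local addresses can help spot an unexpected device. The sketch below rests on an assumption that is not stated in this post: that HAIP addresses are drawn from the IPv4 link-local range 169.254.0.0/16, so any other interface holding an address in that range is a candidate for conflict. The class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists interfaces that hold an IPv4 link-local address.
class LinkLocalInterfaceScan {

    // True for raw IPv4 addresses in 169.254.0.0/16 (assumed HAIP range).
    static boolean isIpv4LinkLocal(byte[] raw) {
        return raw != null && raw.length == 4
                && (raw[0] & 0xFF) == 169 && (raw[1] & 0xFF) == 254;
    }

    // Returns "interfaceName address" for every link-local IPv4 address found.
    static List<String> findLinkLocalInterfaces() throws SocketException {
        List<String> hits = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return hits; // no interfaces reported on this host
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (isIpv4LinkLocal(addr.getAddress())) {
                    hits.add(nic.getName() + " " + addr.getHostAddress());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws SocketException {
        for (String hit : findLinkLocalInterfaces()) {
            System.out.println(hit);
        }
    }
}
```

On a RAC node the private interconnect interfaces are expected to appear in this list, so the entries of interest are any others, for example an interface whose name suggests a USB device. The check only narrows down candidates; confirming the cause still requires the cluster logs.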

References and Case Studies