Cache Fusion
I mentioned Cache Fusion in the GRD section above; here I go into detail on how it works and walk through a number of examples on my RAC system.
Cache Fusion uses the most efficient communication possible to limit the amount of traffic on the interconnect. You don't need this level of detail to administer a RAC environment, but it helps to understand how RAC works when diagnosing problems. RAC appears to have one large buffer cache, but this is not the case: in reality the buffer caches of each node remain separate, and data blocks are shared through distributed locking and messaging operations. RAC copies data blocks across the interconnect to other instances because this is more efficient than reading from disk; memory and networking together are faster than disk I/O.
The transfer of a data block from one instance's buffer cache to another instance's buffer cache is known as a ping. As mentioned already, when an instance requires a data block it sends a request to the lock master to obtain a lock in the desired mode; the resulting notification to the instance currently holding the lock is known as a blocking asynchronous trap (BAST). When an instance receives a BAST it downgrades the lock as soon as possible; however, it might first have to write the corresponding block to disk, an operation known as a disk ping or hard ping. Disk pings have been reduced in later versions of RAC, which rely more on block transfers, but there will always be a small amount of disk pinging. In newer versions of RAC, when a BAST is received, sending the block or downgrading the lock may be deferred by tens of milliseconds. This extra time allows the holding instance to complete an active transaction and mark the block header appropriately, which removes the need for the receiving instance to check the status of the transaction immediately after receiving or reading the block. Checking the status of a transaction is an expensive operation that may require access to (and pinging of) the related undo segment header and undo data blocks as well. The parameter _gc_defer_time defines how long an instance defers downgrading a lock.
In the GRD section I mentioned Past Images (PIs); basically they are copies of data blocks in the local buffer cache of an instance. When an instance sends a block it has recently modified to another instance, it preserves a copy of that block and marks it as a PI. The PI is kept until that block is written to disk by the current owner of the block. When the block is written to disk and is known to have a global role, indicating the presence of PIs in other instances' buffer caches, GCS informs the instances holding the PIs to discard them. When an instance needs to write a block for a checkpoint, it informs GCS of the write requirement; GCS is responsible for finding the most current block image and informing the instance holding that image to perform the block write. GCS then informs all holders of the global resource that they can release the buffers holding the PI copies of the block, allowing the global resource to be released. You can view the past image blocks present in the fixed table X$BH.
PIs | select state, count(state) from X$BH group by state; Note: a state value of 8 denotes a past image. |
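The output of the query above is a list of numeric state codes with counts, which is hard to read at a glance. The following is a minimal Python sketch for labelling such rows once they have been fetched by whatever client you use. Only the meaning of state 8 (past image) comes from the text above; the other code-to-name mappings are the commonly quoted ones and should be treated as an assumption to verify against your Oracle version. The function names and the sample numbers are made up for illustration.

```python
# Assumed mapping of X$BH state codes to names. Only 8 = past image is
# taken from the text above; verify the rest on your own release.
BH_STATE_NAMES = {
    0: "free",
    1: "xcur",
    2: "scur",
    3: "cr",
    4: "read",
    5: "mrec",
    6: "irec",
    7: "write",
    8: "pi",
}


def label_state_counts(rows):
    """Turn (state, count) rows from the X$BH query into {name: count}."""
    labelled = {}
    for state, count in rows:
        name = BH_STATE_NAMES.get(state, f"unknown({state})")
        labelled[name] = labelled.get(name, 0) + count
    return labelled


def past_image_count(rows):
    """Return how many buffers are past images (state 8)."""
    return label_state_counts(rows).get("pi", 0)


if __name__ == "__main__":
    # Hypothetical result set, not taken from a real system.
    sample_rows = [(1, 1200), (2, 340), (3, 75), (8, 12)]
    print(label_state_counts(sample_rows))
    print("past images:", past_image_count(sample_rows))
```

Keeping the mapping in one dictionary makes it easy to correct if a given release uses different codes.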
Cache Fusion I, also known as consistent read server, was introduced in Oracle 8.1.5. It keeps a list of recent transactions that have changed a block. The original data contained in the block is preserved in the undo segment, which can be used to provide consistent read versions of the block.
In a single instance the following happens when reading a block
In a RAC environment, if the process reading the block is on an instance other than the one that modified the block, the reader will have to read the following blocks from disk
Before these blocks can be read, the instance that modified the block has to write those blocks to disk, resulting in 6 I/O operations. In RAC the holding instance can instead construct a CR copy using the above blocks, which are hopefully still in memory, and send the CR copy over the interconnect, thus avoiding the 6 I/O operations.
Oracle 8 introduced a new background process called the Block Server Process, which fabricates the CR copy in the holder's cache and ships the CR version of the block across the interconnect; the sequence is detailed in the table below
While making a CR copy, the holding instance may refuse to do so if
Read/write contention was addressed in Cache Fusion I; Cache Fusion II addresses write/write contention
A quick recap of GCS: a GCS resource can be local or global. If it is local, it can be acted upon without consulting other instances; if it is global, it cannot be acted upon without consulting or informing remote instances. GCS is used as a messaging agent to coordinate manipulation of a global resource. By default all resources are in NULL mode (remember, null mode is used when converting from one mode to another, share or exclusive).
The table below denotes the different states of a resource
Mode/Role | Local | Global
Null (N) | NL | NG
Shared (S) | SL | SG
Exclusive (X) | XL | XG
States
SL | the instance can serve a copy of the block to other instances and can read the block from disk; since the block is not modified there is no need to write it to disk
XL | the instance has sole ownership of and interest in the resource and has the exclusive right to modify the block; all changes to the block are in the local buffer cache and the instance can write the block to disk. If another instance wants the block it has to come via the GCS
NL | used to protect consistent read blocks; if an instance wants the block in X mode, the current instance sends the block to the requesting instance and downgrades its role to NL
SG | the block is present in one or more instances; an instance can read the block from disk and serve it to other instances
XG | the block can have one or more PIs; the instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. GCS can ask the instance to write the block and serve it to other instances
NG | after discarding PIs when instructed to by GCS, the block is kept in the buffer cache with the NG role; this serves only as the CR copy of the block
Below are a number of common scenarios to help understand the following
We will assume the following
For example, a code of SL0 means a shared lock with a local role and no past images (PIs)
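The three-character codes used in the scenarios can be read mechanically against the mode/role table above. The following is a small Python sketch of that reading, assuming the first character is the mode (N, S, X), the second is the role (L, G) and the third is the number of past images. The class and function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Letters taken from the mode/role table above.
MODES = {"N": "null", "S": "shared", "X": "exclusive"}
ROLES = {"L": "local", "G": "global"}


@dataclass(frozen=True)
class ResourceState:
    mode: str
    role: str
    past_images: int


def parse_state(code: str) -> ResourceState:
    """Split a code such as 'SL0' into mode, role and PI count."""
    if (
        len(code) != 3
        or code[0] not in MODES
        or code[1] not in ROLES
        or code[2] not in "0123456789"
    ):
        raise ValueError(f"not a mode/role/PI code: {code!r}")
    return ResourceState(MODES[code[0]], ROLES[code[1]], int(code[2]))


def needs_gcs_coordination(state: ResourceState) -> bool:
    """A global resource cannot be acted upon without involving remote instances."""
    return state.role == "global"


if __name__ == "__main__":
    for code in ("SL0", "XL0", "XG1", "NG0"):
        state = parse_state(code)
        print(code, state, "GCS needed:", needs_gcs_coordination(state))
```

The helper `needs_gcs_coordination` simply restates the local versus global rule from the GCS recap; it does not model any actual messaging.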
Reading a block from disk
Instance C wants to read the block, so it requests a lock in share mode from the master instance
Reading a block from the cache
Carrying on from the above example, instance B wants to read the same block, which is cached in instance C's buffer cache.
Getting a (cached) clean block for update
Carrying on from the above example, instance A wants to modify the same block that is already cached in instances B and C (block 987654)
Getting a (cached) modified block for update and commit
Carrying on from the above example, instance C now wants to modify the block. If it tried to modify the same row it would have to wait until instance A either commits or rolls back; in this case, however, instance C wants to modify a different row in the same block.
Commit the previously modified block and select the data
Carrying on from the above example, instance A now issues a commit to release the row-level locks held by the transaction and flush the redo information to the redo logs
Write the dirty buffers to disk due to a checkpoint
Carrying on from the above example, instance B writes the dirty blocks from the buffer cache due to a checkpoint (this is where it gets interesting and very clever)
Master instance crashes
Carrying on from the above example
Select the rows from instance A
Carrying on from the above example, instance A now queries the rows from that table to get the most recent data
The above sequence of events can be seen in the table below
Example | Operation (instance, as named in the scenarios above) | Buffer status (non-empty cells, in instance order A to D)
1 | read block from disk (C) | SCUR
2 | read the block from cache (B) | CR, SCUR
3 | update the block (A) | XCUR, CR, CR
4 | update the same block (C) | PI, CR, XCUR
5 | commit the changes (A) | PI, CR, XCUR
6 | trigger checkpoint (B) | CR, XCUR
7 | instance crash (master) |
8 | select the rows (A) | CR, XCUR
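The progression of buffer statuses in the table above can be approximated with a toy model. The Python sketch below is only an illustration of the bookkeeping, not how Oracle implements it. It assumes that the statuses in each row belong to instances A, B and C in that order, that instance D never caches the block, and that a checkpoint simply drops any past image. The class and method names are hypothetical.

```python
class BlockTracker:
    """Toy model of one block's buffer status on each instance."""

    def __init__(self, instances):
        # None means the instance holds no copy of the block.
        self.status = {name: None for name in instances}

    def read(self, reader):
        if self.status[reader] in ("SCUR", "XCUR"):
            return  # the reader already holds a current copy
        others_hold = any(
            s is not None for n, s in self.status.items() if n != reader
        )
        # Assumption: the first reader gets a shared current buffer from
        # disk, later readers are served a consistent-read (CR) copy.
        self.status[reader] = "CR" if others_hold else "SCUR"

    def update(self, writer):
        for name, state in list(self.status.items()):
            if name == writer:
                continue
            if state == "XCUR":
                # The previous modifier keeps its copy as a past image.
                self.status[name] = "PI"
            elif state == "SCUR":
                # A shared current copy is no longer current.
                self.status[name] = "CR"
        self.status[writer] = "XCUR"

    def checkpoint(self):
        # Once the current block is on disk, past images can be discarded.
        for name, state in list(self.status.items()):
            if state == "PI":
                self.status[name] = None

    def snapshot(self):
        return {n: s for n, s in self.status.items() if s is not None}


if __name__ == "__main__":
    block = BlockTracker(["A", "B", "C", "D"])
    steps = [
        ("C reads from disk", lambda: block.read("C")),
        ("B reads from cache", lambda: block.read("B")),
        ("A updates", lambda: block.update("A")),
        ("C updates", lambda: block.update("C")),
        ("checkpoint", block.checkpoint),
    ]
    for label, action in steps:
        action()
        print(label, block.snapshot())
```

The commit, crash and final select steps are left out because the model has no notion of transactions or resource remastering.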