Why Existing Databases (RAC) are So Breakable

wmf · on Dec 1, 2009

This article is a very poor interpretation of the facts. RAID works (as long as it isn't RAID-5). High-end disks may have the same failure rate as low-end disks, but their higher performance and lower capacity give a much lower MTTR, ultimately improving reliability.
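To put rough numbers on the MTTR point, here is a minimal sketch. It assumes rebuild time is simply capacity divided by a steady rebuild throughput, and that drive failures are independent with a constant rate. The drive sizes, throughputs, and the 3% annual failure rate are made-up illustrative figures, not measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365

def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours to re-mirror a whole drive, assuming a steady rebuild rate."""
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600

def p_partner_fails_during_rebuild(afr, hours):
    """Probability the surviving mirror fails inside the rebuild window,
    assuming independent failures at a constant (exponential) rate."""
    rate_per_hour = -math.log(1 - afr) / HOURS_PER_YEAR
    return 1 - math.exp(-rate_per_hour * hours)

# Hypothetical drives with the SAME annual failure rate.
afr = 0.03
drives = {
    "low-end 1000 GB @ 50 MB/s": rebuild_hours(1000, 50),
    "high-end 300 GB @ 100 MB/s": rebuild_hours(300, 100),
}
for name, hours in drives.items():
    risk = p_partner_fails_during_rebuild(afr, hours)
    print(f"{name}: rebuild {hours:.2f} h, second-failure risk {risk:.2e}")
```

Under these assumptions the failure rate is identical for both drives, and the only thing that changes is how long the mirror stays exposed while the rebuild runs.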

shrike · on Dec 1, 2009

The author makes a huge leap:

"The core, implicit assumption behind Oracle RAC, IBM DB2, and many other database clustering solutions, is that failure can be avoided by purchasing high-end disk storage and using expensive hardware (fiber optics, etc). As can be seen from the research I mentioned earlier, this core assumption doesn’t correlate with the failure statistics. Hence I argue that the database clustering model is inherently breakable."

Taking the fact that drives fail and extrapolating that to mean that large drive arrays are "inherently breakable" is nonsense. To take the EMC Symmetrix, which I am very familiar with, as an example: a large, properly implemented storage array is far from breakable. Inside the box you buy is a whole collection of n+2, n+3 and/or n+4 components, all built specifically to never, ever break. The only problem with this type of solution is that it is priced in the multi-millions. Each.
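As a rough illustration of why n+2 or n+3 redundancy matters, the sketch below computes the chance that a subsystem loses more components than it has spares. It assumes component failures are independent, which real hardware does not guarantee, and the 4 required components and 1% per-component failure probability are hypothetical values chosen only for the example.

```python
from math import comb

def p_subsystem_down(n_needed, spares, p_fail):
    """Probability that more than `spares` of the n_needed + spares
    components are failed at the same time (independent failures)."""
    total = n_needed + spares
    # Subsystem stays up as long as at most `spares` components are failed.
    p_up = sum(
        comb(total, k) * p_fail**k * (1 - p_fail) ** (total - k)
        for k in range(spares + 1)
    )
    return 1 - p_up

# Hypothetical subsystem: 4 components required, each failed 1% of the time.
for spares in range(0, 5):
    print(f"n+{spares}: {p_subsystem_down(4, spares, 0.01):.3e}")
```

Each additional spare cuts the outage probability sharply under the independence assumption; correlated failures (shared firmware, power, or a bad batch) would make the real figures worse.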

wmf · on Dec 1, 2009

Jeff Darcy tears into this one: http://pl.atyp.us/wordpress/?p=2555

"Letting fear of conventional storage drive the creation of 'solutions' that are just as complex without the benefits of being as general, as well tested, or as well documented is a mistake. (Open but undocumented and untested can be worse than closed, BTW, if the cost of reverse-engineering and fixing the implementation is greater than the cost of licensing would have been.) Such attempts generally lose even when considered alone, and even more so when the effects of fragmentation and incompatibility are considered."

ponnap · on Dec 2, 2009

"Most existing database clustering solutions rely on a shared disk storage to maintain their cluster state, as can be seen in the diagram below."

In the case of RAC, although state is written to a file called the 'voting disk', the actual state of the cluster is communicated over a redundant high-speed interconnect. Only if the network connections are down is state exchanged through the 'voting disk'.
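The behaviour described above can be sketched as a toy model. This is not Oracle Clusterware's actual algorithm or API: the function names, the dictionaries standing in for the network and the shared file, and the list of link flags are all hypothetical, chosen only to show the "interconnect first, voting disk as fallback" idea.

```python
def write_state(node, state, network_view, voting_disk):
    """Record a node's state on both channels of the toy model."""
    voting_disk[node] = state    # always written to the shared file
    network_view[node] = state   # normally seen by peers over the interconnect

def read_peer_state(peer, links_up, network_view, voting_disk):
    """Return (channel, state) for a peer.

    `links_up` holds one boolean per redundant interconnect link; the
    voting disk is consulted only when every link is down.
    """
    if any(links_up):
        return "interconnect", network_view.get(peer)
    return "voting disk", voting_disk.get(peer)

network_view, voting_disk = {}, {}
write_state("node1", "alive", network_view, voting_disk)

# One of two redundant links has failed: the interconnect is still used.
print(read_peer_state("node1", [True, False], network_view, voting_disk))
# Both links are down: fall back to the shared voting disk.
print(read_peer_state("node1", [False, False], network_view, voting_disk))
```

The point of the sketch is only that the shared disk is a fallback path here, so a single failed network link does not by itself push cluster state onto storage.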

Femur · on Dec 1, 2009

One thing this article overlooks is Oracle's Automatic Storage Management (ASM) feature, which is very commonly used with RAC. It is basically a software-level RAID alternative in which you can define striping and mirroring independently of how the underlying hardware is laid out.

This article also does not really reflect what is commonly done with SAN implementations.

ponnap · on Dec 2, 2009

Actually, most production databases tend not to use ASM's High or Normal redundancy, because the software-level redundancy provided by ASM tends to perform very badly compared to RAID driven by a storage controller. External redundancy is usually chosen instead. ASM is primarily used to make disk groups easier to manage when the volumes in a disk group run out of space.

ecq · on Dec 1, 2009

RAC + Data Guard if you use Oracle.