<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   >
<channel>
    <title>Coffee|Code : Dan Scott, Caffeinated Librarian Geek - Comments</title>
    <link>http://coffeecode.net/</link>
    <description>Coffee|Code : Dan Scott, Caffeinated Librarian Geek - Many ideas crammed into bits...</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.3 - http://www.s9y.org/</generator>
    <pubDate>Sun, 18 May 2008 07:38:42 GMT</pubDate>

    <image>
        <url>http://coffeecode.net/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: Coffee|Code : Dan Scott, Caffeinated Librarian Geek - Comments - Coffee|Code : Dan Scott, Caffeinated Librarian Geek - Many ideas crammed into bits...</title>
        <link>http://coffeecode.net/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>Colleen: Two! 2! Too! Tu! Tout!</title>
    <link>http://coffeecode.net/archives/157-Two!-2!-Too!-Tu!-Tout!.html#c1089</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/157-Two!-2!-Too!-Tu!-Tout!.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=157</wfw:comment>

    

    <author>dan@coffeecode.net (Colleen)</author>
    <content:encoded>
    Yea! Yea!  Yea!  I am so happy to see photos of your beautiful little girl.  Look at her so cute and turning TWO!  The cat cake is spectacular. 
    </content:encoded>

    <pubDate>Mon, 12 May 2008 12:24:31 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/157-guid.html#c1089</guid>
    
</item>
<item>
    <title>Dan Scott: Weeding 2.0</title>
    <link>http://coffeecode.net/archives/158-Weeding-2.0.html#c1088</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/158-Weeding-2.0.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=158</wfw:comment>

    

    <author>dan@coffeecode.net (Dan Scott)</author>
    <content:encoded>
    Ah yes - we (well to be honest mostly Lynn) have been avid users of that device for a couple of years now - and it certainly falls into the &quot;2.0&quot; category, albeit of the more literal sort of weeding activity! We&#039;re mostly successful in keeping ahead of the weeds on our lawn because the ground seems to be toxic enough that very little (including grass) willingly grows on it. I think the previous owners forced growth by heavy dosages of fertilizer - we&#039;re taking the much slower nitrogenification method of encouraging clover to take over. In the mean time, of course, we&#039;ll be the scandalous talk of the neighbourhood. 
    </content:encoded>

    <pubDate>Mon, 12 May 2008 09:05:09 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/158-guid.html#c1088</guid>
    
</item>
<item>
    <title>Mike Z: Weeding 2.0</title>
    <link>http://coffeecode.net/archives/158-Weeding-2.0.html#c1087</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/158-Weeding-2.0.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=158</wfw:comment>

    

    <author>dan@coffeecode.net (Mike Z)</author>
    <content:encoded>
    Silly me, I thought it might be about some outdoor activity where one pulls weeds from the lawn. (I&#039;m having some success with this device [see &quot;Homepage&quot; link] on dandelions and thistles - although it would be faster if I wasn&#039;t keeping Luca from running out of the backyard, or encouraging Matteo to NOT push his brother down the slide or fall off the top of the playset.)

Might have something to do with the time of day (night)... 
    </content:encoded>

    <pubDate>Mon, 12 May 2008 00:45:18 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/158-guid.html#c1087</guid>
    
</item>
<item>
    <title>Mita: Weeding 2.0</title>
    <link>http://coffeecode.net/archives/158-Weeding-2.0.html#c1086</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/158-Weeding-2.0.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=158</wfw:comment>

    

    <author>dan@coffeecode.net (Mita)</author>
    <content:encoded>
    Hey Dan,

Just a quick note of support and encouragement to nurture your idea! I also think a &quot;use dashboard&quot; would be a mighty handy tool. 

It&#039;s curious that academic libraries have developed all sorts of standardized workflows around the acquisition of books, but when and how weeding is done tends to be left up to the individual collection librarian. 
    </content:encoded>

    <pubDate>Sun, 11 May 2008 20:50:21 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/158-guid.html#c1086</guid>
    
</item>
<item>
    <title>Lissa: Weeding 2.0</title>
    <link>http://coffeecode.net/archives/158-Weeding-2.0.html#c1085</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/158-Weeding-2.0.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=158</wfw:comment>

    

    <author>dan@coffeecode.net (Lissa)</author>
    <content:encoded>
    Makes sense. Of course, as a denizen of the PL world, I&#039;m picturing the ability to specify that all members of the group &quot;travel guide&quot; (possibly could be set by a combination of MARC tags, locations, something) are kept for 3 years, so when the new copy comes, put a hold on the oldest and route it to the pages.

FRBR would help, maybe, but I&#039;m not holding my breath on that one. 

In the PL world, we&#039;d need number of circs and last circ date, as well as last copy.  

I&#039;m not a fan of faceted searching, but, in theory, you could grab a few virtual shelves of books, then use facets to create different kinds of subsets, which you can then send to various purgatories.

All this assumes the catalog reflects reality fairly well (not a given, and a distant fantasy where I work), so there is still going to have to be some stack time, I fear.

Interesting idea, Dan. 
    </content:encoded>

    <pubDate>Sun, 11 May 2008 20:38:30 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/158-guid.html#c1085</guid>
    
</item>
<item>
    <title>Dan Scott: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1083</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (Dan Scott)</author>
    <content:encoded>
    Hi Gord:

Well, apples to apples and all that. Without an independent benchmark test suite to run, with control over the number of concurrent searches / updates / reports, or asterisks beside numbers for immediate updates vs. updates that take place overnight due to required reindexing, the number of bibs or potential users is fairly meaningless.

I think it would be great if there was an independent benchmark organization a la TPC (http://tpc.org) in the database world to set benchmark tests and monitor results in the library world, so that we could actually have meaningful quantitative comparisons. But I suspect the library world is too small of a market to justify or sustain such an organization.

So my comments were thinking more of the public library consortium world (dozens or hundreds of concurrent staff client sessions updating the data while hundreds or thousands of simultaneous user search sessions were going on, with various reports running in the background), rather than the academic library world (maybe a dozen concurrent staff client sessions and a few dozen concurrent searches, with the odd report running occasionally). 

And of course, even that is guesswork and needs to be benchmarked - one of the goals we have as part of Project Conifer is to run realistic stress testing scenarios with representative workloads to get a better idea of what our real hardware needs will be. 
    </content:encoded>

    <pubDate>Fri, 25 Apr 2008 15:36:11 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1083</guid>
    
</item>
<item>
    <title>Gord Ripley: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1082</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (Gord Ripley)</author>
    <content:encoded>
    Hi Dan:

This is a non-technical musing-type question regarding your comment &quot;This scenario is fine for development and testing with a limited number of users, but if you intend to do any sort of stress testing with this server or throw it open to the public, performance will likely grind to a halt. &quot;

We run our current Unicorn system on a single HP Proliant DL380R03 (4GB RAM). With roughly 800,000 holdings and 6,500 users, performance has yet to `grind to a halt&#039;, or even slow down. 

Do Evergreen&#039;s hardware requirements so far exceed those of Unicorn?

Gord 
    </content:encoded>

    <pubDate>Thu, 24 Apr 2008 09:49:35 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1082</guid>
    
</item>
<item>
    <title>Chiya: Today's trivial tech tip: Getting RSS feeds from diaryland.com blogs</title>
    <link>http://coffeecode.net/archives/77-Todays-trivial-tech-tip-Getting-RSS-feeds-from-diaryland.com-blogs.html#c1081</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/77-Todays-trivial-tech-tip-Getting-RSS-feeds-from-diaryland.com-blogs.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=77</wfw:comment>

    

    <author>dan@coffeecode.net (Chiya)</author>
    <content:encoded>
    I was debating whether to keep my old blog (where I did everything myself but didn&#039;t have RSS or anything) or go with wordpress (where I couldn&#039;t change the way it looks) when I remembered about diaryland.

So thanks for this! 
    </content:encoded>

    <pubDate>Sun, 20 Apr 2008 17:23:50 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/77-guid.html#c1081</guid>
    
</item>
<item>
    <title>John Craig: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1077</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (John Craig)</author>
    <content:encoded>
    Mike, I haven&#039;t seen ASCII art like that in ages!

(But seriously, congrats on a very clear explanation.)

If I might suggest a moral to this story: if you want to use any kind of RAID 0 &amp;amp; 1 combination, before you buy it, check the &lt;u&gt;documentation&lt;/u&gt; of a candidate controller and see how it suggests/requires combining mirroring and striping: if it does not appear to allow the more optimal mirror-then-stripe approach, buy a different controller (or just skip the striping step altogether; see below). Some controllers, at least, will only let you choose the less optimal stripe-then-mirror approach.

You might also try leaving the striping out of the equation and just using mirroring (if you have no choice of controller [Buhler?... Dell? ... Dell?]). RAID 10&#039;s advantage over straight mirroring (for Evergreen DBs) is probably a ways down into the hard-to-measure range anyway (striping is most useful for big sequential reads [think video streaming]--which isn&#039;t characteristic of much of the info in Evergreen&#039;s DB--or Postgres, in general, for that matter). Striping is also less useful with current disk technology (better caching, lower rotational latency, and faster seeks) than it once was (see the storagereview.com article for lots of techie bits).

And Mike, Geek On, bro!

John 
    </content:encoded>

    <pubDate>Sat, 12 Apr 2008 12:04:53 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1077</guid>
    
</item>
<item>
    <title>Jason Etheridge: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1076</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (Jason Etheridge)</author>
    <content:encoded>
    Hey, wouldn&#039;t a magical SAN black box take care of everything? &lt;strong&gt;duck, run&lt;/strong&gt;

-- Jason 
    </content:encoded>

    <pubDate>Sat, 12 Apr 2008 01:20:37 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1076</guid>
    
</item>
<item>
    <title>Mike Rylander: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1075</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (Mike Rylander)</author>
    <content:encoded>
    Unfortunately, I&#039;ve been bitten by the &quot;stripe first, then mirror&quot; setup.  The performance is the same when there are no drive failures, but when one drive does fail in this configuration, you lose the advantage of the mirroring for read performance.

I&#039;ll attempt some ASCII art of a 4 drive RAID set to illustrate (if it doesn&#039;t come out I can put some images up):

  0+1 (stripe, then mirror)
|-----m-----|
  -s-   -s-
  |1|   |3|
  |2|   |4|
  ---   ---
|-----------|


So, you can see that the mirroring step essentially sees 2 drives, the two stripe sets [s].  If physical drive 3 fails, as far as the mirror [m] is concerned, the entire right &quot;side&quot; of the mirror is bad, and only drives 1 and 2 stay in use.  This leads to two problems.  First, the entire right side must be rebuilt when drive 3 is replaced, and second, before drive 3 is replaced you can&#039;t lose either 1 or 2, or the entire RAID set dies.


  1+0 (mirror, then stripe)

|-----s-----|
  -m-   -m-
  |1|   |3|
  |2|   |4|
  ---   ---
|-----------|

Here, if we stripe mirrors, we can lose physical drive 3 and only the data residing on the right side of the stripe set acts in a degraded fashion with regard to read performance.  What&#039;s more, we can still lose either of physical drive 1 or 2 and not lose any data.

Again, I&#039;ve had this happen to me in the past with a Dell OEM controller (I think it was a Mylex variant, but you never can tell with Dell...).  At least I can say that I inherited that setup. &lt;img src=&quot;http://coffeecode.net/templates/default/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;

John is 100% correct that both setups will survive a single-disk failure, but there is a distinct difference in the worst-case for each, and degraded performance under 1+0 will be better than under 0+1.  That is the point that I was attempting to illuminate ... badly, it would seem. &lt;img src=&quot;http://coffeecode.net/templates/default/img/emoticons/smile.png&quot; alt=&quot;:-)&quot; style=&quot;display: inline; vertical-align: bottom;&quot; class=&quot;emoticon&quot; /&gt;  Also, 1+0 will survive certain combinations of 2 disk failure (or more, with more inner mirror sets inside the outer stripe).
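
A minimal sketch of the comparison above, assuming the 4-drive layouts from the ASCII art (drives 1 and 2 on the left, 3 and 4 on the right). The helper names are hypothetical, and the model only tracks whether data survives, not rebuild cost or read performance:

```python
from itertools import combinations

# Illustrative model only; drive numbers follow the ASCII art above.
STRIPE_SETS = [{1, 2}, {3, 4}]   # 0+1: two stripe sets mirrored against each other
MIRROR_PAIRS = [{1, 2}, {3, 4}]  # 1+0: two mirror pairs striped together


def survives_0_plus_1(failed):
    # A stripe set is unusable once any of its drives fails, so the
    # array lives only while one stripe set is completely intact.
    return any(group.isdisjoint(failed) for group in STRIPE_SETS)


def survives_1_plus_0(failed):
    # The stripe needs every mirror pair, and a pair lives while at
    # least one of its drives still works.
    return all(group - failed for group in MIRROR_PAIRS)


def surviving_pairs(survives):
    # Every two-drive failure the given layout can survive.
    return [pair for pair in combinations(range(1, 5), 2) if survives(set(pair))]


print("0+1 survives:", surviving_pairs(survives_0_plus_1))
print("1+0 survives:", surviving_pairs(survives_1_plus_0))
```

Under this model both layouts survive any single-drive failure, but 0+1 survives only 2 of the 6 possible two-drive failures while 1+0 survives 4 of the 6.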

As for software RAID, I don&#039;t think anyone would ever recommend that for a production system except, as John pointed out, if it is backed by hardware mirroring.  In that case it can be very useful, as expanding the RAID set by adding more mirrored sets to the stripe can be simpler using an OS-supplied storage virtualization scheme than a controller-based scheme -- think Veritas storage virtualization or Linux LVM.

Man ... I haven&#039;t geeked out over hardware like this in ages.  Thanks, Dan!

--miker 
    </content:encoded>

    <pubDate>Fri, 11 Apr 2008 21:41:00 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1075</guid>
    
</item>
<item>
    <title>John Craig: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1074</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (John Craig)</author>
    <content:encoded>
    I&#039;m sure Mike&#039;s right about RAID 5 vs. RAID 10 for particular controllers and benchmarks. I maintain that RAID 5 is essentially rendered obsolete by available disk capacities (when talking about the size of databases involved in the library world--as Mike suggests).

RE RAID 1+0 vs. 0+1, though, I want to suggest that the naming may not match the patterns as exactly as Mike implies. I&#039;ve not encountered the type of combination of stripes and mirroring that Mike refers to in which a one drive failure kills the whole array, but I have encountered different terms for the same conceptual beast among the different vendors. And also different ways of setting up the combination of striping and mirroring (which can make the terminology all that much more confusing--especially if you have in mind that one way provides fault-tolerance and one does not).

Depending upon the controller manufacturer&#039;s terminology, RAID levels 0 + 1, 1 + 0, and 10 are somewhat interchangeable. And, although the way the setup actually happens suggests that it should be called one thing or the other, the end result is the same--and is fault-tolerant. (3Ware and Adaptec, for instance, use opposite setup approaches, as I recall--befuddled me no end, initially because I was concerned about just the sort of issue Mike raised with regard to fault-tolerance). It turns out that the end result is the same: striped and mirrored with fault-tolerance of a single-disk failure.

With one of these vendors&#039; cards, you have to set up your 4 drives (the minimum for either of these approaches) as RAID 0 (2 sets of 2 striped drives each) and then mirror the one striped pair onto the other striped pair: disks 0 &amp;amp; 1 are a striped pair; 2 &amp;amp; 3 are a striped pair; and then 2 &amp;amp; 3 are set up to mirror 0 &amp;amp; 1. Most logically, IMO, this might be referred to as 0 + 1: RAID 0 [striped] followed by the addition of RAID 1 [mirrored]. With the other manufacturer, you set up 2 mirrored pairs and then stripe across the 2 mirrored pairs: 0 &amp;amp; 1 are mirrors; 2 &amp;amp; 3 are mirrors; and 2 &amp;amp; 3 stripe 0 &amp;amp; 1. Now, for naming, I&#039;d choose 1 + 0 for this. But, the thing is, even though the sequence is different between 3Ware and Adaptec, the terminology isn&#039;t consistent with what my brain thinks it should be: so they may call it the same thing even though the levels get applied in a different order. (Is the terminology a FIFO: first applied level listed first; or a FILO: last applied level first? And if you have that straight, doesn&#039;t it seem that you&#039;d attend to which order you apply the levels? Well, apparently not.)

The important point is that whatever your RAID &lt;u&gt;controller&lt;/u&gt; manufacturer calls this RAID setup combining striping and mirroring (which is generally as good as you can get for a DB, in terms of general-purpose performance--the advantage of the striping being relatively minimal in most cases), you do not lose the whole array with a 1-drive failure. Each drive in such a setup contains a copy of the stripes for half the array. Failure of one drive means you have to rely only on the other half of the mirror for that set of stripes (until your hot spare comes on line--hot spares are generally worth the cost, I think). So, regardless of whether you apply the stripes and then mirror, or the other way around (which would possibly imply the different naming mentioned by Mike), you do have fault tolerance for single-drive losses.

I&#039;m not disagreeing with Mike that there are non-fault-tolerant ways to set things up, but, in general, anything you can do via a RAID &lt;u&gt;controller&lt;/u&gt; in terms of combining drives in any kind of configuration that includes a 1 (mirroring): 1 + 0; 0 + 1; or 10, will, based on my experience with 3 different manufacturer&#039;s controllers, give you the ability to survive a single disk failure--regardless of what they choose to call it and regardless of what order the setup happens in. 

If it&#039;s a software RAID setup, I don&#039;t have enough experience to offer an opinion. Given Mike&#039;s comments, it sounds like some care would be in order in that case. But if you&#039;re configuring a server for a serious production system, you wouldn&#039;t want software RAID for anything except perhaps the OS and application installations anyway (RAID 1). And I would not recommend using any type of RAID 0 (striping) in a software RAID situation anyway: the only way it&#039;s any good is in hardware with mirroring.  storagereview.com  is an excellent site for many things disk related. Their discussion of RAID 0 is enlightening (they basically show that the received wisdom about striping is not correct).

Happy RAIDing. 
    </content:encoded>

    <pubDate>Fri, 11 Apr 2008 13:10:43 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1074</guid>
    
</item>
<item>
    <title>Tom: Oh, Vista has _acquired_ SirsiDynix...</title>
    <link>http://coffeecode.net/archives/111-Oh,-Vista-has-_acquired_-SirsiDynix....html#c1073</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/111-Oh,-Vista-has-_acquired_-SirsiDynix....html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=111</wfw:comment>

    

    <author>dan@coffeecode.net (Tom)</author>
    <content:encoded>
    You totally missed the idea that Unicorn will die and Horizon will become the ILS of focus moving into the future... 
    </content:encoded>

    <pubDate>Thu, 10 Apr 2008 23:17:46 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/111-guid.html#c1073</guid>
    
</item>
<item>
    <title>Mike Rylander: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1072</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (Mike Rylander)</author>
    <content:encoded>
    Just to fan the flames of confusion a bit more, there is empirical evidence to suggest that RAID5 will perform as well as RAID10 given enough drives and a sane controller -- the absolute minimum for this, from all that I&#039;ve seen, is 6 drives.

The reason for this is that once you add more than 3 drives the data can be read from multiple sets of 3-at-a-time, bringing performance back toward the RAID10 level.  Of course, if you&#039;re going to put 6 or 12 or 24 drives in one RAID set then you might as well use RAID10, because you&#039;re obviously not concerned about disk cost.  Unless your controller is better at RAID5 parity calculation than it is at concurrent read on RAID10.  Such animals are said to exist.

With storage costs being what they are these days, the biggest consideration with RAID5 is that you need to have a really good controller with fast and sane hardware parity calculation.  Of course, you need a good controller for RAID10 too, but RAID10 is easier to implement well than is RAID5.

One other thing, there&#039;s a big difference between RAID 1+0 and RAID 0+1 ... the former is Good(tm), but the latter will cause you to lose an entire &quot;side&quot; of your mirror if one drive goes bad -- decidedly ungood.  The RAID 0+1 is usually only an option with software RAID, though, for this very reason.

Did that answer any questions or clear anything up?  Nope.  It really just means this: test with your real workload, and when all else fails add more RAM.

And with that hardware nerdiness out of the way, I bid you all adieu!

--miker 
    </content:encoded>

    <pubDate>Thu, 10 Apr 2008 15:46:22 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1072</guid>
    
</item>
<item>
    <title>John Craig: Test server strategies</title>
    <link>http://coffeecode.net/archives/155-Test-server-strategies.html#c1071</link>
            <category></category>
    
    <comments>http://coffeecode.net/archives/155-Test-server-strategies.html#comments</comments>
    <wfw:comment>http://coffeecode.net/wfwcomment.php?cid=155</wfw:comment>

    

    <author>dan@coffeecode.net (John Craig)</author>
    <content:encoded>
    I doubt if the TPC-C benchmarking question involves whether IBM or Oracle or any other participant would &quot;give up a drop of performance&quot; in a benchmark situation. My strong suspicion is that the benchmark rules require a RAID 5 disk configuration. Although I haven&#039;t been able to verify this conclusively from the tpc.org site, check out page 148 of this document: http://www.tpc.org/tpc_app/spec/TPC-App_V1.3.pdf
--my reading is that RAID 5 is a requirement of the benchmark. Historically, RAID 5 no doubt made sense, and now, when it&#039;s of much less practical use, they may just want to keep things consistent with older test results (back in the day of 2GB drives, you needed RAID 5 to get a big array). 

But, just looking at raw performance, a simple measure of disk subsystem performance (using say IOMeter&#039;s File Server simulator) with RAID 1 vs. RAID 5 will definitely show that the disk subsystem performs substantially better with RAID 1 (particularly a fair number of queued IOs and for a high read/write ratio--and a library system is a fairly extreme case of reads to writes). Note that the raw performance of the disk subsystem is independent of how the OS interacts with the drives vs. how the DB does: there&#039;s no getting around the fact that a RAID 5 system will have higher seek times (even with good command queuing, positioning heads on 1 drive will beat positioning them on 3--although it won&#039;t be a simple 1 to 3 ratio, of course) and the XORs to recreate the actual data read from the RAID 5 array can&#039;t be free, no matter how efficient they are.

I&#039;ll be interested to see your test results, if you have time to do them (without using something like IOMeter, it may be moderately difficult to simulate an appropriate load at which the performance would really diverge--but more on that below), but the only reason for using RAID 5 over RAID 1 or RAID 0 + 1 is that it&#039;s more efficient from a cost-per-GB point of view than RAID 1 or 0+1. (The fact that you can have a hot spare drive configured for either setup means the fault tolerance is equivalent: either survives a single-drive failure.) This cost-per-GB fact was important in years past, perhaps, and for truly huge databases it might still be a consideration. But, for a library DB (all but the absolute biggest ones I&#039;ve encountered in 17 years in the business are less than the size of one 73GB drive--most a LOT less), with current hard drive technology, the cost difference of more spindles (taking system cost as a whole into consideration) is insignificant. There&#039;s also the issue that RAID 5 performance varies tremendously from RAID controller to controller--but since RAID 1 is so much simpler, if you don&#039;t happen to have the best RAID card available, you&#039;ve got a better chance of getting good performance from whatever you&#039;ve got if you take the XORs and more complex head-positioning out of the mix by not using RAID 5.
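
A rough sketch of the cost-per-GB arithmetic above, assuming equal-sized 73GB drives and the usual capacity rules (RAID 5 gives up one drive to parity, RAID 10 gives up half the drives to mirroring). The function names are hypothetical and hot spares are not counted:

```python
DRIVE_GB = 73  # drive size mentioned above; every drive assumed identical


def usable_gb_raid5(drives):
    # One drive worth of capacity is consumed by parity.
    return (drives - 1) * DRIVE_GB


def usable_gb_raid10(drives):
    # Half of the drives hold mirror copies.
    return (drives // 2) * DRIVE_GB


for drives in (4, 6):
    print(drives, "drives:",
          "RAID 5 =", usable_gb_raid5(drives), "GB,",
          "RAID 10 =", usable_gb_raid10(drives), "GB")
```

With 4 drives that is 219GB usable for RAID 5 against 146GB for RAID 10, so for a database that fits on a single 73GB drive the extra capacity of RAID 5 goes unused.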

I certainly wouldn&#039;t recommend anyone ever assume that they&#039;re getting the best performance from a DB server if they host the DB on a RAID 5 setup. We have a test server at one of our clients&#039; sites, it has copies of the live DB on both RAID 5 and just individual disks (no RAID--which is slower for reads than RAID 1, but faster for writes). The individual disks are definitely faster (and not just a little faster--I mean at least 3 to 5 times faster). The RAID 5 performance is acceptable (especially at low load levels), but we would never use that arrangement for a production system. (We have the one copy on the RAID 5 array because there&#039;s extra space there in addition to the backups we keep on the array.) It is true that the RAID 5 performance of the Adaptec controller in that box isn&#039;t stellar.

I agree with the &quot;try it yourself&quot; philosophy too, and my actual experience with the same DB server hitting different DBs on different RAID arrangements has been crystal clear: RAID 1 or 0 + 1 gives substantially better performance. At low loads you might not be able to see the difference, but in the case of Evergreen&#039;s keyword searching, for instance, which is highly random in terms of disk accesses, my guess is that doing cache-cold searches for common terms would immediately show a perceptible (not just a measurable) difference.

My real-world experience, plus the fact that disk space (even employing the fastest available drives) is a relatively inexpensive part of a system, leads me to conclude that there&#039;s no reason to use RAID 5 to store DB data. And if I were spec&#039;ing a system now, rather than a RAID 5 array for backups, I&#039;d consider using mirrored big and inexpensive SATA drives for that too. For databases of the size typically encountered in the library world, RAID 5 has no practical benefits. 
    </content:encoded>

    <pubDate>Thu, 10 Apr 2008 15:04:55 -0400</pubDate>
    <guid isPermaLink="false">http://coffeecode.net/archives/155-guid.html#c1071</guid>
    
</item>

</channel>
</rss>