About MeI'm Dan Scott, barista, library geek, and open source dabbler.
You may know me from such projects as PHP
(PEAR's File_MARC package and
PDO documentation),
Apache Derby, and the
Evergreen open-source ILS project.
I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net. License![]() This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Canada License. Syndicate This Blog |
Monday, April 14. 2008Tuning PostgreSQL for Evergreen on a test serverUpdate 2008-05-01: Fixed a typo for sysctl: -a parameter simply shows all settings; -w parameter is needed to write the setting. Duh. Once you have decided on and acquired your test hardware for Evergreen, you need to think about tuning your PostgreSQL database server. Once you start loading bibliographic records, you might notice that after 100,000 records or so that your search response times aren't too snappy. Don't snarl at Evergreen. By default, PostgreSQL ships with very conservative settings (something like machines with 256 MB of RAM!) so if you don't tune those settings you're getting a false representation of your system's capabilities.
The "right" settings for PostgreSQL depend significantly on your hardware and deployment context, but in almost any circumstance you will want to bump up the settings from the delivered defaults. To give you an idea of what you need to consider, I thought I would share the settings that we're currently using on our Evergreen test server at Laurentian University. You might be able to use these as a starting point and adjust them accordingly once you've run some representative load tests against your configuration. And it's useful documentation for me to fall back on in a few months, when all of this has escaped my grasp The defaults (as shipped in Debian Etch)The defaults in Debian Etch are quite conservative. Consider that our test server has 12GB of RAM. The default only allocates 1MB of RAM to work memory (which is critical for sorting performance) and only 8MB of RAM to shared buffers. Following are the defaults set in /etc/postgresql/8.1/main/postgresql.conf: # - Memory - #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each #temp_buffers = 1000 # min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). #work_mem = 1024 # min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~70 bytes each Our test server settingsOur test server has 12 GB of RAM. Assuming that the PostgreSQL defaults were set for a system with 1 GB of RAM, we should be able to multiply the memory-based settings by at least a factor of 12. We're a little bit more aggressive than that in our settings. Note, however, that this is a single-server install of Evergreen, so we're also running memcached, ejabberd, Apache, and all of the Evergreen services as well as the database - oh, and a test instance of an institutional repository, among other apps - so we're not nearly as aggressive as we would be in a dedicated PostgreSQL server configuration. Please note that I'm making no claims that this is the optimal set of configuration values for PostgreSQL even on our own hardware! # shared_buffers: much of our performance depends on sorting, so we'll set it 100X the default # some tuning guides suggest cranking this up to as much 30% of your available RAM shared_buffers = 100000 # 8K * 100000 = ~ 0.8 GB # work_mem: how much RAM each concurrent process is allowed to claim before swapping to disk # your workload will probably have a large number of concurrent processes work_mem=524288 # 512 MB # max_fsm_pages: increased because PostgreSQL demanded it max_fsm_pages = 200000 After you change these settings, you will need to restart PostgreSQL to make the settings take effect. Kernel tuningIn addition to PostgreSQL complaining about max_fsm_pages not being high enough, your operating system kernel defaults for SysV shared memory might not be high enough to support the amount of RAM PostgreSQL demands as a result of your modifications. In one of our test configurations, we had cranked up work_mem to 8GB; Debian complained about an insufficient SHMMAX setting, so we were able to adjust that by running the following command as root to set the kernel SHMMAX to 8GB (8*1024^2): sysctl -w kernel.shmmax=8589934592 To make this setting sticky through reboots, you can simply modify /etc/sysctl.conf to include the following line: # Set SHMMAX to 8GB for PostgreSQL #kernel.shmmax=8589934592 Other measuresDebian Etch comes with PostgreSQL 8.1. The first version of PostgreSQL 8.1 was released in November 2005. That's a long time in computer years. Version 8.2, which was released less than a year later, "adds many functionality and performance improvements" (according to the release notes). If you're not getting the performance you expect from your hardware with Debian Etch, perhaps a backport of PostgreSQL 8.2 would help out. Further resourcesThis is just a shallow dip into PostgreSQL tuning for Evergreen - hopefully enough to alert you to some of the factors you need to consider if you're putting Evergreen into a serious testing environment or production environment. Here are a few places to dig deeper into the art of PostgreSQL tuning:
Sunday, September 16. 2007The Linux Documentation Project, DB2, XML, and nostalgiaI took a walk down memory lane this evening. I thought I might as well bore you with the details. One of my first forays into the open source world was to participate in the Linux Documentation Project (TLDP). At the time (circa 1999), I was working for IBM as a technical writer for DB2 database. IBM was releasing DB2 on the Linux platform, I was part of the pre-release testing team, and I had turned to TLDP to provide me with an introduction to the world of Linux as a total n00b. It was a godsend of information. When DB2 was officially released on the Linux platform, it only officially supported a handful of distributions. Given the normal technical writing and release cycle, the officially supported versions of the distributions were woefully out of date in the official documentation. That, and much of the required installation and configuration information was either missing, or wrong. Don't lay any blame on the people involved; that was just the way that the release process (including translation into umpteen languages that were all available on release day) forced the end product to be. My focus was on application development, but I had to get test environments set up so I could ensure what I was writing actually worked (that's the way I roll as a tech writer). Of course, I chose an unsupported-by-DB2 but much more current distribution (Mandrake Linux 5.3 "Venus" I believe) simply because it would install on my hardware, when Red Hat 5.2 would not. It struck me that my install experiences would help other DB2 users as well. I realized it would also give IBM a way around the barrier imposed by the restriction that the official documentation for a given release was published once per release - no updates. By contributing a DB2 HOWTO to TLDP, I would not only be able to provide documentation on the distributions that people were actually using, I would also be able to update the HOWTO as circumstances warranted. My manager supported the project, and helped me stickhandle some obstacles. The result, I believe, was beneficial all around; I contributed some code to TLDP to help improve the PDF output and helped mentor some TLDP n00bs; DB2 got some usable documentation when it really needed it; and I had the opportunity to learn a technical writing DTD that made sense (DocBook) and play with an impressive open-source publishing toolchain. Over time, my friend and co-worker Ian Hakes picked up the ball and drove the next iteration of the DB2 HOWTO with my help. It has been over a year and a half since I left IBM, so I haven't paid any attention to the DB2 HOWTO. Recently, however, as I was playing around with an updated version of the DocBook toolchain, I discovered that Ian has released a brand new version of the DB2 HOWTO to cover installation of DB2 Express-C on various distributions. He included a touching tip of the hat to me, as well. What a swell guy! On one hand, I'm not sure that TLDP has nearly as much of a mandate as it did eight years ago. There are scads of books, a handful of good magazines, blogs, wikis, and web sites all publishing information about Linux these days. On the other hand, there's something to be said for a corpus of documentation maintained and edited by volunteers who just want to get information into the hands of people who need help -- without compensation, without publicity, and generally without thanks. So, given how much TLDP has helped me - thank you, TLDP volunteers. And thank you, Norm Walsh and the entire DocBook community, for providing an open-source publishing toolchain that starts with semantic XML and results in professional-looking documentation. P.S. I made another commit to the DB2 HOWTO tonight - just balancing out an XML element that was missing to return the document to valid XML state. And let me tell you, it felt good! Monday, December 11. 2006Java 6 is out; now including JavaDB (aka Apache Derby)A quick little note to mention the official release of Java 6. Okay, Dan, but why do you care?, you might ask. Good question, oh person-who-does-not-read-headlines. The reason I care is that Sun chose to bundle Apache Derby in this release -- take a peek in the /db/ subdirectory of the Java SDK. Bundling Derby is going to mean a huge boost to the visibility and usage of the little Java database that could. It will be the de facto default database for Java developers; and if they haven't already used it, I suspect they're going to be pleasantly surprised at Derby's robustness and ability to perform. I was chatting with a few of the DSpace developers a week or so ago, and mentioned my hope (in all my spare time) to port the DSpace institutional repository to Derby as a possible default database. Right now, you see, the default database for DSpace is PostgreSQL, and unfortunately correctly configuring PostgreSQL seems to be the biggest barrier new users encounter while deploying DSpace. Switch to an embedded Derby database, and those headaches go away. On the other hand, it seems that at least one of the DSpace developers have done a bit of experimenting with Derby in the past, as he claimed its performance suffered after 500,000 rows of data or so. Well, even if that is an insurmountable limit, that's a pretty good start for most institutional repositories -- and I suspect that the Derby developers would be highly motivated to show that Derby can, in fact, scale beyond that limit. So, if you're a Java developer or dabbler, get on out there and give Derby + Java 6 a try. You're going to have a lot of company. Oh yeah, and if you need a good book on Derby... Tuesday, November 14. 2006Neat-o: Archimède uses Apache DerbyA while back I mentioned on the DSpace-devel mailing list that I was interested in adapting DSpace to use embedded Apache Derby as the default database, rather than PostgreSQL, as a means of lowering the installation and configuration barriers involved with setting up access to an external database. I haven't had time yet to actually carry out my musing, but today I had the chance to set up the Archimède institutional repository on a test server -- and imagine my surprise when I saw a derby.log file sitting in the Archimède repository. It looks like someone else at Université Laval had the same idea as me much further back. It's still on my horizon to adapt DSpace to Derby; seeing that it works well for Archimède confirms my belief that it's the right direction to go. Tuesday, June 13. 2006In-depth _and_ official DB2 and PHP documentationI should have mentioned this before, but now that I noticed Chris Jones' post on the Underground PHP and Oracle Manual, I felt obliged to point out that one of the final fruits of my labours at IBM is now visible in the DB2 "Viper" Information Center -- a set of task-oriented documentation that describes how to do all of the things that you really need to do with DB2 and PHP, using either the ibm_db2 or PDO_ODBC modules. By "task-oriented" I mean that, instead of documenting a set of objects and methods, the docs take the perspective of a developer and describe how to accomplish specific tasks (like "Connecting to a DB2 database from PDO" or "Calling a stored procedure" or "Retrieving multiple result sets"). I hope it works as both a good introduction to PHP development for DB2 users, and a good introduction to DB2 for PHP developers. And, of course, the same approach will work for Apache Derby databases as well. I find it interesting that Oracle has positioned their PHP documentation as "underground", while IBM has chosen to incorporate their PHP documentation into their official set of DB2 documentation. Oracle gets the points for coolness, but IBM's approach will make the pointy-headed types a bit more comfortable.
Update: Corrected bad XHTML (unescaped ampersand in URL). Bad Dan. And corrupted an intermediate version with garbage from another posting. Even worse. Friday, February 10. 2006ADOdb: getting good support for IBM DB2, Cloudscape, and Apache DerbyThe stable release of the ibm_db2 PECL extension for IBM DB2, Cloudscape, and Apache Derby brought a high performing, highly functional database connectivity alternative to Unified ODBC for PHP 4 and 5 users. However, in and of itself a database extension does not enable you to use the many PHP applications that you might want to use. You either have to add a specific driver for each application that implements its own portability layer (such as phpMyFAQ), or if the application relies on one of the standard database abstraction layers (PEAR DB, MDB2, or ADOdb), then a driver needs to be added to the corresponding database abstraction layer. To date, the standard database abstraction layers have offered support for DB2 only through the Unified ODBC extension (and despite substantial overlap in names, MDB2 does not offer support for DB2 at all). Due to some limitations of the Unified ODBC extension, access to DB2 would seem slow and buggy -- and access to Apache Derby or Cloudscape would be frought with minefields, as Unified ODBC does not provide a way of differentiating between the databases to which you are connected and their corresponding features. The ibm_db2 extension offers the db2_server_info() function which can tell you whether you are connected to DB2 on Linux, DB2 on a zSeries machine, or an Apache Derby database, and let your application or database abstraction layer perform the appropriate workarounds. Now, however, as part of Larry Menard's efforts to enable Gallery 2, an ADOdb driver built on top of the ibm_db2 extension will, in all probability, be made available as part of a future ADOdb release. Undoubtedly there will be further testing to do, and tweaks and performance optimizations in the future code--for example, differentiating between the capabilities of Apache Derby and DB2--but this is a huge first step! Thanks to Larry and the Gallery 2 team for making this contribution. Wednesday, January 18. 2006(Un)common but potentially mind-twisting DB2 issuesIt is perhaps a sad reflection on my life that I'm using a personal blog to offer technical information about a product that I work on, rather than telling you about what I ate last night (spinach and goat cheese salad with a raspberry vinaigrette dressing, quinoa, and a rather tomatoey coq au vin), how the weather is (freezing rain and snow last night), or the last time we got together with friends (Peter and Deb came over last night for a lovely visit over wine and dinner). Instead, here are a few gotchas for DB2 on Linux that might otherwise take you a few hours to figure out. Shiny new DB2 for Linux (2.6 kernel) requires compat RPMs?DB2 for Linux Version 8.2 (aka Version 8.1 FixPak 7) has been validated on distributions running both 2.4 and 2.6 Linux kernels. However, starting with FixPak 9 (aka Version 8.2.2, which has replaced Version 8.2 if you order a brand new copy of DB2 with physical media today), you have to choose a 2.4 or 2.6 kernel-specific version of DB2 on the x86 and x86-64 architectures. If you're unlucky, you might find that once you install this newer version of DB2 meant for a shiny new distribution running a 2.6 Linux kernel, DB2 suddenly stops running and complains that it can't find the libstdc++.so.5 library. The reason is because the new versions of DB2 are built for compatibility with the first enterprise Linux distribution that offered the 2.6 kernel -- SLES 9 -- and guess what level of libstdc++ SLES 9 comes with? Yeah, you guessed right. And the reason the older versions of DB2 don't need libstdc++ is because they shipped with their own copy of the Intel C++ libraries. Fun stuff eh? The moral of the story is: install your compat-libstdc++ libraries (or whatever they're called), and you won't get hurt when you upgrade DB2. DB2 for Linux on POWER aka PPC64 aka pSeries: searching for libibmc++Here's another missing library problem for which Google turned up no help... After installing DB2 for Linux on POWER, the user tried to run any DB2 command and received the following error message: db2: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory The problem occurs because DB2 for Linux on POWER is compiled against the IBM XL C++ libraries. This means that you have to install the IBM XL C/C++ Advanced Edition V7.0 for Linux Runtime Environment as described in a DB2 technote. Tuesday, November 8. 2005DB2 validated for use on Ubuntu
Good news for all you DB2 for Linux people who love Ubuntu -- IBM just validated Ubuntu as a Linux distribution on which IBM is willing to offer its enterprise level support. You can see that Ubuntu was excited in their press release about the accomplishment. And rightully so: no other database server has announced production-level support for Ubuntu. Cool.
I've issued some updates to the DB2 for Linux HOWTO that will hopefully be published soon; I had based the original Ubuntu install instructions on my Gentoo experiences and there were a few embarassing references to that that I hadn't cleaned up (and which were painfully evident now that I'm running Ubuntu 5.10 "Breezy Badger" on the dual-boot side of my laptop).
In a future update to the HOWTO, I plan to offer a little more fine-grained installation advice; the current instructions install everything under the sun, including support for 29 different languages, warehouse agents, and assorted other features that probably aren't required for most developers or users. But at least you can get up and running relatively easily So if you want to try Ubuntu, go to http://ubuntulinux.org. It has replaced SuSE Workstation 9.2 on my laptop -- and I'm even using Gnome again, which I had given up on a few years ago. If you want to try DB2, you can download a 90-day trial of DB2 Express. Tuesday, November 1. 2005Apache Derby: Off to the Races in the (paper) flesh!
Just got back from a gruelling Paris - Amsterdam - Berlin - Paris - Toronto series of flights (don't ask) to wrap up my European October getaway and found this sitting on my doorstep.
Yes, sirs and madams, it is the official hard-bound 500+ page book known as Apache Derby: Off to the Races -- the book that gives you everything you need to know to develop applications with this scrupulously standards-compliant little database (and even do a little administration of this normally zero-administration beastie). As you may already know, I wrote a good chunk of this book: chapters on PHP, Perl, Python, ODBC network communication, maintenance, and tuning. It was written during a time that was pretty gruelling for me, personally and professionally, and at times it felt like it would never end -- but now that I can hold the thing in my hands, it seems worthwhile. (I hope Lynn thinks so as well!) And the experience of working with Paul and George was something that I'm glad to have had as well; I've learned a lot from them. So: if you don't have your own copy yet, it's time to buy a copy (yep, and if you use that link you'll even contribute an extra couple of pennies to my pocket). GlitchesAh yes, I have found a glitch already. Actually I had half suspected this was coming, due to the last-minute change I requested to keep pace with the rapid pace of PHP Data Objects (PDO) development and evolution... a global change to using class constants in PDO (from PDO_ATTR_BLAH to PDO::ATTR_BLAH) happened about a day before our final proofs were being sent off to the printer. I asked for the corresponding global change, and, of course, the name of the PDO_ODBC module now reads PDO::ODBC. It's not the end of the world, but it's a tiny bit embarassing. Tuesday, October 18. 2005SQLEUCCM error message for DB2Hopefully this proves useful to someone else in the same position in the future... I have been working on a simple PHP application that connects to DB2 using the ibm_db2 extension. Recently I decided to try scaling it up to test the application with a more realistic (10GB) database. I created a new test database on a partition that had enough room, but then encountered a very strange error message just trying to connect to the database: [IBM][CLI Driver] SQL0969N There is no message text corresponding to SQL error "-2079391743" in the message file on this workstation. The error was returned from module "SQLEUCCM" with original tokens "". SQLCODE=-2079391743 Needless to say, this message really didn't help me much. Googling and Yahooing for the terms also gave me no useful results (there was a thread on the International DB2 User Group mailing list, but the site requires a user ID and login and I just wanted to get on with my day). The DB2 support site didn't provide any hits, and neither did the DB2 Information Center. It took me a while to figure this out, but apparently the ownership and permissions on the directory that contains your tablespaces (if you're using filesystem-based, or "system managed" tablespaces) have to be "just so". DB2, for the uninitiated, uses operating system accounts and groups for database authentication and privileges. My DB2 instance user was "db2inst1", belonging to the "db2iadm1" group -- default values -- however, the directory in which I created the tablespaces belonged to user "dan" and group "db2iadm1". localhost php4 # ls -l /opt/photos/db2base -d drwxrwxrwx 3 dan db2iadm1 72 Oct 18 10:35 /opt/photos/db2base Okay, so even though the db2inst1 user is a member of the db2iadm1 group, DB2 apparently also wants it to own the directory in which the tablespaces live. By issuing a "chown db2inst1 /opt/photos/db2base" command, DB2 was satisfied and my PHP scripts started working again. And maybe, just maybe, the next user that runs into this problem will find this post in a search engine and save themselves an hour or two of befuddlement... Monday, August 8. 2005Apache Derby tutorial (OSCON 2005) materialsThe presentations, handouts, and solutions for the "Deep Dive with Apache Derby: Perl, PHP, and Python" tutorial I gave at OSCON 2005 are finally online, including a couple of last-minute corrections: The tutorial (my first solo 3.5 hour teaching session) went reasonably well, although my expectations of what comprises a tutorial apparently differs from the OSCON standard. I had structured the tutorial to be hands-on: after a brief lecture, the attendees were expected to perform exercises like installing Apache Derby, creating a database, setting up a connection through PHP, and the like. I was there to help them through the hurdles.
In contrast, most of the other tutorials were apparently three-hour lectures. I sat in on the end of Larry Rosen's *Law for Geeks*, which appeared to be a great discussion forum -- good format, important subject, great speaker. I followed that up with Monday morning's *Introduction to PostgreSQL*, which, at the one-hour mark, was going painfully slowly... so I slipped out at the break to do a last-minute run through of my own tutorial instead, and fixed a couple of bugs in the presentation materials just in time My tutorial had about 20 attendees at the start, but I knew it was going to be trouble when less than half of them actually had laptops. Of the laptops, about three had Linux (hurrah -- that matched my tutorial assumptions), a couple had Windows (I had tested everything on Windows, so I knew I could help them through), and one had Mac OSX (completely untested and foreign to me). That ratio was actually rather kind to me; in the rest of the OSCON audience, it seemed about half of the attendees were carrying Mac laptops.
After the break, I was down to about seven hard-core attendees. The attrition didn't surprise or dismay me -- if I was stuck in a hands-on tutorial without being able to get my hands on anything, I would probably leave too. Now, I had asked people to buddy up, but apparently overcoming the fear of strangers was too much to ask (and I admittedly didn't break the ice enough). The other rather frustrating factor was the rather sluggish wireless connectivity and ibm.com's sadistic time-outs playing hell with the 200 MB download required for the DB2 Application Development Client. I ended up copying all of the software onto a USB drive and distributing it manually to the attendees. Despite the frustrations with technology, though, I really felt positive vibes from the audience -- and nobody laid a beating on me in a dark corner after the session
Updated 2005-08-11: Fixed the link to the presentations. Apparently not many people have tried to look at them, or didn't bother to tell me that they were not found Updated 2007-11-20: Apparently my host dropped one of the libraries required by the PHP presentation system, so the link to my presentation stopped working. Fixed - thanks to Gordon Agress for bringing this to my attention. Wednesday, August 3. 2005Apache Derby graduates, Derby 10.1 has been released!Heh -- one day after my OSCON tutorial on Apache Derby: Perl, PHP, and Python, where I focused on the 10.1-alpha release from back in May, the Derby team announced the first Derby 10.1 release. Figures that there would be a flurry of activity after my talk I also noticed that, following the announcement of Derby's graduation from an Apache incubator project into a full-fledged Apache sub-project, the migration of Apache Derby from http://incubator.apache.org/derby to http://db.apache.org/derby is closer to completion. Great stuff! I think I'll have to talk with the Derby team about having a bit more of an open and more traditional release process; most open source projects that I've been involved with go with more than one alpha release, and they don't usually wait two months before suddenly declaring the official release. I have to assume that timing for OSCON and the Derby graduation into a full-fledged Apache sub-project played some part in the move, but it would be nice to see a more public discussion of the release cycle for future releases. Wednesday, June 15. 2005I'm an author! "Apache Derby: Off to the Races"
If you're looking for a book that covers Apache Derby / IBM Cloudscape top-to-bottom, and you want a gentle introduction to programming database applications with PHP, Perl, Python, JDBC, or .NET, Apache Derby: Off to the Races
Thursday, April 21. 2005Miguel deIcaza addresses MySQL User Conference
Miguel gave a high-energy, high-enthusiasm, but rather scattered keynote address to the MySQL User Conference this morning:
Continue reading "Miguel deIcaza addresses MySQL User Conference"
(Page 1 of 1, totaling 14 entries)
|
Calendar
QuicksearchCategoriesBlog Administration |
|||||||||||||||||||||||||||||||||||||||||||||||||


