Year One and Drizzle Day 2012 in review
Drizzle Day 2012 happened last Friday, April 13, 2012 after PLMCE 2012 in Santa Clara, California. I also realized that my first anniversary as a Drizzle user and contributor has already passed: on April 10, 2011 I wrote my first Drizzle blog post: Compiling Drizzle 7 on Mac OS X 10.6. (This blog didn’t exist at that time so I used Hack MySQL.) It’s cliché to say but I’ve learned a lot in the last year.
First, I want to apologize to the core Drizzle developers for speaking without first knowing all the facts. In What Drizzle needs I said the project lacked and needed leadership. Although it does need a certain type of leadership as all projects do, my thinking at the time, which was reflected in the blog post, was that Drizzle was a ship without a captain or first officer, so to speak. I know now that that criticism was too harsh given what was happening to the core developers around this time last year. Furthermore, it is still too much to expect someone to work full-time on Drizzle because I also know now what most of the core developers are busy working on, and they’re quite busy indeed. So what can be done in terms of leadership? It occurred to me that for the moment Drizzle doesn’t need traditional, top-down leadership because the core developers have known each other long enough and are talented enough to be self-directing. In essence, a “hive mind” is driving Drizzle. This became apparent to me after Drizzle Day with Brian, Mark, Stewart, Patrick, and Henrik in one room all day: a harmonious team doesn’t need “direction from above”; everyone does what they can to help realize the project’s goals.
Speaking of the project’s, i.e. Drizzle’s goals, one idea seemed to be prevalent at PLMCE 2012: cloud services. In Brian Aker’s keynote he demonstrated provisioning a MySQL server via HP Cloud; and in Mårten Mickos’s keynote he talked about new paradigms: client-server (scale-up) to web (scale-out) to cloud (multi-scale). These talks along with others during PLMCE and Drizzle Day 2012 finally clarified my understanding of Drizzle as a “Lightweight SQL Database for Cloud Infrastructure and Web Applications”: Drizzle is the first multi-scale relational database server. That’s a big claim and surely many people will argue with me, so here are my reasons.
First, as Mårten said, “In the cloud you must scale both up and out.” Why do this? Because as he also said, “the whole world is going online”. To meet increasing demand, database servers will have to scale up: use modern hardware and more of it. In other words: more cores, more RAM, faster storage, and a lot more threads. In Mark Callaghan’s keynote he talked about how InnoDB is still unable to realize the full potential of modern hardware. Drizzle is supposedly designed to avoid these issues because it’s optimized for modern hardware, but this has yet to be proven. Drizzle really needs independent benchmarks (need #6).
Second, on the scale-out side, Drizzle replication has the potential to scale in ways MySQL can’t. It already supports multi-master replication, and the fact that Drizzle replication is pluggable will open the doors to innovation. Although strong already, replication is an area where Drizzle should keep focusing efforts.
Third, Drizzle directly addresses a key aspect of the cloud paradigm: multi-tenancy. In the cloud paradigm, servers contain and serve many isolated services for many different customers. Using virtualization is one way to do this, or running multiple instances of a program, but those methods have known drawbacks. Drizzle multi-tenancy, which should be the star of the 7.2 release, directly satisfies this key aspect of the cloud paradigm, thereby making Drizzle a true database for the cloud.
So, Drizzle should work well on modern hardware, its replication system is robust and flexible, and it will have native multi-tenancy. For these reasons, I contend that Drizzle is the first multi-scale relational database server. Realizing this has helped me to stop thinking in old client-server MySQL terms and begin thinking in new cloud Drizzle terms. Granted, these considerations really only apply to the web and web applications; a company can still benefit the most from scale-up with MySQL in-house, but when it comes to Drizzle, serving web apps is the goal, which requires multi-scale because, yes, the whole world is going online.
Another significant realization I had while talking with Patrick Crews was that Drizzle adapts to an environment rather than forcing the environment to adapt to it. Why does this matter? Again, Mårten Mickos spoke about how older database servers (he didn’t name names) didn’t adapt as quickly as MySQL, therefore MySQL has led the way in cloud infrastructures. I agree, but MySQL has its own rigidities, namely: all its subsystems. For example, MySQL replication is what it is, and there’s no easy way to extend or change it. It took until MySQL 5.6 before global transaction IDs became reality, whereas Drizzle has had global transaction IDs since its first GA, 7.0. New query logging in MySQL? Forget about it; but in Drizzle it’s trivial. If those old database servers are whales, and MySQL is a dolphin, then Drizzle is a marlin, even faster than a dolphin.
Does Drizzle’s adaptability really matter? I think it does because another common prediction I heard at PLMCE was that cloud infrastructure standards are in their infancy today, so there are a few competing and incompatible ones. Mark Atwood talked about as much during his BoF. What will “the” cloud require in the future? Nobody knows, but it should be easy to write a Drizzle plugin to meet the requirements.
Finally, not to keep quoting Mårten but he just happened to say a lot that stuck with me, he noted how a database server takes 10 years to develop. I agree, and so did Brian in a tweet, noting also that Drizzle has a head start. In this respect, I should perhaps apologize again to the core Drizzle developers for constantly nagging the project about various things. I say “perhaps” because I think it’s fair to say that I contribute as much as I complain. In any case, I realize now that whereas I would like Drizzle to somehow explode onto the scene and be the talk of PLMCE 2013, I know that can’t happen. I’m not sure who said it in their keynote (probably Mårten), but they said that jumping into new technology was not good and not something people do; rather, people wade into new technology. Furthermore, the expo hall at PLMCE 2012 was really lively and there were a number of companies with MySQL or NoSQL or something-SQL servers I had never heard of. So I realized that it will be years more before Drizzle has even a fraction of the prominence that MySQL has, and it took nearly a decade (or more) for MySQL to become what it is today. That just gives us some breathing room though: time to fix the bugs and code the features that we know will make Drizzle the multi-scale relational database server of choice.
Drizzle 7.1 released
Drizzle 7.1 has been released. In my frank opinion, there’s no longer any reason to use 7.0; everyone should upgrade to 7.1 because it is far superior. Read the release announcement for the list of new and fixed things. Most importantly, imho: sweeping updates to docs.drizzle.org. This is also good news because in a few days I’m giving a presentation, Getting Started with Drizzle 7.1.
Thanks to all the people who worked on Drizzle in addition to their day jobs, their families, and their personal lives.
How to find bugs in Drizzle slave replication
Number one on my second list of what Drizzle needs is replication documentation. So I’ve been completely rewriting the replication documentation for Drizzle 7.1. This is critical, in my humble opinion, not only because complete and accurate replication documentation is needed, but also because I’m giving a talk, “Getting Started with Drizzle 7.1”, at the MySQL Conference & Expo 2012 and replication will no doubt be a key topic. In the course of rewriting the replication documentation, I have so far found 6 bugs and discovered countless details about Drizzle replication using the slave plugin, all of which are now documented. It turns out that documenting every little aspect of something is a great way to find bugs because one is forced to actually try all the options, which leads to a lot of “what if” questions. For example, I asked myself recently: “What if I use the simple_user_policy plugin on the master and slave? Will this filter replication depending on the slave’s master-user?” (It doesn’t; on the contrary, the slave and simple_user_policy plugins don’t work together.)
None of the bugs that I’ve found so far are critical. They are all “polishing”. Drizzle replication using the slave plugin works and seems to work very well. It’s both simple and robust. I’m told by core developers that its performance will shine brightly in benchmarks, but that remains to be proven.
I feel certain the new replication documentation will be done and merged before the conference next month. At the moment, don’t bother reading anything on the web about Drizzle replication because it’s all wrong (including docs.drizzle.org). If you want to know right now how to make Drizzle replication work, build the docs from my branch; otherwise, wait for my branch to be merged and docs.drizzle.org to be updated, or come to my talk at the conference in April.
Encore: What Drizzle needs
I’m giving a talk titled Getting Started with Drizzle 7.1 at Percona Live MySQL Conference & Expo 2012. Therefore, I want to follow up on a post of mine from six months ago: What Drizzle needs. This time, my point of view is what Drizzle (7.1) needs from the naïve/new user’s perspective (by naïve I don’t mean stupid; I mean wants to begin using Drizzle but knows nothing about it yet).
First, let’s review my previous six items:
1. Authentication
2. Documentation
3. Query logging
4. Clearer contributor guidelines
5. Leadership
6. Independent benchmarks
#1 was solved: I figured out and documented how to make auth_pam work, and I wrote auth_schema for MySQL-like table-based authentication using encrypted passwords. #2 was also solved: Henrik and I rewrote a bunch of the Drizzle docs: https://blueprints.launchpad.net/drizzle/+spec/docs71-focus-areas. #3 was solved: I wrote query_log that’s like the MySQL slow log but better (because its output format is consistent and documented), but we still need to fix bug 779708. #4 was solved enough: Contributing Code. My original request for this need was too optimistic. Drizzle is a big project with lots of moving parts; some people just want to fix a little thing and don’t care about the whole development process, while others (like me) do care about it. So chances are we can’t make everyone happy. The Drizzle mailing list is active and helpful. #5 is still an issue, but it’s politics that I don’t want to get into until I can walk my talk (at the moment, I’m working 60-70 hours/week so I don’t have time to provide any leadership, but that will change in a few months). #6 is in progress. I started on some big hardware and ran into a bug; the fix for that bug was recently merged, so hopefully we’ll have benchmarks soon. So all told, my original needs have been addressed in the last six months.
But we’re not home yet, so to speak. Drizzle is really close, I think, to crossing the critical threshold of becoming a usable product in the real world, but a few must-haves are still missing. Again, as Charles Kettering said, “A problem well stated is a problem half solved.” So here’s my next round of needs.
1. Replication documentation
Drizzle’s replication system is awesome, in my humble opinion, but it’s an awesomeness shrouded in mystery. What is the relationship between Drizzle Replication and Replication Slave? The former section needs to be the single, complete source of knowledge regarding Drizzle replication, else the naïve user will ask: which do I use? which is the replication system?
2. Example or default configurations
I wrote the current sections on Configuration and Administration, and they have pretty much all the facts, but those facts are disconnected. I don’t think it’s unreasonable to tell a user “you’re smart, RTFM and figure it out yourself”, but in reality people like to be told what to do, to be given the answers, to see examples, etc., and then they work from that, consulting documentation as needed. So the documentation (or Drizzle itself somehow) needs a section that says “for a single standalone server, do …” and “for a master and one replica, do …”. Maybe I can persuade Percona to add Drizzle as an option to their Configuration Wizard.
3. Dynamic plugins
Currently, only the plugins that I’ve written are fully dynamic, i.e. capable of being (re)configured while Drizzle is running. Early versions of MySQL were annoying because, for example, you had to restart the server just to turn the slow log on or off. How ridiculous! How much trouble is it to close a filehandle and open a new one? Sadly, many important Drizzle plugins act the same way which is a terrible insult to what I consider to be the database server’s motto: “Leap forward.” I’m working to fix this: https://blueprints.launchpad.net/drizzle/+spec/plugin-standards
4. Sane versioning and releases
Versioning has been a hot topic on the mailing list lately and I don’t want to fan the flames, but the Unix world is married to versions like 7.1.1, not 2012.01.23, which is not January 23 2012, but January 2012 revision or release number 23, I think. This is just confusing and since the apparent major and minor numbers change every year and month, it makes it look like Drizzle is unstable. Also, Drizzle needs to stop “cowboy coding” as Patrick put it. Doing dev releases every 2 weeks is too much; it’s too much to follow and the real world doesn’t care; the real world only cares about real, stable releases. On this topic, and related to original need #5, the current releases are not really planned. This is “cowboy coding”: whatever is done is put into a release and voila. But a serious database like Drizzle needs planning, direction; the important issues and bugs must be fixed first, etc. Then all this organized work needs to be packaged, versioned, and delivered sanely. Users must be able to determine easily what was fixed or changed from one version to the next, which requires that they understand the versioning. One last example before I stop belaboring this point: https://launchpad.net/drizzle/fremont/2012-01-13 has no targeted bugs or blueprints. We’re tragically underutilizing Launchpad.
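To make the confusion concrete, here is a minimal Python sketch of the two readings. It is only an illustration: the class and function names are hypothetical, and the year/month/revision interpretation of the date-like scheme is my reading described above, not something confirmed by the project.

```python
from typing import NamedTuple


class DateStyle(NamedTuple):
    """Date-like scheme, e.g. 2012.01.23 (field meanings are assumed)."""
    year: int
    month: int
    revision: int  # assumed: a release counter, NOT the day of the month


class Conventional(NamedTuple):
    """Conventional Unix-style scheme, e.g. 7.1.1."""
    major: int
    minor: int
    patch: int


def _three_ints(text):
    """Split 'a.b.c' into three integers, rejecting anything else."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated fields: %r" % text)
    return tuple(int(part) for part in parts)


def parse_date_style(text):
    return DateStyle(*_three_ints(text))


def parse_conventional(text):
    return Conventional(*_three_ints(text))


dev = parse_date_style("2012.01.23")
ga = parse_conventional("7.1.1")
print(dev)
print(ga)
```

Both strings have the same shape, so nothing in the string itself tells a user that the leading fields of the first are a calendar year and month rather than major and minor numbers; that is the ambiguity.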
5. Generic binaries
Drizzle provides .deb and .rpm packages, and on Ubuntu 11 it’s really easy to get Drizzle, but I think generic binary tarballs are pretty commonly used by administrators because repos can be a pain to manage, they may not deliver the desired version, or they may have a bunch of dependencies, etc. Also, generic binaries will allow us to create Drizzle sandbox servers.
Conclusion
Drizzle came a long way to its 7.0 GA release early last year, and it’s come another long way in the last six months. The next release, Drizzle 7.1, will be significant, but in my humble opinion it still won’t gain traction in the real world unless these needs are also solved. There are a few people in high places toying with Drizzle, trying to implement it, but it takes a lot more than these few people to make the product “pop”. I would like to see 1,000 companies, no matter their size, using Drizzle for serious, day-to-day operations by the end of 2012. That will happen if Drizzle is the obvious choice for ease and usability, but MySQL has a big lead in these areas, so Drizzle must leap forward.
Relation of Drizzle modules and plugins
Drizzle is known for its plugins, but the DATA_DICTIONARY.PLUGINS table has a column for MODULE_NAME and there’s a MODULES table. What is the relation between Drizzle modules and plugins? A Drizzle module provides one or more plugins of potentially various types. Or, analogously: a Drizzle module is a bookshelf and each plugin is a book. Let’s look at an example: the syslog module.
drizzle> SELECT plugin_name, module_name, plugin_type
-> FROM DATA_DICTIONARY.PLUGINS
-> WHERE module_name='syslog'
-> ORDER BY module_name, plugin_name;
+----------------+-------------+--------------+
| plugin_name | module_name | plugin_type |
+----------------+-------------+--------------+
| Syslog | syslog | ErrorMessage |
| syslog | syslog | Function |
| Syslog Logging | syslog | Logging |
+----------------+-------------+--------------+
The syslog module creates three plugins: two named “Syslog” (one capitalized, one lowercase) and one named “Syslog Logging”. Each plugin has a different type corresponding to the various features that this module provides. The ErrorMessage plugin provides Drizzle error message logging to the standard system syslog (/var/log/system.log on my Mac). The Function plugin provides the syslog() function, which allows you to send messages to the syslog through Drizzle:
drizzle> SELECT syslog("local0", "warning", "Hello, world!");
+----------------------------------------------+
| syslog("local0", "warning", "Hello, world!") |
+----------------------------------------------+
| Hello, world! |
+----------------------------------------------+
$ tail -n 1 /var/log/system.log
Dec 4 10:18:10 beatrice drizzled[5584]: Hello, world!
And the Logging plugin provides query logging to the syslog. For example, the query above was also logged in the syslog:
Dec 4 10:18:03 beatrice drizzled[5584]: thread_id=3 query_id=11 db="" query="select syslog('local0', 'warning', "Hello, world!")" command="Query" t_connect=172185931 t_start=410 t_lock=410 rows_sent=1 rows_examined=0 tmp_table=0 total_warn_count=0
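Because the Logging plugin writes name=value pairs, a line like this is easy to post-process. Here is a minimal Python sketch that pulls out the numeric fields. It assumes the layout seen in this one line (I am not relying on a documented format), the function name is hypothetical, and it deliberately skips the quoted fields such as query because of their nested quotes.

```python
import re

# Log line copied verbatim from the syslog output above.
LINE = (
    'Dec 4 10:18:03 beatrice drizzled[5584]: thread_id=3 query_id=11 db="" '
    'query="select syslog(\'local0\', \'warning\', "Hello, world!")" '
    'command="Query" t_connect=172185931 t_start=410 t_lock=410 '
    'rows_sent=1 rows_examined=0 tmp_table=0 total_warn_count=0'
)


def numeric_fields(line):
    """Return every name=<digits> pair in a log line as a dict of ints.

    Quoted values (db, query, command) never match because the
    character after '=' is a double quote, not a digit.
    """
    pairs = re.findall(r'\b(\w+)=(\d+)\b', line)
    return {name: int(value) for name, value in pairs}


fields = numeric_fields(LINE)
for name in sorted(fields):
    print(name, fields[name])
```

This is enough to, say, total up rows_examined across many lines; parsing the query text itself would need a more careful tokenizer.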
Most modules provide only one type of plugin, but as this example shows, that’s not necessarily the case. To see all modules, their plugins, and those plugins’ types, execute: SELECT * FROM DATA_DICTIONARY.PLUGINS ORDER BY module_name, plugin_name;.
Some gotchas to be aware of:
- A module may not always create a plugin. The syslog module only creates its Logging plugin when Drizzle is started with --syslog.logging-enable.
- Currently, the only way to learn what a module’s plugins do is the Drizzle documentation which, at the moment, is far from complete. (I had to look at the source code to figure out how to use syslog().)
- The plugin types are not yet documented, but their names are mostly intuitive.
- Plugin and module names are currently nonuniform, which makes querying the MODULES and PLUGINS tables awkward and associating options and variables to their corresponding modules/plugins difficult (but I’m working to fix this).
In conclusion: Drizzle modules provide one or more plugins of potentially various types. The DATA_DICTIONARY.PLUGINS table reveals which modules provide which plugins and those plugins’ types. The concept of modules is mostly applicable for developers because it’s plugins that the user ultimately uses. But it is nevertheless good for users to be aware that behind each plugin is at least one module, and one module may be behind several plugins.
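The bookshelf analogy can be sketched in a few lines of Python. This is only an illustration of the one-module-to-many-plugins relation: the rows are copied from the syslog query output earlier in this post, and the function name plugins_by_module is hypothetical, not part of Drizzle.

```python
from collections import defaultdict

# Rows copied from the DATA_DICTIONARY.PLUGINS output for the syslog module:
# (plugin_name, module_name, plugin_type)
ROWS = [
    ("Syslog", "syslog", "ErrorMessage"),
    ("syslog", "syslog", "Function"),
    ("Syslog Logging", "syslog", "Logging"),
]


def plugins_by_module(rows):
    """Group plugin rows by module: each module is a shelf of plugins."""
    shelves = defaultdict(list)
    for plugin_name, module_name, plugin_type in rows:
        shelves[module_name].append((plugin_name, plugin_type))
    return dict(shelves)


shelves = plugins_by_module(ROWS)
for module_name, plugins in shelves.items():
    print(module_name)
    for plugin_name, plugin_type in plugins:
        print("  %s (%s)" % (plugin_name, plugin_type))
```

Fed the full result of the PLUGINS query instead of these three rows, the same grouping would show at a glance which modules provide more than one plugin type.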