Number one on my second list of what Drizzle needs is replication documentation. So I’ve been completely rewriting the replication documentation for Drizzle 7.1. This is critical, in my humble opinion, not only because complete and accurate replication docu is needed, but also because I’m giving a talk “Getting Started with Drizzle 7.1” at the MySQL Conference & Expo 2012 and replication will no doubt be a key topic. In the course of rewriting the replication docu, I have so far found 6 bugs and discovered countless details about Drizzle replication using the slave plugin, all of which are now documented. It turns out that documenting every little aspect of something is a great way to find bugs because one is forced to actually try all the options, which leads to a lot of “what if” questions. For example, I asked myself recently: “What if I use the simple_user_policy plugin on the master and slave? Will this filter replication depending on the slave’s master-user?” (It doesn’t; on the contrary, slave and simple_user_policy plugins don’t work together).
None of the bugs that I’ve found so far are critical. They are all “polishing”. Drizzle replication using the slave plugin works and seems to work very well. It’s both simple and robust. I’m told by core developers that its performance will shine brightly in benchmarks, but that remains to be proven.
I feel certain the new replication documentation will be done and merged before the conference next month. At the moment, don’t bother reading anything on the web about Drizzle replication because it’s all wrong (including docs.drizzle.org). If you want to know right now how to make Drizzle replication work, build the docs from my branch; otherwise, wait for my branch to be merged and docs.drizzle.org to be updated, or come to my talk at the conference in April.
I’m giving a talk titled Getting Started with Drizzle 7.1 at Percona Live MySQL Conference & Expo 2012. Therefore, I want to follow up on a post of mine from six months ago: What Drizzle needs. This time, my point of view is what Drizzle (7.1) needs from the naïve/new user’s perspective (by naïve I don’t mean stupid; I mean wants to begin using Drizzle but knows nothing about it yet).
First, let’s review my previous six items:
- Query logging
- Clearer contributor guidelines
- Independent benchmarks
#1 was solved: I figured out and documented how to make auth_pam work, and I wrote auth_schema for MySQL-like table-based authentication using encrypted passwords. #2 was also solved: Henrik and I rewrote a bunch of the Drizzle docs: https://blueprints.launchpad.net/drizzle/+spec/docs71-focus-areas. #3 was solved: I wrote query_log that’s like the MySQL slow log but better (because its output format is consistent and documented), but we still need to fix bug 779708. #4 was solved enough: Contributing Code. My original request for this need was too optimistic. Drizzle is a big project with lots of moving parts and sometimes people just want to fix a little thing and they don’t care about the whole development process, and some people (like me) care about the whole development process. So chances are we can’t make everyone happy. The Drizzle mailing list is active and helpful. #5 is still an issue, but it’s politics that I don’t want to get into until I can walk my talk (at the moment, I’m working 60-70 hours/week so I don’t have time to provide any leadership, but that will change in a few months). #6 is in progress. I started on some big hardware and ran into a bug; the fix for that bug was recently merged, so hopefully we’ll have benchmarks soon. So all told, my original needs have been addressed in the last six months.
But we’re not home yet, so to speak. Drizzle is really close, I think, to crossing the critical threshold of becoming a usable product in the real world, but a few must-haves are still missing. Again, as Charles Kettering said, “A problem well stated is a problem half solved.” So here’s my next round of needs.
1. Replication documentation
Drizzle’s replication system is awesome, in my humble opinion, but it’s an awesomeness shrouded in mystery. What is the relationship between Drizzle Replication and Replication Slave? The former section needs to be the single, complete source of knowledge regarding Drizzle replication, else the naïve user will ask: which do I use? which is the replication system?
2. Example or default configurations
I wrote the current sections on Configuration and Administration, and they have pretty much all the facts, but those facts are disconnected. I don’t think it’s unreasonable to tell a user “you’re smart, RTFM and figure it out yourself”, but in reality people like to be told what to do, to be given the answers, to see examples, etc., and then they work from that, consulting documentation as needed. So the documentation (or Drizzle itself somehow) needs a section that says “for a single standalone server, do …” and “for a master and one replica, do …”. Maybe I can persuade Percona to add Drizzle as an option to their Configuration Wizard.
3. Dynamic plugins
Currently, only the plugins that I’ve written are fully dynamic, i.e. capable of being (re)configured while Drizzle is running. Early versions of MySQL were annoying because, for example, you had to restart the server just to turn the slow log on or off. How ridiculous! How much trouble is it to close a filehandle and open a new one? Sadly, many important Drizzle plugins act the same way which is a terrible insult to what I consider to be the database server’s motto: “Leap forward.” I’m working to fix this: https://blueprints.launchpad.net/drizzle/+spec/plugin-standards
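To make the “close a filehandle and open a new one” point concrete, here is a minimal Python sketch of a log whose target can be turned on, turned off, or moved while the process keeps running. The class name ReopenableLog and its methods are hypothetical, invented for this illustration; this is not how any Drizzle plugin is actually implemented.

```python
class ReopenableLog:
    """Hypothetical log whose target file can change at runtime."""

    def __init__(self, path=None):
        self._fh = None
        if path:
            self.set_path(path)

    def set_path(self, path):
        """Enable, move, or (with None) disable logging without a restart."""
        # Close the current filehandle, if any.
        if self._fh is not None:
            self._fh.close()
            self._fh = None
        # Open the new one, if a new path was given.
        if path:
            self._fh = open(path, "a", encoding="utf-8")

    def write(self, message):
        """Append one line; return False when logging is disabled."""
        if self._fh is None:
            return False
        self._fh.write(message + "\n")
        self._fh.flush()
        return True
```

Calling set_path() with a new path, or with None, while the program is running is the whole trick: the caller never has to restart anything to change where the log goes.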
4. Sane versioning and releases
Versioning has been a hot topic on the mailing list lately and I don’t want to fan the flames, but the Unix world is married to versions like 7.1.1, not 2012.01.23, which is not January 23 2012, but January 2012 revision or release number 23, I think. This is just confusing and since the apparent major and minor numbers change every year and month, it makes it look like Drizzle is unstable. Also, Drizzle needs to stop “cowboy coding” as Patrick put it. Doing dev releases every 2 weeks is too much; it’s too much to follow and the real world doesn’t care; the real world only cares about real, stable releases. On this topic, and related to original need #5, the current releases are not really planned. This is “cowboy coding”: whatever is done is put into a release and voila. But a serious database like Drizzle needs planning, direction; the important issues and bugs must be fixed first, etc. Then all this organized work needs to be packaged, versioned, and delivered sanely. Users must be able to determine easily what was fixed or changed from one version to the next, which requires that they understand the versioning. One last example before I stop belaboring this point: https://launchpad.net/drizzle/fremont/2012-01-13 has no targeted bugs or blueprints. We’re tragically underutilizing Launchpad.
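To show why the two schemes read so differently, this small Python sketch splits a version string into three numbers and labels them under each interpretation. The year/month/revision reading is only the guess stated above, not something confirmed by the release process, and the function name, tuple names, and the four-digit heuristic are all made up for this example.

```python
from collections import namedtuple

DateVersion = namedtuple("DateVersion", "year month revision")
SemVersion = namedtuple("SemVersion", "major minor patch")


def parse_version(text):
    """Split 'a.b.c' into three ints and guess which scheme it uses."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) != 3:
        raise ValueError("expected three dot-separated numbers: %r" % text)
    # Assumption: a first field of 1000 or more means a date-style version.
    if parts[0] >= 1000:
        return DateVersion(*parts)
    return SemVersion(*parts)


print(parse_version("2012.01.23"))
print(parse_version("7.1.1"))
```

Under the date-style reading, the first two fields change every year and month regardless of what changed in the code, which is exactly what makes the software look unstable.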
5. Generic binaries
Drizzle provides .deb and .rpm packages, and on Ubuntu 11 it’s really easy to get Drizzle, but I think generic binary tarballs are pretty commonly used by administrators because repos can be a pain to manage, they may not deliver the desired version, or they may have a bunch of dependencies, etc. Also, generic binaries will allow us to create Drizzle sandbox servers.
Drizzle came a long way to its 7.0 GA release early last year, and it’s come another long way in the last six months. The next release, Drizzle 7.1, will be significant, but in my humble opinion it still won’t gain traction in the real world unless these needs are also solved. There are a few people in high places toying with Drizzle, trying to implement it, but it takes a lot more than these few people to make the product “pop”. I would like to see 1,000 companies, no matter their size, using Drizzle for serious, day-to-day operations by the end of 2012. That will happen if Drizzle is the obvious choice for ease and usability, but MySQL has a big lead in these areas, so Drizzle must leap forward.
Drizzle is known for its plugins, but the DATA_DICTIONARY.PLUGINS table has a column for MODULE_NAME and there’s a MODULES table. What is the relation between Drizzle modules and plugins? A Drizzle module provides one or more plugins of potentially various types. Or, analogously: a Drizzle module is a bookshelf and each plugin is a book. Let’s look at an example:

    drizzle> SELECT plugin_name, module_name, plugin_type
          -> FROM DATA_DICTIONARY.PLUGINS
          -> WHERE module_name='syslog'
          -> ORDER BY module_name, plugin_name;
    +----------------+-------------+--------------+
    | plugin_name    | module_name | plugin_type  |
    +----------------+-------------+--------------+
    | Syslog         | syslog      | ErrorMessage |
    | syslog         | syslog      | Function     |
    | Syslog Logging | syslog      | Logging      |
    +----------------+-------------+--------------+

The syslog module creates three plugins, two named “Syslog” and one named “Syslog Logging”. Each plugin has a different type corresponding to the various features that this module provides. The ErrorMessage plugin provides Drizzle error message logging to the standard system syslog (/var/log/system.log on my Mac). The Function plugin provides the syslog() function, which allows you to send messages to the syslog through Drizzle:

    drizzle> SELECT syslog("local0", "warning", "Hello, world!");
    +----------------------------------------------+
    | syslog("local0", "warning", "Hello, world!") |
    +----------------------------------------------+
    | Hello, world!                                |
    +----------------------------------------------+

    $ tail -n 1 /var/log/system.log
    Dec 4 10:18:10 beatrice drizzled: Hello, world!

The Logging plugin provides query logging to the syslog. For example, the query above was also logged in the syslog:

    Dec 4 10:18:03 beatrice drizzled: thread_id=3 query_id=11 db="" query="select syslog('local0', 'warning', "Hello, world!")" command="Query" t_connect=172185931 t_start=410 t_lock=410 rows_sent=1 rows_examined=0 tmp_table=0 total_warn_count=0
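Because the logged line is a flat series of key=value pairs, it is easy to pick apart. Here is a rough Python sketch that does so for the line shown above. The function name is hypothetical, and the splitting rule (whitespace followed by a name and an equals sign) is a naive assumption of mine: it happens to work for this line, but it would break on a query whose text contains something like " a=1".

```python
import re

# The log line from above, copied verbatim.
LINE = (
    'Dec 4 10:18:03 beatrice drizzled: thread_id=3 query_id=11 db="" '
    'query="select syslog(\'local0\', \'warning\', "Hello, world!")" '
    'command="Query" t_connect=172185931 t_start=410 t_lock=410 '
    'rows_sent=1 rows_examined=0 tmp_table=0 total_warn_count=0'
)


def parse_syslog_query_line(line):
    """Return the key=value fields of one logged query as a dict of strings."""
    # Drop the syslog prefix (timestamp, host, program name) if present.
    payload = line.split("drizzled: ", 1)[-1].strip()
    fields = {}
    # Naive rule: a new field starts at whitespace followed by "name=".
    for part in re.split(r"\s+(?=[A-Za-z_]+=)", payload):
        key, _, value = part.partition("=")
        # Strip one pair of surrounding double quotes, if any.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


for key, value in parse_syslog_query_line(LINE).items():
    print("%-16s %s" % (key, value))
```

All values stay strings here; converting the counters and timers to integers is left out to keep the sketch short.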
Most modules provide only one type of plugin, but as this example shows, that’s not necessarily the case. To see all modules, their plugins, and those plugins’ types, execute:

    SELECT * FROM DATA_DICTIONARY.PLUGINS ORDER BY module_name, plugin_name;
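The bookshelf analogy can be sketched in a few lines of Python. The rows below are hard-coded from the syslog query output earlier in this post; in practice they would come from the PLUGINS table, but no particular database driver is assumed here, and the function name is made up for illustration.

```python
from collections import defaultdict

# Rows copied from the query output earlier in this post:
# (plugin_name, module_name, plugin_type)
ROWS = [
    ("Syslog", "syslog", "ErrorMessage"),
    ("syslog", "syslog", "Function"),
    ("Syslog Logging", "syslog", "Logging"),
]


def group_by_module(rows):
    """Map each module name to its list of (plugin_name, plugin_type)."""
    shelves = defaultdict(list)
    for plugin_name, module_name, plugin_type in rows:
        shelves[module_name].append((plugin_name, plugin_type))
    return dict(shelves)


# One "bookshelf" per module, one "book" per plugin.
for module, plugins in group_by_module(ROWS).items():
    print(module)
    for name, plugin_type in plugins:
        print("  %s (%s)" % (name, plugin_type))
```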
Some gotchas to be aware of:
- A module may not always create a plugin. The syslog module only creates its Logging plugin when Drizzle is started with
- Currently, the only way to learn what a module’s plugins do is the Drizzle documentation which, at the moment, is far from complete. (I had to look at the source code to figure out how to use
- The plugin types are not yet documented, but their names are mostly intuitive.
- Plugin and module names are currently nonuniform, which makes querying the PLUGINS tables awkward and associating options and variables with their corresponding modules/plugins difficult (but I’m working to fix this).
In conclusion: Drizzle modules provide one or more plugins of potentially various types. The DATA_DICTIONARY.PLUGINS table reveals which modules provide which plugins and those plugins’ types. The concept of modules is mostly applicable for developers because it’s plugins that the user ultimately uses. But it is nevertheless good for users to be aware that behind each plugin is at least one module, and one module may be behind several plugins.
I’m surprised and delighted to see that the Drizzle documentation was updated recently. Last time I looked, it was the original documentation which was missing, among other things, information about its 70+ plugins. So Henrik and I began filling in missing pieces of crucial information like administering Drizzle. I even generated skeleton documentation for every plugin, including command line options and variables.
Imho, the new Drizzle documentation is the most important development since the 7.0 GA release because it makes Drizzle understandable and therefore usable to the average DBA.
Drizzle trunk as of 2011-09-20 (r2422) has a new revision of my query_log plugin, which is important because that revision works with the latest trunk revision of mk-query-digest (wget maatkit.org/trunk/mk-query-digest). I made the query log format truly consistent and then wrote DrizzleQueryLogParser for mk-query-digest and added the command line option
I don’t intend for mk-query-digest to be the de facto standard tool for parsing Drizzle query logs; this is just a quick solution and proof-of-concept. In fact, I’m working to create new tools for Drizzle using Python 3, but that project won’t be ready for a while. In the meantime, the latest Drizzle (which may become Drizzle 7.1) has a highly structured query log and a very stable, feature-complete tool to parse it.