Saturday, April 21. 2012
A Sequence Generator
The memcache protocol has an INCR command, which atomically increments a stored value. MySQL Cluster is able to take simple operations (like incrementing a counter) and "push them down" to run on a data node with a lock held. Acquire the lock, increment the value, read it, and release the lock.
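As a rough illustration, here is a minimal Python sketch of what a client does when it asks for the next sequence value over the memcached ASCII protocol. The request line (`incr <key> <delta>`) and the `NOT_FOUND` reply come from the standard memcached text protocol; the function names, the key name, and the host and port used in the usage note are made up for this example, and nothing here is specific to the NDB engine.

```python
import socket


def build_incr_command(key, delta=1):
    """Return the ASCII-protocol request line for an atomic increment."""
    if delta < 0:
        raise ValueError("incr takes a non-negative delta")
    return f"incr {key} {delta}\r\n".encode("ascii")


def parse_incr_response(line):
    """Turn the server's one-line reply into an int, or None if the key is missing."""
    text = line.decode("ascii").strip()
    if text == "NOT_FOUND":
        return None
    if text.isdigit():
        return int(text)
    raise RuntimeError(f"unexpected reply: {text!r}")


def next_sequence_value(host, port, key):
    """Send one incr request and return the new counter value.

    host, port and key are placeholders chosen by the caller.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(build_incr_command(key))
        reply = conn.makefile("rb").readline()
    return parse_incr_response(reply)
```

With a server listening on the usual memcached port, a call might look like `next_sequence_value("localhost", 11211, "seq:orders")`. The increment itself happens on the server, so two clients making this call at the same moment should never see the same value.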
Originally in memcached, the incrementing sequence number was local to one memcache server, and was also volatile. Whenever the memcached server restarted, the sequence would be reset. Today, though, we can put cluster and memcached together, and instead of a volatile local sequence number, we get a durable and shared one. This is a very simple application. Keep reading to see the whole thing, start to finish.

In my example I'll use MySQL Cluster 7.2.4 and the memcached ASCII protocol. There is a minor (but annoying) bug, already fixed in 7.2.6, that prevented this from working with the binary protocol. So, keep that in mind, and grab 7.2.6 or later once it is released.

Continue reading "A Sequence Generator"

Friday, March 30. 2012
Storing long values with NDB and Memcache
With MySQL Cluster 7.2 we introduced native support for the memcache protocol. From the very start of the project, we knew there would be some challenges in storing large values – "blobs" that are bigger than the maximum NDB row size – from the memcached server. Coincidentally, in 7.2, we also increased this maximum row size from 8000 bytes to 14,000 bytes. This helped out a little bit when the first preview release became available, back in September, and there was still no way to access blobs through memcached.
We had always wanted the memcached server to be event-driven and to use asynchronous APIs on the back-end, but we had no asynchronous API for blob access. Reading a blob is a multi-step operation: it requires reading first a header row, and then a series of blob parts from a (hidden) parts table. The existing blob implementation, known as NdbBlob, encapsulates all of this into a single synchronous, blocking, API call.

Continue reading "Storing long values with NDB and Memcache"

Monday, October 10. 2011
The NeXT
Sitting on a shelf in my garage is a reminder that Steve Jobs' product launches didn't always go the way he hoped for. It's my NeXT station and my NeXT laser printer. The printer has a perfectly straight paper path; it can print on cardboard. As for the computer, I learned a few years ago in NERT class that if the house were ever on fire, the firemen would probably want to be alerted about its magnesium case.
Continue reading "The NeXT"

Tuesday, October 4. 2011
Open World & a bunch of camps!
It's quite a week here! Craig Russell and I will be giving a talk about the NoSQL APIs for MySQL Cluster (mod_ndb, memcached, and ClusterJ) twice: first at Oracle Open World at 9:00 AM on Thursday morning, and then again at Silicon Valley Codecamp on Saturday afternoon. Tomas Ulin's keynote address yesterday morning unveiled some of the goodness in MySQL 5.6 and MySQL Cluster 7.2 (like Batch Key Access, the great optimizer feature from the ill-fated MySQL 6.0 tree).
And last night I dropped by Dave Neilsen's NoSQL Camp, where I saw Andy Twigg ask a very interesting question in a 5-minute lightning talk. Andy simply pointed out that with hard disk drives, random I/O is slow and capacity is cheap -- so of course it makes sense to denormalize your data (for instance, into a document database like MongoDB). But on solid state storage, random I/O is fast and capacity is expensive. Do you want to denormalize your data on an SSD? Maybe not.

Tuesday, August 23. 2011
NoSQL Now!
I'm headed to NoSQL Now! in San Jose this week. I'll be speaking on Thursday morning about our NoSQL APIs for MySQL Cluster -- memcached, ClusterJ, and mod_ndb. Our upcoming MySQL Cluster release introduces a Memcache API with near-native NDB performance, and, from the SQL side, includes major algorithmic changes to the MySQL optimizer so that, at long last, MySQL Cluster handles multi-way joins by "bringing the query to the data."
It's been a long, long road from the very first MySQL Cluster releases in 2004 to this point -- but this release really brings me back to the excitement I remember when I first heard of MySQL Cluster, with its combination of low-level (NDBAPI) and high-level (SQL) access to the same distributed database. In 2004, the NDBAPI was hard to use, the SQL access turned out to have severe limitations, and managing the database was more complex than I'd expected. Managing distributed databases might never be easy, but today there are tools built around MySQL Cluster that take away a lot of complexity and shorten the learning curve -- both from Oracle (MySQL Cluster Manager) and from the larger user community (Severalnines Configurator). And I think we've gotten the APIs right: SQL with adaptive query localization, Memcache for LAMP apps, and the Hibernate-style high performance Java interface, ClusterJ.

I'm also excited to spend two days at NoSQL Now! and get some fresh insight on the latest buzz. I'm still inclined to think schemas are good -- after all, if a developer misspells the name of an attribute, do you want to implicitly introduce a new misspelled attribute into the data set? But people are out there solving big problems, and choosing the tools to solve them with. Some of those tool choices are turning out to be unorthodox, surprising, and quite insightful. It will be a good week.

Monday, May 2. 2011
NDB & Memcache: when to cache?
I had a great time at this year's MySQL Conference and Expo, introducing the new memcached API for MySQL Cluster (now available on launchpad). There is an interesting question about how you might use the new API: should you turn the cache off, so that the memcached server is just an easy-to-use, high performance NoSQL gateway to the data stored in the cluster? Or should you enable caching in memcached? The memcached server – actually the NDB engine plugin in the server – allows either. In fact you can tune this on the basis of a key prefix, so that some keys are cached and others are not. But how would you make the choice?
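To make the per-prefix idea concrete, here is a small Python sketch of a prefix-based caching policy. It is purely illustrative: the table of prefixes, the function name, and the longest-prefix-wins rule are assumptions made for this example, not the NDB engine's actual configuration format.

```python
# Hypothetical policy table: key prefix -> should memcached keep a local copy?
# The empty prefix matches every key and acts as the default.
CACHE_POLICY_BY_PREFIX = {
    "": False,              # default: go straight to the cluster
    "session:": True,       # cache session data locally
    "session:tmp:": False,  # ...except short-lived temporary sessions
}


def should_cache(key, policies=CACHE_POLICY_BY_PREFIX):
    """Return the policy of the longest prefix that matches the key."""
    matching = [prefix for prefix in policies if key.startswith(prefix)]
    if not matching:
        return False  # no rule applies: treat the key as uncached
    return policies[max(matching, key=len)]
```

Here `should_cache("session:42")` is true, while `should_cache("session:tmp:1")` and `should_cache("user:7")` are both false.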
I look at it this way. Not all MySQL Cluster databases are the same. Some may store all the data in memory, while others store some tables on disk. Some may live in ideal, low-latency environments, while others have higher network latencies, maybe because of the distance between nodes or maybe for some other reason.

Continue reading "NDB & Memcache: when to cache?"

Sunday, April 10. 2011
MySQL Cluster and Memcached: Together at Last
Memcached is a simple solution for tough scalability problems. MySQL Cluster is a telco-grade network data store. Both are capable of performance far beyond what a relational database (even one as good as MySQL) usually delivers. But memcached always relies on some other technology to persistently store data, and while MySQL Cluster can deliver exceptional throughput and response times, making it do so has not always been easy.
So, people have long discussed the idea of putting the two together, and I've been lucky enough to spend the past six months working on it.

Continue reading "MySQL Cluster and Memcached: Together at Last"

Monday, July 21. 2008
Page Size and the Five-Minute Rule
As the years go by, data pages on disk have to get bigger. 16 KB pages were good for databases in the late 1990's, but today's data pages should probably be 64 KB. Page sizes go up over time because memory gets cheaper, and disks get much larger, but disks do not get very much faster.
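The arithmetic behind that claim is easy to check. The Python sketch below uses only the 1997 and 2008 figures quoted in this post (memory price, drive capacity, and I/Os per second). The helper names are made up, and the second function rests on a simplifying assumption of my own: that a drive delivers the same number of random I/Os per second whatever the page size.

```python
# Figures quoted in this post, as (1997 value, 2008 value).
MEMORY_CENTS_PER_MB = (1500, 10)  # $15 then, 10 cents now
DRIVE_CAPACITY_GB = (4, 146)
DRIVE_IOPS = (60, 180)


def improvement(then, now, lower_is_better=False):
    """How many times better the newer figure is than the older one."""
    return then / now if lower_is_better else now / then


def hours_to_touch_every_page(capacity_gb, page_kb, iops):
    """Hours needed to read every page on the drive once with random I/O.

    Assumes the IOPS figure does not depend on the page size.
    """
    pages = capacity_gb * 1024 * 1024 / page_kb
    return pages / iops / 3600


print("memory per dollar: x", improvement(*MEMORY_CENTS_PER_MB, lower_is_better=True))
print("disk capacity:     x", improvement(*DRIVE_CAPACITY_GB))
print("disk IOPS:         x", improvement(*DRIVE_IOPS))
for label, gb, kb, iops in [("1997, 16 KB pages", 4, 16, 60),
                            ("2008, 16 KB pages", 146, 16, 180),
                            ("2008, 64 KB pages", 146, 64, 180)]:
    print(label, round(hours_to_touch_every_page(gb, kb, iops), 1), "hours")
```

Under that assumption, keeping 16 KB pages on the larger drive makes it take roughly twelve times as long to touch every page as it did in 1997, and moving to 64 KB pages recovers a factor of four.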
In 1997, a Megabyte of memory cost $15, but today it costs 10 cents. A new SCSI drive held 4 GB then but 146 GB today. If cost is held constant, today's machine has 150 times the memory and 36 times the storage of a machine from ten years ago, but the performance of disks is only about 3 times better. Disks are still about 3 inches across and turn at about 10K RPM; today's best disk might deliver about 180 I/Os per second (IOPS), compared to 60 IOPS in 1997. And since the disk doesn't move much faster, it makes sense -- given the cheap storage both on disk and off -- to transfer a bigger chunk of data back and forth with every disk access.

There's a rule of thumb that data engineers can use to judge cache sizes; it says a system should have enough RAM to cache any data that will be reused within about five minutes.

Continue reading "Page Size and the Five-Minute Rule"

Friday, March 21. 2008
Summer of Code with MySQL Cluster and Apache
It's Summer of Code time again -- a chance for students to do some real-world programming, contribute to an open source project, and earn $4500 over the summer. You have just one week, March 24 - 31, to submit an application.
Over on MySQL's Summer of Code Ideas page, I have listed some projects around MySQL Cluster and mod_ndb. They require some experience with C and C++. I'm hoping they will be great summer of code projects:
I'm looking forward to the summer!

Friday, December 28. 2007
mod_ndb 1.0 and 1.1
A quick note on mod_ndb: the official 1.0 release, and the first beta of 1.1, are now available for download at mod-ndb.googlecode.com. Mod_ndb is an Apache web server module that provides a Web Services gateway into MySQL Cluster. Its documentation is at MySQL Forge.
Mod_ndb supports MySQL 5.0 and 5.1 along with Apache 1.3, 2.0, and 2.2, and it is distributed in source code form only – you build it for your particular versions of Apache and MySQL. One nice improvement in 1.1 is that it is easier to build mod_ndb for Apache 2.0, as the build system no longer has any dependencies on the old version of apxs.
Posted by John David Duncan at 08:31

Sunday, October 7. 2007
mod_ndb gets a SQL parser
The mod_ndb 1.0 release candidate is now available from mod-ndb.googlecode.com. mod_ndb is a "web services node" for MySQL Cluster: an NDB API node that runs as an Apache web server module and handles requests over HTTP. It supports MySQL Cluster 5.0 and 5.1, and Apache 1.3, 2.0, and 2.2.
A few months ago, I felt that mod_ndb's configuration parameters were getting too complex to remember, and realized that a SQL-like configuration language ("N-SQL") would be more intuitive. It's not quite as simple as SQL -- it does not have an optimizer, so it still requires you to dictate an access plan -- but it is more concise and readable than the strict Apache-style configuration that mod_ndb started with.

A lot of other details have fallen into place in the last two months, especially regarding error handling, HTTP response codes, and documentation, so the newest release is the release candidate and will hopefully become mod_ndb 1.0.

Monday, April 2. 2007
Custom output formats in mod_ndb
Often enough, it seems like you can begin working on a software project with the assumption that you won't need to, say, design a little language and build a parser for it -- and yet, before long, that's exactly what you end up doing.
In the case of mod_ndb, my web services gateway for MySQL Cluster, the little language turns out to involve page-description -- it describes how to take a table of results from a database query and present it within an HTTP response. Inside an httpd.conf file, it looks something like this:
<OutputFormat XML>
Table scan = '<NDBScan>\n$row$\n...\n</NDBScan>\n'
Row row = ' <NDBTuple> $attr$ \n ... </NDBTuple>'
Record attr = '<Attr name=$name/Q$ value=$value/Qx$ />' or '<Attr name=$name/Q$ isNull="1" />'
</OutputFormat>
<OutputFormat JSON>
Table array = '[\n $object$,\n ... \n]\n'
Row object = ' { $pair$ , ... }'
Record pair = '$name/Q$:$value/qj$' or '$name/Q$:null'
</OutputFormat>
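To show what the JSON format above is meant to produce, here is a rough Python sketch that expands the three JSON templates by hand. It is not mod_ndb's implementation and it does not parse the template language: the function names are invented, and the handling of whitespace and of the "..." repetition is my own reading of the templates. The flags behave as this post describes them: /Q always quotes, /q quotes only character values, /j applies JSON backslash escapes, and /x applies XML entity escapes.

```python
import json
from xml.sax.saxutils import escape


def format_value(value, flags=""):
    """Apply the Q, q, j and x flags to a single name or value."""
    text = str(value)
    if "j" in flags:
        # json.dumps adds surrounding quotes; keep only the escaped body.
        text = json.dumps(text)[1:-1]
    if "x" in flags:
        text = escape(text, {'"': "&quot;"})
    if "Q" in flags or ("q" in flags and isinstance(value, str)):
        text = '"' + text + '"'
    return text


def render_pair(name, value):
    """Record pair = '$name/Q$:$value/qj$' or '$name/Q$:null'"""
    if value is None:
        return format_value(name, "Q") + ":null"
    return format_value(name, "Q") + ":" + format_value(value, "qj")


def render_object(row):
    """Row object = ' { $pair$ , ... }'"""
    pairs = (render_pair(name, value) for name, value in row.items())
    return " { " + " , ".join(pairs) + " }"


def render_array(rows):
    """Table array = '[\\n $object$,\\n ... \\n]\\n'"""
    return "[\n " + ",\n ".join(render_object(row) for row in rows) + " \n]\n"
```

For a result table such as `[{"id": 1, "name": "widget", "note": None}]`, `render_array` returns text that a JSON parser reads back as the same list, with the numeric column left unquoted and the NULL column rendered as null.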
The three data types are a Table, representing a result table, a Row of data, and a Record, which (though it's probably not a very good term for it) is a column name/value pair. The record description breaks down to $name$ and $value$, with some flags: $value/Q$ means to put the value in quotes; $value/q$ means to quote it only if it's a character value (not numeric); $value/j$ means to encode the value JSON-style with backslash escapes, while $value/x$ means to encode it XML-style using HTML entities (&gt; and so forth).

A record description contains both non-null and null varieties. A row description describes how to loop over the columns in the row, and a table description describes how to loop over the rows of the table. Everything except the $variables$, the ellipses, and the \n is plain text.

The format is (I hope) concise and intuitive, and I can parse it with a simple little two-pass compiler. First read each line and build a symbol table; then walk the table and parse each format. It's not exactly how I had intended to spend the weekend, but it mostly works. Now I've got three weeks to test it and get the bugs out before the MySQL Conference.

Wednesday, March 28. 2007
mod_ndb
Last August, I started wanting a new API for MySQL Cluster that could solve a few problems. First, if you were to use MySQL Cluster in a LAMP application, you would be turning a two-layer architecture (Apache and MySQL) into three layers (Apache, MySQL, and NDB). Adding that extra tier requires more hardware and means slower response times. Second, the one way around this was to write native NDB API code in C++, but learning the API and developing low-level code can require a big commitment of time. I wanted an easier way to eliminate the extra layer.
My solution is called mod_ndb, and it is an Apache module that runs inside a web server and connects directly to the MySQL Cluster data nodes. It supports a "RESTful" API -- applications make GET, POST, and DELETE requests over HTTP, and mod_ndb provides appropriate HTTP responses. Because it is HTTP, responses can be cached by proxies in a well-defined way, and data can be delivered "straight from the database to the browser" in an AJAX application.

The code is now available at http://code.google.com/p/mod-ndb, with documentation hosted on the MySQL Forge Wiki at http://forge.mysql.com/wiki/Mod_ndb.

While I don't have lots of performance measurements, the benchmarks so far show that if you simply replace a mysqld with an httpd and query it using HTTP instead of SQL, mod_ndb has performance very similar to or slightly better than mysql. (However, you have to use persistent HTTP, so that if you are making five queries you can make them all in one TCP connection, rather than five). And if mod_ndb allows you to improve the overall architecture, a very big performance improvement can be gained -- maybe even a factor of ten.

The current release is not quite beta: some HTTP response codes need to be corrected, along with some formatting details of particular data types. Also, for now, mod_ndb can provide output either in JSON or in a simple XML format, but the big improvement I hope to make before the MySQL Conference next month is to support user-defined output formats. I'll be presenting mod_ndb at the conference on Tuesday, April 24th at 5:30 PM.

Friday, September 8. 2006
Exciting MySQL Meetup coming up
Next Monday's MySQL Meetup in San Francisco should be lots of fun. We'll meet at JasperSoft (303 2nd Street, Suite 450) at 7:00 PM, Mon 11 Sep., and have special guest Teodor Danciu, the creator and architect of Jasper Reports. The guys from JasperSoft will also show off some freely available reports for Bugzilla, and I'm going to give a quick 5-slide presentation on the MySQL Roadmap.

Wednesday, May 17. 2006
The web application is a mess
Most web applications use something like five languages. One of these is a big programming language, like Java or PHP, running on a server, and the other four are HTML, CSS, Javascript, and SQL.
SQL and HTML are "declarative": they let you state what you want, and the browser (for HTML) or database (for SQL) has to decide how to cook it up for you. This makes them easy languages for people to grasp; it also ensures that the browser and database server will be exceptionally large and complex pieces of software. (And apparently programmers are never really satisfied with declarative languages, so database servers grew stored procedures and browsers got Javascript.)

CSS is a "little language," a vocabulary for design, more lexicon than grammar. Over the last ten years, a lot of presentation code has moved out of HTML into CSS, while today's HTML is full of DOM signposts ("id" attributes and "div" tags) that didn't use to be there.

The full-fledged language at the center of a web app -- maybe PHP, Java, or Ruby -- is the one language in the bunch that is well-suited for a large software engineering project. Ironically, with AJAX, the user interface on the browser gets better, but the role of this central language in the whole scheme of the application gets smaller and smaller.

Web apps still aren't as good as desktop apps. They are today's dominant software platform (or at least the one with the most momentum). People are throwing out perfectly good CRM systems so they can pay per-employee, per-month, for Salesforce.com -- which, for its own part, sees some big advantages: nobody there has to press CDs, help customers install the software, or worry about people who run old versions and refuse to upgrade. These advantages are all fortunate side effects of the design of the Web, but they are not fundamental to it. The World Wide Web was intended to be a big network of hyperlinked global public information. If it also revolutionized the distribution of data-driven business software, that's just an afterthought. Out of all those languages, only HTML was in the original plan.
If, today, you really set out to create an internet application framework on top of HTTP, would you build it around a hodgepodge of dissimilar interpreted languages? As web UIs get better, can't you imagine that the web development stack will get even messier? Netfrastructure -- Jim Starkey's last project, which MySQL AB now owns -- was an interesting experiment in simplifying the server side of web apps. Maybe the rest of the stack can be cleaned up some, too.