As the years go by, data pages on disk have to get bigger. 16 KB pages were good for databases in the late 1990s, but today's data pages should probably be 64 KB. Page sizes go up over time because memory gets cheaper and disks get much larger, but disks do not get very much faster.
In 1997, a megabyte of memory cost $15, but today it costs 10 cents. A new SCSI drive held 4 GB then but 146 GB today. If cost is held constant, today's machine has 150 times the memory and 36 times the storage of a machine from ten years ago, but its disks perform only about 3 times better. Disks are still about 3 inches across and turn at about 10K RPM; today's best disk might deliver about 180 I/Os per second (IOPS), compared to 60 IOPS in 1997. Since the disk doesn't move much faster, and storage is cheap both on disk and in memory, it makes sense to transfer a bigger chunk of data with every disk access.
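The growth factors in this paragraph follow directly from the quoted figures. A quick sketch that recomputes them; the dictionary names are illustrative, "2008" stands in for "today", and the storage ratio assumes, as the text implies, that a new drive costs roughly the same in both years:

```python
# Figures quoted above for 1997 and for "today" (mid-2008).
ram_dollars_per_mb = {"1997": 15.00, "2008": 0.10}
scsi_drive_gb = {"1997": 4, "2008": 146}  # assumes similar drive price in both years
disk_iops = {"1997": 60, "2008": 180}

# At a fixed budget, the RAM you can buy scales with 1 / (price per MB).
memory_growth = ram_dollars_per_mb["1997"] / ram_dollars_per_mb["2008"]
storage_growth = scsi_drive_gb["2008"] / scsi_drive_gb["1997"]
iops_growth = disk_iops["2008"] / disk_iops["1997"]

print(f"memory:  {memory_growth:.0f}x")   # prints "memory:  150x"
print(f"storage: {storage_growth:.1f}x")  # prints "storage: 36.5x"
print(f"IOPS:    {iops_growth:.0f}x")     # prints "IOPS:    3x"
```

The gap between the first two ratios and the third is the whole argument: capacity per dollar grew by factors of 36 to 150, while the random I/O rate grew only threefold.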
There's a rule of thumb that data engineers can use to judge cache sizes: a system should have enough RAM to cache any data that will be reused within about five minutes. ("About five minutes" means 1 to 10 minutes, loosely, or on the order of 100 seconds.) This "five-minute rule" was introduced by Jim Gray in a 1987 paper that balances price against performance. When you buy RAM, you pay for size, but when you buy disks, you pay for I/O performance. Gray said that if you reuse a page more often than the "break-even reference interval," RI, then storing it in RAM costs less than accessing it on disk.
RI is defined using the performance of the disk and the ratio of the costs:
RI = (Data Pages per MB ÷ IOPS per disk) × (Price per disk ÷ Price per MB of RAM)
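As a minimal sketch (not Gray's own code), the formula and the caching decision it implies fit in a few lines of Python. The function names break_even_interval and should_cache are hypothetical, the sketch assumes 1 MB = 1024 KB as the worked examples in this post do, and the usage inputs are made-up illustrative numbers, not figures from any paper:

```python
def break_even_interval(page_kb: float, iops_per_disk: float,
                        price_per_disk: float, price_per_mb_ram: float) -> float:
    """Break-even reference interval RI, in seconds.

    RI = (data pages per MB / IOPS per disk) * (price per disk / price per MB of RAM)
    """
    pages_per_mb = 1024 / page_kb  # assumption: 1 MB = 1024 KB
    return (pages_per_mb / iops_per_disk) * (price_per_disk / price_per_mb_ram)


def should_cache(reuse_interval_s: float, ri_s: float) -> bool:
    """Hypothetical helper: keep a page in RAM if it is re-referenced
    more often than once every RI seconds."""
    return reuse_interval_s < ri_s


# Illustrative (made-up) inputs: 4 KB pages, a 100-IOPS disk costing $1,000,
# and RAM at $1 per MB.
ri = break_even_interval(page_kb=4, iops_per_disk=100,
                         price_per_disk=1000, price_per_mb_ram=1)
print(f"RI = {ri:.0f} s")                         # prints "RI = 2560 s"
print(should_cache(reuse_interval_s=60, ri_s=ri))  # prints "True"
```

Note that RI depends on the page size only through pages per MB, so doubling the page size halves RI while every other input stays fixed.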
In 1987, 1 MB of RAM cost $5,000, and a disk that delivered 15 IOPS cost $15,000. Using a page size of 1 KB, here's how the math worked out:
RI = (1024 ÷ 15) × (15000 ÷ 5000) = 204 sec.
Gray repeated the analysis in another paper ten years later. In 1997, RAM cost $15 per MB, a 64 IOPS disk cost $2,000, and the page size was 8 KB. Amazingly, despite those huge changes, the answer comes out about the same:
RI = (128 ÷ 64) × (2000 ÷ 15) = 266 sec.
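Why the answer barely moved becomes visible if you compute the two factors of the formula separately. A minimal sketch using the 1987 and 1997 inputs quoted above; the labels tech_factor and price_factor are illustrative names, not terminology taken from the text, and 1 MB is assumed to be 1024 KB:

```python
# Inputs quoted in the text: page size in KB, disk IOPS, prices in dollars.
scenarios = {
    "1987": {"page_kb": 1, "iops": 15, "disk_price": 15000, "ram_per_mb": 5000},
    "1997": {"page_kb": 8, "iops": 64, "disk_price": 2000, "ram_per_mb": 15},
}

results = {}
for year, s in scenarios.items():
    # First factor: data pages per MB divided by disk IOPS (1 MB = 1024 KB assumed).
    tech_factor = (1024 / s["page_kb"]) / s["iops"]
    # Second factor: price of one disk divided by price of 1 MB of RAM.
    price_factor = s["disk_price"] / s["ram_per_mb"]
    ri = tech_factor * price_factor
    results[year] = (tech_factor, price_factor, ri)
    print(f"{year}: {tech_factor:6.2f} x {price_factor:7.2f} = {ri:6.1f} s")
```

The first factor fell by roughly 34x (eight times fewer pages per MB, served by a disk about four times faster), while the second rose by roughly 44x, so RI grew by only about 30%. The exact results are 204.8 and 266.7 seconds, which the text rounds down to 204 and 266.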
Does the five-minute rule still hold true today? In June 2008, a high-end 2.5-inch, 15,000 RPM disk costs about $650 (including 30% extra for controllers and whatnot) and delivers 183 IOPS. Memory costs $100 per GB, or about ten cents per MB. I'll use InnoDB's default page size, 16 KB, which gives 64 pages per MB:
RI = (64 ÷ 183) × (650 ÷ 0.1) = 2273 sec.
2273 seconds is well above five minutes (38 minutes, actually). So the math gives you a choice: either you buy enough RAM, at $100 per GB, to cache everything that will be reused within about 38 minutes, or you redefine a constant in software and use each of your limited I/Os to move more bytes. With 64 KB pages, 16 of them fit in a megabyte, and the reference interval falls back below ten minutes:
RI = (16 ÷ 183) × (650 ÷ 0.1) = 568 sec.
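Because RI is inversely proportional to the page size, each doubling of the page halves the break-even interval. A short sweep makes that concrete. It is a sketch that uses only the June 2008 inputs quoted above; the function name is hypothetical, 1 MB is assumed to be 1024 KB, and the 300-second target is simply "five minutes" taken literally:

```python
# June 2008 inputs quoted in the text.
IOPS = 183          # random I/Os per second for the high-end disk
DISK_PRICE = 650.0  # dollars, including ~30% for controllers
RAM_PER_MB = 0.10   # dollars per MB of RAM


def break_even_interval(page_kb: float) -> float:
    """RI in seconds for a given page size (assumes 1 MB = 1024 KB)."""
    pages_per_mb = 1024 / page_kb
    return (pages_per_mb / IOPS) * (DISK_PRICE / RAM_PER_MB)


for page_kb in (16, 32, 64, 128):
    ri = break_even_interval(page_kb)
    print(f"{page_kb:4d} KB pages: RI = {ri:7.1f} s ({ri / 60:5.1f} min)")

# Solve RI(page_kb) = target for page_kb:
#   page_kb = 1024 * DISK_PRICE / (IOPS * RAM_PER_MB * target)
target_s = 300  # five minutes
page_for_target = 1024 * DISK_PRICE / (IOPS * RAM_PER_MB * target_s)
print(f"page size for RI = {target_s} s: {page_for_target:.0f} KB")
```

By this arithmetic, 64 KB pages put RI at about 9.5 minutes, and under 2008 prices it would take pages of roughly 121 KB (the next power of two is 128 KB) to bring RI all the way down to five minutes.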
Increasing the page size is something you can do in software, almost for free, to keep the economics on track. Microsoft SQL Server went from 2 KB pages to 8 KB pages in 1998, and it looks like InnoDB's 16 KB default pages were a good idea ten years ago but are too small for today. (Since InnoDB's data pages are organized as indexes, the story is actually a little more complex. But you can change the InnoDB page size at compile time; Vadim has written up how to do it.)
With 64 KB pages, the five-minute rule still holds today. But the bigger story is that flash storage, including solid-state disks (SSDs), is about to change everything. Today's flash is less expensive than the disk drives of 10 years ago. Over time, it seems possible that flash may replace disks as our primary durable storage technology. If that happens, there will be no more five-minute rule. Until then, we face a complex question: how to optimally balance and use some mix of RAM, flash, and disks. I'll be writing more on that in the months to come.