Code review comment for lp:~nfernandez/akiban-persistit/persistit-warmup-capability

Revision history for this message
Peter Beaman (pbeaman) wrote :

I've been thinking some more about the general problem and have some suggestions.

This version represents good progress. It passes all tests and makes the warmup process optional.

Here are some additional issues I'd like to address, perhaps over time:

Writing the buffer pool inventory should be optional. I suggest using the same configuration property, isBufferWarmupEnabled(), to control the polling action. I also think we might be able to put the inventory-writing activity on the CHECKPOINT_MANAGER thread since it does work only about once every two minutes. The Javadoc for setBufferWarmupEnabled(boolean) says the property is settable from the Management class. We would want to actually make it so.

When we load a large buffer pool, say 1M pages of 16Kb each (which we already have on largish machines) it will be important to optimize the order in which these are read to reduce physical I/O delays. About the best we can do is to read pages in file offset order which is approximately linear with page address order - "approximately" because some pages may be read from the journal instead of the volume file. We probably want to sort the list of pages being read by physical file address order before reading anything.

We may benefit from repopulating the BufferPool in "clock" order. This will generally not be the same order in which we load the pages. The idea would be to read the pages in physical file order but put them in the _buffers array in an order that approximates their distance from the "clock" pointer at the time the inventory was taken. To do this we'll need a method other than BufferPool#get(...) to read and install the pages. I'm giving this some thought and will probably propose something later. We can do without this for now.

I wonder if there is any benefit in reloading only pages that have been in the BufferPool for awhile. For example, if we inventory the BufferPool during a data load operating we'll record lots of pages in the warmup file that when reloaded will never be used again. One possibility is to snapshot the inventory somewhat more frequently, say once per minute, and only include pages that have been in place for two or three cycles.

Small detail - should have brought this up earlier. Each Volume in the BufferPool has a small integer handle available through Volume#getHandle(). This is a good alternative to recording the volume name.

Finally, we can't put this in trunk without resolving where the warmup file goes; clearly we can't keep it in a fixed location in the /tmp directory. I think if we can resolve the file location, make the writing of the file optional through configuration, fix the Javadoc for setBufferWarmupEnabled not to refer to a method that currently does not exist in Management, we could probably merge this and at least try it in some of our bigger tests.

« Back to merge proposal