Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Code review comment for lp:~akopytov/percona-xtrabackup/parallel-compression

parallel-compression
Merge into 2.0

Revision history for this message

Stewart Smith (stewart) wrote on 2012-02-10:

I have a number of thoughts:
- mostly with my xtrabackup-also-in-drizzle hat on, i'd prefer to see innodb OS abstraction file IO routines used instead of mysys. (also from a general dislike of mysys being some weird third thing and not posix or win32 io). With the datasink API this makes it a bit easier to just use native IO calls... have a POSIX local sink and a Win32 local sink.
- CRC32C instead of adler for XBSTREAM chunk checksum lets us use CRC32 instructions on Intel CPUs to compute checksums really fast.
- I think we should have a flags for XBSTREAM chunk. e.g. if unknown header a flag can be set to say if it's ignorable or not. This would allow us to add things like informational chunks in the future without breaking backwards compatibility.
- for xb_rstream_chunk_t, if data member is at the end, we can use the zero length array trick for reading it. Even if we don't use zero length array, it makes things a little nicer. I think this structure should also more mirror what's on disk (pathlength first).
- I wonder if we should have a incrementing number in the XBSTREAM chunk header. for disaster recovery, this may let the backup file be more easily reassembled from parts, or at least help figure out what's missing.
- i think the mutexes around writing the chunks out could be around less code
- the F_WRITE macro calls should be replaced with a single write call. doing multiple small write() calls is less efficient than one big one... actually... if we store the checksum along with the header rather than at the tail of the chunk, we save a write() call and even without much code modification we go from 3 to 2 write() calls in write_chunk().
- we should have defines for xbstream chunk payload type

in xb_stream_write_chunk()
- tmpbuf should be malloc() instead of on stack - will be easier to catch overruns

xb_stream_write_data()
- is it intentional that a series of writes of 100bytes, 1mb will be in 2 chunks (one 100bytes, the other 1mb)? It's certainly the simpler option :)

I have a number of thoughts:
- mostly with my xtrabackup-also-in-drizzle hat on, i'd prefer to see innodb OS abstraction file IO routines used instead of mysys.  (also from a general dislike of mysys being some weird third thing and not posix or win32 io). With the datasink API this makes it a bit easier to just use native IO calls... have a POSIX local sink and a Win32 local sink.
- CRC32C instead of adler for XBSTREAM chunk checksum lets us use CRC32 instructions on Intel CPUs to compute checksums really fast.
- I think we should have a flags for XBSTREAM chunk. e.g. if unknown header a flag can be set to say if it's ignorable or not. This would allow us to add things like informational chunks in the future without breaking backwards compatibility.
- for xb_rstream_chunk_t, if data member is at the end, we can use the zero length array trick for reading it. Even if we don't use zero length array, it makes things a little nicer. I think this structure should also more mirror what's on disk (pathlength first).
- I wonder if we should have a incrementing number in the XBSTREAM chunk header. for disaster recovery, this may let the backup file be more easily reassembled from parts, or at least help figure out what's missing.
- i think the mutexes around writing the chunks out could be around less code
- the F_WRITE macro calls should be replaced with a single write call. doing multiple small write() calls is less efficient than one big one... actually... if we store the checksum along with the header rather than at the tail of the chunk, we save a write() call and even without much code modification we go from 3 to 2 write() calls in write_chunk().
- we should have defines for xbstream chunk payload type

in xb_stream_write_chunk()
- tmpbuf should be malloc() instead of on stack - will be easier to catch overruns

xb_stream_write_data()
- is it intentional that a series of writes of 100bytes, 1mb will be in 2 chunks (one 100bytes, the other 1mb)? It's certainly the simpler option :)

« Back to merge proposal