libdrizzle: performance: don't memset(0) drizzle_column_st, which is huge

Bug #687582 reported by Evan Jones
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Drizzle
Fix Released
Medium
Monty Taylor
7.0
Fix Released
Medium
Monty Taylor

Bug Description

 drizzle_column_create gets called once for every column in every result set that is received by libdrizzle. This function calls memset(column_ptr, 0, sizeof(*column_ptr)). This structure is ~6 kB in size. For my performance sensitive libdrizzle application that reads many small result sets, this means it is writing far more zeros to memory than actual data. Removing this call to memset improved the throughput of my application by 25%!

The attached patch is against my internal source project, since pulling the drizzle bazaar repository to this machine is taking way too long (its huge!). This means you'll need to use patch -p3 or -p4. I *think* I've initialized all the required fields, although it has only been lightly tested. In my application, this runs with ~10 concurrent requests without valgrind complaining.

Related branches

Revision history for this message
Evan Jones (evanj) wrote :
Revision history for this message
Stewart Smith (stewart) wrote : Re: [Bug 687582] [NEW] libdrizzle: performance: don't memset(0) drizzle_column_st, which is huge

On Wed, 08 Dec 2010 22:34:15 -0000, Evan Jones <email address hidden> wrote:
> drizzle_column_create gets called once for every column in every result
> set that is received by libdrizzle. This function calls
> memset(column_ptr, 0, sizeof(*column_ptr)). This structure is ~6 kB in
> size. For my performance sensitive libdrizzle application that reads
> many small result sets, this means it is writing far more zeros to
> memory than actual data. Removing this call to memset improved the
> throughput of my application by 25%!

Cool!

Have you run it through with valgrind?

--
Stewart Smith

Revision history for this message
Monty Taylor (mordred) wrote :

"In my application, this runs with ~10 concurrent requests without valgrind complaining."

I'm going to say then that, yeah - he's run it through valgrind. :)

Revision history for this message
Evan Jones (evanj) wrote :

Yes, valgrinded. But as I said: my application is relatively simple, and I don't understand libdrizzle *super* well. I think I initialized the only required fields, but maybe I'm wrong in some cases?

Revision history for this message
Monty Taylor (mordred) wrote :

Hi!

Any chance I could get you to try this alternate patch and see how it does on performance for you?

http://paste.drizzle.org/show/235/

I memset most of the struct this time, but skipped the big arrays. This works in the drizzle test suite. BUT - if this is too slow, I'll go back through and see if I can adjust again...

Revision history for this message
Evan Jones (evanj) wrote :

Short version: This seems to even make things slightly better (cache effects?) But I don't think your patch is quite correct:

Your patch has:

memset(&((column)->charset), 0, sizeof(drizzle_charset_t) + sizeof(uint32_t) + sizeof(size_t) + sizeof(drizzle_column_type_t) + sizeof(int) + sizeof(uint8_t) + sizeof(uint8_t) + sizeof(size_t));

Except the fields after charset are:

  drizzle_charset_t charset;
  uint32_t size;
  size_t max_size;
  drizzle_column_type_t type;
  drizzle_column_flags_t flags;
  uint8_t decimals;
  uint8_t default_value[DRIZZLE_MAX_DEFAULT_VALUE_SIZE];
  size_t default_value_size;

You have an "int" where you probably want "drizzle_column_flags_t", and default_value is an array of length 2048, which you don't account for. Also: due to structure padding, this memset isn't guaranteed to zero all these fields anyway. In particular, this struct mixes values of different sizes, so there is a good chance there is some padding somewhere in here.

My suggestion: If you really want to partially zero the structure, maybe zero each field explicitly and let the compiler figure it out? I tried putting the following code in drizzle_column_create:

  column->result = result;
  /* SET BELOW: column->next */
  column->prev = NULL;
  /* SET ABOVE: column->options */
  column->catalog[0] = '\0';
  column->db[0] = '\0';
  column->table[0] = '\0';
  column->orig_table[0] = '\0';
  column->name[0] = '\0';
  column->orig_name[0] = '\0';
  column->charset = 0;
  column->size = 0;
  column->max_size = 0;
  column->type = 0;
  column->flags = 0;
  column->decimals = 0;
  /* UNSET: column->default_value */
  column->default_value_size = 0;

With my original patch, my code pushes ~47 900 requests per second to MySQL (3 attempts; 60 seconds each). With this version, it pushes ~49 100 requests per second. Amazing! Maybe this effectively "prefetches" the right parts of the column structure (eg. the beginning of all the strings), which then gets filled with the appropriate data? Just as a sanity check, I reverted this change and went back to the memset: ~42 200 requests per second.

Aside: These numbers change a bit depending on how I use taskset to assign either my program or MySQL to various cores. If I let the Linux kernel do its own thing, this original patch makes a bigger difference, probably because it is less cache friendly?

Revision history for this message
Monty Taylor (mordred) wrote : Re: [Bug 687582] Re: libdrizzle: performance: don't memset(0) drizzle_column_st, which is huge

On 12/10/2010 04:54 PM, Evan Jones wrote:
> Short version: This seems to even make things slightly better (cache
> effects?) But I don't think your patch is quite correct:
>
> Your patch has:
>
> memset(&((column)->charset), 0, sizeof(drizzle_charset_t) +
> sizeof(uint32_t) + sizeof(size_t) + sizeof(drizzle_column_type_t) +
> sizeof(int) + sizeof(uint8_t) + sizeof(uint8_t) + sizeof(size_t));
>
> Except the fields after charset are:
>
> drizzle_charset_t charset;
> uint32_t size;
> size_t max_size;
> drizzle_column_type_t type;
> drizzle_column_flags_t flags;
> uint8_t decimals;
> uint8_t default_value[DRIZZLE_MAX_DEFAULT_VALUE_SIZE];
> size_t default_value_size;
>
>
> You have an "int" where you probably want "drizzle_column_flags_t", and default_value is an array of length 2048, which you don't account for. Also: due to structure padding, this memset isn't guaranteed to zero all these fields anyway. In particular, this struct mixes values of different sizes, so there is a good chance there is some padding somewhere in here.

Doh and doh.

> My suggestion: If you really want to partially zero the structure, maybe
> zero each field explicitly and let the compiler figure it out? I tried
> putting the following code in drizzle_column_create:
>
> column->result = result;
> /* SET BELOW: column->next */
> column->prev = NULL;
> /* SET ABOVE: column->options */
> column->catalog[0] = '\0';
> column->db[0] = '\0';
> column->table[0] = '\0';
> column->orig_table[0] = '\0';
> column->name[0] = '\0';
> column->orig_name[0] = '\0';
> column->charset = 0;
> column->size = 0;
> column->max_size = 0;
> column->type = 0;
> column->flags = 0;
> column->decimals = 0;
> /* UNSET: column->default_value */
> column->default_value_size = 0;
>
> With my original patch, my code pushes ~47 900 requests per second to
> MySQL (3 attempts; 60 seconds each). With this version, it pushes ~49
> 100 requests per second. Amazing! Maybe this effectively "prefetches"
> the right parts of the column structure (eg. the beginning of all the
> strings), which then gets filled with the appropriate data? Just as a
> sanity check, I reverted this change and went back to the memset: ~42
> 200 requests per second.

Excellent. I'm going to apply that and run it through drizzle tests to
see if it all works there (I'm sure it will)

> Aside: These numbers change a bit depending on how I use taskset to
> assign either my program or MySQL to various cores. If I let the Linux
> kernel do its own thing, this original patch makes a bigger difference,
> probably because it is less cache friendly?

Fascinating... but yeah, I'm guessing you're probably right there...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.