Merge lp:~jaypipes/drizzle/bugs into lp:~drizzle-trunk/drizzle/development

Proposed by Jay Pipes
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jaypipes/drizzle/bugs
Merge into: lp:~drizzle-trunk/drizzle/development
Diff against target: 129 lines (+23/-23)
2 files modified
drizzled/temporal_format.cc (+23/-22)
drizzled/temporal_format.h (+0/-1)
To merge this branch: bzr merge lp:~jaypipes/drizzle/bugs
Reviewer Review Type Date Requested Status
Brian Aker Pending
Drizzle Developers Pending
Review via email: mp+16579@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Jay Pipes (jaypipes) wrote :

    Fixes LP Bug #500031:

    "dbt2 fails with 1024 connections"

    After investigation into this, I discovered that there was a
    race condition in TemporalFormat::match():

    TemporalFormat::_re is the compiled PCRE regular expression object
    inside each of the TemporalFormat objects, which are shared
    among all threads and live in global scope.

    Unfortunately, TemporalFormat::match() was using the member
    variable TemporalFormat::_match_vector as its match state.
    At high concurrency, this means that the following race
    condition could happen:

    Thread 1 executes pcre_exec() and finds a match, therefore
    populating TemporalFormat::_match_vector of integers
    with the position offsets of the matched pieces of the temporal object.

    Thread 1, during construction of the Temporal output of
    TemporalFormat::match(), uses these _match_vector position
    offsets in calling std::string::substr on a copy of the
    matched string, essentially "cutting up" the string
    into year, month, day, etc.

    Thread 2 executes pcre_exec() and also finds a match,
    thereby changing TemporalFormat::_match_vector to something
    different

    Thread 1 continues trying to use std::string::substr(),
    but now uses offsets that are invalid for its string,
    thereby producing an out_of_range exception.

    The solution is to pull the TemporalFormat::_match_vector
    member variable and instead put a function-scope-level
    match_vector variable on the stack inside TemporalFormat::match().

Revision history for this message
Monty Taylor (mordred) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Looks good to me in general. Why aren't you using a std::vector for
match vector?

Jay Pipes wrote:
> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>
> Requested reviews:
> Brian Aker (brianaker)
> Drizzle-developers (drizzle-developers)
> Related bugs:
> #384531 compile failure centos 5.2
> https://bugs.launchpad.net/bugs/384531
> #402831 Query Logging output missing a comma
> https://bugs.launchpad.net/bugs/402831
> #402855 session->connect_utime doesn't appear to be populated
> https://bugs.launchpad.net/bugs/402855
> #409711 plugin/command_log/command_log.cc can't be compiled on 32 bit platforms
> https://bugs.launchpad.net/bugs/409711
> #434128 Build failure on debian
> https://bugs.launchpad.net/bugs/434128
>
>
>
>
> Fixes LP Bug #500031:
>
> "dbt2 fails with 1024 connections"
>
> After investigation into this, I discovered that there was a
> race condition in TemporalFormat::match():
>
> TemporalFormat::_re is the compiled PCRE regular expression object
> inside each of the TemporalFormat objects, which are shared
> among all threads and live in global scope.
>
> Unfortunately, TemporalFormat::match() was using the member
> variable TemporalFormat::_match_vector as its match state.
> At high concurrency, this means that the following race
> condition could happen:
>
> Thread 1 executes pcre_exec() and finds a match, therefore
> populating TemporalFormat::_match_vector of integers
> with the position offsets of the matched pieces of the temporal object.
>
> Thread 1, during construction of the Temporal output of
> TemporalFormat::match(), uses these _match_vector position
> offsets in calling std::string::substr on a copy of the
> matched string, essentially "cutting up" the string
> into year, month, day, etc.
>
> Thread 2 executes pcre_exec() and also finds a match,
> thereby changing TemporalFormat::_match_vector to something
> different
>
> Thread 1 continues trying to use std::string::substr(),
> but now uses offsets that are invalid for its string,
> thereby producing an out_of_range exception.
>
> The solution is to pull the TemporalFormat::_match_vector
> member variable and instead put a function-scope-level
> match_vector variable on the stack inside TemporalFormat::match().
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksz8OcACgkQ2Jv7/VK1RgGUHQCeIYj2o9RCC7rIbG+GLXyZqOmN
0tEAn1IV+nwIJszqCqtimktphYi01ymm
=oJvO
-----END PGP SIGNATURE-----

Revision history for this message
Jay Pipes (jaypipes) wrote :

Oh, it's the PCRE library terminology, that's all. The "match vector"
is an address of the set of integer offsets that you pass to pcre_exec().

Monty Taylor wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Looks good to me in general. Why aren't you using a std::vector for
> match vector?
>
> Jay Pipes wrote:
>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>
>> Requested reviews:
>> Brian Aker (brianaker)
>> Drizzle-developers (drizzle-developers)
>> Related bugs:
>> #384531 compile failure centos 5.2
>> https://bugs.launchpad.net/bugs/384531
>> #402831 Query Logging output missing a comma
>> https://bugs.launchpad.net/bugs/402831
>> #402855 session->connect_utime doesn't appear to be populated
>> https://bugs.launchpad.net/bugs/402855
>> #409711 plugin/command_log/command_log.cc can't be compiled on 32 bit platforms
>> https://bugs.launchpad.net/bugs/409711
>> #434128 Build failure on debian
>> https://bugs.launchpad.net/bugs/434128
>>
>>
>>
>>
>> Fixes LP Bug #500031:
>>
>> "dbt2 fails with 1024 connections"
>>
>> After investigation into this, I discovered that there was a
>> race condition in TemporalFormat::match():
>>
>> TemporalFormat::_re is the compiled PCRE regular expression object
>> inside each of the TemporalFormat objects, which are shared
>> among all threads and live in global scope.
>>
>> Unfortunately, TemporalFormat::match() was using the member
>> variable TemporalFormat::_match_vector as its match state.
>> At high concurrency, this means that the following race
>> condition could happen:
>>
>> Thread 1 executes pcre_exec() and finds a match, therefore
>> populating TemporalFormat::_match_vector of integers
>> with the position offsets of the matched pieces of the temporal object.
>>
>> Thread 1, during construction of the Temporal output of
>> TemporalFormat::match(), uses these _match_vector position
>> offsets in calling std::string::substr on a copy of the
>> matched string, essentially "cutting up" the string
>> into year, month, day, etc.
>>
>> Thread 2 executes pcre_exec() and also finds a match,
>> thereby changing TemporalFormat::_match_vector to something
>> different
>>
>> Thread 1 continues trying to use std::string::substr(),
>> but now uses offsets that are invalid for its string,
>> thereby producing an out_of_range exception.
>>
>> The solution is to pull the TemporalFormat::_match_vector
>> member variable and instead put a function-scope-level
>> match_vector variable on the stack inside TemporalFormat::match().
>>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAksz8OcACgkQ2Jv7/VK1RgGUHQCeIYj2o9RCC7rIbG+GLXyZqOmN
> 0tEAn1IV+nwIJszqCqtimktphYi01ymm
> =oJvO
> -----END PGP SIGNATURE-----

Revision history for this message
Monty Taylor (mordred) wrote :
Download full text (3.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Fair enough. Just to be annoying... you know you can do this:

/* initialize match_vector to OUT_VECTOR_SIZE elements containing 0 */
vector<int> match_vector(OUT_VECTOR_SIZE, 0);

int32_t result= pcre_exec(_re
                             , NULL /* No extra data */
                             , data_len
                             , 0 /* Start at offset 0 of subject...*/
                             , 0 /* Default options */
                             , &match_vector[0]
                             , OUT_VECTOR_SIZE
                             );

and from there on treat match_vector like an actual std::vector, right?
One of the fun bits with vector - guaranteed contiguous memory storage.

Monty

Jay Pipes wrote:
> Oh, it's the PCRE library terminology, that's all. The "match vector"
> is an address of the set of integer offsets that you pass to pcre_exec().
>
> Monty Taylor wrote:
> Looks good to me in general. Why aren't you using a std::vector for
> match vector?
>
> Jay Pipes wrote:
>>>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>>>
>>>> Requested reviews:
>>>> Brian Aker (brianaker)
>>>> Drizzle-developers (drizzle-developers)
>>>> Related bugs:
>>>> #384531 compile failure centos 5.2
>>>> https://bugs.launchpad.net/bugs/384531
>>>> #402831 Query Logging output missing a comma
>>>> https://bugs.launchpad.net/bugs/402831
>>>> #402855 session->connect_utime doesn't appear to be populated
>>>> https://bugs.launchpad.net/bugs/402855
>>>> #409711 plugin/command_log/command_log.cc can't be compiled on 32 bit platforms
>>>> https://bugs.launchpad.net/bugs/409711
>>>> #434128 Build failure on debian
>>>> https://bugs.launchpad.net/bugs/434128
>>>>
>>>>
>>>>
>>>>
>>>> Fixes LP Bug #500031:
>>>>
>>>> "dbt2 fails with 1024 connections"
>>>>
>>>> After investigation into this, I discovered that there was a
>>>> race condition in TemporalFormat::match():
>>>>
>>>> TemporalFormat::_re is the compiled PCRE regular expression object
>>>> inside each of the TemporalFormat objects, which are shared
>>>> among all threads and live in global scope.
>>>>
>>>> Unfortunately, TemporalFormat::match() was using the member
>>>> variable TemporalFormat::_match_vector as its match state.
>>>> At high concurrency, this means that the following race
>>>> condition could happen:
>>>>
>>>> Thread 1 executes pcre_exec() and finds a match, therefore
>>>> populating TemporalFormat::_match_vector of integers
>>>> with the position offsets of the matched pieces of the temporal object.
>>>>
>>>> Thread 1, during construction of the Temporal output of
>>>> TemporalFormat::match(), uses these _match_vector position
>>>> offsets in calling std::string::substr on a copy of the
>>>> matched string, essentially "cutting up" the string
>>>> into year, month, day, etc.
>>>>
>>>> Thread 2 executes pcre_exec() and also finds a match,
>>>> thereby changing TemporalFormat::_match_vector to something
>>>> different
>>>>
>>>> Thread 1 continues trying to use std::string::substr(...

Read more...

Revision history for this message
Jay Pipes (jaypipes) wrote :
Download full text (3.9 KiB)

yeah, yeah :P showoff.

Monty Taylor wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Fair enough. Just to be annoying... you know you can do this:
>
> /* initialize match_vector to OUT_VECTOR_SIZE elements containing 0 */
> vector<int> match_vector(OUT_VECTOR_SIZE, 0);
>
> int32_t result= pcre_exec(_re
> , NULL /* No extra data */
> , data_len
> , 0 /* Start at offset 0 of subject...*/
> , 0 /* Default options */
> , &match_vector[0]
> , OUT_VECTOR_SIZE
> );
>
> and from there on treat match_vector like an actual std::vector, right?
> One of the fun bits with vector - guaranteed contiguous memory storage.
>
> Monty
>
> Jay Pipes wrote:
>> Oh, it's the PCRE library terminology, that's all. The "match vector"
>> is an address of the set of integer offsets that you pass to pcre_exec().
>>
>> Monty Taylor wrote:
>> Looks good to me in general. Why aren't you using a std::vector for
>> match vector?
>>
>> Jay Pipes wrote:
>>>>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>>>>
>>>>> Requested reviews:
>>>>> Brian Aker (brianaker)
>>>>> Drizzle-developers (drizzle-developers)
>>>>> Related bugs:
>>>>> #384531 compile failure centos 5.2
>>>>> https://bugs.launchpad.net/bugs/384531
>>>>> #402831 Query Logging output missing a comma
>>>>> https://bugs.launchpad.net/bugs/402831
>>>>> #402855 session->connect_utime doesn't appear to be populated
>>>>> https://bugs.launchpad.net/bugs/402855
>>>>> #409711 plugin/command_log/command_log.cc can't be compiled on 32 bit platforms
>>>>> https://bugs.launchpad.net/bugs/409711
>>>>> #434128 Build failure on debian
>>>>> https://bugs.launchpad.net/bugs/434128
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Fixes LP Bug #500031:
>>>>>
>>>>> "dbt2 fails with 1024 connections"
>>>>>
>>>>> After investigation into this, I discovered that there was a
>>>>> race condition in TemporalFormat::match():
>>>>>
>>>>> TemporalFormat::_re is the compiled PCRE regular expression object
>>>>> inside each of the TemporalFormat objects, which are shared
>>>>> among all threads and live in global scope.
>>>>>
>>>>> Unfortunately, TemporalFormat::match() was using the member
>>>>> variable TemporalFormat::_match_vector as its match state.
>>>>> At high concurrency, this means that the following race
>>>>> condition could happen:
>>>>>
>>>>> Thread 1 executes pcre_exec() and finds a match, therefore
>>>>> populating TemporalFormat::_match_vector of integers
>>>>> with the position offsets of the matched pieces of the temporal object.
>>>>>
>>>>> Thread 1, during construction of the Temporal output of
>>>>> TemporalFormat::match(), uses these _match_vector position
>>>>> offsets in calling std::string::substr on a copy of the
>>>>> matched string, essentially "cutting up" the string
>>>>> into year, month, day, etc.
>>>>>
>>>>> Thread 2 executes pcre_exec() and also finds a match,
>>>>> t...

Read more...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'drizzled/temporal_format.cc'
2--- drizzled/temporal_format.cc 2009-12-11 21:54:18 +0000
3+++ drizzled/temporal_format.cc 2009-12-24 22:22:12 +0000
4@@ -41,8 +41,7 @@
5 namespace drizzled
6 {
7
8- TemporalFormat::TemporalFormat(const char *pattern)
9- :
10+TemporalFormat::TemporalFormat(const char *pattern) :
11 _pattern(pattern)
12 , _error_offset(0)
13 , _error(NULL)
14@@ -55,9 +54,6 @@
15 , _usecond_part_index(0)
16 , _nsecond_part_index(0)
17 {
18- /* Make sure we've got no junk in the match_vector. */
19- memset(_match_vector, 0, sizeof(_match_vector));
20-
21 /* Compile our regular expression */
22 _re= pcre_compile(pattern
23 , 0 /* Default options */
24@@ -71,7 +67,12 @@
25 {
26 if (! is_valid())
27 return false;
28+
29+ int32_t match_vector[OUT_VECTOR_SIZE]; /**< Stores match substring indexes */
30
31+ /* Make sure we've got no junk in the match_vector. */
32+ memset(match_vector, 0, sizeof(match_vector));
33+
34 /* Simply check the subject against the compiled regular expression */
35 int32_t result= pcre_exec(_re
36 , NULL /* No extra data */
37@@ -79,7 +80,7 @@
38 , data_len
39 , 0 /* Start at offset 0 of subject...*/
40 , 0 /* Default options */
41- , _match_vector
42+ , match_vector
43 , OUT_VECTOR_SIZE
44 );
45 if (result < 0)
46@@ -119,46 +120,46 @@
47 */
48 if (_year_part_index > 1)
49 {
50- size_t year_start= _match_vector[_year_part_index];
51- size_t year_len= _match_vector[_year_part_index + 1] - _match_vector[_year_part_index];
52+ size_t year_start= match_vector[_year_part_index];
53+ size_t year_len= match_vector[_year_part_index + 1] - match_vector[_year_part_index];
54 to->_years= atoi(copy_data.substr(year_start, year_len).c_str());
55 if (year_len == 2)
56 to->_years+= (to->_years >= DRIZZLE_YY_PART_YEAR ? 1900 : 2000);
57 }
58 if (_month_part_index > 1)
59 {
60- size_t month_start= _match_vector[_month_part_index];
61- size_t month_len= _match_vector[_month_part_index + 1] - _match_vector[_month_part_index];
62+ size_t month_start= match_vector[_month_part_index];
63+ size_t month_len= match_vector[_month_part_index + 1] - match_vector[_month_part_index];
64 to->_months= atoi(copy_data.substr(month_start, month_len).c_str());
65 }
66 if (_day_part_index > 1)
67 {
68- size_t day_start= _match_vector[_day_part_index];
69- size_t day_len= _match_vector[_day_part_index + 1] - _match_vector[_day_part_index];
70+ size_t day_start= match_vector[_day_part_index];
71+ size_t day_len= match_vector[_day_part_index + 1] - match_vector[_day_part_index];
72 to->_days= atoi(copy_data.substr(day_start, day_len).c_str());
73 }
74 if (_hour_part_index > 1)
75 {
76- size_t hour_start= _match_vector[_hour_part_index];
77- size_t hour_len= _match_vector[_hour_part_index + 1] - _match_vector[_hour_part_index];
78+ size_t hour_start= match_vector[_hour_part_index];
79+ size_t hour_len= match_vector[_hour_part_index + 1] - match_vector[_hour_part_index];
80 to->_hours= atoi(copy_data.substr(hour_start, hour_len).c_str());
81 }
82 if (_minute_part_index > 1)
83 {
84- size_t minute_start= _match_vector[_minute_part_index];
85- size_t minute_len= _match_vector[_minute_part_index + 1] - _match_vector[_minute_part_index];
86+ size_t minute_start= match_vector[_minute_part_index];
87+ size_t minute_len= match_vector[_minute_part_index + 1] - match_vector[_minute_part_index];
88 to->_minutes= atoi(copy_data.substr(minute_start, minute_len).c_str());
89 }
90 if (_second_part_index > 1)
91 {
92- size_t second_start= _match_vector[_second_part_index];
93- size_t second_len= _match_vector[_second_part_index + 1] - _match_vector[_second_part_index];
94+ size_t second_start= match_vector[_second_part_index];
95+ size_t second_len= match_vector[_second_part_index + 1] - match_vector[_second_part_index];
96 to->_seconds= atoi(copy_data.substr(second_start, second_len).c_str());
97 }
98 if (_usecond_part_index > 1)
99 {
100- size_t usecond_start= _match_vector[_usecond_part_index];
101- size_t usecond_len= _match_vector[_usecond_part_index + 1] - _match_vector[_usecond_part_index];
102+ size_t usecond_start= match_vector[_usecond_part_index];
103+ size_t usecond_len= match_vector[_usecond_part_index + 1] - match_vector[_usecond_part_index];
104 /*
105 * For microseconds, which are millionth of 1 second,
106 * we must ensure that we produce a correct result,
107@@ -176,8 +177,8 @@
108 }
109 if (_nsecond_part_index > 1)
110 {
111- size_t nsecond_start= _match_vector[_nsecond_part_index];
112- size_t nsecond_len= _match_vector[_nsecond_part_index + 1] - _match_vector[_nsecond_part_index];
113+ size_t nsecond_start= match_vector[_nsecond_part_index];
114+ size_t nsecond_len= match_vector[_nsecond_part_index + 1] - match_vector[_nsecond_part_index];
115 /*
116 * For nanoseconds, which are 1 billionth of a second,
117 * we must ensure that we produce a correct result,
118
119=== modified file 'drizzled/temporal_format.h'
120--- drizzled/temporal_format.h 2009-08-30 00:26:17 +0000
121+++ drizzled/temporal_format.h 2009-12-24 22:22:12 +0000
122@@ -49,7 +49,6 @@
123 pcre *_re; /**< The compiled regular expression struct */
124 int32_t _error_offset; /**< Any error encountered during compilation or matching */
125 const char *_error;
126- int32_t _match_vector[OUT_VECTOR_SIZE]; /**< Stores match substring indexes */
127 /* Index of the pattern which is a specific temporal part */
128 uint32_t _year_part_index;
129 uint32_t _month_part_index;