Merge lp:~jaypipes/drizzle/bugs into lp:~drizzle-trunk/drizzle/development
- bugs
- Merge into development
Status: | Merged | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Merged at revision: | not available | ||||||||||||||||||||||||
Proposed branch: | lp:~jaypipes/drizzle/bugs | ||||||||||||||||||||||||
Merge into: | lp:~drizzle-trunk/drizzle/development | ||||||||||||||||||||||||
Diff against target: |
129 lines (+23/-23) 2 files modified
drizzled/temporal_format.cc (+23/-22) drizzled/temporal_format.h (+0/-1) |
||||||||||||||||||||||||
To merge this branch: | bzr merge lp:~jaypipes/drizzle/bugs | ||||||||||||||||||||||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Brian Aker | Pending | ||
Drizzle Developers | Pending | ||
Review via email: mp+16579@code.launchpad.net |
Commit message
Description of the change
Jay Pipes (jaypipes) wrote : | # |
Monty Taylor (mordred) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Looks good to me in general. Why aren't you using a std::vector for
match vector?
Jay Pipes wrote:
> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>
> Requested reviews:
> Brian Aker (brianaker)
> Drizzle-developers (drizzle-
> Related bugs:
> #384531 compile failure centos 5.2
> https:/
> #402831 Query Logging output missing a comma
> https:/
> #402855 session-
> https:/
> #409711 plugin/
> https:/
> #434128 Build failure on debian
> https:/
>
>
>
>
> Fixes LP Bug #500031:
>
> "dbt2 fails with 1024 connections"
>
> After investigation into this, I discovered that there was a
> race condition in TemporalFormat:
>
> TemporalFormat::_re is the compiled PCRE regular expression object
> inside each of the TemporalFormat objects, which are shared
> among all threads and live in global scope.
>
> Unfortunately, TemporalFormat:
> variable TemporalFormat:
> At high concurrency, this means that the following race
> condition could happen:
>
> Thread 1 executes pcre_exec() and finds a match, therefore
> populating TemporalFormat:
> with the position offsets of the matched pieces of the temporal object.
>
> Thread 1, during construction of the Temporal output of
> TemporalFormat:
> offsets in calling std::string::substr on a copy of the
> matched string, essentially "cutting up" the string
> into year, month, day, etc.
>
> Thread 2 executes pcre_exec() and also finds a match,
> thereby changing TemporalFormat:
> different
>
> Thread 1 continues trying to use std::string:
> but now uses offsets that are invalid for its string,
> thereby producing an out_of_range exception.
>
> The solution is to pull the TemporalFormat:
> member variable and instead put a function-
> match_vector variable on the stack inside TemporalFormat:
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAks
0tEAn1IV+
=oJvO
-----END PGP SIGNATURE-----
Jay Pipes (jaypipes) wrote : | # |
Oh, it's the PCRE library terminology, that's all. The "match vector"
is an address of the set of integer offsets that you pass to pcre_exec().
Monty Taylor wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Looks good to me in general. Why aren't you using a std::vector for
> match vector?
>
> Jay Pipes wrote:
>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>
>> Requested reviews:
>> Brian Aker (brianaker)
>> Drizzle-developers (drizzle-
>> Related bugs:
>> #384531 compile failure centos 5.2
>> https:/
>> #402831 Query Logging output missing a comma
>> https:/
>> #402855 session-
>> https:/
>> #409711 plugin/
>> https:/
>> #434128 Build failure on debian
>> https:/
>>
>>
>>
>>
>> Fixes LP Bug #500031:
>>
>> "dbt2 fails with 1024 connections"
>>
>> After investigation into this, I discovered that there was a
>> race condition in TemporalFormat:
>>
>> TemporalFormat::_re is the compiled PCRE regular expression object
>> inside each of the TemporalFormat objects, which are shared
>> among all threads and live in global scope.
>>
>> Unfortunately, TemporalFormat:
>> variable TemporalFormat:
>> At high concurrency, this means that the following race
>> condition could happen:
>>
>> Thread 1 executes pcre_exec() and finds a match, therefore
>> populating TemporalFormat:
>> with the position offsets of the matched pieces of the temporal object.
>>
>> Thread 1, during construction of the Temporal output of
>> TemporalFormat:
>> offsets in calling std::string::substr on a copy of the
>> matched string, essentially "cutting up" the string
>> into year, month, day, etc.
>>
>> Thread 2 executes pcre_exec() and also finds a match,
>> thereby changing TemporalFormat:
>> different
>>
>> Thread 1 continues trying to use std::string:
>> but now uses offsets that are invalid for its string,
>> thereby producing an out_of_range exception.
>>
>> The solution is to pull the TemporalFormat:
>> member variable and instead put a function-
>> match_vector variable on the stack inside TemporalFormat:
>>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://
>
> iEYEARECAAYFAks
> 0tEAn1IV+
> =oJvO
> -----END PGP SIGNATURE-----
Monty Taylor (mordred) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Fair enough. Just to be annoying... you know you can do this:
/* initialize match_vector to OUT_VECTOR_SIZE elements containing 0 */
vector<int> match_vector(
int32_t result= pcre_exec(_re
and from there on treat match_vector like an actual std::vector, right?
One of the fun bits with vector - guaranteed contiguous memory storage.
Monty
Jay Pipes wrote:
> Oh, it's the PCRE library terminology, that's all. The "match vector"
> is an address of the set of integer offsets that you pass to pcre_exec().
>
> Monty Taylor wrote:
> Looks good to me in general. Why aren't you using a std::vector for
> match vector?
>
> Jay Pipes wrote:
>>>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>>>
>>>> Requested reviews:
>>>> Brian Aker (brianaker)
>>>> Drizzle-developers (drizzle-
>>>> Related bugs:
>>>> #384531 compile failure centos 5.2
>>>> https:/
>>>> #402831 Query Logging output missing a comma
>>>> https:/
>>>> #402855 session-
>>>> https:/
>>>> #409711 plugin/
>>>> https:/
>>>> #434128 Build failure on debian
>>>> https:/
>>>>
>>>>
>>>>
>>>>
>>>> Fixes LP Bug #500031:
>>>>
>>>> "dbt2 fails with 1024 connections"
>>>>
>>>> After investigation into this, I discovered that there was a
>>>> race condition in TemporalFormat:
>>>>
>>>> TemporalFormat::_re is the compiled PCRE regular expression object
>>>> inside each of the TemporalFormat objects, which are shared
>>>> among all threads and live in global scope.
>>>>
>>>> Unfortunately, TemporalFormat:
>>>> variable TemporalFormat:
>>>> At high concurrency, this means that the following race
>>>> condition could happen:
>>>>
>>>> Thread 1 executes pcre_exec() and finds a match, therefore
>>>> populating TemporalFormat:
>>>> with the position offsets of the matched pieces of the temporal object.
>>>>
>>>> Thread 1, during construction of the Temporal output of
>>>> TemporalFormat:
>>>> offsets in calling std::string::substr on a copy of the
>>>> matched string, essentially "cutting up" the string
>>>> into year, month, day, etc.
>>>>
>>>> Thread 2 executes pcre_exec() and also finds a match,
>>>> thereby changing TemporalFormat:
>>>> different
>>>>
>>>> Thread 1 continues trying to use std::string:
Jay Pipes (jaypipes) wrote : | # |
yeah, yeah :P showoff.
Monty Taylor wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Fair enough. Just to be annoying... you know you can do this:
>
> /* initialize match_vector to OUT_VECTOR_SIZE elements containing 0 */
> vector<int> match_vector(
>
> int32_t result= pcre_exec(_re
> , NULL /* No extra data */
> , data_len
> , 0 /* Start at offset 0 of subject...*/
> , 0 /* Default options */
> , &match_vector[0]
> , OUT_VECTOR_SIZE
> );
>
> and from there on treat match_vector like an actual std::vector, right?
> One of the fun bits with vector - guaranteed contiguous memory storage.
>
> Monty
>
> Jay Pipes wrote:
>> Oh, it's the PCRE library terminology, that's all. The "match vector"
>> is an address of the set of integer offsets that you pass to pcre_exec().
>>
>> Monty Taylor wrote:
>> Looks good to me in general. Why aren't you using a std::vector for
>> match vector?
>>
>> Jay Pipes wrote:
>>>>> Jay Pipes has proposed merging lp:~jaypipes/drizzle/bugs into lp:drizzle.
>>>>>
>>>>> Requested reviews:
>>>>> Brian Aker (brianaker)
>>>>> Drizzle-developers (drizzle-
>>>>> Related bugs:
>>>>> #384531 compile failure centos 5.2
>>>>> https:/
>>>>> #402831 Query Logging output missing a comma
>>>>> https:/
>>>>> #402855 session-
>>>>> https:/
>>>>> #409711 plugin/
>>>>> https:/
>>>>> #434128 Build failure on debian
>>>>> https:/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Fixes LP Bug #500031:
>>>>>
>>>>> "dbt2 fails with 1024 connections"
>>>>>
>>>>> After investigation into this, I discovered that there was a
>>>>> race condition in TemporalFormat:
>>>>>
>>>>> TemporalFormat::_re is the compiled PCRE regular expression object
>>>>> inside each of the TemporalFormat objects, which are shared
>>>>> among all threads and live in global scope.
>>>>>
>>>>> Unfortunately, TemporalFormat:
>>>>> variable TemporalFormat:
>>>>> At high concurrency, this means that the following race
>>>>> condition could happen:
>>>>>
>>>>> Thread 1 executes pcre_exec() and finds a match, therefore
>>>>> populating TemporalFormat:
>>>>> with the position offsets of the matched pieces of the temporal object.
>>>>>
>>>>> Thread 1, during construction of the Temporal output of
>>>>> TemporalFormat:
>>>>> offsets in calling std::string::substr on a copy of the
>>>>> matched string, essentially "cutting up" the string
>>>>> into year, month, day, etc.
>>>>>
>>>>> Thread 2 executes pcre_exec() and also finds a match,
>>>>> t...
Preview Diff
1 | === modified file 'drizzled/temporal_format.cc' | |||
2 | --- drizzled/temporal_format.cc 2009-12-11 21:54:18 +0000 | |||
3 | +++ drizzled/temporal_format.cc 2009-12-24 22:22:12 +0000 | |||
4 | @@ -41,8 +41,7 @@ | |||
5 | 41 | namespace drizzled | 41 | namespace drizzled |
6 | 42 | { | 42 | { |
7 | 43 | 43 | ||
10 | 44 | TemporalFormat::TemporalFormat(const char *pattern) | 44 | TemporalFormat::TemporalFormat(const char *pattern) : |
9 | 45 | : | ||
11 | 46 | _pattern(pattern) | 45 | _pattern(pattern) |
12 | 47 | , _error_offset(0) | 46 | , _error_offset(0) |
13 | 48 | , _error(NULL) | 47 | , _error(NULL) |
14 | @@ -55,9 +54,6 @@ | |||
15 | 55 | , _usecond_part_index(0) | 54 | , _usecond_part_index(0) |
16 | 56 | , _nsecond_part_index(0) | 55 | , _nsecond_part_index(0) |
17 | 57 | { | 56 | { |
18 | 58 | /* Make sure we've got no junk in the match_vector. */ | ||
19 | 59 | memset(_match_vector, 0, sizeof(_match_vector)); | ||
20 | 60 | |||
21 | 61 | /* Compile our regular expression */ | 57 | /* Compile our regular expression */ |
22 | 62 | _re= pcre_compile(pattern | 58 | _re= pcre_compile(pattern |
23 | 63 | , 0 /* Default options */ | 59 | , 0 /* Default options */ |
24 | @@ -71,7 +67,12 @@ | |||
25 | 71 | { | 67 | { |
26 | 72 | if (! is_valid()) | 68 | if (! is_valid()) |
27 | 73 | return false; | 69 | return false; |
28 | 70 | |||
29 | 71 | int32_t match_vector[OUT_VECTOR_SIZE]; /**< Stores match substring indexes */ | ||
30 | 74 | 72 | ||
31 | 73 | /* Make sure we've got no junk in the match_vector. */ | ||
32 | 74 | memset(match_vector, 0, sizeof(match_vector)); | ||
33 | 75 | |||
34 | 75 | /* Simply check the subject against the compiled regular expression */ | 76 | /* Simply check the subject against the compiled regular expression */ |
35 | 76 | int32_t result= pcre_exec(_re | 77 | int32_t result= pcre_exec(_re |
36 | 77 | , NULL /* No extra data */ | 78 | , NULL /* No extra data */ |
37 | @@ -79,7 +80,7 @@ | |||
38 | 79 | , data_len | 80 | , data_len |
39 | 80 | , 0 /* Start at offset 0 of subject...*/ | 81 | , 0 /* Start at offset 0 of subject...*/ |
40 | 81 | , 0 /* Default options */ | 82 | , 0 /* Default options */ |
42 | 82 | , _match_vector | 83 | , match_vector |
43 | 83 | , OUT_VECTOR_SIZE | 84 | , OUT_VECTOR_SIZE |
44 | 84 | ); | 85 | ); |
45 | 85 | if (result < 0) | 86 | if (result < 0) |
46 | @@ -119,46 +120,46 @@ | |||
47 | 119 | */ | 120 | */ |
48 | 120 | if (_year_part_index > 1) | 121 | if (_year_part_index > 1) |
49 | 121 | { | 122 | { |
52 | 122 | size_t year_start= _match_vector[_year_part_index]; | 123 | size_t year_start= match_vector[_year_part_index]; |
53 | 123 | size_t year_len= _match_vector[_year_part_index + 1] - _match_vector[_year_part_index]; | 124 | size_t year_len= match_vector[_year_part_index + 1] - match_vector[_year_part_index]; |
54 | 124 | to->_years= atoi(copy_data.substr(year_start, year_len).c_str()); | 125 | to->_years= atoi(copy_data.substr(year_start, year_len).c_str()); |
55 | 125 | if (year_len == 2) | 126 | if (year_len == 2) |
56 | 126 | to->_years+= (to->_years >= DRIZZLE_YY_PART_YEAR ? 1900 : 2000); | 127 | to->_years+= (to->_years >= DRIZZLE_YY_PART_YEAR ? 1900 : 2000); |
57 | 127 | } | 128 | } |
58 | 128 | if (_month_part_index > 1) | 129 | if (_month_part_index > 1) |
59 | 129 | { | 130 | { |
62 | 130 | size_t month_start= _match_vector[_month_part_index]; | 131 | size_t month_start= match_vector[_month_part_index]; |
63 | 131 | size_t month_len= _match_vector[_month_part_index + 1] - _match_vector[_month_part_index]; | 132 | size_t month_len= match_vector[_month_part_index + 1] - match_vector[_month_part_index]; |
64 | 132 | to->_months= atoi(copy_data.substr(month_start, month_len).c_str()); | 133 | to->_months= atoi(copy_data.substr(month_start, month_len).c_str()); |
65 | 133 | } | 134 | } |
66 | 134 | if (_day_part_index > 1) | 135 | if (_day_part_index > 1) |
67 | 135 | { | 136 | { |
70 | 136 | size_t day_start= _match_vector[_day_part_index]; | 137 | size_t day_start= match_vector[_day_part_index]; |
71 | 137 | size_t day_len= _match_vector[_day_part_index + 1] - _match_vector[_day_part_index]; | 138 | size_t day_len= match_vector[_day_part_index + 1] - match_vector[_day_part_index]; |
72 | 138 | to->_days= atoi(copy_data.substr(day_start, day_len).c_str()); | 139 | to->_days= atoi(copy_data.substr(day_start, day_len).c_str()); |
73 | 139 | } | 140 | } |
74 | 140 | if (_hour_part_index > 1) | 141 | if (_hour_part_index > 1) |
75 | 141 | { | 142 | { |
78 | 142 | size_t hour_start= _match_vector[_hour_part_index]; | 143 | size_t hour_start= match_vector[_hour_part_index]; |
79 | 143 | size_t hour_len= _match_vector[_hour_part_index + 1] - _match_vector[_hour_part_index]; | 144 | size_t hour_len= match_vector[_hour_part_index + 1] - match_vector[_hour_part_index]; |
80 | 144 | to->_hours= atoi(copy_data.substr(hour_start, hour_len).c_str()); | 145 | to->_hours= atoi(copy_data.substr(hour_start, hour_len).c_str()); |
81 | 145 | } | 146 | } |
82 | 146 | if (_minute_part_index > 1) | 147 | if (_minute_part_index > 1) |
83 | 147 | { | 148 | { |
86 | 148 | size_t minute_start= _match_vector[_minute_part_index]; | 149 | size_t minute_start= match_vector[_minute_part_index]; |
87 | 149 | size_t minute_len= _match_vector[_minute_part_index + 1] - _match_vector[_minute_part_index]; | 150 | size_t minute_len= match_vector[_minute_part_index + 1] - match_vector[_minute_part_index]; |
88 | 150 | to->_minutes= atoi(copy_data.substr(minute_start, minute_len).c_str()); | 151 | to->_minutes= atoi(copy_data.substr(minute_start, minute_len).c_str()); |
89 | 151 | } | 152 | } |
90 | 152 | if (_second_part_index > 1) | 153 | if (_second_part_index > 1) |
91 | 153 | { | 154 | { |
94 | 154 | size_t second_start= _match_vector[_second_part_index]; | 155 | size_t second_start= match_vector[_second_part_index]; |
95 | 155 | size_t second_len= _match_vector[_second_part_index + 1] - _match_vector[_second_part_index]; | 156 | size_t second_len= match_vector[_second_part_index + 1] - match_vector[_second_part_index]; |
96 | 156 | to->_seconds= atoi(copy_data.substr(second_start, second_len).c_str()); | 157 | to->_seconds= atoi(copy_data.substr(second_start, second_len).c_str()); |
97 | 157 | } | 158 | } |
98 | 158 | if (_usecond_part_index > 1) | 159 | if (_usecond_part_index > 1) |
99 | 159 | { | 160 | { |
102 | 160 | size_t usecond_start= _match_vector[_usecond_part_index]; | 161 | size_t usecond_start= match_vector[_usecond_part_index]; |
103 | 161 | size_t usecond_len= _match_vector[_usecond_part_index + 1] - _match_vector[_usecond_part_index]; | 162 | size_t usecond_len= match_vector[_usecond_part_index + 1] - match_vector[_usecond_part_index]; |
104 | 162 | /* | 163 | /* |
105 | 163 | * For microseconds, which are millionth of 1 second, | 164 | * For microseconds, which are millionth of 1 second, |
106 | 164 | * we must ensure that we produce a correct result, | 165 | * we must ensure that we produce a correct result, |
107 | @@ -176,8 +177,8 @@ | |||
108 | 176 | } | 177 | } |
109 | 177 | if (_nsecond_part_index > 1) | 178 | if (_nsecond_part_index > 1) |
110 | 178 | { | 179 | { |
113 | 179 | size_t nsecond_start= _match_vector[_nsecond_part_index]; | 180 | size_t nsecond_start= match_vector[_nsecond_part_index]; |
114 | 180 | size_t nsecond_len= _match_vector[_nsecond_part_index + 1] - _match_vector[_nsecond_part_index]; | 181 | size_t nsecond_len= match_vector[_nsecond_part_index + 1] - match_vector[_nsecond_part_index]; |
115 | 181 | /* | 182 | /* |
116 | 182 | * For nanoseconds, which are 1 billionth of a second, | 183 | * For nanoseconds, which are 1 billionth of a second, |
117 | 183 | * we must ensure that we produce a correct result, | 184 | * we must ensure that we produce a correct result, |
118 | 184 | 185 | ||
119 | === modified file 'drizzled/temporal_format.h' | |||
120 | --- drizzled/temporal_format.h 2009-08-30 00:26:17 +0000 | |||
121 | +++ drizzled/temporal_format.h 2009-12-24 22:22:12 +0000 | |||
122 | @@ -49,7 +49,6 @@ | |||
123 | 49 | pcre *_re; /**< The compiled regular expression struct */ | 49 | pcre *_re; /**< The compiled regular expression struct */ |
124 | 50 | int32_t _error_offset; /**< Any error encountered during compilation or matching */ | 50 | int32_t _error_offset; /**< Any error encountered during compilation or matching */ |
125 | 51 | const char *_error; | 51 | const char *_error; |
126 | 52 | int32_t _match_vector[OUT_VECTOR_SIZE]; /**< Stores match substring indexes */ | ||
127 | 53 | /* Index of the pattern which is a specific temporal part */ | 52 | /* Index of the pattern which is a specific temporal part */ |
128 | 54 | uint32_t _year_part_index; | 53 | uint32_t _year_part_index; |
129 | 55 | uint32_t _month_part_index; | 54 | uint32_t _month_part_index; |
Fixes LP Bug #500031:
"dbt2 fails with 1024 connections"
After investigation into this, I discovered that there was a :match( ):
race condition in TemporalFormat:
TemporalFor mat::_re is the compiled PCRE regular expression object
inside each of the TemporalFormat objects, which are shared
among all threads and live in global scope.
Unfortunately, TemporalFormat: :match( ) was using the member :_match_ vector as its match state.
variable TemporalFormat:
At high concurrency, this means that the following race
condition could happen:
Thread 1 executes pcre_exec() and finds a match, therefore :_match_ vector of integers
populating TemporalFormat:
with the position offsets of the matched pieces of the temporal object.
Thread 1, during construction of the Temporal output of mat::match( ), uses these _match_vector position
TemporalFor
offsets in calling std::string::substr on a copy of the
matched string, essentially "cutting up" the string
into year, month, day, etc.
Thread 2 executes pcre_exec() and also finds a match, :_match_ vector to something
thereby changing TemporalFormat:
different
Thread 1 continues trying to use std::string: :substr( ),
but now uses offsets that are invalid for its string,
thereby producing an out_of_range exception.
The solution is to pull the TemporalFormat: :_match_ vector scope-level :match( ).
member variable and instead put a function-
match_vector variable on the stack inside TemporalFormat: