Merge lp:~drizzle-pbxt/drizzle/drizzle-pbxt-2 into lp:~drizzle-trunk/drizzle/development

Proposed by Paul McCullagh
Status: Work in progress
Proposed branch: lp:~drizzle-pbxt/drizzle/drizzle-pbxt-2
Merge into: lp:~drizzle-trunk/drizzle/development
Diff against target: 144096 lines (has conflicts)
Text conflict in drizzled/sql_string.h
Text conflict in drizzled/table_share.h
Text conflict in plugin/innobase/lock/lock0lock.c
Contents conflict in tests/r/information_schema.result
Text conflict in tests/t/create_not_windows.test
Text conflict in tests/t/information_schema.test
Text conflict in tests/t/subselect.test
To merge this branch: bzr merge lp:~drizzle-pbxt/drizzle/drizzle-pbxt-2
Reviewer: Brian Aker
Review status: Needs Resubmitting
Review via email: mp+6822@code.launchpad.net
Paul McCullagh (paul-mccullagh) wrote :

This is the current merge of the PBXT storage engine into Drizzle (revision #1039). I re-merged (using Stewart's method) after the move of the storage engines to the plugin directory, to ensure the entire PBXT change history is included.

So merging PBXT with bazaar works on this tree, i.e. in the root directory:
bzr merge lp:pbxt
correctly merges the PBXT trunk into the plugin/pbxt directory.

I have also checked the following:

- Compiles and runs on Linux, Solaris and Mac OS.
- All tests run through using: ./dtr --engine=pbxt
- Compiles and runs without atomic ops if not supported.
- Performance (lp:~drizzle-developers/sysbench/trunk) also looks OK.

Let me know if anything is missing :)

Stewart Smith (stewart) wrote :

I got this for insert_update test (./dtr --engine=pbxt):

--- /home/stewart/drizzle/drizzle-pbxt-2/tests/r/pbxt/insert_update.result 2009-05-28 15:55:48.726957481 +1000
+++ /home/stewart/drizzle/drizzle-pbxt-2/tests/r/pbxt/insert_update.reject 2009-05-29 14:01:43.915086845 +1000
@@ -58,12 +58,12 @@
 8 9 60 NULL
 explain extended SELECT *, VALUES(a) FROM t1;
 id select_type table type possible_keys key key_len ref rows filtered Extra
-1 SIMPLE t1 ALL NULL NULL NULL NULL 5 100.00
+1 SIMPLE t1 ALL NULL NULL NULL NULL 7 100.00
 Warnings:
 Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`c` AS `c`,values(`test`.`t1`.`a`) AS `VALUES(a)` from `test`.`t1`
 explain extended select * from t1 where values(a);
 id select_type table type possible_keys key key_len ref rows filtered Extra
-1 SIMPLE t1 ALL NULL NULL NULL NULL 5 100.00 Using where
+1 SIMPLE t1 ALL NULL NULL NULL NULL 7 100.00 Using where
 Warnings:
 Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`c` AS `c` from `test`.`t1` where values(`test`.`t1`.`a`)
 DROP TABLE t1;

and

--- /home/stewart/drizzle/drizzle-pbxt-2/tests/r/pbxt/type_enum.result 2009-05-20 21:41:41.506953655 +1000
+++ /home/stewart/drizzle/drizzle-pbxt-2/tests/r/pbxt/type_enum.reject 2009-05-29 14:09:57.670984523 +1000
@@ -1776,7 +1776,7 @@
 c
 EXPLAIN SELECT a FROM t1 WHERE a=0;
 id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using where
+1 SIMPLE t1 ALL NULL NULL NULL NULL 6 Using where
 SELECT a FROM t1 WHERE a=0;
 a
 ALTER TABLE t1 ADD PRIMARY KEY (a);

apart from that, I vote for merging:
- doesn't touch core (apart from us having to resolve the sql_string
c_ptr() thing)
- test fixes are all okay.
- code is quite likely to be actively maintained

--
Stewart Smith

Monty Taylor (mordred) wrote :

Stewart Smith wrote:

>
> apart from that, I vote for merging:
> - doesn't touch core (apart from us having to resolve the sql_string
> c_ptr() thing)
> - test fixes are all okay.
> - code is quite likely to be actively maintained

I second this vote. I'm working on the c_ptr() thing anyway.

Monty
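
For readers following the c_ptr() point: the comment in the sql_string.h hunk of the preview diff notes that writing the terminator at Ptr[str_length] is unsafe when str_length is not smaller than Alloced_length, for example when no buffer has been allocated at all (Alloced_length == 0). A minimal sketch of a guarded version follows. It assumes a simplified stand-in type: SketchString, grow() and append() are hypothetical names, not the drizzled String API, and the sketch is not claimed to match whatever fix lands in trunk.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified stand-in for a length-tracked string buffer.
// It is NOT the drizzled String class; member names only mirror the diff.
struct SketchString
{
  char *Ptr= nullptr;
  uint32_t str_length= 0;      // bytes currently in use
  uint32_t Alloced_length= 0;  // bytes allocated (0 means no buffer at all)

  ~SketchString() { std::free(Ptr); }

  // Grow the buffer so it can hold `len` bytes plus one terminator byte.
  bool grow(uint32_t len)
  {
    char *p= static_cast<char *>(std::realloc(Ptr, len + 1));
    if (p == nullptr)
      return false;
    Ptr= p;
    Alloced_length= len + 1;
    return true;
  }

  // Append a C string, growing the buffer first when it would not fit.
  void append(const char *s)
  {
    uint32_t n= static_cast<uint32_t>(std::strlen(s));
    if (str_length + n >= Alloced_length && !grow(str_length + n))
      return;
    std::memcpy(Ptr + str_length, s, n);
    str_length+= n;
  }

  // Write the terminator only once a byte for it is known to exist.
  char *c_ptr()
  {
    if (Ptr == nullptr || str_length >= Alloced_length)
    {
      if (!grow(str_length))
        return nullptr;
    }
    Ptr[str_length]= '\0';
    return Ptr;
  }
};
```

The guard covers both cases named in the diff comment: an unallocated buffer and a buffer with no spare byte for the terminator. In either case the buffer is grown before Ptr[str_length] is touched.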

Brian Aker (brianaker) wrote :

Out of date (I did ask Paul to remerge)

review: Needs Resubmitting
Paul McCullagh (paul-mccullagh) wrote :

Hi Brian,

This work is currently in progress. I think we should have it done by the end of next week.

Brian Aker (brianaker) wrote :

Awesome!

Just tell me when... we should end up with spare time on our build
servers. Tell me when you need it pushed to them and we can see how
well it passes all of the build environments.

Cheers,
 -Brian

On Oct 27, 2009, at 4:12 PM, Paul McCullagh wrote:

> Hi Brian,
>
> This work is currently in progress. I think we should have it done
> by the end of next week.

1039. By Paul McCullagh

Merged Drizzle trunk and PBXT 1.0.09

1040. By Paul McCullagh

Changes required to compile PBXT for MySQL again

1041. By Padraig O'Sullivan

Merge trunk.

1042. By Padraig O'Sullivan

Changes required to compile PBXT with drizzle.

1043. By Padraig O'Sullivan

Updated the information_schema result file as it has changed due to the
extra I_S table that PBXT introduces.

1044. By Padraig O'Sullivan

Decided to update the information_schema test case to filter out PBXT
information schema tables in the queries. This way, we won't have to worry
about the output being different if the PBXT plugin has not been loaded.

1045. By Padraig O'Sullivan

Whoops, forgot to change renameTableImplementation to doRenameTable.

1046. By Paul McCullagh

Prototype corrections

1047. By Vladimir Kolesnikov

merge from work branch

1048. By Vladimir Kolesnikov

fixed test-cases according to changes in drizzle

1049. By Vladimir Kolesnikov

a diff not related to PBXT

1050. By Vladimir Kolesnikov

added workaround for a test of DELETE IGNORE which is not currently supported by PBXT

1051. By Vladimir Kolesnikov

fixed resultset order

1052. By Vladimir Kolesnikov

merged Paul's fix from lp:pbxt rev.730

1053. By Vladimir Kolesnikov

more simple fixes

1054. By Vladimir Kolesnikov

added pbxt-specific action

1055. By Vladimir Kolesnikov

the original query plan can be forced by inserting more rows, but it doesn't test a storage engine's code

1056. By Vladimir Kolesnikov

explain differences mostly because of non-clustered indexes in pbxt vs innodb, better row count estimates

1057. By Vladimir Kolesnikov

changes similar to subselect.result

1058. By Vladimir Kolesnikov

added resultset ordering

1059. By Vladimir Kolesnikov

fixes similar to subselect.result

1060. By Vladimir Kolesnikov

fixes similar to subselect.result

1061. By Vladimir Kolesnikov

fixes similar to subselect.result

1062. By Vladimir Kolesnikov

added resultset ordering

1063. By Vladimir Kolesnikov

fixed a problem with key size for enums on drizzle side
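
The enum key-size fix in revision 1063 appears to correspond to the get_enum_pack_length() change in drizzled/field.h shown in the preview diff. A minimal sketch of the two variants follows; the function names are hypothetical and chosen only so that both versions can sit side by side.

```cpp
#include <cassert>
#include <cstdint>

// Variant removed by the diff: enums with 256 or more elements pack into 2 bytes.
inline uint32_t enum_pack_length_before(int elements)
{
  return elements < 256 ? 1 : 2;
}

// Variant added by the diff: enums with 256 or more elements pack into 4 bytes.
inline uint32_t enum_pack_length_after(int elements)
{
  return elements < 256 ? 1 : 4;
}
```

Both variants agree for enums with fewer than 256 elements, so only larger enums see a different pack length (and hence a different key size) after the change.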

Preview Diff

1=== modified file 'drizzled/field.h'
2--- drizzled/field.h 2010-03-27 10:10:49 +0000
3+++ drizzled/field.h 2010-04-01 14:19:35 +0000
4@@ -65,7 +65,7 @@
5
6 inline uint32_t get_enum_pack_length(int elements)
7 {
8- return elements < 256 ? 1 : 2;
9+ return elements < 256 ? 1 : 4;
10 }
11
12 /**
13
14=== modified file 'drizzled/sql_bitmap.h'
15--- drizzled/sql_bitmap.h 2010-02-04 08:14:46 +0000
16+++ drizzled/sql_bitmap.h 2010-04-01 14:19:35 +0000
17@@ -297,6 +297,14 @@
18 return last_word_ptr;
19 }
20
21+ /**
22+ * * @return the last word mask for this bitmap
23+ * */
24+ my_bitmap_map getLastWordMask() const
25+ {
26+ return last_word_mask;
27+ }
28+
29 void addMaskToLastWord() const
30 {
31 *last_word_ptr|= last_word_mask;
32
33=== modified file 'drizzled/sql_string.h'
34--- drizzled/sql_string.h 2010-02-05 08:11:15 +0000
35+++ drizzled/sql_string.h 2010-04-01 14:19:35 +0000
36@@ -93,11 +93,21 @@
37 inline const char *ptr() const { return Ptr; }
38 inline char *c_ptr()
39 {
40+ if (!Ptr || Ptr[str_length]) /* Should be safe */
41+ (void) realloc(str_length);
42+/* This code crashes or overwrites the buffer if
43+ * str_length > Alloced_length,
44+ * which can happen if the buffer is not allocated at
45+ * all (Alloced_length == 0)!
46 if (str_length == Alloced_length)
47 (void) realloc(str_length);
48 else
49 Ptr[str_length]= 0;
50+<<<<<<< TREE
51
52+=======
53+*/
54+>>>>>>> MERGE-SOURCE
55 return Ptr;
56 }
57 inline char *c_ptr_quick()
58
59=== modified file 'drizzled/table_share.h'
60--- drizzled/table_share.h 2010-03-26 19:56:34 +0000
61+++ drizzled/table_share.h 2010-04-01 14:19:35 +0000
62@@ -327,6 +327,7 @@
63 max_rows= arg;
64 }
65
66+<<<<<<< TREE
67 /**
68 * Returns true if the supplied Field object
69 * is part of the table's primary key.
70@@ -344,6 +345,19 @@
71 }
72
73 TableIdentifier::Type tmp_table;
74+=======
75+ inline uint32_t getAvgRowLength()
76+ {
77+ return (table_proto) ? (table_proto->options().has_avg_row_length() ? table_proto->options().avg_row_length() : 0) : 0;
78+ }
79+
80+ drizzled::plugin::StorageEngine *storage_engine; /* storage engine plugin */
81+ inline drizzled::plugin::StorageEngine *db_type() const /* table_type for handler */
82+ {
83+ return storage_engine;
84+ }
85+ enum tmp_table_type tmp_table;
86+>>>>>>> MERGE-SOURCE
87
88 uint32_t ref_count; /* How many Table objects uses this */
89 uint32_t getTableCount()
90
91=== modified file 'plugin/innobase/lock/lock0lock.c'
92--- plugin/innobase/lock/lock0lock.c 2009-11-09 06:31:17 +0000
93+++ plugin/innobase/lock/lock0lock.c 2010-04-01 14:19:35 +0000
94@@ -4878,6 +4878,7 @@
95 LOCK_GAP type locks from the successor
96 record */
97 {
98+ rec_t* nc_rec;
99 const rec_t* next_rec;
100 trx_t* trx;
101 lock_t* lock;
102@@ -4892,7 +4893,12 @@
103 }
104
105 trx = thr_get_trx(thr);
106+<<<<<<< TREE
107 next_rec = page_rec_get_next_const(rec);
108+=======
109+ nc_rec = (rec_t *)rec;
110+ next_rec = page_rec_get_next(nc_rec);
111+>>>>>>> MERGE-SOURCE
112 next_rec_heap_no = page_rec_get_heap_no(next_rec);
113
114 lock_mutex_enter_kernel();
115
116=== added directory 'plugin/pbxt'
117=== added file 'plugin/pbxt/AUTHORS'
118--- plugin/pbxt/AUTHORS 1970-01-01 00:00:00 +0000
119+++ plugin/pbxt/AUTHORS 2010-04-01 14:19:35 +0000
120@@ -0,0 +1,4 @@
121+Paul McCullagh
122+paul.mccullagh@primebase.org
123+http://www.primebase.org
124+http://pbxt.blogspot.com
125
126=== added file 'plugin/pbxt/COPYING'
127--- plugin/pbxt/COPYING 1970-01-01 00:00:00 +0000
128+++ plugin/pbxt/COPYING 2010-04-01 14:19:35 +0000
129@@ -0,0 +1,340 @@
130+ GNU GENERAL PUBLIC LICENSE
131+ Version 2, June 1991
132+
133+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
134+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
135+ Everyone is permitted to copy and distribute verbatim copies
136+ of this license document, but changing it is not allowed.
137+
138+ Preamble
139+
140+ The licenses for most software are designed to take away your
141+freedom to share and change it. By contrast, the GNU General Public
142+License is intended to guarantee your freedom to share and change free
143+software--to make sure the software is free for all its users. This
144+General Public License applies to most of the Free Software
145+Foundation's software and to any other program whose authors commit to
146+using it. (Some other Free Software Foundation software is covered by
147+the GNU Library General Public License instead.) You can apply it to
148+your programs, too.
149+
150+ When we speak of free software, we are referring to freedom, not
151+price. Our General Public Licenses are designed to make sure that you
152+have the freedom to distribute copies of free software (and charge for
153+this service if you wish), that you receive source code or can get it
154+if you want it, that you can change the software or use pieces of it
155+in new free programs; and that you know you can do these things.
156+
157+ To protect your rights, we need to make restrictions that forbid
158+anyone to deny you these rights or to ask you to surrender the rights.
159+These restrictions translate to certain responsibilities for you if you
160+distribute copies of the software, or if you modify it.
161+
162+ For example, if you distribute copies of such a program, whether
163+gratis or for a fee, you must give the recipients all the rights that
164+you have. You must make sure that they, too, receive or can get the
165+source code. And you must show them these terms so they know their
166+rights.
167+
168+ We protect your rights with two steps: (1) copyright the software, and
169+(2) offer you this license which gives you legal permission to copy,
170+distribute and/or modify the software.
171+
172+ Also, for each author's protection and ours, we want to make certain
173+that everyone understands that there is no warranty for this free
174+software. If the software is modified by someone else and passed on, we
175+want its recipients to know that what they have is not the original, so
176+that any problems introduced by others will not reflect on the original
177+authors' reputations.
178+
179+ Finally, any free program is threatened constantly by software
180+patents. We wish to avoid the danger that redistributors of a free
181+program will individually obtain patent licenses, in effect making the
182+program proprietary. To prevent this, we have made it clear that any
183+patent must be licensed for everyone's free use or not licensed at all.
184+
185+ The precise terms and conditions for copying, distribution and
186+modification follow.
187+
188
189+ GNU GENERAL PUBLIC LICENSE
190+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
191+
192+ 0. This License applies to any program or other work which contains
193+a notice placed by the copyright holder saying it may be distributed
194+under the terms of this General Public License. The "Program", below,
195+refers to any such program or work, and a "work based on the Program"
196+means either the Program or any derivative work under copyright law:
197+that is to say, a work containing the Program or a portion of it,
198+either verbatim or with modifications and/or translated into another
199+language. (Hereinafter, translation is included without limitation in
200+the term "modification".) Each licensee is addressed as "you".
201+
202+Activities other than copying, distribution and modification are not
203+covered by this License; they are outside its scope. The act of
204+running the Program is not restricted, and the output from the Program
205+is covered only if its contents constitute a work based on the
206+Program (independent of having been made by running the Program).
207+Whether that is true depends on what the Program does.
208+
209+ 1. You may copy and distribute verbatim copies of the Program's
210+source code as you receive it, in any medium, provided that you
211+conspicuously and appropriately publish on each copy an appropriate
212+copyright notice and disclaimer of warranty; keep intact all the
213+notices that refer to this License and to the absence of any warranty;
214+and give any other recipients of the Program a copy of this License
215+along with the Program.
216+
217+You may charge a fee for the physical act of transferring a copy, and
218+you may at your option offer warranty protection in exchange for a fee.
219+
220+ 2. You may modify your copy or copies of the Program or any portion
221+of it, thus forming a work based on the Program, and copy and
222+distribute such modifications or work under the terms of Section 1
223+above, provided that you also meet all of these conditions:
224+
225+ a) You must cause the modified files to carry prominent notices
226+ stating that you changed the files and the date of any change.
227+
228+ b) You must cause any work that you distribute or publish, that in
229+ whole or in part contains or is derived from the Program or any
230+ part thereof, to be licensed as a whole at no charge to all third
231+ parties under the terms of this License.
232+
233+ c) If the modified program normally reads commands interactively
234+ when run, you must cause it, when started running for such
235+ interactive use in the most ordinary way, to print or display an
236+ announcement including an appropriate copyright notice and a
237+ notice that there is no warranty (or else, saying that you provide
238+ a warranty) and that users may redistribute the program under
239+ these conditions, and telling the user how to view a copy of this
240+ License. (Exception: if the Program itself is interactive but
241+ does not normally print such an announcement, your work based on
242+ the Program is not required to print an announcement.)
243+
244
245+These requirements apply to the modified work as a whole. If
246+identifiable sections of that work are not derived from the Program,
247+and can be reasonably considered independent and separate works in
248+themselves, then this License, and its terms, do not apply to those
249+sections when you distribute them as separate works. But when you
250+distribute the same sections as part of a whole which is a work based
251+on the Program, the distribution of the whole must be on the terms of
252+this License, whose permissions for other licensees extend to the
253+entire whole, and thus to each and every part regardless of who wrote it.
254+
255+Thus, it is not the intent of this section to claim rights or contest
256+your rights to work written entirely by you; rather, the intent is to
257+exercise the right to control the distribution of derivative or
258+collective works based on the Program.
259+
260+In addition, mere aggregation of another work not based on the Program
261+with the Program (or with a work based on the Program) on a volume of
262+a storage or distribution medium does not bring the other work under
263+the scope of this License.
264+
265+ 3. You may copy and distribute the Program (or a work based on it,
266+under Section 2) in object code or executable form under the terms of
267+Sections 1 and 2 above provided that you also do one of the following:
268+
269+ a) Accompany it with the complete corresponding machine-readable
270+ source code, which must be distributed under the terms of Sections
271+ 1 and 2 above on a medium customarily used for software interchange; or,
272+
273+ b) Accompany it with a written offer, valid for at least three
274+ years, to give any third party, for a charge no more than your
275+ cost of physically performing source distribution, a complete
276+ machine-readable copy of the corresponding source code, to be
277+ distributed under the terms of Sections 1 and 2 above on a medium
278+ customarily used for software interchange; or,
279+
280+ c) Accompany it with the information you received as to the offer
281+ to distribute corresponding source code. (This alternative is
282+ allowed only for noncommercial distribution and only if you
283+ received the program in object code or executable form with such
284+ an offer, in accord with Subsection b above.)
285+
286+The source code for a work means the preferred form of the work for
287+making modifications to it. For an executable work, complete source
288+code means all the source code for all modules it contains, plus any
289+associated interface definition files, plus the scripts used to
290+control compilation and installation of the executable. However, as a
291+special exception, the source code distributed need not include
292+anything that is normally distributed (in either source or binary
293+form) with the major components (compiler, kernel, and so on) of the
294+operating system on which the executable runs, unless that component
295+itself accompanies the executable.
296+
297+If distribution of executable or object code is made by offering
298+access to copy from a designated place, then offering equivalent
299+access to copy the source code from the same place counts as
300+distribution of the source code, even though third parties are not
301+compelled to copy the source along with the object code.
302+
303
304+ 4. You may not copy, modify, sublicense, or distribute the Program
305+except as expressly provided under this License. Any attempt
306+otherwise to copy, modify, sublicense or distribute the Program is
307+void, and will automatically terminate your rights under this License.
308+However, parties who have received copies, or rights, from you under
309+this License will not have their licenses terminated so long as such
310+parties remain in full compliance.
311+
312+ 5. You are not required to accept this License, since you have not
313+signed it. However, nothing else grants you permission to modify or
314+distribute the Program or its derivative works. These actions are
315+prohibited by law if you do not accept this License. Therefore, by
316+modifying or distributing the Program (or any work based on the
317+Program), you indicate your acceptance of this License to do so, and
318+all its terms and conditions for copying, distributing or modifying
319+the Program or works based on it.
320+
321+ 6. Each time you redistribute the Program (or any work based on the
322+Program), the recipient automatically receives a license from the
323+original licensor to copy, distribute or modify the Program subject to
324+these terms and conditions. You may not impose any further
325+restrictions on the recipients' exercise of the rights granted herein.
326+You are not responsible for enforcing compliance by third parties to
327+this License.
328+
329+ 7. If, as a consequence of a court judgment or allegation of patent
330+infringement or for any other reason (not limited to patent issues),
331+conditions are imposed on you (whether by court order, agreement or
332+otherwise) that contradict the conditions of this License, they do not
333+excuse you from the conditions of this License. If you cannot
334+distribute so as to satisfy simultaneously your obligations under this
335+License and any other pertinent obligations, then as a consequence you
336+may not distribute the Program at all. For example, if a patent
337+license would not permit royalty-free redistribution of the Program by
338+all those who receive copies directly or indirectly through you, then
339+the only way you could satisfy both it and this License would be to
340+refrain entirely from distribution of the Program.
341+
342+If any portion of this section is held invalid or unenforceable under
343+any particular circumstance, the balance of the section is intended to
344+apply and the section as a whole is intended to apply in other
345+circumstances.
346+
347+It is not the purpose of this section to induce you to infringe any
348+patents or other property right claims or to contest validity of any
349+such claims; this section has the sole purpose of protecting the
350+integrity of the free software distribution system, which is
351+implemented by public license practices. Many people have made
352+generous contributions to the wide range of software distributed
353+through that system in reliance on consistent application of that
354+system; it is up to the author/donor to decide if he or she is willing
355+to distribute software through any other system and a licensee cannot
356+impose that choice.
357+
358+This section is intended to make thoroughly clear what is believed to
359+be a consequence of the rest of this License.
360+
361
362+ 8. If the distribution and/or use of the Program is restricted in
363+certain countries either by patents or by copyrighted interfaces, the
364+original copyright holder who places the Program under this License
365+may add an explicit geographical distribution limitation excluding
366+those countries, so that distribution is permitted only in or among
367+countries not thus excluded. In such case, this License incorporates
368+the limitation as if written in the body of this License.
369+
370+ 9. The Free Software Foundation may publish revised and/or new versions
371+of the General Public License from time to time. Such new versions will
372+be similar in spirit to the present version, but may differ in detail to
373+address new problems or concerns.
374+
375+Each version is given a distinguishing version number. If the Program
376+specifies a version number of this License which applies to it and "any
377+later version", you have the option of following the terms and conditions
378+either of that version or of any later version published by the Free
379+Software Foundation. If the Program does not specify a version number of
380+this License, you may choose any version ever published by the Free Software
381+Foundation.
382+
383+ 10. If you wish to incorporate parts of the Program into other free
384+programs whose distribution conditions are different, write to the author
385+to ask for permission. For software which is copyrighted by the Free
386+Software Foundation, write to the Free Software Foundation; we sometimes
387+make exceptions for this. Our decision will be guided by the two goals
388+of preserving the free status of all derivatives of our free software and
389+of promoting the sharing and reuse of software generally.
390+
391+ NO WARRANTY
392+
393+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
394+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
395+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
396+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
397+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
398+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
399+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
400+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
401+REPAIR OR CORRECTION.
402+
403+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
404+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
405+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
406+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
407+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
408+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
409+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
410+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
411+POSSIBILITY OF SUCH DAMAGES.
412+
413+ END OF TERMS AND CONDITIONS
414+
415
416+ How to Apply These Terms to Your New Programs
417+
418+ If you develop a new program, and you want it to be of the greatest
419+possible use to the public, the best way to achieve this is to make it
420+free software which everyone can redistribute and change under these terms.
421+
422+ To do so, attach the following notices to the program. It is safest
423+to attach them to the start of each source file to most effectively
424+convey the exclusion of warranty; and each file should have at least
425+the "copyright" line and a pointer to where the full notice is found.
426+
427+ <one line to give the program's name and a brief idea of what it does.>
428+ Copyright (C) <year> <name of author>
429+
430+ This program is free software; you can redistribute it and/or modify
431+ it under the terms of the GNU General Public License as published by
432+ the Free Software Foundation; either version 2 of the License, or
433+ (at your option) any later version.
434+
435+ This program is distributed in the hope that it will be useful,
436+ but WITHOUT ANY WARRANTY; without even the implied warranty of
437+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
438+ GNU General Public License for more details.
439+
440+ You should have received a copy of the GNU General Public License
441+ along with this program; if not, write to the Free Software
442+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
443+
444+
445+Also add information on how to contact you by electronic and paper mail.
446+
447+If the program is interactive, make it output a short notice like this
448+when it starts in an interactive mode:
449+
450+ Gnomovision version 69, Copyright (C) year name of author
451+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
452+ This is free software, and you are welcome to redistribute it
453+ under certain conditions; type `show c' for details.
454+
455+The hypothetical commands `show w' and `show c' should show the appropriate
456+parts of the General Public License. Of course, the commands you use may
457+be called something other than `show w' and `show c'; they could even be
458+mouse-clicks or menu items--whatever suits your program.
459+
460+You should also get your employer (if you work as a programmer) or your
461+school, if any, to sign a "copyright disclaimer" for the program, if
462+necessary. Here is a sample; alter the names:
463+
464+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
465+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
466+
467+ <signature of Ty Coon>, 1 April 1989
468+ Ty Coon, President of Vice
469+
470+This General Public License does not permit incorporating your program into
471+proprietary programs. If your program is a subroutine library, you may
472+consider it more useful to permit linking proprietary applications with the
473+library. If this is what you want to do, use the GNU Library General
474+Public License instead of this License.
475
476=== added file 'plugin/pbxt/ChangeLog'
477--- plugin/pbxt/ChangeLog 1970-01-01 00:00:00 +0000
478+++ plugin/pbxt/ChangeLog 2010-04-01 14:19:35 +0000
479@@ -0,0 +1,801 @@
480+PBXT Release Notes
481+==================
482+
483+------- 1.0.09e RC3 - Not released yet
484+
485+RN283: Fixed a bug that caused the error "[ERROR] Invalid (old?) table or database name 'mysqld.1'" when running temp_table.test under MariaDB (thanks to Monty for his initial bug fix).
486+
487+RN282: Added win_inttypes.h to the distribution. This file is only required for the Windows build.
488+
489+RN281: Fixed bug #451101: jump or move depends on uninitialised value in myxt_get_key_length
490+
491+RN280: Fixed bug #451080: Uninitialised memory write in XTDatabaseLog::xlog_append
492+
493+RN279: Fixed bug #451085: jump or move depends on uninitialised value in my_type_to_string
494+
495+RN278: Fixed bug #441000: xtstat crashes with segmentation fault on startup if max_pbxt_threads exceeded.
496+
497+------- 1.0.09d RC3 - 2009-09-30
498+
499+RN277: Added r/o flag to pbxt_max_threads server variable (this fix is related to bug #430637)
500+
501+RN276: Added test case for replication on tables w/o PKs (see bug #430716)
502+
503+RN275: Fixed bug #430600: 'Failed to read auto-increment value from storage engine' error.
504+
505+RN274: Fixed bug #431240: This report is public edit xtstat fails if no PBXT table has been created. xtstat now accepts --database=information_schema or --database=pbxt. Depending on this setting PBXT will either use the information_schema.pbxt_statistics or the pbxt.statistics table. If information_schema is used, then the statistics are displayed even when no PBXT table exists. Recovery activity is also displayed, unless pbxt_support_xa=1, in which case MySQL will wait for PBXT recovery to complete before allowing connections.
506+
507+RN273: Fixed bug #430633: XA_RBDEADLOCK is not returned on XA END after the transaction ended with a deadlock.
508+
509+RN272: Fixed bug #430596: Backup/restore does not work well even on a basic PBXT table with auto-increment.
510+
511+------- 1.0.09c RC3 - 2009-09-16
512+
513+RN271: Windows build update: now you can simply put the pbxt directory under <mysql-root>/storage and build the PBXT engine as a part of the source tree. The engine will be linked statically. Be sure to specify the WITH_PBXT_STORAGE_ENGINE option when running win\configure.js
514+
515+RN270: Correctly disabled PBMS so that this version now compiles under Windows. If PBMS_ENABLED is defined, PBXT will not compile under Windows because of a getpid() call in pbms.h.
516+
517+------- 1.0.09 RC3 - 2009-09-09
518+
519+RN269: Implemented online backup. A native online backup driver now performs BACKUP and RESTORE DATABASE operations for PBXT. NOTE: This feature is only supported by MySQL 6.0.9 or later.
520+
521+RN268: Implemented XA support. PBXT now supports all XA related MySQL statements. The variable pbxt_support_xa determines if XA support is enabled. Note: due to MySQL bug #47134, enabling XA support could lead to a crash.
522+
523+------- 1.0.08d RC2 - 2009-09-02
524+
525+RN267: Fixed a bug that caused MySQL to crash on shutdown, after an incorrect command line parameter was given. The crash occurred because the background recovery task was not cleaned up before the PBXT engine was de-initialized.
526+
527+------- 1.0.08c RC2 - 2009-08-18
528+
529+RN266: Updated BLOB streaming glue, used with the PBMS engine. The glue code is now identical to the "1.0.08-rc-pbms" version of PBXT available from http://blobstreaming.org/download.
530+
531+RN265: Changed the sequential reading of data log files to skip gaps, instead of returning EOF. This ensures that extended data records are preserved even when something goes wrong with the way the file is written.
532+
533+RN264: Fixed a bug that caused a "Data log not found" error after an out of disk space error on a log file. This bug is similar to RN262 in that it allows "gaps" to appear in the data logs.
534+
535+RN263: Updated xtstat to compile on Windows/MS Visual C++.
536+
537+RN262: Merged changes for PBMS version 0.5.09.
538+
539+RN261: Concerning bug #377788: Cannot find index for FK. Fixed buffer overflow which occurred when the error was reported.
540+
541+RN260: Fixed bug #377788: Cannot find index for FK. PBXT now correctly uses prefix of an index to support FK references (e.g. if key = (c1, c2) then an index on (c1, c2, c3) will work). Also fixed buffer overflow, which occurred when reporting the error.
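The prefix rule described in RN260 can be illustrated with a short sketch. This is not PBXT's code: the function name index_supports_fk and the representation of keys as lists of column names are assumptions made purely for illustration.

```python
def index_supports_fk(fk_columns, index_columns):
    """Return True if fk_columns form a leading prefix of index_columns.

    Hypothetical helper: models the rule that a key (c1, c2) can be
    served by an index on (c1, c2, c3).
    """
    fk = list(fk_columns)
    return list(index_columns)[:len(fk)] == fk


# The example from the release note: key (c1, c2), index (c1, c2, c3).
print(index_supports_fk(["c1", "c2"], ["c1", "c2", "c3"]))  # True
# Same columns in a different leading order do not form a prefix.
print(index_supports_fk(["c1", "c2"], ["c2", "c1", "c3"]))  # False
```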
542+
543+RN259: Fixed bug #309424: xtstat doesn't use my.cnf. You can now add an [xtstat] section to my.cnf, for use with xtstat.
544+
545+RN258: updated xt_p_join implementation for Windows to check if a thread has already exited or has not yet started
546+
547+RN257: Removed false assertion that could fail during restore if a transaction log page was zero-filled
548+
549+RN256: Update datalog eof pointer only if write operations were successful
550+
551+RN255: Added re-allocation of the filemap if allocation of the new map failed. This often happens if there's not enough space on disk.
552+
553+RN254: When a table with a corrupted index is detected, PBXT creates a file called 'repair-pending' in the pbxt directory, with the name of the table in it. Each table in the file is listed on a line by itself (the last line has no trailing \n). When the table is repaired (using the REPAIR TABLE command), this entry is removed from the file.
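The file layout described in RN254 (one table per line, no trailing newline on the last line) might be maintained along these lines. This is a sketch only: the helper names are hypothetical, the file content is modelled as a plain string, and whether PBXT stores bare table names or full paths is not stated in the release note.

```python
def add_pending(content, table):
    """Add a table to the 'repair-pending' content (hypothetical helper)."""
    names = content.split("\n") if content else []
    if table not in names:
        names.append(table)
    # Joining with "\n" leaves no trailing newline after the last entry.
    return "\n".join(names)


def remove_pending(content, table):
    """Remove a table's entry, as would happen after REPAIR TABLE."""
    names = content.split("\n") if content else []
    return "\n".join(name for name in names if name != table)


pending = add_pending("", "t1")
pending = add_pending(pending, "t2")
print(repr(pending))
print(repr(remove_pending(pending, "t1")))
```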
554+
555+RN253: Use fcntl(F_FULLFSYNC) instead of fsync on platforms that support it. Improper fsync operation was presumably the reason for the index corruption on Mac OS X.
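The idea behind RN253, preferring F_FULLFSYNC where the platform defines it and otherwise using fsync, can be sketched as follows. PBXT does this in C/C++; the Python version below is only an illustration, and the function name durable_sync and the fallback on failure are assumptions rather than a description of the engine's code.

```python
import os
import tempfile

try:
    import fcntl  # POSIX only; not available on Windows
except ImportError:
    fcntl = None


def durable_sync(fd):
    """Flush fd to stable storage, preferring F_FULLFSYNC where defined.

    Returns the name of the mechanism that was used.
    """
    full_sync = getattr(fcntl, "F_FULLFSYNC", None) if fcntl else None
    if full_sync is not None:
        try:
            fcntl.fcntl(fd, full_sync)
            return "F_FULLFSYNC"
        except OSError:
            pass  # assumption: fall back to fsync if the call is rejected
    os.fsync(fd)
    return "fsync"


with tempfile.TemporaryFile() as f:
    f.write(b"index page")
    f.flush()
    print(durable_sync(f.fileno()))
```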
556+
557+RN252: Fixed bug #368692: PBXT not reporting data size correctly in information_schema.
558+
559+------- 1.0.08 RC2 - 2009-06-30
560+
561+RN251: A Windows-specific test update, also removed false assertion that failed on Windows.
562+
563+RN250: Fixed a bug that caused recovery to fail when the transaction log ID exceeded 255. The problem was a checksum failure in the log record.
564+
565+RN249: Fixed bug #313176: Test case timeout. This happened because record cache pages were not properly freed, and as soon as the cache filled up the performance degraded.
566+
567+RN248: PBXT now compiles and runs with MySQL 5.1.35. All tests pass.
568+
569+RN247: Fixed bug #369086: Inconsistent/Incorrect Truncate behavior
570+
571+RN246: Fixed bug #378222: Drop sakila causes error: Cannot delete or update a parent row: a foreign key constraint fails
572+
573+RN245: Fixed bug #379315: Inconsistent behavior of DELETE IGNORE and FK constraint.
574+
575+RN244: Fixed a recovery problem: during the recovery of a "record modified" action, the table was updated before the old index entries were removed; xres_remove_index_entries was then supplied the new record, which led to an incorrect index update.
576+
577+RN243: Fixed a bug that caused a recovery failure if partitioned pbxt tables were present. This happened because the recovery used a MySQL function to open tables and the PBXT handler was not yet registered
578+
579+RN242: Fixed a bug that caused a deadlock if pbxt initialization failed. This happened because pbxt cleanup was done from pbxt_init() with PLUGIN_lock being held by MySQL, which led to a deadlock in the free'er thread
580+
581+RN241: Fixed a heap corruption bug (writing to a freed memory location). It happened only when memory mapped files were used leading to heap inconsistency and program crash or termination by heap checker. Likely to happen right after or during DROP TABLE but possible in other cases too.
582+
583+RN240: Load the record cache on read when not using memory mapped files.
584+
585+RN239: Added PBXT variable pbxt_max_threads. This is the maximum number of threads that can be created by PBXT. By default this value is set to 0 which means the number of threads is derived from the MySQL variable max_connections. The value used is max_connections+7. Under Drizzle the default value is 500.
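The rule in RN239 for deriving the thread limit under MySQL can be written out as a small sketch. The function name effective_max_threads is hypothetical; only the "0 means max_connections+7" rule comes from the release note.

```python
def effective_max_threads(pbxt_max_threads, max_connections):
    """Return the thread limit PBXT would use under MySQL (sketch).

    A setting of 0 means "derive from max_connections", in which case
    the value used is max_connections + 7.
    """
    if pbxt_max_threads == 0:
        return max_connections + 7
    return pbxt_max_threads


print(effective_max_threads(0, 100))    # 107
print(effective_max_threads(250, 100))  # 250
```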
586+
587+RN238: Added an option to wait for the sweeper to clean up old transactions on a particular connection. This prevents the sweeper from getting too far behind.
588+
589+RN237: Added an option to lazy delete fixed length index entries. This means the index entries are just marked for deletion, instead of removing the items from the index page. This has the advantage that an exclusive lock is not always required for deletion.
590+
591+RN236: Fixed bug #349177: a bug in configure.in script.
592+
593+RN235: Fixed bug 349176: a compiler warning.
594+
595+RN234: Completed Drizzle integration. All Drizzle tests now run with PBXT.
596+
597+RN233: Fixed bugs which occur when PBXT is used together with PBMS (BLOB Streaming engine).
598+
599+RN232: Merged Drizzle-specific changes into the main tree.
600+
601+RN231: Fixed a bug that caused bad performance as the number of threads increased. This occurred when the number of open table handles exceeded 'table_open_cache', and MySQL started closing open table handlers. PBXT was flushing a table when all table handlers were closed. PBXT will now only do this when the FLUSH TABLES statement is used.
602+
603+RN230: Improved efficiency of conflict resolution: Implemented a queue for threads waiting for a lock. Threads no longer poll to take a lock. If a temp lock is granted because of an update, then the thread granted the temp lock will also wait for the transaction that did the update to quit.
604+
605+RN229: Fixed bug #313391: LOAD DATA ... REPLACE broken.
606+
607+RN228: Fixed bug #341115: 'Out of memory' error (a bug in key comparison algorithm).
608+
609+RN227: Changed conflict handling to use spin locks and improve efficiency.
610+
611+RN226: Fixed bug #340316: Issue with bigint unsigned auto-increment field.
612+
613+RN225: Fixed bug #308557: UPDATE fails to match all rows in a transactional scenario.
614+
615+RN224: Fixed a deadlock which could occur during table scans.
616+
617+RN223: Index scans now use handles to cache buffers instead of making a copy of the index page. The handles are "copy-on-write".
618+
619+RN222: Fixed a bug that caused the server to hang on startup if PBXT ran out of record cache while waiting for the sweeper to complete.
620+
621+RN221: Fixed an index recovery bug. This occurred if the server crashed after operating in low index cache situations.
622+
623+RN220: Improved index selectivity estimation: added scanning from the end of index backwards.
624+
625+RN219: Fixed a problem: during intersected range scan not all fields were returned by engine to MySQL.
626+
627+RN218: Changed the way row locking (used by SELECT FOR UPDATE) works. Previously we locked a group of rows at once (although there were many groups). However, this caused conflicts even when the same rows were not locked. We now lock individual rows.
628+
629+RN217: Fixed bug #315564: Rollbacked inserts remain permanently in table.
630+
631+RN216: Added lock tracing. In DEBUG mode, each thread has a list of the locks (semaphores, mutexes, r/w locks) that it holds.
632+
633+RN215: Fixed a bug that caused a crash during restart if an index file was flushed during recovery.
634+
635+RN214: Fixed bug #310184: Deadlock when trying to wake up transactions
636+
637+RN213: Fixed an index corruption bug on SPARC Solaris. Note this error will occur on any machine that does not use the x86 (little endian) byte order.
638+
639+------- 1.0.07 RC - 2008-12-15
640+
641+RN212: Fixed build problems on NetBSD.
642+
643+RN211: Fixed build problems on FreeBSD.
644+
645+RN210: Fixed build problems on OpenSolaris.
646+
647+RN209: Added handling of the foreign_key_checks system flag.
648+
649+RN208: xtstat will now automatically reconnect if the connection to server is lost.
650+
651+RN207: Foreign key references are now checked on CREATE TABLE.
652+
653+RN206: Fixed a crash if inserting into a table that has an FK that references a column that has no index on it.
654+
655+RN205: Added processing of foreign key action SET DEFAULT.
656+
657+RN204: Fixed an index recovery problem: unswept index entries were not recovered correctly
658+
659+RN203: Fixed foreign key bug: REPLACE fails with 'on delete cascade'
660+
661+RN202: Fixes and updates to tests; all tests now pass on Windows and Linux.
662+
663+RN201: Fixed ref-counting for mmapped files.
664+
665+RN200: Fixed an index recovery problem: unswept index entries were not recovered correctly.
666+
667+RN199: Recovery now takes place on plug-in startup. Previously recovery occurred when the first PBXT table was accessed.
668+
669+RN198: Fixed a recovery bug that caused index entries to get out of sync with the data file.
670+
671+RN197: Improved the efficiency of group commit.
672+
673+RN196: Changed checkpointing so that it now works during idle time. Every record, row or index file flush now also contributes to the checkpoint (fuzzy checkpointing). Checkpointing is forced to complete after about 50% of the checkpoint threshold in order to ensure the correct maximum for log reading on recovery.
674+
675+RN195: Fixed scheduling bug that caused sweeper to get behind with the cleanup, which caused performance problems in high conflict situations. Foreground threads will now wait if the sweeper gets too far behind.
676+
677+RN194: Created the xtstat program which monitors the internal performance of PBXT. Run xtstat --help for more detailed information on the output.
678+
679+RN193: Implemented the pbxt.statistics virtual table. The statistics table returns information about the internal activity of the engine. This includes I/O byte counts, cache hit counts and usage, commit count, etc.
680+
681+RN192: Due to timing issues in the engine API it could happen that the client received an OK for a committed transaction before the transaction was actually committed. This problem has been fixed.
682+
683+RN191: Fixed a bug that caused a hang when conflicts occurred while reading a covering index.
684+
685+RN190: Previously the sweeper delayed deletion of transaction structures until all transactions that were running during sweeping have quit. This is now handled by the same code that fixed the bug in RN189.
686+
687+RN189: Fixed a bug that could cause a row to go missing due to a visibility issue.
688+
689+RN188: Fixed a bug which occurred when using CREATE TABLE ... AVG_ROW_LENGTH=x, and the table contained BLOBs. In this case, ALTER TABLE corrupted the table.
690+
691+RN187: Windows now stores paths in the location file in UNIX format by converting all '\' characters to '/'. Note that the location file is only cross-platform if the paths are relative (which is the default).
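The path conversion described in RN187 amounts to a single character substitution. A minimal sketch follows; the function name to_location_path is hypothetical and this is not the engine's actual code.

```python
def to_location_path(path):
    """Convert a Windows-style path to the UNIX form by replacing '\\' with '/'."""
    return path.replace("\\", "/")


# A relative path stays relative, which is what keeps the location
# file usable across platforms.
print(to_location_path("test\\data\\t1"))  # test/data/t1
```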
692+
693+RN186: Set version number to 1.0.07.
694+
695+------- 1.0.06 Beta 2 - 2008-11-06
696+
697+RN185: Disabled support for INSERT DELAYED because of MySQL bug #40505
698+
699+RN184: Implemented info(flag == HA_STATUS_AUTO) engine API call. This call returns the next value that will be assigned as auto-increment value on the table.
700+
701+RN183: Turned off streaming on Windows (see XT_STREAMING macro in sources)
702+
703+RN182: Switch code base to the latest version of BLOB streaming engine (PBMS): www.blobstreaming.org.
704+
705+RN181: Updated pbxt-test-run default parameters (--force is on, --default-storage-engine is pbxt, --base-dir is set according to config)
706+
707+RN180: PBXT can now cope with a missing .xti file (the file that contains the table indexes). This file can be regenerated using REPAIR TABLE.
708+
709+RN179: On recovery PBXT now creates a file called 'recovery-progress' in the pbxt database. The recovery percentage complete is written to this file as recovery progresses. Note that this file will not be created if no recovery is necessary or if PBXT estimates that it will read less than 10MB to do recovery.
710+
711+RN178: Fixed a problem in CHECK TABLE that caused memory corruption for fixed-size records
712+
713+RN177: Added "crash debugging". When enabled, crash debugging does the following:
714+ - Create a core dump on Windows if the server crashes.
715+ - Make a backup copy of the datadir directory before recovery if the server crashes.
716+ - Keep at least 5 of the previous transaction logs.
717+Currently crash debugging is enabled by default. To disable, create a file called 'no-debug' in the pbxt database folder, and restart the server. When crash debugging is disabled by default, it can be enabled by creating a file called 'crash-debug' in the pbxt database folder.
718+
719+RN176: Fixed a bug: a lock was not released appropriately
720+
721+RN175: Fixed some debug assertions
722+
723+RN174: Fixed some of test/mysql-test tests
724+
725+RN173: Fixed a RENAME TABLE bug, that prevented index files from being properly recreated
726+
727+RN172: Added the file ./pbxt/lock-pid. This file is locked while the server is running, and contains the process ID of the server. PBXT will return an error on startup if the file is locked or the process is still running in order to prevent a second server from being started.
728+
729+RN171: Implemented the AVG_ROW_LENGTH table attribute. When set, this value determines the size of the fixed length data component of a record. Normally this size is estimated depending on the column definitions. The command CHECK TABLE dumps the current average row length to the log. This can be used to find a suitable value for AVG_ROW_LENGTH.
730+
731+RN170: Changed configure so that debug/optimize flags set for building the engine override the flags set for MySQL. If --with-debug is not specified, then the engine will use the flags set when building MySQL. If MySQL was built with --with-debug=full, then DEBUG will be defined for the engine. When building the engine, the following flags can be set:
732+ yes - Debug symbols enabled, no optimization, DEBUG not defined.
733+ full - Debug symbols enabled, no optimization, DEBUG defined.
734+ only - Debug symbols enabled, MySQL flags used, DEBUG not defined.
735+ prof - Profile code enabled, optimization on, DEBUG not defined.
736+ no - No debug symbols, optimization on, DEBUG not defined.
737+
738+RN169: Used MySQL root Makefile instead of config.status in order to extract settings (such as CFLAGS and CXXFLAGS) for the PBXT build.
739+
740+RN168: Fixed Windows build after merging changes for Drizzle.
741+
742+RN167: Fixed "This table requires primary key" error in sql-bench.
743+
744+RN166: Fixed threading problems that caused crashes in sql-bench.
745+
746+RN165: Added sql-bench to pbxt source tree.
747+
748+RN164: Ported PBXT to Drizzle. To compile for Drizzle, DRIZZLED must be defined on the command line. The -drz.am and -drz.in files must be used when PBXT is embedded in Drizzle.
749+
750+RN163: Added "make test" build step. Running "make test" from the root of pbxt source tree will launch test/mysql-test/pbxt-test-run.pl with appropriate options to execute the pbxt functional test suite. On Windows where
751+pbxt is statically linked into mysql server binary pbxt testing works by going to test/mysql-test directory and running ./pbxt-test-run.pl with --base-dir argument pointing to a mysql source tree (mysql binaries are taken
752+from there) and passing the rest of usual arguments (--force --mysqld=--default-storage-engine=pbxt)
753+
754+RN162: The 'pbxt' database must now be dropped explicitly. It is automatically created when the first PBXT table is created. After that, the pbxt database can be dropped once all PBXT tables have been dropped. Dropping the pbxt database will also cause all transaction (pbxt/system directory) and data logs (pbxt/data directory) to also be deleted.
755+
756+RN161: Added pbxt.location system table. This table can only be dropped when all PBXT tables have been deleted. Dropping the system table will cause all transaction (pbxt/system directory) and data logs (pbxt/data directory) to also be deleted.
757+
758+RN160: Made changes to run with MySQL 6.0.6.
759+
760+RN159: Changes to configure: added --with-plugindir=<path>, which should be used to specify the plugin directory. This means that --libdir should no longer be used. For backwards compatibility configure will still recognize this option if the path ends with 'plugin'.
761+
762+Also updated --help, to include all options, and better descriptions of the options.
763+
764+The configure options are now as follows:
765+
766+--with-mysql=<path> - (Required) It specifies the path to the MySQL source tree. The source should already be built. All other options will be taken from the MySQL build by default.
767+--with-debug=yes/no - (Optional) Specify if the engine should be built with different debug options to the MySQL source tree.
768+--with-plugindir=<path> - (Optional) Specify an alternative installation directory for the plugin. By default it will be installed in the plugin directory of the MySQL installation.
769+
770+
771+RN158: Added support for core dumps on Windows. This can be enabled by defining XT_COREDUMP. On by default at the moment. If the server crashes a file called PBXTCore00000001.dmp will be created in the data directory. This file can be opened using MS VS.
772+
773+RN157: Fixed a compile problem with tv_nsec which is not supported on all platforms.
774+
775+RN156: Updated tests to run with MySQL 5.1.28.
776+
777+RN155: Fixed errors during cascade update of VARCHAR values with trailing spaces
778+
779+RN154: Fixed a bug: impossible to create a foreign key that referenced an ENUM or SET column
780+
781+RN153: Fixed a bug that caused the following problems: #1. Foreign keys: crash if update cascade and autocommit=0 #2. Foreign keys: crash if update cascade and multi-level recursion
782+
783+RN152: Fixed missing information about foreign keys in I_S.table_constraints and I_S.referential_constraints
784+
785+------- 1.0.05 Beta - 2008-08-30
786+
787+RN151: "Quick config": It is now possible to configure the engine by just specifying the mysql source code tree (the --with-mysql option). The --libdir and --with-debug setting will be deduced automatically.
788+
789+RN150: Added system variable pbxt_sweeper_priority, 0 = low (default), 1 = normal (same as user threads), 2 = high. The sweeper cleans up deleted records (deleted records also result from an update). If allowed to accumulate, these records can slow searches. Higher priority for the sweeper is recommended on systems with 4 or more cores.
790+
791+RN149: Record cleanup is now initiated if a deleted record is found, and the transaction that deleted the record has ended. Since waking up the sweeper is an expensive operation, normally the sweeper will run every 1/10th of a second.
792+
793+RN148: Fixed a bug which caused transaction starvation (one transaction was constantly locked out) during high conflict updates. This led to cleanup of records not being done, which led to a general slowdown.
794+
795+RN147: Fixed a problem with TRUNCATE TABLE: a failed TRUNCATE TABLE could put the engine into an invalid state that later caused a crash
796+
797+RN146: Fixed a bug that caused the error: "-49: Record format unknown, either corrupted or upgrade required".
798+
799+RN145: Added pbxt_db_offline_log_function system variable, 0 = recycle logs (default), 1 = delete logs (default on Mac OS X), 2 = keep logs.
800+
801+------- 1.0.04 Alpha - 2008-08-02
802+
803+RN144: Completed port and testing of Windows version.
804+
805+RN143: Fixed a bug which caused the free-er thread to hang. This was a result of an invalid operation ID, which was the result of the checkpointer flushing the table at the same time as a foreground thread.
806+
807+RN142: The fast RW/mutex lock can now handle nested calls. This is possible during a sequential scan.
808+
809+RN141: The normal behavior in MySQL is that an auto-increment value will be re-issued if you delete the row containing the current maximum auto-increment value and then restart the server. To prevent this you can use ALTER TABLE my_table AUTO_INCREMENT = <current-max-auto-increment> + 1, before deleting the current maximum auto-increment value.
810+
811+A new system variable, pbxt_auto_increment_mode, has been added so that this workaround is not necessary. When set to 0 (the default), auto-increment works as described above. When set to 1, the AUTO_INCREMENT value of the table is automatically incremented to prevent previously issued auto-increment values being returned.
812+
813+However, if the server crashes, a gap of up to 100 unique values can result, because the table AUTO_INCREMENT value is incremented in steps of 100.
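The effect of incrementing the stored AUTO_INCREMENT value in steps of 100 can be modelled with a toy counter. This is a sketch under stated assumptions, not PBXT's implementation: the class name, the moment at which the stored value is advanced, and the recovery behaviour are all hypothetical. It only shows why values are never re-issued and why a crash can leave a gap bounded by the step size.

```python
STEP = 100  # step size mentioned in the release note


class AutoIncrementSketch:
    """Toy model of a counter whose durable value advances in steps."""

    def __init__(self, persisted=1):
        self.persisted = persisted   # value stored durably with the table
        self.next_value = persisted  # in-memory counter, lost on a crash

    def allocate(self):
        # Advance the durable value before handing out anything at or
        # beyond it, so a restart can never re-issue a value.
        if self.next_value >= self.persisted:
            self.persisted = self.next_value + STEP
        value = self.next_value
        self.next_value += 1
        return value


counter = AutoIncrementSketch()
issued = [counter.allocate() for _ in range(5)]

# Simulated crash: only the durable value survives the restart.
recovered = AutoIncrementSketch(counter.persisted)
after_crash = recovered.allocate()
print(issued, after_crash)
```

In this model the first value issued after the restart is always larger than any value issued before it, at the cost of skipping the unused remainder of the current step.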
814+
815+RN140: Index statistics are now automatically recalculated when the table row count exceeds 200.
816+
817+RN139: Fixed a bug that caused index corruption, error: "int idx_push(index_xt.cc:172) -2: Core B-tree too deep".
818+
819+RN138: Handle startup and recovery when an index is corrupted.
820+
821+RN137: Fixed a bug in the zero wait R/W lock that caused the lock to fail (the state is extremely volatile, and must be written to memory after increment).
822+
823+RN136: Fixed a bug that caused the error "int xt_pwrite_file(filesys_xt.cc:789) errno (14): Bad address".
824+
825+RN135: Fixed TRUNCATE TABLE that did not work correctly when the table contained BLOBs stored in the BLOB streaming engine (www.blobstreaming.org).
826+
827+RN134: Fixed a bug that caused duplicate rows to be returned from an index scan (using a SELECT FOR UPDATE) if a concurrent update was done.
828+
829+RN133: Optimised PBXT for multi-processor scale-up. This mostly involved using different types of locks instead of the standard pthread mutex and reader/writer locks [TODO: 0038].
830+
831+------- 1.0.03 Alpha - 2008-05-30
832+
833+RN132: Fixed bug when using PBXT in conjunction with the BLOB streaming engine (www.blobstreaming.org). Uploaded BLOBs could not be inserted into a table.
834+
835+RN131: Fixed wait for background processes on shutdown. Shutdown will wait a maximum of 16 seconds for each process.
836+
837+RN130: Fixed calculation of bytes to be read for recovery.
838+
839+RN129: Fixed bug in cleanup of unterminated transactions.
840+
841+RN128: The writer will now start working when one of the following is true:
842+- it is time for a checkpoint,
843+- the log cache is almost full,
844+- the free'er is waiting for the writer,
845+- there is no other activity.
846+
847+RN127: Fixed checkpoint frequency. Checkpointing is now done correctly after 'pbxt_checkpoint_frequency' bytes.
848+
849+RN126: Implemented index consistent write [TODO: 0050].
850+
851+RN125: Implemented memory mapping for row pointer (.xtr) and handle data files (.xtd).
852+
853+RN124: Index files now use direct I/O.
854+
855+------- 1.0.02 Alpha - 2008-04-25
856+
857+RN123: Fixed compile errors with MySQL 5.1.24.
858+
859+------- 1.0.01 Alpha - 2008-03-28
860+
861+RN122: ++++ NOTE: This version is not compatible with older versions of PBXT ++++.
862+
863+RN121: Transaction logs are now global so that multi-database statements are now possible. This also makes it possible to work with PBXT temporary tables.
864+
865+RN120: Transaction logs pre-allocated and recycled.
866+
867+RN119: Transaction log writes on 512 byte boundaries only.
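Writing on 512-byte boundaries implies rounding write sizes up to a multiple of 512. The arithmetic can be sketched as below. The helper names and the zero padding are assumptions for illustration; the release note does not say how PBXT fills the remainder of a block.

```python
BOUNDARY = 512  # boundary size from the release note


def align_up(size, boundary=BOUNDARY):
    """Round size up to the next multiple of boundary."""
    return (size + boundary - 1) // boundary * boundary


def pad_to_boundary(data, boundary=BOUNDARY):
    """Pad data with zero bytes so its length is a multiple of boundary."""
    return data + b"\0" * (align_up(len(data), boundary) - len(data))


print(align_up(1), align_up(512), align_up(513))  # 512 512 1024
print(len(pad_to_boundary(b"log record")))        # 512
```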
868+
869+------- 1.0.00 Alpha - 2008-03-10
870+
871+This version has alpha status because of the large number of changes done for full durability.
872+
873+RN118: ++++ NOTE: This version is incompatible with older versions of PBXT ++++.
874+
875+RN117: Documentation now available at http://www.primebase.org/documentation.
876+
877+RN116: Corrected the plug.in file so that PBXT compiles when dropped into the storage directory in the MySQL source tree.
878+
879+RN115: Compiled and tested with MySQL 5.1.23.
880+
881+RN114: Increased index block size. Minimum is now 4K. Default is 16K.
882+
883+RN113: Calculate index selectivity to return a more accurate value from records_in_range(). NOTE: FLUSH TABLES will update the index statistics, after data has been inserted or updated.
884+
885+RN112: Optimized table storage, saving 8 bytes per row.
886+
887+RN111: Optimized search on keys containing 2 or 3 not null integer values.
888+
889+RN110: Optimization: store the row ID in the index so that an index entry can be verified as current without loading the record. This is necessary to optimize an access with index coverage.
890+
891+RN109: Optimization: only load the record extended data if required.
892+
893+RN108: Implemented SHOW ENGINE PBXT STATUS;
894+
895+RN107: Added the following system variables:
896+
897+pbxt_index_cache_size - The amount of memory allocated to the index cache, used only to cache index data
898+pbxt_record_cache_size - The amount of memory allocated to the record cache used to cache table data
899+pbxt_log_cache_size - The amount of memory allocated to the transaction log cache used to cache transaction log data
900+pbxt_log_file_threshold - The size of a transaction log before rollover, and a new log is created
901+pbxt_transaction_buffer_size - The size of the global transaction log buffer (the engine allocates 2 buffers of this size)
902+pbxt_log_buffer_size - The size of the buffer used to cache data from transaction and data logs during sequential scans, or when writing a data log
903+pbxt_checkpoint_frequency - The amount of data written to the transaction log before a checkpoint is performed
904+pbxt_data_log_threshold - The maximum size of a data log file
905+pbxt_garbage_threshold - The percentage of garbage in a data log file before it is compacted
906+
907+RN106: PBXT now compiles for MySQL 6.0.3.
908+
909+RN104: Updates now lock a record temporarily. This prevents most "record changed" errors, however, it makes UPDATE statements a type of "committed read". This means that you may update a different value to that which you selected in repeatable read mode. To avoid this, use SELECT FOR UPDATE if you plan to UPDATE records after reading.
910+
911+RN103: Implemented SELECT FOR UPDATE. This is implemented by turning SELECT FOR UPDATE into a type of "committed read". This means that, if you do a SELECT followed by a SELECT FOR UPDATE you can get different results, even in repeatable read mode.
912+
913+RN102: Implemented recovery of index entries. Note: indexes are not yet fully consistent. This means that an index can become corrupted due to a crash. Data, however, cannot be lost. The indices can be rebuilt using REPAIR TABLE.
914+
915+RN101: Writing and flushing of a single transaction write-ahead log.
916+
917+RN100: Automatic rollover of transaction logs as they become full.
918+
919+RN99: Implementation of the transaction log cache.
920+
921+RN98: Group commit.
922+
923+RN97: Implementation of the writer thread that applies changes in the transaction log to the database.
924+
925+RN96: Implementation of the checkpointer thread that periodically flushes the database and writes a checkpoint which determines the recovery start point.
926+
927+RN95: Implementation of the free'er thread that is responsible for keeping the record cache at a preset level.
928+
929+RN94: Modifications to the record cache so that rows are stored in pages, in order to speed up sequence access.
930+
931+RN93: Implemented the recovery process which applies changes written to the log that are not in the database, on startup.
932+
933+RN92: Modification of the sweeper thread which cleans up rolled-back transactions and deleted data, to use the new transaction log format.
934+
935+RN91: Modifications to the data logs so that they use the same record structure as the transaction logs.
936+
937+RN90: The data logs are now managed "per database" in order to minimize the work done to flush and commit a transaction.
938+
939+RN89: Implementation of a file handle pool for the data logs.
940+
941+------- 0.9.91 Beta - 2007-10-30
942+
943+RN88: The format of the URL generated by MyBS has been changed. The format of the BLOB URLs is now as follows:
944+
945+'~*' <db-name> '/' <type-char> <table-id> '-' <blob-id> '-' <access-code> '-' <server-id>
946+
947+Where <type-char> is '_' or '~'.
948+
949+Examples: ~*test/_11-128-fbd590b-0, ~*test/~1-524-3dc45b09-0
950+
951+In other words, the character '>' has been replaced by '*', '^' has been replaced by '_' and ':' has been replaced by '~'. The reason for this is that the characters '>' and '^' are not allowed in URLs, and must be URL-encoded. The character ':' is reserved, but allowed.
952+
953+NOTE: This change makes this version incompatible with previous versions of MyBS. If you have a table with BLOB URLs, you can upgrade the URLs as follows:
954+
955+UPDATE blob_table SET blob_col = REPLACE(REPLACE(blob_col, '~>', '~*'), '/:', '/~');
956+
957+Replacing '^' is not necessary because BLOB URLs with '^' should not appear in tables.
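The same substitution as the UPDATE statement above can be expressed outside the database, for example to check a few URLs before running the upgrade. This is only a sketch: the function name upgrade_blob_url is hypothetical, and the old-format URL in the example was constructed by reversing the substitution on one of the new-format examples above.

```python
def upgrade_blob_url(url):
    """Apply the RN88 upgrade: '~>' becomes '~*' and '/:' becomes '/~'.

    Mirrors REPLACE(REPLACE(blob_col, '~>', '~*'), '/:', '/~').
    """
    return url.replace("~>", "~*").replace("/:", "/~")


print(upgrade_blob_url("~>test/:1-524-3dc45b09-0"))  # ~*test/~1-524-3dc45b09-0
```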
958+
959+------- 0.9.90 Beta - 2007-10-17
960+
961+RN87: Corrected stack trace of errors passed through the BLOB streaming API.
962+
963+RN86: Added new engine API accessor functions that appeared in 5.1.21 (thanks Stewart).
964+
965+RN85: Added plug.in file. PBXT now compiles when dropped into the storage directory of the MySQL build tree. However, you have to rebuild configure. For example:
966+
967+rm -rf autom4te.cache/
968+aclocal
969+autoconf
970+autoheader
971+automake -a
972+./configure --help
973+./configure --with-plugins=max --without-innodb --prefix=/usr/local/mysql --with-debug=full
974+
975+NOTE: ./configure --help should show that the PBXT has been included.
976+
977+RN84: Fixed several problems with shutdown of PBXT in combination with MyBS.
978+
979+------- 0.9.89 Beta - 2007-08-17
980+
981+RN83 (2007-08-21): Fixed a crash due to a compiler bug that does not like the construct *((xtWordPS *) &(v)) = (xtWordPS) (x) (macro allocr_() and alloczr_()).
982+
983+RN82: It is now possible to insert non-URL values into a LONGBLOB field; in the previous version this generated an "Invalid URL" error. Such values can be retrieved as a stream using a field reference.
984+
985+RN81: Fixed a bug that caused PBXT to crash during certain operations when MyBS was not installed.
986+
987+RN80: Set engine as capable of row-level replication, but not as statement replication. Statement replication does not work because MVCC is not serializable.
988+
989+------- 0.9.88 Beta - 2007-07-25
990+
991+RN79: Made some corrections in order to compile with MySQL 5.1.20.
992+
993+RN78: Support for the features of the MyBS BLOB Streaming engine, version 0.5 Alpha.
994+
995+RN77: Bugfix: The server crashes during BLOB data handling. The reason is the table field structure is shared, and may not be changed.
996+
997+------- 0.9.87 Beta - 2007-06-19
998+
999+RN76: The major feature of this release is support for the BLOB Streaming Engine. The current version enables the download of specific BLOB columns via the Streaming Engine. For example:
1000+
1001+use test;
1002+CREATE TABLE notes_tab (
1003+ n_id INTEGER PRIMARY KEY,
1004+ n_text BLOB
1005+) ENGINE=pbxt;
1006+INSERT notes_tab VALUES (1, "This is a BLOB streaming test!");
1007+
1008+The URL:
1009+
1010+http://localhost:8080/test/notes_tab/n_text/n_id=1
1011+
1012+will return the value "This is a BLOB streaming test!"
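As an illustration only, the URL can be assembled from its parts. This Python sketch assumes the path layout is database/table/column/condition, which is inferred from the single example above and not from any documented specification. The function name blob_stream_url and its parameters are hypothetical.

```python
def blob_stream_url(database, table, column, condition,
                    host="localhost", port=8080):
    """Build a BLOB streaming URL in the layout of the example above."""
    # Assumed layout: http://<host>:<port>/<database>/<table>/<column>/<condition>
    return f"http://{host}:{port}/{database}/{table}/{column}/{condition}"


print(blob_stream_url("test", "notes_tab", "n_text", "n_id=1"))
```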
1013+
1014+RN75: Bugfix: MySQL prints error: "Plugin 'PBXT' will be forced to shutdown". This error was caused by the plug-in having a reference to itself.
1015+
1016+RN74: Added system variables pbxt_index_cache_size and pbxt_record_cache_size. These variables can now be set on the mysqld command line (for example: --pbxt_record_cache_size=50MB). The values are also displayed by SHOW VARIABLES.
1017+
1018+------- 0.9.86 Beta - 2007-04-07
1019+
1020+RN74: ++++ NOTE: This version is incompatible with older versions of PBXT ++++.
1021+
1022+In order to upgrade, install the older version of PBXT. Convert all tables to MyISAM using ALTER TABLE t1 ENGINE=MyISAM. Then install the new version of PBXT and convert back using ALTER TABLE t1 ENGINE=PBXT.
1023+
1024+RN73: Each table will now use a maximum of 4 data log files. This means a maximum of 7 files per table. The minimum is 3 for tables that do not have a variable field that exceeds about 40 bytes in size. This means that under Linux PBXT requires a maximum of 7 file handles per table used. Windows' lack of pread/pwrite (atomic seek and read/write) functions means it requires a file handle per file per open table handler. [TODO: 0044]
1025+
1026+RN72: All threads now write to the same data log file. Recovery and compaction take this fact into account. Each thread still writes its own transaction log.
1027+
1028+RN71: Removed all directory scans when creating and dropping table. Increased the table limit to 10000.
1029+
1030+RN70: Changed locking to avoid a deadlock when TRUNCATE TABLE is used together with other DML.
1031+
1032+RN69: Procedures and functions are now considered atomic, and execute in a single transaction.
1033+
1034+RN68: Bug fixed: all files are now correctly flushed before commit.
1035+
1036+------- 0.9.85 Beta - 2007-03-15
1037+
1038+RN67: Changed the implementation of the pushsr_ and allocr_ macros because "*((void **) &(v) = " caused a crash due to a compiler error on some platforms (thanks Luciano for your help on this one and RN66).
1039+
1040+RN66: Fixed a bug that caused PBXT to corrupt the index file when the size exceeded 4GB. [TODO: 0031]
1041+
1042+RN65: PBXT now runs under Windows. This source tree must be placed in the MySQL source storage directory in order to compile. Further details of how to build are in the windows-readme.txt file. [TODO: 0027]
1043+
1044+RN64: Improved speed of table lookup by ID after a table has been deleted. The sweeper needs to ignore these records. Scanning the directory each time was too slow.
1045+
1046+RN63: Added checking for repeat update of a record in a statement.
1047+
1048+RN62: Committed read no longer blocks due to a change made by another transaction (the XT_REPEATABLE_READ_BLOCKS define, turns blocking on).
1049+
1050+RN61: Avoid checking for duplicates if an index is not modified by an update.
1051+
1052+RN60: Records updated repeatedly by a transaction are now updated in place. [TODO: 0040]
1053+
1054+------- 0.9.8 Beta - 2007-01-30
1055+
1056+RN59: Reduced the number of file handles used to a maximum of one per file. This assumes that pread() and pwrite() allows multiple threads to use the same file handle (according to my tests, this is the case).
1057+
1058+RN58: Added the configure flag --with-debug=only which compiles a version of the plug-in with debug symbols that will link to a non-debug MySQL server.
1059+
1060+RN57: Changed error number returned on lock from 1205 (lock timeout) to 1020 (optimistic lock failure).
1061+
1062+RN56: Added UNIX environment variable for PBXT system parameters. These must be set before starting mysqld, for example:
1063+
1064+setenv pbxt_index_cache_size 400MB
1065+setenv pbxt_record_cache_size "1 GB"
1066+
1067+Values are in bytes unless one of the following units is specified: GB, MB, Kb
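To make the value format concrete, here is a minimal Python sketch of a parser for such settings. It is not the PBXT implementation. It assumes the units are powers of 1024, that unit names are matched case-insensitively, and that optional whitespace may separate the number from the unit (as in "1 GB"). The name parse_size is hypothetical.

```python
import re

# Assumption: binary multiples; the release note does not state the factor.
_UNITS = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a value such as '400MB' or '1 GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a size value: {text!r}")
    number, unit = match.groups()
    if unit.upper() not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    # No unit means the value is already in bytes.
    return int(number) * _UNITS[unit.upper()]


print(parse_size("400MB"), parse_size("1 GB"), parse_size("2048"))
```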
1068+
1069+RN55: Fixed a bug which prevented VARCHAR values from being compressed correctly when stored in variable length rows.
1070+
1071+RN54: Fixed a bug which caused a crash when PBXT was used with MySQL 5.1.14. This bug also caused data to be corrupted on insert.
1072+
1073+RN53: Set query caching mode to transactional. [TODO: 0027]
1074+
1075+RN52: Added conditions so that the engine compiles with MySQL 5.1.14 and 5.1.13.
1076+
1077+------- 0.9.74 Beta - 2006-12-14
1078+
1079+RN51: DELETE FROM <table>; is no longer implemented by re-creating the table. This statement now works by deleting all rows. TRUNCATE is implemented as before, by re-creating the table.
1080+
1081+RN50: The test scripts innodb.test and innodb-mysql.test have been modified to run with PBXT.
1082+
1083+RN49: [TODO: 0020] Implemented foreign keys. Functionality is identical to InnoDB with 2 exceptions:
1084+
1085+* Data types of referenced columns must be an exact match (e.g. you cannot mix VARCHAR and CHAR values).
1086+* Currently an exact matching index is required on referenced columns (i.e. the index may not have more columns than the columns used in the foreign key definition).
1087+
1088+Also note the following:
1089+
1090+* It is possible to create foreign keys that reference non-existent tables or columns. An error will occur when updating a table with an incorrect foreign key declaration.
1091+* If you alter the data-type of a column referenced by a foreign key, you need to set foreign_key_checks=0; or an error will occur.
1092+
1093+RN48: Fixed a bug in the implementation of indexes on ENUM and SET types.
1094+
1095+RN47: Fixed a bug that caused a crash when an index was placed on a BLOB column, and data was retrieved from the index directly.
1096+
1097+------- 0.9.73 Beta - 2006-10-31
1098+
1099+RN46: Updated test scripts to run with MySQL 5.1.13.
1100+
1101+------- 0.9.72 Beta - 2006-10-19
1102+
1103+RN45: Corrected compilation errors that occurred due to a change to struct st_mysql_plugin.
1104+
1105+------- 0.9.71 Beta - 2006-10-04
1106+
1107+RN44: Corrected compilation errors that occurred due to changes in the storage engine API.
1108+
1109+------- 0.9.7 Beta - 2006-09-20
1110+
1111+RN43: This is the first Beta release of PrimeBase XT. It has been integrated into MySQL 4.1.21 and is available as a plug-in for MySQL 5.1.12, or later. This version has been extensively tested using mysql-test-run, on various Linux and Mac OS X platforms.
1112+
1113+RN42: ++++ NOTE: This version is incompatible with older versions of PBXT ++++. Files created by older versions cannot be opened by version 0.9.7.
1114+
1115+RN41: Renaming or deleting a table while using a name with different case to the original created name did not work.
1116+
1117+RN40: Fixed a bug when grouping and searching on indexed columns that contain a null.
1118+
1119+RN39: Fixed bugs related to trailing spaces on VARCHAR values. Values that only vary by the number of trailing spaces (for example "aa" and "aa "), are now correctly handled as identical.
1120+
1121+RN38: The default AUTO_INCREMENT value was not correctly preserved during ALTER TABLE.
1122+
1123+RN37: Created a MySQL 5.1 Plugin version of PBXT. [TODO: 0017]
1124+
1125+RN36: Fixed a race condition in the row cache which had the effect that inserted rows disappeared after cleanup because the cache was out of date. I was only able to reproduce this error on multi-processor machines.
1126+
1127+------- 0.9.6 - 2006-08-05
1128+
1129+RN35: ++++ NOTE: This version is incompatible with older versions of PBXT ++++.
1130+
1131+The disk format of tables and log files has changed slightly in this version. As a result, files created by older versions cannot be opened by version 0.9.6. An error will be generated. If you have data you wish to preserve, first start the older version of XT and convert all tables to MyISAM. Then stop the server and remove all transaction log files (files of the form xtlog-*.xt). Then start the new version and convert tables back to XT.
1132+
1133+RN34: Implemented READ COMMITTED transaction mode. XT now supports READ COMMITTED and SERIALIZABLE transaction modes. NOTE: if the mode is set to REPEATABLE READ, SERIALIZABLE is used. If the mode is set to READ UNCOMMITTED, READ COMMITTED is used.
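The mode substitution described in RN34 amounts to a small lookup table. The Python sketch below restates it for illustration. The names EFFECTIVE_MODE and effective_isolation are hypothetical and do not come from the PBXT source.

```python
# Requested isolation level -> level actually used, as described in RN34.
EFFECTIVE_MODE = {
    "READ UNCOMMITTED": "READ COMMITTED",
    "READ COMMITTED": "READ COMMITTED",
    "REPEATABLE READ": "SERIALIZABLE",
    "SERIALIZABLE": "SERIALIZABLE",
}


def effective_isolation(requested: str) -> str:
    """Return the transaction mode that is used for a requested mode."""
    return EFFECTIVE_MODE[requested.upper()]


print(effective_isolation("repeatable read"))
```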
1134+
1135+RN33: The implementation of AUTO_INCREMENT on a partial index is non-standard. A unique value is generated without regard to the value of the index prefix. For example, assume we have the following table: CREATE TABLE t1 (c1 CHAR(10) not null, c2 INT not null AUTO_INCREMENT, PRIMARY KEY(c1, c2));
1136+
1137+With the following contents: c1 c2
1138+ A 8
1139+ B 1
1140+
1141+After executing the following statement: insert into t1 (c1) values ('B');
1142+
1143+This is the result using PBXT: c1 c2
1144+ A 8
1145+ B 1
1146+ B 9
1147+
1148+The standard result would be: c1 c2
1149+ A 8
1150+ B 1
1151+ B 2
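The difference between the two results can be modelled in a few lines. This Python sketch is only an illustration of the behaviour described in RN33, not engine code. Rows are (c1, c2) tuples and both function names are hypothetical.

```python
def next_id_pbxt(rows):
    """PBXT behaviour: one counter for the table, ignoring the c1 prefix."""
    return max((c2 for _, c2 in rows), default=0) + 1


def next_id_standard(rows, prefix):
    """Behaviour the note calls standard: the counter restarts per c1 value."""
    return max((c2 for c1, c2 in rows if c1 == prefix), default=0) + 1


rows = [("A", 8), ("B", 1)]
print(next_id_pbxt(rows))           # value PBXT assigns to the new 'B' row
print(next_id_standard(rows, "B"))  # value in the standard result
```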
1152+
1153+RN32: PBXT does not permit access to multiple databases within a single transaction. For example:
1154+
1155+begin;
1156+update database_1.t1 set a=10;
1157+update database_2.t2 set d=10;
1158+commit;
1159+
1160+In this case the following error is returned: 1015: Can't lock file (errno: -1)
1161+
1162+RN31: The implementation of COUNT(*) has changed. For efficiency, rows are not counted. The information is taken from the header of the record (.xtr) files. This information is only 100% accurate after transaction cleanup has completed, which basically means only when PBXT is idle. ANALYZE TABLE waits for all background activity to stop, so the statement may be executed before a COUNT(*) to ensure an accurate result. NOTE: Other than waiting for background processes, ANALYZE TABLE is not implemented.
1163+
1164+RN30: Two concurrency bugs have been fixed: a shared lock was used instead of an exclusive lock when deleting from a transaction list, and the transaction segment semaphore was not initialized. XT now runs correctly in a multi-processor environment. The test used was sysbench on a dual-processor, dual-core, AMD 64-bit machine running SUSE Linux 10.0.
1165+
1166+RN29: PBXT compiles and runs under 64-bit Linux. [TODO: 0009]
1167+
1168+RN28: ./mysql-test-run --force --mysqld=--default-storage-engine=pbxt will now execute most tests successfully. Changes to the tests and the result have been documented in http://www.primebase.com/xt/download/pbxt-test-run-changes.txt. [TODO: 0004, 0019]
1169+
1170+RN27: Fixed a bug that caused the server to crash when using table locks and transactions. For example: LOCK TABLES, BEGIN, COMMIT, SELECT. This sequence now returns an error. The correct sequence is:
1171+
1172+LOCK TABLES, BEGIN, COMMIT, UNLOCK TABLES, SELECT
1173+or
1174+LOCK TABLES, BEGIN, COMMIT, BEGIN, SELECT, COMMIT, UNLOCK TABLES
1175+
1176+RN26: Fixed a concurrency problem which caused a number of threads to hang during the sysbench test - see RN30 above (bug reported by Vadim).
1177+
1178+RN25: Fixed a bug that caused the server to hang when ha_pbxt::create() and ha_pbxt::ha_open() were given different, but equivalent paths for a particular table.
1179+
1180+RN24: Fixed bug in the indexing of blob columns, for example: create table t1(name_id int, name blob, INDEX name_idx (name(5)));
1181+
1182+RN23: When a duplicate key error occurs in auto-commit mode, the transaction is now rolled back.
1183+
1184+RN22: Fixed incorrect duplicate key error. In the case of a unique key which allows NULLs, duplicates are allowed if the inserted key contains a NULL. For example:
1185+
1186+create table t1 (id int not null, str char(10), unique(str));
1187+insert into t1 values (1, null),(2, null),(3, "foo"),(4, "bar");
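The rule in RN22 can be stated as a short check. The Python sketch below models a unique index that allows NULLs. It is an illustration under that reading of the note, and the name violates_unique is hypothetical.

```python
def violates_unique(existing_keys, new_key):
    """Return True if inserting new_key would be a duplicate key error."""
    # A key containing NULL (None here) never counts as a duplicate.
    if new_key is None:
        return False
    return new_key in existing_keys


keys = []
for value in (None, None, "foo", "bar"):  # the values inserted above
    assert not violates_unique(keys, value)
    keys.append(value)

print(violates_unique(keys, "foo"))  # a second "foo" is a real duplicate
```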
1188+
1189+RN21: PBXT now returns the correct error code on duplicate key: 1062 instead of 1022.
1190+
1191+RN19: Implemented AUTO_INCREMENT on partial keys. However, the XT implementation is non-standard. Increment of partial index works, but the ID generated is incremented like a non-partial index. For example:
1192+
1193+create table t1 (c1 char(10) not null, c2 int not null auto_increment, primary key(c1, c2));
1194+select * from t1;
1195+c1 c2
1196+A 8
1197+B 1
1198+
1199+insert into t1 (c1) values ('B');
1200+select * from t1;
1201+c1 c2
1202+A 8
1203+B 1
1204+B 9
1205+
1206+The standard result would be:
1207+c1 c2
1208+A 8
1209+B 1
1210+B 2
1211+
1212+RN18: Implemented TRUNCATE TABLE and DELETE FROM <table>; (i.e. a DELETE without WHERE clause). Previously DELETE FROM <table>; did not cause an error, but no rows were deleted (TRUNCATE TABLE returned an error). [TODO: 0012, 0022]
1213+
1214+RN17: Implemented CREATE TABLE (...) auto_increment=<value>;
1215+
1216+------- 0.9.51 - 2006-07-06
1217+
1218+RN16: Fixed crash which could occur when creating the first table in a database (bug reported by Hakan).
1219+
1220+------- 0.9.5 - 2006-07-03
1221+
1222+RN15: This version concludes the re-structuring of the PBXT implementation. I have made a number of major changes, including:
1223+
1224+- All files except the transaction logs are now associated with a particular table. All table related files begin with the name of the table. The extension indicates the function.
1225+
1226+- I have merged the handle and the fixed length row data for performance reasons.
1227+
1228+- Only the variable size component of a row is stored in the data log files. As a result the data logs can now be considered as a type of "overflow" area.
1229+
1230+- Memory mapped files are no longer used because it is not possible to flush changes to the disk.
1231+
1232+RN14: File names have the following forms:
1233+
1234+[table-name]-[table-id].xtr - These files contain the table row pointers. Each row pointer occupies 8 bytes and refers to a list of records. The file name also contains the table ID. This is a unique number which is used internally by XT to identify the table.
1235+
1236+[table-name].xtd - This file contains the fixed length data of a table. Each data item includes a handle and a record. The handle references a record in the data log file if the table contains variable length records.
1237+
1238+[table-name].xti - This file contains the index data of the table.
1239+
1240+[table-name]-[log-id].xtl - This is a data log file. It contains the variable length data of the table. A table may have any number of data log files, each with a unique ID.
1241+
1242+xtlog-[log-id].xt - These files are the transaction logs. Log entries that specify updates reference a data file record. Each active thread has its own transaction log in order to avoid contention.
1243+
1244+RN13: Fixed the bug "Hang on DROP DATABASE". [TODO: 0016]
1245+
1246+RN12: PBXT currently only supports the "Serializable" transaction isolation level. This is the highest isolation level possible and includes the "repeatable-read" functionality [TODO: 0015]. This is implemented by giving every transaction a snapshot of the database at the point when the transaction is started.
1247+
1248+If the transaction tries to update a record that was updated by some other transaction after the snapshot was taken, a locked error is returned. A deadlock can occur if 2 transactions update the same record in a different order. PBXT can detect all deadlocks.
1249+
1250+RN11: I have implemented write buffering on the table data files. [TODO: 0013]
1251+
1252+RN10: The unique constraint (UNIQUE INDEX/PRIMARY KEY) is now checked correctly. [TODO: 0008]
1253+
1254+RN9: I have implemented a conventional B-tree algorithm for the indices (instead of the Lehman and Yao B*-link tree). Although this reduces concurrency it improves the performance of queries significantly because of the simplicity of the algorithm. Deletion is also implemented in a very simple manner. [TODO: 0007]
1255+
1256+RN8: PBXT now has only 2 caches [TODO: 0006]:
1257+
1258+The Index Cache (pbxt_index_cache_size): This is the amount of memory the PBXT storage engine uses to cache index data and row pointers. This is all the data in the files with the extensions '.xti' and '.xtr'. This cache is managed in blocks of 2K.
1259+
1260+The Record Cache (pbxt_record_cache_size): This is the amount of memory the PBXT storage engine uses to cache table row data (handles and records). This is all the data in the files with the extension '.xtd'.
1261+
1262+The sizes of the caches are determined by the values of the system variables pbxt_index_cache_size and pbxt_record_cache_size. By default these values are set to 32MB.
1263+
1264+RN7: Auto-increment is now implemented in memory. This is done by doing a MAX() select when a table is first opened to get the high value. After that, the high value is incremented in memory on INSERT. On UPDATE (or INSERT) the value in memory is adjusted if necessary. This method also makes it possible for rows to be inserted simultaneously on the same table. [TODO: 0005, 0014]
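The scheme in RN7 can be sketched as a small counter object. This Python model is an illustration, not the engine's code. The class name AutoIncrement and its methods are hypothetical, and the lock stands in for whatever synchronization the engine actually uses.

```python
import threading


class AutoIncrement:
    """In-memory high-value counter, seeded once when the table is opened."""

    def __init__(self, existing_values):
        # Equivalent of the MAX() select performed on first open.
        self._high = max(existing_values, default=0)
        self._lock = threading.Lock()

    def next_value(self):
        """Called on INSERT when no explicit value is supplied."""
        with self._lock:
            self._high += 1
            return self._high

    def note_explicit(self, value):
        """Called on UPDATE or INSERT with an explicit value."""
        with self._lock:
            if value > self._high:
                self._high = value


counter = AutoIncrement([1, 8])
print(counter.next_value())
```

Because the counter lives in memory, concurrent inserts only contend on a short critical section and do not each need an index lookup.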
1265+
1266+RN6: ./run-all-tests --create-options=TYPE=PBXT succeeds. [TODO: 0004]
1267+
1268+RN5: Using sql-bench and my own Java based test I have confirmed that PBXT behaves correctly during multi-threaded access. [PARTIALLY TODO: 0002]
1269+
1270+RN4: Load/Stability test. Using sql-bench I have tested PBXT under load over a long period of time. [PARTIALLY TODO: 0001]
1271+
1272+------- 0.9.2 - 2006-04-01
1273+
1274+RN3: Fixed a bug that caused the error "-6: Handle is out of range: [0:0]".
1275+
1276+RN2: Implemented SET, ENUM and YEAR data types.
1277+
1278+RN1: Fixed a bug in the error reporting when a table is created with a datatype that is not supported. [TODO: 0011]
1279+
1280+
1281
1282=== added file 'plugin/pbxt/Makefile.am'
1283--- plugin/pbxt/Makefile.am 1970-01-01 00:00:00 +0000
1284+++ plugin/pbxt/Makefile.am 2010-04-01 14:19:35 +0000
1285@@ -0,0 +1,3 @@
1286+SUBDIRS = src
1287+
1288+EXTRA_DIST = plug.ini
1289
1290=== added file 'plugin/pbxt/NEWS'
1291=== added file 'plugin/pbxt/README'
1292--- plugin/pbxt/README 1970-01-01 00:00:00 +0000
1293+++ plugin/pbxt/README 2010-04-01 14:19:35 +0000
1294@@ -0,0 +1,19 @@
1295+PrimeBase XT for MySQL 5.1
1296+==========================
1297+
1298+This is the PrimeBase XT (PBXT) transactional storage engine for MySQL. PBXT is "pluggable", which means that it can be loaded dynamically by MySQL at runtime. It uses a unique "write-once" update strategy and MVCC (multi-version concurrency control) to provide optimal performance over a wide range of tasks.
1299+
1300+This package includes the complete source code for the engine. Although this is a standalone project it must be built against a compiled version of the MySQL 5.1 source tree, because it references header files used internally by the server.
1301+
1302+Details about how to build PBXT under UNIX or Windows, as a standalone plug-in or as part of the MySQL source code, are described in the documentation, which is available online at:
1303+
1304+http://www.primebase.org/documentation
1305+
1306+Bug reports, questions and comments can be sent directly to me.
1307+
1308+Thanks for your support!
1309+
1310+Paul McCullagh
1311+SNAP Innovation GmbH
1312+paul.mccullagh@primebase.org
1313+
1314
1315=== added file 'plugin/pbxt/TODO'
1316--- plugin/pbxt/TODO 1970-01-01 00:00:00 +0000
1317+++ plugin/pbxt/TODO 2010-04-01 14:19:35 +0000
1318@@ -0,0 +1,195 @@
1319+PBXT To-Do List
1320+===============
1321+
1322+My thanks to all who have downloaded and tested PBXT. If an issue you reported before the date below is not on this list, please e-mail me again.
1323+
1324+------- 2008-12-09
1325+
1326+0063: The option for not using memory mapped files must be fixed.
1327+
1328+0062: Dynamic option for using memory mapping on a table (Dimitri).
1329+
1330+------- 2008-09-12
1331+
1332+0061: Add records per key result to ha_pbxt::info() call (Mark).
1333+
1334+------- 2008-08-31
1335+
1336+0060: Add table option to determine if a table should be memory mapped or not (also requested by Dimitri).
1337+
1338+0059: Add table options:
1339+ AVG_ROW_LENGTH [=] value
1340+ DATA DIRECTORY [=] 'absolute path to directory'
1341+ INDEX DIRECTORY [=] 'absolute path to directory'
1342+ MAX_ROWS [=] value
1343+
1344+------- 2008-03-28
1345+
1346+0058: Consolidate writes when changes in the log are applied to the database.
1347+
1348+------- 2008-03-07
1349+
1350+0057: Cluster updates onto a single page.
1351+
1352+0056: Add checksum to index and data pages.
1353+
1354+0055: When no index cache is available, the complete index must be flushed (not just single pages).
1355+
1356+0054: Optimize indexes by not creating indexes that are a complete sub-set of some other index. In this case we must be able to identify part of an index as unique. For example: primary key (a, b), index (a, b, c). Here we would just create index (a, b, c), and specify that the part (a, b) must be unique. Operations on (a, b) will be directed to index (a, b, c).
1357+
1358+0053: Check and test lock tables.
1359+
1360+0052: Cache data log data in the handle data cache. Must be purged when a handle data record is written.
1361+
1362+0051: Write data log data alternatively to the transaction log. The compactor must then compact transaction logs.
1363+
1364+0050: [RESOLVED: RN126] Implement consistent write for indexes.
1365+
1366+0049: [RESOLVED: RN114] Set the index block size to 4K, or 16K as used by InnoDB.
1367+
1368+0048: [RESOLVED: RN110] Add row ID to indexes. This should only be set once the row is cleaned by the sweeper. Then the row ID can be used to make a quick check if the row is the most recent version.
1369+
1370+------- 2007-06-19
1371+
1372+0047: Test build with ./configure --with-innodb under Linux (Vadim).
1373+
1374+0046: [RESOLVED: RN85] Add plug.in file to enable drop in compile under Linux.
1375+
1376+0045: Provide libstdc++.so.6 binaries (Vadim).
1377+
1378+0044: [RESOLVED: RN73] Limit number of file handles used per table (Brian).
1379+
1380+0043: XA (two-phase commit) support (Peter).
1381+
1382+------- 2007-03-13
1383+
1384+0042: [RESOLVED: RN108] Implement STATUS commands.
1385+
1386+0041: Implement index prefix compression.
1387+
1388+------- 2007-03-07
1389+
1390+0040: [RESOLVED: RN60] Update in-place when a transaction updates the same record more than once.
1391+
1392+0039: Set the number and size of the segments dynamically according to the amount of memory in the cache (and the number of CPUs?) (as discussed with: Peter & Vadim).
1393+
1394+0038: [RESOLVED: RN133] Improve the efficiency of the locks by using atomic compare and swap (Peter & Vadim).
1395+
1396+0037: [RESOLVED: RN133] Instead of a global LRU list, use an LRU list for each segment of the cache (Peter & Vadim). [Note: a global list using a TAS lock and change time (so that LRU is not always updated) is most efficient].
1397+
1398+0036: Add support for deferred foreign key checking (requested by: Mark).
1399+
1400+0035: [RESOLVED: RN71] Remove the 2000 table limit (reported by: Hakan).
1401+
1402+------- 2007-02-28
1403+
1404+0035: [RESOLVED: RN74, RN107] Build in the PBXT system parameters (currently they must be set using environment variables).
1405+
1406+0034: [RESOLVED: RN117] Initial documentation (yes, it must be done!)
1407+
1408+0033: Make the error code returned on lock error configurable.
1409+
1410+0032: [RESOLVED: RN65] Create a source code pluggable version for Windows.
1411+
1412+0031: [RESOLVED: RN66] PBXT corrupts the index file when the size exceeds 4 GB (reported by: Luciano)
1413+
1414+0030: [RESOLVED: RN102] Implement pbxt_index_flush_delay. Postpones index writing in order to speed up imports. [Resolution uses the fact that index entries that are missing are added during recovery. As a result, index flushing can be delayed.]
1415+
1416+0029: [RESOLVED: RN103] Implement SELECT ... FOR UPDATE (recommended by: Robin).
1417+
1418+------- 2007-02-14
1419+
1420+0028: Implement CREATE TABLE ... DATA/INDEX DIRECTORY (suggested by: Robin).
1421+
1422+------- 2006-12-06
1423+
1424+0027: [RESOLVED: RN53] Bug in pbxt with query caching (reported by: Giuseppe) caused violation of transaction isolation.
1425+
1426+------- 2006-08-05
1427+
1428+0026: Implement BACKUP and RESTORE table (planned for the first post release version).
1429+
1430+0025: Implement DISABLE/ENABLE KEYS. Works for FOREIGN KEYs, currently no plans to implement for disabling indexes.
1431+
1432+0024: Implement ANALYZE TABLE (planned for the first post release version).
1433+
1434+0023: Implement CHECK TABLE (planned for the first release candidate).
1435+
1436+0022: [RESOLVED: RN18] Implement TRUNCATE TABLE and DELETE FROM <table>; (i.e. a DELETE without WHERE clause). Currently this function does not cause an error, but no rows are deleted.
1437+
1438+------- 2006-07-06
1439+
1440+0021: [RESOLVED: RN28] .../mysql-test/mysql-test-run --force --mysqld=--default-storage-engine=pbxt produces a number of errors (reported by: Hakan): As far as I can tell some failures are unnecessary but others are bugs. All need to be checked.
1441+
1442+------- 2006-07-03
1443+
1444+0020: [RESOLVED: RN49] Implement referential integrity (planned for the first release candidate).
1445+
1446+------- 2006-04-01
1447+
1448+0019: [RESOLVED: RN28] mysql-test-run hangs on alter table (reported by: Hakan): Running a test like ./mysql-test-run.pl --mysqld=--default-storage-engine=pbxt, hangs on ALTER TABLE.
1449+
1450+0018: Implement GEOMETRY data type. Note: There are currently no plans to implement this feature.
1451+
1452+------- 2006-03-31
1453+
1454+0017: [RESOLVED: RN37] MySQL 5.x Version (reported by: Ronald, Giuseppe).
1455+
1456+0016: [RESOLVED: RN13] Hang on "DROP DATABASE" (reported by: Giuseppe). Load the world database (http://downloads.mysql.com/docs/world.sql) and convert all tables into PBXT. Then, the drop database command hangs.
1457+
1458+0015: [RESOLVED: RN12] Implement isolation level "repeatable read" (reported by: Giuseppe). Currently PBXT only supports isolation level "committed read". This means committed data can be seen no matter when it was committed. Use SELECT ... FOR UPDATE to guarantee repeatable read, on data already read.
1459+
1460+0014: [RESOLVED: RN7] Two transactions cannot insert simultaneously if they use auto_increment (reported by: Giuseppe). See also 0005.
1461+
1462+0013: [RESOLVED: RN11] Implement buffered write (reported by: Giuseppe): Lack of buffered write leads to bad performance in operations such as ALTER TABLE ENGINE = PBXT and INSERT ... SELECT.
1463+
1464+0012: [RESOLVED: RN18] TRUNCATE does not work (reported by: Giuseppe)
1465+
1466+0011: [RESOLVED: RN2] Load Sakila Sample Database (reported by: Ronald): ALTER TABLE film ENGINE=PBXT; fails
1467+
1468+0010: [RESOLVED: RN6] sql-bench (reported by: Dmitry): ./run-all-tests --create-options=TYPE=PBXT fails.
1469+
1470+0009: [RESOLVED: RN29] 64-bit Linux (reported by: Hakan): PBXT currently does not compile under 64-bit Linux.
1471+
1472+------- 2006-03-16
1473+
1474+0008: [RESOLVED: RN10] Enforcing the unique index constraint:
1475+
1476+An index declared as "unique" must return a "duplicate unique key" error when inserting a duplicate value. The difficult part of implementing this in PBXT is that we may encounter a duplicate value that has not yet been committed. The index reading thread must then wait for the transaction to commit or abort.
1477+
1478+0007: [RESOLVED: RN9] Cleaning up empty index nodes:
1479+
1480+The Lehman and Yao algorithm used for indexing does not describe a way of cleaning up empty index nodes on-the-fly. A search of the relevant literature for an algorithm also turns up empty-handed (periodic "reorg" is mostly suggested). I have subsequently devised an algorithm that will do the job. This needs to be implemented.
1481+
1482+0006: [RESOLVED: RN8] Cache Balancing:
1483+
1484+PBXT uses a number of small caches in order to improve concurrency (rather than one large cache). A process is required to manage the amount of cache memory used as a whole. The process must distribute the overall amount of memory available for caching over the small caches, according to demand.
1485+
1486+0005: [RESOLVED: RN7] Implement a faster auto-increment method
1487+
1488+Currently the auto-increment is handled by the default method used in MySQL. This is done by performing a "fetch-last" on the index for each insert to find the highest key value. This works well unless there are a large number of empty index nodes due to the problem described in (2) above.
1489+
1490+PBXT Testing To-Do List
1491+
1492+This is my first take on what still must be tested. My thanks to Ronald Bradford who is working on a generic testing framework that can be used to test PBXT.
1493+
1494+0004: [RESOLVED: RN6, RN28] MySQL Tests:
1495+
1496+Several tests (for mysql-test-run) written for other engines can be adapted and used to test PBXT.
1497+
1498+0003: [RESOLVED: RN30] Multi-processor Test:
1499+
1500+There is a difference between preemptive multitasking and true multitasking, which you have on a multi-processor (or dual core) machine. I don't expect any fundamental problems here, but it must be tested.
1501+
1502+0002: [RESOLVED: RN5, RN30, RN43] Multi-user/locking Test:
1503+
1504+How does the engine perform with a number of concurrent users running various transactions on a number of different tables?
1505+This is a difficult test to write because it needs to simulate a production situation. At least 2 or 3 machines are required for the test. The idea is not to use too much data so that a lot of conflicts may occur.
1506+
1507+0001: [RESOLVED: RN4, RN43] Load/Stability Test:
1508+
1509+How does the engine perform under heavy load over a long period of time? How stable is the engine on power outage, etc?
1510+
1511+The test could use a variation of the test program written for test (3) above. At least 3 test machines would be required. The test must be modified to cause as much activity as possible. The test should monitor the performance under load.
1512+
1513+
1514
1515=== added file 'plugin/pbxt/plugin.am'
1516--- plugin/pbxt/plugin.am 1970-01-01 00:00:00 +0000
1517+++ plugin/pbxt/plugin.am 2010-04-01 14:19:35 +0000
1518@@ -0,0 +1,76 @@
1519+# Used to build Makefile.in
1520+
1521+noinst_LTLIBRARIES+= plugin/pbxt/libpbxt.la
1522+
1523+noinst_HEADERS+= \
1524+ plugin/pbxt/src/bsearch_xt.h \
1525+ plugin/pbxt/src/cache_xt.h \
1526+ plugin/pbxt/src/ccutils_xt.h \
1527+ plugin/pbxt/src/database_xt.h \
1528+ plugin/pbxt/src/datadic_xt.h \
1529+ plugin/pbxt/src/datalog_xt.h \
1530+ plugin/pbxt/src/filesys_xt.h \
1531+ plugin/pbxt/src/hashtab_xt.h \
1532+ plugin/pbxt/src/ha_pbxt.h \
1533+ plugin/pbxt/src/heap_xt.h \
1534+ plugin/pbxt/src/index_xt.h \
1535+ plugin/pbxt/src/linklist_xt.h \
1536+ plugin/pbxt/src/memory_xt.h \
1537+ plugin/pbxt/src/myxt_xt.h \
1538+ plugin/pbxt/src/pthread_xt.h \
1539+ plugin/pbxt/src/restart_xt.h \
1540+ plugin/pbxt/src/sortedlist_xt.h \
1541+ plugin/pbxt/src/strutil_xt.h \
1542+ plugin/pbxt/src/tabcache_xt.h \
1543+ plugin/pbxt/src/table_xt.h \
1544+ plugin/pbxt/src/trace_xt.h \
1545+ plugin/pbxt/src/thread_xt.h \
1546+ plugin/pbxt/src/util_xt.h \
1547+ plugin/pbxt/src/xaction_xt.h \
1548+ plugin/pbxt/src/xactlog_xt.h \
1549+ plugin/pbxt/src/lock_xt.h \
1550+ plugin/pbxt/src/systab_xt.h \
1551+ plugin/pbxt/src/ha_xtsys.h \
1552+ plugin/pbxt/src/discover_xt.h \
1553+ plugin/pbxt/src/pbms.h \
1554+ plugin/pbxt/src/xt_config.h \
1555+ plugin/pbxt/src/xt_defs.h \
1556+ plugin/pbxt/src/xt_errno.h
1557+
1558+
1559+plugin_pbxt_libpbxt_la_CXXFLAGS= ${AM_CXXFLAGS} -DDRIZZLED -Wno-long-long -Wno-overloaded-virtual -Wno-sign-compare -Wno-unused-function
1560+plugin_pbxt_libpbxt_la_CFLAGS= ${AM_CFLAGS} -DDRIZZLED -std=c99
1561+
1562+plugin_pbxt_libpbxt_la_SOURCES= \
1563+ plugin/pbxt/src/bsearch_xt.cc \
1564+ plugin/pbxt/src/cache_xt.cc \
1565+ plugin/pbxt/src/ccutils_xt.cc \
1566+ plugin/pbxt/src/database_xt.cc \
1567+ plugin/pbxt/src/datadic_xt.cc \
1568+ plugin/pbxt/src/datalog_xt.cc \
1569+ plugin/pbxt/src/filesys_xt.cc \
1570+ plugin/pbxt/src/hashtab_xt.cc \
1571+ plugin/pbxt/src/ha_pbxt.cc \
1572+ plugin/pbxt/src/heap_xt.cc \
1573+ plugin/pbxt/src/index_xt.cc \
1574+ plugin/pbxt/src/linklist_xt.cc \
1575+ plugin/pbxt/src/memory_xt.cc \
1576+ plugin/pbxt/src/myxt_xt.cc \
1577+ plugin/pbxt/src/pthread_xt.cc \
1578+ plugin/pbxt/src/restart_xt.cc \
1579+ plugin/pbxt/src/sortedlist_xt.cc \
1580+ plugin/pbxt/src/strutil_xt.cc \
1581+ plugin/pbxt/src/tabcache_xt.cc \
1582+ plugin/pbxt/src/table_xt.cc \
1583+ plugin/pbxt/src/trace_xt.cc \
1584+ plugin/pbxt/src/thread_xt.cc \
1585+ plugin/pbxt/src/systab_xt.cc \
1586+ plugin/pbxt/src/ha_xtsys.cc \
1587+ plugin/pbxt/src/discover_xt.cc \
1588+ plugin/pbxt/src/util_xt.cc \
1589+ plugin/pbxt/src/xaction_xt.cc \
1590+ plugin/pbxt/src/xactlog_xt.cc \
1591+ plugin/pbxt/src/lock_xt.cc
1592+
1593+
1594+EXTRA_DIST+= CMakeLists.txt
1595
1596=== added file 'plugin/pbxt/plugin.ini'
1597--- plugin/pbxt/plugin.ini 1970-01-01 00:00:00 +0000
1598+++ plugin/pbxt/plugin.ini 2010-04-01 14:19:35 +0000
1599@@ -0,0 +1,25 @@
1600+#
1601+# Copyright (c) 2006, 2009, Innobase Oy. All Rights Reserved.
1602+#
1603+# This program is free software; you can redistribute it and/or modify it under
1604+# the terms of the GNU General Public License as published by the Free Software
1605+# Foundation; version 2 of the License.
1606+#
1607+# This program is distributed in the hope that it will be useful, but WITHOUT
1608+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
1609+# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
1610+#
1611+# You should have received a copy of the GNU General Public License along with
1612+# this program; if not, write to the Free Software Foundation, Inc., 59 Temple
1613+# Place, Suite 330, Boston, MA 02111-1307 USA
1614+#
1615+
1616+[plugin]
1617+name=pbxt
1618+title=PBXT Storage Engine
1619+description=MVCC-based transactional engine
1620+sources=src/ha_pbxt.cc
1621+load_by_default=yes
1622+libs=plugin/pbxt/libpbxt.la
1623+cflags=-DDRIZZLED -std=c99
1624+cxxflags=-DDRIZZLED -Wno-long-long -Wno-overloaded-virtual
1625
1626=== added directory 'plugin/pbxt/src'
1627=== added file 'plugin/pbxt/src/Makefile.am'
1628--- plugin/pbxt/src/Makefile.am 1970-01-01 00:00:00 +0000
1629+++ plugin/pbxt/src/Makefile.am 2010-04-01 14:19:35 +0000
1630@@ -0,0 +1,50 @@
1631+# Used to build Makefile.in
1632+
1633+MYSQLDATAdir = $(localstatedir)
1634+MYSQLSHAREdir = $(pkgdatadir)
1635+MYSQLBASEdir= $(prefix)
1636+MYSQLLIBdir= $(pkglibdir)
1637+pkgplugindir = $(pkglibdir)/plugin
1638+
1639+AM_CPPFLAGS = -I$(top_srcdir)
1640+
1641+LIBS =
1642+
1643+LDADD =
1644+
1645+noinst_HEADERS = bsearch_xt.h cache_xt.h ccutils_xt.h database_xt.h \
1646+ datadic_xt.h datalog_xt.h filesys_xt.h hashtab_xt.h \
1647+ ha_pbxt.h heap_xt.h index_xt.h linklist_xt.h \
1648+ memory_xt.h myxt_xt.h pthread_xt.h restart_xt.h \
1649+ sortedlist_xt.h strutil_xt.h \
1650+ tabcache_xt.h table_xt.h trace_xt.h thread_xt.h \
1651+ util_xt.h xaction_xt.h xactlog_xt.h lock_xt.h \
1652+ systab_xt.h ha_xtsys.h discover_xt.h backup_xt.h \
1653+ pbms.h pbms_enabled.h xt_config.h xt_defs.h xt_errno.h locklist_xt.h
1654+
1655+plugin_LTLIBRARIES = libpbxt.la
1656+
1657+libpbxt_la_SOURCES = bsearch_xt.cc cache_xt.cc ccutils_xt.cc database_xt.cc \
1658+ datadic_xt.cc datalog_xt.cc filesys_xt.cc hashtab_xt.cc \
1659+ ha_pbxt.cc heap_xt.cc index_xt.cc linklist_xt.cc \
1660+ memory_xt.cc myxt_xt.cc pthread_xt.cc restart_xt.cc \
1661+ pbms_enabled.cc sortedlist_xt.cc strutil_xt.cc \
1662+ tabcache_xt.cc table_xt.cc trace_xt.cc thread_xt.cc \
1663+ systab_xt.cc ha_xtsys.cc discover_xt.cc backup_xt.cc \
1664+ util_xt.cc xaction_xt.cc xactlog_xt.cc lock_xt.cc locklist_xt.cc
1665+
1666+libpbxt_la_LDFLAGS = -module
1667+
1668+# These are the warnings Drizzle uses:
1669+# DRIZZLE_WARNINGS = -W -Wall -Wextra -pedantic -Wundef -Wredundant-decls -Wno-strict-aliasing -Wno-long-long -Wno-unused-parameter
1670+
1671+libpbxt_la_CXXFLAGS = $(AM_CXXFLAGS) -DMYSQL_DYNAMIC_PLUGIN -Wno-overloaded-virtual
1672+libpbxt_la_CFLAGS = $(AM_CFLAGS) -DMYSQL_DYNAMIC_PLUGIN -std=c99
1673+
1674+EXTRA_LIBRARIES = libpbxt.a
1675+noinst_LIBRARIES = libpbxt.a
1676+libpbxt_a_SOURCES = $(libpbxt_la_SOURCES)
1677+libpbxt_a_CXXFLAGS = $(AM_CXXFLAGS) -DDRIZZLED -Wno-long-long -Wno-overloaded-virtual
1678+libpbxt_a_CFLAGS = $(AM_CFLAGS) -DDRIZZLED -std=c99
1679+
1680+EXTRA_DIST = CMakeLists.txt
1681
1682=== added file 'plugin/pbxt/src/backup_xt.cc'
1683--- plugin/pbxt/src/backup_xt.cc 1970-01-01 00:00:00 +0000
1684+++ plugin/pbxt/src/backup_xt.cc 2010-04-01 14:19:35 +0000
1685@@ -0,0 +1,802 @@
1686+/* Copyright (c) 2009 PrimeBase Technologies GmbH
1687+ *
1688+ * PrimeBase XT
1689+ *
1690+ * This program is free software; you can redistribute it and/or modify
1691+ * it under the terms of the GNU General Public License as published by
1692+ * the Free Software Foundation; either version 2 of the License, or
1693+ * (at your option) any later version.
1694+ *
1695+ * This program is distributed in the hope that it will be useful,
1696+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
1697+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1698+ * GNU General Public License for more details.
1699+ *
1700+ * You should have received a copy of the GNU General Public License
1701+ * along with this program; if not, write to the Free Software
1702+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
1703+ *
1704+ * 2009-09-07 Paul McCullagh
1705+ *
1706+ * H&G2JCtL
1707+ */
1708+
1709+#include "xt_config.h"
1710+
1711+#ifdef MYSQL_SUPPORTS_BACKUP
1712+
1713+#include <string.h>
1714+#include <stdio.h>
1715+#include <stdlib.h>
1716+#include <time.h>
1717+#include <ctype.h>
1718+
1719+#include "mysql_priv.h"
1720+#include <backup/api_types.h>
1721+#include <backup/backup_engine.h>
1722+#include <backup/backup_aux.h> // for build_table_list()
1723+#include <hash.h>
1724+
1725+#include "ha_pbxt.h"
1726+
1727+#include "backup_xt.h"
1728+#include "pthread_xt.h"
1729+#include "filesys_xt.h"
1730+#include "database_xt.h"
1731+#include "strutil_xt.h"
1732+#include "memory_xt.h"
1733+#include "trace_xt.h"
1734+#include "myxt_xt.h"
1735+
1736+#ifdef OK
1737+#undef OK
1738+#endif
1739+
1740+#ifdef byte
1741+#undef byte
1742+#endif
1743+
1744+#ifdef DEBUG
1745+//#define TRACE_BACKUP_CALLS
1746+//#define TEST_SMALL_BLOCK 100000
1747+#endif
1748+
1749+using backup::byte;
1750+using backup::result_t;
1751+using backup::version_t;
1752+using backup::Table_list;
1753+using backup::Table_ref;
1754+using backup::Buffer;
1755+
1756+#ifdef TRACE_BACKUP_CALLS
1757+#define XT_TRACE_CALL() ha_trace_function(__FUNC__, NULL)
1758+#else
1759+#define XT_TRACE_CALL()
1760+#endif
1761+
1762+#define XT_RESTORE_BATCH_SIZE 10000
1763+
1764+#define BUP_STATE_BEFORE_LOCK 0
1765+#define BUP_STATE_AFTER_LOCK 1
1766+
1767+#define BUP_STANDARD_VAR_RECORD 1
1768+#define BUP_RECORD_BLOCK_4_START 2 // Part of a record, with a 4 byte total length, and 4 byte data length
1769+#define BUP_RECORD_BLOCK_4 3 // Part of a record, with a 4 byte length
1770+#define BUP_RECORD_BLOCK_4_END 4 // Last part of a record with a 4 byte length
1771+
1772+/*
1773+ * -----------------------------------------------------------------------
1774+ * UTILITIES
1775+ */
1776+
1777+#ifdef TRACE_BACKUP_CALLS
1778+static void ha_trace_function(const char *function, char *table)
1779+{
1780+ char func_buf[50], *ptr;
1781+ XTThreadPtr thread = xt_get_self();
1782+
1783+ if ((ptr = strchr(function, '('))) {
1784+ ptr--;
1785+ while (ptr > function) {
1786+ if (!(isalnum(*ptr) || *ptr == '_'))
1787+ break;
1788+ ptr--;
1789+ }
1790+ ptr++;
1791+ xt_strcpy(50, func_buf, ptr);
1792+ if ((ptr = strchr(func_buf, '(')))
1793+ *ptr = 0;
1794+ }
1795+ else
1796+ xt_strcpy(50, func_buf, function);
1797+ if (table)
1798+ printf("%s %s (%s)\n", thread ? thread->t_name : "-unknown-", func_buf, table);
1799+ else
1800+ printf("%s %s\n", thread ? thread->t_name : "-unknown-", func_buf);
1801+}
1802+#endif
1803+
1804+/*
1805+ * -----------------------------------------------------------------------
1806+ * BACKUP DRIVER
1807+ */
1808+
1809+class PBXTBackupDriver: public Backup_driver
1810+{
1811+ public:
1812+ PBXTBackupDriver(const Table_list &);
1813+ virtual ~PBXTBackupDriver();
1814+
1815+ virtual size_t size();
1816+ virtual size_t init_size();
1817+ virtual result_t begin(const size_t);
1818+ virtual result_t end();
1819+ virtual result_t get_data(Buffer &);
1820+ virtual result_t prelock();
1821+ virtual result_t lock();
1822+ virtual result_t unlock();
1823+ virtual result_t cancel();
1824+ virtual void free();
1825+ void lock_tables_TL_READ_NO_INSERT();
1826+
1827+ private:
1828+ XTThreadPtr bd_thread;
1829+ int bd_state;
1830+ u_int bd_table_no;
1831+ XTOpenTablePtr bd_ot;
1832+ xtWord1 *bd_row_buf;
1833+
1834+ /* Non-zero if we last returned only part of
1835+ * a row.
1836+ */
1837+ xtWord1 *db_write_block(xtWord1 *buffer, xtWord1 bup_type, size_t *size, xtWord4 row_len);
1838+ xtWord1 *db_write_block(xtWord1 *buffer, xtWord1 bup_type, size_t *size, xtWord4 total_len, xtWord4 row_len);
1839+
1840+ xtWord4 bd_row_offset;
1841+ xtWord4 bd_row_size;
1842+};
1843+
1844+
1845+PBXTBackupDriver::PBXTBackupDriver(const Table_list &tables):
1846+Backup_driver(tables),
1847+bd_state(BUP_STATE_BEFORE_LOCK),
1848+bd_table_no(0),
1849+bd_ot(NULL),
1850+bd_row_buf(NULL),
1851+bd_row_offset(0),
1852+bd_row_size(0)
1853+{
1854+}
1855+
1856+PBXTBackupDriver::~PBXTBackupDriver()
1857+{
1858+}
1859+
1860+/** Estimates total size of backup. @todo improve it */
1861+size_t PBXTBackupDriver::size()
1862+{
1863+ XT_TRACE_CALL();
1864+ return UNKNOWN_SIZE;
1865+}
1866+
1867+/** Estimates size of backup before lock. @todo improve it */
1868+size_t PBXTBackupDriver::init_size()
1869+{
1870+ XT_TRACE_CALL();
1871+ return 0;
1872+}
1873+
1874+result_t PBXTBackupDriver::begin(const size_t)
1875+{
1876+ THD *thd = current_thd;
1877+ XTExceptionRec e;
1878+
1879+ XT_TRACE_CALL();
1880+
1881+ if (!(bd_thread = xt_ha_set_current_thread(thd, &e))) {
1882+ xt_log_exception(NULL, &e, XT_LOG_DEFAULT);
1883+ return backup::ERROR;
1884+ }
1885+
1886+ return backup::OK;
1887+}
1888+
1889+result_t PBXTBackupDriver::end()
1890+{
1891+ XT_TRACE_CALL();
1892+ if (bd_ot) {
1893+ xt_tab_seq_exit(bd_ot);
1894+ xt_db_return_table_to_pool_ns(bd_ot);
1895+ bd_ot = NULL;
1896+ }
1897+ if (bd_thread->st_xact_data) {
1898+ if (!xt_xn_commit(bd_thread))
1899+ return backup::ERROR;
1900+ }
1901+ return backup::OK;
1902+}
1903+
1904+xtWord1 *PBXTBackupDriver::db_write_block(xtWord1 *buffer, xtWord1 bup_type, size_t *ret_size, xtWord4 row_len)
1905+{
1906+ register size_t size = *ret_size;
1907+
1908+ *buffer = bup_type; // Record type identifier.
1909+ buffer++;
1910+ size--;
1911+ memcpy(buffer, bd_ot->ot_row_wbuffer, row_len);
1912+ buffer += row_len;
1913+ size -= row_len;
1914+ *ret_size = size;
1915+ return buffer;
1916+}
1917+
1918+xtWord1 *PBXTBackupDriver::db_write_block(xtWord1 *buffer, xtWord1 bup_type, size_t *ret_size, xtWord4 total_len, xtWord4 row_len)
1919+{
1920+ register size_t size = *ret_size;
1921+
1922+ *buffer = bup_type; // Record type identifier.
1923+ buffer++;
1924+ size--;
1925+ if (bup_type == BUP_RECORD_BLOCK_4_START) {
1926+ XT_SET_DISK_4(buffer, total_len);
1927+ buffer += 4;
1928+ size -= 4;
1929+ }
1930+ XT_SET_DISK_4(buffer, row_len);
1931+ buffer += 4;
1932+ size -= 4;
1933+ memcpy(buffer, bd_ot->ot_row_wbuffer+bd_row_offset, row_len);
1934+ buffer += row_len;
1935+ size -= row_len;
1936+ bd_row_size -= row_len;
1937+ bd_row_offset += row_len;
1938+ *ret_size = size;
1939+ return buffer;
1940+}
1941+
1942+result_t PBXTBackupDriver::get_data(Buffer &buf)
1943+{
1944+ xtBool eof = FALSE;
1945+ size_t size;
1946+ xtWord4 row_len;
1947+ xtWord1 *buffer;
1948+
1949+ XT_TRACE_CALL();
1950+
1951+ if (bd_state == BUP_STATE_BEFORE_LOCK) {
1952+ buf.table_num = 0;
1953+ buf.size = 0;
1954+ buf.last = FALSE;
1955+ return backup::READY;
1956+ }
1957+
1958+ /* Open the backup table: */
1959+ if (!bd_ot) {
1960+ XTThreadPtr self = bd_thread;
1961+ XTTableHPtr tab;
1962+ char path[PATH_MAX];
1963+
1964+ if (bd_table_no == m_tables.count()) {
1965+ buf.size = 0;
1966+ buf.table_num = 0;
1967+ buf.last = TRUE;
1968+ return backup::DONE;
1969+ }
1970+
1971+ m_tables[bd_table_no].internal_name(path, sizeof(path));
1972+ bd_table_no++;
1973+ try_(a) {
1974+ xt_ha_open_database_of_table(self, (XTPathStrPtr) path);
1975+ tab = xt_use_table(self, (XTPathStrPtr) path, FALSE, FALSE, NULL);
1976+ pushr_(xt_heap_release, tab);
1977+ if (!(bd_ot = xt_db_open_table_using_tab(tab, bd_thread)))
1978+ xt_throw(self);
1979+ freer_(); // xt_heap_release(tab)
1980+
1981+ /* Prepare the sequential scan: */
1982+ xt_tab_seq_exit(bd_ot);
1983+ if (!xt_tab_seq_init(bd_ot))
1984+ xt_throw(self);
1985+
1986+ if (bd_row_buf) {
1987+ xt_free(self, bd_row_buf);
1988+ bd_row_buf = NULL;
1989+ }
1990+ bd_row_buf = (xtWord1 *) xt_malloc(self, bd_ot->ot_table->tab_dic.dic_mysql_buf_size);
1991+ bd_ot->ot_cols_req = bd_ot->ot_table->tab_dic.dic_no_of_cols;
1992+ }
1993+ catch_(a) {
1994+ ;
1995+ }
1996+ cont_(a);
1997+
1998+ if (!bd_ot)
1999+ goto failed;
2000+ }
2001+
2002+ buf.table_num = bd_table_no;
2003+#ifdef TEST_SMALL_BLOCK
2004+ buf.size = TEST_SMALL_BLOCK;
2005+#endif
2006+ size = buf.size;
2007+ buffer = (xtWord1 *) buf.data;
2008+ ASSERT_NS(size > 9);
2009+
2010+ /* First check if a record was partially written
2011+ * last time.
2012+ */
2013+ write_row:
2014+ if (bd_row_size > 0) {
2015+ row_len = bd_row_size;
2016+ if (bd_row_offset == 0) {
2017+ if (row_len+1 > size) {
2018+ ASSERT_NS(size > 9);
2019+ row_len = size - 9;
2020+ buffer = db_write_block(buffer, BUP_RECORD_BLOCK_4_START, &size, bd_row_size, row_len);
2021+ goto done;
2022+ }
2023+ buffer = db_write_block(buffer, BUP_STANDARD_VAR_RECORD, &size, row_len);
2024+ bd_row_size = 0;
2025+ }
2026+ else {
2027+ if (row_len+5 > size) {
2028+ row_len = size - 5;
2029+ buffer = db_write_block(buffer, BUP_RECORD_BLOCK_4, &size, 0, row_len);
2030+ goto done;
2031+ }
2032+ buffer = db_write_block(buffer, BUP_RECORD_BLOCK_4_END, &size, 0, row_len);
2033+ }
2034+ }
2035+
2036+ /* Now continue with the sequential scan. */
2037+ while (size > 1) {
2038+ if (!xt_tab_seq_next(bd_ot, bd_row_buf, &eof))
2039+ goto failed;
2040+ if (eof) {
2041+ /* We will go to the next table, on the next call. */
2042+ xt_tab_seq_exit(bd_ot);
2043+ xt_db_return_table_to_pool_ns(bd_ot);
2044+ bd_ot = NULL;
2045+ break;
2046+ }
2047+ if (!(row_len = myxt_store_row_data(bd_ot, 0, (char *) bd_row_buf)))
2048+ goto failed;
2049+ if (row_len+1 > size) {
2050+ /* Does not fit: */
2051+ bd_row_offset = 0;
2052+ bd_row_size = row_len;
2053+ /* Only add part of the row, if there is still
2054+ * quite a bit of space left:
2055+ */
2056+ if (size >= (32 * 1024))
2057+ goto write_row;
2058+ break;
2059+ }
2060+ buffer = db_write_block(buffer, BUP_STANDARD_VAR_RECORD, &size, row_len);
2061+ }
2062+
2063+ done:
2064+ buf.size = buf.size - size;
2065+ /* This indicates end of data for a table! */
2066+ buf.last = eof;
2067+
2068+ return backup::OK;
2069+
2070+ failed:
2071+ xt_log_and_clear_exception(bd_thread);
2072+ return backup::ERROR;
2073+}
2074+
2075+result_t PBXTBackupDriver::prelock()
2076+{
2077+ XT_TRACE_CALL();
2078+ return backup::READY;
2079+}
2080+
2081+result_t PBXTBackupDriver::lock()
2082+{
2083+ XT_TRACE_CALL();
2084+ bd_thread->st_xact_mode = XT_XACT_COMMITTED_READ;
2085+ bd_thread->st_ignore_fkeys = FALSE;
2086+ bd_thread->st_auto_commit = FALSE;
2087+ bd_thread->st_table_trans = FALSE;
2088+ bd_thread->st_abort_trans = FALSE;
2089+ bd_thread->st_stat_ended = FALSE;
2090+ bd_thread->st_stat_trans = FALSE;
2091+ bd_thread->st_is_update = FALSE;
2092+ if (!xt_xn_begin(bd_thread))
2093+ return backup::ERROR;
2094+ bd_state = BUP_STATE_AFTER_LOCK;
2095+ return backup::OK;
2096+}
2097+
2098+result_t PBXTBackupDriver::unlock()
2099+{
2100+ XT_TRACE_CALL();
2101+ return backup::OK;
2102+}
2103+
2104+result_t PBXTBackupDriver::cancel()
2105+{
2106+ XT_TRACE_CALL();
2107+ return backup::OK; // free() will be called and suffice
2108+}
2109+
2110+void PBXTBackupDriver::free()
2111+{
2112+ XT_TRACE_CALL();
2113+ if (bd_ot) {
2114+ xt_tab_seq_exit(bd_ot);
2115+ xt_db_return_table_to_pool_ns(bd_ot);
2116+ bd_ot = NULL;
2117+ }
2118+ if (bd_row_buf) {
2119+ xt_free_ns(bd_row_buf);
2120+ bd_row_buf = NULL;
2121+ }
2122+ if (bd_thread->st_xact_data)
2123+ xt_xn_rollback(bd_thread);
2124+ delete this;
2125+}
2126+
2127+void PBXTBackupDriver::lock_tables_TL_READ_NO_INSERT()
2128+{
2129+ XT_TRACE_CALL();
2130+}
2131+
2132+/*
2133+ * -----------------------------------------------------------------------
2134+ * RESTORE DRIVER
2135+ */
2136+
2137+class PBXTRestoreDriver: public Restore_driver
2138+{
2139+ public:
2140+ PBXTRestoreDriver(const Table_list &tables);
2141+ virtual ~PBXTRestoreDriver();
2142+
2143+ virtual result_t begin(const size_t);
2144+ virtual result_t end();
2145+ virtual result_t send_data(Buffer &buf);
2146+ virtual result_t cancel();
2147+ virtual void free();
2148+
2149+ private:
2150+ XTThreadPtr rd_thread;
2151+ u_int rd_table_no;
2152+ XTOpenTablePtr rd_ot;
2153+ STRUCT_TABLE *rd_my_table;
2154+ xtWord1 *rb_row_buf;
2155+ u_int rb_col_cnt;
2156+ u_int rb_insert_count;
2157+
2158+ /* Long rows are accumulated here: */
2159+ xtWord4 rb_row_len;
2160+ xtWord4 rb_data_size;
2161+ xtWord1 *rb_row_data;
2162+};
2163+
2164+PBXTRestoreDriver::PBXTRestoreDriver(const Table_list &tables):
2165+Restore_driver(tables),
2166+rd_thread(NULL),
2167+rd_table_no(0),
2168+rd_ot(NULL),
2169+rb_row_buf(NULL),
2170+rb_row_len(0),
2171+rb_data_size(0),
2172+rb_row_data(NULL)
2173+{
2174+}
2175+
2176+PBXTRestoreDriver::~PBXTRestoreDriver()
2177+{
2178+}
2179+
2180+result_t PBXTRestoreDriver::begin(const size_t)
2181+{
2182+ THD *thd = current_thd;
2183+ XTExceptionRec e;
2184+
2185+ XT_TRACE_CALL();
2186+
2187+ if (!(rd_thread = xt_ha_set_current_thread(thd, &e))) {
2188+ xt_log_exception(NULL, &e, XT_LOG_DEFAULT);
2189+ return backup::ERROR;
2190+ }
2191+
2192+ return backup::OK;
2193+}
2194+
2195+result_t PBXTRestoreDriver::end()
2196+{
2197+ XT_TRACE_CALL();
2198+ if (rd_ot) {
2199+ xt_db_return_table_to_pool_ns(rd_ot);
2200+ rd_ot = NULL;
2201+ }
2202+ //if (rb_row_buf) {
2203+ // xt_free_ns(rb_row_buf);
2204+ // rb_row_buf = NULL;
2205+ //}
2206+ if (rb_row_data) {
2207+ xt_free_ns(rb_row_data);
2208+ rb_row_data = NULL;
2209+ }
2210+ if (rd_thread->st_xact_data) {
2211+ if (!xt_xn_commit(rd_thread))
2212+ return backup::ERROR;
2213+ }
2214+ return backup::OK;
2215+}
2216+
2217+
2218+result_t PBXTRestoreDriver::send_data(Buffer &buf)
2219+{
2220+ size_t size;
2221+ xtWord1 type;
2222+ xtWord1 *buffer;
2223+ xtWord4 row_len;
2224+ xtWord1 *rec_data;
2225+
2226+ XT_TRACE_CALL();
2227+
2228+ if (buf.table_num != rd_table_no) {
2229+ XTThreadPtr self = rd_thread;
2230+ XTTableHPtr tab;
2231+ char path[PATH_MAX];
2232+
2233+ if (rd_ot) {
2234+ xt_db_return_table_to_pool_ns(rd_ot);
2235+ rd_ot = NULL;
2236+ }
2237+
2238+ if (rd_thread->st_xact_data) {
2239+ if (!xt_xn_commit(rd_thread))
2240+ goto failed;
2241+ }
2242+ if (!xt_xn_begin(rd_thread))
2243+ goto failed;
2244+ rb_insert_count = 0;
2245+
2246+ rd_table_no = buf.table_num;
2247+ m_tables[rd_table_no-1].internal_name(path, sizeof(path));
2248+ try_(a) {
2249+ xt_ha_open_database_of_table(self, (XTPathStrPtr) path);
2250+ tab = xt_use_table(self, (XTPathStrPtr) path, FALSE, FALSE, NULL);
2251+ pushr_(xt_heap_release, tab);
2252+ if (!(rd_ot = xt_db_open_table_using_tab(tab, rd_thread)))
2253+ xt_throw(self);
2254+ freer_(); // xt_heap_release(tab)
2255+
2256+ rd_my_table = rd_ot->ot_table->tab_dic.dic_my_table;
2257+ if (rd_my_table->found_next_number_field) {
2258+ rd_my_table->in_use = current_thd;
2259+ rd_my_table->next_number_field = rd_my_table->found_next_number_field;
2260+ rd_my_table->mark_columns_used_by_index_no_reset(rd_my_table->s->next_number_index, rd_my_table->read_set);
2261+ }
2262+
2263+ /* This is safe because only one thread can restore a table at
2264+ * a time!
2265+ */
2266+ rb_row_buf = (xtWord1 *) rd_my_table->record[0];
2267+ //if (rb_row_buf) {
2268+ // xt_free(self, rb_row_buf);
2269+ // rb_row_buf = NULL;
2270+ //}
2271+ //rb_row_buf = (xtWord1 *) xt_malloc(self, rd_ot->ot_table->tab_dic.dic_mysql_buf_size);
2272+
2273+ rb_col_cnt = rd_ot->ot_table->tab_dic.dic_no_of_cols;
2274+
2275+ }
2276+ catch_(a) {
2277+ ;
2278+ }
2279+ cont_(a);
2280+
2281+ if (!rd_ot)
2282+ goto failed;
2283+ }
2284+
2285+ buffer = (xtWord1 *) buf.data;
2286+ size = buf.size;
2287+
2288+ while (size > 0) {
2289+ type = *buffer;
2290+ switch (type) {
2291+ case BUP_STANDARD_VAR_RECORD:
2292+ rec_data = buffer + 1;
2293+ break;
2294+ case BUP_RECORD_BLOCK_4_START:
2295+ buffer++;
2296+ row_len = XT_GET_DISK_4(buffer);
2297+ buffer += 4;
2298+ if (rb_data_size < row_len) {
2299+ if (!xt_realloc_ns((void **) &rb_row_data, row_len))
2300+ goto failed;
2301+ rb_data_size = row_len;
2302+ }
2303+ row_len = XT_GET_DISK_4(buffer);
2304+ buffer += 4;
2305+ ASSERT_NS(row_len <= rb_data_size);
2306+ if (row_len > rb_data_size) {
2307+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2308+ goto failed;
2309+ }
2310+ memcpy(rb_row_data, buffer, row_len);
2311+ rb_row_len = row_len;
2312+ buffer += row_len;
2313+ if (row_len + 9 > size) {
2314+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2315+ goto failed;
2316+ }
2317+ size -= row_len + 9;
2318+ continue;
2319+ case BUP_RECORD_BLOCK_4:
2320+ buffer++;
2321+ row_len = XT_GET_DISK_4(buffer);
2322+ buffer += 4;
2323+ ASSERT_NS(rb_row_len + row_len <= rb_data_size);
2324+ if (rb_row_len + row_len > rb_data_size) {
2325+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2326+ goto failed;
2327+ }
2328+ memcpy(rb_row_data + rb_row_len, buffer, row_len);
2329+ rb_row_len += row_len;
2330+ buffer += row_len;
2331+ if (row_len + 5 > size) {
2332+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2333+ goto failed;
2334+ }
2335+ size -= row_len + 5;
2336+ continue;
2337+ case BUP_RECORD_BLOCK_4_END:
2338+ buffer++;
2339+ row_len = XT_GET_DISK_4(buffer);
2340+ buffer += 4;
2341+ ASSERT_NS(rb_row_len + row_len <= rb_data_size);
2342+ if (rb_row_len + row_len > rb_data_size) {
2343+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2344+ goto failed;
2345+ }
2346+ memcpy(rb_row_data + rb_row_len, buffer, row_len);
2347+ buffer += row_len;
2348+ if (row_len + 5 > size) {
2349+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2350+ goto failed;
2351+ }
2352+ size -= row_len + 5;
2353+ rec_data = rb_row_data;
2354+ break;
2355+ default:
2356+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2357+ goto failed;
2358+ }
2359+
2360+ if (!(row_len = myxt_load_row_data(rd_ot, rec_data, rb_row_buf, rb_col_cnt)))
2361+ goto failed;
2362+
2363+ if (rd_ot->ot_table->tab_dic.dic_my_table->found_next_number_field)
2364+ ha_set_auto_increment(rd_ot, rd_ot->ot_table->tab_dic.dic_my_table->found_next_number_field);
2365+
2366+ if (!xt_tab_new_record(rd_ot, rb_row_buf))
2367+ goto failed;
2368+
2369+ if (type == BUP_STANDARD_VAR_RECORD) {
2370+ buffer += row_len+1;
2371+ if (row_len + 1 > size) {
2372+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_BAD_BACKUP_FORMAT);
2373+ goto failed;
2374+ }
2375+ size -= row_len + 1;
2376+ }
2377+
2378+ rb_insert_count++;
2379+ if (rb_insert_count == XT_RESTORE_BATCH_SIZE) {
2380+ if (!xt_xn_commit(rd_thread))
2381+ goto failed;
2382+ if (!xt_xn_begin(rd_thread))
2383+ goto failed;
2384+ rb_insert_count = 0;
2385+ }
2386+ }
2387+
2388+ return backup::OK;
2389+
2390+ failed:
2391+ xt_log_and_clear_exception(rd_thread);
2392+ return backup::ERROR;
2393+}
2394+
2395+
2396+result_t PBXTRestoreDriver::cancel()
2397+{
2398+ XT_TRACE_CALL();
2399+ /* Nothing to do in cancel(); free() will suffice */
2400+ return backup::OK;
2401+}
2402+
2403+void PBXTRestoreDriver::free()
2404+{
2405+ XT_TRACE_CALL();
2406+ if (rd_ot) {
2407+ xt_db_return_table_to_pool_ns(rd_ot);
2408+ rd_ot = NULL;
2409+ }
2410+ //if (rb_row_buf) {
2411+ // xt_free_ns(rb_row_buf);
2412+ // rb_row_buf = NULL;
2413+ //}
2414+ if (rb_row_data) {
2415+ xt_free_ns(rb_row_data);
2416+ rb_row_data = NULL;
2417+ }
2418+ if (rd_thread->st_xact_data)
2419+ xt_xn_rollback(rd_thread);
2420+ delete this;
2421+}
2422+
2423+/*
2424+ * -----------------------------------------------------------------------
2425+ * BACKUP ENGINE FACTORY
2426+ */
2427+
2428+#define PBXT_BACKUP_VERSION 1
2429+
2430+
2431+class PBXTBackupEngine: public Backup_engine
2432+{
2433+ public:
2434+ PBXTBackupEngine() { };
2435+
2436+ virtual version_t version() const {
2437+ return PBXT_BACKUP_VERSION;
2438+ };
2439+
2440+ virtual result_t get_backup(const uint32, const Table_list &, Backup_driver* &);
2441+
2442+ virtual result_t get_restore(const version_t, const uint32, const Table_list &,Restore_driver* &);
2443+
2444+ virtual void free()
2445+ {
2446+ delete this;
2447+ }
2448+};
2449+
2450+result_t PBXTBackupEngine::get_backup(const u_int count, const Table_list &tables, Backup_driver* &drv)
2451+{
2452+ PBXTBackupDriver *ptr = new PBXTBackupDriver(tables);
2453+
2454+ if (!ptr)
2455+ return backup::ERROR;
2456+ drv = ptr;
2457+ return backup::OK;
2458+}
2459+
2460+result_t PBXTBackupEngine::get_restore(const version_t ver, const uint32,
2461+ const Table_list &tables, Restore_driver* &drv)
2462+{
2463+ if (ver > PBXT_BACKUP_VERSION)
2464+ {
2465+ return backup::ERROR;
2466+ }
2467+
2468+ PBXTRestoreDriver *ptr = new PBXTRestoreDriver(tables);
2469+
2470+ if (!ptr)
2471+ return backup::ERROR;
2472+ drv = (Restore_driver *) ptr;
2473+ return backup::OK;
2474+}
2475+
2476+
2477+Backup_result_t pbxt_backup_engine(handlerton *self, Backup_engine* &be)
2478+{
2479+ be = new PBXTBackupEngine();
2480+
2481+ if (!be)
2482+ return backup::ERROR;
2483+
2484+ return backup::OK;
2485+}
2486+
2487+#endif
2488
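Note on the record framing used by get_data() and send_data() above: a row that does not fit in the remaining buffer is split into a START block (type byte, 4-byte total length, 4-byte data length, so 9 bytes of header), zero or more middle blocks, and an END block (type byte plus 4-byte data length, so 5 bytes of header). The following is only a minimal sketch of that idea, not the PBXT code. The names split_row, join_row, put4 and get4 are hypothetical, the little-endian byte order is an assumption (the diff does not show how XT_SET_DISK_4 stores bytes), and the sketch assumes buf_size > 9 and a row larger than one buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Block types, mirroring BUP_RECORD_BLOCK_4_START / _4 / _4_END in the diff.
enum { BLOCK_START = 2, BLOCK_MIDDLE = 3, BLOCK_END = 4 };

// Assumption: 4-byte lengths are written little-endian.
static void put4(std::vector<uint8_t> &out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out.push_back((uint8_t) (v >> (8 * i)));
}

static uint32_t get4(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

// Split 'row' into framed blocks of at most 'buf_size' bytes each.
// Precondition (assumed): buf_size > 9 and row.size() + 1 > buf_size.
static std::vector<std::vector<uint8_t> > split_row(const std::vector<uint8_t> &row, size_t buf_size)
{
    std::vector<std::vector<uint8_t> > blocks;
    size_t offset = 0;

    while (offset < row.size()) {
        size_t left = row.size() - offset;
        std::vector<uint8_t> block;
        size_t chunk;

        if (offset == 0) {
            // First block: type + total length + data length = 9 bytes.
            chunk = buf_size - 9;
            block.push_back(BLOCK_START);
            put4(block, (uint32_t) row.size());
        }
        else if (left + 5 > buf_size) {
            // Remainder still does not fit: type + data length = 5 bytes.
            chunk = buf_size - 5;
            block.push_back(BLOCK_MIDDLE);
        }
        else {
            // Remainder fits: this is the last block of the row.
            chunk = left;
            block.push_back(BLOCK_END);
        }
        put4(block, (uint32_t) chunk);
        block.insert(block.end(), row.begin() + offset, row.begin() + offset + chunk);
        blocks.push_back(block);
        offset += chunk;
    }
    return blocks;
}

// Reassemble a row from its blocks (no validation, unlike send_data()).
static std::vector<uint8_t> join_row(const std::vector<std::vector<uint8_t> > &blocks)
{
    std::vector<uint8_t> row;

    for (size_t i = 0; i < blocks.size(); i++) {
        const uint8_t *p = &blocks[i][0];
        size_t header = (p[0] == BLOCK_START) ? 9 : 5;

        if (p[0] == BLOCK_START)
            row.reserve(get4(p + 1)); // total row length comes first
        uint32_t len = get4(p + header - 4);
        row.insert(row.end(), p + header, p + header + len);
    }
    return row;
}
```

A real reader would also have to reject lengths that run past the end of the buffer before copying, which is the role of the XT_ERR_BAD_BACKUP_FORMAT checks in send_data().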
2489=== added file 'plugin/pbxt/src/backup_xt.h'
2490--- plugin/pbxt/src/backup_xt.h 1970-01-01 00:00:00 +0000
2491+++ plugin/pbxt/src/backup_xt.h 2010-04-01 14:19:35 +0000
2492@@ -0,0 +1,34 @@
2493+/* Copyright (c) 2009 PrimeBase Technologies GmbH
2494+ *
2495+ * PrimeBase XT
2496+ *
2497+ * This program is free software; you can redistribute it and/or modify
2498+ * it under the terms of the GNU General Public License as published by
2499+ * the Free Software Foundation; either version 2 of the License, or
2500+ * (at your option) any later version.
2501+ *
2502+ * This program is distributed in the hope that it will be useful,
2503+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2504+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2505+ * GNU General Public License for more details.
2506+ *
2507+ * You should have received a copy of the GNU General Public License
2508+ * along with this program; if not, write to the Free Software
2509+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2510+ *
2511+ * 2009-09-07 Paul McCullagh
2512+ *
2513+ * H&G2JCtL
2514+ */
2515+
2516+#ifndef __backup_xt_h__
2517+#define __backup_xt_h__
2518+
2519+#include "xt_defs.h"
2520+
2521+#ifdef MYSQL_SUPPORTS_BACKUP
2522+
2523+Backup_result_t pbxt_backup_engine(handlerton *self, Backup_engine* &be);
2524+
2525+#endif
2526+#endif
2527
2528=== added file 'plugin/pbxt/src/bsearch_xt.cc'
2529--- plugin/pbxt/src/bsearch_xt.cc 1970-01-01 00:00:00 +0000
2530+++ plugin/pbxt/src/bsearch_xt.cc 2010-04-01 14:19:35 +0000
2531@@ -0,0 +1,66 @@
2532+/* Copyright (c) 2005 PrimeBase Technologies GmbH
2533+ *
2534+ * PrimeBase XT
2535+ *
2536+ * This program is free software; you can redistribute it and/or modify
2537+ * it under the terms of the GNU General Public License as published by
2538+ * the Free Software Foundation; either version 2 of the License, or
2539+ * (at your option) any later version.
2540+ *
2541+ * This program is distributed in the hope that it will be useful,
2542+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2543+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2544+ * GNU General Public License for more details.
2545+ *
2546+ * You should have received a copy of the GNU General Public License
2547+ * along with this program; if not, write to the Free Software
2548+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2549+ *
2550+ * 2004-01-03 Paul McCullagh
2551+ *
2552+ * H&G2JCtL
2553+ */
2554+
2555+#include "xt_config.h"
2556+
2557+#include <stdio.h>
2558+
2559+#include "bsearch_xt.h"
2560+#include "pthread_xt.h"
2561+#include "thread_xt.h"
2562+
2563+/**
2564+ * Binary search an array of 'count' items, with byte size 'size'. This
2565+ * function returns a pointer to the element and the 'index'
2566+ * of the element if found.
2567+ *
2568+ * If not found the index of the insert point of the item
2569+ * is returned (0 <= index <= count).
2570+ *
2571+ * The comparison routine 'compar' may throw an exception.
2572+ * In this case the error details will be stored in 'thread'.
2573+ */
2574+void *xt_bsearch(XTThreadPtr thread, const void *key, register const void *base, size_t count, size_t size, size_t *idx, const void *thunk, XTCompareFunc compar)
2575+{
2576+ register size_t i;
2577+ register size_t guess;
2578+ register int r;
2579+
2580+ i = 0;
2581+ while (i < count) {
2582+ guess = (i + count - 1) >> 1;
2583+ r = (compar)(thread, thunk, key, ((char *) base) + guess * size);
2584+ if (r == 0) {
2585+ *idx = guess;
2586+ return ((char *) base) + guess * size;
2587+ }
2588+ if (r < 0)
2589+ count = guess;
2590+ else
2591+ i = guess + 1;
2592+ }
2593+
2594+ *idx = i;
2595+ return NULL;
2596+}
2597+
2598
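Note on xt_bsearch() above: the useful property is that a miss still reports where the key would be inserted (0 <= index <= count). The sketch below illustrates that contract in isolation. It is a simplified variation, not the PBXT function: the thread and thunk arguments are dropped, the midpoint is computed as lo + (hi - lo) / 2, and the names bsearch_with_index, CompareFunc and compare_int are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical comparator type: returns <0, 0 or >0, like 'compar' in the diff.
typedef int (*CompareFunc)(const void *key, const void *elem);

// Search the sorted array 'base' of 'count' items of 'size' bytes each.
// On a hit, returns the element and stores its index in *idx.
// On a miss, returns NULL and stores the insert point in *idx.
static const void *bsearch_with_index(const void *key, const void *base,
    size_t count, size_t size, size_t *idx, CompareFunc compar)
{
    size_t lo = 0;
    size_t hi = count; // search the half-open range [lo, hi)

    while (lo < hi) {
        size_t guess = lo + (hi - lo) / 2;
        const char *elem = (const char *) base + guess * size;
        int r = compar(key, elem);

        if (r == 0) {
            *idx = guess;
            return elem;
        }
        if (r < 0)
            hi = guess;     // key sorts before elem
        else
            lo = guess + 1; // key sorts after elem
    }

    *idx = lo; // insert point: 0 <= *idx <= count
    return NULL;
}

// Example comparator for arrays of int.
static int compare_int(const void *key, const void *elem)
{
    int a = *(const int *) key;
    int b = *(const int *) elem;

    return (a > b) - (a < b);
}
```

With the sorted array {10, 20, 30, 40}, searching for 30 returns the element at index 2, while searching for 25 returns NULL with the index also set to 2, the position at which 25 would have to be inserted to keep the array sorted.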
2599=== added file 'plugin/pbxt/src/bsearch_xt.h'
2600--- plugin/pbxt/src/bsearch_xt.h 1970-01-01 00:00:00 +0000
2601+++ plugin/pbxt/src/bsearch_xt.h 2010-04-01 14:19:35 +0000
2602@@ -0,0 +1,32 @@
2603+/* Copyright (c) 2005 PrimeBase Technologies GmbH
2604+ *
2605+ * PrimeBase XT
2606+ *
2607+ * This program is free software; you can redistribute it and/or modify
2608+ * it under the terms of the GNU General Public License as published by
2609+ * the Free Software Foundation; either version 2 of the License, or
2610+ * (at your option) any later version.
2611+ *
2612+ * This program is distributed in the hope that it will be useful,
2613+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2614+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2615+ * GNU General Public License for more details.
2616+ *
2617+ * You should have received a copy of the GNU General Public License
2618+ * along with this program; if not, write to the Free Software
2619+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2620+ *
2621+ * 2004-01-03 Paul McCullagh
2622+ *
2623+ * H&G2JCtL
2624+ */
2625+#ifndef __xt_bsearch_h__
2626+#define __xt_bsearch_h__
2627+
2628+#include "xt_defs.h"
2629+
2630+struct XTThread;
2631+
2632+void *xt_bsearch(struct XTThread *self, const void *key, register const void *base, size_t count, size_t size, size_t *idx, const void *thunk, XTCompareFunc compar);
2633+
2634+#endif
2635
2636=== added file 'plugin/pbxt/src/cache_xt.cc'
2637--- plugin/pbxt/src/cache_xt.cc 1970-01-01 00:00:00 +0000
2638+++ plugin/pbxt/src/cache_xt.cc 2010-04-01 14:19:35 +0000
2639@@ -0,0 +1,1577 @@
2640+/* Copyright (c) 2005 PrimeBase Technologies GmbH, Germany
2641+ *
2642+ * PrimeBase XT
2643+ *
2644+ * This program is free software; you can redistribute it and/or modify
2645+ * it under the terms of the GNU General Public License as published by
2646+ * the Free Software Foundation; either version 2 of the License, or
2647+ * (at your option) any later version.
2648+ *
2649+ * This program is distributed in the hope that it will be useful,
2650+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2651+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2652+ * GNU General Public License for more details.
2653+ *
2654+ * You should have received a copy of the GNU General Public License
2655+ * along with this program; if not, write to the Free Software
2656+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2657+ *
2658+ * 2005-05-24 Paul McCullagh
2659+ *
2660+ * H&G2JCtL
2661+ */
2662+
2663+#include "xt_config.h"
2664+
2665+#ifdef DRIZZLED
2666+#include <bitset>
2667+#endif
2668+
2669+#ifndef XT_WIN
2670+#include <unistd.h>
2671+#endif
2672+
2673+#include <stdio.h>
2674+#include <time.h>
2675+
2676+#include "pthread_xt.h"
2677+#include "thread_xt.h"
2678+#include "filesys_xt.h"
2679+#include "cache_xt.h"
2680+#include "table_xt.h"
2681+#include "trace_xt.h"
2682+#include "util_xt.h"
2683+
2684+#define XT_TIME_DIFF(start, now) (\
2685+ ((xtWord4) (now) < (xtWord4) (start)) ? \
2686+ ((xtWord4) 0XFFFFFFFF - ((xtWord4) (start) - (xtWord4) (now))) : \
2687+ ((xtWord4) (now) - (xtWord4) (start)))
2688+
2689+/*
2690+ * -----------------------------------------------------------------------
2691+ * D I S K C A C H E
2692+ */
2693+
2694+#define IDX_CAC_SEGMENT_COUNT ((off_t) 1 << XT_INDEX_CACHE_SEGMENT_SHIFTS)
2695+#define IDX_CAC_SEGMENT_MASK (IDX_CAC_SEGMENT_COUNT - 1)
2696+
2697+#ifdef XT_NO_ATOMICS
2698+#define IDX_CAC_USE_PTHREAD_RW
2699+#else
2700+//#define IDX_CAC_USE_RWMUTEX
2701+//#define IDX_CAC_USE_PTHREAD_RW
2702+//#define IDX_USE_SPINXSLOCK
2703+#define IDX_CAC_USE_XSMUTEX
2704+#endif
2705+
2706+#ifdef IDX_CAC_USE_XSMUTEX
2707+#define IDX_CAC_LOCK_TYPE XTXSMutexRec
2708+#define IDX_CAC_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, &(i)->cs_lock)
2709+#define IDX_CAC_FREE_LOCK(s, i) xt_xsmutex_free(s, &(i)->cs_lock)
2710+#define IDX_CAC_READ_LOCK(i, o) xt_xsmutex_slock(&(i)->cs_lock, (o)->t_id)
2711+#define IDX_CAC_WRITE_LOCK(i, o) xt_xsmutex_xlock(&(i)->cs_lock, (o)->t_id)
2712+#define IDX_CAC_UNLOCK(i, o) xt_xsmutex_unlock(&(i)->cs_lock, (o)->t_id)
2713+#elif defined(IDX_CAC_USE_PTHREAD_RW)
2714+#define IDX_CAC_LOCK_TYPE xt_rwlock_type
2715+#define IDX_CAC_INIT_LOCK(s, i) xt_init_rwlock(s, &(i)->cs_lock)
2716+#define IDX_CAC_FREE_LOCK(s, i) xt_free_rwlock(&(i)->cs_lock)
2717+#define IDX_CAC_READ_LOCK(i, o) xt_slock_rwlock_ns(&(i)->cs_lock)
2718+#define IDX_CAC_WRITE_LOCK(i, o) xt_xlock_rwlock_ns(&(i)->cs_lock)
2719+#define IDX_CAC_UNLOCK(i, o) xt_unlock_rwlock_ns(&(i)->cs_lock)
2720+#elif defined(IDX_CAC_USE_RWMUTEX)
2721+#define IDX_CAC_LOCK_TYPE XTRWMutexRec
2722+#define IDX_CAC_INIT_LOCK(s, i) xt_rwmutex_init_with_autoname(s, &(i)->cs_lock)
2723+#define IDX_CAC_FREE_LOCK(s, i) xt_rwmutex_free(s, &(i)->cs_lock)
2724+#define IDX_CAC_READ_LOCK(i, o) xt_rwmutex_slock(&(i)->cs_lock, (o)->t_id)
2725+#define IDX_CAC_WRITE_LOCK(i, o) xt_rwmutex_xlock(&(i)->cs_lock, (o)->t_id)
2726+#define IDX_CAC_UNLOCK(i, o) xt_rwmutex_unlock(&(i)->cs_lock, (o)->t_id)
2727+#elif defined(IDX_CAC_USE_SPINXSLOCK)
2728+#define IDX_CAC_LOCK_TYPE XTSpinXSLockRec
2729+#define IDX_CAC_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, &(i)->cs_lock)
2730+#define IDX_CAC_FREE_LOCK(s, i) xt_spinxslock_free(s, &(i)->cs_lock)
2731+#define IDX_CAC_READ_LOCK(i, s) xt_spinxslock_slock(&(i)->cs_lock, (s)->t_id)
2732+#define IDX_CAC_WRITE_LOCK(i, s) xt_spinxslock_xlock(&(i)->cs_lock, (s)->t_id)
2733+#define IDX_CAC_UNLOCK(i, s) xt_spinxslock_unlock(&(i)->cs_lock, (s)->t_id)
2734+#endif
2735+
2736+#define ID_HANDLE_USE_SPINLOCK
2737+//#define ID_HANDLE_USE_PTHREAD_RW
2738+
2739+#if defined(ID_HANDLE_USE_PTHREAD_RW)
2740+#define ID_HANDLE_LOCK_TYPE xt_mutex_type
2741+#define ID_HANDLE_INIT_LOCK(s, i) xt_init_mutex_with_autoname(s, i)
2742+#define ID_HANDLE_FREE_LOCK(s, i) xt_free_mutex(i)
2743+#define ID_HANDLE_LOCK(i) xt_lock_mutex_ns(i)
2744+#define ID_HANDLE_UNLOCK(i) xt_unlock_mutex_ns(i)
2745+#elif defined(ID_HANDLE_USE_SPINLOCK)
2746+#define ID_HANDLE_LOCK_TYPE XTSpinLockRec
2747+#define ID_HANDLE_INIT_LOCK(s, i) xt_spinlock_init_with_autoname(s, i)
2748+#define ID_HANDLE_FREE_LOCK(s, i) xt_spinlock_free(s, i)
2749+#define ID_HANDLE_LOCK(i) xt_spinlock_lock(i)
2750+#define ID_HANDLE_UNLOCK(i) xt_spinlock_unlock(i)
2751+#endif
2752+
2753+#define XT_HANDLE_SLOTS 37
2754+
2755+/*
2756+#ifdef DEBUG
2757+#define XT_INIT_HANDLE_COUNT 0
2758+#define XT_INIT_HANDLE_BLOCKS 0
2759+#else
2760+#define XT_INIT_HANDLE_COUNT 40
2761+#define XT_INIT_HANDLE_BLOCKS 10
2762+#endif
2763+*/
2764+
2765+/* A disk cache segment. The cache is divided into a number of segments
2766+ * to improve concurrency.
2767+ */
2768+typedef struct DcSegment {
2769+ IDX_CAC_LOCK_TYPE cs_lock; /* The cache segment lock. */
2770+ XTIndBlockPtr *cs_hash_table;
2771+} DcSegmentRec, *DcSegmentPtr;
2772+
2773+typedef struct DcHandleSlot {
2774+ ID_HANDLE_LOCK_TYPE hs_handles_lock;
2775+ XTIndHandleBlockPtr hs_free_blocks;
2776+ XTIndHandlePtr hs_free_handles;
2777+ XTIndHandlePtr hs_used_handles;
2778+} DcHandleSlotRec, *DcHandleSlotPtr;
2779+
2780+typedef struct DcGlobals {
2781+ xt_mutex_type cg_lock; /* The public cache lock. */
2782+ DcSegmentRec cg_segment[IDX_CAC_SEGMENT_COUNT];
2783+ XTIndBlockPtr cg_blocks;
2784+#ifdef XT_USE_DIRECT_IO_ON_INDEX
2785+ xtWord1 *cg_buffer;
2786+#endif
2787+ XTIndBlockPtr cg_free_list;
2788+ xtWord4 cg_free_count;
2789+ xtWord4 cg_ru_now; /* A counter as described by Jim Starkey (my thanks) */
2790+ XTIndBlockPtr cg_lru_block;
2791+ XTIndBlockPtr cg_mru_block;
2792+ xtWord4 cg_hash_size;
2793+ xtWord4 cg_block_count;
2794+ xtWord4 cg_max_free;
2795+#ifdef DEBUG_CHECK_IND_CACHE
2796+ u_int cg_reserved_by_ots; /* Number of blocks reserved by open tables. */
2797+ u_int cg_read_count; /* Number of blocks being read. */
2798+#endif
2799+
2800+ /* Index cache handles: */
2801+ DcHandleSlotRec cg_handle_slot[XT_HANDLE_SLOTS];
2802+} DcGlobalsRec;
2803+
2804+static DcGlobalsRec ind_cac_globals;
2805+
2806+#ifdef XT_USE_MYSYS
2807+#ifdef xtPublic
2808+#undef xtPublic
2809+#endif
2810+#include "my_global.h"
2811+#include "my_sys.h"
2812+#include "keycache.h"
2813+KEY_CACHE my_cache;
2814+#undef pthread_rwlock_rdlock
2815+#undef pthread_rwlock_wrlock
2816+#undef pthread_rwlock_unlock
2817+#undef pthread_mutex_lock
2818+#undef pthread_mutex_unlock
2819+#undef pthread_cond_wait
2820+#undef pthread_cond_broadcast
2821+#undef xt_mutex_type
2822+#define xtPublic
2823+#endif
2824+
2825+/*
2826+ * -----------------------------------------------------------------------
2827+ * INDEX CACHE HANDLES
2828+ */
2829+
2830+static XTIndHandlePtr ind_alloc_handle()
2831+{
2832+ XTIndHandlePtr handle;
2833+
2834+ if (!(handle = (XTIndHandlePtr) xt_calloc_ns(sizeof(XTIndHandleRec))))
2835+ return NULL;
2836+ xt_spinlock_init_with_autoname(NULL, &handle->ih_lock);
2837+ return handle;
2838+}
2839+
2840+static void ind_free_handle(XTIndHandlePtr handle)
2841+{
2842+ xt_spinlock_free(NULL, &handle->ih_lock);
2843+ xt_free_ns(handle);
2844+}
2845+
2846+static void ind_handle_exit(XTThreadPtr self)
2847+{
2848+ DcHandleSlotPtr hs;
2849+ XTIndHandlePtr handle;
2850+ XTIndHandleBlockPtr hptr;
2851+
2852+ for (int i=0; i<XT_HANDLE_SLOTS; i++) {
2853+ hs = &ind_cac_globals.cg_handle_slot[i];
2854+
2855+ while (hs->hs_used_handles) {
2856+ handle = hs->hs_used_handles;
2857+ xt_ind_release_handle(handle, FALSE, self);
2858+ }
2859+
2860+ while (hs->hs_free_blocks) {
2861+ hptr = hs->hs_free_blocks;
2862+ hs->hs_free_blocks = hptr->hb_next;
2863+ xt_free(self, hptr);
2864+ }
2865+
2866+ while (hs->hs_free_handles) {
2867+ handle = hs->hs_free_handles;
2868+ hs->hs_free_handles = handle->ih_next;
2869+ ind_free_handle(handle);
2870+ }
2871+
2872+ ID_HANDLE_FREE_LOCK(self, &hs->hs_handles_lock);
2873+ }
2874+}
2875+
2876+static void ind_handle_init(XTThreadPtr self)
2877+{
2878+ DcHandleSlotPtr hs;
2879+
2880+ for (int i=0; i<XT_HANDLE_SLOTS; i++) {
2881+ hs = &ind_cac_globals.cg_handle_slot[i];
2882+ memset(hs, 0, sizeof(DcHandleSlotRec));
2883+ ID_HANDLE_INIT_LOCK(self, &hs->hs_handles_lock);
2884+ }
2885+}
2886+
2887+//#define CHECK_HANDLE_STRUCTS
2888+
2889+#ifdef CHECK_HANDLE_STRUCTS
2890+static int gdummy = 0;
2891+
2892+static void ic_stop_here()
2893+{
2894+ gdummy = gdummy + 1;
2895+ printf("Nooo %d!\n", gdummy);
2896+}
2897+
2898+static void ic_check_handle_structs()
2899+{
2900+ XTIndHandlePtr handle, phandle;
2901+ XTIndHandleBlockPtr hptr, phptr;
2902+ int count = 0;
2903+ int ctest;
2904+
2905+ phandle = NULL;
2906+ handle = ind_cac_globals.cg_used_handles;
2907+ while (handle) {
2908+ if (handle == phandle)
2909+ ic_stop_here();
2910+ if (handle->ih_prev != phandle)
2911+ ic_stop_here();
2912+ if (handle->ih_cache_reference) {
2913+ ctest = handle->x.ih_cache_block->cb_handle_count;
2914+ if (ctest == 0 || ctest > 100)
2915+ ic_stop_here();
2916+ }
2917+ else {
2918+ ctest = handle->x.ih_handle_block->hb_ref_count;
2919+ if (ctest == 0 || ctest > 100)
2920+ ic_stop_here();
2921+ }
2922+ phandle = handle;
2923+ handle = handle->ih_next;
2924+ count++;
2925+ if (count > 1000)
2926+ ic_stop_here();
2927+ }
2928+
2929+ count = 0;
2930+ hptr = ind_cac_globals.cg_free_blocks;
2931+ while (hptr) {
2932+ if (hptr == phptr)
2933+ ic_stop_here();
2934+ phptr = hptr;
2935+ hptr = hptr->hb_next;
2936+ count++;
2937+ if (count > 1000)
2938+ ic_stop_here();
2939+ }
2940+
2941+ count = 0;
2942+ handle = ind_cac_globals.cg_free_handles;
2943+ while (handle) {
2944+ if (handle == phandle)
2945+ ic_stop_here();
2946+ phandle = handle;
2947+ handle = handle->ih_next;
2948+ count++;
2949+ if (count > 1000)
2950+ ic_stop_here();
2951+ }
2952+}
2953+#endif
2954+
2955+/*
2956+ * Get a handle to the index block.
2957+ * This function is called by index scanners (readers).
2958+ */
2959+xtPublic XTIndHandlePtr xt_ind_get_handle(XTOpenTablePtr ot, XTIndexPtr ind, XTIndReferencePtr iref)
2960+{
2961+ DcHandleSlotPtr hs;
2962+ XTIndHandlePtr handle;
2963+
2964+ hs = &ind_cac_globals.cg_handle_slot[iref->ir_block->cb_address % XT_HANDLE_SLOTS];
2965+
2966+ ASSERT_NS(iref->ir_xlock == FALSE);
2967+ ASSERT_NS(iref->ir_updated == FALSE);
2968+ ID_HANDLE_LOCK(&hs->hs_handles_lock);
2969+#ifdef CHECK_HANDLE_STRUCTS
2970+ ic_check_handle_structs();
2971+#endif
2972+ if ((handle = hs->hs_free_handles))
2973+ hs->hs_free_handles = handle->ih_next;
2974+ else {
2975+ if (!(handle = ind_alloc_handle())) {
2976+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
2977+ xt_ind_release(ot, ind, XT_UNLOCK_READ, iref);
2978+ return NULL;
2979+ }
2980+ }
2981+ if (hs->hs_used_handles)
2982+ hs->hs_used_handles->ih_prev = handle;
2983+ handle->ih_next = hs->hs_used_handles;
2984+ handle->ih_prev = NULL;
2985+ handle->ih_address = iref->ir_block->cb_address;
2986+ handle->ih_cache_reference = TRUE;
2987+ handle->x.ih_cache_block = iref->ir_block;
2988+ handle->ih_branch = iref->ir_branch;
2989+ /* {HANDLE-COUNT-USAGE}
2990+ * This is safe because:
2991+ *
2992+ * I have an Slock on the cache block, and I have
2993+ * at least an Slock on the index.
2994+ * So this excludes anyone who is reading
2995+ * cb_handle_count in the index.
2996+ * (all cache block writers, and the freeer).
2997+ *
2998+ * The increment is safe because I have the list
2999+ * lock (hs_handles_lock), which is required by anyone else
3000+ * who increments or decrements this value.
3001+ */
3002+ iref->ir_block->cb_handle_count++;
3003+ hs->hs_used_handles = handle;
3004+#ifdef CHECK_HANDLE_STRUCTS
3005+ ic_check_handle_structs();
3006+#endif
3007+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
3008+ xt_ind_release(ot, ind, XT_UNLOCK_READ, iref);
3009+ return handle;
3010+}
3011+
3012+xtPublic void xt_ind_release_handle(XTIndHandlePtr handle, xtBool have_lock, XTThreadPtr thread)
3013+{
3014+ DcHandleSlotPtr hs;
3015+ XTIndBlockPtr block = NULL;
3016+ u_int hash_idx = 0;
3017+ DcSegmentPtr seg = NULL;
3018+ XTIndBlockPtr xblock;
3019+
3020+ /* The lock order is:
3021+ * 1. Cache segment (cs_lock) - This is only taken by ind_free_block()!
3022+ * 1. S/Slock cache block (cb_lock)
3023+ * 2. List lock (hs_handles_lock).
3024+ * 3. Handle lock (ih_lock)
3025+ */
3026+ if (!have_lock)
3027+ xt_spinlock_lock(&handle->ih_lock);
3028+
3029+ /* Get the lock on the cache page if required: */
3030+ if (handle->ih_cache_reference) {
3031+ u_int file_id;
3032+ xtIndexNodeID address;
3033+
3034+ block = handle->x.ih_cache_block;
3035+
3036+ file_id = block->cb_file_id;
3037+ address = block->cb_address;
3038+ hash_idx = XT_NODE_ID(address) + (file_id * 223);
3039+ seg = &ind_cac_globals.cg_segment[hash_idx & IDX_CAC_SEGMENT_MASK];
3040+ hash_idx = (hash_idx >> XT_INDEX_CACHE_SEGMENT_SHIFTS) % ind_cac_globals.cg_hash_size;
3041+ }
3042+
3043+ xt_spinlock_unlock(&handle->ih_lock);
3044+
3045+ /* Because of the lock order, I have to release the
3046+ * handle before I get a lock on the cache block.
3047+ *
3048+ * But, by doing this, the cache block may be gone!
3049+ */
3050+ if (block) {
3051+ IDX_CAC_READ_LOCK(seg, thread);
3052+ xblock = seg->cs_hash_table[hash_idx];
3053+ while (xblock) {
3054+ if (block == xblock) {
3055+ /* Found the block...
3056+ * {HANDLE-COUNT-SLOCK}
3057+ * 04.05.2009, changed to slock.
3058+ */
3059+ XT_IPAGE_READ_LOCK(&block->cb_lock);
3060+ goto block_found;
3061+ }
3062+ xblock = xblock->cb_next;
3063+ }
3064+ block = NULL;
3065+ block_found:
3066+ IDX_CAC_UNLOCK(seg, thread);
3067+ }
3068+
3069+ hs = &ind_cac_globals.cg_handle_slot[handle->ih_address % XT_HANDLE_SLOTS];
3070+
3071+ ID_HANDLE_LOCK(&hs->hs_handles_lock);
3072+#ifdef CHECK_HANDLE_STRUCTS
3073+ ic_check_handle_structs();
3074+#endif
3075+
3076+ /* I don't need to lock the handle because I have locked
3077+ * the list, and no other thread can change the
3078+ * handle without first getting a lock on the list.
3079+ *
3080+ * In addition, the caller is the only owner of the
3081+ * handle, and the only thread with an independent
3082+ * reference to the handle.
3083+ * All other accesses occur over the list.
3084+ */
3085+
3086+ /* Remove the reference to the cache or a handle block: */
3087+ if (handle->ih_cache_reference) {
3088+ ASSERT_NS(block == handle->x.ih_cache_block);
3089+ ASSERT_NS(block && block->cb_handle_count > 0);
3090+ /* {HANDLE-COUNT-USAGE}
3091+ * This is safe here because I have excluded
3092+ * all readers by taking an Xlock on the
3093+ * cache block (CHANGED - see below).
3094+ *
3095+ * {HANDLE-COUNT-SLOCK}
3096+ * 04.05.2009, changed to slock.
3097+ * Should be OK, because:
3098+ * I have a lock on the list (hs_handles_lock),
3099+ * which prevents concurrent updates to cb_handle_count.
3100+ *
3101+ * I also have a read lock on the cache block
3102+ * but not a lock on the index. As a result, we cannot
3103+ * exclude all index writers (and readers of
3104+ * cb_handle_count).
3105+ */
3106+ block->cb_handle_count--;
3107+ }
3108+ else {
3109+ XTIndHandleBlockPtr hptr = handle->x.ih_handle_block;
3110+
3111+ ASSERT_NS(!handle->ih_cache_reference);
3112+ ASSERT_NS(hptr->hb_ref_count > 0);
3113+ hptr->hb_ref_count--;
3114+ if (!hptr->hb_ref_count) {
3115+ /* Put it back on the free list: */
3116+ hptr->hb_next = hs->hs_free_blocks;
3117+ hs->hs_free_blocks = hptr;
3118+ }
3119+ }
3120+
3121+ /* Unlink the handle: */
3122+ if (handle->ih_next)
3123+ handle->ih_next->ih_prev = handle->ih_prev;
3124+ if (handle->ih_prev)
3125+ handle->ih_prev->ih_next = handle->ih_next;
3126+ if (hs->hs_used_handles == handle)
3127+ hs->hs_used_handles = handle->ih_next;
3128+
3129+ /* Put it on the free list: */
3130+ handle->ih_next = hs->hs_free_handles;
3131+ hs->hs_free_handles = handle;
3132+
3133+#ifdef CHECK_HANDLE_STRUCTS
3134+ ic_check_handle_structs();
3135+#endif
3136+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
3137+
3138+ if (block)
3139+ XT_IPAGE_UNLOCK(&block->cb_lock, FALSE);
3140+}
3141+
3142+/* Call this function before a referenced cache block is modified!
3143+ * This function is called by index updaters.
3144+ */
3145+xtPublic xtBool xt_ind_copy_on_write(XTIndReferencePtr iref)
3146+{
3147+ DcHandleSlotPtr hs;
3148+ XTIndHandleBlockPtr hptr;
3149+ u_int branch_size;
3150+ XTIndHandlePtr handle;
3151+ u_int i = 0;
3152+
3153+ hs = &ind_cac_globals.cg_handle_slot[iref->ir_block->cb_address % XT_HANDLE_SLOTS];
3154+
3155+ ID_HANDLE_LOCK(&hs->hs_handles_lock);
3156+
3157+ /* {HANDLE-COUNT-USAGE}
3158+ * This is only called by updaters of this index block, or
3159+ * the freeer, which holds an Xlock on the index block.
3160+ * These are all mutually exclusive for the index block.
3161+ *
3162+ * {HANDLE-COUNT-SLOCK}
3163+ * Do this check again, after we have the list lock (hs_handles_lock).
3164+ * There is a small chance that the count has changed, since we last
3165+ * checked because xt_ind_release_handle() only holds
3166+ * an slock on the index page.
3167+ *
3168+ * An updater can sometimes have a XLOCK on the index and an slock
3169+ * on the cache block. In this case xt_ind_release_handle()
3170+ * could have run through.
3171+ */
3172+ if (!iref->ir_block->cb_handle_count) {
3173+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
3174+ return OK;
3175+ }
3176+
3177+#ifdef CHECK_HANDLE_STRUCTS
3178+ ic_check_handle_structs();
3179+#endif
3180+ if ((hptr = hs->hs_free_blocks))
3181+ hs->hs_free_blocks = hptr->hb_next;
3182+ else {
3183+ if (!(hptr = (XTIndHandleBlockPtr) xt_malloc_ns(sizeof(XTIndHandleBlockRec)))) {
3184+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
3185+ return FAILED;
3186+ }
3187+ }
3188+
3189+ branch_size = XT_GET_INDEX_BLOCK_LEN(XT_GET_DISK_2(iref->ir_branch->tb_size_2));
3190+ memcpy(&hptr->hb_branch, iref->ir_branch, branch_size);
3191+ hptr->hb_ref_count = iref->ir_block->cb_handle_count;
3192+
3193+ handle = hs->hs_used_handles;
3194+ while (handle) {
3195+ if (handle->ih_branch == iref->ir_branch) {
3196+ i++;
3197+ xt_spinlock_lock(&handle->ih_lock);
3198+ ASSERT_NS(handle->ih_cache_reference);
3199+ handle->ih_cache_reference = FALSE;
3200+ handle->x.ih_handle_block = hptr;
3201+ handle->ih_branch = &hptr->hb_branch;
3202+ xt_spinlock_unlock(&handle->ih_lock);
3203+#ifndef DEBUG
3204+ if (i == hptr->hb_ref_count)
3205+ break;
3206+#endif
3207+ }
3208+ handle = handle->ih_next;
3209+ }
3210+#ifdef DEBUG
3211+ ASSERT_NS(hptr->hb_ref_count == i);
3212+#endif
3213+ /* {HANDLE-COUNT-USAGE}
3214+ * It is safe to modify cb_handle_count when I have the
3215+ * list lock, and I have excluded all readers!
3216+ */
3217+ iref->ir_block->cb_handle_count = 0;
3218+#ifdef CHECK_HANDLE_STRUCTS
3219+ ic_check_handle_structs();
3220+#endif
3221+ ID_HANDLE_UNLOCK(&hs->hs_handles_lock);
3222+
3223+ return OK;
3224+}
3225+
3226+xtPublic void xt_ind_lock_handle(XTIndHandlePtr handle)
3227+{
3228+ xt_spinlock_lock(&handle->ih_lock);
3229+}
3230+
3231+xtPublic void xt_ind_unlock_handle(XTIndHandlePtr handle)
3232+{
3233+ xt_spinlock_unlock(&handle->ih_lock);
3234+}
3235+
3236+/*
3237+ * -----------------------------------------------------------------------
3238+ * INIT/EXIT
3239+ */
3240+
3241+/*
3242+ * Initialize the disk cache.
3243+ */
3244+xtPublic void xt_ind_init(XTThreadPtr self, size_t cache_size)
3245+{
3246+ XTIndBlockPtr block;
3247+
3248+#ifdef XT_USE_MYSYS
3249+ init_key_cache(&my_cache, 1024, cache_size, 100, 300);
3250+#endif
3251+ /* Memory is devoted to the page data alone; I no longer count the size of the directory,
3252+ * or the page overhead: */
3253+ ind_cac_globals.cg_block_count = cache_size / XT_INDEX_PAGE_SIZE;
3254+ ind_cac_globals.cg_hash_size = ind_cac_globals.cg_block_count / (IDX_CAC_SEGMENT_COUNT >> 1);
3255+ ind_cac_globals.cg_max_free = ind_cac_globals.cg_block_count / 10;
3256+ if (ind_cac_globals.cg_max_free < 8)
3257+ ind_cac_globals.cg_max_free = 8;
3258+ if (ind_cac_globals.cg_max_free > 128)
3259+ ind_cac_globals.cg_max_free = 128;
3260+
3261+ try_(a) {
3262+ for (u_int i=0; i<IDX_CAC_SEGMENT_COUNT; i++) {
3263+ ind_cac_globals.cg_segment[i].cs_hash_table = (XTIndBlockPtr *) xt_calloc(self, ind_cac_globals.cg_hash_size * sizeof(XTIndBlockPtr));
3264+ IDX_CAC_INIT_LOCK(self, &ind_cac_globals.cg_segment[i]);
3265+ }
3266+
3267+ block = (XTIndBlockPtr) xt_malloc(self, ind_cac_globals.cg_block_count * sizeof(XTIndBlockRec));
3268+ ind_cac_globals.cg_blocks = block;
3269+ xt_init_mutex_with_autoname(self, &ind_cac_globals.cg_lock);
3270+#ifdef XT_USE_DIRECT_IO_ON_INDEX
3271+ xtWord1 *buffer;
3272+#ifdef XT_WIN
3273+ size_t psize = 512;
3274+#else
3275+ size_t psize = getpagesize();
3276+#endif
3277+ size_t diff;
3278+
3279+ buffer = (xtWord1 *) xt_malloc(self, (ind_cac_globals.cg_block_count * XT_INDEX_PAGE_SIZE));
3280+ diff = (size_t) buffer % psize;
3281+ if (diff != 0) {
3282+ xt_free(self, buffer);
3283+ buffer = (xtWord1 *) xt_malloc(self, (ind_cac_globals.cg_block_count * XT_INDEX_PAGE_SIZE) + psize);
3284+ diff = (size_t) buffer % psize;
3285+ if (diff != 0)
3286+ diff = psize - diff;
3287+ }
3288+ ind_cac_globals.cg_buffer = buffer;
3289+ buffer += diff;
3290+#endif
3291+
3292+ for (u_int i=0; i<ind_cac_globals.cg_block_count; i++) {
3293+ XT_IPAGE_INIT_LOCK(self, &block->cb_lock);
3294+ block->cb_state = IDX_CAC_BLOCK_FREE;
3295+ block->cb_next = ind_cac_globals.cg_free_list;
3296+#ifdef XT_USE_DIRECT_IO_ON_INDEX
3297+ block->cb_data = buffer;
3298+ buffer += XT_INDEX_PAGE_SIZE;
3299+#endif
3300+ ind_cac_globals.cg_free_list = block;
3301+ block++;
3302+ }
3303+ ind_cac_globals.cg_free_count = ind_cac_globals.cg_block_count;
3304+#ifdef DEBUG_CHECK_IND_CACHE
3305+ ind_cac_globals.cg_reserved_by_ots = 0;
3306+#endif
3307+ ind_handle_init(self);
3308+ }
3309+ catch_(a) {
3310+ xt_ind_exit(self);
3311+ throw_();
3312+ }
3313+ cont_(a);
3314+}
3315+
3316+xtPublic void xt_ind_exit(XTThreadPtr self)
3317+{
3318+#ifdef XT_USE_MYSYS
3319+ end_key_cache(&my_cache, 1);
3320+#endif
3321+ for (u_int i=0; i<IDX_CAC_SEGMENT_COUNT; i++) {
3322+ if (ind_cac_globals.cg_segment[i].cs_hash_table) {
3323+ xt_free(self, ind_cac_globals.cg_segment[i].cs_hash_table);
3324+ ind_cac_globals.cg_segment[i].cs_hash_table = NULL;
3325+ IDX_CAC_FREE_LOCK(self, &ind_cac_globals.cg_segment[i]);
3326+ }
3327+ }
3328+
3329+ if (ind_cac_globals.cg_blocks) {
3330+ xt_free(self, ind_cac_globals.cg_blocks);
3331+ ind_cac_globals.cg_blocks = NULL;
3332+ xt_free_mutex(&ind_cac_globals.cg_lock);
3333+ }
3334+#ifdef XT_USE_DIRECT_IO_ON_INDEX
3335+ if (ind_cac_globals.cg_buffer) {
3336+ xt_free(self, ind_cac_globals.cg_buffer);
3337+ ind_cac_globals.cg_buffer = NULL;
3338+ }
3339+#endif
3340+ ind_handle_exit(self);
3341+
3342+ memset(&ind_cac_globals, 0, sizeof(ind_cac_globals));
3343+}
3344+
3345+xtPublic xtInt8 xt_ind_get_usage()
3346+{
3347+ xtInt8 size = 0;
3348+
3349+ size = (xtInt8) (ind_cac_globals.cg_block_count - ind_cac_globals.cg_free_count) * (xtInt8) XT_INDEX_PAGE_SIZE;
3350+ return size;
3351+}
3352+
3353+xtPublic xtInt8 xt_ind_get_size()
3354+{
3355+ xtInt8 size = 0;
3356+
3357+ size = (xtInt8) ind_cac_globals.cg_block_count * (xtInt8) XT_INDEX_PAGE_SIZE;
3358+ return size;
3359+}
3360+
3361+/*
3362+ * -----------------------------------------------------------------------
3363+ * INDEX CHECKING
3364+ */
3365+
3366+xtPublic void xt_ind_check_cache(XTIndexPtr ind)
3367+{
3368+ XTIndBlockPtr block;
3369+ u_int free_count, inuse_count, clean_count;
3370+ xtBool check_count = FALSE;
3371+
3372+ if (ind == (XTIndex *) 1) {
3373+ ind = NULL;
3374+ check_count = TRUE;
3375+ }
3376+
3377+ // Check the dirty list:
3378+ if (ind) {
3379+ u_int cnt = 0;
3380+
3381+ block = ind->mi_dirty_list;
3382+ while (block) {
3383+ cnt++;
3384+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_DIRTY);
3385+ block = block->cb_dirty_next;
3386+ }
3387+ ASSERT_NS(ind->mi_dirty_blocks == cnt);
3388+ }
3389+
3390+ xt_lock_mutex_ns(&ind_cac_globals.cg_lock);
3391+
3392+ // Check the free list:
3393+ free_count = 0;
3394+ block = ind_cac_globals.cg_free_list;
3395+ while (block) {
3396+ free_count++;
3397+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_FREE);
3398+ block = block->cb_next;
3399+ }
3400+ ASSERT_NS(ind_cac_globals.cg_free_count == free_count);
3401+
3402+ /* Check the LRU list: */
3403+ XTIndBlockPtr list_block, plist_block;
3404+
3405+ plist_block = NULL;
3406+ list_block = ind_cac_globals.cg_lru_block;
3407+ if (list_block) {
3408+ ASSERT_NS(ind_cac_globals.cg_mru_block != NULL);
3409+ ASSERT_NS(ind_cac_globals.cg_mru_block->cb_mr_used == NULL);
3410+ ASSERT_NS(list_block->cb_lr_used == NULL);
3411+ inuse_count = 0;
3412+ clean_count = 0;
3413+ while (list_block) {
3414+ inuse_count++;
3415+ ASSERT_NS(list_block->cb_state == IDX_CAC_BLOCK_DIRTY || list_block->cb_state == IDX_CAC_BLOCK_CLEAN);
3416+ if (list_block->cb_state == IDX_CAC_BLOCK_CLEAN)
3417+ clean_count++;
3418+ ASSERT_NS(block != list_block);
3419+ ASSERT_NS(list_block->cb_lr_used == plist_block);
3420+ plist_block = list_block;
3421+ list_block = list_block->cb_mr_used;
3422+ }
3423+ ASSERT_NS(ind_cac_globals.cg_mru_block == plist_block);
3424+ }
3425+ else {
3426+ inuse_count = 0;
3427+ clean_count = 0;
3428+ ASSERT_NS(ind_cac_globals.cg_mru_block == NULL);
3429+ }
3430+
3431+#ifdef DEBUG_CHECK_IND_CACHE
3432+ ASSERT_NS(free_count + inuse_count + ind_cac_globals.cg_reserved_by_ots + ind_cac_globals.cg_read_count == ind_cac_globals.cg_block_count);
3433+#endif
3434+ xt_unlock_mutex_ns(&ind_cac_globals.cg_lock);
3435+ if (check_count) {
3436+ /* We have just flushed, check how much is now free/clean. */
3437+ if (free_count + clean_count < 10) {
3438+ /* This could be a problem: */
3439+ printf("Cache very low!\n");
3440+ }
3441+ }
3442+}
3443+
3444+#ifdef XXXXDEBUG
3445+static void ind_cac_check_on_dirty_list(DcSegmentPtr seg, XTIndBlockPtr block)
3446+{
3447+ XTIndBlockPtr list_block, plist_block;
3448+ xtBool found = FALSE;
3449+
3450+ plist_block = NULL;
3451+ list_block = seg->cs_dirty_list[block->cb_file_id % XT_INDEX_CACHE_FILE_SLOTS];
3452+ while (list_block) {
3453+ ASSERT_NS(list_block->cb_state == IDX_CAC_BLOCK_DIRTY);
3454+ ASSERT_NS(list_block->cb_dirty_prev == plist_block);
3455+ if (list_block == block)
3456+ found = TRUE;
3457+ plist_block = list_block;
3458+ list_block = list_block->cb_dirty_next;
3459+ }
3460+ ASSERT_NS(found);
3461+}
3462+
3463+static void ind_cac_check_dirty_list(DcSegmentPtr seg, XTIndBlockPtr block)
3464+{
3465+ XTIndBlockPtr list_block, plist_block;
3466+
3467+ for (u_int j=0; j<XT_INDEX_CACHE_FILE_SLOTS; j++) {
3468+ plist_block = NULL;
3469+ list_block = seg->cs_dirty_list[j];
3470+ while (list_block) {
3471+ ASSERT_NS(list_block->cb_state == IDX_CAC_BLOCK_DIRTY);
3472+ ASSERT_NS(block != list_block);
3473+ ASSERT_NS(list_block->cb_dirty_prev == plist_block);
3474+ plist_block = list_block;
3475+ list_block = list_block->cb_dirty_next;
3476+ }
3477+ }
3478+}
3479+
3480+#endif
3481+
3482+/*
3483+ * -----------------------------------------------------------------------
3484+ * FREEING INDEX CACHE
3485+ */
3486+
3487+/*
3488+ * This function returns TRUE if the block is freed.
3489+ * This function returns FALSE if the block cannot be found, or the
3490+ * block is not clean.
3491+ *
3492+ * We also return FALSE if we cannot copy the block to the handle
3493+ * (if this is required). This will be due to out-of-memory!
3494+ */
3495+static xtBool ind_free_block(XTOpenTablePtr ot, XTIndBlockPtr block)
3496+{
3497+ XTIndBlockPtr xblock, pxblock;
3498+ u_int hash_idx;
3499+ u_int file_id;
3500+ xtIndexNodeID address;
3501+ DcSegmentPtr seg;
3502+
3503+#ifdef DEBUG_CHECK_IND_CACHE
3504+ xt_ind_check_cache(NULL);
3505+#endif
3506+ file_id = block->cb_file_id;
3507+ address = block->cb_address;
3508+
3509+ hash_idx = XT_NODE_ID(address) + (file_id * 223);
3510+ seg = &ind_cac_globals.cg_segment[hash_idx & IDX_CAC_SEGMENT_MASK];
3511+ hash_idx = (hash_idx >> XT_INDEX_CACHE_SEGMENT_SHIFTS) % ind_cac_globals.cg_hash_size;
3512+
3513+ IDX_CAC_WRITE_LOCK(seg, ot->ot_thread);
3514+
3515+ pxblock = NULL;
3516+ xblock = seg->cs_hash_table[hash_idx];
3517+ while (xblock) {
3518+ if (block == xblock) {
3519+ /* Found the block... */
3520+ XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id);
3521+ if (block->cb_state != IDX_CAC_BLOCK_CLEAN) {
3522+ /* This block cannot be freed: */
3523+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3524+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3525+#ifdef DEBUG_CHECK_IND_CACHE
3526+ xt_ind_check_cache(NULL);
3527+#endif
3528+ return FALSE;
3529+ }
3530+
3531+ goto free_the_block;
3532+ }
3533+ pxblock = xblock;
3534+ xblock = xblock->cb_next;
3535+ }
3536+
3537+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3538+
3539+ /* Not found (this can happen, if block was freed by another thread) */
3540+#ifdef DEBUG_CHECK_IND_CACHE
3541+ xt_ind_check_cache(NULL);
3542+#endif
3543+ return FALSE;
3544+
3545+ free_the_block:
3546+
3547+ /* If the block is referenced by a handle, then we
3548+ * have to copy the data to the handle before we
3549+ * free the page:
3550+ */
3551+ /* {HANDLE-COUNT-USAGE}
3552+ * This access is safe because:
3553+ *
3554+ * We have an Xlock on the cache block, which excludes
3555+ * all other writers that want to change the cache block
3556+ * and also all readers of the cache block, because
3557+ * they all have at least an Slock on the cache block.
3558+ */
3559+ if (block->cb_handle_count) {
3560+ XTIndReferenceRec iref;
3561+
3562+ iref.ir_xlock = TRUE;
3563+ iref.ir_updated = FALSE;
3564+ iref.ir_block = block;
3565+ iref.ir_branch = (XTIdxBranchDPtr) block->cb_data;
3566+ if (!xt_ind_copy_on_write(&iref)) {
3567+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3568+ return FALSE;
3569+ }
3570+ }
3571+
3572+ /* Block is clean, remove from the hash table: */
3573+ if (pxblock)
3574+ pxblock->cb_next = block->cb_next;
3575+ else
3576+ seg->cs_hash_table[hash_idx] = block->cb_next;
3577+
3578+ xt_lock_mutex_ns(&ind_cac_globals.cg_lock);
3579+
3580+ /* Remove from the MRU list: */
3581+ if (ind_cac_globals.cg_lru_block == block)
3582+ ind_cac_globals.cg_lru_block = block->cb_mr_used;
3583+ if (ind_cac_globals.cg_mru_block == block)
3584+ ind_cac_globals.cg_mru_block = block->cb_lr_used;
3585+
3586+ /* Note, I am updating blocks for which I have no lock
3587+ * here. But I think this is OK because I have a lock
3588+ * for the MRU list.
3589+ */
3590+ if (block->cb_lr_used)
3591+ block->cb_lr_used->cb_mr_used = block->cb_mr_used;
3592+ if (block->cb_mr_used)
3593+ block->cb_mr_used->cb_lr_used = block->cb_lr_used;
3594+
3595+ /* The block is now free: */
3596+ block->cb_next = ind_cac_globals.cg_free_list;
3597+ ind_cac_globals.cg_free_list = block;
3598+ ind_cac_globals.cg_free_count++;
3599+ block->cb_state = IDX_CAC_BLOCK_FREE;
3600+ IDX_TRACE("%d- f%x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(block->cb_data));
3601+
3602+ /* Unlock BEFORE the block is reused! */
3603+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3604+
3605+ xt_unlock_mutex_ns(&ind_cac_globals.cg_lock);
3606+
3607+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3608+
3609+#ifdef DEBUG_CHECK_IND_CACHE
3610+ xt_ind_check_cache(NULL);
3611+#endif
3612+ return TRUE;
3613+}
3614+
3615+#define IND_CACHE_MAX_BLOCKS_TO_FREE 100
3616+
3617+/*
3618+ * Return the number of blocks freed.
3619+ *
3620+ * The idea is to grab a list of blocks to free.
3621+ * The list consists of the LRU blocks that are
3622+ * clean.
3623+ *
3624+ * Free as many as possible (up to max of blocks_required)
3625+ * from the list, even if LRU position has changed
3626+ * (or we have a race if there are too few blocks).
3627+ * However, if the block cannot be found, or is dirty
3628+ * we must skip it.
3629+ *
3630+ * Repeat until we find no blocks for the list, or
3631+ * we have freed 'blocks_required'.
3632+ *
3633+ * 'not_this' is a block that must not be freed because
3634+ * it is locked by the calling thread!
3635+ */
3636+static u_int ind_cac_free_lru_blocks(XTOpenTablePtr ot, u_int blocks_required, XTIdxBranchDPtr not_this)
3637+{
3638+ register DcGlobalsRec *dcg = &ind_cac_globals;
3639+ XTIndBlockPtr to_free[IND_CACHE_MAX_BLOCKS_TO_FREE];
3640+ int count;
3641+ XTIndBlockPtr block;
3642+ u_int blocks_freed = 0;
3643+ XTIndBlockPtr locked_block;
3644+
3645+#ifdef XT_USE_DIRECT_IO_ON_INDEX
3646+#error This will not work!
3647+#endif
3648+ locked_block = (XTIndBlockPtr) ((xtWord1 *) not_this - offsetof(XTIndBlockRec, cb_data));
3649+
3650+ retry:
3651+ xt_lock_mutex_ns(&ind_cac_globals.cg_lock);
3652+ block = dcg->cg_lru_block;
3653+ count = 0;
3654+ while (block && count < IND_CACHE_MAX_BLOCKS_TO_FREE) {
3655+ if (block != locked_block && block->cb_state == IDX_CAC_BLOCK_CLEAN) {
3656+ to_free[count] = block;
3657+ count++;
3658+ }
3659+ block = block->cb_mr_used;
3660+ }
3661+ xt_unlock_mutex_ns(&ind_cac_globals.cg_lock);
3662+
3663+ if (!count)
3664+ return blocks_freed;
3665+
3666+ for (int i=0; i<count; i++) {
3667+ if (ind_free_block(ot, to_free[i]))
3668+ blocks_freed++;
3669+ if (blocks_freed >= blocks_required &&
3670+ ind_cac_globals.cg_free_count >= ind_cac_globals.cg_max_free + blocks_required)
3671+ return blocks_freed;
3672+ }
3673+
3674+ goto retry;
3675+}
3676+
3677+/*
3678+ * -----------------------------------------------------------------------
3679+ * MAIN CACHE FUNCTIONS
3680+ */
3681+
3682+/*
3683+ * Fetch the block. Note, if we are about to write the block
3684+ * then there is no need to read it from disk!
3685+ */
3686+static XTIndBlockPtr ind_cac_fetch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, DcSegmentPtr *ret_seg, xtBool read_data)
3687+{
3688+ register XTOpenFilePtr file = ot->ot_ind_file;
3689+ register XTIndBlockPtr block, new_block;
3690+ register DcSegmentPtr seg;
3691+ register u_int hash_idx;
3692+ register DcGlobalsRec *dcg = &ind_cac_globals;
3693+ size_t red_size;
3694+
3695+#ifdef DEBUG_CHECK_IND_CACHE
3696+ xt_ind_check_cache(NULL);
3697+#endif
3698+ /* Address, plus file ID multiplied by my favorite prime number! */
3699+ hash_idx = XT_NODE_ID(address) + (file->fr_id * 223);
3700+ seg = &dcg->cg_segment[hash_idx & IDX_CAC_SEGMENT_MASK];
3701+ hash_idx = (hash_idx >> XT_INDEX_CACHE_SEGMENT_SHIFTS) % dcg->cg_hash_size;
3702+
3703+ IDX_CAC_READ_LOCK(seg, ot->ot_thread);
3704+ block = seg->cs_hash_table[hash_idx];
3705+ while (block) {
3706+ if (XT_NODE_ID(block->cb_address) == XT_NODE_ID(address) && block->cb_file_id == file->fr_id) {
3707+ ASSERT_NS(block->cb_state != IDX_CAC_BLOCK_FREE);
3708+
3709+ /* Check how recently this page has been used: */
3710+ if (XT_TIME_DIFF(block->cb_ru_time, dcg->cg_ru_now) > (dcg->cg_block_count >> 1)) {
3711+ xt_lock_mutex_ns(&dcg->cg_lock);
3712+
3713+ /* Move to the front of the MRU list: */
3714+ block->cb_ru_time = ++dcg->cg_ru_now;
3715+ if (dcg->cg_mru_block != block) {
3716+ /* Remove from the MRU list: */
3717+ if (dcg->cg_lru_block == block)
3718+ dcg->cg_lru_block = block->cb_mr_used;
3719+ if (block->cb_lr_used)
3720+ block->cb_lr_used->cb_mr_used = block->cb_mr_used;
3721+ if (block->cb_mr_used)
3722+ block->cb_mr_used->cb_lr_used = block->cb_lr_used;
3723+
3724+ /* Make the block the most recently used: */
3725+ if ((block->cb_lr_used = dcg->cg_mru_block))
3726+ dcg->cg_mru_block->cb_mr_used = block;
3727+ block->cb_mr_used = NULL;
3728+ dcg->cg_mru_block = block;
3729+ if (!dcg->cg_lru_block)
3730+ dcg->cg_lru_block = block;
3731+ }
3732+
3733+ xt_unlock_mutex_ns(&dcg->cg_lock);
3734+ }
3735+
3736+ *ret_seg = seg;
3737+#ifdef DEBUG_CHECK_IND_CACHE
3738+ xt_ind_check_cache(NULL);
3739+#endif
3740+ ot->ot_thread->st_statistics.st_ind_cache_hit++;
3741+ return block;
3742+ }
3743+ block = block->cb_next;
3744+ }
3745+
3746+ /* Block not found... */
3747+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3748+
3749+ /* Check the open table reserve list first: */
3750+ if ((new_block = ot->ot_ind_res_bufs)) {
3751+ ot->ot_ind_res_bufs = new_block->cb_next;
3752+ ot->ot_ind_res_count--;
3753+#ifdef DEBUG_CHECK_IND_CACHE
3754+ xt_lock_mutex_ns(&dcg->cg_lock);
3755+ dcg->cg_reserved_by_ots--;
3756+ dcg->cg_read_count++;
3757+ xt_unlock_mutex_ns(&dcg->cg_lock);
3758+#endif
3759+ goto use_free_block;
3760+ }
3761+
3762+ free_some_blocks:
3763+ if (!dcg->cg_free_list) {
3764+ if (!ind_cac_free_lru_blocks(ot, 1, NULL)) {
3765+ if (!dcg->cg_free_list) {
3766+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_NO_INDEX_CACHE);
3767+#ifdef DEBUG_CHECK_IND_CACHE
3768+ xt_ind_check_cache(NULL);
3769+#endif
3770+ return NULL;
3771+ }
3772+ }
3773+ }
3774+
3775+ /* Get a free block: */
3776+ xt_lock_mutex_ns(&dcg->cg_lock);
3777+ if (!(new_block = dcg->cg_free_list)) {
3778+ xt_unlock_mutex_ns(&dcg->cg_lock);
3779+ goto free_some_blocks;
3780+ }
3781+ ASSERT_NS(new_block->cb_state == IDX_CAC_BLOCK_FREE);
3782+ dcg->cg_free_list = new_block->cb_next;
3783+ dcg->cg_free_count--;
3784+#ifdef DEBUG_CHECK_IND_CACHE
3785+ dcg->cg_read_count++;
3786+#endif
3787+ xt_unlock_mutex_ns(&dcg->cg_lock);
3788+
3789+ use_free_block:
3790+ new_block->cb_address = address;
3791+ new_block->cb_file_id = file->fr_id;
3792+ new_block->cb_state = IDX_CAC_BLOCK_CLEAN;
3793+ new_block->cb_handle_count = 0;
3794+ new_block->cp_flush_seq = 0;
3795+ new_block->cp_del_count = 0;
3796+ new_block->cb_dirty_next = NULL;
3797+ new_block->cb_dirty_prev = NULL;
3798+
3799+ if (read_data) {
3800+ if (!xt_pread_file(file, xt_ind_node_to_offset(ot->ot_table, address), XT_INDEX_PAGE_SIZE, 0, new_block->cb_data, &red_size, &ot->ot_thread->st_statistics.st_ind, ot->ot_thread)) {
3801+ xt_lock_mutex_ns(&dcg->cg_lock);
3802+ new_block->cb_next = dcg->cg_free_list;
3803+ dcg->cg_free_list = new_block;
3804+ dcg->cg_free_count++;
3805+#ifdef DEBUG_CHECK_IND_CACHE
3806+ dcg->cg_read_count--;
3807+#endif
3808+ new_block->cb_state = IDX_CAC_BLOCK_FREE;
3809+ IDX_TRACE("%d- F%x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(new_block->cb_data));
3810+ xt_unlock_mutex_ns(&dcg->cg_lock);
3811+#ifdef DEBUG_CHECK_IND_CACHE
3812+ xt_ind_check_cache(NULL);
3813+#endif
3814+ return NULL;
3815+ }
3816+ IDX_TRACE("%d- R%x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(new_block->cb_data));
3817+ ot->ot_thread->st_statistics.st_ind_cache_miss++;
3818+ }
3819+ else
3820+ red_size = 0;
3821+ // PMC - I don't think this is required! memset(new_block->cb_data + red_size, 0, XT_INDEX_PAGE_SIZE - red_size);
3822+
3823+ IDX_CAC_WRITE_LOCK(seg, ot->ot_thread);
3824+ block = seg->cs_hash_table[hash_idx];
3825+ while (block) {
3826+ if (XT_NODE_ID(block->cb_address) == XT_NODE_ID(address) && block->cb_file_id == file->fr_id) {
3827+ /* Oops, someone else was faster! */
3828+ xt_lock_mutex_ns(&dcg->cg_lock);
3829+ new_block->cb_next = dcg->cg_free_list;
3830+ dcg->cg_free_list = new_block;
3831+ dcg->cg_free_count++;
3832+#ifdef DEBUG_CHECK_IND_CACHE
3833+ dcg->cg_read_count--;
3834+#endif
3835+ new_block->cb_state = IDX_CAC_BLOCK_FREE;
3836+ IDX_TRACE("%d- F%x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(new_block->cb_data));
3837+ xt_unlock_mutex_ns(&dcg->cg_lock);
3838+ goto done_ok;
3839+ }
3840+ block = block->cb_next;
3841+ }
3842+ block = new_block;
3843+
3844+ /* Make the block the most recently used: */
3845+ xt_lock_mutex_ns(&dcg->cg_lock);
3846+ block->cb_ru_time = ++dcg->cg_ru_now;
3847+ if ((block->cb_lr_used = dcg->cg_mru_block))
3848+ dcg->cg_mru_block->cb_mr_used = block;
3849+ block->cb_mr_used = NULL;
3850+ dcg->cg_mru_block = block;
3851+ if (!dcg->cg_lru_block)
3852+ dcg->cg_lru_block = block;
3853+#ifdef DEBUG_CHECK_IND_CACHE
3854+ dcg->cg_read_count--;
3855+#endif
3856+ xt_unlock_mutex_ns(&dcg->cg_lock);
3857+
3858+ /* {LAZY-DEL-INDEX-ITEMS}
3859+ * Conditionally count the number of deleted entries in the index:
3860+ * We do this before other threads can read the block.
3861+ */
3862+ if (ind->mi_lazy_delete && read_data)
3863+ xt_ind_count_deleted_items(ot->ot_table, ind, block);
3864+
3865+ /* Add to the hash table: */
3866+ block->cb_next = seg->cs_hash_table[hash_idx];
3867+ seg->cs_hash_table[hash_idx] = block;
3868+
3869+ done_ok:
3870+ *ret_seg = seg;
3871+#ifdef DEBUG_CHECK_IND_CACHE
3872+ xt_ind_check_cache(NULL);
3873+#endif
3874+ return block;
3875+}
3876+
3877+static xtBool ind_cac_get(XTOpenTablePtr ot, xtIndexNodeID address, DcSegmentPtr *ret_seg, XTIndBlockPtr *ret_block)
3878+{
3879+ register XTOpenFilePtr file = ot->ot_ind_file;
3880+ register XTIndBlockPtr block;
3881+ register DcSegmentPtr seg;
3882+ register u_int hash_idx;
3883+ register DcGlobalsRec *dcg = &ind_cac_globals;
3884+
3885+ hash_idx = XT_NODE_ID(address) + (file->fr_id * 223);
3886+ seg = &dcg->cg_segment[hash_idx & IDX_CAC_SEGMENT_MASK];
3887+ hash_idx = (hash_idx >> XT_INDEX_CACHE_SEGMENT_SHIFTS) % dcg->cg_hash_size;
3888+
3889+ IDX_CAC_READ_LOCK(seg, ot->ot_thread);
3890+ block = seg->cs_hash_table[hash_idx];
3891+ while (block) {
3892+ if (XT_NODE_ID(block->cb_address) == XT_NODE_ID(address) && block->cb_file_id == file->fr_id) {
3893+ ASSERT_NS(block->cb_state != IDX_CAC_BLOCK_FREE);
3894+
3895+ *ret_seg = seg;
3896+ *ret_block = block;
3897+ return OK;
3898+ }
3899+ block = block->cb_next;
3900+ }
3901+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3902+
3903+ /* Block not found: */
3904+ *ret_seg = NULL;
3905+ *ret_block = NULL;
3906+ return OK;
3907+}
3908+
3909+xtPublic xtBool xt_ind_write(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, size_t size, xtWord1 *data)
3910+{
3911+ XTIndBlockPtr block;
3912+ DcSegmentPtr seg;
3913+
3914+ if (!(block = ind_cac_fetch(ot, ind, address, &seg, FALSE)))
3915+ return FAILED;
3916+
3917+ XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id);
3918+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY);
3919+ memcpy(block->cb_data, data, size);
3920+ block->cp_flush_seq = ot->ot_table->tab_ind_flush_seq;
3921+ if (block->cb_state != IDX_CAC_BLOCK_DIRTY) {
3922+ TRACK_BLOCK_WRITE(offset);
3923+ xt_spinlock_lock(&ind->mi_dirty_lock);
3924+ if ((block->cb_dirty_next = ind->mi_dirty_list))
3925+ ind->mi_dirty_list->cb_dirty_prev = block;
3926+ block->cb_dirty_prev = NULL;
3927+ ind->mi_dirty_list = block;
3928+ ind->mi_dirty_blocks++;
3929+ xt_spinlock_unlock(&ind->mi_dirty_lock);
3930+ block->cb_state = IDX_CAC_BLOCK_DIRTY;
3931+ }
3932+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3933+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3934+#ifdef XT_TRACK_INDEX_UPDATES
3935+ ot->ot_ind_changed++;
3936+#endif
3937+ return OK;
3938+}
3939+
3940+/*
3941+ * Update the cache, if in RAM.
3942+ */
3943+xtPublic xtBool xt_ind_write_cache(XTOpenTablePtr ot, xtIndexNodeID address, size_t size, xtWord1 *data)
3944+{
3945+ XTIndBlockPtr block;
3946+ DcSegmentPtr seg;
3947+
3948+ if (!ind_cac_get(ot, address, &seg, &block))
3949+ return FAILED;
3950+
3951+ if (block) {
3952+ XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id);
3953+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY);
3954+ memcpy(block->cb_data, data, size);
3955+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3956+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3957+ }
3958+
3959+ return OK;
3960+}
3961+
3962+xtPublic xtBool xt_ind_clean(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address)
3963+{
3964+ XTIndBlockPtr block;
3965+ DcSegmentPtr seg;
3966+
3967+ if (!ind_cac_get(ot, address, &seg, &block))
3968+ return FAILED;
3969+ if (block) {
3970+ XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id);
3971+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY);
3972+
3973+ if (block->cb_state == IDX_CAC_BLOCK_DIRTY) {
3974+ /* Take the block off the dirty list: */
3975+ xt_spinlock_lock(&ind->mi_dirty_lock);
3976+ if (block->cb_dirty_next)
3977+ block->cb_dirty_next->cb_dirty_prev = block->cb_dirty_prev;
3978+ if (block->cb_dirty_prev)
3979+ block->cb_dirty_prev->cb_dirty_next = block->cb_dirty_next;
3980+ if (ind->mi_dirty_list == block)
3981+ ind->mi_dirty_list = block->cb_dirty_next;
3982+ ind->mi_dirty_blocks--;
3983+ xt_spinlock_unlock(&ind->mi_dirty_lock);
3984+ block->cb_state = IDX_CAC_BLOCK_CLEAN;
3985+ }
3986+ XT_IPAGE_UNLOCK(&block->cb_lock, TRUE);
3987+
3988+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
3989+ }
3990+
3991+ return OK;
3992+}
3993+
3994+xtPublic xtBool xt_ind_read_bytes(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, size_t size, xtWord1 *data)
3995+{
3996+ XTIndBlockPtr block;
3997+ DcSegmentPtr seg;
3998+
3999+ if (!(block = ind_cac_fetch(ot, ind, address, &seg, TRUE)))
4000+ return FAILED;
4001+
4002+ XT_IPAGE_READ_LOCK(&block->cb_lock);
4003+ memcpy(data, block->cb_data, size);
4004+ XT_IPAGE_UNLOCK(&block->cb_lock, FALSE);
4005+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
4006+ return OK;
4007+}
4008+
4009+xtPublic xtBool xt_ind_fetch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, XTPageLockType ltype, XTIndReferencePtr iref)
4010+{
4011+ register XTIndBlockPtr block;
4012+ DcSegmentPtr seg;
4013+ xtWord2 branch_size;
4014+ xtBool xlock = FALSE;
4015+
4016+#ifdef DEBUG
4017+ ASSERT_NS(iref->ir_xlock == 2);
4018+ ASSERT_NS(iref->ir_updated == 2);
4019+#endif
4020+ if (!(block = ind_cac_fetch(ot, ind, address, &seg, TRUE)))
4021+ return FAILED;
4022+
4023+ branch_size = XT_GET_DISK_2(((XTIdxBranchDPtr) block->cb_data)->tb_size_2);
4024+ if (XT_GET_INDEX_BLOCK_LEN(branch_size) < 2 || XT_GET_INDEX_BLOCK_LEN(branch_size) > XT_INDEX_PAGE_SIZE) {
4025+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
4026+ xt_register_taberr(XT_REG_CONTEXT, XT_ERR_INDEX_CORRUPTED, ot->ot_table->tab_name);
4027+ return FAILED;
4028+ }
4029+
4030+ switch (ltype) {
4031+ case XT_LOCK_READ:
4032+ break;
4033+ case XT_LOCK_WRITE:
4034+ xlock = TRUE;
4035+ break;
4036+ case XT_XLOCK_LEAF:
4037+ if (!XT_IS_NODE(branch_size))
4038+ xlock = TRUE;
4039+ break;
4040+ case XT_XLOCK_DEL_LEAF:
4041+ if (!XT_IS_NODE(branch_size)) {
4042+ if (ot->ot_table->tab_dic.dic_no_lazy_delete)
4043+ xlock = TRUE;
4044+ else {
4045+ /*
4046+ * {LAZY-DEL-INDEX-ITEMS}
4047+ *
4048+ * We are fetching a page in order to delete from it.
4049+ * We decide here whether we plan to do a lazy delete,
4050+ * or whether we plan to compact the node.
4051+ *
4052+ * A lazy delete just requires a shared lock.
4053+ *
4054+ */
4055+ if (ind->mi_lazy_delete) {
4056+ /* If the number of deleted items is greater than
4057+ * half of the number of items that can fit in the
4058+ * page, then we will compact the node.
4059+ */
4060+ if (!xt_idx_lazy_delete_on_leaf(ind, block, XT_GET_INDEX_BLOCK_LEN(branch_size)))
4061+ xlock = TRUE;
4062+ }
4063+ else
4064+ xlock = TRUE;
4065+ }
4066+ }
4067+ break;
4068+ }
4069+
4070+ if ((iref->ir_xlock = xlock))
4071+ XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id);
4072+ else
4073+ XT_IPAGE_READ_LOCK(&block->cb_lock);
4074+
4075+ IDX_CAC_UNLOCK(seg, ot->ot_thread);
4076+
4077+ /* {DIRECT-IO}
4078+ * Direct I/O requires that the buffer is 512 byte aligned.
4079+ * To do this, cb_data is turned into a pointer, instead
4080+ * of an array.
4081+ * As a result, we need to pass a pointer to both the
4082+ * cache block and the cache block data:
4083+ */
4084+ iref->ir_updated = FALSE;
4085+ iref->ir_block = block;
4086+ iref->ir_branch = (XTIdxBranchDPtr) block->cb_data;
4087+ return OK;
4088+}
4089+
4090+xtPublic xtBool xt_ind_release(XTOpenTablePtr ot, XTIndexPtr ind, XTPageUnlockType XT_NDEBUG_UNUSED(utype), XTIndReferencePtr iref)
4091+{
4092+ register XTIndBlockPtr block;
4093+
4094+ block = iref->ir_block;
4095+
4096+#ifdef DEBUG
4097+ ASSERT_NS(iref->ir_xlock != 2);
4098+ ASSERT_NS(iref->ir_updated != 2);
4099+ if (iref->ir_updated)
4100+ ASSERT_NS(utype == XT_UNLOCK_R_UPDATE || utype == XT_UNLOCK_W_UPDATE);
4101+ else
4102+ ASSERT_NS(utype == XT_UNLOCK_READ || utype == XT_UNLOCK_WRITE);
4103+ if (iref->ir_xlock)
4104+ ASSERT_NS(utype == XT_UNLOCK_WRITE || utype == XT_UNLOCK_W_UPDATE);
4105+ else
4106+ ASSERT_NS(utype == XT_UNLOCK_READ || utype == XT_UNLOCK_R_UPDATE);
4107+#endif
4108+ if (iref->ir_updated) {
4109+ /* The page was updated: */
4110+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY);
4111+ block->cp_flush_seq = ot->ot_table->tab_ind_flush_seq;
4112+ if (block->cb_state != IDX_CAC_BLOCK_DIRTY) {
4113+ TRACK_BLOCK_WRITE(offset);
4114+ xt_spinlock_lock(&ind->mi_dirty_lock);
4115+ if ((block->cb_dirty_next = ind->mi_dirty_list))
4116+ ind->mi_dirty_list->cb_dirty_prev = block;
4117+ block->cb_dirty_prev = NULL;
4118+ ind->mi_dirty_list = block;
4119+ ind->mi_dirty_blocks++;
4120+ xt_spinlock_unlock(&ind->mi_dirty_lock);
4121+ block->cb_state = IDX_CAC_BLOCK_DIRTY;
4122+ }
4123+ }
4124+
4125+ XT_IPAGE_UNLOCK(&block->cb_lock, iref->ir_xlock);
4126+#ifdef DEBUG
4127+ iref->ir_xlock = 2;
4128+ iref->ir_updated = 2;
4129+#endif
4130+ return OK;
4131+}
4132+
4133+xtPublic xtBool xt_ind_reserve(XTOpenTablePtr ot, u_int count, XTIdxBranchDPtr not_this)
4134+{
4135+ register XTIndBlockPtr block;
4136+ register DcGlobalsRec *dcg = &ind_cac_globals;
4137+
4138+#ifdef XT_TRACK_INDEX_UPDATES
4139+ ot->ot_ind_reserved = count;
4140+ ot->ot_ind_reads = 0;
4141+#endif
4142+#ifdef DEBUG_CHECK_IND_CACHE
4143+ xt_ind_check_cache(NULL);
4144+#endif
4145+ while (ot->ot_ind_res_count < count) {
4146+ if (!dcg->cg_free_list) {
4147+ if (!ind_cac_free_lru_blocks(ot, count - ot->ot_ind_res_count, not_this)) {
4148+ if (!dcg->cg_free_list) {
4149+ xt_ind_free_reserved(ot);
4150+ xt_register_xterr(XT_REG_CONTEXT, XT_ERR_NO_INDEX_CACHE);
4151+#ifdef DEBUG_CHECK_IND_CACHE
4152+ xt_ind_check_cache(NULL);
4153+#endif
4154+ return FAILED;
4155+ }
4156+ }
4157+ }
4158+
4159+ /* Get a free block: */
4160+ xt_lock_mutex_ns(&dcg->cg_lock);
4161+ while (ot->ot_ind_res_count < count && (block = dcg->cg_free_list)) {
4162+ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_FREE);
4163+ dcg->cg_free_list = block->cb_next;
4164+ dcg->cg_free_count--;
4165+ block->cb_next = ot->ot_ind_res_bufs;
4166+ ot->ot_ind_res_bufs = block;
4167+ ot->ot_ind_res_count++;
4168+#ifdef DEBUG_CHECK_IND_CACHE
4169+ dcg->cg_reserved_by_ots++;
4170+#endif
4171+ }
4172+ xt_unlock_mutex_ns(&dcg->cg_lock);
4173+ }
4174+#ifdef DEBUG_CHECK_IND_CACHE
4175+ xt_ind_check_cache(NULL);
4176+#endif
4177+ return OK;
4178+}
4179+
4180+xtPublic void xt_ind_free_reserved(XTOpenTablePtr ot)
4181+{
4182+#ifdef DEBUG_CHECK_IND_CACHE
4183+ xt_ind_check_cache(NULL);
4184+#endif
4185+ if (ot->ot_ind_res_bufs) {
4186+ register XTIndBlockPtr block, fblock;
4187+ register DcGlobalsRec *dcg = &ind_cac_globals;
4188+
4189+ xt_lock_mutex_ns(&dcg->cg_lock);
4190+ block = ot->ot_ind_res_bufs;
4191+ while (block) {
4192+ fblock = block;
4193+ block = block->cb_next;
4194+
4195+ fblock->cb_next = dcg->cg_free_list;
4196+ dcg->cg_free_list = fblock;
4197+#ifdef DEBUG_CHECK_IND_CACHE
4198+ dcg->cg_reserved_by_ots--;
4199+#endif
4200+ dcg->cg_free_count++;
4201+ }
4202+ xt_unlock_mutex_ns(&dcg->cg_lock);
4203+ ot->ot_ind_res_bufs = NULL;
4204+ ot->ot_ind_res_count = 0;
4205+ }
4206+#ifdef DEBUG_CHECK_IND_CACHE
4207+ xt_ind_check_cache(NULL);
4208+#endif
4209+}
4210+
4211+xtPublic void xt_ind_unreserve(XTOpenTablePtr ot)
4212+{
4213+ if (!ind_cac_globals.cg_free_list)
4214+ xt_ind_free_reserved(ot);
4215+}
4216+
4217
4218=== added file 'plugin/pbxt/src/cache_xt.h'
4219--- plugin/pbxt/src/cache_xt.h 1970-01-01 00:00:00 +0000
4220+++ plugin/pbxt/src/cache_xt.h 2010-04-01 14:19:35 +0000
4221@@ -0,0 +1,188 @@
4222+/* Copyright (c) 2005 PrimeBase Technologies GmbH
4223+ *
4224+ * PrimeBase XT
4225+ *
4226+ * This program is free software; you can redistribute it and/or modify
4227+ * it under the terms of the GNU General Public License as published by
4228+ * the Free Software Foundation; either version 2 of the License, or
4229+ * (at your option) any later version.
4230+ *
4231+ * This program is distributed in the hope that it will be useful,
4232+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
4233+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4234+ * GNU General Public License for more details.
4235+ *
4236+ * You should have received a copy of the GNU General Public License
4237+ * along with this program; if not, write to the Free Software
4238+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
4239+ *
4240+ * 2005-05-24 Paul McCullagh
4241+ *
4242+ * H&G2JCtL
4243+ */
4244+#ifndef __xt_cache_h__
4245+#define __xt_cache_h__
4246+
4247+//#define XT_USE_MYSYS
4248+
4249+#include "filesys_xt.h"
4250+#include "index_xt.h"
4251+
4252+struct XTOpenTable;
4253+struct XTIdxReadBuffer;
4254+
4255+#ifdef DEBUG
4256+//#define XT_USE_CACHE_DEBUG_SIZES
4257+#endif
4258+
4259+#ifdef XT_USE_CACHE_DEBUG_SIZES
4260+#define XT_INDEX_CACHE_SEGMENT_SHIFTS 1
4261+#else
4262+#define XT_INDEX_CACHE_SEGMENT_SHIFTS 3
4263+#endif
4264+
4265+#define IDX_CAC_BLOCK_FREE 0
4266+#define IDX_CAC_BLOCK_CLEAN 1
4267+#define IDX_CAC_BLOCK_DIRTY 2
4268+
4269+#ifdef XT_NO_ATOMICS
4270+#define XT_IPAGE_USE_PTHREAD_RW
4271+#else
4272+//#define XT_IPAGE_USE_ATOMIC_RW
4273+#define XT_IPAGE_USE_SPINXSLOCK
4274+//#define XT_IPAGE_USE_SKEW_RW
4275+#endif
4276+
4277+#ifdef XT_IPAGE_USE_ATOMIC_RW
4278+#define XT_IPAGE_LOCK_TYPE XTAtomicRWLockRec
4279+#define XT_IPAGE_INIT_LOCK(s, i) xt_atomicrwlock_init_with_autoname(s, i)
4280+#define XT_IPAGE_FREE_LOCK(s, i) xt_atomicrwlock_free(s, i)
4281+#define XT_IPAGE_READ_LOCK(i) xt_atomicrwlock_slock(i)
4282+#define XT_IPAGE_WRITE_LOCK(i, o) xt_atomicrwlock_xlock(i, o)
4283+#define XT_IPAGE_UNLOCK(i, x) xt_atomicrwlock_unlock(i, x)
4284+#elif defined(XT_IPAGE_USE_PTHREAD_RW)
4285+#define XT_IPAGE_LOCK_TYPE xt_rwlock_type
4286+#define XT_IPAGE_INIT_LOCK(s, i) xt_init_rwlock(s, i)
4287+#define XT_IPAGE_FREE_LOCK(s, i) xt_free_rwlock(i)
4288+#define XT_IPAGE_READ_LOCK(i) xt_slock_rwlock_ns(i)
4289+#define XT_IPAGE_WRITE_LOCK(i, s) xt_xlock_rwlock_ns(i)
4290+#define XT_IPAGE_UNLOCK(i, x) xt_unlock_rwlock_ns(i)
4291+#elif defined(XT_IPAGE_USE_SPINXSLOCK)
4292+#define XT_IPAGE_LOCK_TYPE XTSpinXSLockRec
4293+#define XT_IPAGE_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i)
4294+#define XT_IPAGE_FREE_LOCK(s, i) xt_spinxslock_free(s, i)
4295+#define XT_IPAGE_READ_LOCK(i) xt_spinxslock_slock(i)
4296+#define XT_IPAGE_WRITE_LOCK(i, o) xt_spinxslock_xlock(i, o)
4297+#define XT_IPAGE_UNLOCK(i, x) xt_spinxslock_unlock(i, x)
4298+#else // XT_IPAGE_USE_SKEW_RW
4299+#define XT_IPAGE_LOCK_TYPE XTSkewRWLockRec
4300+#define XT_IPAGE_INIT_LOCK(s, i) xt_skewrwlock_init_with_autoname(s, i)
4301+#define XT_IPAGE_FREE_LOCK(s, i) xt_skewrwlock_free(s, i)
4302+#define XT_IPAGE_READ_LOCK(i) xt_skewrwlock_slock(i)
4303+#define XT_IPAGE_WRITE_LOCK(i, o) xt_skewrwlock_xlock(i, o)
4304+#define XT_IPAGE_UNLOCK(i, x) xt_skewrwlock_unlock(i, x)
4305+#endif
4306+
4307+enum XTPageLockType { XT_LOCK_READ, XT_LOCK_WRITE, XT_XLOCK_LEAF, XT_XLOCK_DEL_LEAF };
4308+enum XTPageUnlockType { XT_UNLOCK_NONE, XT_UNLOCK_READ, XT_UNLOCK_WRITE, XT_UNLOCK_R_UPDATE, XT_UNLOCK_W_UPDATE };
4309+
4310+/* A block is X locked if it is being changed or freed.
4311+ * A block is S locked if it is being read.
4312+ */
4313+typedef struct XTIndBlock {
4314+ xtIndexNodeID cb_address; /* The block address. */
4315+ u_int cb_file_id; /* The file id of the block. */
4316+ /* This is protected by cs_lock */
4317+ struct XTIndBlock *cb_next; /* Pointer to next block on hash list, or next free block on free list. */
4318+ /* This is protected by mi_dirty_lock */
4319+ struct XTIndBlock *cb_dirty_next; /* Double link for dirty blocks, next pointer. */
4320+ struct XTIndBlock *cb_dirty_prev; /* Double link for dirty blocks, previous pointer. */
4321+ /* This is protected by cg_lock */
4322+ xtWord4 cb_ru_time; /* If this is in the top 1/4 don't change position in MRU list. */
4323+ struct XTIndBlock *cb_mr_used; /* More recently used blocks. */
4324+ struct XTIndBlock *cb_lr_used; /* Less recently used blocks. */
4325+ /* Protected by cb_lock: */
4326+ XT_IPAGE_LOCK_TYPE cb_lock;
4327+ xtWord1 cb_state; /* Block status. */
4328+ xtWord2 cb_handle_count; /* Non-zero if this page is referenced by a handle. */
4329+ xtWord2 cp_flush_seq;
4330+ xtWord2 cp_del_count; /* Number of deleted entries. */
4331+#ifdef XT_USE_DIRECT_IO_ON_INDEX
4332+ xtWord1 *cb_data;
4333+#else
4334+ xtWord1 cb_data[XT_INDEX_PAGE_SIZE];
4335+#endif
4336+} XTIndBlockRec, *XTIndBlockPtr;
4337+
4338+typedef struct XTIndReference {
4339+ xtBool ir_xlock; /* Set to TRUE if the cache block is X locked. */
4340+ xtBool ir_updated; /* Set to TRUE if the cache block is updated. */
4341+ XTIndBlockPtr ir_block;
4342+ XTIdxBranchDPtr ir_branch;
4343+} XTIndReferenceRec, *XTIndReferencePtr;
4344+
4345+typedef struct XTIndFreeBlock {
4346+ XTDiskValue1 if_zero1_1; /* Must be set to zero. */
4347+ XTDiskValue1 if_zero2_1; /* Must be set to zero. */
4348+ XTDiskValue1 if_status_1;
4349+ XTDiskValue1 if_unused1_1;
4350+ XTDiskValue4 if_unused2_4;
4351+ XTDiskValue8 if_next_block_8;
4352+} XTIndFreeBlockRec, *XTIndFreeBlockPtr;
4353+
4354+typedef struct XTIndHandleBlock {
4355+ xtWord4 hb_ref_count;
4356+ struct XTIndHandleBlock *hb_next;
4357+ XTIdxBranchDRec hb_branch;
4358+} XTIndHandleBlockRec, *XTIndHandleBlockPtr;
4359+
4360+typedef struct XTIndHandle {
4361+ struct XTIndHandle *ih_next;
4362+ struct XTIndHandle *ih_prev;
4363+ XTSpinLockRec ih_lock;
4364+ xtIndexNodeID ih_address;
4365+ xtBool ih_cache_reference; /* True if this handle references the cache. */
4366+ union {
4367+ XTIndBlockPtr ih_cache_block;
4368+ XTIndHandleBlockPtr ih_handle_block;
4369+ } x;
4370+ XTIdxBranchDPtr ih_branch;
4371+} XTIndHandleRec, *XTIndHandlePtr;
4372+
4373+void xt_ind_init(XTThreadPtr self, size_t cache_size);
4374+void xt_ind_exit(XTThreadPtr self);
4375+
4376+xtInt8 xt_ind_get_usage();
4377+xtInt8 xt_ind_get_size();
4378+xtBool xt_ind_write(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset, size_t size, xtWord1 *data);
4379+xtBool xt_ind_write_cache(struct XTOpenTable *ot, xtIndexNodeID offset, size_t size, xtWord1 *data);
4380+xtBool xt_ind_clean(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset);
4381+xtBool xt_ind_read_bytes(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset, size_t size, xtWord1 *data);
4382+void xt_ind_check_cache(XTIndexPtr ind);
4383+xtBool xt_ind_reserve(struct XTOpenTable *ot, u_int count, XTIdxBranchDPtr not_this);
4384+void xt_ind_free_reserved(struct XTOpenTable *ot);
4385+void xt_ind_unreserve(struct XTOpenTable *ot);
4386+
4387+xtBool xt_ind_fetch(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID node, XTPageLockType ltype, XTIndReferencePtr iref);
4388+xtBool xt_ind_release(struct XTOpenTable *ot, XTIndexPtr ind, XTPageUnlockType utype, XTIndReferencePtr iref);
4389+
4390+void xt_ind_lock_handle(XTIndHandlePtr handle);
4391+void xt_ind_unlock_handle(XTIndHandlePtr handle);
4392+xtBool xt_ind_copy_on_write(XTIndReferencePtr iref);
4393+
4394+XTIndHandlePtr xt_ind_get_handle(struct XTOpenTable *ot, XTIndexPtr ind, XTIndReferencePtr iref);
4395+void xt_ind_release_handle(XTIndHandlePtr handle, xtBool have_lock, XTThreadPtr thread);
4396+
4397+#ifdef DEBUG
4398+//#define DEBUG_CHECK_IND_CACHE
4399+#endif
4400+
4401+//#define XT_TRACE_INDEX
4402+
4403+#ifdef XT_TRACE_INDEX
4404+#define IDX_TRACE(x, y, z) xt_trace(x, y, z)
4405+#else
4406+#define IDX_TRACE(x, y, z)
4407+#endif
4408+
4409+#endif
4410
4411=== added file 'plugin/pbxt/src/ccutils_xt.cc'
4412--- plugin/pbxt/src/ccutils_xt.cc 1970-01-01 00:00:00 +0000
4413+++ plugin/pbxt/src/ccutils_xt.cc 2010-04-01 14:19:35 +0000
4414@@ -0,0 +1,69 @@
4415+/* Copyright (c) 2005 PrimeBase Technologies GmbH
4416+ *
4417+ * PrimeBase XT
4418+ *
4419+ * This program is free software; you can redistribute it and/or modify
4420+ * it under the terms of the GNU General Public License as published by
4421+ * the Free Software Foundation; either version 2 of the License, or
4422+ * (at your option) any later version.
4423+ *
4424+ * This program is distributed in the hope that it will be useful,
4425+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
4426+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4427+ * GNU General Public License for more details.
4428+ *
4429+ * You should have received a copy of the GNU General Public License
4430+ * along with this program; if not, write to the Free Software
4431+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
4432+ *
4433+ * 2006-05-16 Paul McCullagh
4434+ *
4435+ * H&G2JCtL
4436+ *
4437+ * C++ Utilities
4438+ */
4439+
4440+#include "xt_config.h"
4441+
4442+#include "pthread_xt.h"
4443+#include "ccutils_xt.h"
4444+#include "bsearch_xt.h"
4445+
4446+static int ccu_compare_object(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b)
4447+{
4448+ XTObject *obj_ptr = (XTObject *) b;
4449+
4450+ return obj_ptr->compare(a);
4451+}
4452+
4453+void XTListImp::append(XTThreadPtr self, XTObject *info, void *key) {
4454+ size_t idx;
4455+
4456+ if (li_item_count == 0)
4457+ idx = 0;
4458+ else if (li_item_count == 1) {
4459+ int r;
4460+
4461+ if ((r = li_items[0]->compare(key)) == 0)
4462+ idx = 0;
4463+ else if (r < 0)
4464+ idx = 0;
4465+ else
4466+ idx = 1;
4467+ }
4468+ else {
4469+ xt_bsearch(self, key, li_items, li_item_count, sizeof(void *), &idx, NULL, ccu_compare_object);
4470+ }
4471+
4472+ if (!xt_realloc(NULL, (void **) &li_items, (li_item_count + 1) * sizeof(void *))) {
4473+ if (li_referenced)
4474+ info->release(self);
4475+ xt_throw_errno(XT_CONTEXT, XT_ENOMEM);
4476+ return;
4477+ }
4478+ memmove(&li_items[idx+1], &li_items[idx], (li_item_count-idx) * sizeof(void *));
4479+ li_items[idx] = info;
4480+ li_item_count++;
4481+}
4482+
4483+
4484
4485=== added file 'plugin/pbxt/src/ccutils_xt.h'
4486--- plugin/pbxt/src/ccutils_xt.h 1970-01-01 00:00:00 +0000
4487+++ plugin/pbxt/src/ccutils_xt.h 2010-04-01 14:19:35 +0000
4488@@ -0,0 +1,220 @@
4489+/* Copyright (c) 2005 PrimeBase Technologies GmbH
4490+ *
4491+ * PrimeBase XT
4492+ *
4493+ * This program is free software; you can redistribute it and/or modify
4494+ * it under the terms of the GNU General Public License as published by
4495+ * the Free Software Foundation; either version 2 of the License, or
4496+ * (at your option) any later version.
4497+ *
4498+ * This program is distributed in the hope that it will be useful,
4499+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
4500+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4501+ * GNU General Public License for more details.
4502+ *
4503+ * You should have received a copy of the GNU General Public License
4504+ * along with this program; if not, write to the Free Software
4505+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
4506+ *
4507+ * 2006-05-16 Paul McCullagh
4508+ *
4509+ * H&G2JCtL
4510+ *
4511+ * C++ Utilities
4512+ */
4513+
4514+#ifndef __ccutils_xt_h__
4515+#define __ccutils_xt_h__
4516+
4517+#include <errno.h>
4518+
4519+#include "xt_defs.h"
4520+#include "thread_xt.h"
4521+
4522+class XTObject
4523+{
4524+ private:
4525+ u_int o_refcnt;
4526+
4527+ public:
4528+ inline XTObject() { o_refcnt = 1; }
4529+
4530+ virtual ~XTObject() { }
4531+
4532+ inline void reference() {
4533+ o_refcnt++;
4534+ }
4535+
4536+ inline void release(XTThreadPtr self) {
4537+ ASSERT(o_refcnt > 0);
4538+ o_refcnt--;
4539+ if (o_refcnt == 0) {
4540+ finalize(self);
4541+ delete this;
4542+ }
4543+ }
4544+
4545+ virtual XTObject *factory(XTThreadPtr self) {
4546+ XTObject *new_obj;
4547+
4548+ if (!(new_obj = new XTObject()))
4549+ xt_throw_errno(XT_CONTEXT, XT_ENOMEM);
4550+ return new_obj;
4551+ }
4552+
4553+ virtual XTObject *clone(XTThreadPtr self) {
4554+ XTObject *new_obj;
4555+
4556+ new_obj = factory(self);
4557+ new_obj->init(self, this);
4558+ return new_obj;
4559+ }
4560+
4561+ virtual void init(XTThreadPtr self) { (void) self; }
4562+ virtual void init(XTThreadPtr self, XTObject *obj) { (void) obj; init(self); }
4563+ virtual void finalize(XTThreadPtr self) { (void) self; }
4564+ virtual int compare(const void *key) { (void) key; return -1; }
4565+};
4566+
4567+class XTListImp
4568+{
4569+ protected:
4570+ bool li_referenced;
4571+ u_int li_item_count;
4572+ XTObject **li_items;
4573+
4574+ public:
4575+ inline XTListImp() : li_referenced(true), li_item_count(0), li_items(NULL) { }
4576+
4577+ inline void setNonReferenced() { li_referenced = false; }
4578+
4579+ void append(XTThreadPtr self, XTObject *info) {
4580+ if (!xt_realloc(NULL, (void **) &li_items, (li_item_count + 1) * sizeof(void *))) {
4581+ if (li_referenced)
4582+ info->release(self);
4583+ xt_throw_errno(XT_CONTEXT, XT_ENOMEM);
4584+ return;
4585+ }
4586+ li_items[li_item_count] = info;
4587+ li_item_count++;
4588+ }
4589+
4590+ void insert(XTThreadPtr self, XTObject *info, u_int i) {
4591+ if (!xt_realloc(NULL, (void **) &li_items, (li_item_count + 1) * sizeof(void *))) {
4592+ if (li_referenced)
4593+ info->release(self);
4594+ xt_throw_errno(XT_CONTEXT, XT_ENOMEM);
4595+ return;
4596+ }
4597+ memmove(&li_items[i+1], &li_items[i], (li_item_count-i) * sizeof(XTObject *));
4598+ li_items[i] = info;
4599+ li_item_count++;
4600+ }
4601+
4602+ void addToFront(XTThreadPtr self, XTObject *info) {
4603+ insert(self, info, 0);
4604+ }
4605+
4606+ /* Will sort! */
4607+ void append(XTThreadPtr self, XTObject *info, void *key);
4608+
4609+ inline bool remove(XTObject *info) {
4610+ for (u_int i=0; i<li_item_count; i++) {
4611+ if (li_items[i] == info) {
4612+ li_item_count--;
4613+ memmove(&li_items[i], &li_items[i+1], (li_item_count - i) * sizeof(XTObject *));
4614+ return true;
4615+ }
4616+ }
4617+ return false;
4618+ }
4619+
4620+ inline bool remove(XTThreadPtr self, u_int i) {
4621+ XTObject *item;
4622+
4623+ if (i >= li_item_count)
4624+ return false;
4625+ item = li_items[i];
4626+ li_item_count--;
4627+ memmove(&li_items[i], &li_items[i+1], (li_item_count - i) * sizeof(void *));
4628+ if (li_referenced)
4629+ item->release(self);
4630+ return true;
4631+ }
4632+
4633+ inline XTObject *take(u_int i) {
4634+ XTObject *item;
4635+
4636+ if (i >= li_item_count)
4637+ return NULL;
4638+ item = li_items[i];
4639+ li_item_count--;
4640+ memmove(&li_items[i], &li_items[i+1], (li_item_count - i) * sizeof(void *));
4641+ return item;
4642+ }
4643+
4644+ inline u_int size() const { return li_item_count; }
4645+
4646+ inline void setEmpty(XTThreadPtr self) {
4647+ if (li_items)
4648+ xt_free(self, li_items);
4649+ li_item_count = 0;
4650+ li_items = NULL;
4651+ }
4652+
4653+ inline bool isEmpty() { return li_item_count == 0; }
4654+
4655+ inline XTObject *itemAt(u_int i) const {
4656+ if (i >= li_item_count)
4657+ return NULL;
4658+ return li_items[i];
4659+ }
4660+};
4661+
4662+
4663+template <class T> class XTList : public XTListImp
4664+{
4665+ public:
4666+ inline XTList() : XTListImp() { }
4667+
4668+ inline void append(XTThreadPtr self, T *a) { XTListImp::append(self, a); }
4669+ inline void insert(XTThreadPtr self, T *a, u_int i) { XTListImp::insert(self, a, i); }
4670+ inline void addToFront(XTThreadPtr self, T *a) { XTListImp::addToFront(self, a); }
4671+
4672+ inline bool remove(T *a) { return XTListImp::remove(a); }
4673+
4674+ inline bool remove(XTThreadPtr self, u_int i) { return XTListImp::remove(self, i); }
4675+
4676+ inline T *take(u_int i) { return (T *) XTListImp::take(i); }
4677+
4678+ inline T *itemAt(u_int i) const { return (T *) XTListImp::itemAt(i); }
4679+
4680+ inline u_int indexOf(T *a) {
4681+ u_int i;
4682+
4683+ for (i=0; i<size(); i++) {
4684+ if (itemAt(i) == a)
4685+ break;
4686+ }
4687+ return i;
4688+ }
4689+
4690+ void deleteAll(XTThreadPtr self)
4691+ {
4692+ for (u_int i=0; i<size(); i++) {
4693+ if (li_referenced)
4694+ itemAt(i)->release(self);
4695+ }
4696+ setEmpty(self);
4697+ }
4698+
4699+ void clone(XTThreadPtr self, XTListImp *list)
4700+ {
4701+ deleteAll(self);
4702+ for (u_int i=0; i<list->size(); i++) {
4703+ XTListImp::append(self, list->itemAt(i)->clone(self));
4704+ }
4705+ }
4706+};
4707+
4708+#endif
4709
4710=== added file 'plugin/pbxt/src/database_xt.cc'
4711--- plugin/pbxt/src/database_xt.cc 1970-01-01 00:00:00 +0000
4712+++ plugin/pbxt/src/database_xt.cc 2010-04-01 14:19:35 +0000
4713@@ -0,0 +1,1314 @@
4714+/* Copyright (c) 2005 PrimeBase Technologies GmbH
4715+ *
4716+ * PrimeBase XT
4717+ *
4718+ * This program is free software; you can redistribute it and/or modify
4719+ * it under the terms of the GNU General Public License as published by
4720+ * the Free Software Foundation; either version 2 of the License, or
4721+ * (at your option) any later version.
4722+ *
4723+ * This program is distributed in the hope that it will be useful,
4724+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
4725+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4726+ * GNU General Public License for more details.
4727+ *
4728+ * You should have received a copy of the GNU General Public License
4729+ * along with this program; if not, write to the Free Software
4730+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
4731+ *
4732+ * 2005-01-15 Paul McCullagh
4733+ *
4734+ * H&G2JCtL
4735+ */
4736+
4737+#include "xt_config.h"
4738+
4739+#ifdef DRIZZLED
4740+#include <bitset>
4741+#endif
4742+
4743+#include <string.h>
4744+#include <stdio.h>
4745+
4746+#include "pthread_xt.h"
4747+#include "hashtab_xt.h"
4748+#include "filesys_xt.h"
4749+#include "database_xt.h"
4750+#include "memory_xt.h"
4751+#include "heap_xt.h"
4752+#include "datalog_xt.h"
4753+#include "strutil_xt.h"
4754+#include "util_xt.h"
4755+#include "trace_xt.h"
4756+
4757+#ifdef DEBUG
4758+//#define XT_TEST_XACT_OVERFLOW
4759+#endif
4760+
4761+#ifndef NAME_MAX
4762+#define NAME_MAX 128
4763+#endif
4764+
4765+/*
4766+ * -----------------------------------------------------------------------
4767+ * GLOBALS
4768+ */
4769+
4770+xtPublic XTDatabaseHPtr pbxt_database = NULL; // The global open database
4771+
4772+xtPublic xtLogOffset xt_db_log_file_threshold;
4773+xtPublic size_t xt_db_log_buffer_size;
4774+xtPublic size_t xt_db_transaction_buffer_size;
4775+xtPublic size_t xt_db_checkpoint_frequency;
4776+xtPublic off_t xt_db_data_log_threshold;
4777+xtPublic size_t xt_db_data_file_grow_size;
4778+xtPublic size_t xt_db_row_file_grow_size;
4779+xtPublic int xt_db_garbage_threshold;
4780+xtPublic int xt_db_log_file_count;
4781+xtPublic int xt_db_auto_increment_mode; /* 0 = MySQL compatible, 1 = PrimeBase Compatible. */
4782+xtPublic int xt_db_offline_log_function; /* 0 = recycle logs, 1 = delete logs, 2 = keep logs */
4783+xtPublic int xt_db_sweeper_priority; /* 0 = low (default), 1 = normal, 2 = high */
4784+
4785+xtPublic XTSortedListPtr xt_db_open_db_by_id = NULL;
4786+xtPublic XTHashTabPtr xt_db_open_databases = NULL;
4787+xtPublic time_t xt_db_approximate_time = 0; /* A "fast" alternative timer (not too accurate). */
4788+
4789+static xtDatabaseID db_next_id = 1;
4790+static volatile XTOpenFilePtr db_lock_file = NULL;
4791+
4792+/*
4793+ * -----------------------------------------------------------------------
4794+ * LOCK/UNLOCK INSTALLATION
4795+ */
4796+
4797+xtPublic void xt_lock_installation(XTThreadPtr self, char *installation_path)
4798+{
4799+ char file_path[PATH_MAX];
4800+ char buffer[101];
4801+ size_t red_size;
4802+ llong pid;
4803+ xtBool cd = pbxt_crash_debug;
4804+
4805+ xt_strcpy(PATH_MAX, file_path, installation_path);
4806+ xt_add_pbxt_file(PATH_MAX, file_path, "no-debug");
4807+ if (xt_fs_exists(file_path))
4808+ pbxt_crash_debug = FALSE;
4809+ xt_strcpy(PATH_MAX, file_path, installation_path);
4810+ xt_add_pbxt_file(PATH_MAX, file_path, "crash-debug");
4811+ if (xt_fs_exists(file_path))
4812+ pbxt_crash_debug = TRUE;
4813+
4814+ if (pbxt_crash_debug != cd) {
4815+ if (pbxt_crash_debug)
4816+ xt_logf(XT_NT_WARNING, "Crash debugging has been turned on ('crash-debug' file exists)\n");
4817+ else
4818+ xt_logf(XT_NT_WARNING, "Crash debugging has been turned off ('no-debug' file exists)\n");
4819+ }
4820+ else if (pbxt_crash_debug)
4821+ xt_logf(XT_NT_WARNING, "Crash debugging is enabled\n");
4822+
4823+ /* Moved the lock file out of the pbxt directory so that
4824+ * it is possible to drop the pbxt database!
4825+ */
4826+ xt_strcpy(PATH_MAX, file_path, installation_path);
4827+ xt_add_dir_char(PATH_MAX, file_path);
4828+ xt_strcat(PATH_MAX, file_path, "pbxt-lock");
4829+ db_lock_file = xt_open_file(self, file_path, XT_FS_CREATE | XT_FS_MAKE_PATH);
4830+
4831+ try_(a) {
4832+ if (!xt_lock_file(self, db_lock_file)) {
4833+ xt_logf(XT_NT_ERROR, "A server appears to already be running\n");
4834+ xt_logf(XT_NT_ERROR, "The file: %s, is locked\n", file_path);
4835+ xt_throw_xterr(XT_CONTEXT, XT_ERR_SERVER_RUNNING);
4836+ }
4837+ if (!xt_pread_file(db_lock_file, 0, 100, 0, buffer, &red_size, &self->st_statistics.st_rec, self))
4838+ xt_throw(self);
4839+ if (red_size > 0) {
4840+ buffer[red_size] = 0;
4841+#ifdef XT_WIN
4842+ pid = (llong) _atoi64(buffer);
4843+#else
4844+ pid = atoll(buffer);
4845+#endif
4846+ /* The problem with this code is that, after a restart,
4847+ * process IDs are reused.
4848+ * If some system process grabs the process ID that
4849+ * the server had on the last run, then
4850+ * the database will not start.
4851+ if (xt_process_exists((xtProcID) pid)) {
4852+ xt_logf(XT_NT_ERROR, "A server appears to already be running, process ID: %lld\n", pid);
4853+ xt_logf(XT_NT_ERROR, "Remove the file: %s, if this is not the case\n", file_path);
4854+ xt_throw_xterr(XT_CONTEXT, XT_ERR_SERVER_RUNNING);
4855+ }
4856+ */
4857+ xt_logf(XT_NT_INFO, "The server was not shut down correctly, recovery required\n");
4858+#ifdef XT_BACKUP_BEFORE_RECOVERY
4859+ if (pbxt_crash_debug) {
4860+ /* The server was not shut down correctly. Make a backup before
4861+ * we start recovery.
4862+ */
4863+ char extension[100];
4864+
4865+ for (int i=1;;i++) {
4866+ xt_strcpy(PATH_MAX, file_path, installation_path);
4867+ xt_remove_dir_char(file_path);
4868+ sprintf(extension, "-recovery-%d", i);
4869+ xt_strcat(PATH_MAX, file_path, extension);
4870+ if (!xt_fs_exists(file_path))
4871+ break;
4872+ }
4873+ xt_logf(XT_NT_INFO, "In order to reproduce recovery errors a backup of the installation\n");
4874+ xt_logf(XT_NT_INFO, "will be made to:\n");
4875+ xt_logf(XT_NT_INFO, "%s\n", file_path);
4876+ xt_logf(XT_NT_INFO, "Copy in progress...\n");
4877+ xt_fs_copy_dir(self, installation_path, file_path);
4878+ xt_logf(XT_NT_INFO, "Copy OK\n");
4879+ }
4880+#endif
4881+ }
4882+
4883+ sprintf(buffer, "%lld", (llong) xt_getpid());
4884+ xt_set_eof_file(self, db_lock_file, 0);
4885+ if (!xt_pwrite_file(db_lock_file, 0, strlen(buffer), buffer, &self->st_statistics.st_rec, self))
4886+ xt_throw(self);
4887+ }
4888+ catch_(a) {
4889+ xt_close_file(self, db_lock_file);
4890+ db_lock_file = NULL;
4891+ xt_throw(self);
4892+ }
4893+ cont_(a);
4894+}
4895+
4896+xtPublic void xt_unlock_installation(XTThreadPtr self, char *installation_path)
4897+{
4898+ if (db_lock_file) {
4899+ char lock_file[PATH_MAX];
4900+
4901+ xt_unlock_file(NULL, db_lock_file);
4902+ xt_close_file_ns(db_lock_file);
4903+ db_lock_file = NULL;
4904+
4905+ xt_strcpy(PATH_MAX, lock_file, installation_path);
4906+ xt_add_dir_char(PATH_MAX, lock_file);
4907+ xt_strcat(PATH_MAX, lock_file, "pbxt-lock");
4908+ xt_fs_delete(self, lock_file);
4909+ }
4910+}
4911+
4912+int *xt_bad_pointer = 0;
4913+
4914+void xt_crash_me(void)
4915+{
4916+ if (pbxt_crash_debug)
4917+ *xt_bad_pointer = 123;
4918+}
4919+
4920+/*
4921+ * -----------------------------------------------------------------------
4922+ * INIT/EXIT DATABASE
4923+ */
4924+
4925+static xtBool db_hash_comp(void *key, void *data)
4926+{
4927+ XTDatabaseHPtr db = (XTDatabaseHPtr) data;
4928+
4929+ return strcmp((char *) key, db->db_name) == 0;
4930+}
4931+
4932+static xtHashValue db_hash(xtBool is_key, void *key_data)
4933+{
4934+ XTDatabaseHPtr db = (XTDatabaseHPtr) key_data;
4935+
4936+ if (is_key)
4937+ return xt_ht_hash((char *) key_data);
4938+ return xt_ht_hash(db->db_name);
4939+}
4940+
4941+static xtBool db_hash_comp_ci(void *key, void *data)
4942+{
4943+ XTDatabaseHPtr db = (XTDatabaseHPtr) data;
4944+
4945+ return strcasecmp((char *) key, db->db_name) == 0;
4946+}
4947+
4948+static xtHashValue db_hash_ci(xtBool is_key, void *key_data)
4949+{
4950+ XTDatabaseHPtr db = (XTDatabaseHPtr) key_data;
4951+
4952+ if (is_key)
4953+ return xt_ht_casehash((char *) key_data);
4954+ return xt_ht_casehash(db->db_name);
4955+}
4956+
4957+static void db_hash_free(XTThreadPtr self, void *data)
4958+{
4959+ xt_heap_release(self, (XTDatabaseHPtr) data);
4960+}
4961+
4962+static int db_cmp_db_id(struct XTThread *XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b)
4963+{
4964+ xtDatabaseID db_id = *((xtDatabaseID *) a);
4965+ XTDatabaseHPtr *db_ptr = (XTDatabaseHPtr *) b;
4966+
4967+ if (db_id == (*db_ptr)->db_id)
4968+ return 0;
4969+ if (db_id < (*db_ptr)->db_id)
4970+ return -1;
4971+ return 1;
4972+}
4973+
4974+xtPublic void xt_init_databases(XTThreadPtr self)
4975+{
4976+ if (pbxt_ignore_case)
4977+ xt_db_open_databases = xt_new_hashtable(self, db_hash_comp_ci, db_hash_ci, db_hash_free, TRUE, TRUE);
4978+ else
4979+ xt_db_open_databases = xt_new_hashtable(self, db_hash_comp, db_hash, db_hash_free, TRUE, TRUE);
4980+ xt_db_open_db_by_id = xt_new_sortedlist(self, sizeof(XTDatabaseHPtr), 20, 10, db_cmp_db_id, NULL, NULL, FALSE, FALSE);
4981+}
4982+
4983+xtPublic void xt_stop_database_threads(XTThreadPtr self, xtBool sync)
4984+{
4985+ u_int len = 0;
4986+ XTDatabaseHPtr *dbptr;
4987+ XTDatabaseHPtr db = NULL;
4988+
4989+ if (xt_db_open_db_by_id)
4990+ len = xt_sl_get_size(xt_db_open_db_by_id);
4991+ for (u_int i=0; i<len; i++) {
4992+ if ((dbptr = (XTDatabaseHPtr *) xt_sl_item_at(xt_db_open_db_by_id, i))) {
4993+ db = *dbptr;
4994+ if (sync) {
4995+ /* Wait for the sweeper: */
4996+ xt_wait_for_sweeper(self, db, 16);
4997+
4998+ /* Wait for the writer: */
4999+ xt_wait_for_writer(self, db);
5000+
The diff has been truncated for viewing.
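
The ownership rule in plugin/pbxt/src/ccutils_xt.h is easy to miss when reading the diff: remove(self, i) releases the list's reference to the item, while take(i) hands the item back without releasing it, and remove(T *) only unlinks it. The following is a minimal, self-contained sketch of that rule, not the PBXT code itself. The names RefCounted, RefList, Tracked and demo are hypothetical, std::vector is swapped in for the xt_realloc/memmove array handling, and the XTThreadPtr self parameter and finalize() hook are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for XTObject: an intrusive reference count that
// starts at 1, so the creator holds the first reference.
class RefCounted {
public:
    RefCounted() : refcnt_(1) {}
    virtual ~RefCounted() {}

    void reference() { ++refcnt_; }

    // Drops one reference and deletes the object when none remain.
    void release() {
        assert(refcnt_ > 0);
        if (--refcnt_ == 0)
            delete this;
    }

private:
    unsigned refcnt_;
};

// Hypothetical stand-in for XTListImp: by default the list owns one
// reference per stored item.
class RefList {
public:
    RefList() : referenced_(true) {}

    void setNonReferenced() { referenced_ = false; }

    // The caller's reference is transferred to the list.
    void append(RefCounted *item) { items_.push_back(item); }

    // Remove by index: the list gives up (releases) its reference.
    bool remove(std::size_t i) {
        if (i >= items_.size())
            return false;
        RefCounted *item = items_[i];
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(i));
        if (referenced_)
            item->release();
        return true;
    }

    // Take by index: the reference moves back to the caller, no release.
    RefCounted *take(std::size_t i) {
        if (i >= items_.size())
            return nullptr;
        RefCounted *item = items_[i];
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(i));
        return item;
    }

    std::size_t size() const { return items_.size(); }

private:
    bool referenced_;
    std::vector<RefCounted *> items_;
};

// Counts live instances so the effect of remove() and take() is observable.
struct Tracked : RefCounted {
    static int live;
    Tracked() { ++live; }
    ~Tracked() override { --live; }
};
int Tracked::live = 0;

// Returns (live objects after take) * 10 + (live objects after release).
int demo() {
    RefList list;
    list.append(new Tracked());
    list.append(new Tracked());
    list.remove(0);                 // list releases: first object destroyed
    RefCounted *t = list.take(0);   // ownership returns to the caller
    int after_take = Tracked::live;
    t->release();                   // caller must release what it took
    return after_take * 10 + Tracked::live;
}
```

In this sketch a caller that uses take() and forgets to release() leaks the object, which is the same obligation the take(u_int i) method in the header appears to place on its callers.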