“Oracle/MySQL Engineering” team

Branches of MySQL Server

Name	Status	Last Modified	Last Commit
lp:mysql-server/cluster-7.0	Mature	2015-04-17 14:23:53 UTC	Revision 4873 by Jonas Oreland, 2012-03-09 13:24:50 UTC: ndb - add mapping table for MaxNoOfExecutionThreads to thread-types
lp:mysql-server/cluster-7.1	Mature	2015-04-17 14:22:12 UTC	Revision 5125 by Ole John Aske, 2015-01-21 08:37:53 UTC: Fix regression introduced by fix for bug#19524096.

That fix caused ndb_global_schema_lock_error to fail, because mysqld ended up in a deadlock between the TransporterFacade and ClusterMgr mutexes in ClusterMgr::is_cluster_completely_unavailable(). There are likely other deadlock scenarios as well.

The fix is to remove the Guard locking clusterMgrThreadMutex. The rationale for this, and its (lack of) correctness, is discussed in the mail pasted below.

There are also a few small improvements to ::is_cluster_completely_unavailable():
1. Don't copy the entire 'trp_node' struct; read it by reference instead.
2. Replace usage of 'm_impl->m_transporter_facade' with 'getTransporter()', as this is the pattern used elsewhere in this code block.

--------------- Pasted mail with some background ----------------
Subject: (Dead-) locking of ClusterMgr::theNodes[] structures

Hi,

I am writing this as a note to myself and others after having analyzed the failure of ndb_global_schema_lock_error.test. That test timed out because mysqld ended up in a deadlock between ClusterMgr::clusterMgrThreadMutex and TransporterFacade::theMutexPtr.

ClusterMgr maintains node state and info in theNodes[]. External components access this info through ClusterMgr::getNodeInfo(). theNodes[] is only updated from within ClusterMgr; all external access is read-only. Updates to theNodes[] are done partly from within several ClusterMgr::exec<foo> methods, and partly from ClusterMgr::reportConnected() / ::reportDisconnected(). All updates seem to be done with ClusterMgr::clusterMgrThreadMutex locked.

Several methods are available for inspecting node status from other components, and they all use information from theNodes[]. Some of the more commonly used ones are:
- TransporterFacade::get_node_alive(n) (reads 'theClusterMgr->getNodeInfo(n)')
- TransporterFacade::get_an_alive_node() (implemented on top of ::get_node_alive(n))
- NdbImpl::get_node_stopping(n) (reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get_node_alive(n) (reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get<foo> ... many more node state getters ...

The locking scheme used to make access to theNodes[] atomic for the above methods is ... mixed, and not really defined at all as far as I can tell. Some examples:
- NdbDictInterface::dictSignal(), NdbDictInterface::forceGCPWait() and NdbDictInterface::listObjects(): before calling get_node_alive() / ::get_an_alive_node(), a PollGuard is set. PollGuard calls trp_client::start_poll(), which behaves differently pre/post 7.3:
  - Pre 7.3: trp_client::start_poll -> TransporterFacade::start_poll -> lock the TransporterFacade mutex (a global mutex).
  - 7.3: trp_client::start_poll locks the trp_client m_mutex, then -> TransporterFacade::start_poll ... no locking, and the mutex is gone in this version.
  Observation: there is no locking of ClusterMgr::clusterMgrThreadMutex here, neither pre nor post 7.3.
- Ndb_cluster_connection::wait_until_ready(), Ndb_cluster_connection_impl::get_next_alive_node() and Ndb_cluster_connection::get_no_ready(): these all lock the TransporterFacade mutex.
- Ndb::sendRecSignal(): sets a PollGuard as above, which locks either the TransporterFacade or the trp_client mutex.
- Ndb::sendPrepTrans(): documents in comments that the TransporterFacade mutex should be locked prior to the call.

So this has become a total mess. It seems that, prior to 7.3, the intention was for the TransporterFacade mutex to be held when accessing theNodes[], or any method that accesses it. After 7.3, a mix of the TransporterFacade and trp_client mutexes is effectively used.

The updaters appear to lock the ClusterMgr mutex to protect these structures, which of course will not work, as the readers don't lock this mutex. However, it could be that all updates happen at places where they are called from the TransporterFacade. Prior to 7.3 we used to hold the TransporterFacade mutex there, which would make some sense. This all needs more investigation ... and some documentation in the code. In the current state there is certainly no effective concurrency protection of the node state info in 7.3+. It could be that it works in 7.1 and 7.2.

On top of this, the fix for bug#19524096 introduced ClusterMgr::is_cluster_completely_unavailable(), which is also based on the node status available in theNodes[]. Probably in good faith, trusting that updates of theNodes[] were protected by clusterMgrThreadMutex, that method grabs that lock before accessing the theNodes[] info. In fact, this method is the only node state getter that does any explicit locking ... and it was doomed to fail, as this was completely untested territory. See the other mail about how it could deadlock with the TransporterFacade mutex. Side note: any other node state getter attempting to follow the same locking pattern would likely have deadlocked the same way.

::is_cluster_completely_unavailable() is called from within code that also calls ::get_node_alive() and ::get_an_alive_node() without using any locking protection for these.

Based on this, I will propose a patch for the bug#19524096 regression which simply removes the mutex locking from ::is_cluster_completely_unavailable(). This will not be entirely kosher, given how the shared node state structs should have been mutex protected. However, based on my discussion above, there are already so many violations in this area that one more should not matter. A bigger effort should be made to clean this up entirely.
lp:mysql-server/cluster-7.2	Mature	2015-04-17 14:21:59 UTC	Revision 4625 by Mauritz Sundell, 2014-10-09 13:02:54 UTC: Cherrypicked revision-id: mauritz.sundell@oracle.com-20141009124636-dg0th9bzvr27r1i7; parent: mauritz.sundell@oracle.com-20140930122950-gn1rl2yigc4s7ucu; committer: Mauritz Sundell <mauritz.sundell@oracle.com>; branch nick: mysql-7.1; timestamp: Thu 2014-10-09 14:46:36 +0200; message: Bug #19582807 MAKE SIGNAL DUMP IN TRACE FILES ALWAYS START FROM LATEST SIGNAL. Removes a regression introduced with the patch for the above bug: during crash dumps there could be a segmentation fault, or the most recent signals could be dumped as old signals or not at all.
lp:mysql-server/cluster-7.3	Development	2015-04-17 14:21:29 UTC	Revision 4444 by Mauritz Sundell, 2014-10-09 13:04:27 UTC: Cherrypicked revision-id: mauritz.sundell@oracle.com-20141009124636-dg0th9bzvr27r1i7; parent: mauritz.sundell@oracle.com-20140930122950-gn1rl2yigc4s7ucu; committer: Mauritz Sundell <mauritz.sundell@oracle.com>; branch nick: mysql-7.1; timestamp: Thu 2014-10-09 14:46:36 +0200; message: Bug #19582807 MAKE SIGNAL DUMP IN TRACE FILES ALWAYS START FROM LATEST SIGNAL. Removes a regression introduced with the patch for the above bug: during crash dumps there could be a segmentation fault, or the most recent signals could be dumped as old signals or not at all.
lp:mysql-server/5.1	Mature	2015-01-22 12:24:36 UTC	Revision 4061 by Tor Didriksen, 2013-11-01 15:39:19 UTC: Bug#17617945 BUFFER OVERFLOW IN GET_MERGE_MANY_BUFFS_COST WITH SMALL SORT_BUFFER_SIZE. get_cost_calc_buff_size() could return the wrong value for the size of imerge_cost_buff.
lp:mysql-server/5.5	Mature	2015-01-22 12:24:21 UTC	Revision 4736 by Balasubramanian Kandasamy, 2014-11-04 07:30:23 UTC: Added sles11 repo packages
lp:mysql-server/5.6	Mature	2015-01-22 12:24:05 UTC	Revision 6235 by Thirunarayanan B, 2014-11-21 05:20:02 UTC: Bug #19183565 CREATE DYNAMIC INNODB_TMPDIR VARIABLE TO CONTROL WHERE INNODB WRITES TEMP FILES - Reverting the patch.
lp:mysql-server	Development	2015-01-22 12:23:44 UTC	Revision 7626 by Bjorn Munch, 2014-03-19 10:32:19 UTC: Fix for bugs 18402580 and 18402999, as suggested for 18402580: instead of relying on $HOME, use Perl's getpwuid() to get the home directory.