bad fractional math in bash

Bug #1245934 reported by Matthew B
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Monitoring Plugins
Fix Released
Medium
Unassigned

Bug Description

When using pmp-check-mysql-replication-delay with pt-heartbeat, MySQL can return fractional delay which bash doesn't support.

percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay --version
Percona Monitoring Plugins pmp-check-mysql-replication-delay 1.0.5

-- Test delay with Show Slave Status
percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay -H den-db-alpha-03
OK 6 seconds of replication delay | replication_delay=6;300;600;0;

-- Test delay against heartbeat table
percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay -H den-db-alpha-03 -T percona.heartbeat -s 104129
[: 139: Illegal number: -0.000700
[: 139: Illegal number: -0.000700
OK -0.000700 seconds of replication delay | replication_delay=-0.000700;300;600;0;

Patch proposal:

--- pmp-check-mysql-replication-delay 2013-10-29 10:51:56.477282996 -0500
+++ pmp-check-mysql-replication-delay.orig 2013-10-29 10:24:46.165290035 -0500
@@ -91,9 +91,9 @@
       NOTE="${LEVEL:-0} seconds of replication delay"
       if [ "${LEVEL:-""}" = "NULL" ]; then
          NOTE="UNK replica is stopped"
- elif [ $(echo "${LEVEL:-0} ${OPT_CRIT}" | awk '{print ($1 > $2)}') = 1 ]; then
+ elif [ "${LEVEL:-0}" -gt "${OPT_CRIT}" ]; then
          NOTE="CRIT $NOTE | $PERFDATA"
- elif [ $(echo "${LEVEL:-0} ${OPT_WARN}" | awk '{print ($1 > $2)}') = 1 ]; then
+ elif [ "${LEVEL:-0}" -gt "${OPT_WARN}" ]; then
          NOTE="WARN $NOTE | $PERFDATA"
       else
          NOTE="OK $NOTE | $PERFDATA"

percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay -H den-db-alpha-03 -T percona.heartbeat -s 104129
OK 6.999160 seconds of replication delay | replication_delay=6.999160;300;600;0;

percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay -H den-db-alpha-03 -T percona.heartbeat -s 104129 -w 6.1
WARN 10.999280 seconds of replication delay | replication_delay=10.999280;6.1;600;0;

percona@den-iops-01:~/nagios-plugins$ ./pmp-check-mysql-replication-delay -H den-db-alpha-03 -T percona.heartbeat -s 104129 -w 6.1 -c 7.8
OK -0.000920 seconds of replication delay | replication_delay=-0.000920;6.1;7.8;0;

Tags: nagios
Revision history for this message
Roman Vynar (roman-vynar) wrote :

Hi Matthew,

I am not sure whether the checking against fraction of seconds make any sense.
Why do you need to check against -w 6.1 and 6 is not OK?

About why it returns "OK -0.000700 seconds" on the first test - I am not able to reproduce that (fractional result).
It appears to be possible if the actual query includes NOW() but it's not.

Can you please show me the output from your host on these queries?

SELECT MAX(UNIX_TIMESTAMP(UTC_TIMESTAMP) - UNIX_TIMESTAMP(ts)) AS delay from heartbeat where server_id=104129;
SELECT MAX(UNIX_TIMESTAMP() - UNIX_TIMESTAMP(ts)) AS delay from heartbeat where server_id=104129;

Also what's the version of pt-heartbeat in use on the master?

Thanks,
Roman

Changed in percona-monitoring-plugins:
status: New → Incomplete
tags: added: nagios
Revision history for this message
Roman Vynar (roman-vynar) wrote :

Also please show the actual value:
SELECT MAX(ts) from heartbeat where server_id=104129;

Changed in percona-monitoring-plugins:
status: Incomplete → In Progress
importance: Undecided → Medium
milestone: none → 1.1
Revision history for this message
Roman Vynar (roman-vynar) wrote :

mysql> select * from heartbeat;
+----------------------------+-----------+--------------------+----------+-----------------------+---------------------+
| ts | server_id | file | position | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+--------------------+----------+-----------------------+---------------------+
| 2013-05-03 17:23:15 | 1 | NULL | NULL | NULL | NULL |
| 2013-11-26T14:39:51.001960 | 101 | centos6-bin.000019 | 4711312 | NULL | NULL |
+----------------------------+-----------+--------------------+----------+-----------------------+---------------------+
2 rows in set (0.00 sec)

Related http://www.mysqlperformanceblog.com/2013/07/29/pt-heartbeat-2-1-your-troubles-are-over/

Unfortunately, I am still not able to reproduce the problem.

Changed in percona-monitoring-plugins:
milestone: 1.1 → none
status: In Progress → Incomplete
Revision history for this message
Matthew B (utdrmac) wrote :

Hey Roman,
My apologies but I no longer have access to the server that originally had this issue. So I can't check the versions of everything.

To answer one of your questions, I would never test against a fractional delay. That was simply to test my patch. I'm more concerned about the "bash error illegal number" and possibly producing a false positive.

My patch allows pmp-check-mysql-replication-delay to work with both whole integers and fractions. I'd rather cover all bases and be safe than sorry. My $0.02.

If I ever get access to that machine again, I'll check things out and re-open this bug if necessary.

Revision history for this message
Roman Vynar (roman-vynar) wrote :

Thank you Matthew. Please feel free to re-open the bug if you have more details.

Changed in percona-monitoring-plugins:
status: Incomplete → Invalid
Revision history for this message
bwitt (brianwitt) wrote :

We have this problem on a few of our hosts; UNIX_TIMESTAMP(ts) is returning fractional values:

$ ./pmp-check-mysql-replication-delay --version
Percona Monitoring Plugins pmp-check-mysql-replication-delay 1.1.3

also pt-heartbeat on the master:
$ /usr/bin/pt-heartbeat --version
pt-heartbeat 2.2.6

mysql> SELECT UTC_TIMESTAMP, UNIX_TIMESTAMP(UTC_TIMESTAMP), ts, UNIX_TIMESTAMP(ts), UNIX_TIMESTAMP(UTC_TIMESTAMP) - UNIX_TIMESTAMP(ts) FROM heartbeat;
+---------------------+-------------------------------+----------------------------+--------------------+----------------------------------------------------+
| UTC_TIMESTAMP | UNIX_TIMESTAMP(UTC_TIMESTAMP) | ts | UNIX_TIMESTAMP(ts) | UNIX_TIMESTAMP(UTC_TIMESTAMP) - UNIX_TIMESTAMP(ts) |
+---------------------+-------------------------------+----------------------------+--------------------+----------------------------------------------------+
| 2014-05-12 23:25:03 | 1399937103 | 2014-05-12T23:25:03.000370 | 1399937103.000370 | -0.000370 |
| 2014-05-12 23:25:03 | 1399937103 | 2014-05-12T23:25:03.000410 | 1399937103.000410 | -0.000410 |
+---------------------+-------------------------------+----------------------------+--------------------+----------------------------------------------------+
2 rows in set (0.00 sec)

we're using
ii percona-nagios-plugins 1.1.3-1
ii percona-server-client-5.6 1:5.6.15-rel63.0-519.precise
ii percona-server-common-5.6 1:5.6.15-rel63.0-519.precise
ii percona-server-server-5.6 1:5.6.15-rel63.0-519.precise
ii percona-toolkit 2.2.6

Everything seems to work fine on the hosts that have mysql 5.5 and 5.1.

Is that because since 5.6.4, these functions now work with fractional values?
http://dev.mysql.com/doc/refman/5.6/en/date-and-time-functions.html#function_unix-timestamp

bwitt (brianwitt)
Changed in percona-monitoring-plugins:
status: Invalid → Confirmed
Revision history for this message
Mike Siekkinen (9-mikes) wrote :

Another work around is to change the SQL query to wrap unix_timestamp through round to eliminate passing the fractional portion to bash. (see attachment)

Changed in percona-monitoring-plugins:
milestone: none → 1.1.4
Revision history for this message
Roman Vynar (roman-vynar) wrote :

Wrapped through round:

- SQL="SELECT MAX(${NOW_FUNC} - UNIX_TIMESTAMP(ts)) AS delay
+ SQL="SELECT MAX(${NOW_FUNC} - ROUND(UNIX_TIMESTAMP(ts))) AS delay

Changed in percona-monitoring-plugins:
status: Confirmed → Fix Committed
Changed in percona-monitoring-plugins:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.