GU_AVPHYS_SIZE can report more available memory than can be addressed on 32-bit systems

Bug #1204241 reported by Raghavendra D Prabhu
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Galera | Fix Released | Medium | Alex Yurchenko |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Fix Released | Undecided | Unassigned |

Bug Description

lp:1181347 is regressing on centos6-32

http://jenkins.percona.com/job/percona-xtrabackup-2.1-param/393/BUILD_TYPE=release,Host=centos6-32,xtrabackuptarget=galera55/testReport/junit/(root)/t_xb_galera_sst/sh/

=============================================
130723 15:14:13 [Note] WSREP: Some threads may fail to exit.
130723 15:14:13 [Note] WSREP: Setting wsrep_ready to 0
130723 15:14:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
130723 15:14:13 [Note] WSREP: wsrep_load(): loading provider library '/home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/server/lib/libgalera_smm.so'
130723 15:14:13 [Note] WSREP: wsrep_load(): Galera 2.6(r152) by Codership Oy <email address hidden> loaded succesfully.
130723 15:14:13 [Warning] WSREP: Could not open saved state file for reading: /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//grastate.dat
130723 15:14:13 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
130723 15:14:13 [Note] WSREP: Preallocating 134219040/134219040 bytes in '/home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//galera.cache'...
130723 15:14:13 [ERROR] WSREP: galerautils/src/gu_fifo.c:gu_fifo_create():102: Maximum FIFO size 9663938748 exceeds size_t range 4294967295
130723 15:14:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_create():264: Failed to create recv_q.
130723 15:14:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_create():310: Failed to create GCS connection handle.
130723 15:14:13 [Note] WSREP: Passing config to GCS: base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; debug = 1; gcache.dir = /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://127.0.0.1:31241; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
============================================

Note that wsrep-debug=1 was set, along with debug=1 in wsrep_provider_options, as requested.
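The arithmetic behind the gu_fifo_create() error line can be checked in isolation. The sketch below is not Galera code; fits_in_size32 and narrow_to_size32 are hypothetical helpers that only model a 32-bit size_t using fixed-width types, so the check behaves the same on any build host.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of Galera. */

/* Largest value a 32-bit size_t can hold (4294967295). */
#define SIZE_MAX_32 UINT32_MAX

/* Returns 1 if n is representable in a 32-bit size_t, 0 otherwise. */
static int fits_in_size32(uint64_t n)
{
    return n <= (uint64_t)SIZE_MAX_32;
}

/* What an unchecked narrowing conversion would silently produce. */
static uint32_t narrow_to_size32(uint64_t n)
{
    return (uint32_t)n; /* keeps only the low 32 bits */
}
```

For the value in the log, fits_in_size32(9663938748ULL) returns 0, which matches the condition the error message reports. An unchecked conversion would instead have kept only the low 32 bits of the requested size.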


Alex Yurchenko (ayurchen) wrote :

I would not call it "regressing": in that bug there was an undetected overflow resulting in a wrong available-memory estimate. Here it is most likely a bad initial length estimate in gcs.c rather than a bug in gu_fifo.c. What does getconf -a | grep PAGES say on that system?

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Output of getconf -a on that host: https://gist.github.com/3ede88972f3719a96b50

Alex Yurchenko (ayurchen) wrote :

So you have 32 GB of RAM on a machine running a 32-bit OS?
Reporting more *available* memory than can be addressed is, IMO, a bug in libc. But sure, we'll do a workaround.

Changed in galera:
assignee: nobody → Alex Yurchenko (ayurchen)
importance: Undecided → Medium
milestone: none → 23.2.7
status: New → Confirmed
summary: - lp:1181347 regresses on 32 bit builds
+ GU_AVPHYS_SIZE can report more available memory than can be addressed on
+ 32-bit systems
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@#3,

yes, looks like that,

             total       used       free     shared    buffers     cached
Mem:         32032       1236      30796          0          0       1236
-/+ buffers/cache:          0      32032
Swap:          512          2        509
Total:       32544       1238      31306

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Actually, GU_AVPHYS_SIZE may not be ideal here.

a) It doesn't take into account fs cache.

b) On PAE systems, applications can address more than 4G. However, size_t on those systems is still 32 bit I presume and will still overflow.

http://compgroups.net/comp.unix.programmer/available-physical-memory/536419 has discussion relating to it.

There is get_avphys_pages; however, it is Linux-only.
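To make the size_t concern concrete, here is a minimal sketch of computing available physical memory in bytes with 64-bit arithmetic. It assumes a system that provides the sysconf() names _SC_AVPHYS_PAGES and _SC_PAGESIZE (glibc does), and it is not how Galera defines GU_AVPHYS_SIZE; pages_to_bytes and avphys_bytes are hypothetical names.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; not part of Galera. */

/* Multiply in 64 bits so the product cannot wrap a 32-bit size_t. */
static uint64_t pages_to_bytes(long pages, long page_size)
{
    if (pages < 0 || page_size < 0) {
        return 0; /* sysconf() reports failure as -1 */
    }
    return (uint64_t)pages * (uint64_t)page_size;
}

/* Available physical memory in bytes, or 0 if it cannot be queried. */
static uint64_t avphys_bytes(void)
{
    return pages_to_bytes(sysconf(_SC_AVPHYS_PAGES),
                          sysconf(_SC_PAGESIZE));
}
```

With 4096-byte pages, 8388608 available pages come to 34359738368 bytes (32 GiB). That fits in a uint64_t but not in a 32-bit size_t, so the result still has to be bounded before it is used as an allocation size.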

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

OTOH, if the problem is only on linux/glibc, then this - https://bazaar.launchpad.net/~percona-dev/percona-xtradb-cluster/galera-2.x/revision/126 - should do. (I have tested this).

Alex Yurchenko (ayurchen) wrote :

Raghu,
1) I don't think that this is a Linux-specific problem; at least the link you provided above clearly mentions 32-bit Solaris supporting 64 GB of RAM.
2) Your patch does work and will work for a while, simply because it uses the memory size in pages instead of the memory size in bytes, which is not what you want. I'd just use min(GU_AVPHYS_SIZE, size_t(-1)).
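The min(GU_AVPHYS_SIZE, size_t(-1)) suggestion can be sketched as a clamp. This is an illustration under assumptions, not the committed fix: clamp_u64 and clamp_to_size_t are hypothetical names, and the byte count is assumed to arrive as a 64-bit value. In C, (size_t)-1 is the same value as SIZE_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the actual Galera patch. */

/* Returns the smaller of value and limit. */
static uint64_t clamp_u64(uint64_t value, uint64_t limit)
{
    return value < limit ? value : limit;
}

/* Bounds a 64-bit byte count by the largest value size_t can hold. */
static size_t clamp_to_size_t(uint64_t bytes)
{
    return (size_t)clamp_u64(bytes, (uint64_t)SIZE_MAX);
}
```

On a 32-bit build, a reported 32 GiB would be clamped to 4294967295 bytes. On a 64-bit build the clamp leaves any realistic value unchanged.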

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Regarding #2,

it was meant to be https://bazaar.launchpad.net/~percona-dev/percona-xtradb-cluster/galera-2.x/revision/126 rather than using the pages directly. Though I am not sure if this fixes that (testing on jenkins).

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

So, using get_avphys_pages also fails. I will test with min(GU_AVPHYS_SIZE, size_t(-1)) on jenkins.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Test with min(GU_AVPHYS_SIZE, size_t(-1)) works fine on jenkins.

Changed in galera:
status: Confirmed → In Progress
Alex Yurchenko (ayurchen) wrote :
Changed in galera:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.5.33-23.7.6
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Changed in galera:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1400
