cupsd crashed with SIGSEGV in _cups_strcasecmp()

Bug #855445 reported by Till Kamppeter
32
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cups (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Problem of bug 855026, trying another retrace ...

ProblemType: Crash
DistroRelease: Ubuntu 11.10
Package: cups 1.5.0-6bzr1
ProcVersionSignature: Ubuntu 3.0.0-11.18-generic 3.0.4
Uname: Linux 3.0.0-11-generic x86_64
ApportVersion: 1.23-0ubuntu1
Architecture: amd64
CrashCounter: 1
Date: Wed Sep 21 13:06:40 2011
ExecutablePath: /usr/sbin/cupsd
MachineType: LENOVO 7417CTO
Papersize: a4
PccardctlIdent:
 Socket 0:
   product info: "MoGo Mouse BT", " ", "", ""
PccardctlStatus:
 Socket 0:
   5.0V 16-bit PC Card
   Subdevice 0 (function 0) [unbound]
ProcAttrCurrent: /usr/sbin/cupsd (enforce)
ProcCmdline: /usr/sbin/cupsd -F
ProcEnviron: PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-11-generic root=/dev/mapper/hostname-root ro quiet splash vt.handoff=7
SegvAnalysis:
 Segfault happened at: 0x7f494254c7f8 <_cups_strcasecmp+8>: movzbl (%rsi),%ecx
 PC (0x7f494254c7f8) ok
 source "(%rsi)" (0x00000005) not located in a known VMA region (needed readable region)!
 destination "%ecx" ok
SegvReason: reading NULL VMA
Signal: 11
SourcePackage: cups
StacktraceTop:
 _cups_strcasecmp () from /usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? () from /usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? () from /usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? ()
 ?? ()
Title: cupsd crashed with SIGSEGV in _cups_strcasecmp()
UpgradeStatus: Upgraded to oneiric on 2009-12-18 (642 days ago)
UserGroups:

dmi.bios.date: 10/13/2009
dmi.bios.vendor: LENOVO
dmi.bios.version: 7UET79WW (3.09 )
dmi.board.name: 7417CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7UET79WW(3.09):bd10/13/2009:svnLENOVO:pn7417CTO:pvrThinkPadT400:rvnLENOVO:rn7417CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 7417CTO
dmi.product.version: ThinkPad T400
dmi.sys.vendor: LENOVO
mtime.conffile..etc.cups.cupsd.conf: 2011-09-19T18:40:54.178026

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :
visibility: private → public
Revision history for this message
Apport retracing service (apport) wrote :

StacktraceTop:
 _cups_strcasecmp () from /tmp/tmpkJkQ0k/usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? () from /tmp/tmpkJkQ0k/usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? () from /tmp/tmpkJkQ0k/usr/lib/x86_64-linux-gnu/libcups.so.2
 ?? ()
 ?? ()

Revision history for this message
Apport retracing service (apport) wrote : Stacktrace.txt
Revision history for this message
Apport retracing service (apport) wrote : ThreadStacktrace.txt
Changed in cups (Ubuntu):
status: New → Invalid
Revision history for this message
Apport retracing service (apport) wrote : Crash report cannot be processed

Thank you for your report!

However, processing it in order to get sufficient information for the
developers failed (it does not generate an useful symbolic stack trace). This
might be caused by some outdated packages which were installed on your system
at the time of the report:

outdated debug symbol package for libavahi-common3: package version 0.6.30-4ubuntu1 dbgsym version 0.6.30-0ubuntu2
outdated debug symbol package for libtiff4: package version 3.9.5-1ubuntu1 dbgsym version 3.9.4-5ubuntu6
outdated debug symbol package for libnih1: package version 1.0.3-4ubuntu2 dbgsym version 1.0.3-1ubuntu1
outdated debug symbol package for libslang2: package version 2.2.4-2ubuntu1 dbgsym version 2.2.2-4ubuntu2
outdated debug symbol package for libk5crypto3: package version 1.9.1+dfsg-1ubuntu1 dbgsym version 1.8.3+dfsg-5ubuntu2.1
outdated debug symbol package for libavahi-client3: package version 0.6.30-4ubuntu1 dbgsym version 0.6.30-0ubuntu2
outdated debug symbol package for passwd: package version 1:4.1.4.2+svn3283-3ubuntu2 dbgsym version 1:4.1.4.2+svn3283-3ubuntu1
outdated debug symbol package for libkrb5-3: package version 1.9.1+dfsg-1ubuntu1 dbgsym version 1.8.3+dfsg-5ubuntu2.1
outdated debug symbol package for libgssapi-krb5-2: package version 1.9.1+dfsg-1ubuntu1 dbgsym version 1.8.3+dfsg-5ubuntu2.1
outdated debug symbol package for procps: package version 1:3.2.8-10ubuntu5 dbgsym version 1:3.2.8-10ubuntu3
outdated debug symbol package for libkrb5support0: package version 1.9.1+dfsg-1ubuntu1 dbgsym version 1.8.3+dfsg-5ubuntu2.1
outdated debug symbol package for libnih-dbus1: package version 1.0.3-4ubuntu2 dbgsym version 1.0.3-1ubuntu1
outdated debug symbol package for module-init-tools: package version 3.16-1ubuntu1 dbgsym version 3.12-1ubuntu6

Please upgrade your system to the latest package versions. If you still
encounter the crash, please file a new report.

Thank you for your understanding, and sorry for the inconvenience!

tags: removed: need-amd64-retrace
Changed in cups (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :
Download full text (8.1 KiB)

Obtained the stack trace locally due to problems with the current CUPS package. For this stack trace the current snapshot of the Debian BZR repository (with the fix of the problem) was built locally and then with a rapid firing of queue addition and removal operations (~60) the crash was triggered again and the offer of Apport to report it accepted. Then I have followed all instructions on the screen until it refused to submit the report to Launchpad as it is from a local package build.

Now I have unpacked /var/crash/_usr_sbin_cupsd.0.crash with apport-unpack (into ~/test/) and got following stack trace:

----------
till@till:~$ gdb -c ~/test/CoreDump /usr/sbin/cupsd
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/sbin/cupsd...Reading symbols from /usr/lib/debug/usr/sbin/cupsd...done.
done.
[New LWP 7444]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/sbin/cupsd -F'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f38b12297f8 in _cups_strcasecmp (
    s=0x7f38b686b824 "Hewlett-Packard HP Color LaserJet CM3530 MFP @ till",
    t=0x1 <Address 0x1 out of bounds>) at string.c:613
613 while (*s != '\0' && *t != '\0')
(gdb) bt full
#0 0x00007f38b12297f8 in _cups_strcasecmp (
    s=0x7f38b686b824 "Hewlett-Packard HP Color LaserJet CM3530 MFP @ till",
    t=0x1 <Address 0x1 out of bounds>) at string.c:613
No locals.
#1 0x00007f38b120148d in cups_array_find (a=0x7f38b68683e0, e=0x7f38b690a530,
    prev=<optimized out>, rdiff=0x7fff2e59482c) at array.c:1265
        left = 0
        right = 38
        current = 19
        diff = <optimized out>
#2 0x00007f38b120218d in cupsArrayRemove (a=0x7f38b68683e0, e=<optimized out>)
    at array.c:900
        i = <optimized out>
        current = <optimized out>
        diff = <optimized out>
#3 0x00007f38b2c3272d in dnssdDeregisterPrinter (p=0x7f38b690a530)
    at dirsvc.c:2681
No locals.
#4 0x00007f38b2c514af in cupsdDeletePrinter (p=0x7f38b690a530,
    update=<optimized out>) at printers.c:901
        i = <optimized out>
---Type <return> to continue, or q <return> to quit---
        changed = 0
#5 0x00007f38b2c1b491 in delete_printer (con=0x7f38b6929dc0,
    uri=<optimized out>) at ipp.c:7068
        status = <optimized out>
        printer = 0x7f38b690a530
        filename = "/var/cache/cups/0017.data\000\377\377\377\377\377\377@IY.\377\177\000\000\000P׳8\177\000\000O\000\000\000\000\000\000\000 P\237\256\070\177\000\000\060\367\346\262\070\177\000\000\001\000\000\000\000\000\000\000\236ޛ\260\070\177\000\000\200OY.\377\177\000\000`IY.\377\177\000\000\360\326׳8\177\000\000PJY.\377\177\000\000\001\000\000\000\000\000\000\000\330JY.\377\...

Read more...

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I have investigated the problem more and found a simple workaround, but not a fix:

When creating or removing a queue, the printer is (un)registered for DNS-SD broadcasting this queue and CUPS (all in scheduler/dirsvc.c) keeps in an array (DNSSDPrinters) which queues are broadcasting. CUPS adds the DNS-SD service name to the printer's record (reg_name in cupsd_printer_s, scheduler/printers.h) and then adds the record to the array (scheduler/dirsvc.c, line 2903), with the service name as sorting/searching criteria (dnssdComparePrinters() function).

When it removes a queue, CUPS searches the record in the array to remove it and afterwards sets the service name in the record to NULL (scheduler/dirsvc.c, line 2680). When creating a queue CUPS searches the array to find the correct place for the new record, so that they keep sorted. All this is done by CUPS' own array infrastructure in cups/array.c.

On this array search (it does a binary search, function cups_array_find() in cups/array.c) it chokes on a record in the array where the pointer to the service name (p->reg_name) is neither the start address of a string nor 0 (NULL), but 1 (!), so it is a pointer to the memory address 0x1 (see variable t in step #0 of the back trace). I did not find out how this happened, but I can work around it by defining the array without comparison function (set it to NULL in line 1653 of scheduler/dirsvc.c), getting an unsorted array which is searched linearly with only pointer comparisons (cups/array.c, line 104):

    for (current = 0; current < a->num_elements; current ++)
      if (a->elements[current] == e)
      {
        diff = 0;
        break;
      }

Array size is the number of locally defined queues. Most users have less than 100. If it comes high a server has around 10000 queues. So searching linearly here should not introduce a too high delay and the workaround therefore should not be a problem.

Applying the workaround at least for Oneiric ...

Changed in cups (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

After applying the workaround I have created 10000 print queues with the following command line

x=0; while [[ $x != 10000 ]]; do lpadmin -p 0000$x -E -v dnssd://CM3530%20%40%20eee-pc._ipp._tcp.local/cups; echo $x; x=$(( x + 1 )); done

checked whether they are really there

lpstat -v | wc -l

and removed them:

x=0; while [[ $x != 10000 ]]; do lpadmin -x 0000$x; echo $x; x=$(( x + 1 )); done

All the time it was able to do several queues per second (up to around 10) and CUPS did not crash during this rapid fire. gnome-settings-daemon, system-config-printer-applet, and system-config-printer keep running and do not crash. They simply have a higher CPU consumption, the former two for notifying on the desktop the latter for updating its main window (being unresponsive though) under this torture.

So this shows that the workaround is OK. I will apply it.

Changed in cups (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

As the origin of the problem can be in the cups-avahi patch I will not yet report this bug upstream.

I have subscribed Tim Waugh to make him aware of a possible problem in the patch.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Applied workaround to the Debian BZR repository of CUPS. pitti, can you upload a new CUPS package? Thanks.

Changed in cups (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cups - 1.5.0-7

---------------
cups (1.5.0-7) unstable; urgency=low

  [ Till Kamppeter ]
  * debian/patches/dnssd-reg-array-linear-search.patch: Let the array of
    DNS-SD-registered printers be searched linearly and not binarily by
    the DNS-SD service name, as due to another bug the DNS-SD service name
    is not correctly set in some array elements (LP: #855445). Not
    reporting upstream yet, as the original bug can be in the Avahi
    support patch.

  [ Martin Pitt ]
  * debian/local/cups.upstart: Lower the timeout for waiting for the socket
    to 3 seconds. Tweak the respawn count and limit to be longer than the
    socket poll time, so that a continuously failing cupsd process will not
    cause an eternal respawn loop. (LP: #854490)
  * debian/control: Have libcupsmime1 break pre-1.5 versions of cups as well,
    so that we avoid having a configured "cups" packae with an upgraded, but
    unconfigured libcupsmime1 package. This takes care of LP: #854490 during
    upgrades.
  * debian/rules: Configure with --enable-debug, to stop the upstream build
    system stripping all our binaries. This fixes the cups-dbg package.
 -- Martin Pitt <email address hidden> Thu, 22 Sep 2011 19:57:54 +0200

Changed in cups (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Additional remark: The new CUPS package with workaround adds 10000 queues in 5:12 min and removes them in 2:50 min, using the command lines of comment #9, probably even faster on a head-less server without gnome-settings-daemon and system-config-printer. Most important: CUPS did not crash, all 10000 queues got created and removed correctly.

Revision history for this message
Tim Waugh (twaugh) wrote :

Till, I haven't been able to reproduce this. Could you please try running your local build (without the work-around) under valgrind to track down what the cause is?

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Did a first try of running

# valgrind -v --trace-children=yes /usr/sbin/cupsd -f > /tmp/log1 2>&1

in one terminal and the first command line of comment #9 in another. No crash of CUPS, all 10000 queues got created, but a crash of a piece of valgrind, bug #861186. Log is attached.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I cannot reproduce it any more. I have created and removed the 10000 queues of comment #9 but no CUPS crash at all. Instead, I got the Valgrind crash and it also deleted the 97 queues I had before (only random stuff for testing, no queues for daily printing).

At least from the creating and removing 10000 queues I know that sorting and binary search for the list of DNS-SD-published printers is overkill. Linear search is fast enough. So I recommend to adopt the (trivial) workaround into the CUPS Avahi patch upstream to simply improve robustness.

Revision history for this message
Tim Waugh (twaugh) wrote :

I'd rather we didn't hide this difficult-to-debug issue any further than it is already hidden. The valgrind trace unfortunately doesn't reveal anything -- but perhaps MALLOC_CHECK_ would have, or libefence. It's a shame you can't reproduce this now.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Tim, it seems that one needs to have a big bunch of print queues installed, with different backends and PPD files. Hammering in thousands of identical raw queues with the same device URI seems not to trigger the problem. The important seems to be to have a base of ~100 well different queues. Then adding or removing queues will trigger the bug in some rare cases.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.