Comment 5 for bug 1021471

Stéphane Graber (stgraber) wrote: Re: stuck on mutex_lock creating a new network namespace when starting a container

Not much luck reproducing this at the moment with an up-to-date quantal. However, running with the deadline I/O scheduler and two containers rebooting in a loop, I eventually hit this:

Jul 19 07:22:34 lantea kernel: [46965.795778] ---[ end trace c212400a9b13d700 ]---
Jul 19 07:22:35 lantea kernel: [46965.809353] general protection fault: 0000 [#2] SMP
Jul 19 07:22:35 lantea kernel: [46965.812019] Modules linked in: veth ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables 8021q garp bridge stp llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm coretemp microcode snd_seq_midi snd_rawmidi psmouse serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer snd_seq_device i915 bonding rfcomm bnep bluetooth parport_pc ppdev mac_hid snd drm_kms_helper drm i2c_algo_bit soundcore snd_page_alloc video lp parport hid_generic usbhid hid r8169 floppy
Jul 19 07:22:35 lantea kernel: [46965.812019]
Jul 19 07:22:35 lantea kernel: [46965.812019] Pid: 11839, comm: initctl Tainted: G D 3.5.0-5-generic #5-Ubuntu /945GSE
Jul 19 07:22:35 lantea kernel: [46965.812019] EIP: 0060:[<c154bdf3>] EFLAGS: 00010286 CPU: 0
Jul 19 07:22:35 lantea kernel: [46965.812019] EIP is at unix_destruct_scm+0x53/0x90
Jul 19 07:22:35 lantea kernel: [46965.812019] EAX: 00000000 EBX: f71740c0 ECX: ffffffff EDX: 00000000
Jul 19 07:22:35 lantea kernel: [46965.812019] ESI: e0a828c8 EDI: f71740c0 EBP: e0a89adc ESP: e0a89abc
Jul 19 07:22:35 lantea kernel: [46965.812019] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 19 07:22:35 lantea kernel: [46965.812019] CR0: 80050033 CR2: b7606fb8 CR3: 01968000 CR4: 000007e0
Jul 19 07:22:35 lantea kernel: [46965.812019] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 19 07:22:35 lantea kernel: [46965.812019] DR6: ffff0ff0 DR7: 00000400
Jul 19 07:22:35 lantea kernel: [46965.812019] Process initctl (pid: 11839, ti=e0a88000 task=f34e6580 task.ti=e0a88000)
Jul 19 07:22:35 lantea kernel: [46965.812019] Stack:
Jul 19 07:22:35 lantea kernel: [46965.812019] 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000 f71740c0
Jul 19 07:22:35 lantea kernel: [46965.812019] e0a89ae8 c14c45d3 f71740c0 e0a89af4 c14c43d0 00000001 e0a89b0c c14c4486
Jul 19 07:22:35 lantea kernel: [46965.812019] c154bc6f 00000001 e0a828c8 f71740c0 e0a89b38 c154bc6f 00000000 e0a80ae0
Jul 19 07:22:35 lantea kernel: [46965.812019] Call Trace:
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c45d3>] skb_release_head_state+0x43/0xc0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c43d0>] __kfree_skb+0x10/0x90
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c4486>] kfree_skb+0x36/0x80
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bc6f>] ? unix_release_sock+0x13f/0x240
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bc6f>] unix_release_sock+0x13f/0x240
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bd8f>] unix_release+0x1f/0x30
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bc4e0>] sock_release+0x20/0x70
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bc547>] sock_close+0x17/0x30
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c114ff76>] fput+0xe6/0x210
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c114c674>] filp_close+0x54/0x80
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c1049ae5>] put_files_struct+0x75/0xc0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c1049bd6>] exit_files+0x46/0x60
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c104a01a>] do_exit+0x14a/0x7a0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c104473f>] ? print_oops_end_marker+0x2f/0x40
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c15c09dd>] oops_end+0x8d/0xd0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c10138d4>] die+0x54/0x80
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c15c05d2>] do_general_protection+0x102/0x180
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c1075f10>] ? default_wake_function+0x10/0x20
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c115f822>] ? pollwake+0x62/0x70
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c15c04d0>] ? do_trap+0xd0/0xd0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c15c0233>] error_code+0x67/0x6c
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154d2fb>] ? unix_stream_recvmsg+0x4eb/0x680
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c1289713>] ? aa_revalidate_sk+0x83/0x90
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bd54c>] sock_recvmsg+0xcc/0x100
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c12c97b1>] ? _copy_from_user+0x41/0x60
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c79cf>] ? verify_iovec+0x3f/0xb0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bd480>] ? sock_sendmsg_nosec+0xf0/0xf0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bcc70>] __sys_recvmsg+0x110/0x1d0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bd480>] ? sock_sendmsg_nosec+0xf0/0xf0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c107d7cf>] ? trigger_load_balance+0x4f/0x1c0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c10786c5>] ? __dequeue_entity+0x25/0x40
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c101004c>] ? __switch_to+0xbc/0x260
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c1072051>] ? finish_task_switch+0x41/0xc0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14be93b>] sys_recvmsg+0x3b/0x60
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bee3b>] sys_socketcall+0x28b/0x2d0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c15c675f>] sysenter_do_call+0x12/0x28
Jul 19 07:22:35 lantea kernel: [46965.812019] Code: e4 8b 53 20 89 45 e0 85 d2 74 0d 8d 45 e8 89 da e8 d3 ed ff ff 8b 45 e0 e8 bb 5d b1 ff 8b 4d e4 c7 45 e0 00 00 00 00 85 c9 74 11 <f0> ff 09 0f 94 c0 84 c0 74 07 89 c8 e8 0c fc b1 ff 8b 45 e8 c7
Jul 19 07:22:35 lantea kernel: [46965.812019] EIP: [<c154bdf3>] unix_destruct_scm+0x53/0x90 SS:ESP 0068:e0a89abc
Jul 19 07:22:35 lantea kernel: [46966.536717] ---[ end trace c212400a9b13d701 ]---
Jul 19 07:22:35 lantea kernel: [46966.553450] Fixing recursive fault but reboot is needed!

Since then, "initctl" in the p1 container has been stuck in I/O wait (for 5 hours now); the only way to unblock it will be a reboot.
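
For reference, the stuck state can be confirmed from the host with standard procps tools; a quick sketch, using the PID from the oops above as an example (the actual stuck PID may differ):

ps -o pid,stat,wchan:32,cmd -p 11839
cat /proc/11839/stack

A STAT of "D" means uninterruptible (I/O) sleep, and /proc/<pid>/stack (root-only, where the kernel supports it) shows the kernel stack the task is blocked in.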

To reproduce, I created two basic containers:
lxc-create -n p1 -t ubuntu
lxc-create -n p2 -t ubuntu

then added the following upstart job as "/etc/init/test.conf" in both of them, so that each container reboots itself as soon as it has finished booting:
start on stopped rc RUNLEVEL=[2345]
exec reboot

Finally, I started them both in a screen session (see the sketch below):
lxc-start -n p1
lxc-start -n p2
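
Since lxc-start stays in the foreground, wrapping each command in a detached screen session looks something like this (a sketch, not necessarily the exact invocation used; the session names are arbitrary):

screen -dmS p1 lxc-start -n p1
screen -dmS p2 lxc-start -n p2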

I let that run overnight; this morning I noticed that one of the two was stuck because of the kernel oops above.

On the same system I've previously been able to reproduce exactly the issue described by Iain, though not at the moment... I'll try with the default scheduler and see if that helps reproduce Iain's symptoms.
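
For anyone wanting to match this setup, the I/O scheduler can be checked and switched per block device through sysfs; a sketch, assuming the disk is sda (the active scheduler is the one shown in brackets, and writing requires root):

cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler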