Comment 7 for bug 1758037

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Based on "virNWFilterSnoopDHCPOpen:1130 : internal error: pcap_setfilter: can't remove kernel filter: Bad file descriptor"

Tracking some details in virNWFilterSnoopDHCPOpen with gdb and debug symbols.
pcap_setfilter being the failing call.

The stack to the breaking call looks pretty much as expected:
#0 0x00007fae3c457850 in pcap_setfilter () from /usr/lib/x86_64-linux-gnu/libpcap.so.0.8
#1 0x00007fae3c693e31 in virNWFilterSnoopDHCPOpen (ifname=0x7fae2c01f010 "vnet0", mac=mac@entry=0x7fae2c004fcf, filter=<optimized out>,
    dir=PCAP_D_IN) at ../../../src/nwfilter/nwfilter_dhcpsnoop.c:1128
#2 0x00007fae3c6954ea in virNWFilterDHCPSnoopThread (req0=0x7fae2c004f70) at ../../../src/nwfilter/nwfilter_dhcpsnoop.c:1387
#3 0x00007fae59fe5ad2 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#4 0x00007fae5968f6db in start_thread (arg=0x7fae0dd02700) at pthread_create.c:463
#5 0x00007fae593b888f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The arguments seem reasonable:
virNWFilterSnoopDHCPOpen (ifname=0x7fae2c01f010 "vnet0", mac=mac@entry=0x7fae2c004fcf,
    filter=0x7fae3c6aaaea "dst port 67 and src port 68", dir=PCAP_D_IN)

pcap_compile does not complain and make a bpf program as it seems.
fp => {bf_len = 24, bf_insns = 0x1a042e0b60b57e00}
pcap_compile(handle, &fp, ext_filter, 1, PCAP_NETMASK_UNKNOWN)
fp => {bf_len = 28, bf_insns = 0x7fadf0254760}

ext_filter got generated as "dst port 67 and src port 68 and ether src 52:54:00:e7:04:a2"

The pcap handle looks at least not totally broken either:
p *p
$1 = {read_op = 0x7fae3c44d8c0 <pcap_read_linux>, next_packet_op = 0x0, fd = -1, selectable_fd = -1, bufsize = 576, buffer = 0x0, bp = 0x0,
  cc = 0, break_loop = 0, priv = 0x7fadf4020810, swapped = 0, rfile = 0x0, fddipad = 0, next = 0x0, version_major = 0, version_minor = 0,
  snapshot = 576, linktype = 1, linktype_ext = 0, tzoff = 0, offset = 6, activated = 1, oldstyle = 0, opt = {device = 0x7fadf40905a0 "vnet0",
    timeout = 0, buffer_size = 131072, promisc = 0, rfmon = 0, immediate = 0, tstamp_type = -1, tstamp_precision = 0}, pkt = 0x0,
  direction = PCAP_D_INOUT, bpf_codegen_flags = 1, fcode = {bf_len = 0, bf_insns = 0x0},
  errbuf = "can't mmap rx ring: Invalid argument", '\000' <repeats 220 times>, dlt_count = 0, dlt_list = 0x0, tstamp_type_count = 0,
  tstamp_type_list = 0x0, tstamp_precision_count = 0, tstamp_precision_list = 0x0, pcap_header = {ts = {tv_sec = 0, tv_usec = 0}, caplen = 0,
    len = 0}, activate_op = 0x7fae3c450e80 <pcap_activate_linux>, can_set_rfmon_op = 0x7fae3c44fd20 <pcap_can_set_rfmon_linux>,
  inject_op = 0x7fae3c450b40 <pcap_inject_linux>, setfilter_op = 0x7fae3c44e790 <pcap_setfilter_linux>,
  setdirection_op = 0x7fae3c44f2b0 <pcap_setdirection_linux>, set_datalink_op = 0x7fae3c44d390 <pcap_set_datalink_linux>,
  getnonblock_op = 0x7fae3c457730 <pcap_getnonblock_fd>, setnonblock_op = 0x7fae3c4577a0 <pcap_setnonblock_fd>,
  stats_op = 0x7fae3c44e860 <pcap_stats_linux>, oneshot_callback = 0x7fae3c456430 <pcap_oneshot>,
  cleanup_op = 0x7fae3c44f3c0 <pcap_cleanup_linux>}

pcap_setfilter fixes up the program format and sets can_filter_in_kernel which should be the good path.

Then it fails to install the kernel filter at
   set_kernel_filter(handle, &fcode)
With errno being 9
But it has integrated fallback to userland filter.
handlep->filter_in_userland is 1 so it tried userland.

Here we come closer to the issue.
It now runs:
        /*
         * If we're not using the kernel filter, get rid of any kernel
         * filter that might've been there before, e.g. because the
         * previous filter could work in the kernel, or because some other
         * code attached a filter to the socket by some means other than
         * calling "pcap_setfilter()". Otherwise, the kernel filter may
         * filter out packets that would pass the new userland filter.
         */
        if (handlep->filter_in_userland) {
                if (reset_kernel_filter(handle) == -1) {
                        pcap_snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                            "can't remove kernel filter: %s",
                            pcap_strerror(errno));
                        err = -2; /* fatal error */
                }
        }

This also is matching our error message.
Now we know that we just created the handle - so there should be no kernel filter on it.
So do we just fail to remove something that isn't there.
The comment says "might" so I'd expect it to tolerate "no filter there" to be ok.

reset_kernel_filter is just a static wrapper to setsockopt.
  => setsockopt(handle->fd, SOL_SOCKET, SO_DETACH_FILTER, &dummy, sizeof(dummy));

And yes here it seems handle->fd is -1
It is -1 all the way up to libvirts virNWFilterSnoopDHCPOpen

Also the reason the setting of the kernel filter failed is the bad fd btw.
It ends in pcap set_kernel_filter essentially calling:
  setsockopt(handle->fd, SOL_SOCKET, SO_ATTACH_FILTER, &total_fcode, sizeof(total_fcode))

Same bad fd here.

We might need to trace the create instead
  pcap_create (device=device@entry=0x7fae480296e0 "vnet0", errbuf=errbuf@entry=0x7fae0ccff9f0 "\320\373\317\f\256\177")

It is none of the special interfaces, but uses pcap_create_interface(device_str, errbuf);
The initial fd of -1 is then from
  pcap_alloc_pcap_t
in this stack
#0 pcap_create_common (ebuf=ebuf@entry=0x7fae0ccff9f0 "\320\373\317\f\256\177", size=size@entry=152) at ./pcap.c:561
#1 0x00007fae3c452b9f in pcap_create_interface (device=device@entry=0x7fadf4066b30 "vnet0",
    ebuf=ebuf@entry=0x7fae0ccff9f0 "\320\373\317\f\256\177") at ./pcap-linux.c:462
#2 0x00007fae3c45746b in pcap_create (device=device@entry=0x7fae480296e0 "vnet0", errbuf=errbuf@entry=0x7fae0ccff9f0 "\320\373\317\f\256\177")
    at ./pcap.c:454
#3 0x00007fae3c693db8 in virNWFilterSnoopDHCPOpen (ifname=0x7fae480296e0 "vnet0", mac=mac@entry=0x7fae4802979f,
    filter=0x7fae3c6aaaea "dst port 67 and src port 68", dir=PCAP_D_IN) at ../../../src/nwfilter/nwfilter_dhcpsnoop.c:1105
#4 0x00007fae3c6954ea in virNWFilterDHCPSnoopThread (req0=0x7fae48029740) at ../../../src/nwfilter/nwfilter_dhcpsnoop.c:1387
#5 0x00007fae59fe5ad2 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#6 0x00007fae5968f6db in start_thread (arg=0x7fae0cd00700) at pthread_create.c:463
#7 0x00007fae593b888f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Then follows a lot of timestamping.
What would actually set that fd usually - especial with the POV of the libvirt code?
We only have
- pcap_set_snaplen
- pcap_set_buffer_size
- pcap_activate
- pcap_compile

If any then of these activate maybe.
Vice versa in the code confirms activate e.g. pcap_activate_bpf sets an fd it gets via bpf_open.
   fd = bpf_open(p->errbuf);
   [...]
   p->fd = fd;

So which path does out activate take? From gdb we know:
  handle->activate_op = pcap_activate_linux;

Later in pcap_activate_linux the fd migth already be used, but it is not referred directly.
So it has to be a call from pcap_activate_linux with handle as arg to be modified.
In there is activate_new which sets up a socket using PF_PACKET and would set handle->fd if successful.
In my case activate_new fails then it tries to fall back to activate_mmap.

create_ring and prepare_tpacket_socket would set these up, but the latter fails.
So eventually pcap_activate_linux exits via the fail path.
It returns 1 according to gdb.

But that in libpcap accounts as "success" so p->activated = 1; gets set while nothing ever set the fd correctly.

Anything later is a follow on error.
Libvirt relies on the rc < 0 indicating an error but that is not set.

For now this almost seems to be one of the following:
a) a failure how libvirt requests the setup of the handle
b) a bug in libpcap that in this case it should actually return < 0 on pcap_activate

I quickly checked that without
  <parameter name='CTRL_IP_LEARNING' value='dhcp'/>
being set what seems (at a glimpse) to be the same call works, sets an fd and then all is good.

So the next step whoever works on it would be to check on *activate* the differences depending on that arg (arguments from libvirt and such).