cloud-init

Code review comment for ~tamilmani1989/cloud-init:azure_networking

Revision history for this message

Tamilmani Manoharan (tamilmani1989) wrote on 2018-11-14:

> > > Thanks for working on the refactor. This looks a lot better and it's much
> > > easier to read and write unittests.
> > >
> > > I've quite a few inline comments, please look through those. The biggest
> > > query is around how do we know when the vnet migration has started to
> > prevent
> > > us from getting stuck forever if it's already started before we attempt to
> > > listen for events.
> >
> > Thanks for reviewing it. Unless cloud-init report ready to azure fabric,
> vnet
> > switch won't happen. Its way of informing the top level services that vm is
> > ready for vnet switch. So I have created and bound netlink socket before
> that
> > call so it won't miss any event that happens after that.
>
> OK, so, the sequence then is:
>
> 1) crawl initial metadata and detect if pre-provision VM, if so write a
> REPROVISION marker file
> 2) in _poll_imds create nl socket
> 3) write a REPORT_READY marker file and call _report_ready()
> 4) _report_ready() will invoke the shim
> 5) wait_for_media_disconnect()
>
> The window is now between (2) and (5); and the question is what does the
> netlink socket do with the incoming messages that we won't process until (5)?
>
> From reading the upstream docs on netlink, the default netlink socket size is
> 32kb, and while the typical messages may be small, the payload for messages is
> arbitrary. I know that DELLINK and NEWLINK are small but other messages may
> come through and if we're not processing then the socket could become full.
>
> I suspect that in practice there isn't a lot of netlink traffic on the node
> with only a single interface up at this point but I'm wondering if we
> shouldn't employ something that allows us to:
>
> a) spawn a thread to start doing the "Wait" which would start reading messages
> as they come in on the socket *before* we send the READY flag to the fabric
> b) after sending the ready flag, we would join() the thread which would
> require the main program to wait until the thread exited (ie, we detected the
> media change event).
>
> Thoughts?

I don't think we receive that many netlink messages since this happens before systemd-networkd starts. we manually setup an ephemeral network(one interface) to send report ready. Also netlink code listen for only RTMGRP_LINK(network interface create/delete/up/down events) and not for any ipaddress and route changes. There will be only one interface at that time and it reads only above events for that interface

> > > Thanks for working on the refactor.  This looks a lot better and it's much
> > > easier to read and write unittests.
> > >
> > > I've quite a few inline comments, please look through those.  The biggest
> > > query is around how do we know when the vnet migration has started to
> > prevent
> > > us from getting stuck forever if it's already started before we attempt to
> > > listen for events.
> >
> > Thanks for reviewing it. Unless cloud-init report ready to azure fabric,
> vnet
> > switch won't happen. Its way of informing the top level services that vm is
> > ready for vnet switch. So I have created and bound netlink socket before
> that
> > call so it won't miss any event that happens after that.
> 
> OK, so, the sequence then is:
> 
> 1) crawl initial metadata and detect if pre-provision VM, if so write a
> REPROVISION marker file
> 2) in _poll_imds create nl socket
> 3) write a REPORT_READY marker file and call _report_ready()
> 4) _report_ready() will invoke the shim
> 5) wait_for_media_disconnect()
> 
> The window is now between (2) and (5); and the question is what does the
> netlink socket do with the incoming messages that we won't process until (5)?
> 
> From reading the upstream docs on netlink, the default netlink socket size is
> 32kb, and while the typical messages may be small, the payload for messages is
> arbitrary.  I know that DELLINK and NEWLINK are small but other messages may
> come through and if we're not processing then the socket could become full.
> 
> I suspect that in practice there isn't a lot of netlink traffic on the node
> with only a single interface up at this point but I'm wondering if we
> shouldn't employ something that allows us to:
> 
> a) spawn a thread to start doing the "Wait" which would start reading messages
> as they come in on the socket *before* we send the READY flag to the fabric
> b) after sending the ready flag, we would join() the thread which would
> require the main program to wait until the thread exited (ie, we detected the
> media change event).
> 
> Thoughts?

« Back to merge proposal