On Mon, Sep 27, 2010 at 2:12 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
>> Maybe I'm missinterpreting what you mean by "process".
>>
>> I'd be surprised is "all client software" tried each address returned.
>> I would expect most client software to simply use an address in bind
>> or connect and get watever address is prefered for the type of socket
>> they created.
>
> With IPv6, it is now common and recommended that each host gets at least
> two addresses: one for IPv4 and one for IPv6 (dual stack).
Makes sense.
> In addition,
> some systems have multiple IPv6 addresses (e.g. assigned by various
> tunneling systems).
Hm. Sounds complicated. :)
> It's not universally clear that any of these
> addresses is better than any other: it depends, in particular, on the
> client connectivity (IPv6 available or not, IPv4 available or not).
Hm.
> The new resolver API recommended for use it getaddrinfo(3). It gives you
> a list of addrinfo(family, socktype, proto, canonname, addr), mixing
> IPv4 and IPv6 addresses. Today, systems have a policy (e.g.
> /etc/gai.conf) to determine in which order to return them (e.g. IPv6
> global first, then IPv4, then IPv6 tunnel). Applications must not
> normally chose for themselves whether they want IPv4 or IPv6
> connections, but should let the system decide.
>
> In turn, you can't actually create the socket until getaddrinfo
> returns, as it will tell you what socket family to use. Hence
> the typically connect code looks like this:
>
> for family,address in getaddrinfo():
> try:
> s = socket.socket(family)
> s.connect(address, port)
> except socket.error:
> continue
> if not s:
> raise "None of these addresses worked"
I'm hoping there's a missing break or return in there somewhere. :)
> The IPv6 server code is entirely different: typically, you either
> bind to all addresses (v4 and v6) of a system (preferably using
> dual-stack sockets using a wildcard address), or you bind to a (list
> of) specifically configured addresses; no DNS involved.
I wonder why the asymmetry. Or, at least, why it wouldn't at least
make sense to at least listen for both v4 and v6 for the same host name.
> When binding
> to a list of addresses, you need to create multiple server sockets.
OK.
> IIUC, ZEO currently doesn't support that
Right.
> (and IMO doesn't need to).
Not sure wrt IPv6.
I have wanted to be able to listen on both a TCP socket and a
unix-domain socket. There's nothing hard about this for ZEO per se,
but figuring how to spell this in ZConfig schema has been just enough
of a pain to make it not happen. :)
>>> With multiple addresses coming out of DNS, it can speed-up connection
>>> establishment when they are tried simultaneously.
>>
>> I'm not sure what you mean here. But yeah, if you're going to do
>> multiple things, doing them in parallel is generally a good strategy.
>
> See above: the for loop can run in parallel.
>
>>> In addition, it was easier to implement that way. If you think it's
>>> undesirable, I can try to come up with a way to do them
>>> sequentially.
>>
>> I'm not clear on what was easier to implement.
>>
>> I'll try to guess. :)
>>
>> I see a number of possible issues that this might try to solve.
>>
>> 1. Given a host name, you have to decide whether to bind or connect
>> using a v4 or v6 socket.
>
> No, you would need to iterate over all getaddrinfo results either
> way.
Because the first result might not work.
>> 2. If multiple machines have the same host name, maybe you want to
>> treat them as a high-availability pool and connect to whichever one
>> is providing the desired service.
>
> That is now possible, but only a side effect. But yes, it was easier
> to treat a list of addresses returned from DNS as a high-availability
> pool - easier than modifying ConnectWrapper to execute the loop.
Oh, please don't remind me of ConnectionWrapper. It hurts. ;)
>> Is there any other reason why there would be multiple v4 addresses
>> or multiple v6 addresses for the same host name. (Leaving aside a
>> host having obe v4 address and one v6 address)?
>
> In DNS, you typically wouldn't have multiple IPv6 addresses for the
> same name on the same machine. There might be boundary cases
> (e.g. mDNS might return an additional address, different from the
> one found in DNS), or multiple addresses may get returned during
> a reconfiguration period.
>
>> Are these the problems you're trying to solve?
>
> The problem is that ConnectWrapper currently creates the socket
> in the constructor, and then has a connect_procedure method to connect.
> If a ConnectWrapper is representing a DNS name (with multiple
> addresses), then it would somehow need to create multiple sockets,
> over time.
>
>> For 1, it would be just as easy to grab the first result from
>> getaddrinfo.
>
> That is likely to fail. On many systems, that will be the IPv6
> address, if the server has one, but then will fail on the
> client if it doesn't have IPv6 connectivity. Try connecting to
> www.python.org with that approach. You absolutely
> have to try multiple of them, until you get a connect.
OK.
>> BTW, the patch for the server always uses a v4 socket if a host name
>> is given, but perhaps I don't understand how v6 interacts with bind.
>
> See above: to bind to multiple addresses, you would have to create
> multiple sockets, which is IMO unnecessary. So binding to a name
> for the server is supported for backwards compatibility, and continues
> to mean IPv4. IMO, in the long run, people should stop using host
> names for server-bind, and use wildcard addresses instead (or numeric
> addresses if they really want to bind to a single interface only).
>
>> 1. It becomes ambiguous whether a v4 or v6 socket will be used.
>
> If that's a problem, we need to find a way to do the connects
> sequentially. However, I couldn't figure out how to integrate
> this with the current code.
Fair enough.
A complication implicit here is that there are 2 levels of
connectivity. There is the basic socket connect and the
application-level connect. You're arguing that a host name may map to
multiple addresses and the prefered address might not be connectable
whereas addresses further down the list might be.
There's also a application-level connection where you connect to
multiple servers to see which one provides the desired level of
service.
I don't mind trying multiple paths (addresses) to do the low-level
connections, but I'm worried about making multiple connections to the
same server.
>> Perhaps this is moot because I don't think the patch let's a server
>> listen on both a v4 and a v6 address at the same time.
>
> Actually, it does. See the _has_dualstack logic; the IPV6_V6ONLY
> option allows to accept v4 connections over v6 sockets (when
> disabled).
Oh right, when wildcard addresses are used.
Thanks for bearing with me on this. :) And thanks for the branch.
I'm good with merging the branch. Do you want to do this (using some
LP tool)? Or do you want me to?
On Mon, Sep 27, 2010 at 2:12 PM, "Martin v. Löwis" <martin@ v.loewis. de> wrote:
>> Maybe I'm missinterpreting what you mean by "process".
>>
>> I'd be surprised is "all client software" tried each address returned.
>> I would expect most client software to simply use an address in bind
>> or connect and get watever address is prefered for the type of socket
>> they created.
>
> With IPv6, it is now common and recommended that each host gets at least
> two addresses: one for IPv4 and one for IPv6 (dual stack).
Makes sense.
> In addition,
> some systems have multiple IPv6 addresses (e.g. assigned by various
> tunneling systems).
Hm. Sounds complicated. :)
> It's not universally clear that any of these
> addresses is better than any other: it depends, in particular, on the
> client connectivity (IPv6 available or not, IPv4 available or not).
Hm.
> The new resolver API recommended for use it getaddrinfo(3). It gives you socket( family)
> a list of addrinfo(family, socktype, proto, canonname, addr), mixing
> IPv4 and IPv6 addresses. Today, systems have a policy (e.g.
> /etc/gai.conf) to determine in which order to return them (e.g. IPv6
> global first, then IPv4, then IPv6 tunnel). Applications must not
> normally chose for themselves whether they want IPv4 or IPv6
> connections, but should let the system decide.
>
> In turn, you can't actually create the socket until getaddrinfo
> returns, as it will tell you what socket family to use. Hence
> the typically connect code looks like this:
>
> for family,address in getaddrinfo():
> try:
> s = socket.
> s.connect(address, port)
> except socket.error:
> continue
> if not s:
> raise "None of these addresses worked"
I'm hoping there's a missing break or return in there somewhere. :)
> The IPv6 server code is entirely different: typically, you either
> bind to all addresses (v4 and v6) of a system (preferably using
> dual-stack sockets using a wildcard address), or you bind to a (list
> of) specifically configured addresses; no DNS involved.
I wonder why the asymmetry. Or, at least, why it wouldn't at least
make sense to at least listen for both v4 and v6 for the same host name.
> When binding
> to a list of addresses, you need to create multiple server sockets.
OK.
> IIUC, ZEO currently doesn't support that
Right.
> (and IMO doesn't need to).
Not sure wrt IPv6.
I have wanted to be able to listen on both a TCP socket and a
unix-domain socket. There's nothing hard about this for ZEO per se,
but figuring how to spell this in ZConfig schema has been just enough
of a pain to make it not happen. :)
>>> With multiple addresses coming out of DNS, it can speed-up connection
>>> establishment when they are tried simultaneously.
>>
>> I'm not sure what you mean here. But yeah, if you're going to do
>> multiple things, doing them in parallel is generally a good strategy.
>
> See above: the for loop can run in parallel.
>
>>> In addition, it was easier to implement that way. If you think it's
>>> undesirable, I can try to come up with a way to do them
>>> sequentially.
>>
>> I'm not clear on what was easier to implement.
>>
>> I'll try to guess. :)
>>
>> I see a number of possible issues that this might try to solve.
>>
>> 1. Given a host name, you have to decide whether to bind or connect
>> using a v4 or v6 socket.
>
> No, you would need to iterate over all getaddrinfo results either
> way.
Because the first result might not work.
>> 2. If multiple machines have the same host name, maybe you want to
>> treat them as a high-availability pool and connect to whichever one
>> is providing the desired service.
>
> That is now possible, but only a side effect. But yes, it was easier
> to treat a list of addresses returned from DNS as a high-availability
> pool - easier than modifying ConnectWrapper to execute the loop.
Oh, please don't remind me of ConnectionWrapper. It hurts. ;)
>> Is there any other reason why there would be multiple v4 addresses
>> or multiple v6 addresses for the same host name. (Leaving aside a
>> host having obe v4 address and one v6 address)?
>
> In DNS, you typically wouldn't have multiple IPv6 addresses for the
> same name on the same machine. There might be boundary cases
> (e.g. mDNS might return an additional address, different from the
> one found in DNS), or multiple addresses may get returned during
> a reconfiguration period.
>
>> Are these the problems you're trying to solve?
>
> The problem is that ConnectWrapper currently creates the socket
> in the constructor, and then has a connect_procedure method to connect.
> If a ConnectWrapper is representing a DNS name (with multiple
> addresses), then it would somehow need to create multiple sockets,
> over time.
>
>> For 1, it would be just as easy to grab the first result from
>> getaddrinfo.
>
> That is likely to fail. On many systems, that will be the IPv6
> address, if the server has one, but then will fail on the
> client if it doesn't have IPv6 connectivity. Try connecting to
> www.python.org with that approach. You absolutely
> have to try multiple of them, until you get a connect.
OK.
>> BTW, the patch for the server always uses a v4 socket if a host name
>> is given, but perhaps I don't understand how v6 interacts with bind.
>
> See above: to bind to multiple addresses, you would have to create
> multiple sockets, which is IMO unnecessary. So binding to a name
> for the server is supported for backwards compatibility, and continues
> to mean IPv4. IMO, in the long run, people should stop using host
> names for server-bind, and use wildcard addresses instead (or numeric
> addresses if they really want to bind to a single interface only).
>
>> 1. It becomes ambiguous whether a v4 or v6 socket will be used.
>
> If that's a problem, we need to find a way to do the connects
> sequentially. However, I couldn't figure out how to integrate
> this with the current code.
Fair enough.
A complication implicit here is that there are 2 levels of
connectivity. There is the basic socket connect and the
application-level connect. You're arguing that a host name may map to
multiple addresses and the prefered address might not be connectable
whereas addresses further down the list might be.
There's also a application-level connection where you connect to
multiple servers to see which one provides the desired level of
service.
I don't mind trying multiple paths (addresses) to do the low-level
connections, but I'm worried about making multiple connections to the
same server.
>> Perhaps this is moot because I don't think the patch let's a server
>> listen on both a v4 and a v6 address at the same time.
>
> Actually, it does. See the _has_dualstack logic; the IPV6_V6ONLY
> option allows to accept v4 connections over v6 sockets (when
> disabled).
Oh right, when wildcard addresses are used.
Thanks for bearing with me on this. :) And thanks for the branch.
I'm good with merging the branch. Do you want to do this (using some
LP tool)? Or do you want me to?
Jim
--
Jim Fulton