Installation blocks when the machine is behind a proxy server

Bug #1766542 reported by Robert Liu
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OEM Priority Project      Fix Released  High        Unassigned
apt (Ubuntu)              Fix Released  Undecided   Unassigned
apt (Ubuntu Bionic)       Fix Released  Undecided   Unassigned
ubiquity (Ubuntu)         Fix Released  High        Unassigned
ubiquity (Ubuntu Bionic)  Invalid       High        Unassigned

Bug Description

[Impact]
APT takes a long time to notice when certain connections time out

[Test case]
* Change the default route to not route stuff successfully (wrong gateway,
  for example)
* Make your sources.list look like this:

deb http://archive.ubuntu.com/ubuntu bionic main restricted
deb http://archive.ubuntu.com/ubuntu bionic-updates main restricted

* Run apt update

You should see that it fails with a long, verbose error message for the first entry, listing all possible IP addresses, while the second entry fails with just "Unable to connect to archive.ubuntu.com:http:" because apt recognizes that the host has been blacklisted.

[Regression potential]
APT will not attempt to retry the host given that it could not connect to it for previous entries. If your network recovered in the meantime, it might update less than previously.

[Original bug report]
When the machine is behind a proxy server, the installation blocks for a while (several minutes) while retrieving the package lists. The timeouts are too long and make the user feel that the machine may have a problem.

The symptom is similar to bug #14599, but it seems the apt-setup module was rewritten.

Another way to trigger this issue is to make the machine unable to reach the Internet, for instance by configuring a wrong gateway.

Image: 16.04

Yuan-Chen Cheng (ycheng-twn) wrote :

need test results on 18.04.

Changed in oem-priority:
status: New → Confirmed
importance: Undecided → High
Robert Liu (robertliu) wrote :

Tried the 18.04 daily build (http://cdimage.ubuntu.com/ubuntu/daily-live/20180424/bionic-desktop-amd64.iso), and it has the same issue.

Changed in ubiquity (Ubuntu):
importance: Undecided → High
tags: added: rls-bb-incoming
Steve Langasek (vorlon) wrote :

Robert, can you please provide complete steps to reproduce this bug? My understanding from discussion elsewhere is that this is specific to oem-config mode; is that correct?

If your network is incorrectly configured and you have a route to the Internet but traffic is dropped, it is expected that an initial connection to the apt mirror will have to hit a tcp connection timeout (by default, ~128 seconds); so that might qualify as "several minutes". What behavior are you seeing that is "too long", and how long do you expect this to take?

Changed in ubiquity (Ubuntu):
status: New → Incomplete
Julian Andres Klode (juliank) wrote :

I've been considering lowering the timeout in apt from 120s to something like 10-30s for a year or two, but there's been some concern about high-latency connections. I personally do not think that 120s is a sensible timeout for one round trip.

There are also several options to prevent this issue on the network side:

- do not provide DNS servers / resolve internet names
- do not drop packets, but reject them
- point a SRV record in your DNS server to a working HTTP host (either serving the archive, or 404, as long as it connects, it should be fine)

We also had the same issue on some IPv6+IPv4 systems where the IPv6 is not reachable, but that's fixed in cosmic with happy eyeballs falling back to the next host within milliseconds (trying more and more hosts in parallel until one works).

Julian Andres Klode (juliank) wrote :

The default timeout now is:

- 120s * #ip-addresses before bionic
- 120s + 250ms * #ip-addresses for bionic

That's a substantial improvement. I still think we should lower the timeout from 120s to 20s for cosmic, bringing this down to 20s + 250ms * #ipaddr (if we estimate up to 4 ip addresses, it will succeed or fail within 21s).

I'd be open to backporting happy-eyeballs to xenial too, once it has spent some more time in bionic. The changes are local to a single file (methods/connect.cc), but they are quite big relative to that file, so we should be careful.

Julian Andres Klode (juliank) wrote :

OK, so archive.ubuntu.com has 4 AAAA and 4 A entries, meaning that

16.04 waits 16 minutes (8 * 2 minutes)
18.04 waits 2 minutes, 2 seconds (2 minutes + 8 * 250 milliseconds)

Now we can shorten that further to one of the following:

* 12 seconds (10 + 8 * 250 ms)
* 22 seconds (20 + 8 * 250 ms)
* 32 seconds (30 + 8 * 250 ms)

30 seconds seems like a good conservative choice with reasonable behavior.
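
The arithmetic above can be checked with a short script. This is a minimal sketch, not apt code: the function names are hypothetical, and the two formulas are simply the ones quoted earlier in this thread (timeout multiplied by the number of addresses before bionic, timeout plus 250 ms per address with happy eyeballs).

```python
def sequential_wait_s(connect_timeout_s: float, num_addresses: int) -> float:
    """Worst case before bionic, per the thread: every address waits the full timeout."""
    return connect_timeout_s * num_addresses


def happy_eyeballs_wait_s(
    connect_timeout_s: float, num_addresses: int, stagger_s: float = 0.25
) -> float:
    """Worst case with happy eyeballs, per the thread: one timeout plus 250 ms per address."""
    return connect_timeout_s + stagger_s * num_addresses


if __name__ == "__main__":
    addresses = 8  # archive.ubuntu.com: 4 AAAA + 4 A records, as stated above

    print("16.04:", sequential_wait_s(120, addresses) / 60, "minutes")
    print("18.04:", happy_eyeballs_wait_s(120, addresses), "seconds")
    for candidate in (10, 20, 30):
        print(f"{candidate}s timeout:", happy_eyeballs_wait_s(candidate, addresses), "seconds")
```

With 8 addresses this reproduces the 16 minutes, 122 seconds, and 12/22/32 seconds listed in the comment.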

Robert Liu (robertliu) wrote :

Here is how I set up the environment:
1. Prepare a broadband gateway. This time I used LEDE in VirtualBox. I can upload the VM image if necessary.
2. Add a firewall rule on the gateway: all http/https traffic from the LAN port is redirected to an IP address that is not used by any machine.
3. Install Ubuntu and check the behavior.

I can reproduce the issue with this environment. However, I'm not sure whether it is exactly the same as the customer's environment.

@Steve, yes, we are using oem-config mode. In normal mode the blocking time is shorter because it uses fewer archives and apt operations. Please see my following comments, where I provide results for the official 16.04 and 18.04 images.

Robert Liu (robertliu) wrote :

Image: ubuntu-16.04.4-desktop-amd64.iso

Significant blockings -
"Retrieving file 1 of 93", 30 seconds
"Retrieving file 1 of 31", 3 minutes

Robert Liu (robertliu) wrote :

Image: ubuntu-18.04-desktop-amd64.iso

Significant blockings -
"Retrieving file 1 of 3", 90 seconds
"Retrieving file 1 of 1", 32 seconds

The behavior is different from 16.04.

Steve Langasek (vorlon) wrote :

> Significant blockings -
> "Retrieving file 1 of 3", 90 seconds
> "Retrieving file 1 of 1", 32 seconds

Does this mean that in 18.04, the full delay you see is 122 seconds?

Is that considered acceptable, or not?

Robert Liu (robertliu) wrote :

Hi @Steve,
IMO, it would be great if the total delay were less than 30 seconds.
In this case, the update operation always fails. Users should know that the Internet is not available before setting a proxy server. They may expect a minimal timeout, or to pre-configure the proxy server before installation.

Julian Andres Klode (juliank) wrote :

I don't see how we can achieve an overall timeout. We only have per connect() timeouts available, and we run multiple apt processes / fetches. We can limit the individual connect() timeout to 30s, but overall? I think we do one attempt for archive.ubuntu.com / mirror and another for security, so we should be looking at a one minute timeout overall.

In fact, apt-setup already sets a timeout of 30 seconds for the updates. Now about the 16.04 log: I'm not sure about networking, is that an IPv6-only network? It times out for security.ubuntu.com after 3 minutes, which is expected for 6 addresses (and security.u.c resolves to 6 IPv4 and 6 IPv6 addresses).

With 18.04 apt, this should time out in roughly 32 seconds (6 , which seems to match the second 18.04 result. Not sure why it times out in 90 seconds of file 1 of 3, though. Do you have logs?

Robert Liu (robertliu) wrote :

@Julian,

Here is the archive of logs. Sorry I forgot to attach it before.

Julian Andres Klode (juliank) wrote :

It seems we are experiencing a bug in apt in bionic WRT the 90s timeout. With happy eyeballs, we added a new place where connections can time out, and we did not mark these IP addresses as failed; hence they were always retried, and thus you saw it attempting to connect 3 times to tw.archive.ubuntu.com.

With that fixed, we should be down to 30s instead of 90s.
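
The effect of marking timed-out addresses as failed can be illustrated with a small simulation. This is only a sketch under stated assumptions, not apt's actual logic (which lives in C++ in methods/connect.cc): the function and parameter names are hypothetical, the addresses are documentation placeholders, each fetch whose addresses all time out is assumed to cost one full connect timeout, and the 250 ms stagger between addresses is ignored.

```python
from typing import Callable, Dict, List, Set


def total_wait_s(
    fetches: List[str],
    addresses_of: Dict[str, List[str]],
    reachable: Callable[[str], bool],
    connect_timeout_s: float,
    remember_failures: bool,
) -> float:
    """Return the simulated time spent waiting on connection timeouts."""
    failed: Set[str] = set()  # addresses already known to have timed out
    waited = 0.0
    for host in fetches:
        candidates = [a for a in addresses_of[host] if a not in failed]
        if not candidates:
            # Every address is already marked as failed: give up immediately.
            continue
        if any(reachable(a) for a in candidates):
            # At least one address connects, so no timeout is incurred.
            continue
        # All remaining addresses time out; assume one full timeout is spent.
        waited += connect_timeout_s
        if remember_failures:
            failed.update(candidates)
    return waited


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges, not real mirror IPs.
    addrs = {"tw.archive.ubuntu.com": ["192.0.2.10", "2001:db8::10"]}
    three_fetches = ["tw.archive.ubuntu.com"] * 3

    def unreachable(_address: str) -> bool:
        return False

    print(total_wait_s(three_fetches, addrs, unreachable, 30.0, remember_failures=False))
    print(total_wait_s(three_fetches, addrs, unreachable, 30.0, remember_failures=True))
```

Under these assumptions, three fetches against the same unreachable host cost 90 seconds without the failure list and 30 seconds with it, matching the numbers in the comment.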

That said, I cannot reproduce that as connect() fails for me with "No Route to Host" after 3 seconds, so I might have missed something.

Changed in apt (Ubuntu):
status: New → Triaged
Julian Andres Klode (juliank) wrote :

I don't think we can realistically go lower than 2x30s - I think the intention is to add the mirror and security separately and comment them out if they don't work or something, so they need to be in separate apt runs. Unless, of course, we write a tool in Python or something that does the update there, inspects failures and checks which hosts errored out; then we can do it in one.

Julian Andres Klode (juliank) wrote :

For the apt side, that's the first commit in

https://salsa.debian.org/apt-team/apt/merge_requests/18/

I'll also upload that to bionic eventually; but xenial would also need the happy eyeballs changes, which were a bit large (although isolated).

tags: added: id-5af9ea356db8cb2d4eb3d4e7
Changed in apt (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apt - 1.7.0~alpha0ubuntu2

---------------
apt (1.7.0~alpha0ubuntu2) cosmic; urgency=medium

  [ David Kalnischkies ]
  * Add boilerplate plural form to po/apt-all.pot
  * don't try SRV requests based on IP addresses
  * use 127.0.0.1 instead of localhost as default Tor proxy
  * Extend apt build-dep pkg/release to switch dep as needed
  * Support release selector for volatile files as well
  * Start pkg records for deb files with dpkg output
  * Deprectate buggy/incorrect Rls/PkgFile::IsOk methods
  * Support --with-source in show & search commands
  * Support local files as arguments in show command (Closes: 883206)
  * Drop alternative URIs we got a hash-based fail from
  * Handle by-hash URI construction more centrally
  * Don't force the same mirror for by-hash URIs
  * Reword error for timed out read/write on SOCKS proxy (Closes: #898886)
  * Don't show acquire warning for "hidden" components (Closes: #879591)
  * Use a steady clock source for progress reporting
  * Use steady clock source for bandwidth limitation

  [ Filipe Brandenburger ]
  * Update .gitignore
  * Increase debug verbosity in `apt-get autoremove`
  * Extend test-apt-get-autoremove to check debug output

  [ Julian Andres Klode ]
  * tests: Do not expect requested-by if sudo was invoked by root
  * Run tests on GitLab CI
  * Handle a missed case of timed out ip addresses (LP: #1766542)
  * Lower default timeout from 120s to 30s
  * apt-key: Pass all instead of gpg-agent to gpgconf --kill (LP: #1773992)
  * Fix lock counting in debSystem

  [ annadane ]
  * Add verb 'be' to NEWS entry for 1.5~beta1 (Closes: 892792)

  [ Алексей Шилин ]
  * Russian program translation update (Closes: 898797)

  [ Frans Spiesschaert ]
  * Dutch program translation update (Closes: #900589)
  * Dutch manpage translation update (Closes: #900602)

 -- Julian Andres Klode <email address hidden> Tue, 19 Jun 2018 17:12:51 +0200

Changed in apt (Ubuntu):
status: Fix Committed → Fix Released
Changed in apt (Ubuntu Bionic):
status: New → In Progress
description: updated
Chris J Arges (arges) wrote : Please test proposed package

Hello Robert, or anyone else affected,

Accepted apt into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/apt/1.6.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in apt (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Julian Andres Klode (juliank) wrote :

I had to tweak the timeouts a bit, as I quickly get some errors from the networking stack (after about 7s), but I have now successfully verified the fix using a standard ubuntu:bionic lxd container, modifying the routes and setting a lower timeout:

+ ip r del default
+ ip r add default via 10.33.102.254 dev eth0
+ ip -6 r del default
+ ip -6 r add default via fe80::689a:53ff:fe6f:8f85 dev eth0
RTNETLINK answers: File exists
+ cat
+ apt update -o acquire::http::timeout=1
Err:1 http://archive.ubuntu.com/ubuntu bionic InRelease
  Could not connect to archive.ubuntu.com:80 (91.189.88.162), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::21), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::11), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.149), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::14), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.161), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::17), connection timed out
Err:2 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
  Could not connect to archive.ubuntu.com:80 (91.189.88.162). - connect (113: No route to host) Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (113: No route to host) Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::11), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.149), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::14), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.161), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::17), connection timed out
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease Could not connect to archive.ubuntu.com:80 (91.189.88.162), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::21), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::11), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.149), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1560:8001::14), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.88.161), connection timed out Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::17), connection timed out
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease Could not connect to archive.ubuntu.com:80 (91.189.88.162). - connect (113: No route to host) Could not connect to archive.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (113: No route to h...


tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for apt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apt - 1.6.2

---------------
apt (1.6.2) unstable; urgency=medium

  * Fix build with new gtest (Closes: #897149)
  * Handle a missed case of timed out ip addresses (LP: #1766542)
  * Lower default network timeouts from 120s to 30s
  * apt-key: Pass all instead of gpg-agent to gpgconf --kill (LP: #1773992)
  * Fix lock counting in debSystem (LP: #1778547)
  * CI fixes:
   - tests: Do not expect requested-by if sudo was invoked by root
   - Run tests on GitLab CI
   - CI: Export DEBIAN_FRONTEND=noninteractive in all CI environments

 -- Julian Andres Klode <email address hidden> Mon, 25 Jun 2018 17:15:10 +0200

Changed in apt (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in oem-priority:
status: Confirmed → Fix Released
Changed in ubiquity (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → Fix Released
Brian Murray (brian-murray) wrote :

I don't believe that any special fix is needed for ubiquity given that the bug and fix were in apt, so I'm setting this task to Invalid.

Changed in ubiquity (Ubuntu Bionic):
status: New → Invalid
tags: removed: rls-bb-incoming