Error fetching git kernel trees

Bug #1012037 reported by Milo Casagrande
This bug affects 1 person
Affects: Linaro patch metrics
Status: Fix Released
Importance: Medium
Assigned to: Milo Casagrande

Bug Description

The cronjob '/srv/patches.linaro.org/apps/patchwork/bin/update-committed-patches.py' running on patches.linaro.org reports timeout errors when fetching some particular git kernel trees.

I am attaching the output we receive via mail.

Milo Casagrande (milo) wrote :

Copy-pasting the git URLs into a browser, it looks like none of them can be found.

Milo Casagrande (milo) wrote :

Update: looks like http://git.kernel.org is unreachable. Cloning either via HTTP or GIT protocol is not working.

Milo Casagrande (milo)
Changed in linaro-patchmetrics:
assignee: nobody → Milo Casagrande (milo)
Milo Casagrande (milo) wrote :

Update: http://git.kernel.org is now back online, and local cloning via HTTP and the git protocol seems to work. I do not know whether we should ask Canonical IS to run the script manually for us and check that it works, but it would be useful to know at what exact time the cronjob runs.

Milo Casagrande (milo) wrote :

Opened a ticket on rt.linaro.org (ticket number 493 [1]): git.kernel.org has been back in business since yesterday, but we still receive timeouts.

Before touching the code, we will investigate the issue on the server side with IS, to get a better idea of whether something is going on there.

[1] https://rt.linaro.org/Ticket/Display.html?id=493

Changed in linaro-patchmetrics:
status: New → In Progress
importance: Undecided → Low
Milo Casagrande (milo) wrote :

Attaching an updated version of the errors we get: it might be identical to the previous one, but at least it is newer.

Changed in linaro-patchmetrics:
milestone: none → 2012.06
Milo Casagrande (milo)
Changed in linaro-patchmetrics:
importance: Low → Medium
Milo Casagrande (milo) wrote :

A couple of updates.

This kernel tree: http://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq.git ( http://git.kernel.org/?p=linux/kernel/git/davej/cpufreq.git;a=summary ) does not have a "real" HEAD ref, and it doesn't even have a master branch. It has two remote branches, "next" and "fixes", and neither has seen a commit in 3 months.
We might consider:
- removing it in some way, but it might pop up again if there are old patches that went in there
- creating a workaround for this: just skip it, or after cloning it do a "git checkout next" in order to switch to that branch, hoping the patches we have are there. This means, though, that a "git pull origin HEAD" (as it is written in the code for the next update command) will never work (BTW that command will also work if written as "git pull origin", since HEAD is just a reference).

The timeout problem we have on kernel trees seems to lie in the interaction between git's output and the subprocess.Popen function. For reasons unknown to me, all the output of "git clone" goes to stderr (I thought it went to stdout). Without suppressing it, the clone operation works (in a couple of cases I had to increase the max timeout, because I really did reach it, probably due to a slower connection). I also tried passing "-q" (for quiet) to "git clone", increasing the bufsize, and redirecting stderr to /dev/null on the command line, but none of these worked. Using the git protocol instead of HTTP, the timeouts do not happen anymore. The "pull" operations do not seem to be affected by this problem.
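
For reference, git does write clone progress to stderr, and a child process whose pipe is never read can stall once the pipe buffer fills up. Below is a minimal sketch of running a command with a timeout while draining both streams. It is written for Python 3 and is not taken from update-committed-patches.py; the helper name run_with_timeout and its return shape are assumptions made for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds, cwd=None):
    """Run cmd, draining stdout and stderr, and give up after timeout_seconds.

    Hypothetical helper (not from the patchmetrics code base).
    Returns (returncode, stdout, stderr); returncode is None on timeout.
    """
    try:
        # capture_output=True makes run() read both pipes until EOF, so the
        # child cannot block on a full stderr pipe while we wait for it.
        done = subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # run() kills the child before raising; partial output is
        # discarded in this sketch.
        return None, "", ""
    return done.returncode, done.stdout, done.stderr
```

A clone with the 55-minute limit mentioned in this report would then be something like run_with_timeout(["git", "clone", url, dest], 55 * 60), where url and dest are placeholders.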

Milo Casagrande (milo) wrote :

A first patch that should fix some of the timeouts and errors we get has been pushed to trunk. I am not setting the bug to "Fix Committed" or "Fix Released" yet, since I would like to monitor the situation and check what other errors we might still have.

We will need to ask for a new deploy of the service from IS in order to test it.

Milo Casagrande (milo) wrote :

After talking with IS on IRC and requesting a manual run of the cronjob (it looks like the deployment was done after the cronjob had started, so we kept getting the same error), it now seems to work.

Taking a look here:
http://patches.linaro.org/project/linux-arm-kernel/

The "Last commit scanned" value now points to a commit from June 21st, whereas before it was pointing to a May commit.
I will wait and see what happens with the other git repositories.

Milo Casagrande (milo) wrote :

With the new deployment, the situation is a little bit better.

The problems we have right now are still with some git repositories that reach the timeout.
In particular:
- gcc git repository: cloned locally, it takes something like 96 minutes, way more than our max timeout. With this one we might have another problem, since it is in a "detached HEAD" state, meaning that no default branch is set in the repository, and we might want to switch to "master" to perform "pull" operations on it (otherwise we will always get an error).
- other repositories also take a long time locally. It might be good to increase the timeout again (it is now 55 minutes, and could be raised to 90) or to ask IS to set up the repositories manually. This happens only on "clone" operations.

The other problem we could have is with the "pull" operation when updating the already cloned repositories: as with the "gcc" one, not all of them have a default branch, nor a "master" one. We can deal with that in the code, but this means calling git, parsing its output and guessing what we might need: we do not have a way to specify the "branch" to use (worth adding it to the database?).

Milo Casagrande (milo) wrote :

To temporarily fix the problem of the missing default "branch" in some git repositories (reported specifically as bug 1017933), for the moment we can store in a dictionary those projects for which we have to switch branch (perform a git checkout), in order to decrease the number of errors.

For the cpu-freq repository, we had a reply from Amit, who suggests using the "next" branch. Still waiting for a reply for gcc.
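
The dictionary workaround could look roughly like the sketch below. The project key "cpufreq", the names BRANCH_OVERRIDES, branch_for and checkout_command, and the "master" fallback are all assumptions for illustration; only the choice of "next" for the cpu-freq tree comes from this report.

```python
# Hypothetical mapping from patchmetrics project name to the branch that
# must be checked out after cloning. The key is an illustrative name.
BRANCH_OVERRIDES = {
    "cpufreq": "next",
}

# Assumed fallback for projects that need no special handling.
DEFAULT_BRANCH = "master"


def branch_for(project_name):
    """Return the branch to use for a project, falling back to the default."""
    return BRANCH_OVERRIDES.get(project_name, DEFAULT_BRANCH)


def checkout_command(project_name):
    """Build the git command that switches a fresh clone to that branch."""
    return ["git", "checkout", branch_for(project_name)]
```

Keeping the mapping in code is only a stopgap: every new repository without a usable default branch needs a code change and a new deployment.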

Changed in linaro-patchmetrics:
milestone: 2012.06 → 2012.07
Milo Casagrande (milo) wrote :

The situation is a little bit better at the moment, but we still get some errors even with a max timeout of 120 minutes. This still happens only remotely: locally it works, and the git clone and pull operations complete correctly (they take a while, but never over 120 minutes).

Milo Casagrande (milo) wrote :

A new update on this, since we are still getting timeout errors. I will open a ticket and try to work out with IS what is happening there, adding the maintenance team in CC.

Some of the errors we receive, though not all of them, contain this:

error: RPC failed; result=18, HTTP code = 200
fatal: protocol error: bad pack header
warning: http unexpectedly said: '0000'

It might be something on server side, related to git version, or something else in how we clone the repositories.

Paul Sokolovsky (pfalcon) wrote :

A few comments:

Errors like "protocol error: bad pack header" are well known and seen elsewhere. I guess that for some share of such errors we can get to the bottom of it, i.e. find a systematic cause of the issue. But the rest are still what they seem: random, non-deterministic errors which happen regularly when working with large amounts of data transferred over the network, with systems which are under regular strain.

So the proper solution would be to always expect such errors and handle them properly, which is not easy. Reporting them all the time, as we do now, is mostly noise; not reporting them at all obviously won't do. What would work is to ignore 1-2 such errors in a row (random failures), but start reporting if there are more failures in a series (a sign of a systematic error). Of course, to do that we need to add a whole context-dependent error management facility to patchmetrics (well, to a few other such projects too), and that again extends their scope well beyond the original one.
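
As a rough illustration of the "ignore a couple of failures in a row, report a series" idea, here is a minimal in-memory sketch. The class name FailureTracker and the threshold of 3 are assumptions, and a real cronjob would have to persist the counters between runs (for example in the database), which this sketch does not do.

```python
from collections import defaultdict


class FailureTracker:
    """Count consecutive failures per repository (hypothetical helper)."""

    def __init__(self, threshold=3):
        # With threshold=3, the first two failures in a row are treated as
        # random noise and the third one is reported.
        self.threshold = threshold
        self._streaks = defaultdict(int)

    def record(self, repo, succeeded):
        """Record one run; return True if the failure should be reported."""
        if succeeded:
            # Any success resets the streak for that repository.
            self._streaks[repo] = 0
            return False
        self._streaks[repo] += 1
        return self._streaks[repo] >= self.threshold
```

Each run of the update script would call record() once per repository and mail only the failures for which it returns True.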

git pull issues: the lack of a master branch and/or a proper HEAD reference is also something we had problems with when we used the old repo-based mirror on android-build, and it was one of the reasons we switched to a custom mirror solution (using raw git) and seeded builds. Generally, the command to update a local repository from a remote is "git fetch"; "git pull" is a combination of "git fetch" and "git merge", and for scripted usage "git fetch" may be better. That doesn't solve the main problem of the master branch not being available, but then again we should probably take that as a normal, expected case, and make sure that patchmetrics accepts a project in the form (repo_url, repo_branch), not just repo_url (having master as the default for repo_url is fine, of course).
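
A minimal sketch of the (repo_url, repo_branch) form with "git fetch" for updates is shown below. The names Project, make_project, clone_command and fetch_command are hypothetical and do not come from patchmetrics; the git options used ("clone --branch" and "fetch origin <branch>") are standard.

```python
from collections import namedtuple

# A project is identified by its URL plus the branch to track.
Project = namedtuple("Project", ["repo_url", "repo_branch"])


def make_project(repo_url, repo_branch="master"):
    """Build a Project, using master when no branch is given."""
    return Project(repo_url, repo_branch)


def clone_command(project, dest):
    """Clone and check out the tracked branch instead of the remote HEAD."""
    return ["git", "clone", "--branch", project.repo_branch,
            project.repo_url, dest]


def fetch_command(project):
    """Update an existing clone without merging into the working tree.

    The fetched tip is then available as FETCH_HEAD.
    """
    return ["git", "fetch", "origin", project.repo_branch]
```

Because the branch is always named explicitly, neither command depends on the remote having a usable HEAD reference.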

David Zinman (dzinman)
Changed in linaro-patchmetrics:
milestone: 2012.07 → 2012.08
Milo Casagrande (milo) wrote :

Hello there,

in order to continue on this one, it is necessary to understand what is going on when the git commands run. We also need to interact with IS to deploy the patches. Everything I had in mind for this problem is written in the previous comments. Another idea I had was to not suppress the stdout output of the git command, and see if that helps solve the problem.

Milo Casagrande (milo)
Changed in linaro-patchmetrics:
status: In Progress → Fix Released