Launchpad itself

Code review comment for lp:~cjwatson/launchpad/refactor-cron-germinate

refactor-cron-germinate
Merge into devel

Revision history for this message

Colin Watson (cjwatson) wrote on 2011-12-07:

[I've not responded to your entire review point-by-point, but I believe
I've addressed all points where I haven't given an explicit response.]

On Wed, Dec 07, 2011 at 09:22:24AM -0000, Jeroen T. Vermeulen wrote:
> + def runGerminate(self, override_file, series_name, arch, flavours,
> + structures):
>
> Rightly or wrongly, the “run” in that name led me to expect that this
> method would fire up a separate process. Maybe a very short docstring
> to dispel that notion?

The name doesn't actually make a lot of sense; I think originally I was
in the mindset where this actually did fire up a separate process. I've
renamed the script instance to germinateArch (with a docstring), and the
test instance to fetchGerminatedOverrides. Does that help?

> To me, your branch is of very high quality but this is its Achilles'
> heel. The following loop body is massive! I can't see it all on one
> page, so it becomes hard even to see just how long it is. If I judged
> the indentations right, it makes up the entire rest of the method.

That's fair; I went back and forward a bit on this, but clearly landed
in the wrong place. Actually, now that I've renamed runGerminate to
germinateArch, it's easier in my mind to turn a block of it into
germinateArchFlavour in turn. (Regularised naming may be boring, but I
like it anyway.)

The methods do start ending up with quite a few parameters, but that's
probably better than the alternative.

> Once you've extracted the loop body, given its size, it's probably
> still worth extracting chunks from that.

I've done a good deal of this, although will probably end up
restructuring a bit further per your later comments.

> According to some, the pattern of “comment, block of code, blank line,
> comment, block of code, blank line” is a strong indication that you
> need to split things up. It shows that you have already done so in
> your own mind, but leaves each individual reader to figure out what
> the dependencies between consecutive blocks are.

I don't necessarily agree with this in all cases (e.g. I don't know that
the new writeGerminateOutput method would benefit much from being split
further), but I agree that the original version of this code was rather
too far in the other direction.

> + # Add this to the germinate log as well so that that can be
> + # debugged more easily. Log a separator line first.
> + self.germinate_logger.info('', extra={'progress': True})
> + self.germinate_logger.info('Germinating for %s/%s/%s',
> + flavour, series_name, arch,
> + extra={'progress': True})
>
> What does the “extra” parameter do?

That attaches a dictionary of extra attributes to the log message which
can then be picked up by the formatter. Germinate uses this for certain
messages (arguably a wart in germinate's design; I should probably look
into doing that more neatly at some point). I've moved this to its own
method so that it can have a clarifying docstring.

> That's a lot of string manipulation. It may be clearer as a regex:
>
> task_header_regex = re.compile("task-([^:]*):(.*)", flags=re.IGNORECASE)
> for line in seedtext:
> match = task_header_regex.match(line)
> if match is not None:
> key, value = match.groups()
> task_headers[key.lower()] = value.strip()

Thanks. I used this mostly verbatim, although I simplified using ".*?".

> If flavours[0] has a special meaning, that's worth documenting.
> Consider documenting it in a docstring, or even making it a separate
> parameter.

I've added a primary_flavour parameter.

Regarding your question of whether Build-Essential overrides should be
pulled out of the loop, it's a tough call, but I actually think it's
clearer inside the loop now that we have the primary_flavour parameter.
That's because otherwise I'd have to pass the results of germination out
of the germinateArchFlavour method, and I think the flow control is
clearer if I don't do that. Also, I think it's worth keeping the two
"write overrides" pieces next to each other.

> I think for now flavours[0] is always “ubuntu,” but will we want to
> generalize this code for derived distros later?

As you say. The shell code I was replacing hardcoded [ "$distro" =
ubuntu ] tests, but I didn't see a need to repeat that hardcoding.

Thanks for the detailed commentary on seed/task parsing here; as well as
clearer code, it's resulted in the tests being a few seconds faster, and
every little helps.

[I've not responded to your entire review point-by-point, but I believe
I've addressed all points where I haven't given an explicit response.]

On Wed, Dec 07, 2011 at 09:22:24AM -0000, Jeroen T. Vermeulen wrote:
> +    def runGerminate(self, override_file, series_name, arch, flavours,
> +                     structures):
> 
> Rightly or wrongly, the “run” in that name led me to expect that this
> method would fire up a separate process.  Maybe a very short docstring
> to dispel that notion?

The name doesn't actually make a lot of sense; I think originally I was
in the mindset where this actually did fire up a separate process.  I've
renamed the script instance to germinateArch (with a docstring), and the
test instance to fetchGerminatedOverrides.  Does that help?

> To me, your branch is of very high quality but this is its Achilles'
> heel.  The following loop body is massive!  I can't see it all on one
> page, so it becomes hard even to see just how long it is.  If I judged
> the indentations right, it makes up the entire rest of the method.

That's fair; I went back and forward a bit on this, but clearly landed
in the wrong place.  Actually, now that I've renamed runGerminate to
germinateArch, it's easier in my mind to turn a block of it into
germinateArchFlavour in turn.  (Regularised naming may be boring, but I
like it anyway.)

The methods do start ending up with quite a few parameters, but that's
probably better than the alternative.

> Once you've extracted the loop body, given its size, it's probably
> still worth extracting chunks from that.

I've done a good deal of this, although will probably end up
restructuring a bit further per your later comments.

> According to some, the pattern of “comment, block of code, blank line,
> comment, block of code, blank line” is a strong indication that you
> need to split things up.  It shows that you have already done so in
> your own mind, but leaves each individual reader to figure out what
> the dependencies between consecutive blocks are.

> +            # Add this to the germinate log as well so that that can be
> +            # debugged more easily.  Log a separator line first.
> +            self.germinate_logger.info('', extra={'progress': True})
> +            self.germinate_logger.info('Germinating for %s/%s/%s',
> +                                       flavour, series_name, arch,
> +                                       extra={'progress': True})
> 
> What does the “extra” parameter do?

That attaches a dictionary of extra attributes to the log message which
can then be picked up by the formatter.  Germinate uses this for certain
messages (arguably a wart in germinate's design; I should probably look
into doing that more neatly at some point).  I've moved this to its own
method so that it can have a clarifying docstring.

> That's a lot of string manipulation.  It may be clearer as a regex:
> 
>     task_header_regex = re.compile("task-([^:]*):(.*)", flags=re.IGNORECASE)
>     for line in seedtext:
>         match = task_header_regex.match(line)
>         if match is not None:
>             key, value = match.groups()
>             task_headers[key.lower()] = value.strip()

Thanks.  I used this mostly verbatim, although I simplified using ".*?".

> If flavours[0] has a special meaning, that's worth documenting.
> Consider documenting it in a docstring, or even making it a separate
> parameter.

I've added a primary_flavour parameter.

Regarding your question of whether Build-Essential overrides should be
pulled out of the loop, it's a tough call, but I actually think it's
clearer inside the loop now that we have the primary_flavour parameter.
That's because otherwise I'd have to pass the results of germination out
of the germinateArchFlavour method, and I think the flow control is
clearer if I don't do that.  Also, I think it's worth keeping the two
"write overrides" pieces next to each other.

> I think for now flavours[0] is always “ubuntu,” but will we want to
> generalize this code for derived distros later?

As you say.  The shell code I was replacing hardcoded [ "$distro" =
ubuntu ] tests, but I didn't see a need to repeat that hardcoding.

Thanks for the detailed commentary on seed/task parsing here; as well as
clearer code, it's resulted in the tests being a few seconds faster, and
every little helps.

« Back to merge proposal