On Wed, 2011-11-23 at 08:46 +0000, Mattias Backman wrote:
[...]
> > > >
> > > >>                  has_found_description = True
> > > >> -            elif last_heading == 'Acceptance Criteria':
> > > >> -                acceptance_criteria = page_item['text']
> > > >> -
> > > >> -    return {'description': description, 'acceptance_criteria': acceptance_criteria}
> > > >> +            elif 'Acceptance Criteria' in last_heading:
> > > >> +                acceptance_criteria = page_item['html']
> > > >> +
> > > >> +    return {'description': get_first_paragraph(description),
> > > >> +            'acceptance_criteria': get_first_paragraph(acceptance_criteria)}
> > > >> +
> > > >> +
> > > >> +def get_first_paragraph(text):
> > > >> +    if text is None:
> > > >> +        return None
> > > >> +    # This might break, depending on what type of line breaks
> > > >> +    # whoever authors the Papyrs document uses.
> > > >> +    first_pararaph, _, _ = text.partition('<br>')
> > > >
> > > > Would it be possible to use something like BeautifulSoup to make this
> > > > more robust?
> > >
> > > Possibly. I can have a look at what it can do. This seemed to work, but
> > > now I see that someone has also added BRs just before a list, which gets
> > > ugly.
> > 
> > What does the HTML you're parsing look like? On, say
> > https://linaro-public.papyrs.com/public/4552/KWG2011-AMP-IPC/, I see the
> > actual content in a JavaScript function, so I'm assuming you're parsing
> > something else?
> 
> Here's one example with a list: http://paste.ubuntu.com/746780/
> Sorry that it's all on one line. I get it from the JSON API, but it's
> not much better than scraping that JavaScript. Sometimes we get
> whitespace wrapped in a lot of formatting, just like when people
> edit HTML using MS Word.

OK, apparently there's not much structure in the HTML, just some lists
and <br>s forcing line breaks, so I don't think BeautifulSoup would help
much here.
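If the breaks are the only structure we have, a slightly more tolerant
split might still help. A minimal sketch, assuming the Papyrs HTML
separates paragraphs with <br> in any spelling (<br>, <br/>, <BR />) or
runs straight into a <ul>/<ol> list, and may start with stray breaks or
&nbsp; padding. It's a possible drop-in variant of get_first_paragraph
from the diff using only the standard re module, not something I've
tested against real Papyrs output:

```python
import re

# Assumption: stray whitespace, &nbsp; and <br> tags may precede the text.
_LEADING_BREAKS = re.compile(r'^(?:\s|&nbsp;|<br\s*/?>)+', re.IGNORECASE)
# Assumption: a paragraph ends at any <br> variant or where a list starts.
_PARAGRAPH_END = re.compile(r'<br\s*/?>|<(?:ul|ol)\b', re.IGNORECASE)


def get_first_paragraph(text):
    """Return the HTML before the first line break or list, stripped."""
    if text is None:
        return None
    # Drop leading padding so an initial <br> doesn't yield an empty result.
    text = _LEADING_BREAKS.sub('', text)
    # Keep only what comes before the first break or list opening tag.
    first_paragraph = _PARAGRAPH_END.split(text, maxsplit=1)[0]
    return first_paragraph.strip()
```

Since it still works on plain strings, it stays cheap and avoids a new
dependency; if the markup ever grows real <p> structure, that would be
the point to revisit BeautifulSoup.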

> > 
> > > >
> > > >> +    return first_pararaph
> > > >>
> > > >>
> > > >>  ########################################################################
> > > >
> > > >
> > > >> === modified file 'report_tools.py'
> > > >> --- report_tools.py   2011-10-24 15:08:02 +0000
> > > >> +++ report_tools.py   2011-11-21 15:22:45 +0000
[...]
> > > >
> > > > Having all the classes here makes for a rather long function. Is there
> > > > a reason for keeping them here other than the fact that they're not
> > > > used anywhere else?
> > >
> > > I wasn't even sure that creating subclasses would be a good idea.
> > > Since I expect we will see a lot more checks once PMs know what
> > > they want, I guess these should go in a module instead.
> > >
> > > >
> > > >> +
> > > >> +    health_checks = []
> > > >> +    health_checks.append(RoadmapIdHealthCheck())
> > > >> +    health_checks.append(DescriptionHealthCheck())
> > > >> +    health_checks.append(CriteriaHealthCheck())
> > > >> +    health_checks.append(BlueprintsHealthCheck())
> > > >> +    health_checks.append(BlueprintsBlockedHealthCheck())
> > > >
> > > > One neat thing we could do is use a class decorator on all HealthCheck
> > > > classes to register them in a global registry (a module-level Python
> > > > list, really) and then use that here. That way one just needs to inherit
> > > > from HealthCheck and use the decorator to add a health check. It could
> > > > be a nice thing to have, but definitely not essential.
> > >
> > > It could make this a lot cleaner, and it's going to be important to
> > > keep changes and additions really clean since the requirements are not
> > > really clear yet. So with this I could just put all the checks in a
> > > separate module and then just import the module?
> > 
> > Yes, you could move all classes to a separate module and then just
> > import the registry here. Just ping me if you'd like a hand with the
> > decorator or anything else.
> 
> Thanks for showing me how it's done. I'll use that.

You're welcome. :)
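
For reference, the decorator idea could look roughly like this. It's a
minimal sketch: register_health_check, HEALTH_CHECKS and
make_health_checks are hypothetical names, and the name attribute is a
placeholder for whatever interface HealthCheck really has, since that
isn't shown in the diff:

```python
# Module-level registry of HealthCheck subclasses; the decorator below
# appends each class as its definition is executed (i.e. at import time).
HEALTH_CHECKS = []


def register_health_check(cls):
    """Class decorator: record cls in HEALTH_CHECKS and return it unchanged."""
    HEALTH_CHECKS.append(cls)
    return cls


class HealthCheck(object):
    """Base class; the real checks' interface isn't shown in the diff."""
    name = None  # hypothetical attribute, for illustration only


@register_health_check
class RoadmapIdHealthCheck(HealthCheck):
    name = 'Roadmap id'  # placeholder body


@register_health_check
class DescriptionHealthCheck(HealthCheck):
    name = 'Description'  # placeholder body


def make_health_checks():
    """Instantiate every registered check, in definition order."""
    return [cls() for cls in HEALTH_CHECKS]
```

If the classes live in their own module (say, a hypothetical
health_checks.py), report_tools.py only needs to import it and call
make_health_checks() instead of appending each check by hand. Importing
the module is what runs the decorators and fills the registry, so adding
a check is just a new decorated subclass.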

 review approve