On Wed, 2011-11-23 at 08:46 +0000, Mattias Backman wrote:
[...]
> > > >
> > > >>              has_found_description = True
> > > >> -        elif last_heading == 'Acceptance Criteria':
> > > >> -            acceptance_criteria = page_item['text']
> > > >> -
> > > >> -    return {'description': description, 'acceptance_criteria': acceptance_criteria}
> > > >> +        elif 'Acceptance Criteria' in last_heading:
> > > >> +            acceptance_criteria = page_item['html']
> > > >> +
> > > >> +    return {'description': get_first_paragraph(description),
> > > >> +            'acceptance_criteria': get_first_paragraph(acceptance_criteria)}
> > > >> +
> > > >> +
> > > >> +def get_first_paragraph(text):
> > > >> +    if text is None:
> > > >> +        return None
> > > >> +    # This might break, depending on what type of line breaks
> > > >> +    # whoever authors the Papyrs document uses.
> > > >> +    first_pararaph, _, _ = text.partition('<br>')
> > > >
> > > > Would it be possible to use something like BeautifulSoup to make this
> > > > more robust?
> > >
> > > Possibly. I can have a look at what it can do. This seemed to work but
> > > now I see that someone has added BRs just before a list too which gets
> > > ugly.
> >
> > What does the HTML you're parsing look like? On, say
> > https://linaro-public.papyrs.com/public/4552/KWG2011-AMP-IPC/, I see the
> > actual content in a javascript function, so I'm assuming you're parsing
> > something else?
>
> Here's one example with a list: http://paste.ubuntu.com/746780/
> Sorry that it's all on one line. I get it from the json API, but it's
> not much better than scraping that javascript. Sometimes we get
> whitespace with a lot of formatting around it, just like when people
> edit html using MS Word.

Ok, apparently there's not much structure on the HTML; just some lists and
<br>s to force line breaks, so I think BeautifulSoup wouldn't help much here.

> > > >> +    return first_pararaph
> > > >>
> > > >>
> > > >> ########################################################################
> > > >
> > > >> === modified file 'report_tools.py'
> > > >> --- report_tools.py 2011-10-24 15:08:02 +0000
> > > >> +++ report_tools.py 2011-11-21 15:22:45 +0000
[...]
> > > >
> > > > Having all the classes here make for a rather long function. Is there a
> > > > reason for keeping them here other than the fact that they're not used
> > > > anywhere else?
> > >
> > > I wasn't even sure that creating subclasses would be a good idea.
> > > Since i fear that we will see a lot more checks when PMs know what
> > > they want I guess these should go on a module instead.
> > >
> > > >
> > > >> +
> > > >> +    health_checks = []
> > > >> +    health_checks.append(RoadmapIdHealthCheck())
> > > >> +    health_checks.append(DescriptionHealthCheck())
> > > >> +    health_checks.append(CriteriaHealthCheck())
> > > >> +    health_checks.append(BlueprintsHealthCheck())
> > > >> +    health_checks.append(BlueprintsBlockedHealthCheck())
> > > >
> > > > One neat thing we could do is use a class decorator on all HealthCheck
> > > > classes to register them on a global registry (a module-level python
> > > > list, really) and then use that here. That way one just needs to inherit
> > > > from HealthCheck and use the decorator to add a health check. It could
> > > > be a nice thing to have, but definitely not essential.
> > >
> > > It could make this a lot cleaner and it's going to be important to
> > > make changes and additions really clean since the requirements are not
> > > really clear yet. So with this I could just put all the checks in a
> > > separate module and then just import the module?
> >
> > Yes, you could move all classes to a separate module and then just
> > import the registry here. Just ping me if you'd like a hand with the
> > decorator or anything else.
>
> Thanks for showing me how it's done. I'll use that.

You're welcome. :)

review approve
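
For anyone reading this thread later, here is a minimal sketch of the
class-decorator registry discussed above. It is only an illustration:
the names HEALTH_CHECKS, register_health_check and build_health_checks,
the execute() method and the placeholder check logic are all assumptions
and are not taken from the actual branch; only the HealthCheck class
names come from the diff.

```python
# Hypothetical separate module holding the checks (layout is an assumption).

# Module-level registry: a plain list of registered check classes.
HEALTH_CHECKS = []


def register_health_check(cls):
    """Class decorator: record cls in the registry and return it unchanged."""
    HEALTH_CHECKS.append(cls)
    return cls


class HealthCheck(object):
    """Base class; the real interface in report_tools.py may differ."""

    def execute(self, card):
        raise NotImplementedError


@register_health_check
class RoadmapIdHealthCheck(HealthCheck):

    def execute(self, card):
        # Placeholder logic, assumed for illustration only.
        return bool(card.get('roadmap_id'))


@register_health_check
class DescriptionHealthCheck(HealthCheck):

    def execute(self, card):
        # Placeholder logic, assumed for illustration only.
        return bool(card.get('description'))


def build_health_checks():
    """Instantiate every registered check, in registration order.

    This stands in for the manual health_checks.append(...) calls.
    """
    return [check_class() for check_class in HEALTH_CHECKS]
```

With something along these lines, adding a check would only mean
subclassing HealthCheck and applying the decorator; the code that runs
the checks imports the registry (or a helper like build_health_checks)
and never needs to be touched. The base class is deliberately left
undecorated so it does not end up in the list.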