Merge lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5 into lp:openobject-addons/6.1

Proposed by Valentin Lab
Status: Rejected
Rejected by: Olivier Dony (Odoo)
Proposed branch: lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5
Merge into: lp:openobject-addons/6.1
Diff against target: 30 lines (+9/-2)
1 file modified
project_gtd/project_gtd.py (+9/-2)
To merge this branch: bzr merge lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5
Reviewer Review Type Date Requested Status
Olivier Dony (Odoo) Needs Fixing
Valentin Lab (community) Needs Resubmitting
Amit Parik (community) Disapprove
Review via email: mp+99998@code.launchpad.net

Description of the change

Enforce utf-8 encoding so that "search_extended" is of expected type.

Please see the bug description for more information.

To post a comment you must log in.
Revision history for this message
Amit Parik (amit-parik) wrote :

Hello Valentin,

I think we no need to create a method for encoding.
Just change the res['arch'] = unicode(res['arch'], 'utf-8').replace('<separator name="gtdsep"/>', search_extended) at line#118 fields_view_get
of project_gtd.py. Which is solved the problem.

Would you please look in to lp:944761 at there correct fix is available.

Thank you!

review: Disapprove
Revision history for this message
Valentin Lab (vaab) wrote :

Yes, to be honest, this was the first hack that came to mind.

But after analysis, this is not logical way of solving this. This is why:
- res["arch"] is always encoded in "utf-8". This is due to "osv/orm.py". So no surprises here. But, there are no obvious way to tell that search_extended is unicode by looking to the code as it is build from simple strings in python and gets casted in unicode by side-effect. This is obviously what caused the original programmer to let this bug through.
- your solution changes the type of res["arch"], it is not anymore a "utf-8" encoded string but a unicode. This will lead to other bugs as everywhere res["arch"] is encoded in "utf-8".
- You can also note that in openobject-server, the habits are to encode unicode strings before using them. The "encode" function I use came directly from there.

For these reasons, it seemed much better to propose this patch.

Therefore I ask you to reconcider this patch.

review: Needs Resubmitting
Revision history for this message
Valentin Lab (vaab) wrote :

Please ignore the Resubmit, that's a mis-click ;).

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :
Download full text (4.1 KiB)

> But after analysis, this is not logical way of solving this. This is why:
> - res["arch"] is always encoded in "utf-8". This is due to "osv/orm.py". So no
> surprises here. But, there are no obvious way to tell that search_extended is
> unicode by looking to the code as it is build from simple strings in python
> and gets casted in unicode by side-effect. This is obviously what caused the
> original programmer to let this bug through.

A common Python2 issue, unfortunately. The situation is made worse because we have historically used different techniques to care for these situations.
It is particularly the case with fields_view_get() because we are using the ElementTree API for processing XML, and that library could not work with unicode data in the past.

> - your solution changes the type of res["arch"], it is not anymore a "utf-8"
> encoded string but a unicode. This will lead to other bugs as everywhere
> res["arch"] is encoded in "utf-8".
> - You can also note that in openobject-server, the habits are to encode
> unicode strings before using them. The "encode" function I use came directly
> from there.

I can understand your suggestion, given the way our current code produces the result of field_view_get().

However there is in fact a third solution that is normally recommended for solving cases of mix utf8/unicode strings: coerce every string you need to work with to unicode using tools.ustr(), eliminating any encoding ambiguity. This util method will gracefully accept input that is already unicode, so it is safe to use at any point when encoding issues arrive. If you search for "ustr(" in the codebase you will see how common it is.

Our current[1] policy is more or less this: code should be agnostic of the actual kind of strings it is using, use the common string API, and let Python and our framework do its job, which will be fine 99% of the time.
Field values come from the database as unicode and are the most common place where non-ASCII characters will be found, so that is properly taken care of: when they are concatenated with byte-strings the latter will be automatically coerced to unicode by Python (and that will work because literal strings in the code should only contain ASCII).
For the few cases where we need to care about the difference utf8/unicode (such as here, when dealing with non-ASCII utf-8), a safer thing to do is to use tools.ustr() to coerce all inputs to unicode, and then proceed with the operation. This will be safe in all cases, if the above policy is respected everywhere else: in the end the unicode result will be stored or sent to the world over the network, and everything will be fine.

Ideally OpenERP should work with unicode 100% of the time and convert back and forth to utf-8 only when exchanging bytes with the rest of the world (à la Python3): database, files, network. Unfortunately that is not the case everywhere for historical reasons, and the above policy provides a safe middle-ground until we get to that point. It seems safer to go to unicode as quickly as possible whenever an encoding issue occurs, in order to remove that encoding ambiguity properly, even at the risk of triggering an issue elsewher...

Read more...

review: Needs Fixing
Revision history for this message
Valentin Lab (vaab) wrote :

Many thanks for this lengthy answer. I wanted to get consistency with historic reason (duplication of ``encode`` function -- which appears 4 times in server code -- twice in orm.py !, and consistency with 'utf-8' return type of fields_view_get), and understood that it should be avoided.

I came across ``tools.ustr()``, but didn't know (how could I?) that it was the actual policy... I'm glad there's one, and grateful that you took the time to explain it carefully.

I have seen that Amit Parik did this conversion. So I consider this merge proposal closed ;)

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

> I have seen that Amit Parik did this conversion. So I consider this merge
> proposal closed ;)

Sure, so if you don't mind I'll mark it as Rejected to take it out from the list of to-be-reviewed merge proposals.

Thanks!

Unmerged revisions

6716. By Valentin Lab

[FIX] fixed encoding issue with ``time.icon`` and ``time.name`` that could be unicode.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'project_gtd/project_gtd.py'
2--- project_gtd/project_gtd.py 2011-12-21 22:15:04 +0000
3+++ project_gtd/project_gtd.py 2012-03-29 18:37:27 +0000
4@@ -101,6 +101,12 @@
5 return True
6
7 def fields_view_get(self, cr, uid, view_id=None, view_type='form', context=None, toolbar=False, submenu=False):
8+
9+ def encode(s):
10+ if isinstance(s, unicode):
11+ return s.encode('utf8')
12+ return s
13+
14 res = super(project_task,self).fields_view_get(cr, uid, view_id, view_type, context, toolbar=toolbar, submenu=submenu)
15 search_extended = False
16 timebox_obj = self.pool.get('project.gtd.timebox')
17@@ -109,10 +115,11 @@
18 search_extended =''
19 for time in tt:
20 if time.icon:
21- icon = time.icon
22+ icon = encode(time.icon)
23 else :
24 icon=""
25- search_extended += '''<filter domain="[('timebox_id','=', ''' + str(time.id) + ''')]" icon="''' + icon + '''" string="''' + time.name + '''" context="{'user_invisible': True}"/>\n'''
26+
27+ search_extended += '''<filter domain="[('timebox_id','=', ''' + str(time.id) + ''')]" icon="''' + icon + '''" string="''' + encode(time.name) + '''" context="{'user_invisible': True}"/>\n'''
28 search_extended +='''<separator orientation="vertical"/>'''
29
30 res['arch'] = res['arch'].replace('<separator name="gtdsep"/>', search_extended)