Merge into 6.1 : 6.1-fix-project-gtd-encoding-issue-5 : Code : Odoo Addons (MOVED TO GITHUB)

Status:

Rejected

Rejected by:

Olivier Dony (Odoo) on 2012-05-03

Proposed branch:

lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5

Merge into:

lp:openobject-addons/6.1

Diff against target:

30 lines (+9/-2)

1 file modified

project_gtd/project_gtd.py (+9/-2)

To merge this branch:

bzr merge lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5

Undecided

New

Link a bug report

Reviewer	Date Requested	Status
Olivier Dony (Odoo)	2012-03-29	Needs Fixing on 2012-05-02
Valentin Lab (community)		Needs Resubmitting on 2012-03-31
Amit Parik (community)		Disapprove on 2012-03-31
Review via email: mp+99998@code.launchpad.net

Description of the change

Enforce utf-8 encoding so that "search_extended" is of expected type.

Please see the bug description for more information.

Revision history for this message

Amit Parik (amit-parik) wrote on 2012-03-31:

#

Hello Valentin,

I think we no need to create a method for encoding.
Just change the res['arch'] = unicode(res['arch'], 'utf-8').replace('<separator name="gtdsep"/>', search_extended) at line#118 fields_view_get
of project_gtd.py. Which is solved the problem.

Would you please look in to lp:944761 at there correct fix is available.

Thank you!

review: Disapprove

Revision history for this message

Valentin Lab (vaab) wrote on 2012-03-31:

#

Yes, to be honest, this was the first hack that came to mind.

But after analysis, this is not logical way of solving this. This is why:
- res["arch"] is always encoded in "utf-8". This is due to "osv/orm.py". So no surprises here. But, there are no obvious way to tell that search_extended is unicode by looking to the code as it is build from simple strings in python and gets casted in unicode by side-effect. This is obviously what caused the original programmer to let this bug through.
- your solution changes the type of res["arch"], it is not anymore a "utf-8" encoded string but a unicode. This will lead to other bugs as everywhere res["arch"] is encoded in "utf-8".
- You can also note that in openobject-server, the habits are to encode unicode strings before using them. The "encode" function I use came directly from there.

For these reasons, it seemed much better to propose this patch.

Therefore I ask you to reconcider this patch.

review: Needs Resubmitting

Revision history for this message

Valentin Lab (vaab) wrote on 2012-04-02:

#

Please ignore the Resubmit, that's a mis-click ;).

Revision history for this message

Olivier Dony (Odoo) (odo-openerp) wrote on 2012-05-02:

#

Download full text (4.1 KiB)

> But after analysis, this is not logical way of solving this. This is why:
> - res["arch"] is always encoded in "utf-8". This is due to "osv/orm.py". So no
> surprises here. But, there are no obvious way to tell that search_extended is
> unicode by looking to the code as it is build from simple strings in python
> and gets casted in unicode by side-effect. This is obviously what caused the
> original programmer to let this bug through.

A common Python2 issue, unfortunately. The situation is made worse because we have historically used different techniques to care for these situations.
It is particularly the case with fields_view_get() because we are using the ElementTree API for processing XML, and that library could not work with unicode data in the past.

> - your solution changes the type of res["arch"], it is not anymore a "utf-8"
> encoded string but a unicode. This will lead to other bugs as everywhere
> res["arch"] is encoded in "utf-8".
> - You can also note that in openobject-server, the habits are to encode
> unicode strings before using them. The "encode" function I use came directly
> from there.

I can understand your suggestion, given the way our current code produces the result of field_view_get().

However there is in fact a third solution that is normally recommended for solving cases of mix utf8/unicode strings: coerce every string you need to work with to unicode using tools.ustr(), eliminating any encoding ambiguity. This util method will gracefully accept input that is already unicode, so it is safe to use at any point when encoding issues arrive. If you search for "ustr(" in the codebase you will see how common it is.

Our current[1] policy is more or less this: code should be agnostic of the actual kind of strings it is using, use the common string API, and let Python and our framework do its job, which will be fine 99% of the time.
Field values come from the database as unicode and are the most common place where non-ASCII characters will be found, so that is properly taken care of: when they are concatenated with byte-strings the latter will be automatically coerced to unicode by Python (and that will work because literal strings in the code should only contain ASCII).
For the few cases where we need to care about the difference utf8/unicode (such as here, when dealing with non-ASCII utf-8), a safer thing to do is to use tools.ustr() to coerce all inputs to unicode, and then proceed with the operation. This will be safe in all cases, if the above policy is respected everywhere else: in the end the unicode result will be stored or sent to the world over the network, and everything will be fine.

Ideally OpenERP should work with unicode 100% of the time and convert back and forth to utf-8 only when exchanging bytes with the rest of the world (à la Python3): database, files, network. Unfortunately that is not the case everywhere for historical reasons, and the above policy provides a safe middle-ground until we get to that point. It seems safer to go to unicode as quickly as possible whenever an encoding issue occurs, in order to remove that encoding ambiguity properly, even at the risk of triggering an issue elsewher...

> But after analysis, this is not logical way of solving this. This is why:
> - res["arch"] is always encoded in "utf-8". This is due to "osv/orm.py". So no
> surprises here. But, there are no obvious way to tell that search_extended is
> unicode by looking to the code as it is build from simple strings in python
> and gets casted in unicode by side-effect. This is obviously what caused the
> original programmer to let this bug through.

A common Python2 issue, unfortunately. The situation is made worse because we have historically used different techniques to care for these situations.
It is particularly the case with fields_view_get() because we are using the ElementTree API for processing XML, and that library could not work with unicode data in the past.

> - your solution changes the type of res["arch"], it is not anymore a "utf-8"
> encoded string but a unicode. This will lead to other bugs as everywhere
> res["arch"] is encoded in "utf-8".
> - You can also note that in openobject-server, the habits are to encode
> unicode strings before using them. The "encode" function I use came directly
> from there.

I can understand your suggestion, given the way our current code produces the result of field_view_get().

However there is in fact a third solution that is normally recommended for solving cases of mix utf8/unicode strings: coerce every string you need to work with to unicode using tools.ustr(), eliminating any encoding ambiguity. This util method will gracefully accept input that is already unicode, so it is safe to use at any point when encoding issues arrive. If you search for "ustr(" in the codebase you will see how common it is.

Our current[1] policy is more or less this: code should be agnostic of the actual kind of strings it is using, use the common string API, and let Python and our framework do its job, which will be fine 99% of the time.
Field values come from the database as unicode and are the most common place where non-ASCII characters will be found, so that is properly taken care of: when they are concatenated with byte-strings the latter will be automatically coerced to unicode by Python (and that will work because literal strings in the code should only contain ASCII).
For the few cases where we need to care about the difference utf8/unicode (such as here, when dealing with non-ASCII utf-8), a safer thing to do is to use tools.ustr() to coerce all inputs to unicode, and then proceed with the operation. This will be safe in all cases, if the above policy is respected everywhere else: in the end the unicode result will be stored or sent to the world over the network, and everything will be fine.

Ideally OpenERP should work with unicode 100% of the time and convert back and forth to utf-8 only when exchanging bytes with the rest of the world (à la Python3): database, files, network. Unfortunately that is not the case everywhere for historical reasons, and the above policy provides a safe middle-ground until we get to that point. It seems safer to go to unicode as quickly as possible whenever an encoding issue occurs, in order to remove that encoding ambiguity properly, even at the risk of triggering an issue elsewhere in case *another* non-ASCII utf8 strings needs to be used.

Of course as we are explicitly dealing with fields_view_get() here, you could be tempted to keep the historical utf-8 strategy. However I think it would be sensible to apply the above policy, which is a better transition towards a 100% unicode world.

In any case I would go for a solution that is:
- minimalist: if possible not increasing the lines of code of the module, especially not duplicating code, e.g. your encode() method can be safely replaced by: tools.ustr(time.name).encode('utf-8')
- safe: not making assumptions to the actual type of strings, in compliance with the policy - if we need to ensure  res['arch'] is unicode at some point, we must not encode() it assuming it is utf-8, but should use tools.ustr() that will handle unicode input too. See my other comment on [2]

Hope this makes sense,

[1] implicit :-(
[2] https://code.launchpad.net/~openerp-dev/openobject-addons/trunk-bug-944761-amp/+merge/100272

review: Needs Fixing

Revision history for this message

Valentin Lab (vaab) wrote on 2012-05-02:

#

Many thanks for this lengthy answer. I wanted to get consistency with historic reason (duplication of ``encode`` function -- which appears 4 times in server code -- twice in orm.py !, and consistency with 'utf-8' return type of fields_view_get), and understood that it should be avoided.

I came across ``tools.ustr()``, but didn't know (how could I?) that it was the actual policy... I'm glad there's one, and grateful that you took the time to explain it carefully.

I have seen that Amit Parik did this conversion. So I consider this merge proposal closed ;)

Revision history for this message

Olivier Dony (Odoo) (odo-openerp) wrote on 2012-05-03:

#

> I have seen that Amit Parik did this conversion. So I consider this merge
> proposal closed ;)

Sure, so if you don't mind I'll mark it as Rejected to take it out from the list of to-be-reviewed merge proposals.

Thanks!

Unmerged revisions

6716. By Valentin Lab on 2012-03-29: [FIX] fixed encoding issue with ``time.icon`` and ``time.name`` that could be unicode.

1	=== modified file 'project_gtd/project_gtd.py'
2	--- project_gtd/project_gtd.py 2011-12-21 22:15:04 +0000
3	+++ project_gtd/project_gtd.py 2012-03-29 18:37:27 +0000
4	@@ -101,6 +101,12 @@
5	return True
6
7	def fields_view_get(self, cr, uid, view_id=None, view_type='form', context=None, toolbar=False, submenu=False):
8	+
9	+ def encode(s):
10	+ if isinstance(s, unicode):
11	+ return s.encode('utf8')
12	+ return s
13	+
14	res = super(project_task,self).fields_view_get(cr, uid, view_id, view_type, context, toolbar=toolbar, submenu=submenu)
15	search_extended = False
16	timebox_obj = self.pool.get('project.gtd.timebox')
17	@@ -109,10 +115,11 @@
18	search_extended =''
19	for time in tt:
20	if time.icon:
21	- icon = time.icon
22	+ icon = encode(time.icon)
23	else :
24	icon=""
25	- search_extended += '''<filter domain="[('timebox_id','=', ''' + str(time.id) + ''')]" icon="''' + icon + '''" string="''' + time.name + '''" context="{'user_invisible': True}"/>\n'''
26	+
27	+ search_extended += '''<filter domain="[('timebox_id','=', ''' + str(time.id) + ''')]" icon="''' + icon + '''" string="''' + encode(time.name) + '''" context="{'user_invisible': True}"/>\n'''
28	search_extended +='''<separator orientation="vertical"/>'''
29
30	res['arch'] = res['arch'].replace('<separator name="gtdsep"/>', search_extended)

Odoo Addons (MOVED TO GITHUB)

Merge lp:~vaab/openobject-addons/6.1-fix-project-gtd-encoding-issue-5 into lp:openobject-addons/6.1

Commit message

Description of the change

Unmerged revisions

Preview Diff

Subscribers