Merge lp:~seif/zeitgeist/fixing-payloads into lp:zeitgeist/0.1

Proposed by Seif Lotfy
Status: Merged
Merged at revision: 1654
Proposed branch: lp:~seif/zeitgeist/fixing-payloads
Merge into: lp:zeitgeist/0.1
Diff against target: 98 lines (+54/-5)
3 files modified
_zeitgeist/engine/datamodel.py (+5/-2)
_zeitgeist/engine/sql.py (+8/-1)
test/remote-test.py (+41/-2)
To merge this branch: bzr merge lp:~seif/zeitgeist/fixing-payloads
Reviewer Review Type Date Requested Status
Siegfried Gevatter Approve
Mikkel Kamstrup Erlandsen Needs Information
Review via email: mp+44248@code.launchpad.net

Description of the change

This branch fixes
https://bugs.launchpad.net/zeitgeist/+bug/692645
by mapping the bytes in the payload to ord before sending it out to the client

To post a comment you must log in.
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Can you the Hello World test case to also use non-ascii unicode characters?

review: Needs Information
Revision history for this message
Seif Lotfy (seif) wrote :

huh? example ?

Revision history for this message
Seif Lotfy (seif) wrote :

I added another example using non-ascii unicorde character and got it to work by fixing an issue in the UnicodeCursor where

if isinstance(obj, str):
 obj = obj.decode("UTF-8")
return unicode(obj)

and according to the python channel
<Peng> seiflotfy: test.decode("UTF-8") returns a unicode object. Running it through unicode() again is redundant.

Revision history for this message
Markus Korn (thekorn) wrote :

sorry, but this is taking a completely weird direction.

First of all,
  test/engine-test.py ZeitgeistEngineTest.testEventWithBinaryPayload
is failing for me, but this is not the main issue.

My main problem is: we always had the principle that what ever a user puts into the log comes out in the same way, but with this branch it changes dramatically, something like

>>> event.payload = "Hallo world"
>>> ids = engine.insert_events([event,])
>>> result = engine.get_events(ids)[0]
>>> assert event.payload == result.payload

will not work anymore, instead you have to use

>>> assert map(ord, event.payload) == result.payload

And worst of all, to get a `human readable` representation of the results payload you have to

>>> u"".join(map(unichr, result.payload))

Also the `fix` in UnicodeCursor is wrong. I agree the unicode() is redundant for the case of an str argument, but it isn't for floats and other random datatypes.

There seems to be a lot of trouble with payloads recently, seems like we have to fix payloads in a sane way, and not by doing some random changes at some places until it *seems* to work. At the end it is mostly an issue of how we define `payload`, internally and to the outside world - one of the issues here is that the phrase "binary" is misleading, I will follow up on this thoughts on the mailing list, this MP is not the right place.

Revision history for this message
Seif Lotfy (seif) wrote :

Ofcourse it wont work. If you send a payload from the client to the engine
it will be an array of bytes. It is translated that way by dbus.
This means u wont get ur "Hello World" string on the other side.

The problem in the first place was that remotely we couldnt get events with
payloads. So this is where this all starts.
Try the the testEventWithStringPayload. Since it fails to send it back to
the client it means we are facing an internal issue.
I narrowed it down and I think my approach is actually right. I mean if u
send a binary u need to decode it to start something with it.
The test cases work.

The only other solution imho would be to change payload to a string instead
of an array of bytes.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

On 21 December 2010 11:58, Seif Lotfy <email address hidden> wrote:
> Ofcourse it wont work. If you send a payload from the client to the engine
> it will be an array of bytes. It is translated that way by dbus.
> This means u wont get ur "Hello World" string on the other side.
>
> The problem in the first place was that remotely we couldnt get events with
> payloads. So this is where this all starts.
> Try the the testEventWithStringPayload. Since it fails to send it back to
> the client it means we are facing an internal issue.
> I narrowed it down and I think my approach is actually right. I mean if u
> send a binary u need to decode it to start something with it.
> The test cases work.
>
> The only other solution imho would be to change payload to a string instead
> of an array of bytes.

Technically a string is *exactly identical* to an array of bytes. "The
difference is the same" as we say in Denmark :-) It's just the Python
logic here that messes it up, in C, fx., it's just a matter of how you
cast the pointers.

The contract that makes sense is twofold:

 1) In the Python client API: event_before.payload == event_after.payload, and
 2) On the DBus wire level the same invariant: event_before[2] == event_after[2]

So this is what the unit tests should assert. It seems we have a bug
here and that needs fixing of course - and a non-invasive fix like
Seif's patch here seems ok by me, even if there may theoretically
cleaner ways we can do this. Given no concrete alternatives we should
merge Seif's branch given that it implements the contract as I
outline.

That's my take on this at least.

lp:~seif/zeitgeist/fixing-payloads updated
1656. By Seif Lotfy

add a try, except statement

1657. By Seif Lotfy

remove whitespace

1658. By Seif Lotfy

fix remote test case

1659. By Seif Lotfy

remove try except statement

1660. By Seif Lotfy

removed redundant unicode casting and explained the try statement

1661. By Seif Lotfy

merge with trunk
Added new test cases and explainet the try exception

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Good work. Some comments though:

> self[2] = str(self[2])

What is this doing?

Also, the docstring for that method says "Ensure that all fields in the event struct are non-None", it doesn't mention any type conversions or other changes, so if we need some change here it should be documented properly.

----------

# seif: Python’s default encoding is ASCII, so whenever a character with
# an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError
# because that character can’t be handled by the ASCII encoding.

So why is it "except:" and not "except UnicodeDecodeError:"?

Also, I'm not quite sure we want this exception. So we ignore it at this point, but what happens after that? We let SQLite see what it can do with it? What if this is an insert and we're placing crap into the database? Zeitgeist will crash once it retrieves it and tries to convert it to a unicode for sending it back through D-Bus.

----------

The commit is introducing again testDataSourcesRegistry, which was moved (and renamed) into another class.

----------

I believe it'd be nicer if the test cases didn't check the integer values, but the string instead, since this way they serve as documentation on how to recover the string.

Eg. instead of checking for:
 > self.assertEquals(events[0].payload, map(ord, payload))
... you could check for:
 > self.assertEquals(payload, ''.join(map(chr, events[0].payload)).decode('utf-8'))

But feel free to ignore this last point.

review: Needs Fixing
Revision history for this message
Seif Lotfy (seif) wrote :

On Tue, Dec 28, 2010 at 8:31 PM, Siegfried Gevatter <email address hidden>wrote:

> Review: Needs Fixing
> Good work. Some comments though:
>
>
> > self[2] = str(self[2])
>
> What is this doing?
>
> Also, the docstring for that method says "Ensure that all fields in the
> event struct are non-None", it doesn't mention any type conversions or other
> changes, so if we need some change here it should be documented properly.
>

For some reason I cant send unicode as payloads so I converted them into
strings

[2010-12-28 20:49:24,092] - ERROR - dbus.service - Unable to append
([Event([[u'25428', u'124', u'', u'', u'boo'], [Subject([u'file://yomomma',
u'', u'', u'', u'', u'', u''])], u'Hello World'])],) to message with
signature a(asaasay): <type 'exceptions.TypeError'>: an integer is required

I know its not optimal so I am fixing it with my next merge request

>
> ----------
>
> # seif: Python’s default encoding is ASCII, so whenever a character with
> # an ASCII value > 127 is in the input data, you’ll get a
> UnicodeDecodeError
> # because that character can’t be handled by the ASCII encoding.
>
> So why is it "except:" and not "except UnicodeDecodeError:"?
>
> Also, I'm not quite sure we want this exception. So we ignore it at this
> point, but what happens after that? We let SQLite see what it can do with
> it? What if this is an insert and we're placing crap into the database?
> Zeitgeist will crash once it retrieves it and tries to convert it to a
> unicode for sending it back through D-Bus.
>

My test case inserts this string as a payload into the DB and picks it out
again and send it for an assertion. I think we have that covered. But I will
change the Exception type

>
> ----------
>
> The commit is introducing again testDataSourcesRegistry, which was moved
> (and renamed) into another class.
>
> ----------
>

Sorry about that I am fixing it

> I believe it'd be nicer if the test cases didn't check the integer values,
> but the string instead, since this way they serve as documentation on how to
> recover the string.
>
> Eg. instead of checking for:
> > self.assertEquals(events[0].payload, map(ord, payload))
> ... you could check for:
> > self.assertEquals(payload, ''.join(map(chr,
> events[0].payload)).decode('utf-8'))
>

This won't really work since some payloads are actually pickles

>
> But feel free to ignore this last point.
> --
> https://code.launchpad.net/~seif/zeitgeist/fixing-payloads/+merge/44248
> You are the owner of lp:~seif/zeitgeist/fixing-payloads.
>

Revision history for this message
Siegfried Gevatter (rainct) wrote :

> This won't really work since some payloads are actually pickles
Yeah, the example was for the unicode one.

lp:~seif/zeitgeist/fixing-payloads updated
1662. By Seif Lotfy

fixed issues raised by Siegfried
checked for instances of unicode and convert to string
use UnicodeDecodeError for the exception

1663. By Seif Lotfy

removed redundant test case

Revision history for this message
Siegfried Gevatter (rainct) wrote :

(Another thing: testFindEventsWithNonASCIPayloa is missing a I in ASCII).

Looks acceptable to me, but I'd like a +1 from someone else since I'm not 100% sure what's going on.

review: Approve
lp:~seif/zeitgeist/fixing-payloads updated
1664. By Seif Lotfy

fix type in testcase name

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '_zeitgeist/engine/datamodel.py'
2--- _zeitgeist/engine/datamodel.py 2010-12-28 16:04:24 +0000
3+++ _zeitgeist/engine/datamodel.py 2010-12-28 21:35:08 +0000
4@@ -44,8 +44,11 @@
5 subject[n] = self._to_unicode(value)
6 # The payload require special handling, since it is binary data
7 # If there is indeed data here, we must not unicode encode it!
8- if self[2] is None: self[2] = u""
9-
10+ if self[2] is None:
11+ self[2] = u""
12+ elif isinstance(self[2], unicode):
13+ self[2] = str(self[2])
14+
15 @staticmethod
16 def get_plain(ev):
17 """
18
19=== modified file '_zeitgeist/engine/sql.py'
20--- _zeitgeist/engine/sql.py 2010-11-30 16:09:31 +0000
21+++ _zeitgeist/engine/sql.py 2010-12-28 21:35:08 +0000
22@@ -54,7 +54,14 @@
23 return obj
24 if isinstance(obj, str):
25 obj = obj.decode("UTF-8")
26- return unicode(obj)
27+ # seif: Python’s default encoding is ASCII, so whenever a character with
28+ # an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError
29+ # because that character can’t be handled by the ASCII encoding.
30+ try:
31+ obj = unicode(obj)
32+ except UnicodeDecodeError, ex:
33+ pass
34+ return obj
35
36 def execute(self, statement, parameters=()):
37 parameters = [self.fix_unicode(p) for p in parameters]
38
39=== modified file 'test/remote-test.py'
40--- test/remote-test.py 2010-12-28 17:49:40 +0000
41+++ test/remote-test.py 2010-12-28 21:35:08 +0000
42@@ -9,6 +9,7 @@
43 import time
44 import tempfile
45 import shutil
46+import pickle
47 from subprocess import Popen, PIPE
48
49 # DBus setup
50@@ -312,8 +313,46 @@
51 mainloop.run()
52 self.assertEquals(len(result), 1)
53 self.assertEquals(result[0].actor, "firefox")
54-
55-
56+
57+ def testFindEventsWithStringPayload(self):
58+ mainloop = gobject.MainLoop()
59+ payload = "Hello World"
60+ def callback(ids):
61+ def callback2(events):
62+ mainloop.quit()
63+ self.assertEquals(events[0].payload, map(ord, payload))
64+ self.client.get_events(ids, callback2)
65+ events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")]
66+ events[0].payload = payload
67+ self.client.insert_events(events, callback)
68+ mainloop.run()
69+
70+ def testFindEventsWithNonASCIIPayload(self):
71+ mainloop = gobject.MainLoop()
72+ payload = u"äöü".encode("utf-8")
73+ def callback(ids):
74+ def callback2(events):
75+ mainloop.quit()
76+ self.assertEquals(events[0].payload, map(ord, payload))
77+ self.client.get_events(ids, callback2)
78+ events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")]
79+ events[0].payload = payload
80+ self.client.insert_events(events, callback)
81+ mainloop.run()
82+
83+ def testFindEventsWithBinaryPayload(self):
84+ mainloop = gobject.MainLoop()
85+ payload = pickle.dumps(1234)
86+ def callback(ids):
87+ def callback2(events):
88+ mainloop.quit()
89+ self.assertEquals(events[0].payload, map(ord, payload))
90+ self.client.get_events(ids, callback2)
91+ events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")]
92+ events[0].payload = payload
93+ self.client.insert_events(events, callback)
94+ mainloop.run()
95+
96 class ZeitgeistRemoteInterfaceTest(unittest.TestCase):
97
98 def setUp(self):

Subscribers

People subscribed via source and target branches