Merge lp:~seif/zeitgeist/fixing-payloads into lp:zeitgeist/0.1
- fixing-payloads
- Merge into 0.8-python
Status: | Merged |
---|---|
Merged at revision: | 1654 |
Proposed branch: | lp:~seif/zeitgeist/fixing-payloads |
Merge into: | lp:zeitgeist/0.1 |
Diff against target: |
98 lines (+54/-5) 3 files modified
_zeitgeist/engine/datamodel.py (+5/-2) _zeitgeist/engine/sql.py (+8/-1) test/remote-test.py (+41/-2) |
To merge this branch: | bzr merge lp:~seif/zeitgeist/fixing-payloads |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Siegfried Gevatter | Approve | ||
Mikkel Kamstrup Erlandsen | Needs Information | ||
Review via email: mp+44248@code.launchpad.net |
Commit message
Description of the change
This branch fixes
https:/
by mapping the bytes in the payload to ord before sending it out to the client
Mikkel Kamstrup Erlandsen (kamstrup) wrote : | # |
Seif Lotfy (seif) wrote : | # |
huh? example ?
Seif Lotfy (seif) wrote : | # |
I added another example using non-ascii unicorde character and got it to work by fixing an issue in the UnicodeCursor where
if isinstance(obj, str):
obj = obj.decode("UTF-8")
return unicode(obj)
and according to the python channel
<Peng> seiflotfy: test.decode(
Markus Korn (thekorn) wrote : | # |
sorry, but this is taking a completely weird direction.
First of all,
test/
is failing for me, but this is not the main issue.
My main problem is: we always had the principle that what ever a user puts into the log comes out in the same way, but with this branch it changes dramatically, something like
>>> event.payload = "Hallo world"
>>> ids = engine.
>>> result = engine.
>>> assert event.payload == result.payload
will not work anymore, instead you have to use
>>> assert map(ord, event.payload) == result.payload
And worst of all, to get a `human readable` representation of the results payload you have to
>>> u"".join(
Also the `fix` in UnicodeCursor is wrong. I agree the unicode() is redundant for the case of an str argument, but it isn't for floats and other random datatypes.
There seems to be a lot of trouble with payloads recently, seems like we have to fix payloads in a sane way, and not by doing some random changes at some places until it *seems* to work. At the end it is mostly an issue of how we define `payload`, internally and to the outside world - one of the issues here is that the phrase "binary" is misleading, I will follow up on this thoughts on the mailing list, this MP is not the right place.
Seif Lotfy (seif) wrote : | # |
Ofcourse it wont work. If you send a payload from the client to the engine
it will be an array of bytes. It is translated that way by dbus.
This means u wont get ur "Hello World" string on the other side.
The problem in the first place was that remotely we couldnt get events with
payloads. So this is where this all starts.
Try the the testEventWithSt
the client it means we are facing an internal issue.
I narrowed it down and I think my approach is actually right. I mean if u
send a binary u need to decode it to start something with it.
The test cases work.
The only other solution imho would be to change payload to a string instead
of an array of bytes.
Mikkel Kamstrup Erlandsen (kamstrup) wrote : | # |
On 21 December 2010 11:58, Seif Lotfy <email address hidden> wrote:
> Ofcourse it wont work. If you send a payload from the client to the engine
> it will be an array of bytes. It is translated that way by dbus.
> This means u wont get ur "Hello World" string on the other side.
>
> The problem in the first place was that remotely we couldnt get events with
> payloads. So this is where this all starts.
> Try the the testEventWithSt
> the client it means we are facing an internal issue.
> I narrowed it down and I think my approach is actually right. I mean if u
> send a binary u need to decode it to start something with it.
> The test cases work.
>
> The only other solution imho would be to change payload to a string instead
> of an array of bytes.
Technically a string is *exactly identical* to an array of bytes. "The
difference is the same" as we say in Denmark :-) It's just the Python
logic here that messes it up, in C, fx., it's just a matter of how you
cast the pointers.
The contract that makes sense is twofold:
1) In the Python client API: event_before.
2) On the DBus wire level the same invariant: event_before[2] == event_after[2]
So this is what the unit tests should assert. It seems we have a bug
here and that needs fixing of course - and a non-invasive fix like
Seif's patch here seems ok by me, even if there may theoretically
cleaner ways we can do this. Given no concrete alternatives we should
merge Seif's branch given that it implements the contract as I
outline.
That's my take on this at least.
- 1656. By Seif Lotfy
-
add a try, except statement
- 1657. By Seif Lotfy
-
remove whitespace
- 1658. By Seif Lotfy
-
fix remote test case
- 1659. By Seif Lotfy
-
remove try except statement
- 1660. By Seif Lotfy
-
removed redundant unicode casting and explained the try statement
- 1661. By Seif Lotfy
-
merge with trunk
Added new test cases and explainet the try exception
Siegfried Gevatter (rainct) wrote : | # |
Good work. Some comments though:
> self[2] = str(self[2])
What is this doing?
Also, the docstring for that method says "Ensure that all fields in the event struct are non-None", it doesn't mention any type conversions or other changes, so if we need some change here it should be documented properly.
----------
# seif: Python’s default encoding is ASCII, so whenever a character with
# an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError
# because that character can’t be handled by the ASCII encoding.
So why is it "except:" and not "except UnicodeDecodeEr
Also, I'm not quite sure we want this exception. So we ignore it at this point, but what happens after that? We let SQLite see what it can do with it? What if this is an insert and we're placing crap into the database? Zeitgeist will crash once it retrieves it and tries to convert it to a unicode for sending it back through D-Bus.
----------
The commit is introducing again testDataSources
----------
I believe it'd be nicer if the test cases didn't check the integer values, but the string instead, since this way they serve as documentation on how to recover the string.
Eg. instead of checking for:
> self.assertEqua
... you could check for:
> self.assertEqua
But feel free to ignore this last point.
Seif Lotfy (seif) wrote : | # |
On Tue, Dec 28, 2010 at 8:31 PM, Siegfried Gevatter <email address hidden>wrote:
> Review: Needs Fixing
> Good work. Some comments though:
>
>
> > self[2] = str(self[2])
>
> What is this doing?
>
> Also, the docstring for that method says "Ensure that all fields in the
> event struct are non-None", it doesn't mention any type conversions or other
> changes, so if we need some change here it should be documented properly.
>
For some reason I cant send unicode as payloads so I converted them into
strings
[2010-12-28 20:49:24,092] - ERROR - dbus.service - Unable to append
([Event([[u'25428', u'124', u'', u'', u'boo'], [Subject(
u'', u'', u'', u'', u'', u''])], u'Hello World'])],) to message with
signature a(asaasay): <type 'exceptions.
I know its not optimal so I am fixing it with my next merge request
>
> ----------
>
> # seif: Python’s default encoding is ASCII, so whenever a character with
> # an ASCII value > 127 is in the input data, you’ll get a
> UnicodeDecodeError
> # because that character can’t be handled by the ASCII encoding.
>
> So why is it "except:" and not "except UnicodeDecodeEr
>
> Also, I'm not quite sure we want this exception. So we ignore it at this
> point, but what happens after that? We let SQLite see what it can do with
> it? What if this is an insert and we're placing crap into the database?
> Zeitgeist will crash once it retrieves it and tries to convert it to a
> unicode for sending it back through D-Bus.
>
My test case inserts this string as a payload into the DB and picks it out
again and send it for an assertion. I think we have that covered. But I will
change the Exception type
>
> ----------
>
> The commit is introducing again testDataSources
> (and renamed) into another class.
>
> ----------
>
Sorry about that I am fixing it
> I believe it'd be nicer if the test cases didn't check the integer values,
> but the string instead, since this way they serve as documentation on how to
> recover the string.
>
> Eg. instead of checking for:
> > self.assertEqua
> ... you could check for:
> > self.assertEqua
> events[
>
This won't really work since some payloads are actually pickles
>
> But feel free to ignore this last point.
> --
> https:/
> You are the owner of lp:~seif/zeitgeist/fixing-payloads.
>
Siegfried Gevatter (rainct) wrote : | # |
> This won't really work since some payloads are actually pickles
Yeah, the example was for the unicode one.
- 1662. By Seif Lotfy
-
fixed issues raised by Siegfried
checked for instances of unicode and convert to string
use UnicodeDecodeError for the exception - 1663. By Seif Lotfy
-
removed redundant test case
Siegfried Gevatter (rainct) wrote : | # |
(Another thing: testFindEventsW
Looks acceptable to me, but I'd like a +1 from someone else since I'm not 100% sure what's going on.
- 1664. By Seif Lotfy
-
fix type in testcase name
Preview Diff
1 | === modified file '_zeitgeist/engine/datamodel.py' |
2 | --- _zeitgeist/engine/datamodel.py 2010-12-28 16:04:24 +0000 |
3 | +++ _zeitgeist/engine/datamodel.py 2010-12-28 21:35:08 +0000 |
4 | @@ -44,8 +44,11 @@ |
5 | subject[n] = self._to_unicode(value) |
6 | # The payload require special handling, since it is binary data |
7 | # If there is indeed data here, we must not unicode encode it! |
8 | - if self[2] is None: self[2] = u"" |
9 | - |
10 | + if self[2] is None: |
11 | + self[2] = u"" |
12 | + elif isinstance(self[2], unicode): |
13 | + self[2] = str(self[2]) |
14 | + |
15 | @staticmethod |
16 | def get_plain(ev): |
17 | """ |
18 | |
19 | === modified file '_zeitgeist/engine/sql.py' |
20 | --- _zeitgeist/engine/sql.py 2010-11-30 16:09:31 +0000 |
21 | +++ _zeitgeist/engine/sql.py 2010-12-28 21:35:08 +0000 |
22 | @@ -54,7 +54,14 @@ |
23 | return obj |
24 | if isinstance(obj, str): |
25 | obj = obj.decode("UTF-8") |
26 | - return unicode(obj) |
27 | + # seif: Python’s default encoding is ASCII, so whenever a character with |
28 | + # an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError |
29 | + # because that character can’t be handled by the ASCII encoding. |
30 | + try: |
31 | + obj = unicode(obj) |
32 | + except UnicodeDecodeError, ex: |
33 | + pass |
34 | + return obj |
35 | |
36 | def execute(self, statement, parameters=()): |
37 | parameters = [self.fix_unicode(p) for p in parameters] |
38 | |
39 | === modified file 'test/remote-test.py' |
40 | --- test/remote-test.py 2010-12-28 17:49:40 +0000 |
41 | +++ test/remote-test.py 2010-12-28 21:35:08 +0000 |
42 | @@ -9,6 +9,7 @@ |
43 | import time |
44 | import tempfile |
45 | import shutil |
46 | +import pickle |
47 | from subprocess import Popen, PIPE |
48 | |
49 | # DBus setup |
50 | @@ -312,8 +313,46 @@ |
51 | mainloop.run() |
52 | self.assertEquals(len(result), 1) |
53 | self.assertEquals(result[0].actor, "firefox") |
54 | - |
55 | - |
56 | + |
57 | + def testFindEventsWithStringPayload(self): |
58 | + mainloop = gobject.MainLoop() |
59 | + payload = "Hello World" |
60 | + def callback(ids): |
61 | + def callback2(events): |
62 | + mainloop.quit() |
63 | + self.assertEquals(events[0].payload, map(ord, payload)) |
64 | + self.client.get_events(ids, callback2) |
65 | + events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")] |
66 | + events[0].payload = payload |
67 | + self.client.insert_events(events, callback) |
68 | + mainloop.run() |
69 | + |
70 | + def testFindEventsWithNonASCIIPayload(self): |
71 | + mainloop = gobject.MainLoop() |
72 | + payload = u"äöü".encode("utf-8") |
73 | + def callback(ids): |
74 | + def callback2(events): |
75 | + mainloop.quit() |
76 | + self.assertEquals(events[0].payload, map(ord, payload)) |
77 | + self.client.get_events(ids, callback2) |
78 | + events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")] |
79 | + events[0].payload = payload |
80 | + self.client.insert_events(events, callback) |
81 | + mainloop.run() |
82 | + |
83 | + def testFindEventsWithBinaryPayload(self): |
84 | + mainloop = gobject.MainLoop() |
85 | + payload = pickle.dumps(1234) |
86 | + def callback(ids): |
87 | + def callback2(events): |
88 | + mainloop.quit() |
89 | + self.assertEquals(events[0].payload, map(ord, payload)) |
90 | + self.client.get_events(ids, callback2) |
91 | + events = [Event.new_for_values(actor=u"boo", timestamp=124, subject_uri="file://yomomma")] |
92 | + events[0].payload = payload |
93 | + self.client.insert_events(events, callback) |
94 | + mainloop.run() |
95 | + |
96 | class ZeitgeistRemoteInterfaceTest(unittest.TestCase): |
97 | |
98 | def setUp(self): |
Can you the Hello World test case to also use non-ascii unicode characters?