Shorts converts & into &amp; when reading RDF feeds

Bug #1543995 reported by Smurphy
This bug affects 1 person
Affects                  Status         Importance   Assigned to
Canonical System Image   Confirmed      Medium       Alan Pope 🍺🐧🐱 🦄
Ubuntu Shorts App        Fix Committed  Medium       Unassigned

Bug Description

When reading an RDF feed, Shorts converts the & character into &amp;, so the server receives a parameter named amp;bin_id instead of bin_id.
Check the GET line: &bin_id is changed into &amp;bin_id.
The example is taken from my blog feed:
http://stargate.solsys.org/mod.php?mod=blog&op=rdf&user=102
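To illustrate why the converted URL breaks, here is a minimal Python sketch. It assumes the server splits the query string only on "&" (PHP's default behaviour); split_query is a hypothetical helper, not code from Shorts or from the blog software.

```python
from typing import Dict

# Hypothetical helper: split a query string into name/value pairs the
# way a typical server-side parser does ("&" between pairs, "=" between
# name and value). Only "&" is a separator, so ";" stays in the name.
def split_query(query: str) -> Dict[str, str]:
    params: Dict[str, str] = {}
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        params[name] = value
    return params

# Query string with a literal "&", as a browser sends it.
good = split_query("bin=get&bin_id=774")
# Query string with the HTML entity "&amp;" left in place.
bad = split_query("bin=get&amp;bin_id=774")

print(good)             # {'bin': 'get', 'bin_id': '774'}
print(bad)              # {'bin': 'get', 'amp;bin_id': '774'}
print("bin_id" in bad)  # False
```

With the entity left in place, the server sees a parameter named amp;bin_id and no bin_id at all, which fits the failed response in the Shorts capture.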

Example request for an image from the RDF feed, as read by Kjots (KDE):
GET /bin.php?bin=get&bin_id=774 HTTP/1.1
Host: stargate.solsys.org
Connection: keep-alive
User-Agent: Mozilla/5.0 (X11) KHTML/5.18.0 (like Gecko) Konqueror/5.18
Referer: http://stargate.solsys.org/mod.php?mod=blog&op=view&view=294&expand=yes&akregatorPreviewMode=true
Pragma: no-cache
Cache-control: no-cache
Accept: image/png, image/jpeg, video/x-mng, image/jp2, image/gif;q=0.5,*/*;q=0.1
Accept-Encoding: gzip, deflate, x-gzip, x-deflate
Accept-Charset: utf-8,*;q=0.5
Accept-Language: en-US,fr;q=0.9,de;q=0.8,en;q=0.7
Cookie: PHPSESSID=xxxxxxxxxxxxxxxxxxxxxxxxxxx

HTTP/1.1 200 OK
Date: Wed, 10 Feb 2016 11:02:44 GMT
Server: Apache/2.4.7 (Ubuntu) OpenSSL/1.0.1f
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-disposition: filename="navod.jpg"
Content-Length: 4517
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: image/jpeg

Shorts request:
GET /bin.php?bin=get&amp;bin_id=774 HTTP/1.1
Cookie: PHPSESSID=xxxxxxxxxxxxxxxxxxxxxxxxx
Connection: Keep-Alive
Accept-Encoding: gzip, deflate
Accept-Language: en-US,*
User-Agent: Mozilla/5.0
Host: stargate.solsys.org

HTTP/1.1 401 Bad Request
Date: Wed, 10 Feb 2016 10:23:29 GMT
Server: Apache/2.4.7 (Ubuntu) OpenSSL/1.0.1f
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 2563
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

Tags: ota9 rss shorts

Changed in ubuntu-rssreader-app:
status: New → Confirmed
importance: Undecided → Medium
Changed in canonical-devices-system-image:
assignee: nobody → Alan Pope  (popey)
importance: Undecided → Medium
milestone: none → backlog
status: New → Confirmed
Joey Chan (qqworini) wrote :

“When opening a RDF Feed”

I do not understand: how did you "open" that feed?

Smurphy (smurphy-linux) wrote :

Just add the feed to Shorts and update it. Every request to that site (for news) issues a GET with the & converted into &amp;, as the packet capture (Wireshark -> Follow TCP Stream) shows.

Joey Chan (qqworini) wrote :

The attachment is a screenshot of your example URL. These are the steps I followed:

1. swipe up to show the "search" page;
2. paste "http://stargate.solsys.org/mod.php?mod=blog&op=rdf&user=102" to the search field, hit enter;
3. the result list shows only one result; select it and continue;
4. add it to a topic and hit Enter;
5. refresh; I can then see all the articles.

/////////////////////////////////////////////////

Second test: I turned off the Google engine on the settings page and repeated the steps above. Shorts returned nothing because it does not support the RDF format yet, although Wireshark shows the server returning HTTP 200 and I can see the whole XML content in the capture.

Please correct my steps to reproduce this bug.

Smurphy (smurphy-linux) wrote :

That's the wrong approach for reproducing it.
The data gets there all right using Google Search.
I also have (from work) a monitoring system that analyses the actual HTTP traffic on the OSI layer, and that's what alerted me to the error.
While the results are partly correct, I noticed that in the requests made when Google Search is disabled, the "&" signs are replaced with &amp;, which breaks the actual URL.
Check the attached Wireshark screenshot. I can send you the packet capture if you want; I don't want to expose it to the net.
These "modified" entries are actually pictures embedded in the blog entry (RDF/RSS feed) that have to be fetched through a database call, hence the ID parameter.
I suspect that URLs inside the HTML code that are tagged as images get normalized somehow, converting the & into &amp;.
I also attached a screenshot from the monitoring tool.

PS: Was skiing ;) sorry for the late reply.
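A sketch of the kind of fix this suspicion points to, in Python. It assumes the image URL appears in the item's HTML with the & written as &amp; (normal HTML escaping). Copying the raw attribute text keeps the entity, whereas an HTML parser decodes it. ITEM_HTML, naive_img_urls and parsed_img_urls are hypothetical names; this is not the code that was committed to Shorts.

```python
import re
from html.parser import HTMLParser
from typing import List

# Hypothetical item body: HTML as it might appear in a feed item, with
# the "&" in the image URL escaped as "&amp;".
ITEM_HTML = ('<p>Post</p>'
             '<img src="http://stargate.solsys.org/bin.php?bin=get&amp;bin_id=774">')

def naive_img_urls(html: str) -> List[str]:
    # Buggy approach: copy the raw attribute text, entities included.
    return re.findall(r'<img[^>]*\bsrc="([^"]*)"', html)

class ImgSrcCollector(HTMLParser):
    # HTMLParser decodes character references in attribute values,
    # so "&amp;" is handed to handle_starttag as "&".
    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.urls.append(value)

def parsed_img_urls(html: str) -> List[str]:
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls

print(naive_img_urls(ITEM_HTML))
# ['http://stargate.solsys.org/bin.php?bin=get&amp;bin_id=774']
print(parsed_img_urls(ITEM_HTML))
# ['http://stargate.solsys.org/bin.php?bin=get&bin_id=774']
```

If a URL has already been extracted as a plain string, html.unescape(url) from the standard library performs the same decoding.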

Smurphy (smurphy-linux) wrote :

Monitoring tool output.

Joey Chan (qqworini) wrote :

Thanks for the detailed info :) It should be useful.

Until I fix this bug, please keep using the "Google" mode.

Jenkins Bot (ubuntu-core-apps-jenkins-bot) wrote :

Fix committed into lp:ubuntu-rssreader-app at revision 453, scheduled for release in ubuntu-rssreader-app, milestone alpha-1

Changed in ubuntu-rssreader-app:
status: Confirmed → Fix Committed