No snaps appear in scope on Core 16

Bug #1645798 reported by Pat McGowan
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical System Image
Won't Fix
Critical
Michał Sawicz
Unity8 Session Snap
Confirmed
Critical
Unassigned
unity-api (Ubuntu)
Invalid
Critical
Michi Henning
unity-scope-click (Ubuntu)
Won't Fix
Undecided
Unassigned
unity-scopes-api (Ubuntu)
Won't Fix
Critical
Michi Henning

Bug Description

No snaps appear in scope on Core 16 whereas they do appear in unity8 snap on classic

Changed in unity8-session-snap:
assignee: nobody → Michael Terry (mterry)
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Michael Terry (mterry) wrote :

Relevant section of unity8-dash.log when pulling down app scope to refresh it:

[2016-11-30:18:59:51.494] "clickscope" : results invalidated, programmatic: false , active: true
[2016-11-30:18:59:51.495] "clickscope" : Dispatching search: "" "" (programmatic: false )
[2016-11-30:19:00:36.496] "clickscope" : flushUpdates: #results = 0 finalize: true
[2016-11-30:19:00:36.496] "clickscope" : processPrimaryNavigationTag(): tag is ' "" '
[2016-11-30:19:00:36.496] "clickscope" : Processing 0 filters
[2016-11-30:19:00:36.496] "clickscope" : Removing all filters

Revision history for this message
Michael Terry (mterry) wrote :

Manually running the scoperunner gives an abort when refreshing the scope:

$ DBUS_SESSION_BUS_ADDRESS=`cat /run/user/0/dbus-session | cut -f2- -d=` HOME=/root/snap/unity8-session/264 SNAP_USER_DATA=/root/snap/unity8-session/264 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_ARCH=amd64 SNAP=/snap/unity8-session/current U1_DEBUG=1 /root/snappyenv /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini

Failed to open departments db: Invalid departments database
WARNING: QApplication was not created in the main() thread.
(process:3952): ubuntu-app-launch-DEBUG: Initialized Click DB
scoperunner: /build/unity-scopes-api-WePnyG/unity-scopes-api-1.0.7+16.04.20161115/src/scopes/internal/zmq_middleware/ObjectAdapter.cpp:670: void unity::scopes::internal::zmq_middleware::ObjectAdapter::worker(): Assertion `buf.empty()' failed.
Aborted

Revision history for this message
Michael Terry (mterry) wrote :

The scoperunner does NOT crash if I run it under gdbserver. But I do see some errors in that case:

(process:4583): ubuntu-app-launch-DEBUG: Initialized Click DB
[2016-11-30 21:43:45.633] ERROR: clickscope: ObjectAdapter: error unmarshaling request header (id: , adapter: clickscope-c, op: ): ZmqReceiver::receive(): socket was closed
...
various snaps being found
...
[2016-11-30 21:43:45.690] ERROR: clickscope: ReplyImpl::error(): ThreadPool::submit(): cannot accept task for destroyed pool
[2016-11-30 21:43:45.693] ERROR: clickscope: ReplyImpl::error(): exception from finished(): ThreadPool::submit(): cannot accept task for destroyed pool
2016-11-30 21:43:45,707 - DEBUG - destroying search

I think I need to hand this off to someone that understands scopes better. Michi, is this something you would know about?

Changed in unity-api (Ubuntu):
assignee: nobody → Michi Henning (michihenning)
Revision history for this message
Michi Henning (michihenning) wrote :

I've added Pawel because of the "Invalid departments database" error.

I'm looking at the other errors. I suspect what's happening is that some of the local socket path names are wrong.

Revision history for this message
Michi Henning (michihenning) wrote :

Looking at /root/snap/unity8-session/common/.cache/upstart/scope-registry.log, there are lots of messages along the following lines:

scoperegistry: ignoring scope "clickscope": cannot create metadata: unity::SyscallException: RegistryObject::create_desktop_file: unable to create desktop file: "/root/snap/unity8-session/common/.local/share/applications/clickscope.desktop" (errno = 2)

The problem is that the applications subdirectory does not exist, but scopes expect that directory to be pre-installed. In turn, failure to create the desktop file causes the registry to not add metadata for local scopes.

After creating the directory, the errors go away, and unity8-dash.log looks clean. I'm seeing results from 7digital and amazon on a search (albeit without thumbnails) once the applications directory exists, which indicates that the scopes infrastructure is working.

Revision history for this message
Michi Henning (michihenning) wrote :

I would like to track down what causes the assertion failure in scopes-api regardless.

DBUS_SESSION_BUS_ADDRESS=`cat /run/user/0/dbus-session | cut -f2- -d=` HOME=/root/snap/unity8-session/264 SNAP_USER_DATA=/root/snap/unity8-session/264 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_ARCH=amd64 SNAP=/snap/unity8-session/current U1_DEBUG=1 /root/snappyenv /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini

This won't run for me because I don't have /root/snappyenv. What's in that file? It looks like it's a script that sets up a bunch of env vars and execs the remainder of the command line?

Revision history for this message
Michi Henning (michihenning) wrote :

I'll push a branch to have the registry create the applications dir if it doesn't exist. That should be good enough for the moment. But the whole .desktop file generation thing is a hack. As far as we can work out, this was needed because online accounts refused to work unless there was a .desktop file.

Changed in unity-api (Ubuntu):
importance: Undecided → Critical
status: New → In Progress
Revision history for this message
Michael Terry (mterry) wrote :

@Michi, /root/snappyenv should be "/snap/unity8-session/current/snappyenv", my bad.

Revision history for this message
Michi Henning (michihenning) wrote :

The missing applications dir was coincidental and isn't the root cause of this problem.
The issue is racy, so I'm seeing different things in the logs each time.
Still digging...

Revision history for this message
Michi Henning (michihenning) wrote :

The assertion failure appears to be unrelated to the click scope not showing any results.
What is interesting is that the other scopes seems to work fine, it's only the click scope that is empty. With the missing applications dir in place, I'm not seeing any errors at all in the logs, just the empty search results from the click scope. This suggests that the scopes-api infrastructure is working and the problem is somewhere in click scope or its dependencies.

Revision history for this message
dobey (dobey) wrote :

@michi Is there at least the BFB result for the store? Does it show "Apps" at the top of the dash? Or is everything totally blank?

If answers are "yes, yes, no" then the result set is indeed just empty for some reason.
If answers are "no, yes, no" then either unity-scope-snappy is not installed and it's getting no results, or it's crashing/failing to complete the run.
If answers are "no, no, yes" then the system thinks the scope isn't available and it's not being run.

Revision history for this message
Michi Henning (michihenning) wrote :

Sorry, I don't know what a "BFB result" is. It does show "Apps". I've attached a screenshot.

Revision history for this message
dobey (dobey) wrote :

"BFB" is "Big Flavorful Button" ;)

It's the huge orange "Ubuntu Store" result that appears at the bottom of the scope.

So, from your screenshot it looks like you are hitting case number 2 from my previous post. Is there a crash report in /var/crash/ for the scoperunner? If not, does running "ubuntu-app-launch-appids" on this system show any snap results?

If this is the unity8 snap as well, it looks like in the screenshot that even System Settings isn't being found. With that and the missing store result, it seems like the scope is crashing or being killed before it can push any results. Can you attach the unity8-dash.log and scope-registry.log files? Normally they are in ~/.cache/upstart/ but not sure where exactly that equates to for the unity8 snap on core.

Revision history for this message
Michael Terry (mterry) wrote :

Michi, since it sounds like scoperunner is dying on you, try my scoperunner command up above again, with the right snappyenv. Here it is again (with fixed snappyenv):

DBUS_SESSION_BUS_ADDRESS=`cat /run/user/0/dbus-session | cut -f2- -d=` HOME=/root/snap/unity8-session/264 SNAP_USER_DATA=/root/snap/unity8-session/264 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_ARCH=amd64 SNAP=/snap/unity8-session/current U1_DEBUG=1 /snap/unity8-session/current/snappyenv /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini

kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
milestone: p1 → p2
Revision history for this message
Michi Henning (michihenning) wrote :

> "BFB" is "Big Flavorful Button" ;)

Ah, as in "BFG-9000" :-)

> So, from your screenshot it looks like you are hitting case number 2 from my previous post. Is
> there a crash report in /var/crash/ for the scoperunner?

/var/crash doesn't exist on a snappy machine. I have no idea where core dumps end up. It looks like core dumps are disabled. I can't find any cores anywhere.

> If not, does running "ubuntu-app-launch-appids" on this system show any snap results?

ubuntu-app-launch doesn't exist either.

> Can you attach the unity8-dash.log and scope-registry.log files? Normally they are in
> ~/.cache/upstart/ but not sure where exactly that equates to for the unity8 snap on core.

I'm seeing an unhandled exception from the click scope.

Registry log: pastebin.ubuntu.com/23602020

I suspect that the exception isn't coming from one of the threads that we hand to the scope. (We very carefully handle all exceptions that a scope might throw.) It seems more likely that the exception comes from a thread that wasn't created by the scopes run time.

The trail of exceptions from shutdown() is unsurprising because the scopes runtime realises that something isn't right and just logs the error.

unity8-dash log: pastebin.ubuntu.com/23602037

There is a line with "results invalidated". Not sure whether that is significant.

Note that no network is present inside the core 16 VM. I don't know how to get that working, and I don't know whether that relates to the crash. I set up the VM following the instructions Kevin posted:

https://docs.google.com/document/d/1o-jKITqUxRsujmN3OwRj3wRnn6dgblKuvrhKjeN8-Wc/edit?pli=1#heading=h.q06bivmaid8f

Changed in unity-scopes-api (Ubuntu):
importance: Undecided → Critical
assignee: nobody → Michi Henning (michihenning)
Changed in unity-api (Ubuntu):
status: In Progress → Invalid
Changed in unity-scopes-api (Ubuntu):
status: New → In Progress
Revision history for this message
Michi Henning (michihenning) wrote :

Michael, the command fails because /run/user/0/dbus-session doesn't exist.
I also looks like I have a different version installed. I see 264 and 277 (besides common) in /root/snap/unity8-session.

I adjusted your command to not set DBUS_SESSION_ADDRESS and explicitly set LD_LIBRARY_PATH (without using snappyenv):

1_DEBUG=1 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_USER_DATA=/root/snap/unity8-session/277 SNAP_ARCH=amd64 HOME=/root/snap/unity8-session/277 LD_LIBRARY_PATH=/snap/unity8-session/current/usr/lib/x86_64-linux-gnu /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini

This yields:

# U1_DEBUG=1 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_USER_DATA=/root/snap/unity8-session/277 SNAP_ARCH=amd64 HOME=/root/snap/unity8-session/277 LD_LIBRARY_PATH=/snap/unity8-session/current/usr/lib/x86_64-linux-gnu /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini QSqlDatabase: QSQLITE driver not loaded
QSqlDatabase: available drivers:
QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
Failed to open departments db: Cannot open departments database
[2016-12-09 05:56:44.991] ERROR: clickscope: unity::scopes::ConfigException: Cannot instantiate run time for clickscope, config file: :
    unity::scopes::MiddlewareException: cannot initialize zmq middleware for scope clickscope:
        unity::scopes::ConfigException: No endpoint directories specified, and XDG_RUNTIME_DIR environment variable not set
scoperunner: unity::scopes::ConfigException: Cannot instantiate run time for clickscope, config file: :
    unity::scopes::MiddlewareException: cannot initialize zmq middleware for scope clickscope:
        unity::scopes::ConfigException: No endpoint directories specified, and XDG_RUNTIME_DIR environment variable not set

Running this with snappyenv, I get:

# U1_DEBUG=1 SNAP_USER_COMMON=/root/snap/unity8-session/common SNAP_USER_DATA=/root/snap/unity8-session/277 SNAP_ARCH=amd64 HOME=/root/snap/unity8-session/277 /snap/unity8-session/current/snappyenv /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/scoperunner "" /snap/unity8-session/current/usr/lib/x86_64-linux-gnu/unity-scopes/clickapps/clickscope.ini ERROR: ld.so: object '/lib/libsnap-preload-shim.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
mkdir: cannot create directory ‘/lightdm-data’: Read-only file system
/snap/unity8-session/current/snappyenv: 113: exec: /bin/desktop-launch: not found

Revision history for this message
Michael Terry (mterry) wrote :

The "/bin/desktop-launch: not found" error is because you took out the SNAP= definition.

/run/user/0/dbus-session doesn't exist? You ran "sudo unity8-session" in a KVM?

Revision history for this message
dobey (dobey) wrote :

Michi, in that scope-registry.log you pastebinned, it looks like the ZMQ socket is getting closed/removed in the middle of calling Query in the scope:

(process:2658): ubuntu-app-launch-DEBUG: Initialized Click DB
[2016-12-09 05:20:22.797] ERROR: clickscope: ObjectAdapter: error unmarshaling request header (id: , adapter: clickscope-c, op: ): ZmqReceiver::receive(): socket was closed
<messages from UAL as we query for apps>
[2016-12-09 05:20:22.807] ERROR: clickscope: ReplyImpl::error(): unity::scopes::MiddlewareException: Cannot invoke operations while middleware is stopped

I'm not sure why this would happen, and I didn't realize we need to handle exceptions in the middleware, within the scope itself. Any idea why the socket would be getting destroyed from the other end? This seems to be why the scope is blank.

Revision history for this message
Michi Henning (michihenning) wrote :

@dobey: There is in the log: terminate called after throwing an instance of 'unity::scopes::MiddlewareException'

I don't know where this exception comes from. It would be great to find out. I believe that, if an exception is thrown (no matter what kind of exception) from one of the threads we send into the scope, the runtime will handle it correctly. (If not, that's a bug in scopes-api.) But, if the exception is thrown by a thread that wasn't created by the scopes runtime, there is obviously nothing much we can do. (In that case, the other errors are consequential errors, I'd say.)

@mterry: Yes, I did this in side the VM after starting the session. There is no /run/user/0/dbus-session.

Revision history for this message
dobey (dobey) wrote :

Michi, the exception itself seems to be from https://bazaar.launchpad.net/~unity-team/unity-scopes-api/trunk/view/head:/src/scopes/internal/zmq_middleware/ZmqMiddleware.cpp and getting bubbled up through the bits.

Judging from the logs, what appears to happen is that the scope is started, and initialized, and we are then sent a Query(), and we have a Reply object to push results back through. At that point, the socket seems to get closed and the middleware is stopped. The scope then has results and tries to push(), but since the middleware is stopped, we get the exception from deep inside the scopes API.

It's true that the scope itself could probably handle this very unexpected situation more gracefully on its own, but doing so won't really resolve the problem here of there not being any results, since the socket is closed and thus it's impossible for the scope to give any results back to the shell.

Revision history for this message
Michi Henning (michihenning) wrote :

I've spent pretty much all day trying to track this down. The debug cycle time with core 16 in a VM doesn't help :-(

What I can see is that crap comes out of the ZmqReceiver when an object adapter tries to dispatch an incoming request to the scope. The problem doesn't manifest itself in the same way on each run. Sometimes, the read from the socket returns zero bytes, which triggers the "socket was closed" error.

Other times, the number of bytes read doesn't divide by the word size, which triggers an "impossible message size" error.

Yet other times, things fall over in the ObjectAdapter::dispatch() method when something throws during unmarshaling (presumably because the data that was read is crap), which is reported as "error unmarshaling request header".

It all points to some problem with zmq not respecting framing boundaries or indicating that a socket has a message ready to read, but then delivering incomplete or crap data.

I don't understand at this point why this happens with core 16 and not on classic. The library versions for zmq match what I have on my desktop.

I'll try this again tomorrow on classic and see whether I can figure out what's going wrong by comparing trace.

Revision history for this message
Michi Henning (michihenning) wrote :

It looks like click-scope is the only scope that is failing. Queries to the music scope work just fine, as do queries to remote scopes (via smartscopesproxy). All these scopes run the exact same code in the runtime.

I have no idea what exactly is going wrong, but I strongly suspect that it is in click scope or one of the dependencies it calls into.

Note that the messages about the closed socket may well be consequential if a thread that wasn't created by the scopes run time throws. (If click-scope uses QtDBus, possibly an exception could escape from one of its threads? Or maybe net-cpp underneath or some such?)

It would be really useful to have some trace from click scope to see what it is up to when it receives a query.

Also, it looks like I'm not getting any core dumps in the unity8-session. Is there a way to enable those? It might give us a chance at getting a stack trace from the click-scope.

Revision history for this message
dobey (dobey) wrote :

You can run the scope with U1_DEBUG=1 set, and it will spew more debug info.

I doubt it's threads. We're not using QtDBus here really. The ubuntu-download-manager code does use it, but we're no longer really using it in the scope, only the client object gets created on startup, and then never used. There is an MP to remove the dependency on ubuntu-download-manager and libclick in https://bileto.ubuntu.com/#/ticket/2306, and I asked Michael to build a snap with it and try, but it was still failing for him with that.

I guess you'd need to run the scope manually under gdb to be able to get a trace, but I can't tell you how to do that on an all snaps system, unfortunately.

Revision history for this message
dobey (dobey) wrote :

Actually, since the app drawer has landed now, can we just remove unity-scope-click from the snap?

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Agree, marking this won't fix, we can re-open if we see other symptoms

Changed in canonical-devices-system-image:
status: Confirmed → Won't Fix
dobey (dobey)
Changed in unity-scope-click (Ubuntu):
status: New → Won't Fix
Changed in unity-scopes-api (Ubuntu):
status: In Progress → Won't Fix
Michael Terry (mterry)
Changed in unity8-session-snap:
assignee: Michael Terry (mterry) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.