Comment 3 for bug 1467932

Colin Ian King (colin-king) wrote:

There are a few issues that I spotted in the code:

1. The code may mistake a kernel thread for a normal user-space process.
2. If it thinks a task is a kernel thread, it puts the task name in [ ] brackets.
3. The hashing is performed on the modified task name.

So it may be that the crude kernel thread detection code failed, the task name changed as a result, and the process got a different hash. We then get a "new" hash timer entry, hence the large spike for a "new item": the history kept for the incorrectly hashed timer stat is wrong.

I think the fixes needed are:

1. Make the kernel thread detection smarter by checking, in order:
    a) if cmdline is zero length, it is a kernel thread (should always work)
    b) if the pgid of the pid is zero, it is a kernel thread (useful fallback)
    c) if neither check above is conclusive, compare against a database of known kernel threads (hacky, but consistent)

2. Keep two copies of the task name:
    a) The original
    b) The modified task name, if it is a kernel thread

3. Hash on the pid, the original task name, the callback and the timer function
    - a process dying and a new one matching its hash is possible, but I consider that very unlikely.
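A minimal C sketch of the detection order in fix 1 might look like the following. It assumes Linux procfs; the helper names (cmdline_is_empty, name_is_known_kthread, is_kernel_thread) and the short list of kernel thread name prefixes are hypothetical illustrations, not the tool's actual code or a complete database.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for getpgid() on glibc */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, deliberately incomplete list of kernel thread name prefixes (check c). */
static const char *const known_kthreads[] = {
    "kthreadd", "ksoftirqd/", "kworker/", "migration/",
    "rcu_", "watchdog/", "kswapd", NULL
};

/* Check a): returns 1 if /proc/<pid>/cmdline is empty, 0 if not,
 * -1 if the file could not be opened (inconclusive). */
static int cmdline_is_empty(pid_t pid)
{
    char path[64];
    FILE *fp;
    int ch;

    snprintf(path, sizeof(path), "/proc/%d/cmdline", (int)pid);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    ch = fgetc(fp);               /* EOF on the first read means zero length */
    fclose(fp);
    return ch == EOF ? 1 : 0;
}

/* Check c): prefix match of the task name against the known list. */
static bool name_is_known_kthread(const char *comm)
{
    assert(comm != NULL);
    for (size_t i = 0; known_kthreads[i] != NULL; i++) {
        if (strncmp(comm, known_kthreads[i], strlen(known_kthreads[i])) == 0)
            return true;
    }
    return false;
}

/* Apply a), b), c) in order, falling through only when a check is inconclusive. */
static bool is_kernel_thread(pid_t pid, const char *comm)
{
    int empty = cmdline_is_empty(pid);
    pid_t pgid;

    if (empty >= 0)
        return empty == 1;        /* a) cmdline readable: definitive */

    pgid = getpgid(pid);
    if (pgid >= 0)
        return pgid == 0;         /* b) pgid of zero means kernel thread */

    return name_is_known_kthread(comm);  /* c) last resort */
}
```

Note that proc(5) documents /proc/<pid>/cmdline as empty for zombie processes as well, so check a) can also fire for a user-space process that has exited but not yet been reaped.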
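For fix 2, one way to keep both copies of the task name is a small record like the hypothetical task_name_t below, sketched in C. It assumes the 16-byte Linux task name limit (15 characters plus NUL) and takes the kernel thread verdict as an input rather than computing it, so the display copy can change without touching the original.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Linux task names (comm) hold at most 15 characters plus NUL. */
#define COMM_LEN 16

/* Hypothetical record holding both copies of the task name. */
typedef struct {
    char comm[COMM_LEN];          /* a) original name, safe to hash on */
    char display[COMM_LEN + 2];   /* b) shown to the user: "[comm]" for kernel threads */
    bool kernel_thread;
} task_name_t;

static void task_name_init(task_name_t *t, const char *comm, bool kernel_thread)
{
    assert(t != NULL && comm != NULL);

    /* snprintf truncates safely and always NUL-terminates. */
    snprintf(t->comm, sizeof(t->comm), "%s", comm);
    t->kernel_thread = kernel_thread;

    if (kernel_thread)
        snprintf(t->display, sizeof(t->display), "[%s]", t->comm);
    else
        snprintf(t->display, sizeof(t->display), "%s", t->comm);
}
```

Only display is ever bracketed, so a wrong kernel thread verdict affects what is printed but not the original name used as a hash input.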