Comment 8 for bug 1860013

dann frazier (dannf) wrote :

I attempted to bisect this using the following process:
  - Run the kernel-build-reboot-loop test on 3 machines in parallel.
    I used 2 CRB1S systems (anuchin, bestovius) and 1 R120-T33 (seidel).
  - If any machine crashes w/ the parity error message, consider the kernel "failed".
  - If all machines survive overnight, consider it "OK".
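
For anyone wanting to script the pass/fail rule above, here is a rough sketch in Python. It is not what I actually ran. The function names and the dict of per-machine results are hypothetical; the only outside fact it relies on is that `git bisect run` treats exit code 0 as good/old and 1 as bad/new. Since this was a reverse bisect (hunting the commit that fixed the problem), a crash maps to "old" and surviving maps to "new".

```python
from typing import Mapping

# Exit codes as interpreted by `git bisect run`.
BISECT_OLD = 0  # treated as "good" (old behaviour)
BISECT_NEW = 1  # treated as "bad" (new behaviour)


def any_crashed(parity_error_seen: Mapping[str, bool]) -> bool:
    """True if at least one machine hit the parity error."""
    return any(parity_error_seen.values())


def bisect_exit_code(parity_error_seen: Mapping[str, bool],
                     reverse: bool = True) -> int:
    """Map one overnight run across all machines to a bisect exit code.

    reverse=True  : looking for the commit that fixed the crash, so a
                    crash is the old behaviour and survival is the new one.
    reverse=False : looking for the commit that introduced the crash.
    """
    crashed = any_crashed(parity_error_seen)
    if reverse:
        return BISECT_OLD if crashed else BISECT_NEW
    return BISECT_NEW if crashed else BISECT_OLD


if __name__ == "__main__":
    # Hypothetical result of one run: only anuchin crashed.
    run = {"anuchin": True, "bestovius": False, "seidel": False}
    print(bisect_exit_code(run))
```

Note that a single crash on any machine decides the whole step, so one flaky machine can steer the entire bisect.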

Unfortunately, the commit it landed on looks bogus:

# first bad commit: [852643165aea0999bb862b36511c5b9f6b11449f] fs//binfmt_elf.c: move variables initialization closer to their usage
(Reverse bisect - this would in theory be the commit that *fixed* it)

Just in case, I tried reverting that commit from 5.5-rc6. As noted in comment #2, 5.5-rc6 seems immune to this problem. Reverting the commit didn't change that - 5.5-rc6 still survived overnight.

Note: Of the 3 systems, anuchin was usually the one that failed during the bisect. It could be that this is a generic hw issue, and anuchin is just more severely impacted than the others. It could also be that this symptom can be caused by both a sw and a hw issue, and anuchin is impacted by the hw part, making it a bad choice for a bisect. Either way, bisection seems like a poor strategy for identifying the issue.