Error rate calculations inappropriately include Autopilot/LRT errors

Bug #1324455 reported by Matthew Paul Thomas
Affects   Status         Importance   Assigned to    Milestone
Daisy     Fix Released   High         Brian Murray
lrt       Fix Released   Undecided    Unassigned

Bug Description

When Autopilot or LRT triggers a crash, the error is submitted to errors.ubuntu.com just as it would be if the error was experienced by a machine in normal use.

The error is then, as usual, included in the error rate calculations and the occurrence counts. But both of these are problems.

The calculated error rate is naturally understood (or would be, if the axis was labelled) as the number of errors per day experienced by a machine in normal use. But the point of automated tests is that they encounter errors much more quickly than normal use will. This is an example of Campbell's Law: reducing the error rate is a good thing, but we could reduce the measured error rate by not running the automated tests any more, which would be a bad thing.

Including fuzzer errors in the occurrence counts is also a bad thing, because it may lead to poor prioritization of fixes. For example, imagine there was a crash whenever you switched apps less than half a second after revealing the Launcher. Humans would seldom encounter errors like this, but a fuzzer often would, and would submit it many times. So it would rank highly in the occurrences table, misleading developers into thinking that it was more important than errors humans are encountering more often.

If the automated error reports are a drop in the bucket, neither of these things matters, so this bug can be marked Won't Fix.

Otherwise, either Autopilot and LRT should override Apport's usual behavior, and report bugs directly to Launchpad rather than submitting errors; or Errors should ignore Autopilot/LRT reports when calculating error rates and counting occurrences. The latter would be more complicated, but would have the advantage that the automated tools might sometimes provide the only evidence that a bug remains unfixed in a new package version.

Matthew Paul Thomas (mpt) wrote :

Brian points out that the same problem affects Autopilot. It's reporting errors to errors.ubuntu.com that inflate the measured error rate.

description: updated
summary: - Error rate calculations inappropriately include fuzzer errors
+ Error rate calculations inappropriately include Autopilot/LRT errors
Evan (ev) wrote :

Brian and I agreed that automated testing systems should change their CRASH_DB_IDENTIFIER to start with "testing", so that we can filter their reports out server-side and keep them from incrementing counters.

Evan (ev) wrote :

For the sake of consistency, we'll say that it needs to start with "deadbeef" instead of "testing".
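
To illustrate the prefix rule agreed above, here is a minimal shell sketch. It is not the daisy code, only an assumption of how a prefix check could separate test systems from real ones; the function names `is_test_identifier` and `count_real_reports` are hypothetical.

```shell
# Hypothetical helper: succeeds when an identifier marks an automated-test system.
is_test_identifier() {
    case "$1" in
        deadbeef*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: reads one system identifier per line on stdin and
# counts only the reports that should increment the public counters.
count_real_reports() {
    count=0
    while IFS= read -r id; do
        is_test_identifier "$id" || count=$((count + 1))
    done
    echo "$count"
}
```

For example, `printf '%s\n' deadbeef01 6ef5699f abcd | count_real_reports` prints 2, because only the first identifier carries the test prefix.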

Thomi Richards (thomir-deactivatedaccount) wrote :

Any advice on how we do that?

Brian Murray (brian-murray) wrote :

It can be set either in /etc/init/whoopsie.conf or as an environment variable when whoopsie is started.
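
A minimal sketch of the first option, assuming the job file contains a line beginning with `env CRASH_DB_URL=` and that adding an `env CRASH_DB_IDENTIFIER=` stanza after it is sufficient. The function name `set_crash_db_identifier` is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: set CRASH_DB_IDENTIFIER in an Upstart job file.
# Usage: set_crash_db_identifier /path/to/whoopsie.conf <identifier>
set_crash_db_identifier() {
    conf=$1
    id=$2
    tmp=$(mktemp) || return 1
    # Drop any existing identifier line, then add the new one right after
    # the CRASH_DB_URL stanza (assumed to start the line).
    if awk -v id="$id" '
        /CRASH_DB_IDENTIFIER/ { next }
        { print }
        /^env CRASH_DB_URL=/ { print "env CRASH_DB_IDENTIFIER=" id }
    ' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Running it twice leaves a single identifier line, so it is safe to call from a test setup script. Whoopsie has to be restarted (or the device rebooted) before the new value takes effect.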

no longer affects: autopilot
Chris Gagnon (chris.gagnon) wrote :

I'll update my crash ids to start with deadbeef

affects: errors → daisy
Changed in daisy:
status: New → In Progress
assignee: nobody → Brian Murray (brian-murray)
importance: Undecided → High
Brian Murray (brian-murray) wrote : Re: [Bug 1324455] Re: Error rate calculations inappropriately include Autopilot/LRT errors

On Thu, Aug 21, 2014 at 11:49:03PM -0000, Thomi Richards wrote:
> ** No longer affects: autopilot

Does that mean you've set up autopilot to use the suggested
CRASH_DB_IDENTIFIER, or does it mean something else?

--
Brian Murray

Chris Gagnon (chris.gagnon) wrote :
Chris Gagnon (chris.gagnon) wrote :

I've moved my crash identifiers back to start without deadbeef until comment 8 can be fixed.

This is the code I use to set the id:

exec_with_adb "sed -i '/CRASH_DB_IDENTIFIER/d' /etc/init/whoopsie.conf"
exec_with_adb "sed -i '/env CRASH_DB_URL=https:\/\/daisy.ubuntu.com/a env CRASH_DB_IDENTIFIER=$CRASH_ID' /etc/init/whoopsie.conf"
exec_with_adb "reboot"

Brian Murray (brian-murray) wrote :

On Fri, Aug 29, 2014 at 01:16:48PM -0000, Chris Gagnon wrote:
> I've moved my crash identifiers back to start without deadbeef until
> comment 8 can be fixed.
>
> This is the code I use to set the id
>
> exec_with_adb "sed -i '/CRASH_DB_IDENTIFIER/d' /etc/init/whoopsie.conf"
> exec_with_adb "sed -i '/env CRASH_DB_URL=https:\/\/daisy.ubuntu.com/a env CRASH_DB_IDENTIFIER=$CRASH_ID' /etc/init/whoopsie.conf"
> exec_with_adb "reboot"

Um, where does $CRASH_ID get set?

--
Brian Murray

Brian Murray (brian-murray) wrote :

The changes have been deployed on the daisy frontends and the retracers now.

Chris Gagnon (chris.gagnon) wrote :

The id gets set earlier in the script:

if [ "$test_to_run" = "lrt.test_random_gestures" ]; then
CRASH_ID=6ef5699f2b73679bb94cc77874f81c4822482114dd8bf912741713eb15bf61533b5d8176f0d870f758cc6f241445ffd3359debe1616c832a8ab3b7626d9ed369
fi

if [ "$test_to_run" = "lrt.test_switch" ]; then
CRASH_ID=2af4898017372a2ac165df1ed82002d7cbeeb0aad6adaa39a7a65fc29cb6c724878e3009de7787c88de01911bf874932edb8d11d7debc7506331e8fb5c99a9e3
fi
echo "$CRASH_ID"

if [ "$test_to_run" = "lrt.test_ap_core_apps" ]; then
CRASH_ID=9ef25926ad56ff7b1050f384439a7900acdd7dcce06e81cb051c3417320a19baa4f1769d9fe20e9b01ac5b970d7b1a339526d6edf572d9e2eb7fa3c3232d13c6
fi

I'll try again with the string starting with deadbeef

Chris Gagnon (chris.gagnon) wrote :

Changing the string to start with deadbeef causes the system identifier to be dropped from the report like in comment #8 again.

Brian Murray (brian-murray) wrote :

Ah, it's because you've prepended 'deadbeef' to the crash id, which makes it too long. You need to replace the first 8 characters with 'deadbeef' instead, so the length stays the same.
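
The difference between prepending and replacing can be sketched like this. The helper name `mark_test_id` is hypothetical; the only thing taken from the thread is that the first 8 characters are overwritten so the identifier keeps its original length.

```shell
# Hypothetical helper: overwrite the first 8 characters of an identifier
# with "deadbeef" so the overall length is unchanged.
mark_test_id() {
    id=$1
    if [ "${#id}" -le 8 ]; then
        echo "identifier too short to mark: $id" >&2
        return 1
    fi
    # cut -c9- keeps everything from the 9th character onward.
    printf 'deadbeef%s\n' "$(printf '%s' "$id" | cut -c9-)"
}
```

With a 16-character input such as `0123456789abcdef`, the result is `deadbeef89abcdef`, still 16 characters. Prepending would have produced 24.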

Chris Gagnon (chris.gagnon) wrote :

This has been working now that the id is not too long.

Changed in lrt:
status: New → Fix Released
Changed in daisy:
status: In Progress → Fix Released
Steve Langasek (vorlon) wrote :

FWIW I disagree with the change that was made for this bug. The net effect is that bugs that were being discovered automatically, and might happen quite frequently under test, are now hidden from view of the developers - and yet internally, developers are still being asked to fix the bugs found by automated tests.

Every crash that's found in autotesting is a real crash. Particularly while the real userbase of the phone is small, it's important to surface all of these crashes even if they've only ever been seen in the lab. If the crashes seen in the lab are skewing the statistics, there's one sure-fire way to correct this: drive the number of crashes in the lab down to zero!

Also, while the automated tests may skew the crash counts overall, one place where they shouldn't be skewing is on the per-image / per-rootfs counts - because each combination is usually only tested once, or a small number of times. So including automated tests in these counts will provide a much better indicator of image quality than omitting them.

Matthew Paul Thomas (mpt) wrote :

Steve, no-one disputes that auto-testing crashes are real crashes. But the purpose of any defect tracker is to help developers make best use of their time, and driving "the number of crashes in the lab down to zero" is not necessarily the best use of their time. Imagine that crash A is triggered by humans once a day on average, but by LRT once an hour on average, while crash B is hourly for humans and daily for LRT. If an engineer has time to fix one of those for a particular release, and errors.ubuntu.com leads them to fix A instead of B, it has failed.
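
The arithmetic behind that example can be sketched as follows. The per-day rates come from the example above (once a day versus once an hour); the fleet sizes of 10 human-used machines and 100 LRT devices are made-up assumptions, and `daily_reports` is a hypothetical helper.

```shell
# Hypothetical helper: total reports per day for one crash.
# Arguments: human_rate lrt_rate human_machines lrt_devices
# (rates are reports per machine per day).
daily_reports() {
    echo $(( $1 * $3 + $2 * $4 ))
}

# Crash A: daily for humans, hourly for LRT. Crash B: the reverse.
# Assumed fleet: 10 human-used machines, 100 LRT devices.
echo "A, all reports:  $(daily_reports 1 24 10 100)"   # 2410
echo "B, all reports:  $(daily_reports 24 1 10 100)"   # 340
echo "A, humans only:  $(daily_reports 1 24 10 0)"     # 10
echo "B, humans only:  $(daily_reports 24 1 10 0)"     # 240
```

Under these assumptions A outranks B about seven to one when every report is counted, while B outranks A twenty-four to one once LRT reports are excluded. With different fleet sizes the gap changes, but the ordering only flips because the lab reports are in the count.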
