Daisy

Error rate calculations inappropriately include Autopilot/LRT errors

Bug #1324455 reported by Matthew Paul Thomas on 2014-05-29

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Daisy	Fix Released	High	Brian Murray
lrt	Fix Released	Undecided	Unassigned

Bug Description

When Autopilot or LRT triggers a crash, the error is submitted to errors.ubuntu.com just as it would be if the error was experienced by a machine in normal use.

The error is then, as usual, included in the error rate calculations and the occurrence counts. But both of these are problems.

The calculated error rate is naturally understood (or would be, if the axis was labelled) as the number of errors per day experienced by a machine in normal use. But the point of automated tests is that they encounter errors much more quickly than normal use will. This is an example of Campbell's Law: reducing the error rate is a good thing, but we could reduce the measured error rate by not running the automated tests any more, which would be a bad thing.

Including fuzzer errors in the occurrence counts is also a bad thing, because it may lead to poor prioritization of fixes. For example, imagine there was a crash whenever you switched apps less than half a second after revealing the Launcher. Humans would seldom encounter errors like this, but a fuzzer often would, and would submit it many times. So it would rank highly in the occurrences table, misleading developers into thinking that it was more important than errors humans are encountering more often.

If the automated error reports are a drop in the bucket, neither of these things matters, so this bug can be marked Won't Fix.

Otherwise, either Autopilot and LRT should override Apport's usual behavior, and report bugs directly to Launchpad rather than submitting errors; or Errors should ignore Autopilot/LRT reports when calculating error rates and counting occurrences. The latter would be more complicated, but would have the advantage that the automated tools might sometimes provide the only evidence that a bug remains unfixed in a new package version.

Related branches

lp:daisy

lp:~daisy-pluckers/oops-repository/trunk

Matthew Paul Thomas (mpt) wrote on 2014-06-02:

Brian points out that the same problem affects Autopilot. It's reporting errors to errors.ubuntu.com that inflate the measured error rate.

Matthew Paul Thomas (mpt) on 2014-06-02

description:	updated
summary:	- Error rate calculations inappropriately include fuzzer errors
	+ Error rate calculations inappropriately include Autopilot/LRT errors

Evan (ev) wrote on 2014-08-18:

Brian and I agreed that automated testing systems should change their CRASH_DB_IDENTIFIER to start with "testing" such that we can filter them out server side from incrementing counters.

Evan (ev) wrote on 2014-08-18:

For the sake of consistency, we'll say that it needs to start with deadbeef
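A minimal sketch of the server-side check described above, assuming the filter is a plain prefix test on the submitted identifier. The function name `counts_toward_error_rate` is hypothetical and is not taken from Daisy's code:

```shell
# Hypothetical prefix check; not Daisy's actual implementation.
# Succeeds (exit 0) when a report should increment the counters,
# fails (exit 1) when the identifier marks an automated-testing system.
counts_toward_error_rate() {
    case "$1" in
        deadbeef*) return 1 ;;
        *)         return 0 ;;
    esac
}

if counts_toward_error_rate "deadbeef0123"; then
    echo "counted"
else
    echo "ignored (automated test)"
fi
```

Reports from such systems would still be stored and retraced; only the counters would skip them.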

Thomi Richards (thomir-deactivatedaccount) wrote on 2014-08-18:

Any advice on how we do that?

Brian Murray (brian-murray) wrote on 2014-08-18:

It can be set either in /etc/init/whoopsie.conf or as an environment variable when whoopsie is started.

Thomi Richards (thomir-deactivatedaccount) on 2014-08-21

no longer affects:

autopilot

Chris Gagnon (chris.gagnon) wrote on 2014-08-22:

I'll update my crash ids to start with deadbeef

Brian Murray (brian-murray) on 2014-08-26

affects:	errors → daisy
Changed in daisy:
status:	New → In Progress
assignee:	nobody → Brian Murray (brian-murray)
importance:	Undecided → High

Brian Murray (brian-murray) wrote on 2014-08-26: Re: [Bug 1324455] Re: Error rate calculations inappropriately include Autopilot/LRT errors

On Thu, Aug 21, 2014 at 11:49:03PM -0000, Thomi Richards wrote:
> ** No longer affects: autopilot

Does that mean you've set up autopilot to use the suggested
CRASH_DB_IDENTIFIER or does it mean something else?

--
Brian Murray

Chris Gagnon (chris.gagnon) wrote on 2014-08-27:

I don't have a system identifier when I use deadbeef at the start of the string. Is there something that can be done to make it show up on errors.u.c?

Starts with deadbeef:
https://errors.ubuntu.com/oops/f1507d14-2e22-11e4-a636-fa163e339c81

https://errors.ubuntu.com/user/deadbeef6ef5699f2b73679bb94cc77874f81c4822482114dd8bf912741713eb15bf61533b5d8176f0d870f758cc6f241445ffd3359debe1616c832a8ab3b7626d9ed369

Doesn't start with deadbeef:

https://errors.ubuntu.com/oops/412a7372-2e18-11e4-abd2-fa163e22e467

https://errors.ubuntu.com/user/6ef5699f2b73679bb94cc77874f81c4822482114dd8bf912741713eb15bf61533b5d8176f0d870f758cc6f241445ffd3359debe1616c832a8ab3b7626d9ed369

Chris Gagnon (chris.gagnon) wrote on 2014-08-29:

I've moved my crash identifiers back to start without deadbeef until comment 8 can be fixed.

This is the code I use to set the id

exec_with_adb "sed -i '/CRASH_DB_IDENTIFIER/d' /etc/init/whoopsie.conf"
exec_with_adb "sed -i '/env CRASH_DB_URL=https:\/\/daisy.ubuntu.com/a env CRASH_DB_IDENTIFIER=$CRASH_ID' /etc/init/whoopsie.conf"
exec_with_adb "reboot"

Brian Murray (brian-murray) wrote on 2014-08-29:

#10

On Fri, Aug 29, 2014 at 01:16:48PM -0000, Chris Gagnon wrote:
> I've moved my crash identifiers back to start without deadbeef until
> comment 8 can be fixed.
>
> This is the code I use to set the id
>
> exec_with_adb "sed -i '/CRASH_DB_IDENTIFIER/d' /etc/init/whoopsie.conf"
> exec_with_adb "sed -i '/env CRASH_DB_URL=https:\/\/daisy.ubuntu.com/a env CRASH_DB_IDENTIFIER=$CRASH_ID' /etc/init/whoopsie.conf"
> exec_with_adb "reboot"

Um, where does $CRASH_ID get set?

--
Brian Murray

Brian Murray (brian-murray) wrote on 2014-08-29:

#11

The changes have been deployed on the daisy frontends and the retracers now.

Chris Gagnon (chris.gagnon) wrote on 2014-09-04:

#12

the id gets set earlier in the script:

if [ "$test_to_run" = "lrt.test_random_gestures" ]; then
    CRASH_ID=6ef5699f2b73679bb94cc77874f81c4822482114dd8bf912741713eb15bf61533b5d8176f0d870f758cc6f241445ffd3359debe1616c832a8ab3b7626d9ed369
fi

if [ "$test_to_run" = "lrt.test_switch" ]; then
    CRASH_ID=2af4898017372a2ac165df1ed82002d7cbeeb0aad6adaa39a7a65fc29cb6c724878e3009de7787c88de01911bf874932edb8d11d7debc7506331e8fb5c99a9e3
fi
echo "$CRASH_ID"

if [ "$test_to_run" = "lrt.test_ap_core_apps" ]; then
    CRASH_ID=9ef25926ad56ff7b1050f384439a7900acdd7dcce06e81cb051c3417320a19baa4f1769d9fe20e9b01ac5b970d7b1a339526d6edf572d9e2eb7fa3c3232d13c6
fi

I'll try again with the string starting with deadbeef

Chris Gagnon (chris.gagnon) wrote on 2014-09-10:

#13

Changing the string to start with deadbeef causes the system identifier to be dropped from the report like in comment #8 again.

Brian Murray (brian-murray) wrote on 2014-09-15:

#14

Ah, it's because you've prepended 'deadbeef' to the crash id, making it too long. You need to replace the first 8 characters with 'deadbeef' instead.
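The difference between prepending and replacing can be sketched as follows. The helper name `make_test_id` is hypothetical; the only assumption is that the identifier must keep its original length:

```shell
# Hypothetical helper: overwrite the first 8 characters of an
# identifier with "deadbeef" so that its length stays the same.
make_test_id() {
    id=$1
    if [ "${#id}" -lt 8 ]; then
        echo "identifier shorter than 8 characters" >&2
        return 1
    fi
    # ${id#????????} is the identifier with its first 8 characters removed.
    printf 'deadbeef%s\n' "${id#????????}"
}

CRASH_ID=6ef5699f2b73679bb94cc77874f81c4822482114dd8bf912741713eb15bf61533b5d8176f0d870f758cc6f241445ffd3359debe1616c832a8ab3b7626d9ed369
TEST_ID=$(make_test_id "$CRASH_ID")
echo "original length: ${#CRASH_ID}, test id length: ${#TEST_ID}"
```

Prepending (`deadbeef$CRASH_ID`) would instead produce a string 8 characters longer than the original, which is the case that caused the system identifier to be dropped.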

Chris Gagnon (chris.gagnon) wrote on 2014-09-23:

#15

This has been working now that the id is not too long.

Changed in lrt:
status:	New → Fix Released

Brian Murray (brian-murray) on 2014-09-26

Changed in daisy:
status:	In Progress → Fix Released

Steve Langasek (vorlon) wrote on 2014-10-14:

#16

FWIW I disagree with the change that was made for this bug. The net effect is that bugs that were being discovered automatically, and might happen quite frequently under test, are now hidden from view of the developers - and yet internally, developers are still being asked to fix the bugs found by automated tests.

Every crash that's found in autotesting is a real crash. Particularly while the real userbase of the phone is small, it's important to surface all of these crashes even if they've only ever been seen in the lab. If the crashes seen in the lab are skewing the statistics, there's one sure-fire way to correct this: drive the number of crashes in the lab down to zero!

Also, while the automated tests may skew the crash counts overall, one place where they shouldn't be skewing is on the per-image / per-rootfs counts - because each combination is usually only tested once, or a small number of times. So including automated tests in these counts will provide a much better indicator of image quality than omitting them.

Matthew Paul Thomas (mpt) wrote on 2014-10-15:

#17

Steve, no-one disputes that auto-testing crashes are real crashes. But the purpose of any defect tracker is to help developers make best use of their time, and driving "the number of crashes in the lab down to zero" is not necessarily the best use of their time. Imagine that crash A is triggered by humans once a day on average, but by LRT once an hour on average, while crash B is hourly for humans and daily for LRT. If an engineer has time to fix one of those for a particular release, and errors.ubuntu.com leads them to fix A instead of B, it has failed.
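The arithmetic behind that example can be sketched as below. The machine counts are assumptions chosen purely for illustration (one machine in normal use, two LRT devices); they are not figures from errors.ubuntu.com:

```shell
# Reports per day for one crash, given per-machine daily rates and
# the number of machines of each kind (all figures are illustrative).
daily_reports() {
    human_rate=$1; lrt_rate=$2; humans=$3; lrt_devices=$4
    echo $(( human_rate * humans + lrt_rate * lrt_devices ))
}

humans=1        # assumed number of machines in normal use
lrt_devices=2   # assumed number of LRT devices

# Crash A: daily for humans (1/day), hourly for LRT (24/day).
# Crash B: hourly for humans (24/day), daily for LRT (1/day).
echo "A, all reports:  $(daily_reports 1 24 "$humans" "$lrt_devices")"
echo "B, all reports:  $(daily_reports 24 1 "$humans" "$lrt_devices")"
echo "A, humans only:  $(daily_reports 1 24 "$humans" 0)"
echo "B, humans only:  $(daily_reports 24 1 "$humans" 0)"
```

With these assumed counts, A outranks B (49 to 26) when every report is counted, while B outranks A (24 to 1) once the LRT reports are excluded.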
