Stockwell

Reduce the impact of intermittent tests

Teammates

```
Geoff Brown (:gbrown)
```
```
Joel Maher (:jmaher)
```
```
William Lachance (:wlach)
```

What is the impact?

```
Sheriff time to star failures
```
```
Developer distractions
```
```
More tooling needed
```
```
Increased load on limited resources
```
```
How do intermittents impact your job?
```

August 29th 2014: OrangeFactor 2.49
August 29th 2016: OrangeFactor 26.98

What happened?

What has changed?

```
Hired sheriffs
```
```
More platforms
```
```
More configurations (e10s, asan)
```
```
More tests and test suites
```
```
Many changes in Firefox
```

What do we run?

```
19 build/config types
```
```
1.05M possible tests/push
```
```
490K tests run/push on average
```
```
11 failures / push (OF=11.0)
```
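The OF figure on this slide is the average number of failures per push. A minimal sketch of that arithmetic follows; the function name and the sample counts are hypothetical, chosen only to reproduce the 11.0 shown above, and are not real data.

```python
def orange_factor(failure_count: int, push_count: int) -> float:
    """Average number of intermittent failures per push."""
    if push_count <= 0:
        raise ValueError("push_count must be positive")
    return failure_count / push_count


# Illustrative numbers only (assumed, not measured): 1,100 failures over 100 pushes.
print(orange_factor(1100, 100))  # 11.0
```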

How many intermittents?

```
Between 700 and 950 bugs / week
```

For 6 months (April-September):

```
7332 bugs occurred / 249279 failures
```
```
3310 bugs occurred <10 times
```
```
6018 (82%) low frequency = 14% of failures
```
```
560 (7%) high frequency = 68% of failures
```

What is intermittent?

```
High frequency >=50 times/week
```
```
Medium frequency 10<x<50 times/week
```
```
Low frequency <=10 times/week
```
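The three buckets above can be written as a small classifier. This is only a sketch of the thresholds on this slide; the function name and the bucket labels are assumptions made for illustration.

```python
def frequency_bucket(failures_per_week: int) -> str:
    """Map a weekly failure count to the slide's frequency buckets."""
    if failures_per_week >= 50:
        return "high"      # >= 50 times/week
    if failures_per_week > 10:
        return "medium"    # more than 10, fewer than 50
    return "low"           # <= 10 times/week


for count in (10, 11, 49, 50):
    print(count, frequency_bucket(count))
```

The boundary values follow the slide: exactly 10 failures a week is still low frequency, and exactly 50 is high.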

What is your definition of intermittent?

What fails?

```
test timeouts
```
```
test failures
```
```
harness/task timeouts
```
```
Firefox crash/leak/assertion/hang
```
```
harness/infrastructure
```

Bad tests?

```
Majority of fixes are test fixes
```

178 mochitests do not pass when run with --repeat

```
many uses of setTimeout()
```
```
poor use of APIs
```
```
old tests written for old Firefox
```

Do we care?

```
Talked to dozens of engineers
```
```
Everyone wants to help 
```

Not all intermittents have a clear owner

```
Engineers have deliverables
```
```
Engineers don't want to waste time
```

What prevents you from fixing intermittent tests?

Experiments in Q4

```
quarantine jobs
```
```
test-lint jobs 
```
```
manual triage
```
```
OrangeFactor enhancements
```

Quarantine jobs

```
Always orange, long run times
```
```
Difficult to hack manifests
```
```
Leaks/Crashes/etc. still in other jobs
```

These would be ignored; their value is unclear

Test Lint

Run extra tests on new/edited test cases

```
Did this for mochitests: 178 failures
```
```
Improves trust in tests
```
```
Will deploy in Q1 for mochitest
```
```
What causes you to not trust tests?
```
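The idea behind running extra tests on new or edited test cases can be sketched as "run the test several times and count how often it fails". The code below is a generic illustration, not the mochitest harness or the actual test-lint job; `count_failures`, `is_stable`, and the deliberately flaky sample test are all hypothetical.

```python
from typing import Callable


def count_failures(test: Callable[[], None], repeat: int = 20) -> int:
    """Run `test` `repeat` times and count runs that raise AssertionError."""
    failures = 0
    for _ in range(repeat):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def is_stable(test: Callable[[], None], repeat: int = 20) -> bool:
    """A test is treated as stable only if every repeated run passes."""
    return count_failures(test, repeat) == 0


def make_flaky_test(fail_every: int) -> Callable[[], None]:
    """Build a sample test that fails on every `fail_every`-th call."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] % fail_every != 0, "intermittent failure"

    return test


print(count_failures(make_flaky_test(3), repeat=9))  # 3
print(is_stable(lambda: None))                       # True
```

A test that passes once but fails under repetition is exactly the kind this job is meant to catch before it lands.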

Manual Triage

```
In 2 weeks dropped OF from 23 to 11
```
```
Many patterns between bugs
```
```
Added info to make bugs actionable
```
```
Will continue to do this in Q1
```

Orange Factor++

Bugzilla comments improved

```
relative frequency
```
```
ranking and priority
```

Updated dev.tree-alerts to highlight the number of high/mid/low frequency bugs

The Master Plan

Accept the fact that intermittents are here to stay

Develop a positive relationship with intermittent failures

Intermittent test failures are not seen on Treeherder

On January 4, 2018, what would you expect to see?

Q1 Plan

```
P1 intermittents >=30 times/week
```

Make triaging easier

```
doing it full time
```
```
finding test owners
```
```
component filters on OrangeFactor
```

Increase confidence in tests/bugs

```
test-lint jobs
```
```
more data in bugs
```

```
More Experiments
```

More Experiments?
Don't you have enough data?


What experiments should we be doing?

Q1+ - More Experiments

Dashboards - more data for you

```
Triage bugs by component in OF
```
```
Disabled bugs in your component
```
```
New bugs in your component
```

Q1+ - More Experiments

Triage++

```
Identify common actionable data
```
```
List of data to include in new bugs
```
```
Create tools for getting common data
```
```
Identify spikes in occurrences faster
```
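One simple way to identify spikes in occurrences is to compare the latest daily count for a bug against its recent average. The sketch below is an assumption about how that could be done, not a description of any existing OrangeFactor feature; the function name, window size, and threshold factor are hypothetical.

```python
from statistics import mean
from typing import Sequence


def is_spike(daily_counts: Sequence[int], window: int = 7, factor: float = 3.0) -> bool:
    """Flag the latest day if it exceeds `factor` times the trailing average.

    `daily_counts` is ordered oldest to newest; the last entry is the day
    being checked and the `window` days before it form the baseline.
    """
    if len(daily_counts) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(daily_counts[-(window + 1):-1])
    latest = daily_counts[-1]
    # Use a floor of 1.0 so bugs with a near-zero baseline need a real jump.
    return latest > factor * max(baseline, 1.0)


print(is_spike([2, 3, 2, 1, 2, 3, 2, 12]))  # True
print(is_spike([2, 3, 2, 1, 2, 3, 2, 4]))   # False
```

Checking daily instead of weekly is the design choice here: a bug that jumps on Tuesday is flagged the same day, without waiting for the weekly totals.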

Q2+ - More Experiments

Reduce Noise / Better Tests

```
Improve auto classification
```

Consider ignoring low frequency failures

Look at rr chaos mode for the lint jobs

```
Best practices for writing, reviewing
```

Our Expectations

```
Assume good intent and common goals
```
```
Actionable bug == fix it!
```
```
Disabling tests can be a good thing
```

Q&A

Goal: Reduce the impact of intermittents

What is your definition of intermittent?

```
How do intermittents impact your job?
```

What prevents you from fixing intermittent tests?

```
What causes you to not trust tests?
```

On January 4, 2018, what would you expect to see?

```
What experiments should we be doing?
```

Stockwell

By Joel Maher

Open Source hacker for the Mozilla project.