All those times we broke npm

MelbJS, 2019-06-19

Read these slides on your device:

What are we talking about?

Warning:

extreme levels of profanity ahead

I stole this talk idea

from

#1: The certificate outage

February 2014

We broke the registry a lot in January 2014

SSL in Node.js used to be a lot harder

Migrations are hard

This is going to be a theme

DOUBLE FUCK

Rolling back is hard,

even harder when everyone is yelling at you.

What did we learn?

  • We are bigger than we thought!
  • We can never change cert providers 🙁
  • Have a rollback plan

#2: The password change outage

April 2014

Our CouchDB app was bad

I'm told there are better ones.

Load at 100%

no matter how big a box we buy.

WTF?

Password change errors

Who has time to fix them when the server is always on fire?

Fucked by silent string conversions

Infinite loops will fuck you

What did we learn?

  • We are not good at CouchDB
  • Don't roll your own... anything
  • Don't ignore "small" errors

#3: Registry 2.0 launch

April 2015

Farewell, Couch app

What did we learn?

Use canaries to test big changes before they go out to everybody.

Canaries use real data

And real data is extremely messy
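A canary can be as simple as sending a small, stable slice of real traffic to the new code. The sketch below is one way to do that, not how the registry actually did it: `pickBackend`, the client ID, and the percentage are all hypothetical names made up for illustration.

```typescript
// Hypothetical canary router. Nothing here comes from the real registry code.
type Backend = "stable" | "canary";

// FNV-1a style 32-bit hash over UTF-16 code units, so the same client
// always lands in the same bucket from one request to the next.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Send roughly `canaryPercent` percent of clients to the canary backend.
function pickBackend(clientId: string, canaryPercent: number): Backend {
  const bucket = fnv1a(clientId) % 100; // bucket is in 0..99
  return bucket < canaryPercent ? "canary" : "stable";
}
```

Hashing the client instead of rolling dice per request means a given user sees consistent behavior, and a broken canary hurts the same small group rather than everybody a little.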

#4: left-pad

March 2016

Kik me

Then we fucked up by not having a clear policy

404: left-pad not found

I will feel genuinely bad about left-pad forever

What did we learn?

  • Unpublishes are really dangerous
  • Unpublishes are now impossible after 24 hours

What did we learn?

  • We're even bigger now
  • Ignore vague legal threats
  • Hire a damn lawyer
  • Have clear policies

#5: "fs" unpublished

August 2016

fs on npm

Does literally nothing.

fs in node

Does everything with the file system.

How dangerous can it be to unpublish something that doesn't do anything?

Oh, you sweet summer child.

So we put it back

It still gets downloaded 400,000 times per week.

What did we learn?

  • Don't unpublish things. FFS.
  • Internal process is important

#6: VS Code takes down the registry

November 2016

VS Code was just trying to be helpful

404 is an error

Do you cache error responses?

We didn't cache 404s

And neither did VS Code.

What did we learn?

  • Cache error states
  • VS Code is very popular
  • TypeScript is very popular
  • Microsoft is way nicer than it was in the 90s
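"Cache error states" in practice means remembering a 404 for a little while, so a popular client asking for the same missing thing over and over only reaches the origin once. A minimal sketch, assuming a synchronous origin lookup to keep it short: `ResponseCache`, `Lookup`, and the TTL values are hypothetical, not the registry's actual cache.

```typescript
// Hypothetical negative-caching sketch. Names and TTLs are illustrative only.
type Lookup = { status: number; body: string };
type Entry = { value: Lookup; expiresAt: number };

class ResponseCache {
  private entries = new Map<string, Entry>();
  private okTtlMs: number;
  private notFoundTtlMs: number;
  private now: () => number;

  constructor(okTtlMs: number, notFoundTtlMs: number, now: () => number = Date.now) {
    this.okTtlMs = okTtlMs;
    this.notFoundTtlMs = notFoundTtlMs;
    this.now = now;
  }

  get(key: string, fetchOrigin: (key: string) => Lookup): Lookup {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = fetchOrigin(key);
    // Cache 404s as well as 200s. The 404 TTL is shorter so that a package
    // published a moment ago does not stay "missing" for long.
    let ttl = 0;
    if (value.status === 200) ttl = this.okTtlMs;
    else if (value.status === 404) ttl = this.notFoundTtlMs;
    if (ttl > 0) this.entries.set(key, { value, expiresAt: this.now() + ttl });
    return value;
  }
}
```

Other statuses (say, a 500) are deliberately not cached here, since a transient server error is not something you want to keep serving.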

#7: Nuked the payments database

December 2017

Scoped packages:

@user/name

Scoped packages can be private

Customer support are powerful people

DELETE FROM Customers;

See, this is why I hate ORMs.

What did we learn?

  • ORMs are dangerous
  • Always test your backups

#8: require-from-string

January 2018

Spam

It's why we can't have nice things

npm package pages have really great pagerank

You have to delete spam

Smyte: remember that name.

Sometimes real things look like spam

What did we learn?

  • STOP UNPUBLISHING THINGS FFS
  • Be careful giving robots too much power
  • Spammers are persistent fuckers

#9: Cloudflare migration

May 2018

We broke somebody else's registry

Infinite loop AGAIN, motherfucker

DOUBLE FUCK

Who deploys on a Friday???

What did we learn?

  • Don't deploy on a Friday
  • Don't have hard deadlines based on early estimates
  • We are so big we are responsible for stuff we're not even responsible for

#10: Smyte smites us

June 2018

Smyte turned off their API with 30 minutes of notice

Tweeting angrily will definitely help

What did we learn?

  • Plan for absurd failures
  • Beware of cheap APIs
  • Never tweet
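One way to plan for a third-party API vanishing is to decide in advance what happens when the call fails. The sketch below assumes the safe default is to hold things for a human instead of acting automatically. `classifyWithFallback` and `Verdict` are hypothetical names, the classifier is synchronous only to keep the example short, and none of this reflects how the real integration worked.

```typescript
// Hypothetical wrapper around an external spam classifier.
type Verdict = "ok" | "spam" | "needs-review";

function classifyWithFallback(
  pkg: string,
  classify: (pkg: string) => Verdict,
): Verdict {
  try {
    return classify(pkg);
  } catch {
    // The provider is down, gone, or misbehaving. Do not guess:
    // queue the package for a human instead of publishing or deleting it.
    return "needs-review";
  }
}
```

The point is that the failure path is chosen deliberately ahead of time, not discovered at 30 minutes' notice.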

We will fuck up again

We are particularly good at finding new ways to accidentally delete things

@seldo

These slides are available right now

Now would be a good time to follow me on Twitter

I ❤️ you