This blog post is based on a talk I gave at SREcon18 EMEA. If you’d like to see the slides or watch the talk, both are available on the USENIX website.
Let’s start with a radical concept: you’ve already got 50% of the work done to start capacity planning. A lot of what SRE teams do already feeds directly into understanding and forecasting capacity.
Recently, some dedicated hosts were having network problems because high traffic was triggering a known bug in certain Realtek NICs.
Unfortunately, Puppet doesn’t expose facts about a server’s networking hardware, so I wrote a custom fact to expose the NIC drivers in use, their firmware versions, and which interfaces use them.
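To give a feel for what such a fact involves, here is a minimal sketch written as an executable external fact in Python. It is not the original code from this post. It assumes a Linux host with `ethtool` installed, that `ethtool -i <interface>` prints `key: value` lines, and that your Facter version accepts JSON output from executable external facts (worth checking against your setup). The fact name `nic_drivers` and the helper names are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of an executable external fact that reports NIC driver details."""
import json
import os
import subprocess

SYS_NET = "/sys/class/net"  # Linux-only directory with one entry per interface


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like PCI addresses survive.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def group_by_driver(per_interface):
    """Map driver name -> firmware version and the interfaces using it.

    Simplification: the firmware version of the first interface seen for a
    driver is reported for that driver.
    """
    drivers = {}
    for iface, info in sorted(per_interface.items()):
        driver = info.get("driver")
        if not driver:
            continue
        entry = drivers.setdefault(
            driver,
            {"firmware_version": info.get("firmware-version", ""), "interfaces": []},
        )
        entry["interfaces"].append(iface)
    return drivers


def collect_interfaces():
    """Run `ethtool -i` for every interface; skip the ones it cannot describe."""
    per_interface = {}
    try:
        ifaces = os.listdir(SYS_NET)
    except OSError:
        return per_interface  # not Linux, or /sys is unavailable
    for iface in ifaces:
        try:
            result = subprocess.run(
                ["ethtool", "-i", iface],
                capture_output=True, text=True, check=False,
            )
        except OSError:
            break  # ethtool is not installed
        if result.returncode == 0:
            per_interface[iface] = parse_ethtool_info(result.stdout)
    return per_interface


if __name__ == "__main__":
    # "nic_drivers" is a hypothetical fact name chosen for this sketch.
    print(json.dumps({"nic_drivers": group_by_driver(collect_interfaces())}))
```

Dropped into Facter’s external facts directory and made executable, a script like this would let you find every host running an affected driver with a single query, rather than logging in to each one.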
A lot of Puppet configurations recommend using Puppet’s tidy resource to manage Puppet reports. The problem is that, in order to delete each file, Puppet creates a file resource entry in state.yaml. The state file grows pretty quickly because of this, and I’ve seen it slow down Puppet runs after a certain point.
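One way to sidestep this is to move the cleanup out of Puppet altogether, for example into a small script run from cron, so that nothing is recorded in state.yaml. The following is only a sketch under that assumption: the function name `prune_reports`, the seven-day default, and the `.yaml` filter are illustrative choices, not something prescribed by Puppet.

```python
import os
import time


def prune_reports(report_dir, max_age_days=7, now=None):
    """Delete *.yaml report files older than max_age_days.

    Returns the sorted list of paths that were removed. `now` can be passed
    in (as a Unix timestamp) to make the cutoff deterministic.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    # Reports are usually stored in one subdirectory per host, so walk the tree.
    for root, _dirs, files in os.walk(report_dir):
        for name in files:
            if not name.endswith(".yaml"):
                continue  # leave anything that is not a report alone
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

You would call it with your own report directory, for instance `prune_reports("/var/lib/puppet/reports")`. That path is an assumption, so check the `reportdir` setting on your Puppet server before pointing the script at anything.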