
Sure. I didn't figure this out, but was relatively close to the investigation. My team provided a lot of data that ended up being used to come to root cause.

My team and extended teams managed almost 200,000 network devices, spread all over the world, most of which were Cisco, and most of which were installed in stores. And most of the switch ports were connected to Point Of Sale devices. Among these are employee-facing registers and customer-facing card scanners. That is, the devices you interact with whenever you scan your card to pay for something.

With that many devices in that many locations in a largely unmanaged environment (the switches would be installed all over the store, often in the ceiling, and many of them experienced extreme temperatures), there was a constant stream of failures. The process to manage these failures was optimized, streamlined and largely automated.

However, it was discovered that switches were failing far more frequently in the northern Midwestern US than elsewhere, and then only in the winter.

So this wasn't a really big operational issue, but it had a substantial cost impact, and the rate was high enough that a lot of the affected stores did notice and were complaining.

Right. Very strange, very mysterious.

So, briefly, the root cause:

Apparently, people in the upper Midwest wear wool to stay warm far more frequently than people in other cold places, specifically the US northeast. And much of the time, the humidity is quite low. So, you have a lot of people wearing a lot of wool in low-humidity air. These people generated a lot of static, which they would all too often discharge while interacting with the customer-facing point of sale device. And, all too frequently, that pulse of static would end up flowing all the way back to the switch, often killing it.

I didn't follow the subsequent remediation efforts, so I don't know what, if anything, was done about that.




Amusing that you were causing them physical pain when parting with their money at the store, like negative conditioning.


Yes, that particular joke did surface after we discovered the problem. (:


That’s a fantastic anecdote. I would have loved to have been a fly on the wall when the results were reported to management.


Indeed, great story. I'd love to have seen the face of the person who finally figured it out.


Thanks.

I suspect it would not have been received as that big a deal to management or to anyone else. In those halcyon days, we were running into and usually solving all kinds of such edgy, extreme scale problems. It was a lot of work, but a hell of a lot of fun too.


One of the great stories right here. Thank you for sharing.


Wow, that’s one helluva edge case I’d never think of!


but at least you get to blame the users!



