We have a chores process in which engineers review logs in Logstash daily. Sometimes we miss a day, though. Or we look at the dashboard at 22:00 UTC, but a flood of errors begins at 01:00 UTC. As a team, we would like to know about these errors before someone else tells us about them.
We should implement email alerts for spikes in messages, for example more than 10 per hour (exact threshold to be determined). We should first complete T328128: Reduce noise in Growth team's Logstash dashboard, so we have a better handle on the expected log volume for our dashboard.
It seems like we'd need the OpenSearch Alerting plugin to implement this. I've asked about that in T293694: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period.
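
To make the proposed rule concrete, here is a rough Python sketch of the check we have in mind: bucket log timestamps by UTC hour and flag any hour with more than the threshold number of messages. This is only an illustration of the logic, not OpenSearch Alerting configuration. The function name `hours_over_threshold` and the constant `THRESHOLD_PER_HOUR` are made up for this example, and the value 10 is the placeholder from above, not a decided number.

```python
from collections import Counter
from datetime import datetime, timezone

# Placeholder value; the real threshold is still to be determined.
THRESHOLD_PER_HOUR = 10


def hours_over_threshold(timestamps, threshold=THRESHOLD_PER_HOUR):
    """Return {hour_start: count} for each UTC hour with more than `threshold` messages.

    `timestamps` is assumed to be an iterable of timezone-aware datetimes,
    one per log message.
    """
    # Truncate each timestamp to the start of its UTC hour and count per bucket.
    counts = Counter(
        ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    # Keep only the hours that strictly exceed the threshold.
    return {hour: n for hour, n in sorted(counts.items()) if n > threshold}
```

With this rule, three messages at 00:05 UTC would not trigger anything, while eleven messages between 01:00 and 01:10 UTC would flag the 01:00 hour. An hour with exactly ten messages would not alert, since the rule is "more than 10".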