Add AgentRunner and LongRunnable to support long running agents #819
Conversation
Very cool!
# periodically propagating and expiring Events. It also runs TwitterStreamAgents and Agents that support long running
# background jobs.

Dotenv.load if Rails.env == 'development'
Is this so you don't have to use foreman?
Rails.env.development? works too.
Foreman, right! I was wondering why I did not have issues like this before. Do you think it's worth keeping? In development I like to start the background workers without foreman so I can restart them faster.
I think it's totally reasonable to keep, if we can make it work both with and without a Procfile.
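For reference, a minimal sketch of what that line could look like with the suggested predicate (assuming the dotenv gem is in the Gemfile; this is illustrative, not the PR's exact code):

```ruby
# Load .env only in development, using the Rails.env predicate instead of a
# string comparison. This lets the runner pick up .env when started directly,
# without foreman/Procfile.
require 'dotenv'
Dotenv.load if Rails.env.development?
```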
This seems really good! If someone wants to run multiple delayed_job processes, how would they do that in this new world?
Not much compared to the current
Force-pushed from 1329177 to 55a3aa5.
I am running this in production now and everything seems fine so far. Had an issue with
Would be great if some of you could test it in production too. /cc @knu @virtadpt
I'll try to test it on Saturday morning; I'm swamped at work right now.
I'll merge this into my personal Huginn and see how it goes as well.
Hey @dsander, I deployed it and my twitter streaming agents stopped working:
@cantino Thanks for testing it! The twitter agent seems to have raised an exception which I did not expect to happen. I changed the twitter exception handling code to show the exception class (all the information it exposes). Do you have any special filters? To be honest I just tested it with two
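To illustrate the kind of change meant here, a rough sketch of broadened exception logging around the stream (stream_client, filter_terms, and handle_message are placeholders, not the PR's actual code):

```ruby
# Rough sketch only: log the exception class along with the message so
# unexpected twitter gem errors are visible, then back off and reconnect.
begin
  stream_client.filter(track: filter_terms.join(',')) do |message|
    handle_message(message)   # placeholder for the agent's callback
  end
rescue StandardError => e
  STDERR.puts "TwitterStream error #{e.class}: #{e.message}"
  sleep 15
  retry
end
```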
Wow, this is frustrating. It turns out the
While working on a fix for the twitter gem I discovered the
I will open an issue for the
Wow, that's pretty bad. However, I'm okay if we deploy this. I don't really need the
I can deploy it again to my system soon and try it without the
Actually we are requiring the
Force-pushed from cbbf7a6 to bc786f2.
The specs are passing again and I squashed and reordered the commits. Hopefully everything is working now 😄 The JabberAgent has been performing very well for me (I use it in a HipChat room to grab URLs and post the Facebook opengraph summary of articles; HipChat used to have an integration which did this, but it was removed due to "legal concerns"). I am testing the TwitterStreamAgent in production now.
Force-pushed from 5778b01 to c7caee0.
muc.join(agent.interpolated['jabber_receiver'])

while @client.is_connected? do sleep 1; end
Could this just be:
sleep 1 while @client.is_connected?
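For context, the surrounding worker body with the modifier form might read something like this (a sketch based on the quoted diff, not the exact merged code; muc and @client come from xmpp4r):

```ruby
# Join the configured room, then simply block until the XMPP connection drops;
# the modifier-while form replaces the while ... do sleep 1; end loop above.
def run
  muc.join(agent.interpolated['jabber_receiver'])
  sleep 1 while @client.is_connected?
end
```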
If we don't hear anything in a day or so, let's merge it!
Merged! @cantino Thanks for the reviews and testing! It required more work than I expected 😄
Nice!
I had my Twitter stream stop again 3 days ago, unfortunately. It was on the pre-merged version of this PR. I just updated to master and added some logging to see if I can debug it if it happens again.
That is annoying; I suppose you did not get an exception?
I was having trouble finding the right point in the logs, so I've added timestamps to the output and will keep an eye on it.
So far, so good.
@dsander, unfortunately my Twitter streams stopped working on the 21st. There are no errors in the jobs log, but the production.log shows this around the same time:
Any ideas?
That sucks. Judging from the log, it looks like the
Seemed like it might be just twitter since I was still seeing scheduled event propagation working? I think twitter is the only long runnable I use.
@dsander, what is the separate background worker configuration you recommend? What's frustrating is that I can't reproduce this symptom without waiting a week or so, then twitter events just stop being created.
I mean the old Procfile configuration. I wonder if the same happens when the TwitterStreamAgent is running in its own process.
I can try this. Also seeing if my Event JSON rescue code helps. Is there
Maybe around
I am also wondering which log file you posted earlier. I switched to
I posted upstart, but see also adding
Okay, so everything crashed again last night after working for a day or two. First I got
then a series like
about 5 times then
a few times then the following just repeats indefinitely
I don't think the server has run out of RAM. Any ideas? Do you think we should revert this PR?
Do you have a backtrace for the first one? It looks like the other issues are related to the first exception. Is it possibly similar to this issue reported on the
If we can't resolve the issue we might have to. I will try to set up an instance using mysql myself, since it does not seem to be happening on postgres.
I'll try updating the mysql2 gem. Here's the backtrace:
The newest that works with Rails appears to be 0.3.20, so I'm trying that. (See rails/rails#21544)
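If pinning is the route taken, the Gemfile change would presumably look something like this (version constraint taken from the linked Rails issue, not verified against this repo's Gemfile):

```ruby
# Keep mysql2 on the 0.3.x line; 0.4.x was not accepted by the Rails 4.x
# mysql adapter at the time (see rails/rails#21544).
gem 'mysql2', '~> 0.3.20'
```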
Well, it seems like this has been running successfully on my system for
On which commit is that rescue? In general it sounds like a good idea to rescue potential errors early, but even when the Agent crashes it should automatically be restarted. I never had any issues on my main instance (which has more than enough RAM) but saw a few crashes on my DO droplet due to the machine running OOM.
I was referring to #1049 because I don't know of anything else that has changed. Also, at least some people aren't seeing memory leaks: https://twitter.com/rem0_o/status/664299186138038273
Right, totally forgot about that one; great that your instance is now running without issues. If the server has enough memory and you restart huginn every few weeks, I guess you would never notice a leak. I never found anything that was leaking, but the memory usage looks a bit suspicious:
Sorry, yes I saw the allocated object count increase, sometimes even jump by nearly a factor of 2, but the heap dumps never revealed the source. It could also be that these objects were garbage collected a few ticks later. As far as I understand Ruby's memory allocation, an increase of the RSS is not necessarily caused by a leak; it could also be a spike of allocated objects which are garbage collected later.
I've recently seen this on other projects as well. It may be a symptom of the generational GC.
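For anyone wanting to watch this, a small sketch of the kind of periodic instrumentation that can help distinguish a real leak from a transient allocation spike (standard-library calls only; the ps invocation assumes a Unix-like host):

```ruby
# Log GC statistics plus the process RSS so growth in retained objects can be
# compared against growth in resident memory over time.
def log_memory_stats(io = $stderr)
  rss_kb = `ps -o rss= -p #{Process.pid}`.strip
  io.puts "[#{Time.now.utc}] rss=#{rss_kb}kB gc=#{GC.stat.inspect}"
end

# e.g. sample every five minutes from a background thread:
# Thread.new { loop { log_memory_stats; sleep 300 } }
```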
This changes HuginnScheduler and TwitterStream to use the new AgentRunner. TwitterStream now uses the twitter gem to access the streaming API, which removes the eventmachine dependency (in production).

I have been working on this for a while now and it finally seems to have reached a point where everything is running smoothly (at least in development).

The AgentRunner is basically an improved threaded.rb. It handles failure of workers and reloads them if the configuration changes.

What is still missing is a way for LongRunning Agents to receive events. This would, for example, allow the XMPP Agent to stay connected to the server.
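To sketch how an agent might plug into this, something along the following lines; the class and method names here are assumptions based on the description above, not necessarily the merged API:

```ruby
# Hypothetical example of a long-running agent: it includes the LongRunnable
# concern and defines a Worker whose run loop keeps a connection open.
# The AgentRunner is described as restarting workers on failure and reloading
# them when the agent's configuration changes.
class ExampleStreamingAgent < Agent
  include LongRunnable

  class Worker < LongRunnable::Worker
    def run
      loop do
        poll_remote_service   # placeholder for the agent-specific work
        sleep 1
      end
    end
  end
end
```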