Queue Size

Number of jobs that haven't been run yet or that failed and are going to be retried
Graphs make people happy. No denying that. So here’s a treat for you. Assuming you’re using Munin to monitor all sorts of stuff on your servers (Munin’s web interface is ugly but it’s really cool!), and using delayed_job as a queue, take this file:
http://github.com/obvio171/delayed_job/blob/423dc31b9ed6f2bee6e4c780e85d7d9a546e0a60/contrib/delayed_jobs_queue_size.rb
Just drop it (or symlink to it) inside /etc/munin/plugins/ on the servers that run jobs and that’s it! It’ll break!
Not what you were expecting, huh? OK, you have to do a little more work. First, you have to tell the plugin how to connect to your database. The easiest way, if you’re running Rails, is to set the environment variable DATABASE_YML to the path of your database.yml file. Make sure the user who runs the plugin can read that file. By default it’s the user munin, but you can specify a different one (cf. http://munin.projects.linpro.no/wiki/plugin-conf.d; this won’t link properly due to a bug in WordPress).
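If you’re curious what “telling the plugin how to connect” boils down to, here’s a rough sketch. It is not the plugin’s actual code: the helper name load_db_settings and the RAILS_ENV fallback are made up for illustration, and it assumes a plain Rails-style database.yml with no ERB in it.

```ruby
require 'yaml'

# Hypothetical helper (not part of the plugin): read Rails-style connection
# settings from the file named by the DATABASE_YML environment variable.
def load_db_settings(path = ENV['DATABASE_YML'], env = ENV['RAILS_ENV'] || 'production')
  raise ArgumentError, 'DATABASE_YML is not set' if path.nil? || path.empty?
  raise ArgumentError, "cannot read #{path}" unless File.readable?(path)

  # database.yml files often use YAML aliases, so allow them explicitly.
  config = YAML.safe_load(File.read(path), aliases: true) || {}

  # Return the section for the requested environment, e.g. "production".
  config.fetch(env) { raise KeyError, "no '#{env}' section in #{path}" }
end
```

The "cannot read" error is the one you’ll hit if the munin user doesn’t have permission on the file, which is why it’s worth checking for separately.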
If you have a different setup, you can edit the plugin and pass a hash with username, password, etc. to Grapher.new. Don’t be scared! The code is really simple. Take a look!
Also, if you’re running vanilla delayed_job (that is, from tobi’s repo), this plugin uses a table column that you don’t have, finished_at (we’ll come back to this column later). Just remove that extra AND finished_at IS NULL. Now it works! That’s how big your queue is right there, on that graph!
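For the curious, a Munin plugin is just a script that prints a few lines of text. Here’s a minimal sketch of that shape, not the real plugin: the SQL, the constant name, the method name and the field name jobs are all assumptions for illustration. The failed_at condition assumes delayed_job only sets that column when a job has failed for good.

```ruby
# Hypothetical query; the real plugin's SQL may differ. On vanilla
# delayed_job there is no finished_at column, so drop that last condition.
QUEUE_SIZE_SQL = 'SELECT COUNT(*) FROM delayed_jobs ' \
                 'WHERE failed_at IS NULL AND finished_at IS NULL'

# Munin runs a plugin with the argument "config" to learn how to draw the
# graph, and with no argument to fetch the current value.
def munin_lines(argument, queue_size)
  if argument == 'config'
    [
      'graph_title Delayed job queue size',
      'graph_vlabel jobs',
      'graph_category other',
      'jobs.label queued jobs'
    ]
  else
    ["jobs.value #{queue_size}"]
  end
end
```

In a real plugin you’d run the query against your database and finish with something like `puts munin_lines(ARGV[0], count)`.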
So why would you want that extra column again? Here’s something to whet your appetite:
Average Run Time

How long it's taking your jobs to run
Now, you can’t get that with vanilla delayed_job. Why not? Because it deletes finished jobs, so there’s no way of telling how long they took to run*.
How do you solve that?
Step 1: Run this migration:
Step 2: Use my branch of delayed_job (hey, gotta sell my fish). It adds the appropriate behavior related to those extra columns.
Step 3: Add the line Delayed::Job.destroy_successful_jobs = false to your environment.rb or, better yet, to some initializer.
Now your jobs stay there after they complete successfully so you can poke at them and gather statistics. Specifically, they keep the time when they first started, when they last started (these are different if there was more than 1 attempt) and when they finished successfully.
With that in hand, you can now add this one other plugin:
http://github.com/obvio171/delayed_job/blob/423dc31b9ed6f2bee6e4c780e85d7d9a546e0a60/contrib/delayed_jobs_run_time.rb
Follow the same instructions as with the first one, and you have the graph above, with each point representing the average running time of the jobs that finished in the last 5 minutes. Sweet, huh?
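The calculation behind each point can be sketched in plain Ruby. This is an illustration under assumptions, not the plugin’s code: the Job struct stands in for a row of the jobs table, the names WINDOW and average_run_time are invented, and run time is measured from first_started_at to finished_at so that retries are counted.

```ruby
# Stand-in for a row of the jobs table; only the columns used here.
Job = Struct.new(:first_started_at, :last_started_at, :finished_at)

WINDOW = 5 * 60 # seconds, matching Munin's 5-minute polling interval

# Average total run time (in seconds) of the jobs that finished within
# the last WINDOW seconds. Returns 0.0 when nothing finished.
def average_run_time(jobs, now = Time.now)
  recent = jobs.select do |job|
    job.finished_at && job.finished_at > now - WINDOW && job.finished_at <= now
  end
  return 0.0 if recent.empty?

  # Time - Time gives a Float number of seconds.
  total = recent.sum(0.0) { |job| job.finished_at - job.first_started_at }
  total / recent.size
end
```

Returning 0.0 for an empty window is a choice made for this sketch. You might prefer to report no value at all so the graph shows a gap instead of a dip.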
There’s more!
It’s a graphing spree! Know what else you can do with those statistics you’re gathering? From the README:
“This is useful for gathering statistics like how long a job took:
* to be picked up by the first worker: first_started_at – created_at
* in one successful run (the last one that didn’t fail): finished_at – last_started_at
* in failed retries: last_started_at – first_started_at”
Now these you’ll have to do yourself. I haven’t written them. But if you look at my plugins, they’re really simple, and it should be very easy for you to change them to output these other graphs. If you get around to doing that, please be nice and put them in that contrib folder. There are certainly people who’ll find use for them!
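As a starting point, the three formulas from the README translate directly into Ruby. The method name job_timings and the hash keys are made up for this sketch. The column names come from the README, and the timestamps are assumed to be Time objects.

```ruby
# Compute the three intervals from the README for a single finished job.
# Subtracting one Time from another gives seconds as a Float.
def job_timings(created_at:, first_started_at:, last_started_at:, finished_at:)
  {
    # Time spent waiting for the first worker to pick the job up.
    wait_for_pickup: first_started_at - created_at,
    # Duration of the one successful run (the last attempt).
    successful_run: finished_at - last_started_at,
    # Time spent in attempts that failed before the successful one.
    failed_retries: last_started_at - first_started_at
  }
end
```

Average any of these over the jobs that finished in the last 5 minutes and you have a new graph.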
Wait, so I’ll have these finished jobs lying around FOREVER?
If you don’t do anything, yep. But you can clean up with these handy rake tasks:
jobs:clear:all
jobs:clear:finished
jobs:clear:failed
Put them in a cron job somewhere and you’ll be fine.
* This is not entirely true. delayed_job logs how long each job took on its last run, so you could parse the log and graph it. But I don’t like parsing logs, and if you have more than one worker you’ll have to aggregate the different logs somehow. It’s just a big mess. Also, the log only tells you how long the job took to run, not when it started and finished, so you can’t graph that other stuff I mentioned at the end. My plugin, for example, has slightly different semantics: it tells you how long the job took IN TOTAL to run, counting retries. You can’t do that by parsing the log. I could’ve added the extra data to the log itself but, as I said, I hate parsing logs, and delayed_job already lets you keep failed jobs around, so I thought it wouldn’t be too inconsistent to leave successful ones there too.

