Make sidekiq report _all_ exceptions to Instrumental by esquivalient · Pull Request #35 · Instrumental/metrician-ruby

esquivalient · 2019-10-23T20:23:41Z

… and not just StandardError and it's descendents.

Why is this change needed?

It came to our attention that some Metrician users have jobs that use code like Exception.new. That's not good practice, but reality is reality. In those cases, the Sidekiq middleware was not reporting those exceptions to Instrumental because rescue does not catch Exception by default.

How does it address the issue?

Updates the Sidekiq middleware to rescue all exceptions, send the appropriate data to Instrumental, and the re-raise.

On the way to doing this, several other changes were needed.

Our testing matrix was still looking at Ruby 2.1.5, 2.2.3, and 2.3.0. Those versions of Ruby are EOL, so they've been removed.
As a consequence, we increased the required Ruby version to 2.4.0...
And increased the Ruby version for developing Metrician to 2.6.0
Also, we added all the newest versions of Ruby to the testing matrix
Which meant we needed to update delayed_job because the previous version depended on yaml_as that was part of Psych. Psych is no longer included in Ruby and newer versions of delayed_job handle that.
In other news, the testing matrix now requires installing two versions of Bundler. The default version of Bundler from gem is >2.0, but 2.0 will try to use the version of Bundler in the lock file. The lock files all contain <2.0 because the test Rails project still requires it. As a consequence, you'll get both versions of Bundler until all the lock files are updated.

Any links to any relevant tickets, articles or other resources?

https://stackoverflow.com/a/54088849/1417774
https://stackoverflow.com/questions/53655913/error-upgrading-ruby-version-undefined-method-yaml-as
https://github.com/collectiveidea/delayed_job/blob/master/CHANGELOG.md#414---2017-12-29

Co-authored-by: Alex Overbeck alex@expectedbehavior.com

… and not just StandardError and it's descendents. Why is this change needed? -------------------------- It came to our attention that some Metrician users have jobs that use code like `Exception.new`. That's not good practice, but reality is reality. In those cases, the Sidekiq middleware was not reporting those exceptions to Instrumental because `rescue` does not catch `Exception` by default. How does it address the issue? ------------------------------ Updates the Sidekiq middleware to rescue all exceptions, send the appropriate data to Instrumental, and the re-raise. On the way to doing this, several other changes were needed. - Our testing matrix was still looking at Ruby 2.1.5, 2.2.3, and 2.3.0. Those versions of Ruby are EOL, so they've been removed. - As a consequence, we increased the required Ruby version to 2.4.0... - And increased the Ruby version for developing Metrician to 2.6.0 - Also, we added all the newest versions of Ruby to the testing matrix - Which meant we needed to update delayed_job because the previous version depended on `yaml_as` that was part of Psych. Psych is no longer included in Ruby and newer versions of delayed_job handle that. - In other news, the testing matrix now requires installing two versions of Bundler. The default version of Bundler from gem is >2.0, but 2.0 will try to use the version of Bundler in the lock file. The lock files all contain <2.0 because the test Rails project still requires it. As a consequence, you'll get both versions of Bundler until all the lock files are updated. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- https://stackoverflow.com/a/54088849/1417774 https://stackoverflow.com/questions/53655913/error-upgrading-ruby-version-undefined-method-yaml-as https://github.com/collectiveidea/delayed_job/blob/master/CHANGELOG.md#414---2017-12-29 Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>

janxious

Given Rails 4.2 and Rails 5.0 are both EOL, perhaps we should also add Gemfile.6.0, Gemfile.5.2, and Gemfile.5.1?

spec/support/test_sidekiq_worker.rb

script/setup

janxious · 2019-10-24T14:23:39Z

Also should figure out what Travis is angry. Looks like something with bundler but I didn't dig in.

Why is this change needed? -------------------------- Bundler 2.0 and greater attempts to use the version of Bundler in the lock file. If it doesn't find it, you get an error like `can't find gem bundler (>= 0.a) with executable bundle`. How does it address the issue? ------------------------------ We don't want to update all the lock files because the Rails project used for testing still requires Bundler < 2.0, so we're just making both the latest version of Bundler and the last version of Bundler compatible with the test version of Rails available. This worked in script/setup, so it'll probably work in CI. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- https://travis-ci.org/Instrumental/metrician-ruby/jobs/602005821

esquivalient · 2019-10-24T20:41:30Z

Given Rails 4.2 and Rails 5.0 are both EOL, perhaps we should also add Gemfile.6.0, Gemfile.5.2, and Gemfile.5.1?

@janxious Alex and I worked on this today. It turned into a totally deep hole and may have revealed that Metrician does not work correctly on newer versions of Rails. The saga continues on Monday...

janxious · 2019-10-25T15:10:41Z

I am not shocked. ☹️

Anything you need a hand with?

esquivalient · 2019-10-28T18:35:00Z

Anything you need a hand with?

Not yet. Mostly just shaving the yak.

…m we support for each of those versions of Rails Why is this change needed? -------------------------- This is a gem for automatically instrumenting Rails and popular libraries used in Rails. We should test and make sure it works for some reasonable combination of versions. How does it address the issue? ------------------------------ Let me count the ways! 1) Adds script/generate_gemfiles This script automatically generates a gemfile for each valid combination of Rails, the required support gems, and the gems we automatically instrument. When run, it also provides the output that needs to be added to the .travis.yml file for those gemfiles to be run as part of the test suite. 2) Adds newly generated gemfiles for Rails 5.1.3, 5.2.0, and 6.0.0 Some gems we support (e.g. dalli/memcached) are mutually exclusive, so each version of Rails has more than one gemfile. 3) Fixes an issue with Bundler versions in Travis (and other places) If you want to install more than one version of Bundler and you don't specify a version for both, you only get one. That's bad, since some of the older gemfiles in this repo are still using Bundler 1.x. 4) Guards against infinite recursion Newer gemfiles have newer versions of rspec which have different loading behaviors that caused the active_record instrumentation code to load more than once. It turns out, that guard code doesn't work correctly and could cause infinite recursion. Here's the code comment, for the lazy You might wonder why this exception-based flow control is here instead of something more sane like instrumented_class.method_defined?(:log_without_metrician) The answer is that method_defined?, methods, instance_methods and other reasonable ways of answering this question do not work correctly in this circumstance. They return false even when log_without_metrician is defined on the class. The result is that the guard statement fails and runs this code twice. That makes log_without_metrician an alias of log_with_metrician and causes infinite recursion :( It seems like some kind of deep Ruby shenanigans avoud eigen classes and/or possibly a bug. It could maybe be improved by avoiding the use of class_eval, but we did not investigate on account of how many yaks we've already shaved this week. 5) Some test environment changes Newer versions of sidekiq don't automatically load sidekiq/testing, so that's no specifically required. Also, newer versions of HoneyBadger change the way they disabled data sending. That's conditionalized in the tests until we stop supporting the versions older than 3.3.1. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>

Why is this change needed? -------------------------- We support a lot of different libraries across many ruby versions and rails versions. How does it address the issue? ------------------------------ There's some automation in place to make sure it's all done correctly, but now there's also words. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>

Why is this change needed? -------------------------- We support the memcached gem. It's hard to test when there's no memcached running. How does it address the issue? ------------------------------ Installs the thing. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- As requested in this PR: #35

Why is this change needed? -------------------------- Other people requested they be... more descriptive. How does it address the issue? ------------------------------ Now includes the specific differentiator of the two branches in the message. Any links to any relevant tickets, articles or other resources? --------------------------------------------------------------- As requested #35

jason-o-matic

Overall looks good. I commented about changing the method_defined? stuff. I also think we should add a test for that if it isn't super hard. Maybe something that explicitly tries to load the ActiveRecord instrumentation twice just to make sure it keeps working.

jason-o-matic · 2019-10-31T19:04:21Z

lib/metrician/reporters/active_record.rb

+        already_defined = false
+      end
+
+      return if already_defined


It looks like private_method_defined? is the way to get at this particular method here, if we don't want to use exception based flow control:

39: rescue 40: already_defined = false 41: end 42: debugger if !instrumented_class.method_defined?(:log_without_metrician) && already_defined 43: => 44: return if already_defined 45: debugger 46: instrumented_class.class_eval do 47: debugger 48: alias_method :log_without_metrician, :log (byebug) instrumented_class.method_defined?(:log_without_metrician) false (byebug) instrumented_class.private_method_defined?(:log_without_metrician) true

Maybe the way that the log method gets marked as private has changed in Rails and our alias is ending up marked as private? I could see checking both method_defined? and private_method_defined? to be sure.

I'm pro switching this, but I'm happy to talk about it more.

private_method_defined? does appear to work for some of the ruby/rails combos, but not all of them. I'm just doing a quick check, so it doesn't work for the oldest combo and it does work on the newest one.

Pros and cons

Attempting to get the unbound method definitely works across all versions. It will probably work forever because if you can't access a method through method then there is no method. But, you have to use exception flow control to use method.

We can try to figure out which methods work with which combos and conditionalize it. I think it's probably a pre-2.5/post-2.5 split.

We can try to verify that some combination of interrogation methods should cover every combo (e.g. method_defined? and private_method_defined?)

Given that the outcome is infinite recursion I prefer the one that definitely can't be wrong, but I'm convincible. It's not like I think exception-based flow control is the bees knees.

I don't feel super strongly either way. The fact that *method_defined? has changed over time does make me wonder if method will actually be trustworthy into the future, but probably. If we leave the exception based flow control we might want to move it into an already_defined? method just to encapsulate the badness.

In any case I think we should definitely add some kind of a test for this.

janxious reviewed Oct 24, 2019

View reviewed changes

spec/support/test_sidekiq_worker.rb Outdated Show resolved Hide resolved

script/setup Outdated Show resolved Hide resolved

esquivalient and others added 4 commits October 30, 2019 13:47

jason-o-matic requested changes Oct 31, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sidekiq report _all_ exceptions to Instrumental#35

Make sidekiq report _all_ exceptions to Instrumental#35
esquivalient wants to merge 6 commits intomasterfrom
sidekiq_should_catch_all_exceptions

esquivalient commented Oct 23, 2019

Uh oh!

janxious left a comment

Uh oh!

Uh oh!

Uh oh!

janxious commented Oct 24, 2019

Uh oh!

esquivalient commented Oct 24, 2019

Uh oh!

janxious commented Oct 25, 2019

Uh oh!

esquivalient commented Oct 28, 2019

Uh oh!

jason-o-matic left a comment

Uh oh!

jason-o-matic Oct 31, 2019

Uh oh!

esquivalient Oct 31, 2019

Uh oh!

jason-o-matic Nov 5, 2019 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

esquivalient commented Oct 23, 2019

Why is this change needed?

How does it address the issue?

Any links to any relevant tickets, articles or other resources?

Uh oh!

janxious left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

janxious commented Oct 24, 2019

Uh oh!

esquivalient commented Oct 24, 2019

Uh oh!

janxious commented Oct 25, 2019

Uh oh!

esquivalient commented Oct 28, 2019

Uh oh!

jason-o-matic left a comment

Choose a reason for hiding this comment

Uh oh!

jason-o-matic Oct 31, 2019

Choose a reason for hiding this comment

Uh oh!

esquivalient Oct 31, 2019

Choose a reason for hiding this comment

Uh oh!

jason-o-matic Nov 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jason-o-matic Nov 5, 2019 •

edited

Loading