Skip to content

Make sidekiq report _all_ exceptions to Instrumental#35

Open
esquivalient wants to merge 6 commits intomasterfrom
sidekiq_should_catch_all_exceptions
Open

Make sidekiq report _all_ exceptions to Instrumental#35
esquivalient wants to merge 6 commits intomasterfrom
sidekiq_should_catch_all_exceptions

Conversation

@esquivalient
Copy link
Member

… and not just StandardError and it's descendents.

Why is this change needed?

It came to our attention that some Metrician users have jobs that use code like Exception.new. That's not good practice, but reality is reality. In those cases, the Sidekiq middleware was not reporting those exceptions to Instrumental because rescue does not catch Exception by default.

How does it address the issue?

Updates the Sidekiq middleware to rescue all exceptions, send the appropriate data to Instrumental, and the re-raise.

On the way to doing this, several other changes were needed.

  • Our testing matrix was still looking at Ruby 2.1.5, 2.2.3, and 2.3.0. Those versions of Ruby are EOL, so they've been removed.
  • As a consequence, we increased the required Ruby version to 2.4.0...
  • And increased the Ruby version for developing Metrician to 2.6.0
  • Also, we added all the newest versions of Ruby to the testing matrix
  • Which meant we needed to update delayed_job because the previous version depended on yaml_as that was part of Psych. Psych is no longer included in Ruby and newer versions of delayed_job handle that.
  • In other news, the testing matrix now requires installing two versions of Bundler. The default version of Bundler from gem is >2.0, but 2.0 will try to use the version of Bundler in the lock file. The lock files all contain <2.0 because the test Rails project still requires it. As a consequence, you'll get both versions of Bundler until all the lock files are updated.

Any links to any relevant tickets, articles or other resources?

https://stackoverflow.com/a/54088849/1417774
https://stackoverflow.com/questions/53655913/error-upgrading-ruby-version-undefined-method-yaml-as
https://github.com/collectiveidea/delayed_job/blob/master/CHANGELOG.md#414---2017-12-29

Co-authored-by: Alex Overbeck alex@expectedbehavior.com

… and not just StandardError and it's descendents.

Why is this change needed?
--------------------------
It came to our attention that some Metrician users have jobs that use code like `Exception.new`. That's not good practice, but reality is reality. In those cases, the Sidekiq middleware was not reporting those exceptions to Instrumental because `rescue` does not catch `Exception` by default.

How does it address the issue?
------------------------------
Updates the Sidekiq middleware to rescue all exceptions, send the appropriate data to Instrumental, and the re-raise.

On the way to doing this, several other changes were needed.

- Our testing matrix was still looking at Ruby 2.1.5, 2.2.3, and 2.3.0. Those versions of Ruby are EOL, so they've been removed.
- As a consequence, we increased the required Ruby version to 2.4.0...
- And increased the Ruby version for developing Metrician to 2.6.0
- Also, we added all the newest versions of Ruby to the testing matrix
- Which meant we needed to update delayed_job because the previous version depended on `yaml_as` that was part of Psych. Psych is no longer included in Ruby and newer versions of delayed_job handle that.
- In other news, the testing matrix now requires installing two versions of Bundler. The default version of Bundler from gem is >2.0, but 2.0 will try to use the version of Bundler in the lock file. The lock files all contain <2.0 because the test Rails project still requires it. As a consequence, you'll get both versions of Bundler until all the lock files are updated.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------
https://stackoverflow.com/a/54088849/1417774
https://stackoverflow.com/questions/53655913/error-upgrading-ruby-version-undefined-method-yaml-as
https://github.com/collectiveidea/delayed_job/blob/master/CHANGELOG.md#414---2017-12-29

Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>
Copy link
Contributor

@janxious janxious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given Rails 4.2 and Rails 5.0 are both EOL, perhaps we should also add Gemfile.6.0, Gemfile.5.2, and Gemfile.5.1?

@janxious
Copy link
Contributor

Also should figure out what Travis is angry. Looks like something with bundler but I didn't dig in.

Why is this change needed?
--------------------------
Bundler 2.0 and greater attempts to use the version of Bundler in the lock file. If it doesn't find it, you get an error like `can't find gem bundler (>= 0.a) with executable bundle`.

How does it address the issue?
------------------------------
We don't want to update all the lock files because the Rails project used for testing still requires Bundler < 2.0, so we're just making both the latest version of Bundler and the last version of Bundler compatible with the test version of Rails available. This worked in script/setup, so it'll probably work in CI.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------
https://travis-ci.org/Instrumental/metrician-ruby/jobs/602005821
@esquivalient
Copy link
Member Author

Given Rails 4.2 and Rails 5.0 are both EOL, perhaps we should also add Gemfile.6.0, Gemfile.5.2, and Gemfile.5.1?

@janxious Alex and I worked on this today. It turned into a totally deep hole and may have revealed that Metrician does not work correctly on newer versions of Rails. The saga continues on Monday...

@janxious
Copy link
Contributor

I am not shocked. ☹️

Anything you need a hand with?

@esquivalient
Copy link
Member Author

Anything you need a hand with?

Not yet. Mostly just shaving the yak.

esquivalient and others added 4 commits October 30, 2019 13:47
…m we support for each of those versions of Rails

Why is this change needed?
--------------------------
This is a gem for automatically instrumenting Rails and popular libraries used in Rails. We should test and make sure it works for some reasonable combination of versions.

How does it address the issue?
------------------------------
Let me count the ways!

1) Adds script/generate_gemfiles

This script automatically generates a gemfile for each valid combination of Rails, the required support gems, and the gems we automatically instrument. When run, it also provides the output that needs to be added to the .travis.yml file for those gemfiles to be run as part of the test suite.

2) Adds newly generated gemfiles for Rails 5.1.3, 5.2.0, and 6.0.0

Some gems we support (e.g. dalli/memcached) are mutually exclusive, so each version of Rails has more than one gemfile.

3) Fixes an issue with Bundler versions in Travis (and other places)

If you want to install more than one version of Bundler and you don't specify a version for both, you only get one. That's bad, since some of the older gemfiles in this repo are still using Bundler 1.x.

4) Guards against infinite recursion

Newer gemfiles have newer versions of rspec which have different loading behaviors that caused the active_record instrumentation code to load more than once. It turns out, that guard code doesn't work correctly and could cause infinite recursion. Here's the code comment, for the lazy

You might wonder why this exception-based flow control is here instead of
something more sane like
instrumented_class.method_defined?(:log_without_metrician)
The answer is that method_defined?, methods, instance_methods
and other reasonable ways of answering this question do not
work correctly in this circumstance.  They return false even
when log_without_metrician is defined on the class.  The
result is that the guard statement fails and runs this code
twice.	That makes log_without_metrician an alias of
log_with_metrician and causes infinite recursion :(

It seems like some kind of deep Ruby shenanigans avoud eigen classes and/or possibly a bug. It could maybe be improved by avoiding the use of class_eval, but we did not investigate on account of how many yaks we've already shaved this week.

5) Some test environment changes
Newer versions of sidekiq don't automatically load sidekiq/testing, so that's no specifically required.

Also, newer versions of HoneyBadger change the way they disabled data sending. That's conditionalized in the tests until we stop supporting the versions older than 3.3.1.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------

Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>
Why is this change needed?
--------------------------
We support a lot of different libraries across many ruby versions and rails versions.

How does it address the issue?
------------------------------
There's some automation in place to make sure it's all done correctly, but now there's also words.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------

Co-authored-by: Alex Overbeck <alex@expectedbehavior.com>
Why is this change needed?
--------------------------
We support the memcached gem. It's hard to test when there's no memcached running.

How does it address the issue?
------------------------------
Installs the thing.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------
As requested in this PR: #35
Why is this change needed?
--------------------------
Other people requested they be... more descriptive.

How does it address the issue?
------------------------------
Now includes the specific differentiator of the two branches in the message.

Any links to any relevant tickets, articles or other resources?
---------------------------------------------------------------
As requested #35
Copy link
Member

@jason-o-matic jason-o-matic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I commented about changing the method_defined? stuff. I also think we should add a test for that if it isn't super hard. Maybe something that explicitly tries to load the ActiveRecord instrumentation twice just to make sure it keeps working.

already_defined = false
end

return if already_defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like private_method_defined? is the way to get at this particular method here, if we don't want to use exception based flow control:

   39:       rescue
   40:         already_defined = false
   41:       end
   42:       debugger if !instrumented_class.method_defined?(:log_without_metrician) && already_defined
   43:
=> 44:       return if already_defined
   45:       debugger
   46:       instrumented_class.class_eval do
   47:         debugger
   48:         alias_method :log_without_metrician, :log
(byebug) instrumented_class.method_defined?(:log_without_metrician)
false
(byebug) instrumented_class.private_method_defined?(:log_without_metrician)
true

Maybe the way that the log method gets marked as private has changed in Rails and our alias is ending up marked as private? I could see checking both method_defined? and private_method_defined? to be sure.

I'm pro switching this, but I'm happy to talk about it more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private_method_defined? does appear to work for some of the ruby/rails combos, but not all of them. I'm just doing a quick check, so it doesn't work for the oldest combo and it does work on the newest one.

Pros and cons

  1. Attempting to get the unbound method definitely works across all versions. It will probably work forever because if you can't access a method through method then there is no method. But, you have to use exception flow control to use method.

  2. We can try to figure out which methods work with which combos and conditionalize it. I think it's probably a pre-2.5/post-2.5 split.

  3. We can try to verify that some combination of interrogation methods should cover every combo (e.g. method_defined? and private_method_defined?)

Given that the outcome is infinite recursion I prefer the one that definitely can't be wrong, but I'm convincible. It's not like I think exception-based flow control is the bees knees.

Copy link
Member

@jason-o-matic jason-o-matic Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel super strongly either way. The fact that *method_defined? has changed over time does make me wonder if method will actually be trustworthy into the future, but probably. If we leave the exception based flow control we might want to move it into an already_defined? method just to encapsulate the badness.

In any case I think we should definitely add some kind of a test for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants