Correct begin and end (line+char) positions for each token and node#23
Open
nene wants to merge 6 commits into
Open
Correct begin and end (line+char) positions for each token and node#23nene wants to merge 6 commits into
nene wants to merge 6 commits into
Conversation
Previously the @line field that some Nodes had wasn't always the number of the line where a Node started. And it was hard to tell why some nodes had this info, while others didn't. Now both the start and end line of every Token and Node is stored in @range field as two-element array. The #line method is still there for backwards compatibility, but the @range field is the one that holds the actual data. For now I had to disable the tests in test_comments.rb as the #apply_comments (which it effectively tested) relied on only some nodes having the line number data available.
Each Token and Node range is now a 2D array of the following:
[[start_line, start_char], [end_line, end_char]]
One can now know the exact position of each node.
Instead of attempting to associate all comments with syntax nodes (and actually failing to correctly do so) all comments found are added to the root SoureElementsNode.comments - all other nodes will have an empty #comments array.
Create classes for holding character positions data. - The CharPos contains both line/char and absolute character index. - The CharRange combines two CharPos objects. - Both have #next method to abstract away the creation of the ranges inside Tokenizer.
Contributor
Author
|
Already took care of the two points mentioned and replaced the array-based ranges with CharRange and CharPos objects. Much cleaner. |
Alias the #to_s to #inspect so that when using `pp` to inspect the created AST, the ranges will appear nice and compact. Also tweak to #to_s a bit to output its stuff in a bit shorter form.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related to the issue I just raised: #22
I replaced the rudimentary
@linefield in Token and Node classes with@rangefield that contains the following 2D array:Unlike the old implementation, all nodes now have this information. And I've kept the #line accessor for backwards compatibility.
Also, I eliminated the whole comments-to-syntax-nodes mapping. Placing the array of all comments into the root node.
There are things to improve here:
Maybe the 2D array should be replaced with a hash or object, so it would be a bit more readable. e.g.{:start => {:line => 3, :char => 8}, :end => {:line => 3, :char => 10}}Instead of line+character based ranges we could provide a simple index based ranges. Some tools might prefer one way, others a different way. So the best might be to support both.@commentsfield. So maybe it would be better to wrap everything inside a special ProgramNode that contains this@commentsfield. Or not return the comments at all, instead providing RKelly::Parser#comments accessor to access the comments list if needed.All this aside, I think it's an improvement over the current implementation which is simply broken.