Ruby Virtual Machine Internals and Investigating Variable Scope
The idea of scope in any programming language is a pretty fundamental concept that – although people often struggle with it at first – quickly becomes second nature; the idea of getting tripped up by an issue with scope in a language you’ve been using for years is laughable! If you want a laugh then keep reading…
When working with an existing piece of code that I wanted to call in a different context, I stumbled across some behaviour that I couldn’t explain. This was a Ruby module that was intended to be mixed in and used in the context of a Chef recipe. I wanted to reuse it elsewhere, but it was already called in many different places so I wanted to touch it as little as possible. To that end I added an optional parameter that would be used if a particular instance variable (exposed through an accessor method) wasn’t available. By way of illustration it was something like:
    module ExistingModule
      def existing_method(first_parameter, new_overriding_parameter=nil)
        if new_overriding_parameter
          instance_variable = new_overriding_parameter
        end
        # Do some other stuff before returning
      end
    end
If a value was passed into the method for new_overriding_parameter then that would be set as the value of instance_variable but, frustratingly, the value of instance_variable was nil when new_overriding_parameter was nil – it appeared that the existing variable, which should have been in scope, was being trampled. After printing out a lot of debugging statements I narrowed it down to the presence of the local assignment in the method. Without that particular line this behaved as I’d expected it to.
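A minimal reproduction of that behaviour can be sketched as follows. The `Host` class, its initial value and the method's return value are assumptions made for illustration; they are not taken from the original Chef code.

```ruby
module ExistingModule
  def existing_method(first_parameter, new_overriding_parameter = nil)
    if new_overriding_parameter
      instance_variable = new_overriding_parameter
    end
    # From this point on, `instance_variable` names a local variable,
    # not the accessor method provided by the host class.
    instance_variable
  end
end

# Hypothetical host class standing in for the Chef recipe context.
class Host
  include ExistingModule
  attr_accessor :instance_variable

  def initialize
    @instance_variable = "from the host"
  end
end

host = Host.new
p host.existing_method(:ignored, "override") # => "override"
p host.existing_method(:ignored)             # => nil, not "from the host"
```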
So what is happening here? I had a hunch, but to confirm it I needed to see what instructions were being passed to the Ruby virtual machine. To do this we need to inspect instances of the InstructionSequence class in the RubyVM module (Ruby core documentation: http://ruby-doc.org/core-2.2.2/RubyVM/InstructionSequence.html).
These InstructionSequence objects represent the bytecode instructions sent to the Ruby virtual machine. The Ruby virtual machine was originally written by Koichi Sasada as an external project, called YARV (https://en.wikipedia.org/wiki/YARV). On 31 December 2006 YARV was merged into the main Ruby project (https://github.com/ruby/ruby/commit/a3e1b1ce7ed7e7ffac23015fc2fde56511b30681), and it has been the language’s default virtual machine since Ruby 1.9.
To create an InstructionSequence object and inspect its instruction sequence we do something like:
    test_class = <<-EOF
    class T
      attr_accessor(:a)

      def initialize
        @a = 42
      end

      def t(this_a=nil)
        if this_a
          a = this_a
        end
        a
      end
    end
    EOF
    iseq = RubyVM::InstructionSequence.compile(test_class)
    puts(iseq.disassemble)
Running that will give us the following output:
    == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
    0000 trace 1 ( 1)
    0002 putspecialobject 3
    0004 putnil
    0005 defineclass :T, <class:T>, 0
    0009 leave
    == disasm: <RubyVM::InstructionSequence:<class:T>@<compiled>>===========
    0000 trace 2 ( 1)
    0002 trace 1 ( 2)
    0004 putself
    0005 putobject :a
    0007 opt_send_without_block <callinfo!mid:attr_accessor, argc:1, FCALL|ARGS_SIMPLE>
    0009 pop
    0010 trace 1 ( 4)
    0012 putspecialobject 1
    0014 putspecialobject 2
    0016 putobject :initialize
    0018 putiseq initialize
    0020 opt_send_without_block <callinfo!mid:core#define_method, argc:3, ARGS_SIMPLE>
    0022 pop
    0023 trace 1 ( 8)
    0025 putspecialobject 1
    0027 putspecialobject 2
    0029 putobject :t
    0031 putiseq t
    0033 opt_send_without_block <callinfo!mid:core#define_method, argc:3, ARGS_SIMPLE>
    0035 trace 4 ( 14)
    0037 leave ( 8)
    == disasm: <RubyVM::InstructionSequence:initialize@<compiled>>==========
    0000 trace 8 ( 4)
    0002 trace 1 ( 5)
    0004 putobject 42
    0006 dup
    0007 setinstancevariable :@a, <is:0>
    0010 trace 16 ( 6)
    0012 leave ( 5)
    == disasm: <RubyVM::InstructionSequence:t@<compiled>>===================
    local table (size: 3, argc: 0 [opts: 1, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
    [ 3] this_a<Opt=0>[ 2] a
    0000 putnil ( 8)
    0001 setlocal_OP__WC__0 3
    0003 trace 8
    0005 trace 1 ( 9)
    0007 getlocal_OP__WC__0 3
    0009 branchunless 17
    0011 trace 1 ( 10)
    0013 getlocal_OP__WC__0 3
    0015 setlocal_OP__WC__0 2
    0017 trace 1 ( 12)
    0019 getlocal_OP__WC__0 2
    0021 trace 16 ( 13)
    0023 leave ( 12)
The block of text above isn’t the prettiest thing but there should be some familiar bits we can pick out: <class:T>, attr_accessor, initialize. We can infer that these are the parts concerned with defining the class and setting up its attribute and methods. The part we’re interested in is the final section, below the line: == disasm: <RubyVM::InstructionSequence:t@<compiled>>===================
It might not be that clear exactly what’s going on, but we can see instructions that look like setlocal and getlocal and we can guess that these are operating on local variables.
All of these instructions are defined in a file called insns.def in the Ruby source code (https://github.com/ruby/ruby/blob/v2_2_2/insns.def). The comment for setlocal does indeed say, “Set a local variable”, so our assumption was valid.
To get something to compare it to, let’s look at what happens when we don’t have the assignment to our a variable; just taking the lines we’re interested in:
    == disasm: <RubyVM::InstructionSequence:t@<compiled>>===================
    local table (size: 2, argc: 0 [opts: 1, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
    [ 2] this_a<Opt=0>
    0000 putnil ( 8)
    0001 setlocal_OP__WC__0 2
    0003 trace 8
    0005 trace 1 ( 9)
    0007 getlocal_OP__WC__0 2
    0009 branchunless 11
    0011 trace 1 ( 12)
    0013 putself
    0014 opt_send_without_block <callinfo!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
    0016 trace 16 ( 13)
    0018 leave ( 12)
An interesting place to look is the line starting local table: in the first example that has size: 3; this time it has size: 2. This table is where Ruby stores all references to local variables (Ruby Under a Microscope, Pat Shaughnessy, p. 46). It seems that just by including a local assignment we tell the VM that there is a local variable by this name, which then shadows the attribute’s accessor method of the same name. That means that when we refer to a after that point in the method, it refers to the local variable.
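The same difference can be observed from inside Ruby without reading any bytecode, using Kernel#local_variables, which lists the local variable names known in the current scope. This is only a sketch: the class and method names are made up for illustration.

```ruby
# Hypothetical class; the two methods differ only in the assignment to `a`.
class LocalTableDemo
  attr_accessor :a

  def with_assignment(this_a = nil)
    if this_a
      a = this_a
    end
    local_variables
  end

  def without_assignment(this_a = nil)
    if this_a
    end
    local_variables
  end
end

demo = LocalTableDemo.new
p demo.with_assignment    # => [:this_a, :a]
p demo.without_assignment # => [:this_a]
```

Note that with_assignment is called without an argument, so the assignment never runs, yet a is still listed.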
We can see more evidence of this further down. In the first example, at instruction 0019, there is a getlocal operation. In the second example, at instructions 0013 and 0014, there is instead a putself (put the current instance onto the stack) followed by an opt_send_without_block whose method ID (mid) is a: in other words, a method call to a on self.
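To make that comparison repeatable, the two variants can be disassembled one method at a time with RubyVM::InstructionSequence.of, which returns the instruction sequence for a given method object. This is a sketch that assumes MRI (RubyVM is not available on other implementations); the class names and the disasm_t helper are hypothetical, and exact instruction names vary between Ruby versions, so it only looks for the call to a.

```ruby
class WithAssignment
  attr_accessor :a

  def t(this_a = nil)
    if this_a
      a = this_a
    end
    a
  end
end

class WithoutAssignment
  attr_accessor :a

  def t(this_a = nil)
    if this_a
    end
    a
  end
end

# Hypothetical helper: disassemble only the `t` method of the given class.
def disasm_t(klass)
  RubyVM::InstructionSequence.of(klass.new.method(:t)).disasm
end

[WithAssignment, WithoutAssignment].each do |klass|
  # "mid:a" appears in the call info of a send instruction for method `a`.
  calls_accessor = disasm_t(klass).include?("mid:a")
  puts "#{klass}: sends :a to self? #{calls_accessor}"
end
```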
So it appears that just introducing the assignment to a creates a local variable of that name – regardless of whether or not that assignment is actually executed.
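A small sketch makes the "regardless of whether it is executed" part concrete, using defined? to ask Ruby what it thinks a is at two points in the same method. The class name is hypothetical. It also suggests that position matters: a reference that appears before the assignment in the source is still treated as a method call.

```ruby
class DefinedDemo
  attr_accessor :a

  def initialize
    @a = 42
  end

  def t
    before = [defined?(a), a] # parsed before the assignment: a method call
    if false
      a = :never_assigned     # never runs, but the parser has now seen it
    end
    after = [defined?(a), a]  # parsed after the assignment: a local variable
    [before, after]
  end
end

p DefinedDemo.new.t
# => [["method", 42], ["local-variable", nil]]
```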
The confusing part of this problem was that the body of the if expression wasn’t being evaluated so there was no way that anything new could have been assigned to the instance variable. But sure enough, the value of a ended up being nil. It was clear after this investigation that we’d actually introduced another variable of the same name.
For an example Ruby script that will create two files with the different bytecode instructions used here, see https://gist.github.com/grahamlyons/b8487540dc283b38b4f7.
(The solution was to introduce a local variable with a new name, having either the method argument or the instance variable assigned to it – in actual fact, much more sane than overwriting the instance variable anyway…)
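A sketch of that fix, under the same illustrative names as the earlier example: effective_value and the FixedHost class are made up here, and the real method did more work before returning.

```ruby
module ExistingModule
  def existing_method(first_parameter, new_overriding_parameter = nil)
    # A differently named local, so the accessor is never shadowed.
    # `||` mirrors the truthiness check in the original `if`.
    effective_value = new_overriding_parameter || instance_variable
    # Do some other stuff with effective_value before returning
    effective_value
  end
end

# Hypothetical host class standing in for the Chef recipe context.
class FixedHost
  include ExistingModule
  attr_accessor :instance_variable

  def initialize
    @instance_variable = "from the host"
  end
end

p FixedHost.new.existing_method(:ignored)             # => "from the host"
p FixedHost.new.existing_method(:ignored, "override") # => "override"
```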
When we want to refer to the attribute we can be more explicit by prefixing it with self., which makes the method call on the current object explicit. This is the same call that the virtual machine makes implicitly when the name isn’t available in the local table. Interestingly, there is a difference in the bytecode instructions when calling a compared to self.a.
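Before turning to the bytecode, here is a small sketch of what the explicit receiver buys us at runtime when a local variable of the same name is in play. The class name is hypothetical; the method body follows the earlier T example.

```ruby
class ExplicitReceiver
  attr_accessor :a

  def initialize
    @a = 42
  end

  def t(this_a = nil)
    if this_a
      a = this_a
    end
    # a      -> the local variable (nil unless this_a was given)
    # self.a -> always the accessor method
    # @a     -> always the instance variable itself
    [a, self.a, @a]
  end
end

p ExplicitReceiver.new.t    # => [nil, 42, 42]
p ExplicitReceiver.new.t(7) # => [7, 42, 42]
```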
By putting the following code into a file called test.rb and running it with ruby --dump insns ./test.rb (another way to disassemble the code) we can inspect the bytecode:
    class T
      attr_accessor(:a)

      def initialize
        @a = 42
      end

      def t
        self.a
        a
        @a
      end
    end
The output looks like this:
    == disasm: <RubyVM::InstructionSequence:t@./test.rb>====================
    0000 trace 8 ( 8)
    0002 trace 1 ( 9)
    0004 putself
    0005 opt_send_without_block <callinfo!mid:a, argc:0, ARGS_SIMPLE>
    0007 pop
    0008 trace 1 ( 10)
    0010 putself
    0011 opt_send_without_block <callinfo!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
    0013 pop
    0014 trace 1 ( 11)
    0016 getinstancevariable :@a, <is:0>
    0019 trace 16 ( 12)
    0021 leave ( 11)
The pertinent instructions here are 0004 and 0005, which look very similar to 0010 and 0011. Both pairs include the instruction putself, even though only the first one explicitly refers to the current object in the Ruby code. Interestingly, there is a difference in the flags attached to the two calls: ARGS_SIMPLE for self.a vs FCALL|VCALL|ARGS_SIMPLE for the bare a. The reason for this difference is left as an exercise for the reader, i.e. I don’t know…
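One observable difference that appears to line up with those flags, offered as a hint and not as a full explanation: a bare identifier such as a could be either a variable or a method, so when neither exists Ruby raises NameError, whereas with an explicit receiver it can only be a method call and the failure is NoMethodError. The class and method names below are hypothetical.

```ruby
class CallStyles
  def bare
    missing_thing        # could be a variable or a method
  rescue NameError => e
    e.class
  end

  def explicit
    self.missing_thing   # can only be a method call
  rescue NameError => e  # NoMethodError is a subclass of NameError
    e.class
  end
end

p CallStyles.new.bare     # => NameError
p CallStyles.new.explicit # => NoMethodError
```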
And now for the Columbo moment: just one more thing… Instruction 0016 above accesses the instance variable @a directly, but does nothing with it other than leaving it to be returned from the method. If the @a line is moved one line up in the Ruby source, to where its value won’t be returned, then no instruction for it appears in the output at all: the compiler optimises it away. Curiouser and curiouser…
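That last observation can be checked with a sketch along the same lines as before, again assuming MRI. The VoidContext class and the ivar_read_compiled? helper are hypothetical names; the check simply looks for a getinstancevariable instruction in each method's disassembly.

```ruby
class VoidContext
  def unused_read
    @a  # value is discarded, so the read has no effect
    nil
  end

  def returned_read
    @a  # value is the method's return value
  end
end

# Hypothetical helper: does the named method contain an instance variable read?
def ivar_read_compiled?(method_name)
  iseq = RubyVM::InstructionSequence.of(VoidContext.new.method(method_name))
  iseq.disasm.include?("getinstancevariable")
end

puts "unused read compiled?   #{ivar_read_compiled?(:unused_read)}"
puts "returned read compiled? #{ivar_read_compiled?(:returned_read)}"
```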