Interacting with the Ruby scheduler
2019-12-07
I'm trying to wrap my head around how Ruby application code interacts with the (thread) scheduler in the runtime. My main interest is the interaction between C extensions and the scheduler.
I'm thinking about / looking at MRI Ruby 2.6. Usual disclaimer: this is me thinking out loud as I learn. Lots of guessing and speculation.
The elephant in the room is the Global VM Lock (GVL), also known as the Global Interpreter Lock (GIL). Ruby threads (thread.c) are backed by Posix threads (thread_pthread.c), unless you're on win32 (thread_win32.c). I don't know or work with win32 so I'm going to ignore that part.
Every time you create a new Ruby thread (Thread.new) you create a new Posix thread. The OS will schedule those as it sees fit. Most of the time, though, those threads will just sit around blocked on trying to acquire the GVL. Only one thread at a time can use the Ruby VM. Because of this, a multi-threaded Ruby program is seriously limited in how well it can utilize multiple CPUs.
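A rough way to see this limit for yourself, as a sketch: the script below runs the same pure-Ruby, CPU-bound work first sequentially and then in one thread per job. The `fib` workload, the helper names and the job count are all made up for illustration. On MRI I would expect the two timings to come out close to each other, because only one thread at a time can hold the GVL, but I am not making promises about the exact numbers.

```ruby
require 'benchmark'

# Deliberately slow, pure-Ruby CPU work (hypothetical workload).
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Run the jobs one after another on the calling thread.
def run_sequential(jobs, n)
  Array.new(jobs) { fib(n) }
end

# Run each job in its own Ruby thread and collect the results.
def run_threaded(jobs, n)
  Array.new(jobs) { Thread.new { fib(n) } }.map(&:value)
end

sequential_time = Benchmark.realtime { run_sequential(4, 25) }
threaded_time   = Benchmark.realtime { run_threaded(4, 25) }

puts format('sequential: %.3fs', sequential_time)
puts format('threaded:   %.3fs', threaded_time)
```

If the threaded run is not meaningfully faster on a multi-core machine, that is the GVL at work.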
One of the main things we want from a scheduler is fairness: concurrent tasks should all make a fair amount of progress. You don't want some task to stall for a long time at the expense of others.
Fairness is especially important in multi-threaded application servers where you handle requests from many unrelated clients (users). As a user you don't want someone else's slow request to make your otherwise fast request slow.
Cooperative schedulers (e.g. Ruby, Go) cannot guarantee fairness; because they are cooperative they are at the mercy of the application code. Even within cooperative schedulers I suspect Ruby is at the unfair end of the fairness spectrum.
That is a strong claim and I may be wrong. But here is why I think this is the case.
- Ruby application code typically relies a lot on C extensions
- Ruby C extensions commonly do not release the GVL, except for doing IO or syscalls
I can back up neither of these claims with numbers; they are gut feelings based on my experience with working at GitLab. Take this with as many grains of salt as necessary.
The first claim rests on two things. One, the high number of gems (Ruby libraries) in GitLab that have C extensions. Two, the general philosophy I have picked up from books (which books?) of building applications in "scripting languages" for their high initial development velocity, and then replacing the slow parts with C. This is not the only way to grow applications over time but it is a way, and I think this philosophy is in line with the idea of building applications in Ruby on Rails.
To be honest I would like to have a better understanding of whether C extensions really are that prevalent in Ruby and if so why. For now I make do with this strong impression.
The second claim is also something that ought to be backed up by an analysis. I might do one one day. For now I have been doing spot checks on some of the Ruby libraries used in GitLab, and usually see little or no interaction with the scheduler -- except in libraries that do IO outside the standard library, like kgio, nio4r, raindrops, grpc.
Browsing through the Ruby source code I see the following reasons for code to interact with the scheduler:
- "I'm blocked, please let another thread take the GVL": rb_thread_sleep_*
- "Now might be a good time to pass the GVL to another thread": rb_thread_schedule
- "I'm going to be busy for a while without touching the Ruby VM": rb_thread_call_without_gvl
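The first two of these can be poked at from plain Ruby, without writing any C. As far as I can tell, Kernel#sleep parks the thread through the sleep machinery in thread.c, and Thread.pass calls rb_thread_schedule. The sketch below runs one thread of each kind and records what they do; `run_demo` and the event names are hypothetical, and the interleaving you get will vary from run to run.

```ruby
# Run two threads that append events to a shared, thread-safe log.
def run_demo(steps)
  log = Queue.new

  # "Now might be a good time": Thread.pass hints that another thread
  # may take the GVL between steps.
  polite = Thread.new do
    steps.times do |i|
      log << [:polite, i]
      Thread.pass
    end
  end

  # "I'm blocked": Kernel#sleep gives up the GVL until the timeout expires.
  sleepy = Thread.new do
    steps.times do |i|
      log << [:sleepy, i]
      sleep 0.001
    end
  end

  [polite, sleepy].each(&:join)
  Array.new(log.size) { log.pop }
end

events = run_demo(5)
puts events.map { |name, i| "#{name}#{i}" }.join(' ')
```

The only thing I would rely on here is that each thread's own events appear in order; how the two threads interleave is up to the scheduler.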
I don't have a good understanding of reason (2) yet. My hunch is that this is mainly done to prevent deadlocks.
Reasons (1) and (3) line up well with IO and system calls. In Ruby itself, you can see rb_thread_call_without_gvl being used for that in dir.c, file.c, io.c and process.c. You can see sleep-related calls (rb_thread_sleep_*) in io.c and process.c.
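A small experiment that is consistent with blocking IO giving up the GVL, as a sketch: one thread blocks in IO#read on an empty pipe, and the main thread then writes to that pipe. If the blocked reader were still holding the GVL, the main thread could never run the write and the script would hang. The helper name `write_while_reader_blocked` is made up for this example.

```ruby
def write_while_reader_blocked(message)
  reader, writer = IO.pipe
  reader_thread = Thread.new { reader.read } # blocks until EOF on the pipe

  # Wait until the reader is actually parked in the blocking read.
  # Thread#status reports "sleep" for a thread that is sleeping or waiting on IO.
  Thread.pass until reader_thread.status == 'sleep'

  # The main thread keeps running Ruby code while the reader is blocked.
  writer.write(message)
  writer.close # EOF lets reader.read return

  data = reader_thread.value
  reader.close
  data
end

puts write_while_reader_blocked('hello from the main thread')
```

This does not show which C function does the releasing, only that a thread stuck in a read does not stop other Ruby threads from making progress.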
There is an interesting exception to the "IO and syscalls" pattern in bignum.c. It appears that there we have a function that does arithmetic, so probably something that burns through a lot of CPU, running outside the GVL.
Another major exception is in ext/fiddle/function.c. Fiddle is a library that lets you dynamically define FFI calls into C libraries. There it appears that such function calls release the GVL by default, which is good, and contrary to the main pessimistic thrust of this note. I don't think it takes away from my claims, though, because my impression is that a lot of C extensions do not use Fiddle, so they don't benefit from this "release GVL by default" behavior.
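To make the Fiddle case a bit more concrete, here is a sketch that wraps the C library's usleep with Fiddle and calls it from a second thread, while the main thread counts how many loop iterations it manages in the meantime. It assumes a Unix-like system where `usleep` can be found through Fiddle::Handle::DEFAULT and a Ruby build that ships Fiddle; the helper names are hypothetical. If the call releases the GVL, as function.c suggests, I would expect the tick count to be large.

```ruby
require 'fiddle'

# Wrap int usleep(useconds_t) from the already-loaded C library.
def build_usleep
  Fiddle::Function.new(
    Fiddle::Handle::DEFAULT['usleep'],
    [Fiddle::TYPE_INT],
    Fiddle::TYPE_INT
  )
end

# Call usleep in a second thread and count main-thread loop iterations
# that happen while that C call is in progress.
def ticks_during_c_call(microseconds)
  c_usleep = build_usleep
  caller_thread = Thread.new { c_usleep.call(microseconds) }

  ticks = 0
  while caller_thread.alive?
    ticks += 1
    Thread.pass
  end

  [caller_thread.value, ticks]
end

return_value, ticks = ticks_during_c_call(100_000)
puts "usleep returned #{return_value}, main thread ticked #{ticks} times"
```

A C extension that calls the same function directly, without going through rb_thread_call_without_gvl, would by my reading keep the GVL for the whole 100 milliseconds.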
I see some instances of rb_thread_call_without_gvl in the Socket, Readline, Zlib and OpenSSL parts of the Ruby stdlib. I suspect that those are all also syscalls or IO. In OpenSSL, for example, the GVL is released around function calls that generate keys. That usually involves IO from /dev/random or something related.
Tags: ruby