diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index c67a96a2a389..acdad96f78e9 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -1,5 +1,3 @@ - - @@ -65,8 +63,8 @@ All that aside, here are the categories of currently known RCU requirements:
This is followed by a summary, -which is in turn followed by the inevitable -answers to the quick quizzes. +however, the answer to each quick quiz immediately follows the quiz. +Select the big white space with your mouse to see the answer.
Quick Quiz 1:
-Wait a minute!
-You said that updaters can make useful forward progress concurrently
-with readers, but pre-existing readers will block
-synchronize_rcu()!!!
-Just who are you trying to fool???
-
Answer
+
Quick Quiz: |
---|
+ Wait a minute! + You said that updaters can make useful forward progress concurrently + with readers, but pre-existing readers will block + synchronize_rcu()!!! + Just who are you trying to fool??? + |
Answer: |
+ First, if updaters do not wish to be blocked by readers, they can use + call_rcu() or kfree_rcu(), which will + be discussed later. + Second, even when using synchronize_rcu(), the other + update-side code does run concurrently with readers, whether + pre-existing or not. + |
This scenario resembles one of the first uses of RCU in @@ -210,9 +222,20 @@ to guarantee that do_something() never runs concurrently with recovery(), but with little or no synchronization overhead in do_something_dlm(). -
Quick Quiz 2:
-Why is the synchronize_rcu() on line 28 needed?
-
Answer
+
Quick Quiz: |
---|
+ Why is the synchronize_rcu() on line 28 needed? + |
Answer: |
+ Without that extra grace period, memory reordering could result in + do_something_dlm() executing do_something() + concurrently with the last bits of recovery(). + |
In order to avoid fatal problems such as deadlocks, @@ -332,12 +355,27 @@ It also prevents any number of “interesting” compiler optimizations, for example, the use of gp as a scratch location immediately preceding the assignment. -
Quick Quiz 3:
-But rcu_assign_pointer() does nothing to prevent the
-two assignments to p->a and p->b
-from being reordered.
-Can't that also cause problems?
-
Answer
+
Quick Quiz: |
---|
+ But rcu_assign_pointer() does nothing to prevent the + two assignments to p->a and p->b + from being reordered. + Can't that also cause problems? + |
Answer: |
+ No, it cannot. + The readers cannot see either of these two fields until + the assignment to gp, by which time both fields are + fully initialized. + So reordering the assignments + to p->a and p->b cannot possibly + cause any problems. + |
It is tempting to assume that the reader need not do anything special @@ -494,11 +532,42 @@ The rcu_access_pointer() on line 6 is similar to code protected by the corresponding update-side lock. -
Quick Quiz 4:
-Without the rcu_dereference() or the
-rcu_access_pointer(), what destructive optimizations
-might the compiler make use of?
-
Answer
+
Quick Quiz: |
---|
+ Without the rcu_dereference() or the + rcu_access_pointer(), what destructive optimizations + might the compiler make use of? + |
Answer: |
+ Let's start with what happens to do_something_gp()
+ if it fails to use rcu_dereference().
+ It could reuse a value formerly fetched from this same pointer.
+ It could also fetch the pointer from gp in a byte-at-a-time
+ manner, resulting in load tearing, in turn resulting in a bytewise
+ mash-up of two distinct pointer values.
+ It might even use value-speculation optimizations, where it makes
+ a wrong guess, but by the time it gets around to checking the
+ value, an update has changed the pointer to match the wrong guess.
+ Too bad about any dereferences that returned pre-initialization garbage
+ in the meantime!
+
+
+ + For remove_gp_synchronous(), as long as all modifications + to gp are carried out while holding gp_lock, + the above optimizations are harmless. + However, + with CONFIG_SPARSE_RCU_POINTER=y, + sparse will complain if you + define gp with __rcu and then + access it without using + either rcu_access_pointer() or rcu_dereference(). + |
In short, RCU's publish-subscribe guarantee is provided by the combination @@ -571,28 +640,156 @@ systems with more than one CPU: synchronize_rcu() migrates in the meantime. -
Quick Quiz 5:
-Given that multiple CPUs can start RCU read-side critical sections
-at any time without any ordering whatsoever, how can RCU possibly tell whether
-or not a given RCU read-side critical section starts before a
-given instance of synchronize_rcu()?
-
Answer
+
Quick Quiz: |
---|
+ Given that multiple CPUs can start RCU read-side critical sections + at any time without any ordering whatsoever, how can RCU possibly + tell whether or not a given RCU read-side critical section starts + before a given instance of synchronize_rcu()? + |
Answer: |
+ If RCU cannot tell whether or not a given + RCU read-side critical section starts before a + given instance of synchronize_rcu(), + then it must assume that the RCU read-side critical section + started first. + In other words, a given instance of synchronize_rcu() + can avoid waiting on a given RCU read-side critical section only + if it can prove that synchronize_rcu() started first. + |
Quick Quiz 6:
-The first and second guarantees require unbelievably strict ordering!
-Are all these memory barriers really required?
-
Answer
+
Quick Quiz: |
---|
+ The first and second guarantees require unbelievably strict ordering! + Are all these memory barriers really required? + |
Answer: |
+ Yes, they really are required.
+ To see why the first guarantee is required, consider the following
+ sequence of events:
+
- Quick Quiz 7:
-You claim that rcu_read_lock() and rcu_read_unlock()
-generate absolutely no code in some kernel builds.
-This means that the compiler might arbitrarily rearrange consecutive
-RCU read-side critical sections.
-Given such rearrangement, if a given RCU read-side critical section
-is done, how can you be sure that all prior RCU read-side critical
-sections are done?
-Won't the compiler rearrangements make that impossible to determine?
-
+ Therefore, there absolutely must be a full memory barrier between the + end of the RCU read-side critical section and the end of the + grace period. + + + + The sequence of events demonstrating the necessity of the second rule + is roughly similar: + + +
+ And similarly, without a memory barrier between the beginning of the + grace period and the beginning of the RCU read-side critical section, + CPU 1 might end up accessing the freelist. + + + + The “as if” rule of course applies, so that any + implementation that acts as if the appropriate memory barriers + were in place is a correct implementation. + That said, it is much easier to fool yourself into believing + that you have adhered to the as-if rule than it is to actually + adhere to it! + |
Quick Quiz: |
---|
+ You claim that rcu_read_lock() and rcu_read_unlock() + generate absolutely no code in some kernel builds. + This means that the compiler might arbitrarily rearrange consecutive + RCU read-side critical sections. + Given such rearrangement, if a given RCU read-side critical section + is done, how can you be sure that all prior RCU read-side critical + sections are done? + Won't the compiler rearrangements make that impossible to determine? + |
Answer: |
+ In cases where rcu_read_lock() and rcu_read_unlock()
+ generate absolutely no code, RCU infers quiescent states only at
+ special locations, for example, within the scheduler.
+ Because calls to schedule() had better prevent calling-code
+ accesses to shared variables from being rearranged across the call to
+ schedule(), if RCU detects the end of a given RCU read-side
+ critical section, it will necessarily detect the end of all prior
+ RCU read-side critical sections, no matter how aggressively the
+ compiler scrambles the code.
+
+
+ + Again, this all assumes that the compiler cannot scramble code across + calls to the scheduler, out of interrupt handlers, into the idle loop, + into user-mode code, and so on. + But if your kernel build allows that sort of scrambling, you have broken + far more than just RCU! + |
Note that these memory-barrier requirements do not replace the fundamental @@ -637,9 +834,19 @@ inconvenience can be avoided through use of the call_rcu() and kfree_rcu() API members described later in this document. -
Quick Quiz 8:
-But how does the upgrade-to-write operation exclude other readers?
-
Answer
+
Quick Quiz: |
---|
+ But how does the upgrade-to-write operation exclude other readers? + |
Answer: |
+ It doesn't, just like normal RCU updates, which also do not exclude + RCU readers. + |
This guarantee allows lookup code to be shared between read-side @@ -725,9 +932,20 @@ to do significant reordering. This is by design: Any significant ordering constraints would slow down these fast-path APIs. -
Quick Quiz 9:
-Can't the compiler also reorder this code?
-
Answer
+
Quick Quiz: |
---|
+ Can't the compiler also reorder this code? + |
Answer: |
+ No, the volatile casts in READ_ONCE() and + WRITE_ONCE() prevent the compiler from reordering in + this particular case. + |
Quick Quiz 10:
-Suppose that synchronize_rcu() did wait until all readers had completed.
-Would the updater be able to rely on this?
-
Answer
+
Quick Quiz: |
---|
+ Suppose that synchronize_rcu() did wait until all readers had completed. + Would the updater be able to rely on this? + |
Answer: |
+ No. + Even if synchronize_rcu() were to wait until + all readers had completed, a new reader might start immediately after + synchronize_rcu() completed. + Therefore, the code following + synchronize_rcu() cannot rely on there being no readers + in any case. + |
Quick Quiz 11:
-How long a sequence of grace periods, each separated by an RCU read-side
-critical section, would be required to partition the RCU read-side
-critical sections at the beginning and end of the chain?
-
Answer
+
Quick Quiz: |
---|
+ How long a sequence of grace periods, each separated by an RCU + read-side critical section, would be required to partition the RCU + read-side critical sections at the beginning and end of the chain? + |
Answer: |
+ In theory, an infinite number. + In practice, an unknown number that is sensitive to both implementation + details and timing considerations. + Therefore, even in practice, RCU users must abide by the + theoretical rather than the practical answer. + |
Quick Quiz 12:
-What about sleeping locks?
-
Answer
+
Quick Quiz: |
---|
+ What about sleeping locks? + |
Answer: |
+ These are forbidden within Linux-kernel RCU read-side critical
+ sections because it is not legal to place a quiescent state
+ (in this case, voluntary context switch) within an RCU read-side
+ critical section.
+ However, sleeping locks may be used within userspace RCU read-side
+ critical sections, and also within Linux-kernel sleepable RCU
+ (SRCU)
+ read-side critical sections.
+ In addition, the -rt patchset turns spinlocks into
+ sleeping locks so that the corresponding critical sections
+ can be preempted, which also means that these sleeplockified
+ spinlocks (but not other sleeping locks!) may be acquired within
+ -rt-Linux-kernel RCU read-side critical sections.
+
+
+ + Note that it is legal for a normal RCU read-side + critical section to conditionally acquire a sleeping lock + (as in mutex_trylock()), but only as long as it does + not loop indefinitely attempting to conditionally acquire that + sleeping lock. + The key point is that things like mutex_trylock() + either return with the mutex held, or return an error indication if + the mutex was not immediately available. + Either way, mutex_trylock() returns immediately without + sleeping. + |
It often comes as a surprise that many algorithms do not require a @@ -1378,12 +1658,27 @@ write an RCU callback function that takes too long. Long-running operations should be relegated to separate threads or (in the Linux kernel) workqueues. -
Quick Quiz 13:
-Why does line 19 use rcu_access_pointer()?
-After all, call_rcu() on line 25 stores into the
-structure, which would interact badly with concurrent insertions.
-Doesn't this mean that rcu_dereference() is required?
-
Answer
+
Quick Quiz: |
---|
+ Why does line 19 use rcu_access_pointer()? + After all, call_rcu() on line 25 stores into the + structure, which would interact badly with concurrent insertions. + Doesn't this mean that rcu_dereference() is required? + |
Answer: |
+ Presumably the ->gp_lock acquired on line 18 excludes + any changes, including any insertions that rcu_dereference() + would protect against. + Therefore, any insertions will be delayed until after + ->gp_lock + is released on line 25, which in turn means that + rcu_access_pointer() suffices. + |
However, all that remove_gp_cb() is doing is @@ -1430,14 +1725,31 @@ This was due to the fact that RCU was not heavily used within DYNIX/ptx, so the very few places that needed something like synchronize_rcu() simply open-coded it. -
Quick Quiz 14:
-Earlier it was claimed that call_rcu() and
-kfree_rcu() allowed updaters to avoid being blocked
-by readers.
-But how can that be correct, given that the invocation of the callback
-and the freeing of the memory (respectively) must still wait for
-a grace period to elapse?
-
Answer
+
Quick Quiz: |
---|
+ Earlier it was claimed that call_rcu() and + kfree_rcu() allowed updaters to avoid being blocked + by readers. + But how can that be correct, given that the invocation of the callback + and the freeing of the memory (respectively) must still wait for + a grace period to elapse? + |
Answer: |
+ We could define things this way, but keep in mind that this sort of + definition would say that updates in garbage-collected languages + cannot complete until the next time the garbage collector runs, + which does not seem at all reasonable. + The key point is that in most cases, an updater using either + call_rcu() or kfree_rcu() can proceed to the + next update as soon as it has invoked call_rcu() or + kfree_rcu(), without having to wait for a subsequent + grace period. + |
But what if the updater must wait for the completion of code to be @@ -1862,11 +2174,26 @@ kthreads to be spawned. Therefore, invoking synchronize_rcu() during scheduler initialization can result in deadlock. -
Quick Quiz 15:
-So what happens with synchronize_rcu() during
-scheduler initialization for CONFIG_PREEMPT=n
-kernels?
-
Answer
+
Quick Quiz: |
---|
+ So what happens with synchronize_rcu() during + scheduler initialization for CONFIG_PREEMPT=n + kernels? + |
Answer: |
+ In CONFIG_PREEMPT=n kernels, synchronize_rcu() + maps directly to synchronize_sched(). + Therefore, synchronize_rcu() works normally throughout + boot in CONFIG_PREEMPT=n kernels. + However, your code must also work in CONFIG_PREEMPT=y kernels, + so it is still necessary to avoid invoking synchronize_rcu() + during scheduler initialization. + |
I learned of these boot-time requirements as a result of a series of @@ -2571,10 +2898,23 @@ If you needed to wait on multiple different flavors of SRCU (but why???), you would need to create a wrapper function resembling call_my_srcu() for each SRCU flavor. -
Quick Quiz 16:
-But what if I need to wait for multiple RCU flavors, but I also need
-the grace periods to be expedited?
-
Answer
+
Quick Quiz: |
---|
+ But what if I need to wait for multiple RCU flavors, but I also need + the grace periods to be expedited? + |
Answer: |
+ If you are using expedited grace periods, there should be less penalty + for waiting on them in succession. + But if that is nevertheless a problem, you can use workqueues + or multiple kthreads to wait on the various expedited grace + periods concurrently. + |
Again, it is usually better to adjust the RCU read-side critical sections @@ -2678,377 +3018,4 @@ and is provided under the terms of the Creative Commons Attribution-Share Alike 3.0 United States license. -
Quick Quiz 1: -Wait a minute! -You said that updaters can make useful forward progress concurrently -with readers, but pre-existing readers will block -synchronize_rcu()!!! -Just who are you trying to fool??? - - -
Answer: -First, if updaters do not wish to be blocked by readers, they can use -call_rcu() or kfree_rcu(), which will -be discussed later. -Second, even when using synchronize_rcu(), the other -update-side code does run concurrently with readers, whether pre-existing -or not. - - -
Back to Quick Quiz 1. - - -
Quick Quiz 2: -Why is the synchronize_rcu() on line 28 needed? - - -
Answer: -Without that extra grace period, memory reordering could result in -do_something_dlm() executing do_something() -concurrently with the last bits of recovery(). - - -
Back to Quick Quiz 2. - - -
Quick Quiz 3: -But rcu_assign_pointer() does nothing to prevent the -two assignments to p->a and p->b -from being reordered. -Can't that also cause problems? - - -
Answer: -No, it cannot. -The readers cannot see either of these two fields until -the assignment to gp, by which time both fields are -fully initialized. -So reordering the assignments -to p->a and p->b cannot possibly -cause any problems. - - -
Back to Quick Quiz 3. - - -
Quick Quiz 4: -Without the rcu_dereference() or the -rcu_access_pointer(), what destructive optimizations -might the compiler make use of? - - -
Answer: -Let's start with what happens to do_something_gp() -if it fails to use rcu_dereference(). -It could reuse a value formerly fetched from this same pointer. -It could also fetch the pointer from gp in a byte-at-a-time -manner, resulting in load tearing, in turn resulting a bytewise -mash-up of two distince pointer values. -It might even use value-speculation optimizations, where it makes a wrong -guess, but by the time it gets around to checking the value, an update -has changed the pointer to match the wrong guess. -Too bad about any dereferences that returned pre-initialization garbage -in the meantime! - -
-For remove_gp_synchronous(), as long as all modifications -to gp are carried out while holding gp_lock, -the above optimizations are harmless. -However, -with CONFIG_SPARSE_RCU_POINTER=y, -sparse will complain if you -define gp with __rcu and then -access it without using -either rcu_access_pointer() or rcu_dereference(). - - -
Back to Quick Quiz 4. - - -
Quick Quiz 5: -Given that multiple CPUs can start RCU read-side critical sections -at any time without any ordering whatsoever, how can RCU possibly tell whether -or not a given RCU read-side critical section starts before a -given instance of synchronize_rcu()? - - -
Answer: -If RCU cannot tell whether or not a given -RCU read-side critical section starts before a -given instance of synchronize_rcu(), -then it must assume that the RCU read-side critical section -started first. -In other words, a given instance of synchronize_rcu() -can avoid waiting on a given RCU read-side critical section only -if it can prove that synchronize_rcu() started first. - - -
Back to Quick Quiz 5. - - -
Quick Quiz 6: -The first and second guarantees require unbelievably strict ordering! -Are all these memory barriers really required? - - -
Answer: -Yes, they really are required. -To see why the first guarantee is required, consider the following -sequence of events: - -
-Therefore, there absolutely must be a full memory barrier between the -end of the RCU read-side critical section and the end of the -grace period. - -
-The sequence of events demonstrating the necessity of the second rule -is roughly similar: - -
-And similarly, without a memory barrier between the beginning of the -grace period and the beginning of the RCU read-side critical section, -CPU 1 might end up accessing the freelist. - -
-The “as if” rule of course applies, so that any implementation -that acts as if the appropriate memory barriers were in place is a -correct implementation. -That said, it is much easier to fool yourself into believing that you have -adhered to the as-if rule than it is to actually adhere to it! - - -
Back to Quick Quiz 6. - - -
Quick Quiz 7: -You claim that rcu_read_lock() and rcu_read_unlock() -generate absolutely no code in some kernel builds. -This means that the compiler might arbitrarily rearrange consecutive -RCU read-side critical sections. -Given such rearrangement, if a given RCU read-side critical section -is done, how can you be sure that all prior RCU read-side critical -sections are done? -Won't the compiler rearrangements make that impossible to determine? - - -
Answer: -In cases where rcu_read_lock() and rcu_read_unlock() -generate absolutely no code, RCU infers quiescent states only at -special locations, for example, within the scheduler. -Because calls to schedule() had better prevent calling-code -accesses to shared variables from being rearranged across the call to -schedule(), if RCU detects the end of a given RCU read-side -critical section, it will necessarily detect the end of all prior -RCU read-side critical sections, no matter how aggressively the -compiler scrambles the code. - -
-Again, this all assumes that the compiler cannot scramble code across -calls to the scheduler, out of interrupt handlers, into the idle loop, -into user-mode code, and so on. -But if your kernel build allows that sort of scrambling, you have broken -far more than just RCU! - - -
Back to Quick Quiz 7. - - -
Quick Quiz 8: -But how does the upgrade-to-write operation exclude other readers? - - -
Answer: -It doesn't, just like normal RCU updates, which also do not exclude -RCU readers. - - -
Back to Quick Quiz 8. - - -
Quick Quiz 9: -Can't the compiler also reorder this code? - - -
Answer: -No, the volatile casts in READ_ONCE() and -WRITE_ONCE() prevent the compiler from reordering in -this particular case. - - -
Back to Quick Quiz 9. - - -
Quick Quiz 10: -Suppose that synchronize_rcu() did wait until all readers had completed. -Would the updater be able to rely on this? - - -
Answer: -No. -Even if synchronize_rcu() were to wait until -all readers had completed, a new reader might start immediately after -synchronize_rcu() completed. -Therefore, the code following -synchronize_rcu() cannot rely on there being no readers -in any case. - - -
Back to Quick Quiz 10. - - -
Quick Quiz 11: -How long a sequence of grace periods, each separated by an RCU read-side -critical section, would be required to partition the RCU read-side -critical sections at the beginning and end of the chain? - - -
Answer: -In theory, an infinite number. -In practice, an unknown number that is sensitive to both implementation -details and timing considerations. -Therefore, even in practice, RCU users must abide by the theoretical rather -than the practical answer. - - -
Back to Quick Quiz 11. - - -
Quick Quiz 12: -What about sleeping locks? - - -
Answer: -These are forbidden within Linux-kernel RCU read-side critical sections -because it is not legal to place a quiescent state (in this case, -voluntary context switch) within an RCU read-side critical section. -However, sleeping locks may be used within userspace RCU read-side critical -sections, and also within Linux-kernel sleepable RCU -(SRCU) -read-side critical sections. -In addition, the -rt patchset turns spinlocks into a sleeping locks so -that the corresponding critical sections can be preempted, which -also means that these sleeplockified spinlocks (but not other sleeping locks!) -may be acquire within -rt-Linux-kernel RCU read-side critical sections. - -
-Note that it is legal for a normal RCU read-side critical section -to conditionally acquire a sleeping locks (as in mutex_trylock()), -but only as long as it does not loop indefinitely attempting to -conditionally acquire that sleeping locks. -The key point is that things like mutex_trylock() -either return with the mutex held, or return an error indication if -the mutex was not immediately available. -Either way, mutex_trylock() returns immediately without sleeping. - - -
Back to Quick Quiz 12. - - -
Quick Quiz 13: -Why does line 19 use rcu_access_pointer()? -After all, call_rcu() on line 25 stores into the -structure, which would interact badly with concurrent insertions. -Doesn't this mean that rcu_dereference() is required? - - -
Answer: -Presumably the ->gp_lock acquired on line 18 excludes -any changes, including any insertions that rcu_dereference() -would protect against. -Therefore, any insertions will be delayed until after ->gp_lock -is released on line 25, which in turn means that -rcu_access_pointer() suffices. - - -
Back to Quick Quiz 13. - - -
Quick Quiz 14: -Earlier it was claimed that call_rcu() and -kfree_rcu() allowed updaters to avoid being blocked -by readers. -But how can that be correct, given that the invocation of the callback -and the freeing of the memory (respectively) must still wait for -a grace period to elapse? - - -
Answer: -We could define things this way, but keep in mind that this sort of -definition would say that updates in garbage-collected languages -cannot complete until the next time the garbage collector runs, -which does not seem at all reasonable. -The key point is that in most cases, an updater using either -call_rcu() or kfree_rcu() can proceed to the -next update as soon as it has invoked call_rcu() or -kfree_rcu(), without having to wait for a subsequent -grace period. - - -
Back to Quick Quiz 14. - - -
Quick Quiz 15: -So what happens with synchronize_rcu() during -scheduler initialization for CONFIG_PREEMPT=n -kernels? - - -
Answer: -In CONFIG_PREEMPT=n kernel, synchronize_rcu() -maps directly to synchronize_sched(). -Therefore, synchronize_rcu() works normally throughout -boot in CONFIG_PREEMPT=n kernels. -However, your code must also work in CONFIG_PREEMPT=y kernels, -so it is still necessary to avoid invoking synchronize_rcu() -during scheduler initialization. - - -
Back to Quick Quiz 15. - - -
Quick Quiz 16: -But what if I need to wait for multiple RCU flavors, but I also need -the grace periods to be expedited? - - -
Answer: -If you are using expedited grace periods, there should be less penalty -for waiting on them in succession. -But if that is nevertheless a problem, you can use workqueues or multiple -kthreads to wait on the various expedited grace periods concurrently. - - -