1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

14679 commits

Author SHA1 Message Date
Rafael J. Wysocki
9ee71f513c Merge branch 'pm-cpuidle'
* pm-cpuidle:
  cpuidle: Measure idle state durations with monotonic clock
  cpuidle: fix a suspicious RCU usage in menu governor
  cpuidle: support multiple drivers
  cpuidle: prepare the cpuidle core to handle multiple drivers
  cpuidle: move driver checking within the lock section
  cpuidle: move driver's refcount to cpuidle
  cpuidle: fixup device.h header in cpuidle.h
  cpuidle / sysfs: move structure declaration into the sysfs.c file
  cpuidle: Get typical recent sleep interval
  cpuidle: Set residency to 0 if target Cstate not enter
  cpuidle: Quickly notice prediction failure in general case
  cpuidle: Quickly notice prediction failure for repeat mode
  cpuidle / sysfs: move kobj initialization in the syfs file
  cpuidle / sysfs: change function parameter
2012-11-29 21:46:14 +01:00
Rafael J. Wysocki
d4c091f13d Merge branch 'acpi-general'
* acpi-general: (38 commits)
  ACPI / thermal: _TMP and _CRT/_HOT/_PSV/_ACx dependency fix
  ACPI: drop unnecessary local variable from acpi_system_write_wakeup_device()
  ACPI: Fix logging when no pci_irq is allocated
  ACPI: Update Dock hotplug error messages
  ACPI: Update Container hotplug error messages
  ACPI: Update Memory hotplug error messages
  ACPI: Update CPU hotplug error messages
  ACPI: Add acpi_handle_<level>() interfaces
  ACPI: remove use of __devexit
  ACPI / PM: Add Sony Vaio VPCEB1S1E to nonvs blacklist.
  ACPI / battery: Correct battery capacity values on Thinkpads
  Revert "ACPI / x86: Add quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info"
  ACPI: create _SUN sysfs file
  ACPI / memhotplug: bind the memory device when the driver is being loaded
  ACPI / memhotplug: don't allow to eject the memory device if it is being used
  ACPI / memhotplug: free memory device if acpi_memory_enable_device() failed
  ACPI / memhotplug: fix memory leak when memory device is unbound from acpi_memhotplug
  ACPI / memhotplug: deal with eject request in hotplug queue
  ACPI / memory-hotplug: add memory offline code to acpi_memory_device_remove()
  ACPI / memory-hotplug: call acpi_bus_trim() to remove memory device
  ...

Conflicts:
	include/linux/acpi.h (two additions at the end of the same file)
2012-11-29 21:43:06 +01:00
Rafael J. Wysocki
c8b6817103 Merge branch 'pm-qos'
* pm-qos:
  PM / QoS: Handle device PM QoS flags while removing constraints
  PM / QoS: Resume device before exposing/hiding PM QoS flags
  PM / QoS: Document request manipulation requirement for flags
  PM / QoS: Fix a free error in the dev_pm_qos_constraints_destroy()
  PM / QoS: Fix the return value of dev_pm_qos_update_request()
  PM / ACPI: Take device PM QoS flags into account
  PM / Domains: Check device PM QoS flags in pm_genpd_poweroff()
  PM / QoS: Make it possible to expose PM QoS device flags to user space
  PM / QoS: Introduce PM QoS device flags support
  PM / QoS: Prepare struct dev_pm_qos_request for more request types
  PM / QoS: Introduce request and constraint data types for PM QoS flags
  PM / QoS: Prepare device structure for adding more constraint types
2012-11-29 21:40:32 +01:00
David S. Miller
8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
Al Viro
541880d9a2 do_coredump(): get rid of pt_regs argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:25 -05:00
Al Viro
4aaefee589 print_fatal_signal(): get rid of pt_regs argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:25 -05:00
Al Viro
94eb22d505 ptrace_signal(): get rid of unused arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:24 -05:00
Al Viro
b7f9591c44 get rid of ptrace_signal_deliver() arguments
the first one is equal to signal_pt_regs(), the second is never used
(and always NULL, while we are at it).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:24 -05:00
Al Viro
e80d6661c3 flagday: kill pt_regs argument of do_fork()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:08 -05:00
Al Viro
18c26c27ae death to idle_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro
62e791c1b8 don't pass regs to copy_process()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro
afa86fc426 flagday: don't pass regs to copy_thread()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro
c62d773a37 audit: no nested contexts anymore...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:36 -05:00
Al Viro
d2125043ae generic sys_fork / sys_vfork / sys_clone
... and get rid of idiotic struct pt_regs * in asm-generic/syscalls.h
prototypes of the same, while we are at it.  Eventually we want those
in linux/syscalls.h, of course, but that'll have to wait a bit.

Note that there are *three* variants of sys_clone() order of arguments.
Braindamage galore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:49:04 -05:00
Al Viro
c4144670fd kill daemonize()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:49:02 -05:00
Greg Thelen
9718ceb343 cgroup: list_del_init() on removed events
Use list_del_init() rather than list_del() to remove events from
cgrp->event_list.  No functional change.  This is just defensive
coding.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-28 13:52:14 -08:00
Greg Thelen
205a872bd6 cgroup: fix lockdep warning for event_control
The cgroup_event_wake() function is called with the wait queue head
locked and it takes cgrp->event_list_lock. However, in cgroup_rmdir()
remove_wait_queue() was being called after taking
cgrp->event_list_lock.  Correct the lock ordering by using a temporary
list to obtain the event list to remove from the wait queue.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-28 13:51:56 -08:00
Bill Pemberton
e3a1a5ec5c kernel/ksysfs.c: remove CONFIG_HOTPLUG ifdefs
Remove conditional code based on CONFIG_HOTPLUG being false.  It's
always on now in preparation of it going away as an option.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-28 10:33:03 -08:00
Bill Pemberton
3b572b506c sysctl: remove CONFIG_HOTPLUG ifdefs
Remove conditional code based on CONFIG_HOTPLUG being false.  It's
always on now in preparation of it going away as an option.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-28 10:33:03 -08:00
Frederic Weisbecker
fa09205783 cputime: Comment cputime's adjusting code
The reason for the scaling and monotonicity correction performed
by cputime_adjust() may not be immediately clear to the reviewer.

Add some comments to explain what happens there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 17:08:20 +01:00
Frederic Weisbecker
d37f761dbd cputime: Consolidate cputime adjustment code
task_cputime_adjusted() and thread_group_cputime_adjusted()
essentially share the same code. They just don't use the same
source:

* The first function uses the cputime in the task struct and the
previous adjusted snapshot that ensures monotonicity.

* The second adds the cputime of all tasks in the group and the
previous adjusted snapshot of the whole group from the signal
structure.

Just consolidate the common code that does the adjustment. These
functions just need to fetch the values from the appropriate
source.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 17:08:10 +01:00
Frederic Weisbecker
e80d0a1ae8 cputime: Rename thread_group_times to thread_group_cputime_adjusted
We have thread_group_cputime() and thread_group_times(). The naming
doesn't provide enough information about the difference between
these two APIs.

To lower the confusion, rename thread_group_times() to
thread_group_cputime_adjusted(). This name better suggests that
it's a version of thread_group_cputime() that does some stabilization
on the raw cputime values. ie here: scale on top of CFS runtime
stats and bound lower value for monotonicity.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 17:07:57 +01:00
Frederic Weisbecker
a634f93335 cputime: Move thread_group_cputime() to sched code
thread_group_cputime() is a general cputime API that is not only
used by posix cpu timer. Let's move this helper to sched code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 17:07:38 +01:00
Li Zhong
fddfb02ad0 cgroup: move list add after list head initilization
2243076ad1 ("cgroup: initialize cgrp->allcg_node in
init_cgroup_housekeeping()") initializes cgrp->allcg_node in
init_cgroup_housekeeping().  Then in init_cgroup_root(), we should
call init_cgroup_housekeeping() before adding it to &root->allcg_list;
otherwise, we are initializing an entry already in a list.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-28 06:02:39 -08:00
Marcelo Tosatti
e0b306fef9 time: export time information for KVM pvclock
As suggested by John, export time data similarly to how its
done by vsyscall support. This allows KVM to retrieve necessary
information to implement vsyscall support in KVM guests.

Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:12 -02:00
Marcelo Tosatti
582b336ec2 sched: add notifier for cross-cpu migrations
Originally from Jeremy Fitzhardinge.

Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:09 -02:00
Darren Hart
aa10990e02 futex: avoid wake_futex() for a PI futex_q
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed.  Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.

While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.

Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected.  To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.

This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.

[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26 17:41:24 -08:00
Chuansheng Liu
8ffeb9b0e6 watchdog: using u64 in get_sample_period()
In get_sample_period(), unsigned long is not enough:

  watchdog_thresh * 2 * (NSEC_PER_SEC / 5)

case1:
  watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800

case2:
 set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000

In case2, we need use u64 to express the sample period.  Otherwise,
changing the threshold thru proc often can not be successful.

Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26 17:41:24 -08:00
Thomas Gleixner
9c3f9e2816 Merge branch 'fortglx/3.8/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Fix trivial conflicts in: kernel/time/tick-sched.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-11-21 20:31:52 +01:00
Tao Ma
d0b2fdd2a5 cgroup: remove obsolete guarantee from cgroup_task_migrate.
'guarantee' is already removed from cgroup_task_migrate, so remove
the corresponding comments. Some other typos in cgroup are also
changed.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-20 06:44:58 -08:00
Eric W. Biederman
98f842e675 proc: Usable inode numbers for the namespace file descriptors.
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:49 -08:00
Eric W. Biederman
c450f371d4 userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
To keep things sane in the context of file descriptor passing derive the
user namespace that uids are mapped into from the opener of the file
instead of from current.

When writing to the maps file the lower user namespace must always
be the parent user namespace, or setting the mapping simply does
not make sense.  Enforce that the opener of the file was in
the parent user namespace or the user namespace whose mapping
is being set.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:55 -08:00
Eric W. Biederman
b2e0d98705 userns: Implement unshare of the user namespace
- Add CLONE_THREAD to the unshare flags if CLONE_NEWUSER is selected
  As changing user namespaces is only valid if all there is only
  a single thread.
- Restore the code to add CLONE_VM if CLONE_THREAD is selected and
  the code to addCLONE_SIGHAND if CLONE_VM is selected.
  Making the constraints in the code clear.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:14 -08:00
Eric W. Biederman
cde1975bc2 userns: Implent proc namespace operations
This allows entering a user namespace, and the ability
to store a reference to a user namespace with a bind
mount.

Addition of missing userns_ns_put in userns_install
from Gao feng <gaofeng@cn.fujitsu.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:13 -08:00
Eric W. Biederman
4c44aaafa8 userns: Kill task_user_ns
The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task.  struct cred may go away as
soon as the rcu lock is released.  This leads to a race where we
can dereference a stale user namespace pointer.

To make it obvious a struct cred is involved kill task_user_ns.

To kill the race modify the users of task_user_ns to only
reference the user namespace while the rcu lock is held.

Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:44 -08:00
Eric W. Biederman
bcf58e725d userns: Make create_new_namespaces take a user_ns parameter
Modify create_new_namespaces to explicitly take a user namespace
parameter, instead of implicitly through the task_struct.

This allows an implementation of unshare(CLONE_NEWUSER) where
the new user namespace is not stored onto the current task_struct
until after all of the namespaces are created.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:43 -08:00
Eric W. Biederman
142e1d1d5f userns: Allow unprivileged use of setns.
- Push the permission check from the core setns syscall into
  the setns install methods where the user namespace of the
  target namespace can be determined, and used in a ns_capable
  call.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:42 -08:00
Eric W. Biederman
b33c77ef23 userns: Allow unprivileged users to create new namespaces
If an unprivileged user has the appropriate capabilities in their
current user namespace allow the creation of new namespaces.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:41 -08:00
Eric W. Biederman
37657da3c5 userns: Allow setting a userns mapping to your current uid.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:40 -08:00
Dave Jones
bf3071f5a0 tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read
WARN shouldn't be used as a means of communicating failure to a userspace programmer.

Link: http://lkml.kernel.org/r/20120725153908.GA25203@redhat.com

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19 15:25:09 -05:00
Anton Vorontsov
717a9ef7f3 tracing: Remove unneeded checks from the stack tracer
It seems that 'ftrace_enabled' flag should not be used inside the tracer
functions. The ftrace core is using this flag for internal purposes, and
the flag wasn't meant to be used in tracers' runtime checks.

stack tracer is the only tracer that abusing the flag. So stop it from
serving as a bad example.

Also, there is a local 'stack_trace_disabled' flag in the stack tracer,
which is never updated; so it can be removed as well.

Link: http://lkml.kernel.org/r/1342637761-9655-1-git-send-email-anton.vorontsov@linaro.org

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19 15:07:13 -05:00
Tejun Heo
0a950f65e1 cgroup: add cgroup->id
With the introduction of generic cgroup hierarchy iterators, css_id is
being phased out.  It was unnecessarily complex, id'ing the wrong
thing (cgroups need IDs, not CSSes) and has other oddities like not
being available at ->css_alloc().

This patch adds cgroup->id, which is a simple per-hierarchy
ida-allocated ID which is assigned before ->css_alloc() and released
after ->css_free().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2012-11-19 09:02:12 -08:00
Tejun Heo
033fa1c5f5 cgroup, cpuset: remove cgroup_subsys->post_clone()
Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone().  Now
that clone_children is cpuset specific, there's no reason to have this
rather odd option activation mechanism in cgroup core.  cpuset can
check the flag from its ->css_allocate() and take the necessary
action.

Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().

Loosely based on Glauber's "generalize post_clone into post_create"
patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Glauber Costa <glommer@parallels.com>
Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:39 -08:00
Tejun Heo
2260e7fc1f cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation.  Also, drop
clone_children() wrapper in cgroup.c.  The thin wrapper is used only a
few times and one of them will go away soon.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:38 -08:00
Tejun Heo
92fb97487a cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are.  Also, update documentation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:38 -08:00
Tejun Heo
b1929db42f cgroup: allow ->post_create() to fail
There could be cases where controllers want to do initialization
operations which may fail from ->post_create().  This patch makes
->post_create() return -errno to indicate failure and online_css()
relay such failures.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:38 -08:00
Tejun Heo
4b8b47eb00 cgroup: update cgroup_create() failure path
cgroup_create() was ignoring failure of cgroupfs files.  Update it
such that, if file creation fails, it rolls back by calling
cgroup_destroy_locked() and returns failure.

Note that error out goto labels are renamed.  The labels are a bit
confusing but will become better w/ later cgroup operation renames.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:38 -08:00
Tejun Heo
b8a2df6a5b cgroup: use mutex_trylock() when grabbing i_mutex of a new cgroup directory
All cgroup directory i_mutexes nest outside cgroup_mutex; however, new
directory creation is a special case.  A new cgroup directory is
created while holding cgroup_mutex.  Populating the new directory
requires both the new directory's i_mutex and cgroup_mutex.  Because
all directory i_mutexes nest outside cgroup_mutex, grabbing both
requires releasing cgroup_mutex first, which isn't a good idea as the
new cgroup isn't yet ready to be manipulated by other cgroup
opreations.

This is worked around by grabbing the new directory's i_mutex while
holding cgroup_mutex before making it visible.  As there's no other
user at that point, grabbing the i_mutex under cgroup_mutex can't lead
to deadlock.

cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to
worry about the reverse locking order; however, this creates pseudo
locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true -
all directory i_mutexes are still nested outside cgroup_mutex.  This
pseudo locking dependency can lead to spurious lockdep warnings.

Use mutex_trylock() instead.  This will always succeed and lockdep
doesn't create any locking dependency for it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo
d19e19de48 cgroup: simplify cgroup_load_subsys() failure path
Now that cgroup_unload_subsys() can tell whether the root css is
online or not, we can safely call cgroup_unload_subsys() after idr
init failure in cgroup_load_subsys().

Replace the manual unrolling and invoke cgroup_unload_subsys() on
failure.  This drops cgroup_mutex inbetween but should be safe as the
subsystem will fail try_module_get() and thus can't be mounted
inbetween.  As this means that cgroup_unload_subsys() can be called
before css_sets are rehashed, remove BUG_ON() on %NULL
css_set->subsys[] from cgroup_unload_subsys().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo
a31f2d3ff7 cgroup: introduce CSS_ONLINE flag and on/offline_css() helpers
New helpers on/offline_css() respectively wrap ->post_create() and
->pre_destroy() invocations.  online_css() sets CSS_ONLINE after
->post_create() is complete and offline_css() invokes ->pre_destroy()
iff CSS_ONLINE is set and clears it while also handling the temporary
dropping of cgroup_mutex.

This patch doesn't introduce any behavior change at the moment but
will be used to improve cgroup_create() failure path and allow
->post_create() to fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00