There's an existing helper for setting TASK_RUNNING; must've gotten
lost last time we did this cleanup.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.409696194@infradead.org
With the introduction of per-cpu wakeup devices that can be used in
preference to the broadcast timer, print the name of such devices when
they are available.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-6-will@kernel.org
When configuring the broadcast timer on entry to and exit from deep idle
states, prefer a per-CPU wakeup timer if one exists.
On entry to idle, stop the tick device and transfer the next event into
the oneshot wakeup device, which will serve as the wakeup from idle. To
avoid the overhead of additional hardware accesses on exit from idle,
leave the timer armed and treat the inevitable interrupt as a (possibly
spurious) tick event.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-5-will@kernel.org
Some SoCs have two per-cpu timer implementations where the timer with the
higher rating stops in deep idle (i.e. suffers from CLOCK_EVT_FEAT_C3STOP)
but is otherwise preferable to the timer with the lower rating. In such a
design, selecting the higher rated devices relies on a global broadcast
timer and IPIs to wake up from deep idle states.
To avoid the reliance on a global broadcast timer and also to reduce the
overhead associated with the IPI wakeups, extend
tick_install_broadcast_device() to manage per-cpu wakeup timers separately
from the broadcast device.
For now, these timers remain unused.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-4-will@kernel.org
In preparation for adding support for per-cpu wakeup timers, split
_tick_broadcast_oneshot_control() into a helper function which deals
only with the broadcast timer management across idle transitions.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-3-will@kernel.org
tick-broadcast.o is only built if CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
so remove the redundant #ifdef guards around the definition of
tick_receive_broadcast().
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-2-will@kernel.org
Use the DEVICE_ATTR_[RO|WO] helpers instead of plain DEVICE_ATTR, which
makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210523065825.19684-1-yuehaibing@huawei.com
Checking for and processing RCU-nocb deferred wakeup upon user/guest
entry is only relevant when nohz_full runs on the local CPU, otherwise
the periodic tick should take care of it.
Make sure we don't needlessly pollute these fast-paths as a -3%
performance regression on a will-it-scale.per_process_ops has been
reported so far.
Fixes: 47b8ff194c (entry: Explicitly flush pending rcuog wakeup before last rescheduling point)
Fixes: 4ae7dc97f7 (entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point)
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210527113441.465489-1-frederic@kernel.org
Call tick_nohz_task_switch() slightly earlier after the context switch
to benefit from disabled IRQs. This way the function doesn't need to
disable them once more.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210512232924.150322-10-frederic@kernel.org
When the tick dependency of a task is updated, we want it to aknowledge
the new state and restart the tick if needed. If the task is not
running, we don't need to kick it because it will observe the new
dependency upon scheduling in. But if the task is running, we may need
to send an IPI to it so that it gets notified.
Unfortunately we don't have the means to check if a task is running
in a race free way. Checking p->on_cpu in a synchronized way against
p->tick_dep_mask would imply adding a full barrier between
prepare_task_switch() and tick_nohz_task_switch(), which we want to
avoid in this fast-path.
Therefore we blindly fire an IPI to the task's CPU.
Meanwhile we can check if the task is queued on the CPU rq because
p->on_rq is always set to TASK_ON_RQ_QUEUED _before_ schedule() and its
full barrier that precedes tick_nohz_task_switch(). And if the task is
queued on a nohz_full CPU, it also has fair chances to be running as the
isolation constraints prescribe running single tasks on full dynticks
CPUs.
So use this as a trick to check if we can spare an IPI toward a
non-running task.
NOTE: For the ordering to be correct, it is assumed that we never
deactivate a task while it is running, the only exception being the task
deactivating itself while scheduling out.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-9-frederic@kernel.org
Rather than waking up all nohz_full CPUs on the system, only wake up
the target CPUs of member threads of the signal.
Reduces interruptions to nohz_full CPUs.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-8-frederic@kernel.org
When adding a tick dependency to a task, its necessary to
wake up the CPU where the task resides to reevaluate tick
dependencies on that CPU.
However the current code wakes up all nohz_full CPUs, which
is unnecessary.
Switch to waking up a single CPU, by using ordering of writes
to task->cpu and task->tick_dep_mask.
[ mingo: Minor readability edit. ]
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-7-frederic@kernel.org
CONFIG_NO_HZ_FULL behaves just like CONFIG_NO_HZ_IDLE by default.
Reassure distros about it.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-6-frederic@kernel.org
The idle_exittime field of tick_sched is used to record the time when
the idle state was left. but currently the idle_exittime is updated in
the function tick_nohz_restart_sched_tick(), which is not always in idle
state when nohz_full is configured:
tick_irq_exit
tick_nohz_irq_exit
tick_nohz_full_update_tick
tick_nohz_restart_sched_tick
ts->idle_exittime = now;
It's thus overwritten by mistake on nohz_full tick restart. Move the
update to the appropriate idle exit path instead.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-5-frederic@kernel.org
The vtime_accounting_enabled_this_cpu() early check already makes what
follows as dead code in the case of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
No need to keep the ifdeferry around.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-4-frederic@kernel.org
In nohz_full mode, switching from idle to a task will unconditionally
issue a tick restart. If the task is alone in the runqueue or is the
highest priority, the tick will fire once then eventually stop. But that
alone is still undesired noise.
Therefore, only restart the tick on idle exit when it's strictly
necessary.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-3-frederic@kernel.org
RTC drivers used to leave .set_alarm() NULL in order to signal the RTC
device doesn't support alarms. The drivers are now clearing the
RTC_FEATURE_ALARM bit for that purpose in order to keep the rtc_class_ops
structure const. So now, .set_alarm() is set unconditionally and this
possibly causes the alarmtimer code to select an RTC device that doesn't
support alarms.
Test RTC_FEATURE_ALARM instead of relying on ops->set_alarm to determine
whether alarms are available.
Fixes: 7ae41220ef ("rtc: introduce features bitfield")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210511014516.563031-1-alexandre.belloni@bootlin.com
This reverts commit dcd42591eb.
The only user was RCU/nocb.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation,
zap under read lock, enable/disably dirty page logging under
read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing
the architecture-specific code
- Some selftests improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
+2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
=AXUi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate
cpufreq driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx
cpufreq driver and drop the CPU PM clock .set_parent method for
armada-37xx (Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values
in cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags()
to avoid issuse in some cases when the device depends on an ACPI
power resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks()
to pm_runtime.h and drop the unused try_to_freeze_nowarn()
definition (YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in
the wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check
during resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to
the new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmCHAUISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxAxMP/0tFjgxeaJ3chYaiqoPlk2QX/XdwqJvm
8OOu2qBMWbt2bubcIlAPpdlCNaERI4itF7E8za7t9alswdq7YPWGmNR9snCXUKhD
BzERuicZTeOcCk2P3DTgzLVc4EzF6wutA3lTdYYZIpf+LuuB+guG8zgMzScRHIsM
N3I83O+iLTA9ifIqN0/wH//a0ISvo6rSWtcbx+48d5bYvYixW7CsBmoxWHhGiYsw
4PJ4AzbdNJEhQp91SBYPIAmqwV88FZUPoYnRazXMxOSevMewhP9JuCHXAujC3gLV
l5d2TBaBV4EBYLD5tfCpJvHMXhv/yBpg6KRivjri+zEnY1TAqIlfR4vYiL7puVvQ
PdwjyvNFDNGyUSX/AAwYF6F4WCtIhw8hCahw6Dw2zcDz0plFdRZmWAiTdmIjECJK
8EvwJNlmdl27G1y+EBc6NnwzEFZQwiu9F5bUHUkmc3fF1M1aFTza8WDNDo30TC94
94c+uVBRL2fBePl4FfGZATfJbOMb8+vDIkroQxrIjQDT/7Ha3Mz75JZDRHItZo92
+4fES2eFdfZISCLIQMBY5TeXox3O8qsirC1k4qELwy71gPUE9CpP3FkxKIvyZLlv
+6fS9ttpUfyFBF7gqrEy+ziVk1Rm4oorLmWCtthz4xyerzj5+vtZQqKzcySk0GA5
hYkseZkedR6y
=t+SG
-----END PGP SIGNATURE-----
Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add some new hardware support (for example, IceLake-D idle
states in intel_idle), fix some issues (for example, the handling of
negative "sleep length" values in cpuidle governors), add new
functionality to the existing drivers (for example, scale-invariance
support in the ACPI CPPC cpufreq driver) and clean up code all over.
Specifics:
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate cpufreq
driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx cpufreq
driver and drop the CPU PM clock .set_parent method for armada-37xx
(Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values in
cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags() to
avoid issuse in some cases when the device depends on an ACPI power
resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks() to
pm_runtime.h and drop the unused try_to_freeze_nowarn() definition
(YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in the
wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check during
resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to the
new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda)"
* tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
PM: wakeup: remove redundant assignment to variable retval
PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
cpufreq: Kconfig: fix documentation links
PM: wakeup: use dev_set_name() directly
PM: runtime: Add documentation for pm_runtime_resume_and_get()
cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
cpuidle: tegra: Remove do_idle firmware call
cpuidle: tegra: Fix C7 idling state on Tegra114
PM: sleep: fix typos in comments
cpufreq: Remove unused for_each_policy macro
...
Core changes:
- Allow runtime power management when the clocksource is changed.
- A correctness fix for clock_adjtime32() so that the return value
on success is not overwritten by the result of the copy to user.
- Allow late installment of broadcast clockevent devices which was
broken because nothing switched them over to oneshot mode. This went
unnoticed so far because clockevent devices used to be built in, but
now people started to make them modular.
- Debugfs related simplifications
- Small cleanups and improvements here and there
Driver changes:
- The usual set of device tree binding updates for a wide range
of drivers/devices.
- The usual updates and improvements for drivers all over the place but
nothing outstanding.
- No new clocksource/event drivers. They'll come back next time.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGieYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobRJEACNCtecUXdyt/u+ViDgHwG1XOHSZUkG
zBO6E/uZ3G6ZUkr6FogAaY2eMMrSdSUyqbiNBSYBJki2ptMJWF5Li5VzqINmrBuD
VyjK3FEDV0bXW9EJOm4d+95pMyFQ/pYv9VPcByj7VW21t+IDE/4pLeZ8M8shNDHa
pmMnR/tgX4ZZtSrX2NqCUNoTrkycaz8d5NOuso5HjKvPkJ5BU2kSxULTGmvaeTil
8d+70AetApDgzAWpCnJFPlLlOHIPyhnMxS5edvsMIbMIkRLsnI+b3LsPZe+CqVZ0
zaP6KYvG+iqU8nKdz7OweV1fLgBD52GKgHlpTkhhYs3GW4XBEXDrsyoEyeIiZ22u
YUkTzFvZ4JG/+80UUaKpLDIGYWUj1h+xe/EtWS0s8lj108RsNLghd/0YjFMikspT
fYC2WpaXJDz3URbSV57OXGbwhg2zOYI5Supg6wNrmFfcld3k6CSitG4idDpIGjJE
8WIcZmeZSelDufskiY8RmsiTumqNOf5P33F71r9JRI6QU9RsyYb3fJN71AFKnLq2
31YEAShpzPYG5EGRinPymJRi3icdmcEQECz/pWUb6ua0s/HG1+HD9emLwHzvPdul
hcWRq19GaK1YBzOfV60+8cdxW8ZEOROvRVdYJO8FoYcnueUJmOSM+boqSkRtDw3o
RywO8BetxukPJg==
=F6Du
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The time and timers updates contain:
Core changes:
- Allow runtime power management when the clocksource is changed.
- A correctness fix for clock_adjtime32() so that the return value on
success is not overwritten by the result of the copy to user.
- Allow late installment of broadcast clockevent devices which was
broken because nothing switched them over to oneshot mode. This
went unnoticed so far because clockevent devices used to be built
in, but now people started to make them modular.
- Debugfs related simplifications
- Small cleanups and improvements here and there
Driver changes:
- The usual set of device tree binding updates for a wide range of
drivers/devices.
- The usual updates and improvements for drivers all over the place
but nothing outstanding.
- No new clocksource/event drivers. They'll come back next time"
* tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
posix-timers: Preserve return value in clock_adjtime32()
tick/broadcast: Allow late registered device to enter oneshot mode
tick: Use tick_check_replacement() instead of open coding it
time/timecounter: Mark 1st argument of timecounter_cyc2time() as const
dt-bindings: timer: nuvoton,npcm7xx: Add wpcm450-timer
clocksource/drivers/arm_arch_timer: Add __ro_after_init and __init
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
clocksource/drivers/dw_apb_timer_of: Add handling for potential memory leak
clocksource/drivers/npcm: Add support for WPCM450
clocksource/drivers/sh_cmt: Don't use CMTOUT_IE with R-Car Gen2/3
clocksource/drivers/pistachio: Fix trivial typo
clocksource/drivers/ingenic_ost: Fix return value check in ingenic_ost_probe()
clocksource/drivers/timer-ti-dm: Add missing set_state_oneshot_stopped
clocksource/drivers/timer-ti-dm: Fix posted mode status check order
dt-bindings: timer: renesas,cmt: Document R8A77961
dt-bindings: timer: renesas,cmt: Add r8a779a0 CMT support
clocksource/drivers/ingenic-ost: Add support for the JZ4760B
clocksource/drivers/ingenic: Add support for the JZ4760
dt-bindings: timer: ingenic: Add compatible strings for JZ4760(B)
...
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can be
cleaned up which either use a seperate mechanism to prevent auto-enable
at request time or have a racy mechanism which disables the interrupt
right after request.
- Get rid of the last usage of irq_create_identity_mapping() and remove
the interface.
- An overhaul of tasklet_disable(). Most usage sites of tasklet_disable()
are in task context and usually in cleanup, teardown code pathes.
tasklet_disable() spinwaits for a tasklet which is currently executed.
That's not only a problem for PREEMPT_RT where this can lead to a live
lock when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs the
tasklet is scheduled out and the disabling code has to spin wait until
it's scheduled back in. Though there are a few code pathes which invoke
tasklet_disable() from non-sleepable context. For these a new disable
variant which still spinwaits is provided which allows to switch
tasklet_disable() to a sleep wait mechanism. For the atomic use cases
this does not solve the live lock issue on PREEMPT_RT. That is mitigated
by blocking on the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in task
context and all interrupt handlers, which are not explicitly marked to
be invoked in hard interrupt context are forced into task context as
well. This allows to protect against softirq processing with a per
CPU lock, which in turn allows to make BH disabled regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions which
provide the context specific functionality. The local_bh_disable() /
local_bh_enable() mechanism are obviously seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGh5wTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZ+/EACWBpQ/2ZHizEw1bzjaDzJrR8U228xu
wNi7nSP92Y07nJ3cCX7a6TJ53mqd0n3RT+DprlsOuqSN0D7Ktr/x44V/aZtm0d3N
GkFOlpeGCRnHusLaUTwk7a8289LuoQ7OhSxIB409n1I4nLI96ZK41D1tYonMYl6E
nxDiGADASfjaciBWbjwJO/mlwmiW/VRpSTxswx0wzakFfbIx9iKyKv1bCJQZ5JK+
lHmf0jxpDIs1EVK/ElJ9Ky6TMBlEmZyiX7n6rujtwJ1W+Jc/uL/y8pLJvGwooVmI
yHTYsLMqzviCbAMhJiB3h1qs3GbCGlM78prgJTnOd0+xEUOCcopCRQlsTXVBq8Nb
OS+HNkYmYXRfiSH6lINJsIok8Xis28bAw/qWz2Ho+8wLq0TI8crK38roD1fPndee
FNJRhsPPOBkscpIldJ0Cr0X5lclkJFiAhAxORPHoseKvQSm7gBMB7H99xeGRffTn
yB3XqeTJMvPNmAHNN4Brv6ey3OjwnEWBgwcnIM2LtbIlRtlmxTYuR+82OPOgEvzk
fSrjFFJqu0LEMLEOXS4pYN824PawjV//UAy4IaG8AodmUUCSGHgw1gTVa4sIf72t
tXY54HqWfRWRpujhVRgsZETqBUtZkL6yvpoe8f6H7P91W5tAfv3oj4ch9RkhUo+Z
b0/u9T0+Fpbg+w==
=id4G
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual updates from the irq departement:
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can
be cleaned up which either use a seperate mechanism to prevent
auto-enable at request time or have a racy mechanism which disables
the interrupt right after request.
- Get rid of the last usage of irq_create_identity_mapping() and
remove the interface.
- An overhaul of tasklet_disable().
Most usage sites of tasklet_disable() are in task context and
usually in cleanup, teardown code pathes. tasklet_disable()
spinwaits for a tasklet which is currently executed. That's not
only a problem for PREEMPT_RT where this can lead to a live lock
when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs
the tasklet is scheduled out and the disabling code has to spin
wait until it's scheduled back in.
There are a few code pathes which invoke tasklet_disable() from
non-sleepable context. For these a new disable variant which still
spinwaits is provided which allows to switch tasklet_disable() to a
sleep wait mechanism. For the atomic use cases this does not solve
the live lock issue on PREEMPT_RT. That is mitigated by blocking on
the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in
task context and all interrupt handlers, which are not explicitly
marked to be invoked in hard interrupt context are forced into task
context as well. This allows to protect against softirq processing
with a per CPU lock, which in turn allows to make BH disabled
regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions
which provide the context specific functionality. The
local_bh_disable() / local_bh_enable() mechanism are obviously
seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt
controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place"
* tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
irqchip/gic-v3: Do not enable irqs when handling spurious interrups
dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller
irqchip: Add support for IDT 79rc3243x interrupt controller
irqdomain: Drop references to recusive irqdomain setup
irqdomain: Get rid of irq_create_strict_mappings()
irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
ARM: PXA: Kill use of irq_create_strict_mappings()
irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
irqchip/tb10x: Use 'fallthrough' to eliminate a warning
genirq: Reduce irqdebug cacheline bouncing
kernel: Initialize cpumask before parsing
irqchip/wpcm450: Drop COMPILE_TEST
irqchip/irq-mst: Support polarity configuration
irqchip: Add driver for WPCM450 interrupt controller
dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic
dt-bindings: qcom,pdc: Add compatible for sc7280
irqchip/stm32: Add usart instances exti direct event support
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
irqchip/sifive-plic: Mark two global variables __ro_after_init
...
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCCpuAPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpD2G8QALWQYeBggKnNmAJfuihzZ2WariBmgcENs2R2
qNZ/Py6dIF+b69P68nmgrEV1x2Kp35cPJbBwXnnrS4FCB5tk0b8YMaj00QbiRIYV
UXbPxQTmYO1KbevpoEcw8NmR4bZJ/hRYPuzcQG7CCMKIZw0zj2cMcBofzQpTOAp/
CgItdcv7at3iwamQatfU9vUmC0nDdnjdIwSxTAJOYMVV1ENwtnYSNgZVo4XLTg7n
xR/5Qx27PKBJw7GyTRAIIxKAzNXG2tDL+GVIHe4AnRp3z3La8sr6PJf7nz9MCmco
ISgeY7EGQINzmm4LahpnV+2xwwxOWo8QotxRFGNuRTOBazfARyAbp97yJ6eXJUpa
j0qlg3xK9neyIIn9BQKkKx4sY9V45yqkuVDsK6odmqPq3EE01IMTRh1N/XQi+sTF
iGrlM3ZW4AjlT5zgtT9US/FRXeDKoYuqVCObJeXZdm3sJSwEqTAs0JScnc0YTsh7
m30CODnomfR2y5X6GoaubbQ0wcZ2I20K1qtIm+2F6yzD5P1/3Yi8HbXMxsSWyYWZ
1ldoSa+ZUQlzV9Ot0S3iJ4PkphLKmmO96VlxE2+B5gQG50PZkLzsr8bVyYOuJC8p
T83xT9xd07cy+FcGgF9veZL99Y6BLHMa6ZwFUolYNbzJxqrmqyR1aiJMEBIcX+aP
ACeKW1w5
=fpey
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.13
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.
Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.
[ tglx: Massage changelog ]
Fixes: 3a4d44b616 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com
The broadcast device is switched to oneshot mode when the system switches
to oneshot mode. If a broadcast clock event device is registered after the
system switched to oneshot mode, it will stay in periodic mode forever.
Ensure that a late registered device which is selected as broadcast device
is initialized in oneshot mode when the system already uses oneshot mode.
[ tglx: Massage changelog ]
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210331083318.21794-1-jindong.yue@nxp.com
The function tick_check_replacement() is the combination of
tick_check_percpu() and tick_check_preferred(), but tick_check_new_device()
has the same logic open coded.
Use the helper to simplify the code.
[ tglx: Massage changelog ]
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210326022328.3266-1-wangwensheng4@huawei.com
Make the tick_nohz_get_next_hrtimer() kerneldoc comment state clearly
that the function may return negative numbers.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
System time snapshots are not conveying information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.
Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows to identify the clocksource.
Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong.wu@arm.com
The struct clocksource callbacks enable() and disable() are described as a
way to allow clock sources to enter a power save mode. See commit
4614e6adaf ("clocksource: add enable() and disable() callbacks")
But using runtime PM from these callbacks triggers a cyclic lockdep warning when
switching clock source using change_clocksource().
# echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
migration/0/11 is trying to acquire lock:
ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74
but task is already holding lock:
ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (tk_core.seq.seqcount){----}-{0:0}:
ktime_get+0x28/0xa0
hrtimer_start_range_ns+0x210/0x2dc
generic_sched_clock_init+0x70/0x88
sched_clock_init+0x40/0x64
start_kernel+0x494/0x524
-> #1 (hrtimer_bases.lock){-.-.}-{2:2}:
hrtimer_start_range_ns+0x68/0x2dc
rpm_suspend+0x308/0x5dc
rpm_idle+0xc4/0x2a4
pm_runtime_work+0x98/0xc0
process_one_work+0x294/0x6f0
worker_thread+0x70/0x45c
kthread+0x154/0x160
ret_from_fork+0x10/0x20
-> #0 (&dev->power.lock){-...}-{2:2}:
_raw_spin_lock_irqsave+0x7c/0xc4
__pm_runtime_resume+0x40/0x74
sh_cmt_start+0x1c4/0x260
sh_cmt_clocksource_enable+0x28/0x50
change_clocksource+0x9c/0x160
multi_cpu_stop+0xa4/0x190
cpu_stopper_thread+0x90/0x154
smpboot_thread_fn+0x244/0x270
kthread+0x154/0x160
ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of:
&dev->power.lock --> hrtimer_bases.lock --> tk_core.seq.seqcount
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(tk_core.seq.seqcount);
lock(hrtimer_bases.lock);
lock(tk_core.seq.seqcount);
lock(&dev->power.lock);
*** DEADLOCK ***
2 locks held by migration/0/11:
#0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160
#1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
Rework change_clocksource() so it enables the new clocksource and disables
the old clocksource outside of the timekeeper_lock and seqcount write held
region. There is no requirement that these callbacks are invoked from the
lock held region.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20210211134318.323910-1-niklas.soderlund+renesas@ragnatech.se
There is no need to keep the dentry pointer around for the created
debugfs file, as it is only needed when removing it from the system.
When it is to be removed, ask debugfs itself for the pointer, to save on
storage and make things a bit simpler.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210216155020.1012407-1-gregkh@linuxfoundation.org
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.
To prevent that check the per CPU state which indicates that a scheduled
out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.
Add a new helper which sets restart_block->fn and calls a dummy
arch_set_restart_data() helper.
Fixes: 609c19a385 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com
hrtimer_force_reprogram() and hrtimer_interrupt() invokes
__hrtimer_get_next_event() to find the earliest expiry time of hrtimer
bases. __hrtimer_get_next_event() does not update
cpu_base::[softirq_]_expires_next to preserve reprogramming logic. That
needs to be done at the callsites.
hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when
the first expiring timer is a softirq timer and the soft interrupt is not
activated. That's wrong because cpu_base::softirq_expires_next is left
stale when the first expiring timer of all bases is a timer which expires
in hard interrupt context. hrtimer_interrupt() does never update
cpu_base::softirq_expires_next which is wrong too.
That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and
the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting
CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that
timer before the stale cpu_base::softirq_expires_next.
cpu_base::softirq_expires_next is cached to make the check for raising the
soft interrupt fast. In the above case the soft interrupt won't be raised
until clock monotonic reaches the stale cpu_base::softirq_expires_next
value. That's incorrect, but what's worse it that if the softirq timer
becomes the first expiring timer of all clock bases after the hard expiry
timer has been handled the reprogramming of the clockevent from
hrtimer_interrupt() will result in an interrupt storm. That happens because
the reprogramming does not use cpu_base::softirq_expires_next, it uses
__hrtimer_get_next_event() which returns the actual expiry time. Once clock
MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is
raised and the storm subsides.
Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard
bases seperately, update softirq_expires_next and handle the case when a
soft expiring timer is the first of all bases by comparing the expiry times
and updating the required cpu base fields. Split this functionality into a
separate function to be able to use it in hrtimer_interrupt() as well
without copy paste.
Fixes: 5da7016046 ("hrtimer: Implement support for softirq based hrtimers")
Reported-by: Mikael Beckius <mikael.beckius@windriver.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
- nolibc fixes for the torture tests
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs9lgRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j/axAAsqIvarDD6OLmgcOPCyWSvfG6LsIFgqI9
CY0JdQBtvFBTvE8Q2No5ktbLVmuYsBh0dGeFkv4HQZJyRlr7mjstVMNN4SeBDVIS
/+zZO1wlwzaXfKQopLctTK1O/UzFqIN2sqyzA3nLzGGj8DqgxXJyreJ10feK5XM+
6ttZPd1qm4hqtpA22ZEODbct5OFqZuvnK8VNqBb2YHabA1rasUXbIEJPBpsuv/W2
l9W5AGP4erdOFm3nHJxiCpvLJtgHy4njvw0HJp5f99Abj6OVeAzw5kFjvRB3n1Qd
ayKyTw8T/1mfmkjvYkGsMAqhEmqwXcryFX0dR/14/XPdXyjPhZlbkz+MfRKrn4NT
LBJPX+MlX9lVFWBNR9HMe2o/083+gorlwZt9wtyt0OBBGGgudYo4uKNdbyy6tB3Y
Gb98P2vtVSO24EsQce6M+ppHN4TgVBd6id82MQxNuFw+PQJdBiCY0JJfNQApbAry
cIKOchSSR2SkJHlAevNVaKAeiTnkAXd1jDBKtCnvCqOUyvtnhE3rQCqwS5xT2Cno
oQydpudwBKT7uO/GUyS0ESErjHuy9zhExNSYD0ydxlBCrGbzrrgPg57ntXHA1die
mtFyvc2tfT/AshWRNYiuCG+eaUG3qK7n7jN7Vc6/K5DR4GMb5tOhL9wPx2ljCRGu
Z8WDg0pJGz4=
=31yj
-----END PGP SIGNATURE-----
Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
- Add suspend/resume support for microchip pit64b
- The usual fixes, improvements and cleanups here and there
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmAqjecTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodVTD/9UuKlOifRBNd4ECR+yF65MzLfjqNHU
j76E3Dzuf4QCbXTjmQAsadaqQ+w9l8Ie7OVT51XlmTczEBkJiV3FOGLVJeRun6R4
7OLF3VOYyBUtMoRdHzyXaYOTbsOK9gZitucDeLCQhvKhDCkVnKKFXNJR+TTkSYM3
xDBLwuI7uuHWHyh0+W+3SI1pTiEA4yMe5ZbqqoJbjGQapr3Eao+nyjd1aa3ERb2f
PtS7UVQ69QowRqq6DQyZk0yKit8J3HZnHfCPH/T6eXsxGnui36GiUnTGCMhLMZpD
Xvl/5cjqQuKjgt2093t8nGiumOGBOfrb8uvc/qMW777DzFe/VJtXrC/7pySVLAhK
oc9Swj0iX/WPARzlpyOk3lfpDMzv6qyjMNJIXcnav2lrknITp+TMORKWOADB03UV
sswlN7YFTrNe7d7uxEdybKkNX6bUwgOzo2m69A1IdSXwKPzYkZQmku6Y7GnzYErZ
aiJiZl858VB9g24ROKLt/uQTarzYCS0sjcdnDgO1KSR7zKHZ4iUpd3zucd3mlmUo
fGTMIbCqL/gzl4Zcl6njvzMVfJeMzOeDiQ41wCyYnOsXlIKmWNi1rONdGZcyDYvN
bOiGVUMKicEZwBlSZQQ1GdR9eGf8/Ix5cTlynZepN3dkHUDc7SU+OvQZdg4Yd/RY
BsquFDHnt4gl+A==
=xZQt
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Time and timer updates:
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
- Add suspend/resume support for microchip pit64b
- The usual fixes, improvements and cleanups here and there"
* tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Delete no-op time_ns_init()
alarmtimer: Update kerneldoc
clocksource/drivers/timer-microchip-pit64b: Add clocksource suspend/resume
clocksource/drivers/prima: Remove sirf prima driver
clocksource/drivers/atlas: Remove sirf atlas driver
clocksource/drivers/tango: Remove tango driver
clocksource/drivers/u300: Remove the u300 driver
dt-bindings: timer: nuvoton: Clarify that interrupt of timer 0 should be specified
clocksource/drivers/davinci: Move pr_fmt() before the includes
clocksource/drivers/efm32: Drop unused timer code
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The bug fixed by commit e3fab2f3de ("ntp: Fix RTC synchronization on
32-bit platforms") revealed an underlying issue: RTC synchronization may
happen anytime, even while the system is partially suspended.
On systems where the RTC is connected to an I2C bus, the I2C bus controller
may already or still be suspended, triggering a WARNING during suspend or
resume from s2ram:
WARNING: CPU: 0 PID: 124 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x634/0x680
i2c i2c-6: Transfer while suspended
[...]
Workqueue: events_power_efficient sync_hw_clock
[...]
(__i2c_transfer)
(i2c_transfer)
(regmap_i2c_read)
...
(da9063_rtc_set_time)
(rtc_set_time)
(sync_hw_clock)
(process_one_work)
Fix this race condition by using the freezable instead of the normal
power-efficient workqueue.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210125143039.1051912-1-geert+renesas@glider.be
Due to an integer overflow, RTC synchronization now happens every 2s
instead of the intended 11 minutes. Fix this by forcing 64-bit
arithmetic for the sync period calculation.
Annotate the other place which multiplies seconds for consistency as well.
Fixes: c9e6189fb0 ("ntp: Make the RTC synchronization more reliable")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210111103956.290378-1-geert+renesas@glider.be
This commit addresses a few code-style nits in callback-offloading
toggling, including one that predates this toggling.
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a timer_curr_running() function that verifies that the
current code is running in the context of the specified timer's handler.
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
and fix a typo in the Kconfig help text.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl/oUUkRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gEnQ//Ud8SF9fOq2w7IVqYYQadH9BSj6jFOTlb
5/pgRKZLI6OCRZ8Wkx9lV3hvlmyZRtKhuLcOx/LR/TXNBlIN7wfkRcjPFIojhqM3
G5mr/TMmbL+PSOz8gEWwPsz7A+pDgoDDdvgId3Dn6CUKEZLIcf2tEzIOBjtST94b
svS5Y/wBe3xzlBD4yfNKkmwSYlgpDv7ZfAF3q38h3dTgb3nTYRgfKmflKe+aBf7z
iy05I/j7Hw8WmioG0oEUfX9t/j9dYdGzbK/3S8UV4igXj+i+XnzUIs9+dQpEpByS
tgS9KJbq8AYtuUdHU7xInG2ltrODrriJUdCQpzi65/jEUshoBLw+Dj3fmVKHJA48
LW0jgQ7eJm8dkkJyo7+s4Z5wTmS6zzqJ+2MZq34fvlHQFzHgJ3KAwTzoiqBLx4FY
1H/KEmK8hX2IDiJj5qrWHcZRmGiKbeKxiJkB0+6EQGj+aAVy3A6AR2TeQJf58OSN
nc+e7UWQFEumGyEN8cVTjdTuLJc1iY9ULpXuQfdi4ksDCpLMTMWk9V9psWB+pC15
MTqbsJCzJU4oeeYaZVWdMPcaZSjgg4Ar3ojvXvPoUkdqCuvnMPoBxTVXjkwNReZF
SxFIpYyKbSkZiN/FOZkICIh0tRMWLmMVNqXV0rjEi41BvcoH9tKXT2y5WgQEHvf1
BctV+BFfblU=
=T0Y6
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Update/fix two CPU sanity checks in the hotplug and the boot code, and
fix a typo in the Kconfig help text.
[ Context: the first two commits are the result of an ongoing
annotation+review work of (intentional) tick_do_timer_cpu() data
races reported by KCSAN, but the annotations aren't fully cooked
yet ]"
* tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix spelling mistake in Kconfig "fullfill" -> "fulfill"
tick/sched: Remove bogus boot "safety" check
tick: Remove pointless cpu valid check in hotplug code
There is a spelling mistake in the Kconfig help text. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20201217171705.57586-1-colin.king@canonical.com
can_stop_idle_tick() checks whether the do_timer() duty has been taken over
by a CPU on boot. That's silly because the boot CPU always takes over with
the initial clockevent device.
But even if no CPU would have installed a clockevent and taken over the
duty then the question whether the tick on the current CPU can be stopped
or not is moot. In that case the current CPU would have no clockevent
either, so there would be nothing to keep ticking.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201206212002.725238293@linutronix.de
tick_handover_do_timer() which is invoked when a CPU is unplugged has a
check for cpumask_first(cpu_online_mask) when it tries to hand over the
tick update duty.
Checking the result of cpumask_first() there is pointless because if the
online mask is empty at this point, then this would be the last CPU in the
system going offline, which is impossible. There is always at least one CPU
remaining. If online mask would be really empty then the timer duty would
be the least of the resulting problems.
Remove the well meant check simply because it is pointless and confusing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201206212002.582579516@linutronix.de