1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

34700 commits

Author SHA1 Message Date
Christoph Hellwig
abdaf11ac1 dma-mapping: add (back) arch_dma_mark_clean for ia64
Add back a hook to optimize dcache flushing after reading executable
code using DMA.  This gets ia64 out of the business of pretending to
be dma incoherent just for this optimization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-11 09:10:17 +02:00
Christoph Hellwig
ef1a85b6ca dma-mapping: fix DMA_OPS dependencies
Driver that select DMA_OPS need to depend on HAS_DMA support to
work.  The vop driver was missing that dependency, so add it, and also
add a another depends in DMA_OPS itself.  That won't fix the issue due
to how the Kconfig dependencies work, but at least produce a warning
about unmet dependencies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11 09:09:41 +02:00
Christoph Hellwig
ec91ccb274 dma-debug: remove most exports
Now that the main dma mapping entry points are out of line most of the
symbols in dma-debug.c can only be called from built-in code.  Remove
the unused exports.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11 08:13:12 +02:00
Christoph Hellwig
3d842b51a0 dma-mapping: remove the dma_dummy_ops export
dma_dummy_ops is only used by the ACPI code, which can't be modular.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11 08:13:03 +02:00
Julien Thierry
00089c048e objtool: Rename frame.h -> objtool.h
Header frame.h is getting more code annotations to help objtool analyze
object files.

Rename the file to objtool.h.

[ jpoimboe: add objtool.h to MAINTAINERS ]

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10 10:43:13 -05:00
Andy Shevchenko
b51e627158 swiotlb: Mark max_segment with static keyword
Sparse is not happy about max_segment declaration:

  CHECK   kernel/dma/swiotlb.c
  kernel/dma/swiotlb.c:96:14: warning: symbol 'max_segment' was not declared. Should it be static?

Mark it static as suggested.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-09-10 09:41:31 -04:00
Andy Shevchenko
4db7b6aacc swiotlb: Use %pa to print phys_addr_t variables
There is an extension to a %p to print phys_addr_t type of variables.
Use it here.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-09-10 09:41:25 -04:00
Kan Liang
44fae179ce perf/core: Pull pmu::sched_task() into perf_event_context_sched_out()
The pmu::sched_task() is a context switch callback. It passes the
cpuctx->task_ctx as a parameter to the lower code. To find the
cpuctx->task_ctx, the current code iterates a cpuctx list.
The same context will iterated in perf_event_context_sched_out() soon.
Share the cpuctx->task_ctx can avoid the unnecessary iteration of the
cpuctx list.

The pmu::sched_task() is also required for the optimization case for
equivalent contexts.

The task_ctx_sched_out() will eventually disable and reenable the PMU
when schedule out events. Add perf_pmu_disable() and perf_pmu_enable()
around task_ctx_sched_out() don't break anything.

Drop the cpuctx->ctx.lock for the pmu::sched_task(). The lock is for
per-CPU context, which is not necessary for the per-task context
schedule.

No one uses sched_cb_entry, perf_sched_cb_usages, sched_cb_list, and
perf_pmu_sched_task() any more.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200821195754.20159-2-kan.liang@linux.intel.com
2020-09-10 11:19:34 +02:00
Kan Liang
556cccad38 perf/core: Pull pmu::sched_task() into perf_event_context_sched_in()
The pmu::sched_task() is a context switch callback. It passes the
cpuctx->task_ctx as a parameter to the lower code. To find the
cpuctx->task_ctx, the current code iterates a cpuctx list.

The same context was just iterated in perf_event_context_sched_in(),
which is invoked right before the pmu::sched_task().

Reuse the cpuctx->task_ctx from perf_event_context_sched_in() can avoid
the unnecessary iteration of the cpuctx list.

Both pmu::sched_task and perf_event_context_sched_in() have to disable
PMU. Pull the pmu::sched_task into perf_event_context_sched_in() can
also save the overhead from the PMU disable and reenable.

The new and old tasks may have equivalent contexts. The current code
optimize this case by swapping the context, which avoids the scheduling.
For this case, pmu::sched_task() is still required, e.g., restore the
LBR content.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200821195754.20159-1-kan.liang@linux.intel.com
2020-09-10 11:19:34 +02:00
Ahmed S. Darwish
249d053835 timekeeping: Use seqcount_latch_t
Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two data storage copies. This allows the seqcount_t read path to safely
interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function, raw_write_seqcount_latch(), above plain seqcount_t. The read
path was expected to use plain seqcount_t raw_read_seqcount().

A specialized read function was later added, raw_read_seqcount_latch(),
and became the standardized way for latch read paths. Having unique read
and write APIs meant that latch sequence counters are basically a data
type of their own -- just inappropriately overloading plain seqcount_t.
The seqcount_latch_t data type was thus introduced at seqlock.h.

Use that new data type instead of seqcount_raw_spinlock_t. This ensures
that only latch-safe APIs are to be used with the sequence counter.

Note that the use of seqcount_raw_spinlock_t was not very useful in the
first place. Only the "raw_" subset of seqcount_t APIs were used at
timekeeping.c. This subset was created for contexts where lockdep cannot
be used. seqcount_LOCKTYPE_t's raison d'être -- verifying that the
seqcount_t writer serialization lock is held -- cannot thus be done.

References: 0c3351d451 ("seqlock: Use raw_ prefix instead of _no_lockdep")
References: 55f3560df9 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-6-a.darwish@linutronix.de
2020-09-10 11:19:29 +02:00
Ahmed S. Darwish
a690ed0735 time/sched_clock: Use seqcount_latch_t
Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced at seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures only latching-safe seqcount APIs are
to be used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-5-a.darwish@linutronix.de
2020-09-10 11:19:29 +02:00
Ahmed S. Darwish
58faf20a08 time/sched_clock: Use raw_read_seqcount_latch() during suspend
sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b7 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it has one read memory
barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the suspend path.

Commit aadd6e5caa ("time/sched_clock: Use raw_read_seqcount_latch()")
missed changing that instance of raw_read_seqcount().

References: 1809bfa44e ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
2020-09-10 11:19:28 +02:00
Linus Torvalds
7fe10096c1 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a regression in padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  padata: fix possible padata_works_lock deadlock
2020-09-09 19:46:22 -07:00
Valentin Schneider
848785df48 sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
The last sd_flag_debug shuffle inadvertently moved its definition within
an #ifdef CONFIG_SYSCTL region. While CONFIG_SYSCTL is indeed required to
produce the sched domain ctl interface (which uses sd_flag_debug to output
flag names), it isn't required to run any assertion on the sched_domain
hierarchy itself.

Move the definition of sd_flag_debug to a CONFIG_SCHED_DEBUG region of
topology.c.

Now at long last we have:

- sd_flag_debug declared in include/linux/sched/topology.h iff
  CONFIG_SCHED_DEBUG=y
- sd_flag_debug defined in kernel/sched/topology.c, conditioned by:
  - CONFIG_SCHED_DEBUG, with an explicit #ifdef block
  - CONFIG_SMP, as a requirement to compile topology.c

With this change, all symbols pertaining to SD flag metadata (with the
exception of __SD_FLAG_CNT) are now defined exclusively within topology.c

Fixes: 8fca9494d4 ("sched/topology: Move sd_flag_debug out of linux/sched/topology.h")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200908184956.23369-1-valentin.schneider@arm.com
2020-09-09 10:09:03 +02:00
Matthew Wilcox (Oracle)
4bd6a7353e sysctl: Convert to iter interfaces
Using the read_iter/write_iter interfaces allows for in-kernel users
to set sysctls without using set_fs().  Also, the buffer is a string,
so give it the real type of 'char *', not void *.

[AV: Christoph's fixup folded in]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-08 22:20:39 -04:00
Yonghong Song
7c69673262 bpf: Permit map_ptr arithmetic with opcode add and offset 0
Commit 41c48f3a98 ("bpf: Support access
to bpf map fields") added support to access map fields
with CORE support. For example,

            struct bpf_map {
                    __u32 max_entries;
            } __attribute__((preserve_access_index));

            struct bpf_array {
                    struct bpf_map map;
                    __u32 elem_size;
            } __attribute__((preserve_access_index));

            struct {
                    __uint(type, BPF_MAP_TYPE_ARRAY);
                    __uint(max_entries, 4);
                    __type(key, __u32);
                    __type(value, __u32);
            } m_array SEC(".maps");

            SEC("cgroup_skb/egress")
            int cg_skb(void *ctx)
            {
                    struct bpf_array *array = (struct bpf_array *)&m_array;

                    /* .. array->map.max_entries .. */
            }

In kernel, bpf_htab has similar structure,

	    struct bpf_htab {
		    struct bpf_map map;
                    ...
            }

In the above cg_skb(), to access array->map.max_entries, with CORE, the clang will
generate two builtin's.
            base = &m_array;
            /* access array.map */
            map_addr = __builtin_preserve_struct_access_info(base, 0, 0);
            /* access array.map.max_entries */
            max_entries_addr = __builtin_preserve_struct_access_info(map_addr, 0, 0);
	    max_entries = *max_entries_addr;

In the current llvm, if two builtin's are in the same function or
in the same function after inlining, the compiler is smart enough to chain
them together and generates like below:
            base = &m_array;
            max_entries = *(base + reloc_offset); /* reloc_offset = 0 in this case */
and we are fine.

But if we force no inlining for one of functions in test_map_ptr() selftest, e.g.,
check_default(), the above two __builtin_preserve_* will be in two different
functions. In this case, we will have code like:
   func check_hash():
            reloc_offset_map = 0;
            base = &m_array;
            map_base = base + reloc_offset_map;
            check_default(map_base, ...)
   func check_default(map_base, ...):
            max_entries = *(map_base + reloc_offset_max_entries);

In kernel, map_ptr (CONST_PTR_TO_MAP) does not allow any arithmetic.
The above "map_base = base + reloc_offset_map" will trigger a verifier failure.
  ; VERIFY(check_default(&hash->map, map));
  0: (18) r7 = 0xffffb4fe8018a004
  2: (b4) w1 = 110
  3: (63) *(u32 *)(r7 +0) = r1
   R1_w=invP110 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  ; VERIFY_TYPE(BPF_MAP_TYPE_HASH, check_hash);
  4: (18) r1 = 0xffffb4fe8018a000
  6: (b4) w2 = 1
  7: (63) *(u32 *)(r1 +0) = r2
   R1_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R2_w=invP1 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  8: (b7) r2 = 0
  9: (18) r8 = 0xffff90bcb500c000
  11: (18) r1 = 0xffff90bcb500c000
  13: (0f) r1 += r2
  R1 pointer arithmetic on map_ptr prohibited

To fix the issue, let us permit map_ptr + 0 arithmetic which will
result in exactly the same map_ptr.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200908175702.2463625-1-yhs@fb.com
2020-09-08 18:02:18 -07:00
Denis Efremov
2d9ca267a9 seccomp: Use current_pt_regs() instead of task_pt_regs(current)
As described in commit a3460a5974 ("new helper: current_pt_regs()"):
- arch versions are "optimized versions".
- some architectures have task_pt_regs() working only for traced tasks
  blocked on signal delivery. current_pt_regs() needs to work for *all*
  processes.

In preparation for adding a coccinelle rule for using current_*(), instead
of raw accesses to current members, modify seccomp_do_user_notification(),
__seccomp_filter(), __secure_computing() to use current_pt_regs().

Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20200824125921.488311-1-efremov@linux.com
[kees: Reworded commit log, add comment to populate_seccomp_data()]
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-08 16:26:45 -07:00
Rich Felker
4d671d922d seccomp: kill process instead of thread for unknown actions
Asynchronous termination of a thread outside of the userspace thread
library's knowledge is an unsafe operation that leaves the process in
an inconsistent, corrupt, and possibly unrecoverable state. In order
to make new actions that may be added in the future safe on kernels
not aware of them, change the default action from
SECCOMP_RET_KILL_THREAD to SECCOMP_RET_KILL_PROCESS.

Signed-off-by: Rich Felker <dalias@libc.org>
Link: https://lore.kernel.org/r/20200829015609.GA32566@brightrain.aerifal.cx
[kees: Fixed up coredump selection logic to match]
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-08 12:00:49 -07:00
Tycho Andersen
e839317900 seccomp: don't leave dangling ->notif if file allocation fails
Christian and Kees both pointed out that this is a bit sloppy to open-code
both places, and Christian points out that we leave a dangling pointer to
->notif if file allocation fails. Since we check ->notif for null in order
to determine if it's ok to install a filter, this means people won't be
able to install a filter if the file allocation fails for some reason, even
if they subsequently should be able to.

To fix this, let's hoist this free+null into its own little helper and use
it.

Reported-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-08 11:30:16 -07:00
Tycho Andersen
a566a9012a seccomp: don't leak memory when filter install races
In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
the listener fd, then check to see if we can actually use it later in
seccomp_may_assign_mode(), which can fail if anyone else in our thread
group has installed a filter and caused some divergence. If we can't, we
partially clean up the newly allocated file: we put the fd, put the file,
but don't actually clean up the *memory* that was allocated at
filter->notif. Let's clean that up too.

To accomplish this, let's hoist the actual "detach a notifier from a
filter" code to its own helper out of seccomp_notify_release(), so that in
case anyone adds stuff to init_listener(), they only have to add the
cleanup code in one spot. This does a bit of extra locking and such on the
failure path when the filter is not attached, but it's a slow failure path
anyway.

Fixes: 51891498f2 ("seccomp: allow TSYNC and USER_NOTIF together")
Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-08 11:19:50 -07:00
Davidlohr Bueso
ece4ceaf2e kdb: Use newer api for tasklist scanning
This kills using the do_each_thread/while_each_thread combo to
iterate all threads and uses for_each_process_thread() instead,
maintaining semantics. while_each_thread() is ultimately racy
and deprecated;  although in this particular case there is no
concurrency so it doesn't matter. Still lets trivially get rid
of two more users.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20200907203206.21293-1-dave@stgolabs.net
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-09-08 14:36:46 +01:00
Douglas Anderson
b18b099e04 kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
On my system the kernel processes the "kgdb_earlycon" parameter before
the "kgdbcon" parameter.  When we setup "kgdb_earlycon" we'll end up
in kgdb_register_callbacks() and "kgdb_use_con" won't have been set
yet so we'll never get around to starting "kgdbcon".  Let's remedy
this by detecting that the IO module was already registered when
setting "kgdb_use_con" and registering the console then.

As part of this, to avoid pre-declaring things, move the handling of
the "kgdbcon" further down in the file.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200630151422.1.I4aa062751ff5e281f5116655c976dff545c09a46@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-09-08 14:34:40 +01:00
Cengiz Can
fcdb84cc5b kdb: remove unnecessary null check of dbg_io_ops
`kdb_msg_write` operates on a global `struct kgdb_io *` called
`dbg_io_ops`.

It's initialized in `debug_core.c` and checked throughout the debug
flow.

There's a null check in `kdb_msg_write` which triggers static analyzers
and gives the (almost entirely wrong) impression that it can be null.

Coverity scanner caught this as CID 1465042.

I have removed the unnecessary null check and eliminated false-positive
forward null dereference warning.

Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Link: https://lore.kernel.org/r/20200630082922.28672-1-cengiz@kernel.wtf
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-09-08 14:34:40 +01:00
Masami Hiramatsu
319f0ce284 kprobes: Make local functions static
Since we unified the kretprobe trampoline handler from arch/* code,
some functions and objects do not need to be exported anymore.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/159870618256.1229682.8692046612635810882.stgit@devnote2
2020-09-08 11:52:42 +02:00
Masami Hiramatsu
b338817807 kprobes: Free kretprobe_instance with RCU callback
Free kretprobe_instance with RCU callback instead of directly
freeing the object in the kretprobe handler context.

This will make kretprobe run safer in NMI context.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/159870616685.1229682.11978742048709542226.stgit@devnote2
2020-09-08 11:52:35 +02:00
Masami Hiramatsu
e03b4a084e kprobes: Remove NMI context check
The in_nmi() check in pre_handler_kretprobe() is meant to avoid
recursion, and blindly assumes that anything NMI is recursive.

However, since commit:

  9b38cc704e ("kretprobe: Prevent triggering kretprobe from within kprobe_flush_task")

there is a better way to detect and avoid actual recursion.

By setting a dummy kprobe, any actual exceptions will terminate early
(by trying to handle the dummy kprobe), and recursion will not happen.

Employ this to avoid the kretprobe_table_lock() recursion, replacing
the over-eager in_nmi() check.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/159870615628.1229682.6087311596892125907.stgit@devnote2
2020-09-08 11:52:35 +02:00
Masami Hiramatsu
66ada2ccae kprobes: Add generic kretprobe trampoline handler
Add a generic kretprobe trampoline handler for unifying
the all cloned /arch/* kretprobe trampoline handlers.

The generic kretprobe trampoline handler is based on the
x86 implementation, because it is the latest implementation.
It has frame pointer checking, kprobe_busy_begin/end and
return address fixup for user handlers.

[ mingo: Minor edits. ]

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/159870600138.1229682.3424065380448088833.stgit@devnote2
2020-09-08 11:52:31 +02:00
John Ogness
d397820f36 printk: ringbuffer: support dataless records
With commit 896fbe20b4 ("printk: use the lockless ringbuffer"),
printk() started silently dropping messages without text because such
records are not supported by the new printk ringbuffer.

Add support for such records.

Currently dataless records are denoted by INVALID_LPOS in order
to recognize failed prb_reserve() calls. Change the ringbuffer
to instead use two different identifiers (FAILED_LPOS and
NO_LPOS) to distinguish between failed prb_reserve() records and
successful dataless records, respectively.

Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Fixes: https://lkml.kernel.org/r/20200718121053.GA691245@elver.google.com
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200721132528.9661-1-john.ogness@linutronix.de
2020-09-08 09:32:59 +02:00
Dave Airlie
ce5c207c6b Linux 5.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9VerweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhc4H/iHD6qLdB36gZB6K
 oc2nJyrqyWitv4ti2Mnt5PA7o4wX4l6nnr1QvoaJ4BRs5Ja1czRvb2XDmdzqAoIA
 xITGoafqaAeDfxQ91bWrJsVN0pCRKiGwddXlU7TWmqw/riAkfOqi6GYKvav4biJH
 +n1mUPQb1M2IbRFsqkAS+ebKHq3CWaRvzKOEneS88nGlL5u31S9NAru8Ru/fkxRn
 6CwGcs1XRaBPYaZAhdfIb0NuatUlpkhPC9yhNS9up6SqrWmK3m65vmFVng6H0eCF
 fwn1jVztboY/XcNAi5sM9ExpQCql6WLQEEktVikqRDojC8fVtSx6W55tPt7qeaoO
 Z6t4/DA=
 =bcA4
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc4' into drm-next

Backmerge 5.9-rc4 as there is a nasty qxl conflict
that needs to be resolved.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-09-08 14:41:40 +10:00
Wang Hai
e75ad2cc41 blktrace: make function blk_trace_bio_get_cgid() static
The sparse tool complains as follows:

kernel/trace/blktrace.c:796:5: warning:
 symbol 'blk_trace_bio_get_cgid' was not declared. Should it be static?

This function is not used outside of blktrace.c, so this commit
marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-07 20:11:15 -06:00
Linus Torvalds
015b3155c4 Misc fixes:
- Fix more generic entry code ABI fallout
  - Fix debug register handling bugs
  - Fix vmalloc mappings on 32-bit kernels
  - Fix kprobes instrumentation output on 32-bit kernels
  - Fix over-eager WARN_ON_ONCE() on !SMAP hardware
  - Fix NUMA debugging
  - Fix Clang related crash on !RETPOLINE kernels
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl9UljIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1yA//VecoyJOw4jb43LdkeKDGtUjCsPVZlt4w
 fw55nT4taqqbgl9mQjrJQlh8thtk7LvAqcsrEGk/SH+1fp/hDvBG0i3etyI1mPJ2
 t97MCVtD1bz2zyLpOtGN48tgiRxSazr4S9nZPCLTec+c75I3pmJssj44m/eJi/Z2
 hoj/syiO4J0BPa7a1ou++Jeyag6J+PgXdJTOMyjuqi99vqai1aTVKo8GdWMInext
 +fJNYd0ZQRj1FxVdMusDfzxOk7N7b8nAzvd30iJN67R6QwoEazO12K1F4IYQmHSq
 0rhHrwe0lTLtjmYdp/ef14kfzD7DRFN6Nv2gk/zyZsH+tjGflxTZConkFPnfoJEc
 33cNHfigh0V9TSVNDDhHnkRyy6dzCHkYHEf33KFuX3amC236TgrCEL7+oWE2rcNp
 9PJbPGlXCqNb2feNy2de4cY+KiZ2a1N/T4VcdMK6DEdENFh5T03EZgIChQEd0S99
 LNBYHqTWJdQEKfkzfAXlR4Bd2hX1LWLMM6rNcXxInrH7rWDXUCS0X9m3gLZR9DIs
 7/nXoK4OkaJdgH/D2CToDgwMNT5hlIiTGtVtB3H6Qz8eQQ4+fwTyboQDqpeG4Upy
 LfOH2h5Fo33FCgqnrua8IsgUKLwW2yJGdghJpcd9d0qfVUDEJuXGo6xe6SEHdSu/
 VEiQtFUf50U=
 =EhRy
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - more generic entry code ABI fallout

 - debug register handling bugfixes

 - fix vmalloc mappings on 32-bit kernels

 - kprobes instrumentation output fix on 32-bit kernels

 - fix over-eager WARN_ON_ONCE() on !SMAP hardware

 - NUMA debugging fix

 - fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Unbreak 32bit fast syscall
  x86/debug: Allow a single level of #DB recursion
  x86/entry: Fix AC assertion
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386
  x86, fakenuma: Fix invalid starting node ID
  x86/mm/32: Bring back vmalloc faulting on x86_32
  x86/cmdline: Disable jump tables for cmdline.c
2020-09-06 10:28:00 -07:00
Marc Zyngier
cd1752d34e genirq: Walk the irq_data hierarchy when resending an interrupt
On resending an interrupt, we only check the outermost irqchip for
a irq_retrigger callback. However, this callback could be implemented
at an inner level. Use irq_chip_retrigger_hierarchy() in this case.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-06 18:25:23 +01:00
Linus Torvalds
7514c0362f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "19 patches.

  Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
  checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
  hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/log2.h: add missing () around n in roundup_pow_of_two()
  mm/khugepaged.c: fix khugepaged's request size in collapse_file
  mm/hugetlb: fix a race between hugetlb sysctl handlers
  mm/hugetlb: try preferred node first when alloc gigantic page from cma
  mm/migrate: preserve soft dirty in remove_migration_pte()
  mm/migrate: remove unnecessary is_zone_device_page() check
  mm/rmap: fixup copying of soft dirty and uffd ptes
  mm/migrate: fixup setting UFFD_WP flag
  mm: madvise: fix vma user-after-free
  checkpatch: fix the usage of capture group ( ... )
  fork: adjust sysctl_max_threads definition to match prototype
  ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  mm: track page table modifications in __apply_to_page_range()
  MAINTAINERS: IA64: mark Status as Odd Fixes only
  MAINTAINERS: add LLVM maintainers
  MAINTAINERS: update Cavium/Marvell entries
  mm: slub: fix conversion of freelist_corrupted()
  mm: memcg: fix memcg reclaim soft lockup
  memcg: fix use-after-free in uncharge_batch
2020-09-05 13:28:40 -07:00
Tobias Klauser
b0daa2c73f fork: adjust sysctl_max_threads definition to match prototype
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:

  kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
  kernel/fork.c:3050:47:    expected void *
  kernel/fork.c:3050:47:    got void [noderef] __user *buffer
  kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
  kernel/fork.c:3036:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
  kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
  include/linux/sysctl.h:242:5: note: previously declared as:
  include/linux/sysctl.h:242:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:29 -07:00
Jakub Kicinski
44a8c4f33c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.

Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04 21:28:59 -07:00
Paul E. McKenney
6fe208f63a Merge branch 'csd.2020.09.04a' into HEAD
csd.2020.09.04a: CPU smp_call_function() torture tests.
2020-09-04 11:54:52 -07:00
Wei Yongjun
2b722160f1 smp: Make symbol 'csd_bug_count' static
The sparse tool complains as follows:

kernel/smp.c:107:10: warning:
 symbol 'csd_bug_count' was not declared. Should it be static?

Because variable is not used outside of smp.c, this commit marks it
static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-09-04 11:53:12 -07:00
Paul E. McKenney
35feb60474 kernel/smp: Provide CSD lock timeout diagnostics
This commit causes csd_lock_wait() to emit diagnostics when a CPU
fails to respond quickly enough to one of the smp_call_function()
family of function calls.  These diagnostics are enabled by a new
CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.

This commit was inspired by an earlier patch by Josef Bacik.

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
[ paulmck: Fix KASAN use-after-free issue reported by Qian Cai. ]
[ paulmck: Fix botched nr_cpu_ids comparison per Dan Carpenter. ]
[ paulmck: Apply Peter Zijlstra feedback. ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-04 11:52:50 -07:00
Paul E. McKenney
e48c15b796 smp: Add source and destination CPUs to __call_single_data
This commit adds a destination CPU to __call_single_data, and is inspired
by an earlier commit by Peter Zijlstra.  This version adds #ifdef to
permit use by 32-bit systems and supplying the destination CPU for all
smp_call_function*() requests, not just smp_call_function_single().

If need be, 32-bit systems could be accommodated by shrinking the flags
field to 16 bits (the atomic_t variant is currently unused) and by
providing only eight bits for CPU on such systems.

It is not clear that the addition of the fields to __call_single_node
are really needed.

[ paulmck: Apply Boqun Feng feedback on 32-bit builds. ]
Link: https://lore.kernel.org/lkml/20200615164048.GC2531@hirez.programming.kicks-ass.net/
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-04 11:50:50 -07:00
Leon Romanovsky
cfc905f158 gcov: Disable gcov build with GCC 10
GCOV built with GCC 10 doesn't initialize n_function variable.  This
produces different kernel panics as was seen by Colin in Ubuntu and me
in FC 32.

As a workaround, let's disable GCOV build for broken GCC 10 version.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891288
Link: https://lore.kernel.org/lkml/20200827133932.3338519-1-leon@kernel.org
Link: https://lore.kernel.org/lkml/CAHk-=whbijeSdSvx-Xcr0DPMj0BiwhJ+uiNnDSVZcr_h_kg7UA@mail.gmail.com/
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04 09:19:49 -07:00
Thomas Gleixner
4facb95b7a x86/entry: Unbreak 32bit fast syscall
Andy reported that the syscall treacing for 32bit fast syscall fails:

# ./tools/testing/selftests/x86/ptrace_syscall_32
...
[RUN] SYSEMU
[FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732)
...
[RUN] SYSCALL
[FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)
 
The eason is that the conversion to generic entry code moved the retrieval
of the sixth argument (EBP) after the point where the syscall entry work
runs, i.e. ptrace, seccomp, audit...

Unbreak it by providing a split up version of syscall_enter_from_user_mode().

- syscall_enter_from_user_mode_prepare() establishes state and enables
  interrupts

- syscall_enter_from_user_mode_work() runs the entry work

Replace the call to syscall_enter_from_user_mode() in the 32bit fast
syscall C-entry with the split functions and stick the EBP retrieval
between them.

Fixes: 27d6b4d14f ("x86/entry: Use generic syscall entry function")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de
2020-09-04 15:50:14 +02:00
Christian Brauner
6da73d1525
pidfd: support PIDFD_NONBLOCK in pidfd_open()
Introduce PIDFD_NONBLOCK to support non-blocking pidfd file descriptors.

Ever since the introduction of pidfds and more advanced async io various
programming languages such as Rust have grown support for async event
libraries. These libraries are created to help build epoll-based event loops
around file descriptors. A common pattern is to automatically make all file
descriptors they manage to O_NONBLOCK.

For such libraries the EAGAIN error code is treated specially. When a function
is called that returns EAGAIN the function isn't called again until the event
loop indicates the the file descriptor is ready. Supporting EAGAIN when
waiting on pidfds makes such libraries just work with little effort. In the
following patch we will extend waitid() internally to support non-blocking
pidfds.

This introduces a new flag PIDFD_NONBLOCK that is equivalent to O_NONBLOCK.
This follows the same patterns we have for other (anon inode) file descriptors
such as EFD_NONBLOCK, IN_NONBLOCK, SFD_NONBLOCK, TFD_NONBLOCK and the same for
close-on-exec flags.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
Link: https://github.com/joshtriplett/async-pidfd
Link: https://lore.kernel.org/r/20200902102130.147672-2-christian.brauner@ubuntu.com
2020-09-04 12:34:50 +02:00
Christian Brauner
ba7d25f3df
exit: support non-blocking pidfds
Passing a non-blocking pidfd to waitid() currently has no effect, i.e.  is not
supported. There are users which would like to use waitid() on pidfds that are
O_NONBLOCK and mix it with pidfds that are blocking and both pass them to
waitid().
The expected behavior is to have waitid() return -EAGAIN for non-blocking
pidfds and to block for blocking pidfds without needing to perform any
additional checks for flags set on the pidfd before passing it to waitid().
Non-blocking pidfds will return EAGAIN from waitid() when no child process is
ready yet. Returning -EAGAIN for non-blocking pidfds makes it easier for event
loops that handle EAGAIN specially.

It also makes the API more consistent and uniform. In essence, waitid() is
treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
socket.
With the addition of support for non-blocking pidfds we support the same
functionality that sockets do. For sockets() recvmsg() supports MSG_DONTWAIT
for pidfds waitid() supports WNOHANG. Both flags are per-call options. In
contrast non-blocking pidfds and non-blocking sockets are a setting on an open
file description affecting all threads in the calling process as well as other
processes that hold file descriptors referring to the same open file
description. Both behaviors, per call and per open file description, have
genuine use-cases.

The implementation should be straightforward:
- If a non-blocking pidfd is passed and WNOHANG is not raised we simply raise
  the WNOHANG flag internally. When do_wait() returns indicating that there are
  eligible child processes but none have exited yet we set EAGAIN. If no child
  process exists we continue returning ECHILD.
- If a non-blocking pidfd is passed and WNOHANG is raised waitid() will
  continue returning 0, i.e. it will not set EAGAIN. This ensure backwards
  compatibility with applications passing WNOHANG explicitly with pidfds.

A concrete use-case that was brought on-list was Josh's async pidfd library.
Ever since the introduction of pidfds and more advanced async io various
programming languages such as Rust have grown support for async event
libraries. These libraries are created to help build epoll-based event loops
around file descriptors. A common pattern is to automatically make all file
descriptors they manage to O_NONBLOCK.

For such libraries the EAGAIN error code is treated specially. When a function
is called that returns EAGAIN the function isn't called again until the event
loop indicates the the file descriptor is ready.  Supporting EAGAIN when
waiting on pidfds makes such libraries just work with little effort.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Jann Horn <jannh@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
Link: https://github.com/joshtriplett/async-pidfd
Link: https://lore.kernel.org/r/20200902102130.147672-3-christian.brauner@ubuntu.com
2020-09-04 12:31:30 +02:00
Daniel Jordan
1b0df11fde padata: fix possible padata_works_lock deadlock
syzbot reports,

  WARNING: inconsistent lock state
  5.9.0-rc2-syzkaller #0 Not tainted
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  syz-executor.0/26715 takes:
  (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220
  {IN-SOFTIRQ-W} state was registered at:
    spin_lock include/linux/spinlock.h:354 [inline]
    padata_do_parallel kernel/padata.c:220
    ...
    __do_softirq kernel/softirq.c:298
    ...
    sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091
    asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581

   Possible unsafe locking scenario:

         CPU0
         ----
    lock(padata_works_lock);
    <Interrupt>
      lock(padata_works_lock);

padata_do_parallel() takes padata_works_lock with softirqs enabled, so a
deadlock is possible if, on the same CPU, the lock is acquired in
process context and then softirq handling done in an interrupt leads to
the same path.

Fix by leaving softirqs disabled while do_parallel holds
padata_works_lock.

Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com
Fixes: 4611ce2246 ("padata: allocate work structures for parallel jobs from a pool")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-04 17:51:55 +10:00
Linus Torvalds
3e8d3bdc2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

 2) Fix loss of RTT samples in rxrpc, from David Howells.

 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.

 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

 5) We disable BH for too lokng in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

 8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

11) Memory leak in rxkad_verify_response, from Dinghao Liu.

12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.

15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
  net/smc: fix sock refcounting in case of termination
  net/smc: reset sndbuf_desc if freed
  net/smc: set rx_off for SMCR explicitly
  net/smc: fix toleration of fake add_link messages
  tg3: Fix soft lockup when tg3_reset_task() fails.
  doc: net: dsa: Fix typo in config code sample
  net: dp83867: Fix WoL SecureOn password
  nfp: flower: fix ABI mismatch between driver and firmware
  tipc: fix shutdown() of connectionless socket
  ipv6: Fix sysctl max for fib_multipath_hash_policy
  drivers/net/wan/hdlc: Change the default of hard_header_len to 0
  net: gemini: Fix another missing clk_disable_unprepare() in probe
  net: bcmgenet: fix mask check in bcmgenet_validate_flow()
  amd-xgbe: Add support for new port mode
  net: usb: dm9601: Add USB ID of Keenetic Plus DSL
  vhost: fix typo in error message
  net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
  pktgen: fix error message with wrong function name
  net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
  cxgb4: fix thermal zone device registration
  ...
2020-09-03 18:50:48 -07:00
Yonghong Song
dc0988bbe1 bpf: Do not use bucket_lock for hashmap iterator
Currently, for hashmap, the bpf iterator will grab a bucket lock, a
spinlock, before traversing the elements in the bucket. This can ensure
all bpf visted elements are valid. But this mechanism may cause
deadlock if update/deletion happens to the same bucket of the
visited map in the program. For example, if we added bpf_map_update_elem()
call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
we will have the following deadlock:

  ============================================
  WARNING: possible recursive locking detected
  5.9.0-rc1+ #841 Not tainted
  --------------------------------------------
  test_progs/1750 is trying to acquire lock:
  ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410

  but task is already holding lock:
  ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&htab->buckets[i].raw_lock);
    lock(&htab->buckets[i].raw_lock);

   *** DEADLOCK ***
   ...
  Call Trace:
   dump_stack+0x78/0xa0
   __lock_acquire.cold.74+0x209/0x2e3
   lock_acquire+0xba/0x380
   ? htab_map_update_elem+0x1cf/0x410
   ? __lock_acquire+0x639/0x20c0
   _raw_spin_lock_irqsave+0x3b/0x80
   ? htab_map_update_elem+0x1cf/0x410
   htab_map_update_elem+0x1cf/0x410
   ? lock_acquire+0xba/0x380
   bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
   ? find_held_lock+0x34/0xa0
   bpf_iter_run_prog+0x81/0x16e
   __bpf_hash_map_seq_show+0x145/0x180
   bpf_seq_read+0xff/0x3d0
   vfs_read+0xad/0x1c0
   ksys_read+0x5f/0xe0
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ...

The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(),
and then grabbed again in htab_map_update_elem() in the bpf program, causing
deadlocks.

Actually, we do not need bucket_lock here, we can just use rcu_read_lock()
similar to netlink iterator where the rcu_read_{lock,unlock} likes below:
 seq_ops->start():
     rcu_read_lock();
 seq_ops->next():
     rcu_read_unlock();
     /* next element */
     rcu_read_lock();
 seq_ops->stop();
     rcu_read_unlock();

Compared to old bucket_lock mechanism, if concurrent updata/delete happens,
we may visit stale elements, miss some elements, or repeat some elements.
I think this is a reasonable compromise. For users wanting to avoid
stale, missing/repeated accesses, bpf_map batch access syscall interface
can be used.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
2020-09-03 17:36:41 -07:00
Paul E. McKenney
7fbe67e46a Merge branch 'strictgp.2020.08.24a' into HEAD
strictgp.2020.08.24a: Strict grace periods for KASAN testing.
2020-09-03 09:47:42 -07:00
Paul E. McKenney
f511ce1424 Merge branch 'scftorture.2020.08.24a' into HEAD
scftorture.2020.08.24a: Torture tests for smp_call_function() and friends.
2020-09-03 09:47:01 -07:00
Paul E. McKenney
cfb2c1070a Merge branches 'doc.2020.08.24a', 'fixes.2020.09.03b' and 'torture.2020.08.24a' into HEAD
doc.2020.08.24a: Documentation updates.
fixes.2020.09.03b: Miscellaneous fixes.
torture.2020.08.24a: Torture-test updates.
2020-09-03 09:42:02 -07:00
Zqiang
70060b8770 rcu: Shrink each possible cpu krcp
CPUs can go offline shortly after kfree_call_rcu() has been invoked,
which can leave memory stranded until those CPUs come back online.
This commit therefore drains the kcrp of each CPU, not just the
ones that happen to be online.

Acked-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-03 09:40:13 -07:00