linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Jiri Olsa	edcaa47958	perf daemon: Add 'ping' command Add a 'ping' command to verify that the 'perf record' session is up and operational. It's used in the following patches via test code to make sure 'perf record' is ready to receive signals. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Ping all sessions: # perf daemon ping OK cycles OK sched Ping specific session: # perf daemon ping --session sched OK sched Committer notes: Fixed up a bug pointed out by clang: Buggy: if (!pollfd.revents & POLLIN) Correct code: if (!(pollfd.revents & POLLIN)) clang warning: builtin-daemon.c:560:6: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if (!pollfd.revents & POLLIN) { ^ ~ builtin-daemon.c:560:6: note: add parentheses after the '!' to evaluate the bitwise operator first Also use a designated initializer with pollfd, i.e.: struct pollfd pollfd = { .events = POLLIN, }; Instead of: struct pollfd pollfd = { 0, }; To get past: builtin-daemon.c:510:30: error: missing field 'events' initializer [-Werror,-Wmissing-field-initializers] struct pollfd pollfd = { 0, }; ^ 1 error generated. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-16-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:19:52 -03:00
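The operator-precedence bug from the committer notes of edcaa47958 can be reproduced in isolation. The sketch below is not code from builtin-daemon.c; the helper names `not_readable_buggy` and `not_readable_fixed` are hypothetical, and the buggy form is written with explicit parentheses to show how the compiler parses `!revents & POLLIN`.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/*
 * Buggy form: '!' binds tighter than '&', so the compiler reads
 * "!revents & POLLIN" as "(!revents) & POLLIN".
 */
static bool not_readable_buggy(short revents)
{
	return ((!revents) & POLLIN) != 0;
}

/* Fixed form: mask out the POLLIN bit first, then negate the result. */
static bool not_readable_fixed(short revents)
{
	return !(revents & POLLIN);
}
```

The two forms agree when `revents` is 0 or contains `POLLIN`, and differ when `revents` carries only other bits such as `POLLHUP`: the buggy form then reports the descriptor as readable.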
Jiri Olsa	6a6d1804a1	perf daemon: Set control fifo for session Setup control fifos for session and add --control option to session arguments. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start You can list the control fifos (control and ack files) with: # perf daemon -v [776459:daemon] base: /opt/perfdata output: /opt/perfdata/output lock: /opt/perfdata/lock [776460:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output control: /opt/perfdata/session-cycles/control ack: /opt/perfdata/session-cycles/ack [776461:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output control: /opt/perfdata/session-sched/control ack: /opt/perfdata/session-sched/ack Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-15-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:19:52 -03:00
Jiri Olsa	8c98be6c36	perf daemon: Allow only one daemon over base directory Add 'lock' file under daemon base and flock it, so only one perf daemon can run on top of it. Each daemon tries to create and lock BASE/lock file, if it's successful we are sure we're the only daemon running over the BASE. Once daemon is finished, file descriptor to lock file is closed and lock is released. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start And try once more: # perf daemon start failed: another perf daemon (pid 775594) owns /opt/perfdata will end up with an error, because there's already one running on top of /opt/perfdata. Committer notes: Provide lockf(F_TLOCK) when not available, i.e. transform: lockf(fd, F_TLOCK, 0); into: flock(fd, LOCK_EX | LOCK_NB); Which should be equivalent. Noticed when cross building to some odd Android NDK. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-14-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:16:56 -03:00
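The single-instance scheme described in 8c98be6c36 (create BASE/lock, take a non-blocking exclusive lock, keep the descriptor open for the daemon's lifetime) can be sketched as follows. This is a minimal illustration, not the perf implementation: the helper name `try_lock` is hypothetical, and it uses the `flock(fd, LOCK_EX | LOCK_NB)` form quoted in the committer notes rather than `lockf()`.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to become the only owner of 'path'.
 * Returns an open, locked descriptor on success, -1 if the file cannot
 * be opened or somebody else already holds the lock.
 */
static int try_lock(const char *path)
{
	int fd = open(path, O_RDWR | O_CREAT, 0640);

	if (fd < 0)
		return -1;

	/* LOCK_NB: fail immediately instead of waiting for the owner. */
	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		close(fd);
		return -1;
	}

	/* Closing the returned descriptor releases the lock. */
	return fd;
}
```

Because the lock is tied to the open descriptor, it is dropped automatically when the owner exits, so a crashed daemon does not leave a stale lock behind.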
Jiri Olsa	23c5831e2e	perf daemon: Add 'stop' command Add 'perf daemon stop' command to stop daemon process and all running sessions. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Stop the daemon # perf daemon stop Daemon is not running, nothing to connect to: # perf daemon connect error: Connection refused Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-13-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	2d6914cd59	perf daemon: Add 'signal' command Allow the 'perf daemon' to send SIGUSR2 to all running sessions or just to a specific session. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Send signal to all running sessions: # perf daemon signal signal 12 sent to session 'cycles [773738]' signal 12 sent to session 'sched [773739]' Or to specific one: # perf daemon signal --session sched signal 12 sent to session 'sched [773739]' And verify signals were delivered and perf.data dumped: # cat /opt/perfdata/session-cycles/output rounding mmap pages size to 32M (8192 pages) [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220382490 ] # cat /opt/perfdata/session-sched/output rounding mmap pages size to 32M (8192 pages) [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220382489 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220393745 ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-12-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	b325f7be25	perf daemon: Add 'list' command Add a 'list' command to display all running sessions. It's the default command if no other command is specified. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start List sessions: # perf daemon [771394:daemon] base: /opt/perfdata [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a List sessions with more info: # perf daemon -v [771394:daemon] base: /opt/perfdata output: /opt/perfdata/output [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output The 'output' file is perf record output for specific session. Note you have to stop all running perf processes manually at this point, stop command is coming in following patches. 
Committer notes: Fixup union initialization to overcome this in multiple older systems: 22 15.74 debian:8 : FAIL gcc version 4.9.2 (Debian 4.9.2-10+deb8u2) builtin-daemon.c: In function 'send_cmd_list': builtin-daemon.c:1386:2: error: missing initializer for field 'csv_sep' of 'struct <anonymous>' [-Werror=missing-field-initializers] }; ^ builtin-daemon.c:641:8: note: 'csv_sep' declared here char csv_sep; ^ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	12c1a415eb	perf daemon: Add signalfd support Use a signalfd fd to track SIGCHLD signals as notifications for perf session termination. This way we don't need to actively check for child status, being notified if there's change. Suggested-by: Alexei Budankov <abudankov@huawei.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	88adb1194c	perf daemon: Add background support Add support to put the daemon process in the background. It's now enabled by default and -f option is added to keep the daemon process on the console for debugging. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	3cda062520	perf daemon: Add config file change check Add support to detect changes to the daemon's config file triggering a re-read of the configuration when that happens. Use a inotify file descriptor plugged into the main fdarray object for polling. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a Starting the daemon: # perf daemon start Check sessions: # perf daemon [772262:daemon] base: /opt/perfdata [772263:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a Change '-m 10M' to '-m 20M', and check daemon log: # tail -f /opt/perfdata/output [2021-01-02 20:31:41.234045] daemon started (pid 772262) [2021-01-02 20:31:41.235072] reconfig: ruining session [cycles:772263]: -m 10M -e cycles --overwrite --switch-output -a [2021-01-02 20:32:08.310137] reconfig: session 'cycles' killed [2021-01-02 20:32:08.310847] reconfig: ruining session [cycles:772338]: -m 20M -e cycles --overwrite --switch-output -a And the session list: # perf daemon [772262:daemon] base: /opt/perfdata [772338:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a Note the changed '-m 20M' option is in place. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	c0666261ff	perf daemon: Add config file support Add support for configuring the daemon with a config file. Each client or server invocation of perf daemon needs to know the base directory, where all sessions data is stored. The base is defined with: daemon.base Base path for daemon data. All sessions data are stored under this path. The daemon allows creating record sessions. Each session is a record command spawned and monitored by perf daemon. The session is defined with: session-<NAME>.run Defines new record session for daemon. The value is record's command line without the 'record' keyword. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a The example above defines '/opt/perfdata' as the base directory and 2 record sessions. # perf daemon start [2021-01-28 19:47:33.454413] daemon started (pid 16015) [2021-01-28 19:47:33.455910] reconfig: ruining session [cycles:16016]: -m 10M -e cycles --overwrite --switch-output -a [2021-01-28 19:47:33.456599] reconfig: ruining session [sched:16017]: -m 20M -e sched:* --overwrite --switch-output -a # ps -ef | grep perf ... perf daemon start ... /home/jolsa/.../perf record -m 20M -e cycles --overwrite --switch-output -a ... 
/home/jolsa/.../perf record -m 20M -e sched:* --overwrite --switch-output -a The base directory is populated with: # find /opt/perfdata/ /opt/perfdata/ /opt/perfdata/control <- control socket /opt/perfdata/session-cycles <- data for session 'cycles': /opt/perfdata/session-cycles/output <- perf record output /opt/perfdata/session-cycles/perf.data <- perf data /opt/perfdata/session-sched <- ditto for session 'sched' /opt/perfdata/session-sched/output /opt/perfdata/session-sched/perf.data Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 10:02:54 -03:00
Jiri Olsa	90b0aad8f6	perf daemon: Add client socket support Add support for client socket side that will be used to send commands to the daemon server socket. This patch adds only the core support, all commands using this functionality are coming in the following patches. Committer notes: Had to patch it to deal with this in some systems: cc1: warnings being treated as errors builtin-daemon.c: In function 'send_cmd': MKDIR /tmp/build/perf/bench/ builtin-daemon.c:1368: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result MKDIR /tmp/build/perf/tests/ make[3]: *** [/tmp/build/perf/builtin-daemon.o] Error 1 And also to not leak the 'line' buffer allocated by getline(), since you initialized line to NULL and len to zero, man page says: If lineptr is set to NULL and n is set 0 before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20210208200908.1019149-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-02-11 09:52:28 -03:00
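The getline() ownership rule quoted in the committer notes of 90b0aad8f6 can be illustrated with a small reader loop. This is a sketch under the assumption of a plain `FILE *` input; the helper name `count_lines` is hypothetical and unrelated to the actual `send_cmd()` code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines in 'in', letting getline() manage the buffer. */
static long count_lines(FILE *in)
{
	char *line = NULL;	/* NULL + 0 asks getline() to allocate */
	size_t len = 0;
	long n = 0;

	while (getline(&line, &len, in) != -1)
		n++;

	/*
	 * The buffer belongs to the caller and must be freed even when
	 * the last getline() call failed or hit end of file.
	 */
	free(line);
	return n;
}
```

A single `free()` after the loop is enough, because getline() reuses and grows the same buffer across calls.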
Christian König	811ee9dff5	drm/ttm: make sure pool pages are cleared The old implementation wasn't consistent on this. But it looks like we depend on this so better bring it back. Signed-off-by: Christian König <christian.koenig@amd.com> Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Fixes: `d099fc8f54` ("drm/ttm: new TT backend allocation pool v3") Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210210160549.1462-1-christian.koenig@amd.com	2021-02-11 09:35:19 +01:00
Julien Grall	c4295ab0b4	arm/xen: Don't probe xenbus as part of an early initcall After Commit `3499ba8198` ("xen: Fix event channel callback via INTX/GSI"), xenbus_probe() will be called too early on Arm. This will result in a guest hang during boot. If the hang wasn't there, we would have ended up calling xenbus_probe() twice (the second time is in xenbus_probe_initcall()). We don't need to initialize xenbus_probe() early for Arm guest. Therefore, the call in xen_guest_init() is now removed. After this change, there is no more external caller for xenbus_probe(). So the function is turned to a static one. Interestingly there were two prototypes for it. Cc: stable@vger.kernel.org Fixes: `3499ba8198` ("xen: Fix event channel callback via INTX/GSI") Reported-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20210210170654.5377-1-julien@xen.org Signed-off-by: Juergen Gross <jgross@suse.com>	2021-02-11 07:49:37 +01:00
Palmer Dabbelt	3da3cc1b5f	Revert "dts: phy: add GPIO number and active state used for phy reset" VSC8541 phys need a special reset sequence, which the driver doesn't currently support. As a result enabling the reset via GPIO essentially guarantees that the device won't work correctly. We've been relying on bootloaders to reset the device for years, with this revert we'll go back to doing so until we can sort out how to get the reset sequence into the kernel. This reverts commit `a0fa9d7270`. Fixes: `a0fa9d7270` ("dts: phy: add GPIO number and active state used for phy reset") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>	2021-02-10 16:06:14 -08:00
Thomas Gleixner	70245f86c1	x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init() Invoking x86_init.irqs.create_pci_msi_domain() before x86_init.pci.arch_init() breaks XEN PV. The XEN_PV specific pci.arch_init() function overrides the default create_pci_msi_domain() which is obviously too late. As a consequence the XEN PV PCI/MSI allocation goes through the native path which runs out of vectors and causes malfunction. Invoke it after x86_init.pci.arch_init(). Fixes: `6b15ffa07d` ("x86/irq: Initialize PCI/MSI domain at PCI init time") Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87pn18djte.fsf@nanos.tec.linutronix.de	2021-02-10 22:06:47 +01:00
Linus Torvalds	291009f656	Power management fixes for 5.11-rc8 Address a performance regression related to scale-invariance on x86 that may prevent turbo CPU frequencies from being used in certain workloads on systems using acpi-cpufreq as the CPU performance scaling driver and schedutil as the scaling governor. -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmAkGloSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxCNkP/3uQly0iE4WdsxiBlWgF6zQH5PkezzSu vXY0E2NbzAlUpke3zDIZ6zkN6DfG1yAl4vpsVzy5N/kTzFnaFPbPH3ylZ2x/oKBZ Rd9yl0uz13UR4txkY49ZRF3c3vMhFoGzNfIXYjMQDGevyfarMLdpR96GFbpFTCT5 I5gZfDtOuAwpXY+Mr+UplTu7PTrmkf2jfQ/T/b+jog3NAjqODwnLT8dwIOTZnuPk vbCOhb5vsiUHaqilrKkuGS5TzGsb/KCa6k4kaf7WhoFiU99KKcaZRLca/4FlJuVj Q4rgSrtPsbvhG2vmucprunrsyt21JQMDnERqMlPcEls/c0ONgS4fMc5YJlO6KgZZ Mlu01f/oE84jQ//0Y3LVi6v6w+yOiBi1Ie9yD8wnOkn6c+r6sWWbvd5Kg/guGnwi LLSdemslw4r0ltimFmWD5I86ZXDJ1gwU9iuv+SdxoyppHHwOOAu5l/FhEgNvuWbl LeuLrl7BhYTbN40ouKivoQ8smTpI0EmZX2MRm+l5NV4hHQ+8df16Rt7hoFvAdI2L VJe/i0sgOcdCVnSovxQ8WeuSMGQtqrgFC4B/9+q6WOgAGIAFtyHQUtpHzB+XsIRo P2VAmxcdJsqrmnbtoxxopMcqov5cAYI5PPJO/4yR+Z+gOBjeBvEJaxDF/shFT2NO iaAQXEYLIPW9 =iOYP -----END PGP SIGNATURE----- Merge tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Address a performance regression related to scale-invariance on x86 that may prevent turbo CPU frequencies from being used in certain workloads on systems using acpi-cpufreq as the CPU performance scaling driver and schedutil as the scaling governor" * tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there cpufreq: ACPI: Extend frequency tables to cover boost frequencies	2021-02-10 12:03:35 -08:00
Linus Torvalds	a3961497bd	ACPI fix for 5.11-rc8 Revert a problematic ACPICA commit that changed the code to attempt to update memory regions which may be read-only on some systems (Ard Biesheuvel). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmAkGeESHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRx90EQAJvCLtbXhf+AsQbYiDhGwH0VE4EvDjqJ M3qZL8QDeXx+HRqicdCevpdukk3CAosm468yudwrEIpk7vUTtW2vgjTeVNuH3O4n Yz7Aw5EXfOO7LGXpZ+oyYPNpPZ9DK5BugpZW6C85qlBPHGzTem7IkoEOEem7lvUV QwahrwZ4D20/blgunGGt5jx8ilp9YA9wQl0FUaoiAfH7ydoA4YTbFqdcG2jEe9Xb yQvpLPoD50QTO0Gc1R86C0rehNfVQaXI9L5Nf8w5LU0UPGL2fTWkNwDFga8XvrH/ ZkIVWIFYBF6cgiOaB5TaZ/ufWw7BXe8c2X8lEdn2uJjZ4CXyKBVxAtdiqk+Bf57o mMszQDwf9hot+fXpq3TDeDhu8KwoLMpHITGxUzZ2Y50wtsb/vwkT45x9PWmUObTz fXvKbvGWZ/QFABsAGlBCAXUsTsUhkCnBuZ7icbbdG+HAqLg3/QJHvORg7pFYNCJH wlfChKpoH9n69VIo5Ae5NFxp678rW616E3N1yTF+0OK9uKhVqu7PxqRd9qWLOG4F Fm/fJB9XtIOzNaVmSqMd8YONpFv1Kf450d8uC3D0QPrCSE9OPl3b+JnqEfhxx2xl sjcT0UKewwm+H0pTzk/FdYqW1GpTV/5uWu+onhfZvVGePuJCRSJ+URw7FuhUfmiE iTAT8mV3zo6u =0hUo -----END PGP SIGNATURE----- Merge tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a problematic ACPICA commit that changed the code to attempt to update memory regions which may be read-only on some systems (Ard Biesheuvel)" * tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"	2021-02-10 11:58:21 -08:00
Linus Torvalds	708c2e4181	dmaengine fixes-2 for v5.11 Some late fixes for dmaengine: - Core: fix channel device_node deletion - Driver fixes for: - dw: revert of runtime pm enabling - idxd: device state fix, interrupt completion and list corruption - ti: resource leak -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmAjrkQACgkQfBQHDyUj g0foZA/+Iqpi6fU0Dth4bdoJa5HO63a62G5nrhofF/vH681GMaazNj46byol3vuA Gc0/EZ2UtIkEY29ix0XaHkksQrsqn/Q5E4QK+5u9x32DHf3jvtbOblOSIBCdr//3 i+uc/K90ot4ERtNvwiPxQGWjS7rF+6BvHItRDxOaiele0Uvf18/VGn2x7fH5vNeK GqtZK47E11y5UhqpJAiwcgNAhKXC6I6s/tP0pidyWuXWeqVm+usr6Pun9YExMJQm N+kiR8eJoh5F0N9KAg3rOppxf4iEblvgh2vfMgcNC63GdeWB2x1OMgizAXjE136K HAvcp/3rQf76tUhjZkr/YZaNB7wCqzCRRcgQ/xyhSJt24yswfv9NFGVHd2ltkfx9 Yp+rl8ZC0dSvdGR3ECF9z98MzRbBPgu+TCW/50/Hh42Va0FJZbXyY45hpfR9qPe2 hiXwQkJ8IKH7C8BpDKA8vMlJc4xhbNsYW0GaSyoAUzhaStwTHcKNB4+5Xeia55e3 RR2OPJXl+y3jywcO15fmFdNIRsSvRVGYioFH0NzneaVVIlbQk5hRqADNMWelnwiA DJc21v7yurHeCh3lefn5Aml10n986S1b7XNPA7Ls+2FMmJeIt2vrKqvmKLhHVANY bvSKEXda2pAvb3zw2fCcCuPq6KUdJAfDrB0oorIlRBM3IkY+B5Y= =j/zV -----END PGP SIGNATURE----- Merge tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Some late fixes for dmaengine: Core: - fix channel device_node deletion Driver fixes: - dw: revert of runtime pm enabling - idxd: device state fix, interrupt completion and list corruption - ti: resource leak * tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine dw: Revert "dmaengine: dw: Enable runtime PM" dmaengine: idxd: check device state before issue command dmaengine: ti: k3-udma: Fix a resource leak in an error handling path dmaengine: move channel device_node deletion to driver dmaengine: idxd: fix misc interrupt completion dmaengine: idxd: Fix list corruption in description completion	2021-02-10 11:51:25 -08:00
Jens Axboe	92c75f7594	Revert "io_uring: don't take fs for recvmsg/sendmsg" This reverts commit `10cad2c40d`. Petr reports that with this commit in place, io_uring fails the chroot test (CVE-2020-29373). We do need to retain ->fs for send/recvmsg, so revert this commit. Reported-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-02-10 12:37:58 -07:00
Linus Torvalds	6016bf19b3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: "Another pile of networking fixes: 1) ath9k build error fix from Arnd Bergmann 2) dma memory leak fix in mediatek driver from Lorenzo Bianconi. 3) bpf int3 kprobe fix from Alexei Starovoitov. 4) bpf stackmap integer overflow fix from Bui Quang Minh. 5) Add usb device ids for Cinterion MV31 to qmi_wwan driver, from Christoph Schemmel. 6) Don't update deleted entry in xt_recent netfilter module, from Jozsef Kadlecsik. 7) Use after free in nftables, fix from Pablo Neira Ayuso. 8) Header checksum fix in flowtable from Sven Auhagen. 9) Validate user controlled length in qrtr code, from Sabyrzhan Tasbolatov. 10) Fix race in xen/netback, from Juergen Gross. 11) New device ID in cxgb4, from Raju Rangoju. 12) Fix ring locking in rxrpc release call, from David Howells. 13) Don't return LAPB error codes from x25_open(), from Xie He. 14) Missing error returns in gsi_channel_setup() from Alex Elder. 15) Get skb_copy_and_csum_datagram working properly with odd segment sizes, from Willem de Bruijn. 16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean. 17) Do teardown on probe failure in DSA, from Vladimir Oltean. 18) Fix compilation failures of txtimestamp selftest, from Vadim Fedorenko. 19) Limit rx per-napi gro queue size to fix latency regression, from Eric Dumazet. 20) dpaa_eth xdp fixes from Camelia Groza. 21) Missing txq mode update when switching CBS off, in stmmac driver, from Mohammad Athari Bin Ismail. 22) Failover pending logic fix in ibmvnic driver, from Sukadev Bhattiprolu. 23) Null deref fix in vmw_vsock, from Norbert Slusarek. 24) Missing verdict update in xdp paths of ena driver, from Shay Agroskin. 25) seq_file iteration fix in sctp from Neil Brown. 26) bpf 32-bit src register truncation fix on div/mod, from Daniel Borkmann. 27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann. 
28) Fix locking in vsock_shutdown(), from Stefano Garzarella. 29) Various missing index bound checks in hns3 driver, from Yufeng Mo. 30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from Vladimir Oltean. 31) Don't mix up stp and mrp port states in bridge layer, from Horatiu Vultur. 32) Fix locking during netif_tx_disable(), from Edwin Peer" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) bpf: Fix 32 bit src register truncation on div/mod bpf: Fix verifier jmp32 pruning decision logic bpf: Fix verifier jsgt branch analysis on max bound vsock: fix locking in vsock_shutdown() net: hns3: add a check for index in hclge_get_rss_key() net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() net: hns3: add a check for queue_id in hclge_reset_vf_queue() net: dsa: felix: implement port flushing on .phylink_mac_link_down switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state net: watchdog: hold device global xmit lock during tx disable netfilter: nftables: relax check for stateful expressions in set definition netfilter: conntrack: skip identical origin tuple in same zone only vsock/virtio: update credit only if socket is not closed net: fix iteration for sctp transport seq_files net: ena: Update XDP verdict upon failure net/vmw_vsock: improve locking in vsock_connect_timeout() net/vmw_vsock: fix NULL pointer dereference ibmvnic: Clear failover_pending if unable to schedule net: stmmac: set TxQ mode back to DCB after disabling CBS ...	2021-02-10 11:33:39 -08:00
Linus Torvalds	4b16b656b1	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, mremap, tmpfs, selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and firmware" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: nilfs2: make splice write available again mm, slub: better heuristic for number of cpus when calculating slab order Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" MAINTAINERS: update Andrey Ryabinin's email address selftests/vm: rename file run_vmtests to run_vmtests.sh tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 mm/mremap: fix BUILD_BUG_ON() error in get_extent firmware_loader: align .builtin_fw to 8 kasan: fix stack traces dependency for HW_TAGS squashfs: add more sanity checks in xattr id lookup squashfs: add more sanity checks in inode lookup squashfs: add more sanity checks in id lookup squashfs: avoid out of bounds writes in decompressors	2021-02-10 11:22:41 -08:00
Joachim Henke	a35d8f016e	nilfs2: make splice write available again Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit `36e2c7421f` ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Joachim Henke <joachim.henke@t-systems.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-10 11:19:58 -08:00
Vlastimil Babka	3286222fc6	mm, slub: better heuristic for number of cpus when calculating slab order When creating a new kmem cache, SLUB determines how large the slab pages will be, based on a number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/freed from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit `045ab8c948` ("mm/slub: let number of online CPUs determine the slub page order") changed it to num_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permanently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. 
We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in `32a6f409b6` ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of `045ab8c948` for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before `045ab8c948`. 
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/ [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz Fixes: `045ab8c948` ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vincent Guittot <vincent.guittot@linaro.org> Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-10 11:19:27 -08:00
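The CPU-count heuristic described in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the function name `slab_order_cpu_count` and its two parameters are hypothetical stand-ins for the kernel's num_present_cpus() and nr_cpu_ids, and the sketch is not the code from the patch.

```c
/*
 * Hypothetical userspace model of the heuristic from the commit message.
 * present_cpus stands in for num_present_cpus(), possible_cpus for
 * nr_cpu_ids. The function name is an assumption for illustration.
 */
static unsigned int slab_order_cpu_count(unsigned int present_cpus,
                                         unsigned int possible_cpus)
{
    /* A value above 1 suggests the arch sets the present mask properly. */
    if (present_cpus > 1)
        return present_cpus;

    /* Otherwise fall back to the number of possible CPUs. */
    return possible_cpus;
}
```

With these assumptions, a system reporting 224 present CPUs would use 224, while a system that reports only 1 present CPU early in boot would fall back to the possible-CPU count.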
Nikita Shubin	28dc10eb77	gpio: ep93xx: Fix single irqchip with multi gpiochips Fixes the following warnings, which result in interrupts being disabled on port B/F: gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver. gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver. - added separate irqchip for each interrupt capable gpiochip - provided unique names for each irqchip Fixes: `d2b0919615` ("gpio: ep93xx: Pass irqchip when adding gpiochip") Cc: <stable@vger.kernel.org> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>	2021-02-10 14:47:27 +01:00
Nikita Shubin	8b81a7ab80	gpio: ep93xx: fix BUG_ON port F usage Two index spaces and ep93xx_gpio_port are confusing. Instead add a separate struct to store necessary data and remove ep93xx_gpio_port. - add struct to store IRQ related data for each IRQ capable chip - replace offset array with defined offsets - add IRQ registers offset for each IRQ capable chip into ep93xx_gpio_banks ------------[ cut here ]------------ kernel BUG at drivers/gpio/gpio-ep93xx.c:64! ---[ end trace 3f6544e133e9f5ae ]--- Fixes: `fd935fc421` ("gpio: ep93xx: Do not pingpong irq numbers") Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>	2021-02-10 14:47:16 +01:00
Geert Uytterhoeven	97c6e28d38	gpio: mxs: GPIO_MXS should not default to y unconditionally Merely enabling CONFIG_COMPILE_TEST should not enable additional code. To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS, and ask the user in case of compile-testing. Fixes: `6876ca311b` ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS") Cc: <stable@vger.kernel.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>	2021-02-10 14:25:59 +01:00
Jernej Skrabec	1926a0508d	drm/sun4i: dw-hdmi: Fix max. frequency for H6 It turns out that reasoning for lowering max. supported frequency is wrong. Scrambling works just fine. Several now fixed bugs prevented proper functioning, even with rates lower than 340 MHz. Issues were just more pronounced with higher frequencies. Fix that by allowing max. supported frequency in HW and fix the comment. Fixes: `cd9063757a` ("drm/sun4i: DW HDMI: Lower max. supported rate for H6") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-6-jernej.skrabec@siol.net	2021-02-10 11:20:38 +01:00
Jernej Skrabec	6a155216c4	drm/sun4i: Fix H6 HDMI PHY configuration As it turns out, vendor HDMI PHY driver for H6 has a pretty big table of predefined values for various pixel clocks. However, most of them are not useful/tested because they come from reference driver code. Vendor PHY driver is concerned with only few of those, namely 27 MHz, 74.25 MHz, 148.5 MHz, 297 MHz and 594 MHz. These are all frequencies for standard CEA modes. Fix sun50i_h6_cur_ctr and sun50i_h6_phy_config with the values only for aforementioned frequencies. Table sun50i_h6_mpll_cfg doesn't need to be changed because values are actually frequency dependent and not so much SoC dependent. See i.MX6 documentation for explanation of those values for similar PHY. Fixes: `c71c9b2fee` ("drm/sun4i: Add support for Synopsys HDMI PHY") Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-5-jernej.skrabec@siol.net	2021-02-10 11:20:13 +01:00
Jernej Skrabec	36b53581fe	drm/sun4i: dw-hdmi: always set clock rate As expected, HDMI controller clock should always match pixel clock. In the past, changing HDMI controller rate would seemingly worsen situation. However, that was the result of other bugs which are now fixed. Fix that by removing set_rate quirk and always set clock rate. Fixes: `40bb9d3147` ("drm/sun4i: Add support for H6 DW HDMI controller") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-4-jernej.skrabec@siol.net	2021-02-10 11:20:01 +01:00
Jernej Skrabec	50791f5d7b	drm/sun4i: tcon: set sync polarity for tcon1 channel Channel 1 has polarity bits for vsync and hsync signals, but the driver never sets them. With pre-HDMI2 controllers there is seemingly no issue if polarity is not set. However, with HDMI2 controllers (H6), de-synchronization often occurs due to phase shift, which causes a flickering screen. It's safe to assume that similar issues might also happen with pre-HDMI2 controllers. Solve the issue by setting vsync and hsync polarity. Note that display stacks with tcon top actually have the polarity bits in the tcon0 polarity register. Fixes: `9026e0d122` ("drm: Add Allwinner A10 Display Engine support") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-3-jernej.skrabec@siol.net	2021-02-10 11:19:56 +01:00
Ville Syrjälä	5feba0e905	drm/i915: Fix overlay frontbuffer tracking We don't have a persistent fb holding a reference to the frontbuffer object, so every time we do the get+put we throw the frontbuffer object immediately away. And so the next time around we get a pristine frontbuffer object with bits==0 even for the old vma. This confuses the frontbuffer tracking code which understandably expects the old frontbuffer to have the overlay's bit set. Fix this by hanging on to the frontbuffer reference until the next flip. And just to make this a bit more clear let's track the frontbuffer explicitly instead of just grabbing it via the old vma. Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210209021918.16234-2-ville.syrjala@linux.intel.com Fixes: `8e7cb1799b` ("drm/i915: Extract intel_frontbuffer active tracking") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit `553c23bdb4`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2021-02-10 11:03:56 +02:00
Alex Deucher	cf050f96e0	Revert "drm/amd/display: Update NV1x SR latency values" This reverts commit `4a3dea8932`. This causes blank screens for some users. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482 Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Jun Lei <Jun.Lei@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-02-09 23:23:18 -05:00
David S. Miller	b8776f14a4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2021-02-10 The following pull-request contains BPF updates for your net tree. We've added 5 non-merge commits during the last 8 day(s) which contain a total of 3 files changed, 22 insertions(+), 21 deletions(-). The main changes are: 1) Fix missed execution of kprobes BPF progs when kprobe is firing via int3, from Alexei Starovoitov. 2) Fix potential integer overflow in map max_entries for stackmap on 32 bit archs, from Bui Quang Minh. 3) Fix a verifier pruning and an insn rewrite issue related to 32 bit ops, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 18:55:17 -08:00
Ronnie Sahlberg	a0f85e38a3	cifs: do not disable noperm if multiuser mount option is not provided Fixes small regression in implementation of new mount API. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reported-by: Hyunchul Lee <hyc.lee@gmail.com> Tested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-02-09 20:47:05 -06:00
Johannes Weiner	e82553c10b	Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" This reverts commit `536d3bf261`, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. 
Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit `b3ff92916a` ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: `536d3bf261` ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Andrey Ryabinin	a0c2eb0a43	MAINTAINERS: update Andrey Ryabinin's email address Update my email, @virtuozzo.com will stop working shortly. Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Rong Chen	d52db80084	selftests/vm: rename file run_vmtests to run_vmtests.sh Commit `c2aa8afc36` has renamed run_vmtests in Makefile, but the file still uses the old name. The kernel test robot reported the following issue: # selftests: vm: run_vmtests.sh # Warning: file run_vmtests.sh is missing! not ok 1 selftests: vm: run_vmtests.sh Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com Fixes: `c2aa8afc36` (selftests/vm: rename run_vmtests --> run_vmtests.sh) Signed-off-by: Rong Chen <rong.a.chen@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Seth Forshee	ad69c389ec	tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, whereas passing "inode64" in the mount options will fail. This leads to erroneous behaviours such as this: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. Prevent CONFIG_TMPFS_INODE64 from being selected on alpha. Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com Fixes: `ea3271f719` ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Seth Forshee	b85a7a8bb5	tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 Currently there is an assumption in tmpfs that 64-bit architectures also have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, but passing the "inode64" mount option will fail. This leads to the following behavior: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. The remount fails because mount sees "inode64" in the mount options and passes it back in the options for the remount. Prevent CONFIG_TMPFS_INODE64 from being selected on s390. Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com Fixes: `ea3271f719` ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: Hugh Dickins <hughd@google.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Arnd Bergmann	a30a29091b	mm/mremap: fix BUILD_BUG_ON() error in get_extent clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org Fixes: `9ad9718bfa` ("mm/mremap: calculate extent in one place") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Fangrui Song	793f49a87a	firmware_loader: align .builtin_fw to 8 arm64 references the start address of .builtin_fw (__start_builtin_fw) with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC relocations. The compiler is allowed to emit the R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in include/linux/firmware.h is 8-byte aligned. The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a multiple of 8, which may not be the case if .builtin_fw is empty. Unconditionally align .builtin_fw to fix the linker error. 32-bit architectures could use ALIGN(4) but that would add unnecessary complexity, so just use ALIGN(8). Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/1204 Fixes: `5658c76` ("firmware: allow firmware files to be built into kernel image") Signed-off-by: Fangrui Song <maskray@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Douglas Anderson <dianders@chromium.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
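The alignment requirement in the commit above comes down to simple arithmetic: an 8-byte load relocation needs an address that is a multiple of 8, and aligning rounds the section start up to the next such multiple. A minimal sketch of that arithmetic follows; the helper names `align_up` and `is_aligned` are hypothetical, and the sketch models only the rounding, not the linker script change itself.

```c
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * Assumes align is a power of two (8 in the commit's case).
 */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Return 1 if addr is a multiple of align (power of two), else 0. */
static int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For example, a section start of 0x1004 is not 8-byte aligned and would be rounded up to 0x1008, while 0x1008 stays where it is.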
Andrey Konovalov	1cc4cdb521	kasan: fix stack traces dependency for HW_TAGS Currently, whether the alloc/free stack traces collection is enabled by default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL. The intention for this dependency was to only enable collection on slow debug kernels due to a significant perf and memory impact. As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and is enabled on many production kernels including Android and Ubuntu. As a result, this dependency is pointless and only complicates the code and documentation. Having stack traces collection disabled by default would make the hardware mode work differently to the software ones, which is confusing. This change removes the dependency and enables stack traces collection by default. Looking into the future, this default might make sense for production kernels, assuming we implement a fast stack trace collection approach. Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Phillip Lougher	506220d2ba	squashfs: add more sanity checks in xattr id lookup Syzbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Phillip Lougher	eabac19e40	squashfs: add more sanity checks in inode lookup Syzbot has reported a "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Phillip Lougher	f37aa4c736	squashfs: add more sanity checks in id lookup Syzbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which have been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
Phillip Lougher	e812cbbbbb	squashfs: avoid out of bounds writes in decompressors Patch series "Squashfs: fix BIO migration regression and add sanity checks". Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block usage to BIO" patch, which has produced a number of Syzbot/Syzkaller reports. Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption issues which have produced Syzbot reports in the id, inode and xattr lookup code. Each patch has been tested against the Syzbot reproducers using the given kernel configuration. They have the appropriate "Reported-by:" lines added. Additionally, all of the reproducer filesystems are indirectly fixed by patch [4/4] due to the fact they all have xattr corruption which is now detected there. Additional testing with other configurations and architectures (32bit, big endian), and normal filesystems has also been done to trap any inadvertent regressions caused by the additional sanity checks. This patch (of 4): This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Syzbot/Syzkaller has reported a number of "out of bounds writes" and "unable to handle kernel paging request in squashfs_decompress" errors which have been identified as a regression introduced by the above patch. Specifically, the patch removed the following sanity check: if (length < 0 || length > output->length || (index + length) > msblk->bytes_used) This check did two things: 1. It ensured any reads were not beyond the end of the filesystem 2. It ensured that the "length" field read from the filesystem was within the expected maximum length. Without this any corrupted values can over-run allocated buffers. 
Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk Fixes: `93e72b3c61` ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Philippe Liard <pliard@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-09 17:26:44 -08:00
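The sanity check quoted in the commit above can be exercised in isolation. The following is a minimal userspace sketch, assuming plain integer parameters in place of the kernel's `output->length` and `msblk->bytes_used` fields; the function name `block_length_ok` is hypothetical and not taken from the squashfs sources.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the check quoted in the commit message.
 * Returns 1 if a block of `length` bytes starting at `index` may be read,
 * 0 if the values look corrupted.
 *
 *   output_length models output->length (size of the destination buffer)
 *   bytes_used    models msblk->bytes_used (size of the filesystem)
 */
static int block_length_ok(int length, int output_length,
                           uint64_t index, uint64_t bytes_used)
{
    if (length < 0 || length > output_length ||
        (index + (uint64_t)length) > bytes_used)
        return 0;
    return 1;
}
```

The first two conditions bound the "length" field against the destination buffer, and the third rejects reads past the end of the filesystem, which are the two properties the commit message lists.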
Linus Torvalds	ef7d0b5999	I3C fixes for 5.11 Drivers: - mipi-i3c-hci: fix compilation warning with Clang -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEycoQi/giopmpPgB12wIijOdRNOUFAmAjBgQACgkQ2wIijOdR NOXYRRAAgRgdB16WWYp1VeCtgDbBNRSeigjAEktuAbWMmsgPbNtr+HcD7N9seO7d 9+5EYIbp9GM0fTWbAWEo6ZvD6Dd0GuddCYXBscaOWohzVSrYBH+cO8ZgiW38XJbC uSjYUOxPtjgKq0aXEIva8IplLcMvga4nyapNIILKvOaxTRXWHwaeFDvZfOP2QE1o 4hdAOKIhtuOxVgQ2tNKNX7Yao4p9BzdpxD/s5fyIwqyFUCeL/lYq1whBfAXVkbIc lriBNWMv3cqA3+fL902On9XkfJnO0rZ/SB771u65L2/veAfPLX8AfM9eAedVT/NB shkxsW9814Z/s/szGp9twumS03FQcsXupM2yOcVAzzb7r5pD/pqS/rj68pqBvZ5i 9eawJ9SeBoVvKVo5Kko3BkHtbqSsICZCP0X8LKZ+4svVvMOJmIoze6Y2wdnETxKe UFI0elJZmqjHuit4Bt75Ltlr51Kfd/BZZ72VeZdYAq857D4kIP/r9l1TU6WSM20l vGKIvN+qNY7/B8Zr65HPyjL8YlkHHxDWprjE39z4cZEjm/SyUSGjE1XL/pyq/2HL me9eAn7PCzJwq7IYgQSt+LJtS26laH9zVvzL8HefrdiU9W7/z8qE/XYGH8oX2elC 2359zH6i7sgp3pZ+hp++HDiwP4+CbiVaUReTISLlAi9RAPD1zj8= =wuzi -----END PGP SIGNATURE----- Merge tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c fix from Alexandre Belloni: "A single build warning fix" * tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match	2021-02-09 17:19:56 -08:00
Daniel Borkmann	e88b2c6e5a	bpf: Fix 32 bit src register truncation on div/mod While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. 
After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. Fixes: `68fda450a7` ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2021-02-10 01:32:40 +01:00
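The instruction sequences listed in the commit above imply the following runtime behavior: division by a zero divisor yields 0, modulo by a zero divisor leaves the destination untouched, and the 32-bit variants test only the low 32 bits of the divisor. A minimal C model of that behavior is sketched below. The function names are hypothetical, and the model mirrors the listings in the commit message rather than the verifier or JIT code.

```c
#include <stdint.h>

/* 64-bit div: "if r6 != 0 goto div; w1 ^= w1" gives 0 for a zero divisor. */
static uint64_t bpf_div64(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* 32-bit div: only the low 32 bits of src are tested; result is zero-extended. */
static uint64_t bpf_div32(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? (uint64_t)(d / s) : 0;
}

/* 64-bit mod: "if r6 == 0 goto pc+1" skips the op, so dst is unchanged. */
static uint64_t bpf_mod64(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* 32-bit mod: the jmp32 test skips the op when the low 32 bits of src are 0. */
static uint64_t bpf_mod32(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? (uint64_t)(d % s) : dst;
}
```

Because the helpers take `src` by value and never write it back, the caller's copy of the divisor keeps all 64 bits. That models the point of the fix, which replaces the mov32 rewrite of the source register with a jmp32 test.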
Daniel Borkmann	fd675184fc	bpf: Fix verifier jmp32 pruning decision logic

Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes:

  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 808464450
  1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b4) w4 = 808464432
  2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
  2: (9c) w4 %= w0
  3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
  3: (66) if w4 s> 0x30303030 goto pc+0
   R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit
  propagating r0
  from 6 to 7: safe
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  propagating r0
  7: safe
  propagating r0
  from 6 to 7: safe
  processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

The underlying program was xlated as follows:

  # bpftool p d x i 10
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
   5: (66) if w4 s> 0x30303030 goto pc+0
   6: (7f) r0 >>= r0
   7: (bc) w0 = w0
   8: (15) if r0 == 0x0 goto pc+1
   9: (9c) w4 %= w0
  10: (66) if w0 s> 0x3030 goto pc+0
  11: (d6) if w0 s<= 0x303030 goto pc+1
  12: (05) goto pc-1
  13: (95) exit

The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we are actually able to trigger a hang due to hitting the 'goto pc-1' instructions.

Taking a closer look at the verifier analysis, the reason is that it misjudges its pruning decision at the first 'from 6 to 7: safe' occasion. What happens is that while both old/cur registers are marked as precise, they get misjudged for the jmp32 case as range_within() yields true, meaning that the prior verification path with a wider register bound could be verified successfully and therefore the current path with a narrower register bound is deemed safe as well whereas in reality it's not. R0 old/cur path's bounds compare as follows:

  old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
  cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)

  old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
  cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff

The 64 bit bounds generally look okay and while the information that got propagated from 32 to 64 bit looks correct as well, it's not precise enough for judging a conditional jmp32. Given the latter only operates on subregisters we also need to take these into account as well for a range_within() probe in order to be able to prune paths. Extending the range_within() constraint to both bounds will be able to tell us that the old signed 32 bit bounds are not wider than the cur signed 32 bit bounds.

With the fix in place, the program will now verify the 'goto' branch case as it should have been:

  [...]
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit
  7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
   R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1

The bug is quite subtle in the sense that when verifier would determine that a given branch is dead code, it would (here: wrongly) remove these instructions from the program and hard-wire the taken branch for privileged programs instead of the 'goto pc-1' rewrites which will cause hard to debug problems.

Fixes: `3f50f132d8` ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>	2021-02-10 01:31:46 +01:00
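The pruning misjudgment can be checked against the old/cur bounds quoted in this commit message. The sketch below is illustrative only: `struct bounds`, the two comparison helpers, and the `example_old`/`example_cur` constants are hypothetical names, not the kernel's `struct bpf_reg_state` or its `range_within()`. It compares a check that looks only at the 64 bit bounds with one that also covers the 32 bit subregister bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scalar bounds tracked per register. */
struct bounds {
	int64_t smin, smax;
	uint64_t umin, umax;
	int32_t s32_min, s32_max;
	uint32_t u32_min, u32_max;
};

/* "old" is at least as wide as "cur" when only 64 bit bounds are compared. */
static bool within_64(const struct bounds *old, const struct bounds *cur)
{
	return old->umin <= cur->umin && old->umax >= cur->umax &&
	       old->smin <= cur->smin && old->smax >= cur->smax;
}

/* Same probe, extended to the 32 bit subregister bounds. */
static bool within_64_and_32(const struct bounds *old, const struct bounds *cur)
{
	return within_64(old, cur) &&
	       old->u32_min <= cur->u32_min && old->u32_max >= cur->u32_max &&
	       old->s32_min <= cur->s32_min && old->s32_max >= cur->s32_max;
}

/* R0 bounds of the old and cur paths, copied from the commit message. */
static const struct bounds example_old = {
	.smin = INT64_MIN, .smax = INT64_MAX,
	.umin = 0, .umax = UINT64_MAX,
	.s32_min = INT32_MIN, .s32_max = 0x3030,
	.u32_min = 0, .u32_max = UINT32_MAX,
};

static const struct bounds example_cur = {
	.smin = INT64_MIN, .smax = 0x7fffffff7fffffffLL,
	.umin = 0, .umax = 0xffffffff7fffffffULL,
	.s32_min = 0x3031, .s32_max = INT32_MAX,
	.u32_min = 0x3031, .u32_max = 0x7fffffff,
};

/* Self-check: only the extended probe rejects pruning for this pair. */
void check_range_example(void)
{
	assert(within_64(&example_old, &example_cur));
	assert(!within_64_and_32(&example_old, &example_cur));
}
```

With these values the 64 bit comparison accepts the pair, while the extended comparison fails on `s32_max` (0x3030 versus 0x7fffffff), which is the condition that lets the verifier keep exploring the path instead of pruning it.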
Daniel Borkmann	ee114dd64c	bpf: Fix verifier jsgt branch analysis on max bound Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return code for both will tell the caller whether a given conditional jump is taken or not, e.g. 1 means branch will be taken [for the involved registers] and the goto target will be executed, 0 means branch will not be taken and instead we fall-through to the next insn, and last but not least a -1 denotes that it is not known at verification time whether a branch will be taken or not. Now while the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval since the branch will also not be taken for reg->s32_max_value == sval. The jgt branch analysis, for example, gets this right. Fixes: `3f50f132d8` ("bpf: Verifier, do explicit ALU32 bounds tracking") Fixes: `4f7b3e8258` ("bpf: improve verifier branch analysis") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2021-02-10 01:31:45 +01:00
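The three-valued result described in this commit can be sketched for the 32 bit signed greater-than case. This is a simplified illustration with hypothetical function names that take the bounds as plain arguments; it is not the kernel's is_branch32_taken(). A jsgt jump is taken only when the register is strictly greater than sval, so a maximum bound equal to sval already proves the branch is never taken.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if "if w s> sval" is always taken, 0 if it is never taken,
 * and -1 if the outcome is unknown for the range [s32_min, s32_max].
 */
static int jsgt32_taken(int32_t s32_min, int32_t s32_max, int32_t sval)
{
	assert(s32_min <= s32_max);	/* precondition: a non-empty range */
	if (s32_min > sval)
		return 1;
	if (s32_max <= sval)		/* includes s32_max == sval */
		return 0;
	return -1;
}

/* Off-by-one variant described in the commit: "<" instead of "<=". */
static int jsgt32_taken_off_by_one(int32_t s32_min, int32_t s32_max,
				   int32_t sval)
{
	assert(s32_min <= s32_max);
	if (s32_min > sval)
		return 1;
	if (s32_max < sval)
		return 0;
	return -1;
}

/* Self-check on the boundary case s32_max == sval. */
void check_jsgt_example(void)
{
	assert(jsgt32_taken(0, 5, 5) == 0);
	assert(jsgt32_taken_off_by_one(0, 5, 5) == -1);
}
```

For the range [0, 5] and sval = 5, the first helper reports the branch as never taken, while the off-by-one variant reports it as unknown.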

984366 commits