1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

9093 commits

Author SHA1 Message Date
Simona Vetter
b615b9c36c Linux 6.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbeHCQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwfwH/ijnVvDWt0L1mpkE
 oIPmKV+2018CA5ww/Hh+ncToWn/aCmrczHc1SEUOk/SbZnGyXJj/6KiNEK6XpJyu
 Hb90y53D5B9jkEq8WPbSy5RtqCU598gYPeBxkczjj431jer9EsZVzqsKxGRzdAud
 2+Ft/qLiVL8AP5P8IPuU7G8CU6OE0fUL5PyuzMGDtstL3R6lPpG+kf/VrJGV1mp7
 DVZO8hKwIi5Vu+ciaTJv+9PMHzXRnMhLIGabtGIzM8nhmrQx/Kv/PMjiEl/OBkmk
 6PzafEkxVtBKDNK2Qhp+QMTQJATuPccZI8Kn6peZhqoNWYHBqx7d88Q/2iiAGj0z
 skPW5Gs=
 =orf8
 -----END PGP SIGNATURE-----

Merge v6.11-rc7 into drm-next

Thomas needs 5a498d4d06 ("drm/fbdev-dma: Only install deferred I/O
if necessary") in drm-misc, so start the backmerge cascade.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-11 09:18:15 +02:00
Daniel Vetter
4461e9e5c3 Linux 6.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbK2B8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFwkH/10QpUgzIfbFKbF+
 5hwcvaqS5myxWwJ4PjN0eR1qGE6RzVO0Tb24+TVql+7pxu+iWm1kYgC3+/T5xJsP
 ECAszdmPWSco1xaHrh2y3PyCJjaBiqFbIxdjPp7odjDpG9qarbcty8YpWs44u/gd
 RDXzHUuScEShBhEt0ZhvE1pIDL8jJ8JL3yqOMZ+XaDxtJbjaHw4GHp8efxlBWc8N
 jZKIVJi22q5NWG5T0tGtPWwzCm0ewA/JNMTEvE9leoSoAgO85NZ0ivxMC76q/tbj
 BrYk5KnzfhJs4b/n/KtIwWaLTgLyXKGqHMaMq8sbXtp410aUdgnRJO2cl3fI+1vc
 vxQfAfk=
 =RemI
 -----END PGP SIGNATURE-----

Merge v6.11-rc5 into drm-next

amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might
as well catch up with a backmerge and handle them all. Plus both misc
and intel maintainers asked for a backmerge anyway.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-08-27 14:09:45 +02:00
Christophe Leroy
6114139c3b powerpc/vdso: Don't discard rela sections
After building the VDSO, there is a verification that it contains
no dynamic relocation, see commit aff69273af ("vdso: Improve
cmd_vdso_check to check all dynamic relocations").

This verification uses readelf -r and doesn't work if rela sections
are discarded.

Fixes: 8ad57add77 ("powerpc/build: vdso linker warning for orphan sections")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/45c3e6fc76cad05ad2cac0f5b5dfb4fae86dc9d6.1724153239.git.christophe.leroy@csgroup.eu
2024-08-22 22:52:29 +10:00
Christophe Leroy
e7e846dc6c powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL
Booting with CONFIG_DEBUG_VIRTUAL leads to following warning when
passing hugepage reservation on command line:

  Kernel command line: hugepagesz=1g hugepages=1 hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1 noreboot
  HugeTLB: allocating 1 of page size 1.00 GiB failed.  Only allocated 0 hugepages.
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/io.h:948 __alloc_bootmem_huge_page+0xd4/0x284
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc6-00396-g6b0e82791bd0-dirty #936
  Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS
  NIP:  c1020240 LR: c10201d0 CTR: 00000000
  REGS: c13fdd30 TRAP: 0700   Not tainted  (6.10.0-rc6-00396-g6b0e82791bd0-dirty)
  MSR:  00021000 <CE,ME>  CR: 44084288  XER: 20000000

  GPR00: c10201d0 c13fde20 c130b560 e8000000 e8001000 00000000 00000000 c1420000
  GPR08: 00000000 00028001 00000000 00000004 44084282 01066ac0 c0eb7c9c efffe149
  GPR16: c0fc4228 0000005f ffffffff c0eb7d0c c0eb7cc0 c0eb7ce0 ffffffff 00000000
  GPR24: c1441cec efffe153 e8001000 c14240c0 00000000 c1441d64 00000000 e8000000
  NIP [c1020240] __alloc_bootmem_huge_page+0xd4/0x284
  LR [c10201d0] __alloc_bootmem_huge_page+0x64/0x284
  Call Trace:
  [c13fde20] [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 (unreliable)
  [c13fde50] [c10207b8] hugetlb_hstate_alloc_pages+0x8c/0x3e8
  [c13fdeb0] [c1021384] hugepages_setup+0x240/0x2cc
  [c13fdef0] [c1000574] unknown_bootoption+0xfc/0x280
  [c13fdf30] [c0078904] parse_args+0x200/0x4c4
  [c13fdfa0] [c1000d9c] start_kernel+0x238/0x7d0
  [c13fdff0] [c0000434] set_ivor+0x12c/0x168
  Code: 554aa33e 7c042840 3ce0c142 80a7427c 5109a016 50caa016 7c9a2378 7fdcf378 4180000c 7c052040 41810160 7c095040 <0fe00000> 38c00000 40800108 3c60c0eb
  ---[ end trace 0000000000000000 ]---

This is due to virt_addr_valid() using high_memory before it is set.

high_memory is set in mem_init() using max_low_pfn, but max_low_pfn
is available long before, it is set in mem_topology_setup(). So just
like commit daa9ada209 ("powerpc/mm: Fix boot crash with FLATMEM")
moved the setting of max_mapnr immediately after the call to
mem_topology_setup(), the same can be done for high_memory.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/62b69c4baad067093f39e7e60df0fe27a86b8d2a.1723100702.git.christophe.leroy@csgroup.eu
2024-08-12 21:50:20 +10:00
Daniel Vetter
91dae758bd drm-misc-next for v6.12:
UAPI Changes:
 
 virtio:
 - Define DRM capset
 
 Cross-subsystem Changes:
 
 dma-buf:
 - heaps: Clean up documentation
 
 printk:
 - Pass description to kmsg_dump()
 
 Core Changes:
 
 CI:
 - Update IGT tests
 - Point upstream repo to GitLab instance
 
 modesetting:
 - Introduce Power Saving Policy property for connectors
 - Add might_fault() to drm_modeset_lock priming
 - Add dynamic per-crtc vblank configuration support
 
 panic:
 - Avoid build-time interference with framebuffer console
 
 docs:
 - Document Colorspace property
 
 scheduler:
 - Remove full_recover from drm_sched_start
 
 TTM:
 - Make LRU walk restartable after dropping locks
 - Allow direct reclaim to allocate local memory
 
 Driver Changes:
 
 amdgpu:
 - Support Power Saving Policy connector property
 
 ast:
 - astdp: Support AST2600 with VGA; Clean up HPD
 
 bridge:
 - Silence error message on -EPROBE_DEFER
 - analogix: Clean aup
 - bridge-connector: Fix double free
 - lt6505: Disable interrupt when powered off
 - tc358767: Make default DP port preemphasis configurable
 
 gma500:
 - Update i2c terminology
 
 ivpu:
 - Add MODULE_FIRMWARE()
 
 lcdif:
 - Fix pixel clock
 
 loongson:
 - Use GEM refcount over TTM's
 
 mgag200:
 - Improve BMC handling
 - Support VBLANK intterupts
 
 nouveau:
 - Refactor and clean up internals
 - Use GEM refcount over TTM's
 
 panel:
 - Shutdown fixes plus documentation
 - Refactor several drivers for better code sharing
 - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
   DT; Fix porch parameter
 - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
   BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
   CMN N116BCP-EA2, CSW MNB601LS1-4
 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
 - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
 - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
   for code sharing
 
 sti:
 - Fix module owner
 
 stm:
 - Avoid UAF wih managed plane and CRTC helpers
 - Fix module owner
 - Fix error handling in probe
 - Depend on COMMON_CLK
 - ltdc: Fix transparency after disabling plane; Remove unused interrupt
 
 tegra:
 - Call drm_atomic_helper_shutdown()
 
 v3d:
 - Clean up perfmon
 
 vkms:
 - Clean up
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmareygACgkQaA3BHVML
 eiO2vwf9FirbMiq4lfHzgcbNIU1dTUtjRAZjrlwGmqk5cb9lUshAMCMBMOEQBDdg
 XMQQj/RMBvRUuxzsPGk78ObSz5FBaBLgKwFprer0V6uslQaJxj4YRsnkp0l2n+0k
 +ebhfo2rUgZOdgNOkXH326w9UhqiydIa7GaA2aq1vUzXKFDfvGXtSN75BMlEWlKP
 rTft56AiwjwcKu7zYFHGlFUMSNpKAQy7lnV3+dBXAfFNHu4zVNoI/yWGEOdR7eVo
 WhiEcpvismsOh+BfUvMNPP3RKwjXHdwMlJYb+v9XGgH27hqc50lSceWydHtoJTto
 DTXF9WQhJ+/GQR9ZGmBjos9GVbECDA==
 =L/1W
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-08-01' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.12:

UAPI Changes:

virtio:
- Define DRM capset

Cross-subsystem Changes:

dma-buf:
- heaps: Clean up documentation

printk:
- Pass description to kmsg_dump()

Core Changes:

CI:
- Update IGT tests
- Point upstream repo to GitLab instance

modesetting:
- Introduce Power Saving Policy property for connectors
- Add might_fault() to drm_modeset_lock priming
- Add dynamic per-crtc vblank configuration support

panic:
- Avoid build-time interference with framebuffer console

docs:
- Document Colorspace property

scheduler:
- Remove full_recover from drm_sched_start

TTM:
- Make LRU walk restartable after dropping locks
- Allow direct reclaim to allocate local memory

Driver Changes:

amdgpu:
- Support Power Saving Policy connector property

ast:
- astdp: Support AST2600 with VGA; Clean up HPD

bridge:
- Silence error message on -EPROBE_DEFER
- analogix: Clean aup
- bridge-connector: Fix double free
- lt6505: Disable interrupt when powered off
- tc358767: Make default DP port preemphasis configurable

gma500:
- Update i2c terminology

ivpu:
- Add MODULE_FIRMWARE()

lcdif:
- Fix pixel clock

loongson:
- Use GEM refcount over TTM's

mgag200:
- Improve BMC handling
- Support VBLANK intterupts

nouveau:
- Refactor and clean up internals
- Use GEM refcount over TTM's

panel:
- Shutdown fixes plus documentation
- Refactor several drivers for better code sharing
- boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
  DT; Fix porch parameter
- edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
  BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
  CMN N116BCP-EA2, CSW MNB601LS1-4
- himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
- ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
- jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
  for code sharing

sti:
- Fix module owner

stm:
- Avoid UAF wih managed plane and CRTC helpers
- Fix module owner
- Fix error handling in probe
- Depend on COMMON_CLK
- ltdc: Fix transparency after disabling plane; Remove unused interrupt

tegra:
- Call drm_atomic_helper_shutdown()

v3d:
- Clean up perfmon

vkms:
- Clean up

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-08 18:58:46 +02:00
Thomas Zimmermann
0e8655b4e8 Merge drm/drm-next into drm-misc-next
Backmerging to get a late RC of v6.10 before moving into v6.11.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29 09:35:54 +02:00
Linus Torvalds
527eff227d - In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code and
   has taught bcachefs to use the new more generic implementation.
 
 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally more
   rational.
 
 - Kuan-Wei Chiu has sent along some maintenance work against our sorting
   library code in the series "lib/sort: Optimizations and cleanups".
 
 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".
 
 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".
 
 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB
   command error".
 
 - Plus the usual shower of singleton patches all over the place.  Please
   see the relevant changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA
 jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j
 fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg=
 =Stxz
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - In the series "treewide: Refactor heap related implementation",
   Kuan-Wei Chiu has significantly reworked the min_heap library code
   and has taught bcachefs to use the new more generic implementation.

 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally
   more rational.

 - Kuan-Wei Chiu has sent along some maintenance work against our
   sorting library code in the series "lib/sort: Optimizations and
   cleanups".

 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".

 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".

 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
   GDB command error".

 - Plus the usual shower of singleton patches all over the place. Please
   see the relevant changelogs for details.

* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
  ia64: scrub ia64 from poison.h
  watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
  tsacct: replace strncpy() with strscpy()
  lib/bch.c: use swap() to improve code
  test_bpf: convert comma to semicolon
  init/modpost: conditionally check section mismatch to __meminit*
  init: remove unused __MEMINIT* macros
  nilfs2: Constify struct kobj_type
  nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
  math: rational: add missing MODULE_DESCRIPTION() macro
  lib/zlib: add missing MODULE_DESCRIPTION() macro
  fs: ufs: add MODULE_DESCRIPTION()
  lib/rbtree.c: fix the example typo
  ocfs2: add bounds checking to ocfs2_check_dir_entry()
  fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
  coredump: simplify zap_process()
  selftests/fpu: add missing MODULE_DESCRIPTION() macro
  compiler.h: simplify data_race() macro
  build-id: require program headers to be right after ELF header
  resource: add missing MODULE_DESCRIPTION()
  ...
2024-07-21 17:56:22 -07:00
Linus Torvalds
fbc90c042c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainablity thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in vis sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors.  In order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicing.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these paes can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
3c3ff7be97 powerpc updates for 6.11
- Remove support for 40x CPUs & platforms.
 
  - Add support to the 64-bit BPF JIT for cpu v4 instructions.
 
  - Fix PCI hotplug driver crash on powernv.
 
  - Fix doorbell emulation for KVM on PAPR guests (nestedv2).
 
  - Fix KVM nested guest handling of some less used SPRs.
 
  - Online NUMA nodes with no CPU/memory if they have a PCI device attached.
 
  - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels.
 
  - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE.
 
 Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King,
 Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani,
 Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski,
 Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm),
 Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy
 Pearson, Uwe Kleine-König, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaaUNITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDA+D/4o7OZ+SY0plTlMKSy3hW/SRXVj/byA
 CCKdizNY+3Rf/+K7KhuLOUPXhZOemLPE0xfKS3ND4mIEKCswzzXqmi6kjPH0qd8q
 qUhkHbt/LNpNJzZOYYw+usaklMTMdZtAl/jD9WEvGwgu2EYHgrujRIq04kEI1b0e
 OPiRnXOZcfevRBepQmYZKHvFlCRRa5vvsQcvLfY64yFqD0AsKTHgIi/48Dn33pb2
 hqHYyV1tZA3uT86Z1TgF1OG83VOSDsgc19Sb2xn14O9aJJ7lD2TOgVa4P4FfBlXA
 TXYYGQwK31ymGVWGcGfebVdC1ECeTem9n28vlk5I0NO9xNgPok/Ov4DAiZ+u1G0E
 3CXRDx9Uz2yPcGBJI2dpxfp2iw83Ad2DtBzAdukMD36xnC7xfrQz+W9SQfbcPJ8e
 I5SMAstWuLNgrX7YkjAOnXh1N41kht/mdV6KHdcMxPc7jOtAD65gUOZcgwYLeXlT
 Av17Ax0PMbiQ1BpFe2KNr/0T9Ba5k5rN7oDSKncDAq4uX8LcZKHj4bSHT9KroT1C
 q+GERspoCYp2VDMO742Jm7KTmQDHsS5y4Q+iSdOR8cQBXF613FaryDxSoJZhg2pf
 C2zIVED13RGcjIFcWlv73iA6QpBsphM+WWFz7mjULyJhxFQwm6BYt+Wy6jFu84oH
 sOgvPH8YyaK2uA==
 =eHVd
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove support for 40x CPUs & platforms

 - Add support to the 64-bit BPF JIT for cpu v4 instructions

 - Fix PCI hotplug driver crash on powernv

 - Fix doorbell emulation for KVM on PAPR guests (nestedv2)

 - Fix KVM nested guest handling of some less used SPRs

 - Online NUMA nodes with no CPU/memory if they have a PCI device
   attached

 - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels

 - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE

Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian
King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra,
Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna
Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler,
Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat,
Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and
Vaibhav Jain.

* tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits)
  Documentation/powerpc: Mention 40x is removed
  powerpc: Remove 40x leftovers
  macintosh/therm_windtunnel: fix module unload.
  powerpc: Check only single values are passed to CPU/MMU feature checks
  powerpc/xmon: Fix disassembly CPU feature checks
  powerpc: Drop clang workaround for builtin constant checks
  powerpc64/bpf: jit support for signed division and modulo
  powerpc64/bpf: jit support for sign extended mov
  powerpc64/bpf: jit support for sign extended load
  powerpc64/bpf: jit support for unconditional byte swap
  powerpc64/bpf: jit support for 32bit offset jmp instruction
  powerpc/pci: Hotplug driver bridge support
  pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
  powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC
  powerpc: add missing MODULE_DESCRIPTION() macros
  macintosh/mac_hid: add MODULE_DESCRIPTION()
  KVM: PPC: add missing MODULE_DESCRIPTION() macros
  powerpc/kexec: Use of_property_read_reg()
  powerpc/64s/radix/kfence: map __kfence_pool at page granularity
  powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API
  ...
2024-07-19 21:00:33 -07:00
Jocelyn Falempe
e1a261ba59 printk: Add a short description string to kmsg_dump()
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
callback.
This patch adds a new struct kmsg_dump_detail, that will hold the
reason and description, and pass it to the dump() callback.

To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
function and a macro for backward compatibility.

I've written this for drm_panic, but it can be useful for other
kmsg_dumper.
It allows to see the panic reason, like "sysrq triggered crash"
or "VFS: Unable to mount root fs on xxxx" on the drm panic screen.

v2:
 * Use a struct kmsg_dump_detail to hold the reason and description
   pointer, for more flexibility if we want to add other parameters.
   (Kees Cook)
 * Fix powerpc/nvram_64 build, as I didn't update the forward
   declaration of oops_to_nvram()

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Kees Cook <kees@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
2024-07-17 12:35:24 +02:00
Masahiro Yamada
73db3abdca init/modpost: conditionally check section mismatch to __meminit*
This reverts commit eb8f689046 ("Use separate sections for __dev/
_cpu/__mem code/data").

Check section mismatch to __meminit* only when CONFIG_MEMORY_HOTPLUG=n.

With this change, the linker script and modpost become simpler, and we
can get rid of the __ref annotations from the memory hotplug code.

[sfr@canb.auug.org.au: remove MEM_KEEP from arch/powerpc/kernel/vmlinux.lds.S]
  Link: https://lkml.kernel.org/r/20240710093213.2aefb25f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20240706160511.2331061-2-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 16:39:52 -07:00
Christophe Leroy
7c44202e36 powerpc/e500: use contiguous PMD instead of hugepd
e500 supports many page sizes among which the following size are
implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G.

On e500, TLB miss for hugepages is exclusively handled by SW even on e6500
which has HW assistance for 4k pages, so there are no constraints like on
the 8xx.

On e500/32, all are at PGD/PMD level and can be handled as cont-PMD.

On e500/64, smaller ones are on PMD while bigger ones are on PUD.  Again,
they can easily be handled as cont-PMD and cont-PUD instead of hugepd.

On e500/32, use the pagesize bits in PTE to know if it is a PMD or a leaf
entry.  This works because the pagesize bits are in the last 12 bits and
page tables are 4k aligned.

On e500/64, use highest bit which is always 1 on PxD (Because PxD contains
virtual address of a kernel memory) and always 0 on PTEs because not all
bits of RPN are used/possible.

Link: https://lkml.kernel.org/r/dd085987816ed2a0c70adb7e34966cb833fc03e1.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:18 -07:00
Christophe Leroy
dc0aa538a9 powerpc/e500: free r10 for FIND_PTE
Move r13 load after the call to FIND_PTE, and use r13 instead of r10 for
storing fault address.  This will allow using r10 freely in FIND_PTE in
following patch to handle hugepage size.

Link: https://lkml.kernel.org/r/a3ee563ad5b13c891a15d3aae6c136c44ce8aa63.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:18 -07:00
Christophe Leroy
276d5affbb powerpc/e500: don't pre-check write access on data TLB error
Don't pre-check write access on read-only pages on data TLB error.

Load the TLB anyway and take a DSI exception when it happens.  This avoids
reading SPRN_ESR at every data TLB error exception.

Link: https://lkml.kernel.org/r/8525518e1657d6032b7e980c1888102828d66950.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:18 -07:00
Christophe Leroy
6b0e82791b powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)
At the time being when CONFIG_PTE_64BIT is selected, PTE entries are 64
bits but PGD entries are still 32 bits.

In order to allow leaf PMD entries, switch the PGD to 64 bits entries.

Link: https://lkml.kernel.org/r/ca85397df02564e5edc3a3c27b55cf43af3e4ef3.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:17 -07:00
Christophe Leroy
0549e76663 powerpc/8xx: rework support for 8M pages using contiguous PTE entries
In order to fit better with standard Linux page tables layout, add support
for 8M pages using contiguous PTE entries in a standard page table.  Page
tables will then be populated with 1024 similar entries and two PMD
entries will point to that page table.

The PMD entries also get a flag to tell it is addressing an 8M page, this
is required for the HW tablewalk assistance.

Link: https://lkml.kernel.org/r/8693d9a0408371043ca63bf9e4a9c140667af63e.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:17 -07:00
Michael Ellerman
0db46aaabe powerpc/64e: drop unused TLB miss handlers
There are two possibilities for book3e_htw_mode, PPC_HTW_E6500 or
PPC_HTW_NONE.

The TLB miss handlers are patched to use, respectively:
  - exc_[data|indstruction]_tlb_miss_e6500_book3e
  - exc_[data|indstruction]_tlb_miss_bolted_book3e

Which means the default handlers are never used.  Remove those, and use
the bolted handlers (PPC_HTW_NONE) by default.

Link: https://lkml.kernel.org/r/9a670adc1771fb1871fba93ace5372f7eadc286f.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:15 -07:00
Michael Ellerman
aca69900d7 powerpc/64e: drop MMU_FTR_TYPE_FSL_E checks in 64-bit code
All 64-bit Book3E have MMU_FTR_TYPE_FSL_E, since A2 was removed, so remove
checks for it in 64-bit only code.

Link: https://lkml.kernel.org/r/2b0b0bc9752e6cece222e4e2050358da70bb631d.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12 15:52:14 -07:00
Linus Torvalds
c6653f49e4 powerpc fixes for 6.10 #4
- Fix unnecessary copy to 0 when kernel is booted at address 0.
 
  - Fix usercopy crash when dumping dtl via debugfs.
 
  - Avoid possible crash when PCI hotplug races with error handling.
 
  - Fix kexec crash caused by scv being disabled before other CPUs call-in.
 
  - Fix powerpc selftests build with USERCFLAGS set.
 
 Thanks to: Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen, Nicholas
 Piggin, Sourabh Jain, Srikar Dronamraju, Vishal Chourasia.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaJyekTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN6zD/0d8lPrWQ3TRkS+jLdhsDfHc+qMW1/N
 DuxPrVJl4qLgvYPEZWAF5+uWuhJurmbTCXNRnUQ5HHfwPtkU77pbTNiQcCAYsy2l
 W35DYE+vqnNNid9hFCgvLoSrGDA0qvcGpMVBVfqjRygOLxpWztmV7S7q9E0CvuWg
 ESXt4HNyPiRVl4ufPam12lmiEDh+PycsD24U6FSjaTxqvd4kwSTyLDLfmI+gTaqx
 1PdzKt0c3g2QhDBoR7cpRaTCRamKRPwqFHANMUAkIXm3fIdHpWOEF03lvTsA0OgA
 0ktzaEUhCPHr6kjAizbybmgXZovh/eoZc9wUd7zCWdSGNiq8FlhsmFuIuScrbQ7k
 YCYz+X/KoqNk2VbxKkDneO6/H2juzu9wzzK5OMcKsVGSWi7+DjBp9FBDiFCfb3VQ
 ZMuc71dOTtA7fDqWDnYtFMtEwrUGpTixE5xPNBzbzIVkKdSjb1H3RLd/mhu7+X/B
 eVjFOPj7mRburIX5M3UllvsdbOiLqjbg6P28JL3qG6qT/OiiQAmF5apKvf1LNvPV
 xgJHGPemlAkVNihg6Xu8+up+wcPuMi13osjA9FZkLUdLXK4O+d3q/K/Rf7TGjT2X
 rBNhCd3lRd6gmpa52ujm5X0f9czEJxMrfy0Ota3L8YFUb7hW8JK6fgcwSispXGSI
 o/JUlQ30K6JAjA==
 =B1pw
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix unnecessary copy to 0 when kernel is booted at address 0

 - Fix usercopy crash when dumping dtl via debugfs

 - Avoid possible crash when PCI hotplug races with error handling

 - Fix kexec crash caused by scv being disabled before other CPUs
   call-in

 - Fix powerpc selftests build with USERCFLAGS set

Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.

* tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix build with USERCFLAGS set
  powerpc/pseries: Fix scv instruction crash with kexec
  powerpc/eeh: avoid possible crash when edev->pdev changes
  powerpc/pseries: Whitelist dtl slub object for copying to userspace
  powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
2024-07-06 18:31:24 -07:00
Krishna Kumar
20ce0c247b powerpc/pci: Hotplug driver bridge support
There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled
by doing hot-plug operation on that slot. Only the first port of the
bridge gets enabled and the remaining port/devices remain unplugged. The
hot plug/unplug operation is done by the hotplug driver (drivers/pci/
hotplug/pnv_php.c).

This behavior is due to missing code for the switch/bridge. The existing
driver depends on pci_hp_add_devices() function for device enablement.
This function calls pci_scan_slot() on only one device-node/port of the
bridge, not on all the siblings' device-node/port.

The missing code needs to be added which will find all the sibling
device-nodes/bridge-ports and will run explicit pci_scan_slot()
on those. A new function has been added for this purpose
which is invoked from pci_hp_add_devices(). This new function
traverse_siblings_and_scan_slot() gets all the sibling bridge-ports by
traversal and explicitly invokes pci_scan_slot() on them.

Tested-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com>
[mpe: Move the code into pci-hotplug.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240701074513.94873-3-krishnak@linux.ibm.com
2024-07-04 23:10:40 +10:00
Jeff Johnson
9c5f64734f powerpc: add missing MODULE_DESCRIPTION() macros
With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although
they did not produce a warning with the powerpc allmodconfig
configuration, may cause this warning with other configurations.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240615-md-powerpc-arch-powerpc-v1-1-ba4956bea47a@quicinc.com
2024-07-04 22:39:20 +10:00
Nathan Lynch
7bdd1c6c87 powerpc/prom: Add CPU info to hardware description string later
cur_cpu_spec->cpu_name is appended to ppc_hw_desc before cur_cpu_spec
has taken on its final value. This is illustrated on pseries by
comparing the CPU name as reported at boot ("POWER8E (raw)") to the
contents of /proc/cpuinfo ("POWER8 (architected)"):

  $ dmesg | grep Hardware
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 \
    of:IBM,FW860.50 (SV860_146) hv:phyp pSeries

  $ grep -m 1 ^cpu /proc/cpuinfo
  cpu             : POWER8 (architected), altivec supported

Some 44x models would appear to be affected as well; see
identical_pvr_fixup().

This results in incorrect CPU information in stack dumps --
ppc_hw_desc is an input to dump_stack_set_arch_desc().

Delay gathering the CPU name until after all potential calls to
identify_cpu().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: bd649d40e0 ("powerpc: Add PVR & CPU name to hardware description")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240603-fix-cpu-hwdesc-v1-1-945f2850fcaa@linux.ibm.com
2024-06-28 22:31:00 +10:00
Nathan Lynch
0974d03eb4 powerpc/rtas: Prevent Spectre v1 gadget construction in sys_rtas()
Smatch warns:

  arch/powerpc/kernel/rtas.c:1932 __do_sys_rtas() warn: potential
  spectre issue 'args.args' [r] (local cap)

The 'nargs' and 'nret' locals come directly from a user-supplied
buffer and are used as indexes into a small stack-based array and as
inputs to copy_to_user() after they are subject to bounds checks.

Use array_index_nospec() after the bounds checks to clamp these values
for speculative execution.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240530-sys_rtas-nargs-nret-v1-1-129acddd4d89@linux.ibm.com
2024-06-28 22:28:58 +10:00
Michael Ellerman
7bf5f0562b powerpc: Replace CONFIG_4xx with CONFIG_44x
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x.

Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to
just test BOOKE, because 44x always selects BOOKE.

Retain the CONFIG_4xx symbol, as there are drivers that use it to mean
4xx || 44x, those will need updating before CONFIG_4xx can be removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-6-mpe@ellerman.id.au
2024-06-28 22:28:48 +10:00
Michael Ellerman
002b27a51b powerpc/4xx: Remove CONFIG_BOOKE_OR_40x
Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-5-mpe@ellerman.id.au
2024-06-28 22:28:48 +10:00
Christophe Leroy
732b32daef powerpc: Remove core support for 40x
Now that 40x platforms have gone, remove support
for 40x in the core of powerpc arch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au
2024-06-28 22:28:47 +10:00
Shivaprasad G Bhat
f431a8cde7 powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA
windows for PEs, their ownership transfer, create/set/unset the TCE
tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for
support on POWER.

The commit 9d67c94335 ("powerpc/iommu: Add "borrowing"
iommu_table_group_ops") implemented partial support for this API with
"borrow" mechanism wherein the DMA windows if created already by the
host driver, they would be available for VFIO to use. Also, it didn't
have the support to control/modify the window size or the IO page
size.

The current patch implements all the necessary iommu_table_group_ops
APIs there by avoiding the "borrrowing". So, just the way it is on the
PowerNV platform, with this patch the iommu table group ownership is
transferred to the VFIO PPC subdriver, the iommu table, DMA windows
creation/deletion all driven through the APIs.

The pSeries uses the query-pe-dma-window, create-pe-dma-window and
reset-pe-dma-window RTAS calls for DMA window creation, deletion and
reset to defaul. The RTAs calls do show some minor differences to the
way things are to be handled on the pSeries which are listed below.

* On pSeries, the default DMA window size is "fixed" cannot be custom
sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB
and for SRIOV VFs, its variable sized based on the capacity assigned
to it during the VF assignment to the LPAR. So, for the  default DMA
window alone the size if requested less than tce32_size, the smaller
size is enforced using the iommu table->it_size.

* The DMA start address for 32-bit window is 0, and for the 64-bit
window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB
offset. This address is returned at the time of create_table() API call
(even before the window is created), the subsequent set_window() call
actually opens the DMA window. On pSeries, the DMA start address for
32-bit window is known from the 'ibm,dma-window' DT property. However,
the 64-bit window start address is not known until the create-pe-dma
RTAS call is made. So, the create_table() which returns the DMA window
start address actually opens the DMA window and returns the DMA start
address as returned by the Hypervisor for the create-pe-dma RTAS call.

* The reset-pe-dma RTAS call resets the DMA windows and restores the
default DMA window, however it does not clear the TCE table entries
if there are any. In case of ownership transfer from platform domain
which used direct mapping, the patch chooses remove-pe-dma instead of
reset-pe for the 64-bit window intentionally so that the
clear_dma_window() is called.

Other than the DMA window management changes mentioned above, the
patch also brings back the userspace view for the single level TCE
as it existed before commit 090bad39b2 ("powerpc/powernv: Add
indirect levels to it_userspace") along with the relavent
refactoring.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923275958.1397.907964437142542242.stgit@linux.ibm.com
2024-06-28 17:03:40 +10:00
Shivaprasad G Bhat
35146eadcb powerpc/iommu: Move dev_has_iommu_table() to iommu.c
Move function dev_has_iommu_table() to powerpc/kernel/iommu.c
as it is going to be used by machine specific iommu code as
well in subsequent patches.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923274748.1397.6274953248403106679.stgit@linux.ibm.com
2024-06-28 17:03:40 +10:00
Shivaprasad G Bhat
b09c031d94 powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c.
The pSeries specific table_group_ops are sitting in the generic powerpc
file. Move it to where it actually belong(pseries/iommu.c).

The functions are currently defined even for CONFIG_PPC_POWERNV
which are unused on PowerNV.

Only code movement, no functional changes intended.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923269701.1397.15758640002786937132.stgit@linux.ibm.com
2024-06-28 17:03:38 +10:00
Arnd Bergmann
b1e31c134a powerpc: restore some missing spu syscalls
A couple of system calls were inadventently removed from the table during
a bugfix for 32-bit powerpc entry. Restore the original behavior.

Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-25 15:57:26 +02:00
Arnd Bergmann
d3882564a7 syscalls: fix compat_sys_io_pgetevents_time64 usage
Using sys_io_pgetevents() as the entry point for compat mode tasks
works almost correctly, but misses the sign extension for the min_nr
and nr arguments.

This was addressed on parisc by switching to
compat_sys_io_pgetevents_time64() in commit 6431e92fc8 ("parisc:
io_pgetevents_time64() needs compat syscall in 32-bit compat mode"),
as well as by using more sophisticated system call wrappers on x86 and
s390. However, arm64, mips, powerpc, sparc and riscv still have the
same bug.

Change all of them over to use compat_sys_io_pgetevents_time64()
like parisc already does. This was clearly the intention when the
function was originally added, but it got hooked up incorrectly in
the tables.

Cc: stable@vger.kernel.org
Fixes: 48166e6ea4 ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-25 15:57:20 +02:00
Ganesh Goudar
a1216e62d0 powerpc/eeh: avoid possible crash when edev->pdev changes
If a PCI device is removed during eeh_pe_report_edev(), edev->pdev
will change and can cause a crash, hold the PCI rescan/remove lock
while taking a copy of edev->pdev->bus.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240617140240.580453-1-ganeshgr@linux.ibm.com
2024-06-23 11:54:27 +10:00
Jinglin Wen
13fc6c1759 powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
According to the code logic, when the kernel is loaded at address 0, no
copying operation should be performed, but it is currently being done.

This patch fixes the issue where the kernel code was incorrectly
duplicated to address 0 when booting from address 0.

Fixes: b270bebd34 ("powerpc/64s: Run at the kernel virtual address earlier in boot")
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jinglin Wen <jinglin.wen@shingroup.cn>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240620024150.14857-1-jinglin.wen@shingroup.cn
2024-06-23 11:54:13 +10:00
Jeff Xu
ff388fe5c4 mseal: wire up mseal syscall
Patch series "Introduce mseal", v10.

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits.  Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1].  The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it.  The memory
must be marked with the X bit, or else an exception will occur. 
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct).  mseal() additionally protects the
VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.  Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.  A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4].  Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is an syscall on 64 bit CPU, and with following signature:

int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5].  Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications.  For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable, the lifetime of those mappings are tied to the lifetime
of the process.

Chrome wants to seal two large address space reservations that are managed
by different allocators.  The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions).  The lifetime of those mappings are not
tied to the lifetime of the process, therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory. 
For example, with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security
risk.  For example if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow.  Checking write-permission before the discard
operation allows us to control when the operation is valid.  In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.

Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases. 
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables.  To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup.  Once this work is completed, all
applications will be able to automatically benefit from these new
protections.

In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
  destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.

To measure the performance impact of this loop, two tests are developed.
[8]

The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.

The tests have roughly below sequence:
for (i = 0; i < 1000, i++)
    create 1000 mappings (1 page per VMA)
    start the sampling
    for (j = 0; j < 1000, j++)
        mprotect one mapping
    stop and save the sample
    delete 1000 mappings
calculates all samples.

Below tests are performed on Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.

Based on the latest upstream code:
The first test (measuring time)
syscall__	vmas	t	t_mseal	delta_ns	per_vma	%
munmap__  	1	909	944	35	35	104%
munmap__  	2	1398	1502	104	52	107%
munmap__  	4	2444	2594	149	37	106%
munmap__  	8	4029	4323	293	37	107%
munmap__  	16	6647	6935	288	18	104%
munmap__  	32	11811	12398	587	18	105%
mprotect	1	439	465	26	26	106%
mprotect	2	1659	1745	86	43	105%
mprotect	4	3747	3889	142	36	104%
mprotect	8	6755	6969	215	27	103%
mprotect	16	13748	14144	396	25	103%
mprotect	32	27827	28969	1142	36	104%
madvise_	1	240	262	22	22	109%
madvise_	2	366	442	76	38	121%
madvise_	4	623	751	128	32	121%
madvise_	8	1110	1324	215	27	119%
madvise_	16	2127	2451	324	20	115%
madvise_	32	4109	4642	534	17	113%

The second test (measuring cpu cycle)
syscall__	vmas	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	1790	1890	100	100	106%
munmap__	2	2819	3033	214	107	108%
munmap__	4	4959	5271	312	78	106%
munmap__	8	8262	8745	483	60	106%
munmap__	16	13099	14116	1017	64	108%
munmap__	32	23221	24785	1565	49	107%
mprotect	1	906	967	62	62	107%
mprotect	2	3019	3203	184	92	106%
mprotect	4	6149	6569	420	105	107%
mprotect	8	9978	10524	545	68	105%
mprotect	16	20448	21427	979	61	105%
mprotect	32	40972	42935	1963	61	105%
madvise_	1	434	497	63	63	115%
madvise_	2	752	899	147	74	120%
madvise_	4	1313	1513	200	50	115%
madvise_	8	2271	2627	356	44	116%
madvise_	16	4312	4883	571	36	113%
madvise_	32	8376	9319	943	29	111%

Based on the result, for 6.8 kernel, sealing check adds
20-40 nano seconds, or around 50-100 CPU cycles, per VMA.

In addition, I applied the sealing to 5.10 kernel:
The first test (measuring time)
syscall__	vmas	t	tmseal	delta_ns	per_vma	%
munmap__	1	357	390	33	33	109%
munmap__	2	442	463	21	11	105%
munmap__	4	614	634	20	5	103%
munmap__	8	1017	1137	120	15	112%
munmap__	16	1889	2153	263	16	114%
munmap__	32	4109	4088	-21	-1	99%
mprotect	1	235	227	-7	-7	97%
mprotect	2	495	464	-30	-15	94%
mprotect	4	741	764	24	6	103%
mprotect	8	1434	1437	2	0	100%
mprotect	16	2958	2991	33	2	101%
mprotect	32	6431	6608	177	6	103%
madvise_	1	191	208	16	16	109%
madvise_	2	300	324	24	12	108%
madvise_	4	450	473	23	6	105%
madvise_	8	753	806	53	7	107%
madvise_	16	1467	1592	125	8	108%
madvise_	32	2795	3405	610	19	122%
					
The second test (measuring cpu cycle)
syscall__	nbr_vma	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	684	715	31	31	105%
munmap__	2	861	898	38	19	104%
munmap__	4	1183	1235	51	13	104%
munmap__	8	1999	2045	46	6	102%
munmap__	16	3839	3816	-23	-1	99%
munmap__	32	7672	7887	216	7	103%
mprotect	1	397	443	46	46	112%
mprotect	2	738	788	50	25	107%
mprotect	4	1221	1256	35	9	103%
mprotect	8	2356	2429	72	9	103%
mprotect	16	4961	4935	-26	-2	99%
mprotect	32	9882	10172	291	9	103%
madvise_	1	351	380	29	29	108%
madvise_	2	565	615	49	25	109%
madvise_	4	872	933	61	15	107%
madvise_	8	1508	1640	132	16	109%
madvise_	16	3078	3323	245	15	108%
madvise_	32	5893	6704	811	25	114%

For 5.10 kernel, sealing check adds 0-15 ns in time, or 10-30
CPU cycles, there is even decrease in some cases.

It might be interesting to compare 5.10 and 6.8 kernel
The first test (measuring time)
syscall__	vmas	t_5_10	t_6_8	delta_ns	per_vma	%
munmap__	1	357	909	552	552	254%
munmap__	2	442	1398	956	478	316%
munmap__	4	614	2444	1830	458	398%
munmap__	8	1017	4029	3012	377	396%
munmap__	16	1889	6647	4758	297	352%
munmap__	32	4109	11811	7702	241	287%
mprotect	1	235	439	204	204	187%
mprotect	2	495	1659	1164	582	335%
mprotect	4	741	3747	3006	752	506%
mprotect	8	1434	6755	5320	665	471%
mprotect	16	2958	13748	10790	674	465%
mprotect	32	6431	27827	21397	669	433%
madvise_	1	191	240	49	49	125%
madvise_	2	300	366	67	33	122%
madvise_	4	450	623	173	43	138%
madvise_	8	753	1110	357	45	147%
madvise_	16	1467	2127	660	41	145%
madvise_	32	2795	4109	1314	41	147%

The second test (measuring cpu cycle)
syscall__	vmas	cpu_5_10	c_6_8	delta_cpu	per_vma	%
munmap__	1	684	1790	1106	1106	262%
munmap__	2	861	2819	1958	979	327%
munmap__	4	1183	4959	3776	944	419%
munmap__	8	1999	8262	6263	783	413%
munmap__	16	3839	13099	9260	579	341%
munmap__	32	7672	23221	15549	486	303%
mprotect	1	397	906	509	509	228%
mprotect	2	738	3019	2281	1140	409%
mprotect	4	1221	6149	4929	1232	504%
mprotect	8	2356	9978	7622	953	423%
mprotect	16	4961	20448	15487	968	412%
mprotect	32	9882	40972	31091	972	415%
madvise_	1	351	434	82	82	123%
madvise_	2	565	752	186	93	133%
madvise_	4	872	1313	442	110	151%
madvise_	8	1508	2271	763	95	151%
madvise_	16	3078	4312	1234	77	140%
madvise_	32	5893	8376	2483	78	142%

From 5.10 to 6.8
munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.

In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.

When I discuss the mm performance with Brian Makin, an engineer who worked
on performance, it was brought to my attention that such performance
benchmarks, which measuring millions of mm syscall in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service.  Also this is tested using a single HW and ChromeOS, the data
from another HW or distribution might be different.  It might be best to
take this data with a grain of salt.


This patch (of 5):

Wire up mseal syscall for all architectures.

Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Linus Torvalds
3eb3c33c1d asm-generic cleanups for 6.10
These are a few cross-architecture cleanup patches:
 
  - Thomas Zimmermann works on separating fbdev support from the asm/video.h
    contents that may be used by either the old fbdev drivers or the
    newer drm display code.
 
  - Thorsten Blum contributes cleanups for the generic bitops code
    and asm-generic/bug.h
 
  - I remove the orphaned include/asm-generic/page.h header that used to
    included by long-removed mmu-less architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZLvewACgkQYKtH/8kJ
 UicUEQ//b5WVLOVXkFGlQvAaZkagOLEF8xSTnchA7aKrWQ/C6hSwLN6CQU6MAY7j
 Fe54jYQtjwBwpVIj3jn20xiXP/pZbQp9aldkOx4v8YoGnjNF5UWLHm5510DV1ecE
 0LF/2YIH25vIXGY6MVm6sFq+nkDgWZee6fBFNc3GsCu2y0biD1Gob9xH/ngCHjIj
 tw9KS/j6MivPy/9vJ/Ml2YeutV6+pUA9hNmSrbSVlXSWFh3Wq6IZ+j6bNEftqtZY
 xdnYwdVfReOCIayq6hSHhAgIp/uw8JOqLuE2JNwG/9sSF4zp4ZHLvTaMhqEoCpyB
 3kZYd1qQTwV3eL5PyYtRcW03KvbhfZpMPzZT+wbl9SNPUljC2MSVeSFF30Uqatgb
 yUJ9d/vlb1ynu1yQrFfTZ/kK+U0pPByydwLybcMtEIZ6Hrb1h/eRicvHhUx7bKUB
 H9z/FN/TxGY+tPradx2lqm3J1wNu0ox8DUreXjtlJijKIUZQeAkJrGJgr6i6XLBz
 crwgKzuQUClzEjBcoWzuTVUB7v19jaDuHMsaBBu8O9f1g5FnEIJlItqnXf1J0Dno
 rJy68Mxsg4Dzt4YI3lpOJGDDDPhpOTBXfgsjkuru2MrdFMgZQh+DYLl3qOkJ4DJe
 rdiEJb9PygBaGGQnoXO71oOLf5yQuenj+Fg5GIe9AQrci5fXwRQ=
 =riCs
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "These are a few cross-architecture cleanup patches:

   - separate out fbdev support from the asm/video.h contents that may
     be used by either the old fbdev drivers or the newer drm display
     code (Thomas Zimmermann)

   - cleanups for the generic bitops code and asm-generic/bug.h
     (Thorsten Blum)

   - remove the orphaned include/asm-generic/page.h header that used to
     be included by long-removed mmu-less architectures (me)"

* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: Fix name collision with ACPI's video.o
  bug: Improve comment
  asm-generic: remove unused asm-generic/page.h
  arch: Rename fbdev header and source files
  arch: Remove struct fb_info from video helpers
  arch: Select fbdev helpers with CONFIG_VIDEO
  bitops: Change function return types from long to int
2024-05-20 15:18:34 -07:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initializatio code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folis".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests ni the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Linus Torvalds
ff9a79307f Kbuild updates for v6.10
- Avoid 'constexpr', which is a keyword in C23
 
  - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
    'dt_binding_check'
 
  - Fix weak references to avoid GOT entries in position-independent
    code generation
 
  - Convert the last use of 'optional' property in arch/sh/Kconfig
 
  - Remove support for the 'optional' property in Kconfig
 
  - Remove support for Clang's ThinLTO caching, which does not work with
    the .incbin directive
 
  - Change the semantics of $(src) so it always points to the source
    directory, which fixes Makefile inconsistencies between upstream and
    downstream
 
  - Fix 'make tar-pkg' for RISC-V to produce a consistent package
 
  - Provide reasonable default coverage for objtool, sanitizers, and
    profilers
 
  - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
 
  - Remove the last use of tristate choice in drivers/rapidio/Kconfig
 
  - Various cleanups and fixes in Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
 msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
 GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
 inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
 lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
 JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
 Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
 y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
 orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
 ZCSnZ31h7woGfNho
 =D5B/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Avoid 'constexpr', which is a keyword in C23

 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
   'dt_binding_check'

 - Fix weak references to avoid GOT entries in position-independent code
   generation

 - Convert the last use of 'optional' property in arch/sh/Kconfig

 - Remove support for the 'optional' property in Kconfig

 - Remove support for Clang's ThinLTO caching, which does not work with
   the .incbin directive

 - Change the semantics of $(src) so it always points to the source
   directory, which fixes Makefile inconsistencies between upstream and
   downstream

 - Fix 'make tar-pkg' for RISC-V to produce a consistent package

 - Provide reasonable default coverage for objtool, sanitizers, and
   profilers

 - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.

 - Remove the last use of tristate choice in drivers/rapidio/Kconfig

 - Various cleanups and fixes in Kconfig

* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
  kconfig: use sym_get_choice_menu() in sym_check_prop()
  rapidio: remove choice for enumeration
  kconfig: lxdialog: remove initialization with A_NORMAL
  kconfig: m/nconf: merge two item_add_str() calls
  kconfig: m/nconf: remove dead code to display value of bool choice
  kconfig: m/nconf: remove dead code to display children of choice members
  kconfig: gconf: show checkbox for choice correctly
  kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
  Makefile: remove redundant tool coverage variables
  kbuild: provide reasonable defaults for tool coverage
  modules: Drop the .export_symbol section from the final modules
  kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
  kconfig: use sym_get_choice_menu() in conf_write_defconfig()
  kconfig: add sym_get_choice_menu() helper
  kconfig: turn defaults and additional prompt for choice members into error
  kconfig: turn missing prompt for choice members into error
  kconfig: turn conf_choice() into void function
  kconfig: use linked list in sym_set_changed()
  kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
  kconfig: gconf: remove debug code
  ...
2024-05-18 12:39:20 -07:00
Linus Torvalds
70a663205d Probes updates for v6.10:
- tracing/probes: Adding new pseudo-types %pd and %pD support for dumping
   dentry name from 'struct dentry *' and file name from 'struct file *'.
 
 - uprobes: Some performance optimizations have been done.
  . Speed up the BPF uprobe event by delaying the fetching of the uprobe
    event arguments that are not used in BPF.
  . Avoid locking by speculatively checking whether uprobe event is valid.
  . Reduce lock contention by using read/write_lock instead of spinlock for
    uprobe list operation. This improved BPF uprobe benchmark result 43% on
    average.
 
 - rethook: Removes non-fatal warning messages when tracing stack from BPF
   and skip rcu_is_watching() validation in rethook if possible.
 
 - objpool: Optimizing objpool (which is used by kretprobes and fprobe as
   rethook backend storage) by inlining functions and avoid caching nr_cpu_ids
   because it is a const value.
 
 - fprobe: Add entry/exit callbacks types (code cleanup)
 - kprobes: Check ftrace was killed in kprobes if it uses ftrace.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZFUxsbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+fIH/A96/SeC5WRLhXmHfTCM
 IvKUea2n0b0oV/2pVfHqfkCBTICuUZ97Opd9VH9jLtjBOTh0fUOGZ2DNVGdSYfWm
 IIkS5dhuZxHXrSHEVYykwLHI3AOL7Q6Ny9EmOg1CNMidUkPMNtBvppsBYPlFU/B/
 qQJAvOdkVOnNITCaas0+MNgepoVVKdJzdNQ1I4WrGyG8isCZBaCYKo2QcGyheCNN
 y8NXvnVHgmgHQ8nTaeE5AawclFzFnhwHfPQPe1kiyGrx15b8K+VYmaZxPKv33A1a
 KT3TKJ1Ep7s7iWFh2iPVJzIwOXCmSnvNTKfNx/MDuKtO7UVfFwytoMEaekbmv3bG
 VqM=
 =n/mW
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - tracing/probes: Add new pseudo-types %pd and %pD support for dumping
   dentry name from 'struct dentry *' and file name from 'struct file *'

 - uprobes performance optimizations:
    - Speed up the BPF uprobe event by delaying the fetching of the
      uprobe event arguments that are not used in BPF
    - Avoid locking by speculatively checking whether uprobe event is
      valid
    - Reduce lock contention by using read/write_lock instead of
      spinlock for uprobe list operation. This improved BPF uprobe
      benchmark result 43% on average

 - rethook: Remove non-fatal warning messages when tracing stack from
   BPF and skip rcu_is_watching() validation in rethook if possible

 - objpool: Optimize objpool (which is used by kretprobes and fprobe as
   rethook backend storage) by inlining functions and avoid caching
   nr_cpu_ids because it is a const value

 - fprobe: Add entry/exit callbacks types (code cleanup)

 - kprobes: Check ftrace was killed in kprobes if it uses ftrace

* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kprobe/ftrace: bail out if ftrace was killed
  selftests/ftrace: Fix required features for VFS type test case
  objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
  objpool: enable inlining objpool_push() and objpool_pop() operations
  rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
  ftrace: make extra rcu_is_watching() validation check optional
  uprobes: reduce contention on uprobes_tree access
  rethook: Remove warning messages printed for finding return address of a frame.
  fprobe: Add entry/exit callbacks types
  selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
  selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
  Documentation: tracing: add new type '%pd' and '%pD' for kprobe
  tracing/probes: support '%pD' type for print struct file's name
  tracing/probes: support '%pd' type for print struct dentry's name
  uprobes: add speculative lockless system-wide uprobe filter check
  uprobes: prepare uprobe args buffer lazily
  uprobes: encapsulate preparation of uprobe args buffer
2024-05-17 18:29:30 -07:00
Linus Torvalds
ff2632d7d0 powerpc updates for 6.10
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
 
  - Allow per-process DEXCR (Dynamic Execution Control Register) settings via
    prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
 
  - Install powerpc selftests in sub-directories. Note this changes the way
    run_kselftest.sh needs to be invoked for powerpc selftests.
 
  - Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
 
  - Add support for passing additional parameters to the fadump kernel.
 
  - Add support for updating the kdump image on CPU/memory add/remove events.
 
  - Other small features, cleanups and fixes.
 
 Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
 Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
 Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
 Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
 Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
 Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
 Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
 Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
 Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
 Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
 /4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
 aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
 XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
 EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
 Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
 CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
 N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
 Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
 0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
 zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
 kAflTXzsIXvxBA==
 =GoCV
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.

 - Allow per-process DEXCR (Dynamic Execution Control Register) settings
   via prctl, notably NPHIE which controls hashst/hashchk for ROP
   protection.

 - Install powerpc selftests in sub-directories. Note this changes the
   way run_kselftest.sh needs to be invoked for powerpc selftests.

 - Change fadump (Firmware Assisted Dump) to better handle memory
   add/remove.

 - Add support for passing additional parameters to the fadump kernel.

 - Add support for updating the kdump image on CPU/memory add/remove
   events.

 - Other small features, cleanups and fixes.

Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.

* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
  powerpc/fadump: Fix section mismatch warning
  powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
  powerpc/fadump: update documentation about bootargs_append
  powerpc/fadump: pass additional parameters when fadump is active
  powerpc/fadump: setup additional parameters for dump capture kernel
  powerpc/pseries/fadump: add support for multiple boot memory regions
  selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
  KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
  KVM: PPC: Fix documentation for ppc mmu caps
  KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
  KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
  powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
  powerpc/code-patching: Use dedicated memory routines for patching
  powerpc/code-patching: Test patch_instructions() during boot
  powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
  powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
  powerpc: Fix typos
  powerpc/eeh: Fix spelling of the word "auxillary" and update comment
  macintosh/ams: Fix unused variable warning
  powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
  ...
2024-05-17 09:05:46 -07:00
Michael Ellerman
61700f816e powerpc/fadump: Fix section mismatch warning
With some compilers/configs fadump_setup_param_area() isn't inlined into
its caller (which is __init), leading to a section mismatch warning:

  WARNING: modpost: vmlinux: section mismatch in reference:
  fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area)
  -> memblock_phys_alloc_range (section: .init.text)

Fix it by adding an __init annotation.

Fixes: 683eab94da ("powerpc/fadump: setup additional parameters for dump capture kernel")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240515163708.3380c4d1@canb.auug.org.au/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/all/202405140922.oucLOx4Y-lkp@intel.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240516132631.347956-1-mpe@ellerman.id.au
2024-05-16 23:26:44 +10:00
Stephen Brennan
1a7d0890dd kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Link: https://lore.kernel.org/all/20240501162956.229427-1-stephen.s.brennan@oracle.com/

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-16 07:23:30 +09:00
Masahiro Yamada
7f7f6f7ad6 Makefile: remove redundant tool coverage variables
Now Kbuild provides reasonable defaults for objtool, sanitizers, and
profilers.

Remove redundant variables.

Note:

This commit changes the coverage for some objects:

  - include arch/mips/vdso/vdso-image.o into UBSAN, GCOV, KCOV
  - include arch/sparc/vdso/vdso-image-*.o into UBSAN
  - include arch/sparc/vdso/vma.o into UBSAN
  - include arch/x86/entry/vdso/extable.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vdso-image-*.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vdso32-setup.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
  - include arch/x86/entry/vdso/vma.o into GCOV, KCOV
  - include arch/x86/um/vdso/vma.o into KASAN, GCOV, KCOV

I believe these are positive effects because all of them are kernel
space objects.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Roberto Sassu <roberto.sassu@huawei.com>
2024-05-14 23:35:48 +09:00
Mike Rapoport (IBM)
0a956d52e6 powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate
There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14 00:31:44 -07:00
Mike Rapoport (IBM)
0cc2dc4902 arch: make execmem setup available regardless of CONFIG_MODULES
execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14 00:31:44 -07:00
Mike Rapoport (IBM)
1b750c2fbf powerpc: extend execmem_params for kprobes allocations
powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14 00:31:43 -07:00
Mike Rapoport (IBM)
223b5e57d0 mm/execmem, arch: convert remaining overrides of module_alloc to execmem
Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Tested-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14 00:31:43 -07:00
Mike Rapoport (IBM)
12af2b83d0 mm: introduce execmem_alloc() and execmem_free()
module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14 00:31:43 -07:00
Linus Torvalds
6e5a0c30b6 Scheduler changes for v6.10:
- Add cpufreq pressure feedback for the scheduler
 
  - Rework misfit load-balancing wrt. affinity restrictions
 
  - Clean up and simplify the code around ::overutilized and
    ::overload access.
 
  - Simplify sched_balance_newidle()
 
  - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
    handling that changed the output.
 
  - Rework & clean up <asm/vtime.h> interactions wrt. arch_vtime_task_switch()
 
  - Reorganize, clean up and unify most of the higher level
    scheduler balancing function names around the sched_balance_*()
    prefix.
 
  - Simplify the balancing flag code (sched_balance_running)
 
  - Miscellaneous cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBtA0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gQEw//WiCiV7zTlWShSiG/g8GTfoAvl53QTWXF
 0jQ8TUcoIhxB5VeGgxVG1srYt8f505UXjH7L0MJLrbC3nOgRCg4NK57WiQEachKK
 HORIJHT0tMMsKIwX9D5Ovo4xYJn+j7mv7j/caB+hIlzZAbWk+zZPNWcS84p0ZS/4
 appY6RIcp7+cI7bisNMGUuNZS14+WMdWoX3TgoI6ekgDZ7Ky+kQvkwGEMBXsNElO
 qZOj6yS/QUE4Htwz0tVfd6h5svoPM/VJMIvl0yfddPGurfNw6jEh/fjcXnLdAzZ6
 9mgcosETncQbm0vfSac116lrrZIR9ygXW/yXP5S7I5dt+r+5pCrBZR2E5g7U4Ezp
 GjX1+6J9U6r6y12AMLRjadFOcDvxdwtszhZq4/wAcmS3B9dvupnH/w7zqY9ho3wr
 hTdtDHoAIzxJh7RNEHgeUC0/yQX3wJ9THzfYltDRIIjHTuvl4d5lHgsug+4Y9ClE
 pUIQm/XKouweQN9TZz2ULle4ZhRrR9sM9QfZYfirJ/RppmuKool4riWyQFQNHLCy
 mBRMjFFsTpFIOoZXU6pD4EabOpWdNrRRuND/0yg3WbDat2gBWq6jvSFv2UN1/v7i
 Un5jijTuN7t8yP5lY5Tyf47kQfLlA9bUx1v56KnF9mrpI87FyiDD3MiQVhDsvpGX
 rP96BIOrkSo=
 =obph
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level
   scheduler balancing function names around the sched_balance_*()
   prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include <asm/vtime.h> header
  s390/irq,nmi: Include <asm/vtime.h> header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...
2024-05-13 17:18:51 -07:00
Hari Bathini
3416c9daa6 powerpc/fadump: pass additional parameters when fadump is active
Append the additional parameters passed/set in the dedicated parameter
area (RTAS_FADUMP_PARAM_AREA) to bootargs in fadump capture kernel.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240509115755.519982-4-hbathini@linux.ibm.com
2024-05-10 16:36:10 +10:00
Hari Bathini
683eab94da powerpc/fadump: setup additional parameters for dump capture kernel
For fadump case, passing additional parameters to dump capture kernel
helps in minimizing the memory footprint for it and also provides the
flexibility to disable components/modules, like hugepages, that are
hindering the boot process of the special dump capture environment.

Set up a dedicated parameter area to be passed to the capture kernel.
This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute
'/sys/kernel/fadump/bootargs_append' is exported to the userspace to
specify the additional parameters to be passed to the capture kernel

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240509115755.519982-3-hbathini@linux.ibm.com
2024-05-10 16:36:10 +10:00