-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A
Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/
R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C
k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo
REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M
6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD
ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3
pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7
mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO
AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl
OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK
CSeqSUK9XmXxFNA=
=xjoS
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Skip E820 checks for MCFG ECAM regions for new (2016+) machines,
since there's no requirement to describe them in E820 and some
platforms require ECAM to work (Bjorn Helgaas)
- Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien
Le Moal)
- Remove last user and pci_enable_device_io() (Heiner Kallweit)
- Wait for Link Training==0 to avoid possible race (Ilpo Järvinen)
- Skip waiting for devices that have been disconnected while
suspended (Ilpo Järvinen)
- Clear Secondary Status errors after enumeration since Master Aborts
and Unsupported Request errors are an expected part of enumeration
(Vidya Sagar)
MSI:
- Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas)
Error handling:
- Mask Genesys GL975x SD host controller Replay Timer Timeout
correctable errors caused by a hardware defect; the errors cause
interrupts that prevent system suspend (Kai-Heng Feng)
- Fix EDR-related _DSM support, which previously evaluated revision 5
but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan)
ASPM:
- Simplify link state definitions and mask calculation (Ilpo
Järvinen)
Power management:
- Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS
apparently doesn't know how to put them back in D0 (Mario
Limonciello)
CXL:
- Support resetting CXL devices; special handling required because
CXL Ports mask Secondary Bus Reset by default (Dave Jiang)
DOE:
- Support DOE Discovery Version 2 (Alexey Kardashevskiy)
Endpoint framework:
- Set endpoint BAR to be 64-bit if the driver says that's all the
device supports, in addition to doing so if the size is >2GB
(Niklas Cassel)
- Simplify endpoint BAR allocation and setting interfaces (Niklas
Cassel)
Cadence PCIe controller driver:
- Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof
Kozlowski)
Cadence PCIe endpoint driver:
- Configure endpoint BARs to be 64-bit based on the BAR type, not the
BAR value (Niklas Cassel)
Freescale Layerscape PCIe controller driver:
- Convert DT binding to YAML (Frank Li)
MediaTek MT7621 PCIe controller driver:
- Add DT binding missing 'reg' property for child Root Ports
(Krzysztof Kozlowski)
- Fix theoretical string truncation in PHY name (Sergio Paracuellos)
NVIDIA Tegra194 PCIe controller driver:
- Return success for endpoint probe instead of falling through to the
failure path (Vidya Sagar)
Renesas R-Car PCIe controller driver:
- Add DT binding missing IOMMU properties (Geert Uytterhoeven)
- Add DT binding R-Car V4H compatible for host and endpoint mode
(Yoshihiro Shimoda)
Rockchip PCIe controller driver:
- Configure endpoint BARs to be 64-bit based on the BAR type, not the
BAR value (Niklas Cassel)
- Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski)
- Set the Subsystem Vendor ID, which was previously zero because it
was masked incorrectly (Rick Wertenbroek)
Synopsys DesignWare PCIe controller driver:
- Restructure DBI register access to accommodate devices where this
requires Refclk to be active (Manivannan Sadhasivam)
- Remove the deinit() callback, which was only need by the
pcie-rcar-gen4, and do it directly in that driver (Manivannan
Sadhasivam)
- Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean
up things like eDMA (Manivannan Sadhasivam)
- Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel
to dw_pcie_ep_init() (Manivannan Sadhasivam)
- Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to
reflect the actual functionality (Manivannan Sadhasivam)
- Call dw_pcie_ep_init_registers() directly from all the glue
drivers, not just those that require active Refclk from the host
(Manivannan Sadhasivam)
- Remove the "core_init_notifier" flag, which was an obscure way for
glue drivers to indicate that they depend on Refclk from the host
(Manivannan Sadhasivam)
TI J721E PCIe driver:
- Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli)
- Add DT binding J722S SoC support (Siddharth Vadapalli)
TI Keystone PCIe controller driver:
- Add DT binding missing num-viewport, phys and phy-name properties
(Jan Kiszka)
Miscellaneous:
- Constify and annotate with __ro_after_init (Heiner Kallweit)
- Convert DT bindings to YAML (Krzysztof Kozlowski)
- Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming
Zhou)"
* tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
PCI: Do not wait for disconnected devices when resuming
x86/pci: Skip early E820 check for ECAM region
PCI: Remove unused pci_enable_device_io()
ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
PCI: Update pci_find_capability() stub return types
PCI: Remove PCI_IRQ_LEGACY
scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios
Revert "genirq/msi: Provide constants for PCI/IMS support"
Revert "x86/apic/msi: Enable PCI/IMS"
Revert "iommu/vt-d: Enable PCI/IMS"
Revert "iommu/amd: Enable PCI/IMS"
Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
...
Including:
- Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()
- Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring
- ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback
- ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from the
STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic
- AMD-Vi:
- Final part of SVA changes with generic IO page fault handling
- Renesas IPMMU:
- Add support for R8A779H0 hardware
- A couple smaller fixes and updates across the sub-tree
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmZHJMkACgkQK/BELZcB
GuND1Q/+M4RN5jM66XCfhqoP8QaI8I7zDlPDd14ismx0bjtOZhoiXpptKkAA8guo
7mS57MLqBw/hKYucm1mw+F1qi1HnRWSstKXiCPmzDm3UXYgZJlKkrOw6vydFeHJH
zx2ei7TmBrc0SrsybWK3NWRfVBBkO8enGZTmti0DfHL/rOFcUM0LHegY51GcDaaH
SlDr+LLDMeGynSQWhRlVNJVmEI5gpVPitY/mDUpVPoELiW9C0WGk8kPlR11z2pCR
eUNiqGJUcGasOhmfiYnpJR462eg7J41glquu+YHj8ivPbbu3C4wxgruY/tR4dmJG
8s6AMAWR53JzG2SrCCwtzyRPSXmKfvixF+VKmlB2Ksc7VAn1xA0DYnY5Tx99EtXu
qcEaR4SICMti0urmBGo/cGFdXi2TB1ccXqwoRtp1N3KiYnnOaQdLNO9qZdl9uUTI
uleXACzkCVSssSpBfGjFcPyHU4r3WjMfX0f5ZJPpFMoQmvwV1yeMX7xTEZz4Sxew
cHfBt9FAW9+4mBMTQfokBt0hZ6jwKcYl/z3Xi2oD+Ik/Qrzx5kcLA8LZLEVRXIBa
SZh2ASazq/dr8YoZ744VRmlmi+nISAIHbbQMeqQEQgYQh0HpwS9g5HtpsBzNP6aB
91RHqZSccb/zNdi8e+RH79Y7pX/G5QcuVKcW6KQUBcAAb6hAgOg=
=JUzp
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()
Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring
ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback
ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from
the STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic
AMD-Vi:
- Final part of SVA changes with generic IO page fault handling
Renesas IPMMU:
- Add support for R8A779H0 hardware
... and a couple smaller fixes and updates across the sub-tree"
* tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits)
iommu/arm-smmu-v3: Make the kunit into a module
arm64: Properly clean up iommu-dma remnants
iommu/amd: Enable Guest Translation after reading IOMMU feature register
iommu/vt-d: Decouple igfx_off from graphic identity mapping
iommu/amd: Fix compilation error
iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
iommu/arm-smmu-v3: Move the CD generation for SVA into a function
iommu/arm-smmu-v3: Allocate the CD table entry in advance
iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
iommu/arm-smmu-v3: Consolidate clearing a CD table entry
iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
iommu/arm-smmu-v3: Add an ops indirection to the STE code
iommu/arm-smmu-qcom: Don't build debug features as a kernel module
iommu/amd: Add SVA domain support
iommu: Add ops->domain_alloc_sva()
iommu/amd: Initial SVA support for AMD IOMMU
iommu/amd: Add support for enable/disable IOPF
iommu/amd: Add IO page fault notifier handler
...
This reverts commit 810531a1af.
IMS (Interrupt Message Store) support appeared in v6.2, but there are no
users yet.
Remove it for now. We can add it back when a user comes along.
Link: https://lore.kernel.org/r/20240410221307.2162676-6-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
With posted MSI feature enabled on the CPU side, iommu interrupt
remapping table entries (IRTEs) for device MSI/x can be allocated,
activated, and programed in posted mode. This means that IRTEs are
linked with their respective PIDs of the target CPU.
Handlers for the posted MSI notification vector will de-multiplex
device MSI handlers. CPU notifications are coalesced if interrupts
arrive at a high frequency.
Posted interrupts are only used for device MSI and not for legacy devices
(IO/APIC, HPET).
Introduce a new irq_chip for posted MSIs, which has a dummy irq_ack()
callback as EOI is performed in the notification handler once.
When posted MSI is enabled, MSI domain/chip hierarchy will look like
this example:
domain: IR-PCI-MSIX-0000:50:00.0-12
hwirq: 0x29
chip: IR-PCI-MSIX-0000:50:00.0
flags: 0x430
IRQCHIP_SKIP_SET_WAKE
IRQCHIP_ONESHOT_SAFE
parent:
domain: INTEL-IR-10-13
hwirq: 0x2d0000
chip: INTEL-IR-POST
flags: 0x0
parent:
domain: VECTOR
hwirq: 0x77
chip: APIC
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423174114.526704-13-jacob.jun.pan@linux.intel.com
In order to improve observability and accountability of IOMMU layer, we
must account the number of pages that are allocated by functions that
are calling directly into buddy allocator.
This is achieved by first wrapping the allocation related functions into a
separate inline functions in new file:
drivers/iommu/iommu-pages.h
Convert all page allocation calls under iommu/intel to use these new
functions.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240413002522.1101315-2-pasha.tatashin@soleen.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read
exactly once. Fold the constant in uv_program_mmr() and drop the field.
Searching for the origin of the stale HyperV comment reveals commit
a31e58e129 ("x86/apic: Switch all APICs to Fixed delivery mode") which
notes:
As a consequence of this change, the apic::irq_delivery_mode field is
now pointless, but this needs to be cleaned up in a separate patch.
6 years is long enough for this technical debt to have survived.
[ bp: Fold in
https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com
]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for
replacing the vector cleanup IPI with a timer callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20230621171248.6805-2-xin3.li@intel.com
There is no need to use GFP_ATOMIC here. GFP_KERNEL is already used for
some other memory allocations just a few lines above.
Commit e3a981d61d ("iommu/vt-d: Convert allocations to GFP_KERNEL") has
changed the other memory allocation flags.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e2a8a1019ffc8a86b4b4ed93def3623f60581274.1675542576.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2 #468 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
at: __irq_domain_alloc_irqs+0x3b/0xa0
but task is already holding lock:
ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
at: intel_iommu_init+0x58e/0x880
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dmar_global_lock){++++}-{3:3}:
lock_acquire+0xd6/0x320
down_read+0x42/0x180
intel_irq_remapping_alloc+0xad/0x750
mp_irqdomain_alloc+0xb8/0x2b0
irq_domain_alloc_irqs_locked+0x12f/0x2d0
__irq_domain_alloc_irqs+0x56/0xa0
alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
mp_map_pin_to_irq+0x1dc/0x330
setup_IO_APIC+0x128/0x210
apic_intr_mode_init+0x67/0x110
x86_late_time_init+0x24/0x40
start_kernel+0x41e/0x7e0
secondary_startup_64_no_verify+0xe0/0xeb
-> #0 (&domain->mutex){+.+.}-{3:3}:
check_prevs_add+0x160/0xef0
__lock_acquire+0x147d/0x1950
lock_acquire+0xd6/0x320
__mutex_lock+0x9c/0xfc0
__irq_domain_alloc_irqs+0x3b/0xa0
dmar_alloc_hwirq+0x9e/0x120
iommu_pmu_register+0x11d/0x200
intel_iommu_init+0x5de/0x880
pci_iommu_init+0x12/0x40
do_one_initcall+0x65/0x350
kernel_init_freeable+0x3ca/0x610
kernel_init+0x1a/0x140
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(dmar_global_lock);
lock(&domain->mutex);
lock(dmar_global_lock);
lock(&domain->mutex);
*** DEADLOCK ***
Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On x86 platforms when the HW can support interrupt remapping the iommu
driver creates an irq_domain for the IR hardware and creates a child MSI
irq_domain.
When the global irq_remapping_enabled is set, the IR MSI domain is
assigned to the PCI devices (by intel_irq_remap_add_device(), or
amd_iommu_set_pci_msi_domain()) making those devices have the isolated MSI
property.
Due to how interrupt domains work, setting IRQ_DOMAIN_FLAG_ISOLATED_MSI on
the parent IR domain will cause all struct devices attached to it to
return true from msi_device_has_isolated_msi(). This replaces the
IOMMU_CAP_INTR_REMAP flag as all places using IOMMU_CAP_INTR_REMAP also
call msi_device_has_isolated_msi()
Set the flag and delete the cap.
Link: https://lore.kernel.org/r/7-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* Randomize the per-cpu entry areas
Cleanups:
* Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
coding it
* Move to "native" set_memory_rox() helper
* Clean up pmd_get_atomic() and i386-PAE
* Remove some unused page table size macros
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
+PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
=wwVF
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Dave Hansen:
"New Feature:
- Randomize the per-cpu entry areas
Cleanups:
- Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it
- Move to "native" set_memory_rox() helper
- Clean up pmd_get_atomic() and i386-PAE
- Remove some unused page table size macros"
* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/mm: Ensure forced page table splitting
x86/kasan: Populate shadow for shared chunk of the CPU entry area
x86/kasan: Add helpers to align shadow addresses up and down
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
x86/mm: Recompute physical address for every page of per-CPU CEA mapping
x86/mm: Rename __change_page_attr_set_clr(.checkalias)
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
x86/mm: Add a few comments
x86/mm: Fix CR3_ADDR_MASK
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
mm: Convert __HAVE_ARCH_P..P_GET to the new style
mm: Remove pointless barrier() after pmdp_get_lockless()
x86/mm/pae: Get rid of set_64bit()
x86_64: Remove pointless set_64bit() usage
x86/mm/pae: Be consistent with pXXp_get_and_clear()
x86/mm/pae: Use WRITE_ONCE()
x86/mm/pae: Don't (ab)use atomic64
mm/gup: Fix the lockless PMD access
...
The use of set_64bit() in X86_64 only code is pretty pointless, seeing
how it's a direct assignment. Remove all this nonsense.
[nathanchance: unbreak irte]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.
Virtualized IOMMUs need additional support, e.g. for PASID.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.081482253@linutronix.de
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
Enable MSI parent domain support in the x86 vector domain and fixup the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.
The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.
None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are
affected either. They still work the same way both at the low level and the
PCI/MSI implementations they provide.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
Now that the PCI/MSI core code does early checking for multi-MSI support
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.
Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
Some VT-d hardware implementations invalidate all interrupt remapping
hardware translation caches as part of SIRTP flow. The VT-d spec adds
a ESIRTPS (Enhanced Set Interrupt Remap Table Pointer Support, section
11.4.2 in VT-d spec) capability bit to indicate this.
The spec also states in 11.4.4 that hardware also performs global
invalidation on all interrupt remapping caches as part of Interrupt
Remapping Disable operation if ESIRTPS capability bit is set.
This checks the ESIRTPS capability bit and skip software global cache
invalidation if it's set.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220921065741.3572495-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This header file is private to the Intel IOMMU driver. Move it to the
driver folder.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After commit e3beca48a4 ("irqdomain/treewide: Keep firmware node
unconditionally allocated"). For tear down scenario, fn is only freed
after fail to allocate ir_domain, though it also should be freed in case
dmar_enable_qi returns error.
Besides free fn, irq_domain and ir_msi_domain need to be removed as well
if intel_setup_irq_remapping fails to enable queued invalidation.
Improve the rewinding path by add out_free_ir_domain and out_free_fwnode
lables per Baolu's suggestion.
Fixes: e3beca48a4 ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20220119063640.16864-1-guoqing.jiang@linux.dev
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220128031002.2219155-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VMD retransmits child device MSI-X with the VMD endpoint's requester-id.
In order to support direct interrupt remapping of VMD child devices,
ensure that the IRTE is programmed with the VMD endpoint's requester-id
using pci_real_dma_dev().
Link: https://lore.kernel.org/r/20210210161315.316097-2-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Audit IOMMU Capability/Extended Capability and check if the IOMMUs have
the consistent value for features. Report out or scale to the lowest
supported when IOMMU features have incompatibility among IOMMUs.
Report out features when below features are mismatched:
- First Level 5 Level Paging Support (FL5LP)
- First Level 1 GByte Page Support (FL1GP)
- Read Draining (DRD)
- Write Draining (DWD)
- Page Selective Invalidation (PSI)
- Zero Length Read (ZLR)
- Caching Mode (CM)
- Protected High/Low-Memory Region (PHMR/PLMR)
- Required Write-Buffer Flushing (RWBF)
- Advanced Fault Logging (AFL)
- RID-PASID Support (RPS)
- Scalable Mode Page Walk Coherency (SMPWC)
- First Level Translation Support (FLTS)
- Second Level Translation Support (SLTS)
- No Write Flag Support (NWFS)
- Second Level Accessed/Dirty Support (SLADS)
- Virtual Command Support (VCS)
- Scalable Mode Translation Support (SMTS)
- Device TLB Invalidation Throttle (DIT)
- Page Drain Support (PDS)
- Process Address Space ID Support (PASID)
- Extended Accessed Flag Support (EAFS)
- Supervisor Request Support (SRS)
- Execute Request Support (ERS)
- Page Request Support (PRS)
- Nested Translation Support (NEST)
- Snoop Control (SC)
- Pass Through (PT)
- Device TLB Support (DT)
- Queued Invalidation (QI)
- Page walk Coherency (C)
Set capability to the lowest supported when below features are mismatched:
- Maximum Address Mask Value (MAMV)
- Number of Fault Recording Registers (NFR)
- Second Level Large Page Support (SLLPS)
- Fault Recording Offset (FRO)
- Maximum Guest Address Width (MGAW)
- Supported Adjusted Guest Address Width (SAGAW)
- Number of Domains supported (NDOMS)
- Pasid Size Supported (PSS)
- Maximum Handle Mask Value (MHMV)
- IOTLB Register Offset (IRO)
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210130184452.31711-1-kyung.min.park@intel.com
Link: https://lore.kernel.org/r/20210204014401.2846425-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When irq_domain_get_irq_data() or irqd_cfg() fails
at i == 0, data allocated by kzalloc() has not been
freed before returning, which leads to memleak.
Fixes: b106ee63ab ("irq_remapping/vt-d: Enhance Intel IR driver to support hierarchical irqdomains")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210105051837.32118-1-dinghao.liu@zju.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>
Now that the old get_irq_domain() method has gone, consolidate on just the
map_XXX_to_iommu() functions.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-31-dwmw2@infradead.org
The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.
Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.
The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.
[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
bitfields and added commentry to explain the mapping magic ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-22-dwmw2@infradead.org
Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.
Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.
[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
register access into the same union and eliminating a few more
places where bits were accessed through masks and shifts.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-21-dwmw2@infradead.org
'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consequently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.
Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-20-dwmw2@infradead.org
Use the bitfields in the x86 shadow struct to compose the MSI message.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-14-dwmw2@infradead.org
apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.
Make it a boolean and rename it to 'dest_mode_logical'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-9-dwmw2@infradead.org
The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.
Rename then enum and the constants so they actually make sense.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-6-dwmw2@infradead.org
Now that the domain can be retrieved through device::msi_domain the domain
search for PCI_MSI[X] is not longer required. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112334.305699301@linutronix.de
As a first step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.
This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irqdomain of devices which
are handled by a regular PCI/MSI irq domain which protects PCI devices
behind special busses like VMD which have their own irq domain.
No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interupt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112333.714566121@linutronix.de
Convert the interrupt remap drivers to retrieve the pci device from the msi
descriptor and use info::hwirq.
This is the first step to prepare x86 for using the generic MSI domain ops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.466405395@linutronix.de
Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.054367732@linutronix.de
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.741909337@linutronix.de
The irq domain request mode is now indicated in irq_alloc_info::type.
Consolidate the two getter functions into one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.530546013@linutronix.de
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_irq_domain() is for retrieving
the actual device domain for allocating interrupts for a device.
The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.
Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.436350257@linutronix.de
The VT-d spec requires (10.4.4 Global Command Register, GCMD_REG General
Description) that:
If multiple control fields in this register need to be modified, software
must serialize the modifications through multiple writes to this register.
However, in irq_remapping.c, modifications of IRE and CFI are done in one
write. We need to do two separate writes with STS checking after each. It
also checks the status register before writing command register to avoid
unnecessary register write.
Fixes: af8d102f99 ("x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20200828000615.8281-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Untangle the header spaghetti which causes build failures in various
situations caused by the lockdep additions to seqcount to validate that
the write side critical sections are non-preemptible.
- The seqcount associated lock debug addons which were blocked by the
above fallout.
seqcount writers contrary to seqlock writers must be externally
serialized, which usually happens via locking - except for strict per
CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
validate that the lock is held.
This new debug mechanism adds the concept of associated locks.
sequence count has now lock type variants and corresponding
initializers which take a pointer to the associated lock used for
writer serialization. If lockdep is enabled the pointer is stored and
write_seqcount_begin() has a lockdep assertion to validate that the
lock is held.
Aside of the type and the initializer no other code changes are
required at the seqcount usage sites. The rest of the seqcount API is
unchanged and determines the type at compile time with the help of
_Generic which is possible now that the minimal GCC version has been
moved up.
Adding this lockdep coverage unearthed a handful of seqcount bugs which
have been addressed already independent of this.
While generaly useful this comes with a Trojan Horse twist: On RT
kernels the write side critical section can become preemtible if the
writers are serialized by an associated lock, which leads to the well
known reader preempts writer livelock. RT prevents this by storing the
associated lock pointer independent of lockdep in the seqcount and
changing the reader side to block on the lock when a reader detects
that a writer is in the write side critical section.
- Conversion of seqcount usage sites to associated types and initializers.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
81ppn2KlvzUK8g==
=7Gj+
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
"A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in
various situations caused by the lockdep additions to seqcount to
validate that the write side critical sections are non-preemptible.
- The seqcount associated lock debug addons which were blocked by the
above fallout.
seqcount writers contrary to seqlock writers must be externally
serialized, which usually happens via locking - except for strict
per CPU seqcounts. As the lock is not part of the seqcount, lockdep
cannot validate that the lock is held.
This new debug mechanism adds the concept of associated locks.
sequence count has now lock type variants and corresponding
initializers which take a pointer to the associated lock used for
writer serialization. If lockdep is enabled the pointer is stored
and write_seqcount_begin() has a lockdep assertion to validate that
the lock is held.
Aside of the type and the initializer no other code changes are
required at the seqcount usage sites. The rest of the seqcount API
is unchanged and determines the type at compile time with the help
of _Generic which is possible now that the minimal GCC version has
been moved up.
Adding this lockdep coverage unearthed a handful of seqcount bugs
which have been addressed already independent of this.
While generally useful this comes with a Trojan Horse twist: On RT
kernels the write side critical section can become preemtible if
the writers are serialized by an associated lock, which leads to
the well known reader preempts writer livelock. RT prevents this by
storing the associated lock pointer independent of lockdep in the
seqcount and changing the reader side to block on the lock when a
reader detects that a writer is in the write side critical section.
- Conversion of seqcount usage sites to associated types and
initializers"
* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/seqlock, headers: Untangle the spaghetti monster
locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
x86/headers: Remove APIC headers from <asm/smp.h>
seqcount: More consistent seqprop names
seqcount: Compress SEQCNT_LOCKNAME_ZERO()
seqlock: Fold seqcount_LOCKNAME_init() definition
seqlock: Fold seqcount_LOCKNAME_t definition
seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
hrtimer: Use sequence counter with associated raw spinlock
kvm/eventfd: Use sequence counter with associated spinlock
userfaultfd: Use sequence counter with associated spinlock
NFSv4: Use sequence counter with associated spinlock
iocost: Use sequence counter with associated spinlock
raid5: Use sequence counter with associated spinlock
vfs: Use sequence counter with associated spinlock
timekeeping: Use sequence counter with associated raw spinlock
xfrm: policy: Use sequence counters with associated lock
netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
netfilter: conntrack: Use sequence counter with associated spinlock
sched: tasks: Use sequence counter with associated spinlock
...
The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.
Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 711419e504 ("irqdomain: Add the missing assignment of
domain->fwnode for named fwnode") unintentionally caused a dangling pointer
page fault issue on firmware nodes that were freed after IRQ domain
allocation. Commit e3beca48a4 fixed that dangling pointer issue by only
freeing the firmware node after an IRQ domain allocation failure. That fix
no longer frees the firmware node immediately, but leaves the firmware node
allocated after the domain is removed.
The firmware node must be kept around through irq_domain_remove, but should be
freed it afterwards.
Add the missing free operations after domain removal where where appropriate.
Fixes: e3beca48a4 ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in irqdomain for other reasons and missed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous itself, but in
case that the domain is destroyed later on this leads to a double free.
Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.
Fixes: 711419e504 ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de