linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Sean Christopherson	e4553b4976	KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended. Fixes: `37486135d3` ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-23 06:01:29 -04:00
Marcelo Tosatti	26769f96e6	KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Without TSC scaling, the current kernel interface will either return an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode. This change enables the following TSC tolerance check to accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default). /* * Compute the variation in TSC rate which is acceptable * within the range of tolerance and decide if the * rate being applied is within that bounds of the hardware * rate. If so, no scaling or compensation need be done. */ thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm); thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm); if (user_tsc_khz < thresh_lo \|\| user_tsc_khz > thresh_hi) { pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi); use_scaling = 1; } NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616114741.GA298183@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-23 05:55:17 -04:00
Xiaoyao Li	bf10bd0be5	KVM: X86: Fix MSR range of APIC registers in X2APIC mode Only MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated for accessing APIC registers in x2APIC mode. Fixes: `0105d1a526` ("KVM: x2apic interface to lapic") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-23 05:49:45 -04:00
Lu Baolu	48f0bcfb7a	iommu/vt-d: Fix misuse of iommu_domain_identity_map() The iommu_domain_identity_map() helper takes start/end PFN as arguments. Fix a misuse case where the start and end addresses are passed. Fixes: `e70b081c6f` ("iommu/vt-d: Remove IOVA handling code from the non-dma_ops path") Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Tom Murphy <murphyt7@tcd.ie> Link: https://lore.kernel.org/r/20200622231345.29722-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:32 +02:00
Lu Baolu	04c00956ee	iommu/vt-d: Update scalable mode paging structure coherency The Scalable-mode Page-walk Coherency (SMPWC) field in the VT-d extended capability register indicates the hardware coherency behavior on paging structures accessed through the pasid table entry. This is ignored in current code and using ECAP.C instead which is only valid in legacy mode. Fix this so that paging structure updates could be manually flushed from the cache line if hardware page walking is not snooped. Fixes: `765b6a98c1` ("iommu/vt-d: Enumerate the scalable mode capability") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:32 +02:00
Lu Baolu	50310600eb	iommu/vt-d: Enable PCI ACS for platform opt in hint PCI ACS is disabled if Intel IOMMU is off by default or intel_iommu=off is used in command line. Unfortunately, Intel IOMMU will be forced on if there're devices sitting on an external facing PCI port that is marked as untrusted (for example, thunderbolt peripherals). That means, PCI ACS is disabled while Intel IOMMU is forced on to isolate those devices. As the result, the devices of an MFD will be grouped by a single group even the ACS is supported on device. [ 0.691263] pci 0000:00:07.1: Adding to iommu group 3 [ 0.691277] pci 0000:00:07.2: Adding to iommu group 3 [ 0.691292] pci 0000:00:07.3: Adding to iommu group 3 Fix it by requesting PCI ACS when Intel IOMMU is detected with platform opt in hint. Fixes: `89a6079df7` ("iommu/vt-d: Force IOMMU on for platform opt in hint") Co-developed-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:32 +02:00
Rajat Jain	67e8a5b18d	iommu/vt-d: Don't apply gfx quirks to untrusted devices Currently, an external malicious PCI device can masquerade the VID:PID of faulty gfx devices, and thus apply iommu quirks to effectively disable the IOMMU restrictions for itself. Thus we need to ensure that the device we are applying quirks to, is indeed an internal trusted device. Signed-off-by: Rajat Jain <rajatja@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:32 +02:00
Lu Baolu	16ecf10e81	iommu/vt-d: Set U/S bit in first level page table by default When using first-level translation for IOVA, currently the U/S bit in the page table is cleared which implies DMA requests with user privilege are blocked. As the result, following error messages might be observed when passing through a device to user level: DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read] Request device [41:00.0] PASID 1 fault addr 7ecdcd000 [fault reason 129] SM: U/S set 0 for first-level translation with user privilege This fixes it by setting U/S bit in the first level page table and makes IOVA over first level compatible with previous second-level translation. Fixes: `b802d070a5` ("iommu/vt-d: Use iova over first level") Reported-by: Xin Zeng <xin.zeng@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:31 +02:00
Lu Baolu	9486727f59	iommu/vt-d: Make Intel SVM code 64-bit only Current Intel SVM is designed by setting the pgd_t of the processor page table to FLPTR field of the PASID entry. The first level translation only supports 4 and 5 level paging structures, hence it's infeasible for the IOMMU to share a processor's page table when it's running in 32-bit mode. Let's disable 32bit support for now and claim support only when all the missing pieces are ready in the future. Fixes: `1c4f88b7f1` ("iommu/vt-d: Shared virtual address in scalable mode") Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-06-23 10:08:31 +02:00
Alexander Usyskin	8c289ea064	mei: me: add tiger lake point device ids for H platforms. Add Tiger Lake device ids H for HECI1. TGH_H is also used in Tatlow SPS platform we need to disable the mei interface there. Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20200619165121.2145330-7-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-23 07:55:47 +02:00
Tomas Winkler	f76d77f50b	mei: me: disable mei interface on Mehlow server platforms For SPS firmware versions 5.0 and newer the way detection has changed. The detection is done now via PCI_CFG_HFS_3 register. To prevent conflict the previous method will get sps_4 suffix Disable both CNP_H and CNP_H_3 interfaces. CNP_H_3 requires a separate configuration as it doesn't support DMA. Cc: <stable@vger.kernel.org> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20200619165121.2145330-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-23 07:55:47 +02:00
Todd Kjos	d35d3660e0	binder: fix null deref of proc->context The binder driver makes the assumption proc->context pointer is invariant after initialization (as documented in the kerneldoc header for struct proc). However, in commit `f0fe2c0f05` ("binder: prevent UAF for binderfs devices II") proc->context is set to NULL during binder_deferred_release(). Another proc was in the middle of setting up a transaction to the dying process and crashed on a NULL pointer deref on "context" which is a local set to &proc->context: new_ref->data.desc = (node == context->binder_context_mgr_node) ? 0 : 1; Here's the stack: [ 5237.855435] Call trace: [ 5237.855441] binder_get_ref_for_node_olocked+0x100/0x2ec [ 5237.855446] binder_inc_ref_for_node+0x140/0x280 [ 5237.855451] binder_translate_binder+0x1d0/0x388 [ 5237.855456] binder_transaction+0x2228/0x3730 [ 5237.855461] binder_thread_write+0x640/0x25bc [ 5237.855466] binder_ioctl_write_read+0xb0/0x464 [ 5237.855471] binder_ioctl+0x30c/0x96c [ 5237.855477] do_vfs_ioctl+0x3e0/0x700 [ 5237.855482] __arm64_sys_ioctl+0x78/0xa4 [ 5237.855488] el0_svc_common+0xb4/0x194 [ 5237.855493] el0_svc_handler+0x74/0x98 [ 5237.855497] el0_svc+0x8/0xc The fix is to move the kfree of the binder_device to binder_free_proc() so the binder_device is freed when we know there are no references remaining on the binder_proc. Fixes: `f0fe2c0f05` ("binder: prevent UAF for binderfs devices II") Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Todd Kjos <tkjos@google.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200622200715.114382-1-tkjos@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-06-23 07:54:46 +02:00
Aiden Leong	26ac10be3c	GUE: Fix a typo Fix a typo in gue.h Signed-off-by: Aiden Leong <aiden.leong@aibsd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:12:44 -07:00
David S. Miller	8af7b4525a	Merge branch 'net-atlantic-additional-A2-features' Igor Russkikh says: ==================== net: atlantic: additional A2 features This patchset adds more features to A2: * half duplex rates; * EEE; * flow control; * link partner capabilities reporting; * phy loopback. Feature-wise A2 is almost on-par with A1 save for WoL and filtering, which will be submitted as separate follow-up patchset(s). ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Dmitry Bogdanov	ecab78703f	net: atlantic: A2: phy loopback support This patch adds the phy loopback support on A2. Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Dmitry Bogdanov	2b53b04de3	net: atlantic: A2: report link partner capabilities This patch adds link partner capabilities reporting support on A2. In particular, the following capabilities are available for reporting: * link rate; * EEE; * flow control. Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Igor Russkikh	3e168de529	net: atlantic: A2: flow control support This patch adds flow control support on A2. Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Nikita Danilov	ce6a690ccc	net: atlantic: A2: EEE support This patch adds EEE support on A2. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Co-developed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Nikita Danilov	e61b28686b	net: atlantic: remove baseX usage This patch removes 2.5G baseX wrong usage/reporting, since it shouldn't have been mixed with baseT. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Igor Russkikh	071a02046c	net: atlantic: A2: half duplex support This patch adds support for 10M/100M/1G half duplex rates, which are supported by A2 in additional to full duplex rates supported by A1. Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:10:22 -07:00
Geliang Tang	b562f58bbc	mptcp: drop sndr_key in mptcp_syn_options In RFC 8684, we don't need to send sndr_key in SYN package anymore, so drop it. Fixes: `cc7972ea19` ("mptcp: parse and emit MP_CAPABLE option according to v1 spec") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 21:06:39 -07:00
Stephen Rothwell	29cb9868fb	net/core/devlink.c: remove new uninitialized_var() usage Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:56:30 -07:00
Gaurav Singh	c5efcf17bf	tcindex_change: Remove redundant null check arg cannot be NULL since its already being dereferenced before. Remove the redundant NULL check. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:55:09 -07:00
Gaurav Singh	21a739c64d	ethtool: Fix check in ethtool_rx_flow_rule_create Fix check in ethtool_rx_flow_rule_create Fixes: `eca4205f9e` ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:48:12 -07:00
Russell King	75674e3159	net: mtk_eth_soc: use resolved link config in mac_link_up() Convert the mtk_eth_soc driver to use the finalised link parameters in mac_link_up() rather than the parameters in mac_config(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:45:53 -07:00
Taehee Yoo	de0083c7ed	hsr: avoid to create proc file after unregister When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>" is deleted. The function for this is addrconf_ifdown() in the addrconf_notify() and it is called by notification, which is NETDEV_UNREGISTER. But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER, this proc file will be created again. This recreated proc file will be deleted by netdev_wati_allrefs(). Before netdev_wait_allrefs() is called, creating a new HSR interface routine can be executed and It tries to create a proc file but it will find an un-deleted proc file. At this point, it warns about it. To avoid this situation, it can use ->dellink() instead of ->ndo_uninit() to release resources because ->dellink() is called before NETDEV_UNREGISTER. So, a proc file will not be recreated. Test commands ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 mtu 1300 #SHELL1 while : do ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 done #SHELL2 while : do ip link del hsr0 done Splat looks like: [ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered [ 9888.981797][ C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430 [ 9888.981798][ C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x [ 9888.981814][ C2] CPU: 2 PID: 2752 Comm: ip Tainted: G W 5.8.0-rc1+ #616 [ 9888.981815][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 9888.981816][ C2] RIP: 0010:proc_register+0x2d5/0x430 [ 9888.981818][ C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40 [ 9888.981819][ C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286 [ 9888.981821][ C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62 [ 9888.981822][ C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac [ 9888.981823][ C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd [ 9888.981824][ C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308 [ 9888.981825][ C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e [ 9888.981827][ C2] FS: 00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 9888.981828][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9888.981829][ C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0 [ 9888.981830][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9888.981831][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 9888.981832][ C2] Call Trace: [ 9888.981833][ C2] ? snmp6_seq_show+0x180/0x180 [ 9888.981834][ C2] proc_create_single_data+0x7c/0xa0 [ 9888.981835][ C2] snmp6_register_dev+0xb0/0x130 [ 9888.981836][ C2] ipv6_add_dev+0x4b7/0xf60 [ 9888.981837][ C2] addrconf_notify+0x684/0x1ca0 [ 9888.981838][ C2] ? __mutex_unlock_slowpath+0xd0/0x670 [ 9888.981839][ C2] ? kasan_unpoison_shadow+0x30/0x40 [ 9888.981840][ C2] ? wait_for_completion+0x250/0x250 [ 9888.981841][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981842][ C2] ? dropmon_net_event+0x227/0x410 [ 9888.981843][ C2] ? notifier_call_chain+0x90/0x160 [ 9888.981844][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981845][ C2] notifier_call_chain+0x90/0x160 [ 9888.981846][ C2] register_netdevice+0xbe5/0x1070 [ ... ] Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com Fixes: `e0a4b99773` ("hsr: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:42:23 -07:00
David S. Miller	864cefeea0	Merge branch 'Multicast-improvement-in-Ocelot-and-Felix-drivers' Vladimir Oltean says: ==================== Multicast improvement in Ocelot and Felix drivers This series makes some basic multicast forwarding functionality work for Felix DSA and for Ocelot switchdev. IGMP/MLD snooping in Felix is still missing, and there are other improvements to be made in the general area of multicast address filtering towards the CPU, but let's get these hardware-specific fixes out of the way first. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Vladimir Oltean	9403c158b8	net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries The current procedure for installing a multicast address is hardcoded for IPv4. But, in the ocelot hardware, there are 3 different procedures for IPv4, IPv6 and for regular L2 multicast. For IPv6 (33-33-xx-xx-xx-xx), it's the same as for IPv4 (01-00-5e-xx-xx-xx), except that the destination port mask is stuffed into first 2 bytes of the MAC address except into first 3 bytes. For plain Ethernet multicast, there's no port-in-address stuffing going on, instead the DEST_IDX (pointer to PGID) is used there, just as for unicast. So we have to use one of the nonreserved multicast PGIDs that the hardware has allocated for this purpose. This patch classifies the type of multicast address based on its first bytes, then redirects to one of the 3 different hardware procedures. Note that this gives us a really better way of redirecting PTP frames sent at 01-1b-19-00-00-00 to the CPU. Previously, Yangbo Lu tried to add a trapping rule for PTP EtherType but got a lot of pushback: https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/ But right now, that isn't needed at all. The application stack (ptp4l) does this for the PTP multicast addresses it's interested in (which are configurable, and include 01-1b-19-00-00-00): memset(&mreq, 0, sizeof(mreq)); mreq.mr_ifindex = index; mreq.mr_type = PACKET_MR_MULTICAST; mreq.mr_alen = MAC_LEN; memcpy(mreq.mr_address, addr1, MAC_LEN); err1 = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); Into the kernel, this translates into a dev_mc_add on the switch network interfaces, and our drivers know that it means they should translate it into a host MDB address (make the CPU port be the destination). Previously, this was broken because all mdb addresses were treated as IPv4 (which 01-1b-19-00-00-00 obviously is not). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Vladimir Oltean	96b029b004	net: mscc: ocelot: introduce macros for iterating over PGIDs The current iterators are impossible to understand at first glance without switching back and forth between the definitions and their actual use in the for loops. So introduce some convenience names to help readability. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Vladimir Oltean	209edf95da	net: dsa: felix: call port mdb operations from ocelot This adds the mdb hooks in felix and exports the mdb functions from ocelot. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Vladimir Oltean	471beb11c4	net: mscc: ocelot: make the NPI port a proper target for FDB and MDB When used in DSA mode (as seen in Felix), the DEST_IDX in the MAC table should point to the PGID for the CPU port (PGID_CPU) and not for the Ethernet port where the CPU queues are redirected to (also known as Node Processor Interface - NPI). Because for Felix this distinction shouldn't really matter (from DSA perspective, the NPI port _is_ the CPU port), make the ocelot library act upon the CPU port when NPI mode is enabled. This has no effect for the mscc_ocelot driver for VSC7514, because that does not use NPI (and ocelot->npi is -1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Vladimir Oltean	0897ecf753	net: mscc: ocelot: fix encoding destination ports into multicast IPv4 address The ocelot hardware designers have made some hacks to support multicast IPv4 and IPv6 addresses. Normally, the MAC table matches on MAC addresses and the destination ports are selected through the DEST_IDX field of the respective MAC table entry. The DEST_IDX points to a Port Group ID (PGID) which contains the bit mask of ports that frames should be forwarded to. But there aren't a lot of PGIDs (only 80 or so) and there are clearly many more IP multicast addresses than that, so it doesn't scale to use this PGID mechanism, so something else was done. Since the first portion of the MAC address is known, the hack they did was to use a single PGID for _flooding_ unknown IPv4 multicast (PGID_MCIPV4 == 62), but for known IP multicast, embed the destination ports into the first 3 bytes of the MAC address recorded in the MAC table. The VSC7514 datasheet explains it like this: 3.9.1.5 IPv4 Multicast Entries MAC table entries with the ENTRY_TYPE = 2 settings are interpreted as IPv4 multicast entries. IPv4 multicasts entries match IPv4 frames, which are classified to the specified VID, and which have DMAC = 0x01005Exxxxxx, where xxxxxx is the lower 24 bits of the MAC address in the entry. Instead of a lookup in the destination mask table (PGID), the destination set is programmed as part of the entry MAC address. This is shown in the following table. Table 78: IPv4 Multicast Destination Mask Destination Ports Record Bit Field --------------------------------------------- Ports 10-0 MAC[34-24] Example: All IPv4 multicast frames in VLAN 12 with MAC 01005E112233 are to be forwarded to ports 3, 8, and 9. This is done by inserting the following entry in the MAC table entry: VALID = 1 VID = 12 MAC = 0x000308112233 ENTRY_TYPE = 2 DEST_IDX = 0 But this procedure is not at all what's going on in the driver. In fact, the code that embeds the ports into the MAC address looks like it hasn't actually been tested. This patch applies the procedure described in the datasheet. Since there are many other fixes to be made around multicast forwarding until it works properly, there is no real reason for this patch to be backported to stable trees, or considered a real fix of something that should have worked. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 20:41:05 -07:00
Frieder Schrempf	d22a16cc92	ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain The WDOG_ANY signal is connected to the RESET_IN signal of the SoM and baseboard. It is currently configured as push-pull, which means that if some external device like a programmer wants to assert the RESET_IN signal by pulling it to ground, it drives against the high level WDOG_ANY output of the SoC. To fix this we set the WDOG_ANY signal to open-drain configuration. That way we make sure that the RESET_IN can be asserted by the watchdog as well as by external devices. Fixes: `1ea4b76cdf` ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>	2020-06-23 11:39:35 +08:00
Frieder Schrempf	04a2c05179	ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM The watchdog's WDOG_ANY signal is used to trigger a POR of the SoC, if a soft reset is issued. As the SoM hardware connects the WDOG_ANY and the POR signals, the watchdog node itself and the pin configuration should be part of the common SoM devicetree. Let's move it from the baseboard's devicetree to its proper place. Fixes: `1ea4b76cdf` ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>	2020-06-23 11:39:21 +08:00
Dave Chinner	c7f87f3984	xfs: fix use-after-free on CIL context on shutdown xlog_wait() on the CIL context can reference a freed context if the waiter doesn't get scheduled before the CIL context is freed. This can happen when a task is on the hard throttle and the CIL push aborts due to a shutdown. This was detected by generic/019: thread 1 thread 2 __xfs_trans_commit xfs_log_commit_cil <CIL size over hard throttle limit> xlog_wait schedule xlog_cil_push_work wake_up_all <shutdown aborts commit> xlog_cil_committed kmem_free remove_wait_queue spin_lock_irqsave --> UAF Fix it by moving the wait queue to the CIL rather than keeping it in in the CIL context that gets freed on push completion. Because the wait queue is now independent of the CIL context and we might have multiple contexts in flight at once, only wake the waiters on the push throttle when the context we are pushing is over the hard throttle size threshold. Fixes: `0e7ab7efe7` ("xfs: Throttle commits on delayed background CIL push") Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-06-22 19:22:57 -07:00
Sean Christopherson	bf09fb6cba	KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible. As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads at the detriment to workloads running in the host or on other VMs. Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation. Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR. Furthermore, even if giving the guest control of the MSR is legitimate, e.g. in pass-through scenarios, it's not clear that the benefits would outweigh the overhead. E.g. saving and restoring an MSR across a VMX roundtrip costs ~250 cycles, and if the guest diverged from the host that cost would be paid on every run of the guest. In other words, if there is a legitimate use case then it should be enabled by a new per-VM capability. Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can correctly expose other WAITPKG features to the guest, e.g. TPAUSE, UMWAIT and UMONITOR. Fixes: `6e3ba4abce` ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Cc: Jingqi Liu <jingqi.liu@intel.com> Cc: Tao Xu <tao3.xu@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-22 20:54:57 -04:00
Alexei Starovoitov	b3eece09e2	Merge branch 'bpftool-show-pid' Andrii Nakryiko says: ==================== This patch set implements libbpf support for a second kind of special externs, kernel symbols, in addition to existing Kconfig externs. Right now, only untyped (const void) externs are supported, which, in C language, allow only to take their address. In the future, with kernel BTF getting type info about its own global and per-cpu variables, libbpf will extend this support with BTF type info, which will allow to also directly access variable's contents and follow its internal pointers, similarly to how it's possible today in fentry/fexit programs. As a first practical use of this functionality, bpftool gained ability to show PIDs of processes that have open file descriptors for BPF map/program/link/BTF object. It relies on iter/task_file BPF iterator program to extract this information efficiently. There was a bunch of bpftool refactoring (especially Makefile) necessary to generalize bpftool's internal BPF program use. This includes generalization of BPF skeletons support, addition of a vmlinux.h generation, extracting and building minimal subset of bpftool for bootstrapping. v2->v3: - fix sec_btf_id check (Hao); v1->v2: - docs fixes (Quentin); - dual GPL/BSD license for pid_inter.bpf.c (Quentin); - NULL-init kcfg_data (Hao Luo); rfc->v1: - show pids, if supported by kernel, always (Alexei); - switched iter output to binary to support showing process names; - update man pages; - fix few minor bugs in libbpf w.r.t. extern iteration. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-06-22 17:02:32 -07:00
Andrii Nakryiko	075c776658	tools/bpftool: Add documentation and sample output for process info Add statements about bpftool being able to discover process info, holding reference to BPF map, prog, link, or BTF. Show example output as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-10-andriin@fb.com	2020-06-22 17:01:49 -07:00
Andrii Nakryiko	d53dee3fe0	tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs Add bpf_iter-based way to find all the processes that hold open FDs against BPF object (map, prog, link, btf). bpftool always attempts to discover this, but will silently give up if kernel doesn't yet support bpf_iter BPF programs. Process name and PID are emitted for each process (task group). Sample output for each of 4 BPF objects: $ sudo ./bpftool prog show 2694: cgroup_device tag 8c42dee26e8cd4c2 gpl loaded_at 2020-06-16T15:34:32-0700 uid 0 xlated 648B jited 409B memlock 4096B pids systemd(1) 2907: cgroup_skb name egress tag 9ad187367cf2b9e8 gpl loaded_at 2020-06-16T18:06:54-0700 uid 0 xlated 48B jited 59B memlock 4096B map_ids 2436 btf_id 1202 pids test_progs(2238417), test_progs(`2238445`) $ sudo ./bpftool map show 2436: array name test_cgr.bss flags 0x400 key 4B value 8B max_entries 1 memlock 8192B btf_id 1202 pids test_progs(2238417), test_progs(`2238445`) 2445: array name pid_iter.rodata flags 0x480 key 4B value 4B max_entries 1 memlock 8192B btf_id 1214 frozen pids bpftool(2239612) $ sudo ./bpftool link show 61: cgroup prog 2908 cgroup_id 375301 attach_type egress pids test_progs(2238417), test_progs(`2238445`) 62: cgroup prog 2908 cgroup_id 375344 attach_type egress pids test_progs(2238417), test_progs(`2238445`) $ sudo ./bpftool btf show 1202: size 1527B prog_ids 2908,2907 map_ids 2436 pids test_progs(2238417), test_progs(`2238445`) 1242: size 34684B pids bpftool(2258892) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-9-andriin@fb.com	2020-06-22 17:01:49 -07:00
Andrii Nakryiko	bd9bedf84b	libbpf: Wrap source argument of BPF_CORE_READ macro in parentheses Wrap source argument of BPF_CORE_READ family of macros into parentheses to allow uses like this: BPF_CORE_READ((struct cast_struct *)src, a, b, c); Fixes: `7db3822ab9` ("libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200619231703.738941-8-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	05aca6da3b	tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h Adapt Makefile to support BPF skeleton generation beyond single profiler.bpf.c case. Also add vmlinux.h generation and switch profiler.bpf.c to use it. clang-bpf-global-var feature is extended and renamed to clang-bpf-co-re to check for support of preserve_access_index attribute, which, together with BTF for global variables, is the minimum requirement for modern BPF programs. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-7-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	16e9b187ab	tools/bpftool: Minimize bootstrap bpftool Build minimal "bootstrap mode" bpftool to enable skeleton (and, later, vmlinux.h generation), instead of building almost complete, but slightly different (w/o skeletons, etc) bpftool to bootstrap complete bpftool build. Current approach doesn't scale well (engineering-wise) when adding more BPF programs to bpftool and other complicated functionality, as it requires constant adjusting of the code to work in both bootstrapped mode and normal mode. So it's better to build only minimal bpftool version that supports only BPF skeleton code generation and BTF-to-C conversion. Thankfully, this is quite easy to accomplish due to internal modularity of bpftool commands. This will also allow to keep adding new functionality to bpftool in general, without the need to care about bootstrap mode for those new parts of bpftool. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-6-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	a479b8ce4e	tools/bpftool: Move map/prog parsing logic into common Move functions that parse map and prog by id/tag/name/etc outside of map.c/prog.c, respectively. These functions are used outside of those files and are generic enough to be in common. This also makes heavy-weight map.c and prog.c more decoupled from the rest of bpftool files and facilitates more lightweight bootstrap bpftool variant. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-5-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	b7ddfab20a	selftests/bpf: Add __ksym extern selftest Validate libbpf is able to handle weak and strong kernel symbol externs in BPF code correctly. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-4-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	1c0c7074fe	libbpf: Add support for extracting kernel symbol addresses Add support for another (in addition to existing Kconfig) special kind of externs in BPF code, kernel symbol externs. Such externs allow BPF code to "know" kernel symbol address and either use it for comparisons with kernel data structures (e.g., struct file's f_op pointer, to distinguish different kinds of file), or, with the help of bpf_probe_user_kernel(), to follow pointers and read data from global variables. Kernel symbol addresses are found through /proc/kallsyms, which should be present in the system. Currently, such kernel symbol variables are typeless: they have to be defined as `extern const void <symbol>` and the only operation you can do (in C code) with them is to take its address. Such extern should reside in a special section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong vs weak semantics stays the same as with Kconfig externs. If symbol is not found in /proc/kallsyms, this will be a failure for strong (non-weak) extern, but will be defaulted to 0 for weak externs. If the same symbol is defined multiple times in /proc/kallsyms, then it will be error if any of the associated addresses differs. In that case, address is ambiguous, so libbpf falls on the side of caution, rather than confusing user with randomly chosen address. In the future, once kernel is extended with variables BTF information, such ksym externs will be supported in a typed version, which will allow BPF program to read variable's contents directly, similarly to how it's done for fentry/fexit input arguments. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-3-andriin@fb.com	2020-06-22 17:01:48 -07:00
Andrii Nakryiko	2e33efe32e	libbpf: Generalize libbpf externs support Switch existing Kconfig externs to be just one of few possible kinds of more generic externs. This refactoring is in preparation for ksymbol extern support, added in the follow up patch. There are no functional changes intended. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20200619231703.738941-2-andriin@fb.com	2020-06-22 17:01:48 -07:00
Tuomas Tynkkynen	b835a71ef6	usbnet: smsc95xx: Fix use-after-free after removal Syzbot reports an use-after-free in workqueue context: BUG: KASAN: use-after-free in mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 __smsc95xx_mdio_read drivers/net/usb/smsc95xx.c:217 [inline] smsc95xx_mdio_read+0x583/0x870 drivers/net/usb/smsc95xx.c:278 check_carrier+0xd1/0x2e0 drivers/net/usb/smsc95xx.c:644 process_one_work+0x777/0xf90 kernel/workqueue.c:2274 worker_thread+0xa8f/0x1430 kernel/workqueue.c:2420 kthread+0x2df/0x300 kernel/kthread.c:255 It looks like that smsc95xx_unbind() is freeing the structures that are still in use by the concurrently running workqueue callback. Thus switch to using cancel_delayed_work_sync() to ensure the work callback really is no longer active. Reported-by: syzbot+29dc7d4ae19b703ff947@syzkaller.appspotmail.com Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:34:31 -07:00
David S. Miller	19430ede90	Merge branch 'mlxsw-Offload-TC-action-pedit-munge-tcp-udp-sport-dport' Ido Schimmel says: ==================== mlxsw: Offload TC action pedit munge tcp/udp sport/dport Petr says: On Spectrum-2 and Spectrum-3, it is possible to overwrite L4 port number of a TCP or UDP packet in the ACL engine. That corresponds to the pedit munges of tcp and udp sport resp. dport fields. Offload these munges on the systems where they are supported. The current offloading code assumes that all systems support the same set of fields. This now changes, so in patch #1 first split handling of pedit munges by chip type. The analysis of which packet field a given munge describes is kept generic. Patch #2 introduces the new flexible action fields. Patch #3 then adds the new pedit fields, and dispatches on them on Spectrum>1. Patch #4 adds a forwarding selftest for pedit dsfield, applicable to SW as well as HW datapaths. ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	13bd5d0256	selftests: forwarding: Add a test for pedit munge tcp, udp sport, dport Add a test that checks that pedit adjusts port numbers of tcp and udp packets. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	ce10d7d4ad	mlxsw: spectrum_acl: Support FLOW_ACTION_MANGLE for TCP, UDP ports Spectrum-2 supports an ACL action L4_PORT, which allows TCP and UDP source and destination port number change. Offload suitable mangles to this action. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00

... 30 31 32 33 34 ...

935158 commits