Linux 5.3-rc1

-----BEGIN PGP SIGNATURE-----

iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0006weHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGaDUIAJ4oTyVWpMRZkfG6
vVY8qVMU3zlzEqRiyLYjkXoe/mGpuU/UVTyyStllxZ+Gg9da0mGwlugScKriPJof
4KRUDDTGX5DrfEOo+0brKvM+PYh9uGViPgKXzyv7i6BrnX2z3JdBR4bKNuEYlAJ9
N93Qg7v05SBHIq2Gfp3klrdWbsTTW2EaDTLbcgifXLnfKyFr47kwsmXAHPlTFP0p
dYsZHHmf14Y9n1+ToZeVINgjQFr6mFn6ygY/PqTnd6vCgEEfP9eENJ4BZCtN1ZL/
V0BO9MyJ5iZV0AfwSEKydk+kDEvO16TG/nyDrECVuur7AXsBx18ZplVc787f6GK+
dyCQJ3U=
=XLAF
-----END PGP SIGNATURE-----

gpgsig -----BEGIN PGP SIGNATURE-----

iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXTYRHQAKCRDj7w1vZxhR
xY5IAQC0H/r62rlFq+JpbmksutMqvIferowP7HUk6yOaAKdVawD/c1qsTk/xxI0x
StrxRCDqeGA7D2R/ZNb/4sobnn7+oAM=
=k9CF
-----END PGP SIGNATURE-----

Merge v5.3-rc1 into drm-misc-next

Noralf needs some SPI patches in 5.3 to merge some work on tinydrm.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2025-03-06 20:59:54 +01:00 · 2019-07-22 21:24:10 +02:00 · 2019-07-22 21:24:10 +02:00 · 03b0f2ce73
commit 03b0f2ce73
parent e4f86e4371 5f9e832c13
16695 changed files with 979854 additions and 436075 deletions
--- a/.gitignore
+++ b/.gitignore
@ -30,6 +30,7 @@
 *.lz4
 *.lzma
 *.lzo
 *.mod
 *.mod.c
 *.o
 *.o.*
--- a/.mailmap
+++ b/.mailmap
@ -81,6 +81,7 @@ Greg Kroah-Hartman <greg@echidna.(none)>
 Greg Kroah-Hartman <gregkh@suse.de>
 Greg Kroah-Hartman <greg@kroah.com>
 Gregory CLEMENT <gregory.clement@bootlin.com> <gregory.clement@free-electrons.com>
 Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
 Henk Vergonet <Henk.Vergonet@gmail.com>
 Henrik Kretzschmar <henne@nachtwindheim.de>
 Henrik Rydberg <rydberg@bitmath.org>
@ -238,6 +239,7 @@ Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
 Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@virtuozzo.com>
 Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@parallels.com>
 Takashi YOSHII <takashi.yoshii.zj@renesas.com>
 Will Deacon <will@kernel.org> <will.deacon@arm.com>
 Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
 Yusuke Goda <goda.yusuke@renesas.com>
 Gustavo Padovan <gustavo@las.ic.unicamp.br>
--- a/CREDITS
+++ b/CREDITS
@ -1770,7 +1770,6 @@ S: USA
 N: Dave Jones
 E: davej@codemonkey.org.uk
 W: http://www.codemonkey.org.uk
 D: Assorted VIA x86 support.
 D: 2.5 AGPGART overhaul.
 D: CPUFREQ maintenance.
@ -1800,7 +1799,7 @@ S: 2300 Copenhagen S.
 S: Denmark
 N: Jozsef Kadlecsik
-E: kadlec@blackhole.kfki.hu
+E: kadlec@netfilter.org
 P: 1024D/470DB964 4CB3 1A05 713E 9BF7 FAC5  5809 DD8C B7B1 470D B964
 D: netfilter: TCP window tracking code
 D: netfilter: raw table
@ -3120,7 +3119,7 @@ S: France
 N: Rik van Riel
 E: riel@redhat.com
 W: http://www.surriel.com/
-D: Linux-MM site, Documentation/sysctl/*, swap/mm readaround
+D: Linux-MM site, Documentation/admin-guide/sysctl/*, swap/mm readaround
 D: kswapd fixes, random kernel hacker, rmap VM,
 D: nl.linux.org administrator, minor scheduler additions
 S: Red Hat Boston
--- a/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-pyra
+++ b/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-pyra
@ -5,7 +5,7 @@ Description:	It is possible to switch the cpi setting of the mouse with the
 		press of a button.
 		When read, this file returns the raw number of the actual cpi
 		setting reported by the mouse. This number has to be further
-		processed to receive the real dpi value.
+		processed to receive the real dpi value:
 		VALUE DPI
 		1     400
--- a/Documentation/ABI/obsolete/sysfs-gpio
+++ b/Documentation/ABI/obsolete/sysfs-gpio
@ -11,7 +11,7 @@ Description:
  Kernel code may export it for complete or partial access.
  GPIOs are identified as they are inside the kernel, using integers in
-  the range 0..INT_MAX.  See Documentation/gpio for more information.
+  the range 0..INT_MAX.  See Documentation/admin-guide/gpio for more information.
    /sys/class/gpio
 	/export ... asks the kernel to export a GPIO to userspace
--- a/Documentation/ABI/removed/sysfs-class-rfkill
+++ b/Documentation/ABI/removed/sysfs-class-rfkill
@ -1,6 +1,6 @@
 rfkill - radio frequency (RF) connector kill switch support
-For details to this subsystem look at Documentation/rfkill.txt.
+For details to this subsystem look at Documentation/driver-api/rfkill.rst.
 What:		/sys/class/rfkill/rfkill[0-9]+/claim
 Date:		09-Jul-2007
--- a/Documentation/ABI/stable/sysfs-class-infiniband
+++ b/Documentation/ABI/stable/sysfs-class-infiniband
@ -423,23 +423,6 @@ Description:
 		(e.g. driver restart on the VM which owns the VF).
 sysfs interface for NetEffect RNIC Low-Level iWARP driver (nes)
 ---------------------------------------------------------------
 What:		/sys/class/infiniband/nesX/hw_rev
 What:		/sys/class/infiniband/nesX/hca_type
 What:		/sys/class/infiniband/nesX/board_id
 Date:		Feb, 2008
 KernelVersion:	v2.6.25
 Contact:	linux-rdma@vger.kernel.org
 Description:
 		hw_rev:		(RO) Hardware revision number
 		hca_type:	(RO) Host Channel Adapter type (NEX020)
 		board_id:	(RO) Manufacturing board id
 sysfs interface for Chelsio T4/T5 RDMA driver (cxgb4)
 -----------------------------------------------------
--- a/Documentation/ABI/stable/sysfs-class-rfkill
+++ b/Documentation/ABI/stable/sysfs-class-rfkill
@ -1,6 +1,6 @@
 rfkill - radio frequency (RF) connector kill switch support
-For details to this subsystem look at Documentation/rfkill.txt.
+For details to this subsystem look at Documentation/driver-api/rfkill.rst.
 For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
 Documentation/ABI/removed/sysfs-class-rfkill.
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@ -61,7 +61,7 @@ Date:		October 2002
 Contact:	Linux Memory Management list <linux-mm@kvack.org>
 Description:
 		The node's hit/miss statistics, in units of pages.
-		See Documentation/numastat.txt
+		See Documentation/admin-guide/numastat.rst
 What:		/sys/devices/system/node/nodeX/distance
 Date:		October 2002
--- a/Documentation/ABI/stable/sysfs-driver-mlxreg-io
+++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
@ -1,5 +1,4 @@
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/asic_health
 							asic_health
 Date:		June 2018
 KernelVersion:	4.19
@ -9,9 +8,8 @@ Description:	This file shows ASIC health status. The possible values are:
 		The files are read only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld1_version
-							cpld1_version
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld2_version
 							cpld2_version
 Date:		June 2018
 KernelVersion:	4.19
 Contact:	Vadim Pasternak <vadimpmellanox.com>
@ -20,8 +18,7 @@ Description:	These files show with which CPLD versions have been burned
 		The files are read only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/fan_dir
 							fan_dir
 Date:		December 2018
 KernelVersion:	5.0
@ -32,8 +29,7 @@ Description:	This file shows the system fans direction:
 		The files are read only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
 							jtag_enable
 Date:		November 2018
 KernelVersion:	5.0
@ -43,8 +39,7 @@ Description:	These files show with which CPLD versions have been burned
 		The files are read only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
 							jtag_enable
 Date:		November 2018
 KernelVersion:	5.0
@ -87,16 +82,15 @@ Description:	These files allow asserting system power cycling, switching
 		The files are write only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_aux_pwr_or_ref
-							reset_aux_pwr_or_ref
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_asic_thermal
-							reset_asic_thermal
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_hotswap_or_halt
-							reset_hotswap_or_halt
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_hotswap_or_wd
-							reset_hotswap_or_wd
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_fw_reset
-							reset_fw_reset
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_long_pb
-							reset_long_pb
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_main_pwr_fail
-							reset_main_pwr_fail
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_short_pb
-							reset_short_pb
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sw_reset
 							reset_sw_reset
 Date:		June 2018
 KernelVersion:	4.19
 Contact:	Vadim Pasternak <vadimpmellanox.com>
@ -110,11 +104,10 @@ Description:	These files show the system reset cause, as following: power
 		The files are read only.
-What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_pwr_fail
-						reset_comex_pwr_fail
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_comex
-						reset_from_comex
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_system
-						reset_system
+What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_voltmon_upgrade_fail
 						reset_voltmon_upgrade_fail
 Date:		November 2018
 KernelVersion:	5.0
@ -127,3 +120,23 @@ Description:	These files show the system reset cause, as following: ComEx
 		the last reset cause.
 		The files are read only.
 Date:		June 2019
 KernelVersion:	5.3
 Contact:	Vadim Pasternak <vadimpmellanox.com>
 Description:	These files show the system reset cause, as following:
 		COMEX thermal shutdown; wathchdog power off or reset was derived
 		by one of the next components: COMEX, switch board or by Small Form
 		Factor mezzanine, reset requested from ASIC, reset cuased by BIOS
 		reload. Value 1 in file means this is reset cause, 0 - otherwise.
 		Only one of the above causes could be 1 at the same time, representing
 		only last reset cause.
 		The files are read only.
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd
 What:		/sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd
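Note on the reset_* attributes above: each file reads 1 if it is the last reset cause and 0 otherwise. A minimal sketch of how userspace might collect the causes that currently read 1 is shown here. The function name and the directory argument are hypothetical; the hwmon directory has to be resolved by the caller because the hwmon* index varies between systems.

```python
from pathlib import Path
from typing import List


def reset_causes(hwmon_dir: str) -> List[str]:
    """Return the names of all reset_* attributes in hwmon_dir that read 1."""
    causes = []
    for attr in sorted(Path(hwmon_dir).glob("reset_*")):
        try:
            value = attr.read_text().strip()
        except OSError:
            continue  # attribute not readable on this platform
        if value == "1":
            causes.append(attr.name)
    return causes
```

The documentation says only one cause per group can be 1 at a time, so a list is returned in case attributes from different groups are set together.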
--- a/Documentation/ABI/testing/debugfs-cec-error-inj
+++ b/Documentation/ABI/testing/debugfs-cec-error-inj
@ -1,6 +1,6 @@
 What:		/sys/kernel/debug/cec/*/error-inj
 Date:		March 2018
-Contact:	Hans Verkuil <hans.verkuil@cisco.com>
+Contact:	Hans Verkuil <hverkuil-cisco@xs4all.nl>
 Description:
 The CEC Framework allows for CEC error injection commands through
--- a/Documentation/ABI/testing/debugfs-cros-ec
+++ b/Documentation/ABI/testing/debugfs-cros-ec
@ -0,0 +1,56 @@
 What:		/sys/kernel/debug/<cros-ec-device>/console_log
 Date:		September 2017
 KernelVersion:	4.13
 Description:
 		If the EC supports the CONSOLE_READ command type, this file
 		can be used to grab the EC logs. The kernel polls for the log
 		and keeps its own buffer but userspace should grab this and
 		write it out to some logs.
 What:		/sys/kernel/debug/<cros-ec-device>/panicinfo
 Date:		September 2017
 KernelVersion:	4.13
 Description:
 		This file dumps the EC panic information from the previous
 		reboot. This file will only exist if the PANIC_INFO command
 		type is supported by the EC.
 What:		/sys/kernel/debug/<cros-ec-device>/pdinfo
 Date:		June 2018
 KernelVersion:	4.17
 Description:
 		This file provides the port role, muxes and power debug
 		information for all the USB PD/type-C ports available. If
 		the are no ports available, this file will be just an empty
 		file.
 What:		/sys/kernel/debug/<cros-ec-device>/uptime
 Date:		June 2019
 KernelVersion:	5.3
 Description:
 		A u32 providing the time since EC booted in ms. This is
 		is used for synchronizing the AP host time with the EC
 		log. An error is returned if the command is not supported
 		by the EC or there is a communication problem.
 What:		/sys/kernel/debug/<cros-ec-device>/last_resume_result
 Date:		June 2019
 KernelVersion:	5.3
 Description:
 		Some ECs have a feature where they will track transitions to
 		the (Intel) processor's SLP_S0 line, in order to detect cases
 		where a system failed to go into S0ix. When the system resumes,
 		an EC with this feature will return a summary of SLP_S0
 		transitions that occurred. The last_resume_result file returns
 		the most recent response from the AP's resume message to the EC.
 		The bottom 31 bits contain a count of the number of SLP_S0
 		transitions that occurred since the suspend message was
 		received. Bit 31 is set if the EC attempted to wake the
 		system due to a timeout when watching for SLP_S0 transitions.
 		Callers can use this to detect a wake from the EC due to
 		S0ix timeouts. The result will be zero if no suspend
 		transitions have been attempted, or the EC does not support
 		this feature.
 		Output will be in the format: "0x%08x\n".
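Note on last_resume_result above: the description gives the bit layout (bottom 31 bits are the SLP_S0 transition count, bit 31 is the EC timeout flag) and the "0x%08x\n" output format. A minimal sketch of decoding such a value follows; the function name is hypothetical and the input is assumed to be the text read from the debugfs file.

```python
from typing import Tuple


def decode_last_resume_result(text: str) -> Tuple[int, bool]:
    """Split a last_resume_result value into (SLP_S0 count, EC timeout flag)."""
    value = int(text.strip(), 16)       # file content looks like "0x80000003\n"
    count = value & 0x7FFFFFFF          # bottom 31 bits: SLP_S0 transition count
    timed_out = bool((value >> 31) & 1) # bit 31: EC woke the system on timeout
    return count, timed_out
```

A result of (0, False) matches the documented case where no suspend transitions were attempted or the EC does not support the feature.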
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@ -3,7 +3,10 @@ Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
 Description:    Sets the device address to be used for read or write through
-                PCI bar. The acceptable value is a string that starts with "0x"
+                PCI bar, or the device VA of a host mapped memory to be read or
                written directly from the host. The latter option is allowed
                only when the IOMMU is disabled.
                The acceptable value is a string that starts with "0x"
 What:           /sys/kernel/debug/habanalabs/hl<n>/command_buffers
 Date:           Jan 2019
@ -33,10 +36,12 @@ Contact:        oded.gabbay@gmail.com
 Description:    Allows the root user to read or write directly through the
                device's PCI bar. Writing to this file generates a write
                transaction while reading from the file generates a read
-                transcation. This custom interface is needed (instead of using
+                transaction. This custom interface is needed (instead of using
                the generic Linux user-space PCI mapping) because the DDR bar
                is very small compared to the DDR memory and only the driver can
-                move the bar before and after the transaction
+                move the bar before and after the transaction.
                If the IOMMU is disabled, it also allows the root user to read
                or write from the host a device VA of a host mapped memory
 What:           /sys/kernel/debug/habanalabs/hl<n>/device
 Date:           Jan 2019
@ -46,6 +51,13 @@ Description:    Enables the root user to set the device to specific state.
                Valid values are "disable", "enable", "suspend", "resume".
                User can read this property to see the valid values
 What:           /sys/kernel/debug/habanalabs/hl<n>/engines
 Date:           Jul 2019
 KernelVersion:  5.3
 Contact:        oded.gabbay@gmail.com
 Description:    Displays the status registers values of the device engines and
                their derived idle status
 What:           /sys/kernel/debug/habanalabs/hl<n>/i2c_addr
 Date:           Jan 2019
 KernelVersion:  5.1
--- a/Documentation/ABI/testing/debugfs-wilco-ec
+++ b/Documentation/ABI/testing/debugfs-wilco-ec
@ -23,11 +23,9 @@ Description:
 		For writing, bytes 0-1 indicate the message type, one of enum
 		wilco_ec_msg_type. Byte 2+ consist of the data passed in the
-		request, starting at MBOX[0]
+		request, starting at MBOX[0]. At least three bytes are required
-
+		for writing, two for the type and at least a single byte of
-		At least three bytes are required for writing, two for the type
+		data.
 		and at least a single byte of data. Only the first
 		EC_MAILBOX_DATA_SIZE bytes of MBOX will be used.
 		Example:
 		// Request EC info type 3 (EC firmware build date)
@ -40,7 +38,7 @@ Description:
 		$ cat /sys/kernel/debug/wilco_ec/raw
 		00 00 31 32 2f 32 31 2f 31 38 00 38 00 01 00 2f 00  ..12/21/18.8...
-		Note that the first 32 bytes of the received MBOX[] will be
+		Note that the first 16 bytes of the received MBOX[] will be
-		printed, even if some of the data is junk. It is up to you to
+		printed, even if some of the data is junk, and skipping bytes
-		know how many of the first bytes of data are the actual
+		17 to 32. It is up to you to know how many of the first bytes of
-		response.
+		data are the actual response.
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@ -24,11 +24,11 @@ Description:
 				[euid=] [fowner=] [fsname=]]
 			lsm:	[[subj_user=] [subj_role=] [subj_type=]
 				 [obj_user=] [obj_role=] [obj_type=]]
-			option:	[[appraise_type=]] [permit_directio]
+			option:	[[appraise_type=]] [template=] [permit_directio]
 		base: 	func:= [BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
 				[FIRMWARE_CHECK]
 				[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
 				[KEXEC_CMDLINE]
 			mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
 			       [[^]MAY_EXEC]
 			fsmagic:= hex value
@ -38,6 +38,8 @@ Description:
 			fowner:= decimal value
 		lsm:  	are LSM specific
 		option:	appraise_type:= [imasig]
 			template:= name of a defined IMA template type
 			(eg, ima-ng). Only valid when action is "measure".
 			pcr:= decimal value
 		default policy:
--- a/Documentation/ABI/testing/procfs-diskstats
+++ b/Documentation/ABI/testing/procfs-diskstats
@ -29,4 +29,4 @@ Description:
 		17 - sectors discarded
 		18 - time spent discarding
-		For more details refer to Documentation/iostats.txt
+		For more details refer to Documentation/admin-guide/iostats.rst
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@ -3,18 +3,28 @@ Date:		August 2017
 Contact:	Daniel Colascione <dancol@google.com>
 Description:
 		This file provides pre-summed memory information for a
-		process.  The format is identical to /proc/pid/smaps,
+		process.  The format is almost identical to /proc/pid/smaps,
 		except instead of an entry for each VMA in a process,
 		smaps_rollup has a single entry (tagged "[rollup]")
 		for which each field is the sum of the corresponding
 		fields from all the maps in /proc/pid/smaps.
-		For more details, see the procfs man page.
+		Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
 		are not present in /proc/pid/smaps.  These fields represent
 		the sum of the Pss field of each type (anon, file, shmem).
 		For more details, see Documentation/filesystems/proc.txt
 		and the procfs man page.
 		Typical output looks like this:
 		00100000-ff709000 ---p 00000000 00:00 0		 [rollup]
 		Size:               1192 kB
 		KernelPageSize:        4 kB
 		MMUPageSize:           4 kB
 		Rss:		     884 kB
 		Pss:		     385 kB
 		Pss_Anon:	     301 kB
 		Pss_File:	      80 kB
 		Pss_Shmem:	       4 kB
 		Shared_Clean:	     696 kB
 		Shared_Dirty:	       0 kB
 		Private_Clean:	     120 kB
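Note on the smaps_rollup output above: the new Pss_Anon, Pss_File and Pss_Shmem fields break the Pss total down by type. A minimal sketch of parsing the "Name: value kB" lines into a dictionary follows; the function name is hypothetical, and the sample text is an excerpt of the typical output shown in this hunk.

```python
from typing import Dict


def parse_rollup(text: str) -> Dict[str, int]:
    """Parse 'Name:  <n> kB' lines of smaps_rollup into {name: kilobytes}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip the "[rollup]" header line and anything that is not "<n> kB".
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


SAMPLE = """\
00100000-ff709000 ---p 00000000 00:00 0		 [rollup]
Rss:		     884 kB
Pss:		     385 kB
Pss_Anon:	     301 kB
Pss_File:	      80 kB
Pss_Shmem:	       4 kB
"""

fields = parse_rollup(SAMPLE)
by_type = fields["Pss_Anon"] + fields["Pss_File"] + fields["Pss_Shmem"]
```

In the sample the three per-type values add up to the Pss value; on a live system the sum may not match exactly because each field is rounded separately.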
--- a/Documentation/ABI/testing/pstore
+++ b/Documentation/ABI/testing/pstore
@ -1,6 +1,6 @@
-Where:		/sys/fs/pstore/... (or /dev/pstore/...)
+What:		/sys/fs/pstore/... (or /dev/pstore/...)
 Date:		March 2011
-Kernel Version: 2.6.39
+KernelVersion: 2.6.39
 Contact:	tony.luck@intel.com
 Description:	Generic interface to platform dependent persistent storage.
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@ -15,7 +15,7 @@ Description:
 		 9 - I/Os currently in progress
 		10 - time spent doing I/Os (ms)
 		11 - weighted time spent doing I/Os (ms)
-		For more details refer Documentation/iostats.txt
+		For more details refer Documentation/admin-guide/iostats.rst
 What:		/sys/block/<disk>/<part>/stat
--- a/Documentation/ABI/testing/sysfs-block-device
+++ b/Documentation/ABI/testing/sysfs-block-device
@ -45,7 +45,7 @@ Description:
 		- Values below -2 are rejected with -EINVAL
 		For more information, see
-		Documentation/laptops/disk-shock-protection.txt
+		Documentation/admin-guide/laptops/disk-shock-protection.rst
 What:		/sys/block/*/device/ncq_prio_enable
--- a/Documentation/ABI/testing/sysfs-bus-css
+++ b/Documentation/ABI/testing/sysfs-bus-css
@ -33,3 +33,26 @@ Description:	Contains the PIM/PAM/POM values, as reported by the
 		in sync with the values current in the channel subsystem).
 		Note: This is an I/O-subchannel specific attribute.
 Users:		s390-tools, HAL
 What:		/sys/bus/css/devices/.../driver_override
 Date:		June 2019
 Contact:	Cornelia Huck <cohuck@redhat.com>
 		linux-s390@vger.kernel.org
 Description:	This file allows the driver for a device to be specified. When
 		specified, only a driver with a name matching the value written
 		to driver_override will have an opportunity to bind to the
 		device. The override is specified by writing a string to the
 		driver_override file (echo vfio-ccw > driver_override) and
 		may be cleared with an empty string (echo > driver_override).
 		This returns the device to standard matching rules binding.
 		Writing to driver_override does not automatically unbind the
 		device from its current driver or make any attempt to
 		automatically load the specified driver.  If no driver with a
 		matching name is currently loaded in the kernel, the device
 		will not bind to any driver.  This also allows devices to
 		opt-out of driver binding using a driver_override name such as
 		"none".  Only a single driver may be specified in the override,
 		there is no support for parsing delimiters.
 		Note that unlike the mechanism of the same name for pci, this
 		file does not allow to override basic matching rules. I.e.,
 		the driver must still match the subchannel type of the device.
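Note on driver_override above: the description uses "echo vfio-ccw > driver_override" to set the override and "echo > driver_override" to clear it. The same two writes can be sketched in Python as follows. The function name and the device directory argument are hypothetical, and root privileges are assumed when this is pointed at a real sysfs device.

```python
from pathlib import Path


def set_driver_override(device_dir: str, driver: str = "") -> None:
    """Write a driver name to driver_override; an empty name clears it."""
    # Mirrors `echo <driver> > driver_override`, including the trailing newline.
    Path(device_dir, "driver_override").write_text(driver + "\n")
```

As the description states, this neither unbinds the device from its current driver nor loads the named driver; it only restricts which driver may bind afterwards.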
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
@ -1,6 +1,6 @@
-Where:		/sys/bus/event_source/devices/<dev>/format
+What:		/sys/bus/event_source/devices/<dev>/format
 Date:		January 2012
-Kernel Version: 3.3
+KernelVersion: 3.3
 Contact:	Jiri Olsa <jolsa@redhat.com>
 Description:
 		Attribute group to describe the magic bits that go into
--- a/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352
+++ b/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352
@ -1,20 +1,20 @@
-Where:		/sys/bus/i2c/devices/.../heading0_input
+What:		/sys/bus/i2c/devices/.../heading0_input
 Date:		April 2010
-Kernel Version: 2.6.36?
+KernelVersion: 2.6.36?
 Contact:	alan.cox@intel.com
 Description:	Reports the current heading from the compass as a floating
 		point value in degrees.
-Where:		/sys/bus/i2c/devices/.../power_state
+What:		/sys/bus/i2c/devices/.../power_state
 Date:		April 2010
-Kernel Version: 2.6.36?
+KernelVersion: 2.6.36?
 Contact:	alan.cox@intel.com
 Description:	Sets the power state of the device. 0 sets the device into
 		sleep mode, 1 wakes it up.
-Where:		/sys/bus/i2c/devices/.../calibration
+What:		/sys/bus/i2c/devices/.../calibration
 Date:		April 2010
-Kernel Version: 2.6.36?
+KernelVersion: 2.6.36?
 Contact:	alan.cox@intel.com
 Description:	Sets the calibration on or off (1 = on, 0 = off). See the
 		chip data sheet.
--- a/Documentation/ABI/testing/sysfs-bus-iio
+++ b/Documentation/ABI/testing/sysfs-bus-iio
@ -61,8 +61,11 @@ What:		/sys/bus/iio/devices/triggerX/sampling_frequency_available
 KernelVersion:	2.6.35
 Contact:	linux-iio@vger.kernel.org
 Description:
-		When the internal sampling clock can only take a small
+		When the internal sampling clock can only take a specific set of
-		discrete set of values, this file lists those available.
+		frequencies, we can specify the available values with:
 		- a small discrete set of values like "0 2 4 6 8"
 		- a range with minimum, step and maximum frequencies like
 		  "[min step max]"
 What:		/sys/bus/iio/devices/iio:deviceX/oversampling_ratio
 KernelVersion:	2.6.38
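Note on sampling_frequency_available above: the new text allows two forms, a discrete list such as "0 2 4 6 8" or a range written as "[min step max]". A minimal sketch of a reader that accepts both forms follows; the function name and the returned dictionary layout are hypothetical.

```python
from typing import Dict, Union, List


def parse_available(text: str) -> Dict[str, Union[str, float, List[float]]]:
    """Parse an IIO *_available string in list or '[min step max]' form."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        # Range form: exactly three numbers between the brackets.
        lo, step, hi = (float(tok) for tok in text[1:-1].split())
        return {"kind": "range", "min": lo, "step": step, "max": hi}
    # List form: whitespace-separated discrete values.
    return {"kind": "list", "values": [float(tok) for tok in text.split()]}
```

Values are parsed as floats because frequencies may be fractional; a malformed range raises ValueError rather than being silently treated as a list.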
--- a/Documentation/ABI/testing/sysfs-bus-iio-cros-ec
+++ b/Documentation/ABI/testing/sysfs-bus-iio-cros-ec
@ -18,11 +18,11 @@ Description:
 		values are 'base' and 'lid'.
 What:		/sys/bus/iio/devices/iio:deviceX/id
-Date:		Septembre 2017
+Date:		September 2017
 KernelVersion:	4.14
 Contact:	linux-iio@vger.kernel.org
 Description:
-		This attribute is exposed by the CrOS EC legacy accelerometer
+		This attribute is exposed by the CrOS EC sensors driver and
-		driver and represents the sensor ID as exposed by the EC. This
+		represents the sensor ID as exposed by the EC. This ID is used
-		ID is used by the Android sensor service hardware abstraction
+		by the Android sensor service hardware abstraction layer (sensor
-		layer (sensor HAL) through the Android container on ChromeOS.
+		HAL) through the Android container on ChromeOS.
--- a/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08
+++ b/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08
@ -1,4 +1,4 @@
-What		/sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
+What:		/sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
 Date:		January 2017
 KernelVersion:	4.11
 Contact:	linux-iio@vger.kernel.org
@ -6,7 +6,7 @@ Description:
 		Show or set the gain boost of the amp, from 0-31 range.
 		default 31
-What		/sys/bus/iio/devices/iio:deviceX/sensor_max_range
+What:		/sys/bus/iio/devices/iio:deviceX/sensor_max_range
 Date:		January 2017
 KernelVersion:	4.11
 Contact:	linux-iio@vger.kernel.org
--- a/Documentation/ABI/testing/sysfs-bus-iio-frequency-adf4371
+++ b/Documentation/ABI/testing/sysfs-bus-iio-frequency-adf4371
@ -0,0 +1,44 @@
 What:		/sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency
 KernelVersion:
 Contact:	linux-iio@vger.kernel.org
 Description:
 		Stores the PLL frequency in Hz for channel Y.
 		Reading returns the actual frequency in Hz.
 		The ADF4371 has an integrated VCO with fundamendal output
 		frequency ranging from 4000000000 Hz 8000000000 Hz.
 		out_altvoltage0_frequency:
 			A divide by 1, 2, 4, 8, 16, 32 or circuit generates
 			frequencies from 62500000 Hz to 8000000000 Hz.
 		out_altvoltage1_frequency:
 			This channel duplicates the channel 0 frequency
 		out_altvoltage2_frequency:
 			A frequency doubler generates frequencies from
 			8000000000 Hz to 16000000000 Hz.
 		out_altvoltage3_frequency:
 			A frequency quadrupler generates frequencies from
 			16000000000 Hz to 32000000000 Hz.
 		Note: writes to one of the channels will affect the frequency of
 		all the other channels, since it involves changing the VCO
 		fundamental output frequency.
 What:		/sys/bus/iio/devices/iio:deviceX/out_altvoltageY_name
 KernelVersion:
 Contact:	linux-iio@vger.kernel.org
 Description:
 		Reading returns the datasheet name for channel Y:
 		out_altvoltage0_name: RF8x
 		out_altvoltage1_name: RFAUX8x
 		out_altvoltage2_name: RF16x
 		out_altvoltage3_name: RF32x
 What:		/sys/bus/iio/devices/iio:deviceX/out_altvoltageY_powerdown
 KernelVersion:
 Contact:	linux-iio@vger.kernel.org
 Description:
 		This attribute allows the user to power down the PLL and it's
 		RFOut buffers.
 		Writing 1 causes the specified channel to power down.
 		Clearing returns to normal operation.
--- a/Documentation/ABI/testing/sysfs-bus-iio-proximity-as3935
+++ b/Documentation/ABI/testing/sysfs-bus-iio-proximity-as3935
@ -1,4 +1,4 @@
-What		/sys/bus/iio/devices/iio:deviceX/in_proximity_input
+What:		/sys/bus/iio/devices/iio:deviceX/in_proximity_input
 Date:		March 2014
 KernelVersion:	3.15
 Contact:	Matt Ranostay <matt.ranostay@konsulko.com>
@ -6,7 +6,7 @@ Description:
 		Get the current distance in meters of storm (1km steps)
 		1000-40000 = distance in meters
-What		/sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
+What:		/sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
 Date:		March 2014
 KernelVersion:	3.15
 Contact:	Matt Ranostay <matt.ranostay@konsulko.com>
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
@ -9,9 +9,9 @@ errors may be "seen" / reported by the link partner and not the
 problematic endpoint itself (which may report all counters as 0 as it never
 saw any problems).
-Where:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
+What:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	List of correctable errors seen and reported by this
 		PCI device using ERR_COR. Note that since multiple errors may
@ -31,9 +31,9 @@ Header Log Overflow 0
 TOTAL_ERR_COR 2
 -------------------------------------------------------------------------
-Where:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
+What:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	List of uncorrectable fatal errors seen and reported by this
 		PCI device using ERR_FATAL. Note that since multiple errors may
@ -62,9 +62,9 @@ TLP Prefix Blocked Error 0
 TOTAL_ERR_FATAL 0
 -------------------------------------------------------------------------
-Where:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
+What:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	List of uncorrectable nonfatal errors seen and reported by this
 		PCI device using ERR_NONFATAL. Note that since multiple errors
@ -103,20 +103,20 @@ collectors) that are AER capable. These indicate the number of error messages as
 device, so these counters include them and are thus cumulative of all the error
 messages on the PCI hierarchy originating at that root port.
-Where:		/sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
+What:		/sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	Total number of ERR_COR messages reported to rootport.
-Where:	    /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
+What:	    /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	Total number of ERR_FATAL messages reported to rootport.
-Where:	    /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
+What:	    /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
 Date:		July 2018
-Kernel Version: 4.19.0
+KernelVersion: 4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	Total number of ERR_NONFATAL messages reported to rootport.
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@ -1,68 +1,68 @@
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/model
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/model
 Date:		March 2009
-Kernel Version: 2.6.30
+KernelVersion: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	Displays the SCSI INQUIRY page 0 model for logical drive
 		Y of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
 Date:		March 2009
-Kernel Version: 2.6.30
+KernelVersion: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	Displays the SCSI INQUIRY page 0 revision for logical
 		drive Y of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
 Date:		March 2009
-Kernel Version: 2.6.30
+KernelVersion: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	Displays the SCSI INQUIRY page 83 serial number for logical
 		drive Y of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
 Date:		March 2009
-Kernel Version: 2.6.30
+KernelVersion: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	Displays the SCSI INQUIRY page 0 vendor for logical drive
 		Y of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
 Date:		March 2009
-Kernel Version: 2.6.30
+KernelVersion: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	A symbolic link to /sys/block/cciss!cXdY
-Where:		/sys/bus/pci/devices/<dev>/ccissX/rescan
+What:		/sys/bus/pci/devices/<dev>/ccissX/rescan
 Date:		August 2009
-Kernel Version:	2.6.31
+KernelVersion:	2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Kicks off a rescan of the controller to discover logical
 		drive topology changes.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/lunid
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/lunid
 Date:		August 2009
-Kernel Version: 2.6.31
+KernelVersion: 2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Displays the 8-byte LUN ID used to address logical
 		drive Y of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/raid_level
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/raid_level
 Date:		August 2009
-Kernel Version: 2.6.31
+KernelVersion: 2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Displays the RAID level of logical drive Y of
 		controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/usage_count
+What:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/usage_count
 Date:		August 2009
-Kernel Version: 2.6.31
+KernelVersion: 2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Displays the usage count (number of opens) of logical drive Y
 		of controller X.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/resettable
+What:		/sys/bus/pci/devices/<dev>/ccissX/resettable
 Date:		February 2011
-Kernel Version:	2.6.38
+KernelVersion:	2.6.38
 Contact:	iss_storagedev@hp.com
 Description:	Value of 1 indicates the controller can honor the reset_devices
 		kernel parameter.  Value of 0 indicates reset_devices cannot be
@ -71,9 +71,9 @@ Description:	Value of 1 indicates the controller can honor the reset_devices
 		a dump device, as kdump requires resetting the device in order
 		to work reliably.
-Where:		/sys/bus/pci/devices/<dev>/ccissX/transport_mode
+What:		/sys/bus/pci/devices/<dev>/ccissX/transport_mode
 Date:		July 2011
-Kernel Version:	3.0
+KernelVersion:	3.0
 Contact:	iss_storagedev@hp.com
 Description:	Value of "simple" indicates that the controller has been placed
 		in "simple mode". Value of "performant" indicates that the
--- a/Documentation/ABI/testing/sysfs-bus-siox
+++ b/Documentation/ABI/testing/sysfs-bus-siox
@ -1,6 +1,6 @@
 What:		/sys/bus/siox/devices/siox-X/active
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		On reading represents the current state of the bus. If it
 		contains a "0" the bus is stopped and connected devices are
@ -12,7 +12,7 @@ Description:
 What:		/sys/bus/siox/devices/siox-X/device_add
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Write-only file. Write
@ -27,13 +27,13 @@ Description:
 What:		/sys/bus/siox/devices/siox-X/device_remove
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Write-only file. A single write removes the last device in the siox chain.
 What:		/sys/bus/siox/devices/siox-X/poll_interval_ns
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Defines the interval between two poll cycles in nano seconds.
 		Note this is rounded to jiffies on writing. On reading the current value
@ -41,33 +41,33 @@ Description:
 What:		/sys/bus/siox/devices/siox-X-Y/connected
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value. "0" means the Yth device on siox bus X isn't "connected" i.e.
 		communication with it is not ensured. "1" signals a working connection.
 What:		/sys/bus/siox/devices/siox-X-Y/inbytes
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value reporting the inbytes value provided to siox-X/device_add
 What:		/sys/bus/siox/devices/siox-X-Y/status_errors
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Counts the number of time intervals when the read status byte doesn't yield the
 		expected value.
 What:		/sys/bus/siox/devices/siox-X-Y/type
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value reporting the type value provided to siox-X/device_add.
 What:		/sys/bus/siox/devices/siox-X-Y/watchdog
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value reporting if the watchdog of the siox device is
 		active. "0" means the watchdog is not active and the device is expected to
@ -75,13 +75,13 @@ Description:
 What:		/sys/bus/siox/devices/siox-X-Y/watchdog_errors
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value reporting the number of time intervals when the
 		watchdog was active.
 What:		/sys/bus/siox/devices/siox-X-Y/outbytes
 KernelVersion:	4.16
-Contact:	Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Contact:	Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
 Description:
 		Read-only value reporting the outbytes value provided to siox-X/device_add.
--- a/Documentation/ABI/testing/sysfs-bus-usb-devices-usbsevseg
+++ b/Documentation/ABI/testing/sysfs-bus-usb-devices-usbsevseg
@ -1,14 +1,14 @@
-Where:		/sys/bus/usb/.../powered
+What:		/sys/bus/usb/.../powered
 Date:		August 2008
-Kernel Version:	2.6.26
+KernelVersion:	2.6.26
 Contact:	Harrison Metzger <harrisonmetz@gmail.com>
 Description:	Controls whether the device's display will be powered.
 		A value of 0 is off and a non-zero value is on.
-Where:		/sys/bus/usb/.../mode_msb
+What:		/sys/bus/usb/.../mode_msb
-Where:		/sys/bus/usb/.../mode_lsb
+What:		/sys/bus/usb/.../mode_lsb
 Date:		August 2008
-Kernel Version:	2.6.26
+KernelVersion:	2.6.26
 Contact:	Harrison Metzger <harrisonmetz@gmail.com>
 Description:	Controls the devices display mode.
 		For a 6 character display the values are
@ -16,24 +16,24 @@ Description:	Controls the devices display mode.
 		for an 8 character display the values are
 			MSB 0x08; LSB 0xFF.
-Where:		/sys/bus/usb/.../textmode
+What:		/sys/bus/usb/.../textmode
 Date:		August 2008
-Kernel Version:	2.6.26
+KernelVersion:	2.6.26
 Contact:	Harrison Metzger <harrisonmetz@gmail.com>
 Description:	Controls the way the device interprets its text buffer.
 		raw:	each character controls its segment manually
 		hex:	each character is between 0-15
 		ascii:	each character is between '0'-'9' and 'A'-'F'.
-Where:		/sys/bus/usb/.../text
+What:		/sys/bus/usb/.../text
 Date:		August 2008
-Kernel Version:	2.6.26
+KernelVersion:	2.6.26
 Contact:	Harrison Metzger <harrisonmetz@gmail.com>
 Description:	The text (or data) for the device to display
-Where:		/sys/bus/usb/.../decimals
+What:		/sys/bus/usb/.../decimals
 Date:		August 2008
-Kernel Version:	2.6.26
+KernelVersion:	2.6.26
 Contact:	Harrison Metzger <harrisonmetz@gmail.com>
 Description:	Controls the decimal places on the device.
 		To set the nth decimal place, give this field
--- a/Documentation/ABI/testing/sysfs-class-backlight-driver-lm3533
+++ b/Documentation/ABI/testing/sysfs-class-backlight-driver-lm3533
@ -4,7 +4,7 @@ KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
 		Get the ALS output channel used as input in
-		ALS-current-control mode (0, 1), where
+		ALS-current-control mode (0, 1), where:
 		0 - out_current0 (backlight 0)
 		1 - out_current1 (backlight 1)
@ -28,7 +28,7 @@ Date:		April 2012
 KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
-		Set the brightness-mapping mode (0, 1), where
+		Set the brightness-mapping mode (0, 1), where:
 		0 - exponential mode
 		1 - linear mode
@ -38,7 +38,7 @@ Date:		April 2012
 KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
-		Set the PWM-input control mask (5 bits), where
+		Set the PWM-input control mask (5 bits), where:
 		bit 5 - PWM-input enabled in Zone 4
 		bit 4 - PWM-input enabled in Zone 3
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@ -1,6 +1,6 @@
-Note: Attributes that are shared between devices are stored in the directory
+Please note that attributes that are shared between devices are stored in
-pointed to by the symlink device/.
+the directory pointed to by the symlink device/.
-Example: The real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is
+For example, the real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is
 /sys/class/cxl/afu0.0s/device/irqs_max, i.e. /sys/class/cxl/afu0.0/irqs_max.
--- a/Documentation/ABI/testing/sysfs-class-devfreq
+++ b/Documentation/ABI/testing/sysfs-class-devfreq
@ -47,7 +47,7 @@ Description:
 What:		/sys/class/devfreq/.../trans_stat
 Date:		October 2012
 Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
-Descrtiption:
+Description:
 		This ABI shows the statistics of devfreq behavior on a
 		specific device. It shows the time spent in each state and
 		the number of transitions between states.
--- a/Documentation/ABI/testing/sysfs-class-led-driver-lm3533
+++ b/Documentation/ABI/testing/sysfs-class-led-driver-lm3533
@ -4,7 +4,7 @@ KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
 		Set the ALS output channel to use as input in
-		ALS-current-control mode (1, 2), where
+		ALS-current-control mode (1, 2), where:
 		1 - out_current1
 		2 - out_current2
@ -22,7 +22,7 @@ Date:		April 2012
 KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
-		Set the pattern generator fall and rise times (0..7), where
+		Set the pattern generator fall and rise times (0..7), where:
 		0 - 2048 us
 		1 - 262 ms
@ -45,7 +45,7 @@ Date:		April 2012
 KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
-		Set the brightness-mapping mode (0, 1), where
+		Set the brightness-mapping mode (0, 1), where:
 		0 - exponential mode
 		1 - linear mode
@ -55,7 +55,7 @@ Date:		April 2012
 KernelVersion:	3.5
 Contact:	Johan Hovold <jhovold@gmail.com>
 Description:
-		Set the PWM-input control mask (5 bits), where
+		Set the PWM-input control mask (5 bits), where:
 		bit 5 - PWM-input enabled in Zone 4
 		bit 4 - PWM-input enabled in Zone 3
--- a/Documentation/ABI/testing/sysfs-class-leds-gt683r
+++ b/Documentation/ABI/testing/sysfs-class-leds-gt683r
@ -5,7 +5,7 @@ Contact:	Janne Kanniainen <janne.kanniainen@gmail.com>
 Description:
 		Set the mode of LEDs. You should notice that changing the mode
 		of one LED will update the mode of its two sibling devices as
-		well.
+		well. Possible values are:
 		0 - normal
 		1 - audio
@ -13,4 +13,4 @@ Description:
 		Normal: LEDs are fully on when enabled
 		Audio:  LEDs brightness depends on sound level
-		Breathing: LEDs brightness varies at human breathing rate
+		Breathing: LEDs brightness varies at human breathing rate
--- a/Documentation/ABI/testing/sysfs-class-net-phydev
+++ b/Documentation/ABI/testing/sysfs-class-net-phydev
@ -41,3 +41,11 @@ Description:
 		xgmii, moca, qsgmii, trgmii, 1000base-x, 2500base-x, rxaui,
 		xaui, 10gbase-kr, unknown
 What:		/sys/class/mdio_bus/<bus>/<device>/phy_standalone
 Date:		May 2019
 KernelVersion:	5.3
 Contact:	netdev@vger.kernel.org
 Description:
 		Boolean value indicating whether the PHY device is used in
 		standalone mode, without a net_device associated, by PHYLINK.
 		Attribute created only when this is the case.
--- a/Documentation/ABI/testing/sysfs-class-net-qmi
+++ b/Documentation/ABI/testing/sysfs-class-net-qmi
@ -29,7 +29,7 @@ Contact:	Bjørn Mork <bjorn@mork.no>
 Description:
 		Unsigned integer.
-		Write a number ranging from 1 to 127 to add a qmap mux
+		Write a number ranging from 1 to 254 to add a qmap mux
 		based network device, supported by recent Qualcomm based
 		modems.
@ -46,5 +46,5 @@ Contact:	Bjørn Mork <bjorn@mork.no>
 Description:
 		Unsigned integer.
-		Write a number ranging from 1 to 127 to delete a previously
+		Write a number ranging from 1 to 254 to delete a previously
 		created qmap mux based network device.
--- a/Documentation/ABI/testing/sysfs-class-power
+++ b/Documentation/ABI/testing/sysfs-class-power
@ -376,10 +376,42 @@ Description:
 		supply. Normally this is configured based on the type of
 		connection made (e.g. A configured SDP should output a maximum
 		of 500mA so the input current limit is set to the same value).
 		Use preferably input_power_limit, and for problems that can be
 		solved using power limit use input_current_limit.
 		Access: Read, Write
 		Valid values: Represented in microamps
 What:		/sys/class/power_supply/<supply_name>/input_voltage_limit
 Date:		May 2019
 Contact:	linux-pm@vger.kernel.org
 Description:
 		This entry configures the incoming VBUS voltage limit currently
 		set in the supply. Normally this is configured based on
 		system-level knowledge or user input (e.g. This is part of the
 		Pixel C's thermal management strategy to effectively limit the
 		input power to 5V when the screen is on to meet Google's skin
 		temperature targets). Note that this feature should not be
 		used for safety critical things.
 		Use preferably input_power_limit, and for problems that can be
 		solved using power limit use input_voltage_limit.
 		Access: Read, Write
 		Valid values: Represented in microvolts
 What:		/sys/class/power_supply/<supply_name>/input_power_limit
 Date:		May 2019
 Contact:	linux-pm@vger.kernel.org
 Description:
 		This entry configures the incoming power limit currently set
 		in the supply. Normally this is configured based on
 		system-level knowledge or user input. Use preferably this
 		feature to limit the incoming power and use current/voltage
 		limit only for problems that can be solved using power limit.
 		Access: Read, Write
 		Valid values: Represented in microwatts
 What:		/sys/class/power_supply/<supply_name>/online,
 Date:		May 2007
 Contact:	linux-pm@vger.kernel.org
--- a/Documentation/ABI/testing/sysfs-class-power-wilco
+++ b/Documentation/ABI/testing/sysfs-class-power-wilco
@ -0,0 +1,30 @@
 What:		/sys/class/power_supply/wilco-charger/charge_type
 Date:		April 2019
 KernelVersion:	5.2
 Description:
 		What charging algorithm to use:
 		Standard: Fully charges battery at a standard rate.
 		Adaptive: Battery settings adaptively optimized based on
 			typical battery usage pattern.
 		Fast: Battery charges over a shorter period.
 		Trickle: Extends battery lifespan, intended for users who
 			primarily use their Chromebook while connected to AC.
 		Custom: A low and high threshold percentage is specified.
 			Charging begins when level drops below
 			charge_control_start_threshold, and ceases when
 			level is above charge_control_end_threshold.
 What:		/sys/class/power_supply/wilco-charger/charge_control_start_threshold
 Date:		April 2019
 KernelVersion:	5.2
 Description:
 		Used when charge_type="Custom", as described above. Measured in
 		percentages. The valid range is [50, 95].
 What:		/sys/class/power_supply/wilco-charger/charge_control_end_threshold
 Date:		April 2019
 KernelVersion:	5.2
 Description:
 		Used when charge_type="Custom", as described above. Measured in
 		percentages. The valid range is [55, 100].
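The documented ranges for the "Custom" charge type can be checked before writing to the two threshold attributes. The following is a sketch only: the function name check_custom_thresholds is hypothetical, and it models just the ranges stated above, not whatever additional checks the driver or the EC may apply.

```python
# Ranges copied from the ABI text above, in percent, both ends inclusive.
START_RANGE = (50, 95)    # charge_control_start_threshold
END_RANGE = (55, 100)     # charge_control_end_threshold


def check_custom_thresholds(start, end):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not START_RANGE[0] <= start <= START_RANGE[1]:
        problems.append(
            "start threshold %d is outside [%d, %d]" % (start, *START_RANGE)
        )
    if not END_RANGE[0] <= end <= END_RANGE[1]:
        problems.append(
            "end threshold %d is outside [%d, %d]" % (end, *END_RANGE)
        )
    return problems
```

Returning a list rather than raising on the first error lets a caller report both out-of-range values at once.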
--- a/Documentation/ABI/testing/sysfs-class-powercap
+++ b/Documentation/ABI/testing/sysfs-class-powercap
@ -5,7 +5,7 @@ Contact:	linux-pm@vger.kernel.org
 Description:
 		The powercap/ class sub directory belongs to the power cap
 		subsystem. Refer to
-		Documentation/power/powercap/powercap.txt for details.
+		Documentation/power/powercap/powercap.rst for details.
 What:		/sys/class/powercap/<control type>
 Date:		September 2013
@ -147,6 +147,6 @@ What:		/sys/class/powercap/.../<power zone>/enabled
 Date:		September 2013
 KernelVersion:	3.13
 Contact:	linux-pm@vger.kernel.org
-Description
+Description:
 		This allows to enable/disable power capping at power zone level.
 		This applies to current power zone and its children.
--- a/Documentation/ABI/testing/sysfs-class-switchtec
+++ b/Documentation/ABI/testing/sysfs-class-switchtec
@ -1,6 +1,6 @@
 switchtec - Microsemi Switchtec PCI Switch Management Endpoint
-For details on this subsystem look at Documentation/switchtec.txt.
+For details on this subsystem look at Documentation/driver-api/switchtec.rst.
 What: 		/sys/class/switchtec
 Date:		05-Jan-2017
--- a/Documentation/ABI/testing/sysfs-class-uwb_rc
+++ b/Documentation/ABI/testing/sysfs-class-uwb_rc
@ -125,12 +125,6 @@ Description:
                The EUI-48 of this device in colon separated hex
                octets.
 What:           /sys/class/uwb_rc/uwbN/<EUI-48>/BPST
 Date:           July 2008
 KernelVersion:  2.6.27
 Contact:        linux-usb@vger.kernel.org
 Description:
 What:           /sys/class/uwb_rc/uwbN/<EUI-48>/IEs
 Date:           July 2008
 KernelVersion:  2.6.27
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@ -34,7 +34,7 @@ Description:	CPU topology files that describe kernel limits related to
 		present: cpus that have been identified as being present in
 		the system.
-		See Documentation/cputopology.txt for more information.
+		See Documentation/admin-guide/cputopology.rst for more information.
 What:		/sys/devices/system/cpu/probe
@ -103,7 +103,7 @@ Description:	CPU topology files that describe a logical CPU's relationship
 		thread_siblings_list: human-readable list of cpu#'s hardware
 		threads within the same core as cpu#
-		See Documentation/cputopology.txt for more information.
+		See Documentation/admin-guide/cputopology.rst for more information.
 What:		/sys/devices/system/cpu/cpuidle/current_driver
@ -137,7 +137,8 @@ Description:	Discover cpuidle policy and mechanism
 		current_governor: (RW) displays current idle policy. Users can
 		switch the governor at runtime by writing to this file.
-		See files in Documentation/cpuidle/ for more information.
+		See Documentation/admin-guide/pm/cpuidle.rst and
 		Documentation/driver-api/pm/cpuidle.rst for more information.
 What:		/sys/devices/system/cpu/cpuX/cpuidle/stateN/name
@ -538,3 +539,26 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
 What:		/sys/devices/system/cpu/umwait_control
 		/sys/devices/system/cpu/umwait_control/enable_c02
 		/sys/devices/system/cpu/umwait_control/max_time
 Date:		May 2019
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Umwait control
 		enable_c02: Read/write interface to control umwait C0.2 state
 			Read returns C0.2 state status:
 				0: C0.2 is disabled
 				1: C0.2 is enabled
 			Write 'y' or '1'  or 'on' to enable C0.2 state.
 			Write 'n' or '0'  or 'off' to disable C0.2 state.
 			The interface is case insensitive.
 		max_time: Read/write interface to control umwait maximum time
 			  in TSC-quanta that the CPU can reside in either C0.1
 			  or C0.2 state. The time is an unsigned 32-bit number.
 			  Note that a value of zero means there is no limit.
 			  Low order two bits must be zero.
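The constraints in the umwait_control entry can be expressed as two small checks. This Python sketch covers only what the text above states: the six accepted enable_c02 strings, compared case-insensitively, and a max_time that fits in an unsigned 32-bit number with its two low-order bits clear. The function names are hypothetical, and the kernel's own parser may accept spellings not modelled here.

```python
# Strings listed in the ABI text for enable_c02, mapped to the resulting state.
ENABLE_WORDS = {
    "y": True, "1": True, "on": True,
    "n": False, "0": False, "off": False,
}


def parse_enable_c02(text):
    """Map a documented enable_c02 string to True/False, ignoring case."""
    key = text.strip().lower()
    if key not in ENABLE_WORDS:
        raise ValueError("unrecognised enable_c02 value: %r" % text)
    return ENABLE_WORDS[key]


def is_valid_max_time(value):
    """Check the documented max_time constraints.

    The value must fit in an unsigned 32-bit number and its two low-order
    bits must be zero. Zero is valid and means there is no limit.
    """
    return 0 <= value <= 0xFFFFFFFF and (value & 0x3) == 0
```

A caller would typically run is_valid_max_time before writing the decimal value to max_time, so that an unaligned value is caught before the write is attempted.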
--- a/Documentation/ABI/testing/sysfs-driver-altera-cvp
+++ b/Documentation/ABI/testing/sysfs-driver-altera-cvp
@ -1,6 +1,6 @@
 What:		/sys/bus/pci/drivers/altera-cvp/chkcfg
 Date:		May 2017
-Kernel Version:	4.13
+KernelVersion:	4.13
 Contact:	Anatolij Gustschin <agust@denx.de>
 Description:
 		Contains either 1 or 0 and controls if configuration
--- a/Documentation/ABI/testing/sysfs-driver-habanalabs
+++ b/Documentation/ABI/testing/sysfs-driver-habanalabs
@ -62,18 +62,20 @@ What:           /sys/class/habanalabs/hl<n>/ic_clk
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Allows the user to set the maximum clock frequency of the
+Description:    Allows the user to set the maximum clock frequency, in Hz, of
-                Interconnect fabric. Writes to this parameter affect the device
+                the Interconnect fabric. Writes to this parameter affect the
-                only when the power management profile is set to "manual" mode.
+                device only when the power management profile is set to "manual"
-                The device IC clock might be set to lower value then the
+                mode. The device IC clock might be set to lower value than the
                maximum. The user should read the ic_clk_curr to see the actual
-                frequency value of the IC
+                frequency value of the IC. This property is valid only for the
                Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/ic_clk_curr
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Displays the current clock frequency of the Interconnect fabric
+Description:    Displays the current clock frequency, in Hz, of the Interconnect
                fabric. This property is valid only for the Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/infineon_ver
 Date:           Jan 2019
@ -92,18 +94,20 @@ What:           /sys/class/habanalabs/hl<n>/mme_clk
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Allows the user to set the maximum clock frequency of the
+Description:    Allows the user to set the maximum clock frequency, in Hz, of
-                MME compute engine. Writes to this parameter affect the device
+                the MME compute engine. Writes to this parameter affect the
-                only when the power management profile is set to "manual" mode.
+                device only when the power management profile is set to "manual"
-                The device MME clock might be set to lower value then the
+                mode. The device MME clock might be set to lower value than the
                maximum. The user should read the mme_clk_curr to see the actual
-                frequency value of the MME
+                frequency value of the MME. This property is valid only for the
                Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/mme_clk_curr
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Displays the current clock frequency of the MME compute engine
+Description:    Displays the current clock frequency, in Hz, of the MME compute
                engine. This property is valid only for the Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/pci_addr
 Date:           Jan 2019
@ -163,18 +167,20 @@ What:           /sys/class/habanalabs/hl<n>/tpc_clk
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Allows the user to set the maximum clock frequency of the
+Description:    Allows the user to set the maximum clock frequency, in Hz, of
-                TPC compute engines. Writes to this parameter affect the device
+                the TPC compute engines. Writes to this parameter affect the
-                only when the power management profile is set to "manual" mode.
+                device only when the power management profile is set to "manual"
-                The device TPC clock might be set to lower value then the
+                mode. The device TPC clock might be set to lower value than the
                maximum. The user should read the tpc_clk_curr to see the actual
-                frequency value of the TPC
+                frequency value of the TPC. This property is valid only for
                Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/tpc_clk_curr
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        oded.gabbay@gmail.com
-Description:    Displays the current clock frequency of the TPC compute engines
+Description:    Displays the current clock frequency, in Hz, of the TPC compute
                engines. This property is valid only for the Goya ASIC family
 What:           /sys/class/habanalabs/hl<n>/uboot_ver
 Date:           Jan 2019
--- a/Documentation/ABI/testing/sysfs-driver-hid
+++ b/Documentation/ABI/testing/sysfs-driver-hid
@ -1,6 +1,6 @@
-What:		For USB devices	: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
+What:		/sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
-		For BT devices	: /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
+What:		/sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
-		Symlink		: /sys/class/hidraw/hidraw<num>/device/report_descriptor
+What:		/sys/class/hidraw/hidraw<num>/device/report_descriptor
 Date:		Jan 2011
 KernelVersion:	2.0.39
 Contact:	Alan Ott <alan@signal11.us>
@ -9,9 +9,9 @@ Description:	When read, this file returns the device's raw binary HID
 		This file cannot be written.
 Users:		HIDAPI library (http://www.signal11.us/oss/hidapi)
-What:		For USB devices	: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
+What:		/sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
-		For BT devices	: /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
+What:		/sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
-		Symlink		: /sys/class/hidraw/hidraw<num>/device/country
+What:		/sys/class/hidraw/hidraw<num>/device/country
 Date:		February 2015
 KernelVersion:	3.19
 Contact:	Olivier Gay <ogay@logitech.com>
--- a/Documentation/ABI/testing/sysfs-driver-hid-roccat-kone
+++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-kone
@ -5,7 +5,7 @@ Description:	It is possible to switch the dpi setting of the mouse with the
 		press of a button.
 		When read, this file returns the raw number of the actual dpi
 		setting reported by the mouse. This number has to be further
-		processed to receive the real dpi value.
+		processed to receive the real dpi value:
 		VALUE DPI
 		1     800
--- a/Documentation/ABI/testing/sysfs-driver-ppi
+++ b/Documentation/ABI/testing/sysfs-driver-ppi
@ -1,6 +1,6 @@
 What:		/sys/class/tpm/tpmX/ppi/
 Date:		August 2012
-Kernel Version:	3.6
+KernelVersion:	3.6
 Contact:	xiaoyan.zhang@intel.com
 Description:
 		This folder includes the attributes related with PPI (Physical
--- a/Documentation/ABI/testing/sysfs-driver-st
+++ b/Documentation/ABI/testing/sysfs-driver-st
@ -1,6 +1,6 @@
 What:		/sys/bus/scsi/drivers/st/debug_flag
 Date:		October 2015
-Kernel Version:	?.?
+KernelVersion:	?.?
 Contact:	shane.seymour@hpe.com
 Description:
 		This file allows you to turn debug output from the st driver
--- a/Documentation/ABI/testing/sysfs-driver-wacom
+++ b/Documentation/ABI/testing/sysfs-driver-wacom
@ -1,6 +1,6 @@
 What:		/sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/speed
 Date:		April 2010
-Kernel Version:	2.6.35
+KernelVersion:	2.6.35
 Contact:	linux-bluetooth@vger.kernel.org
 Description:
 		The /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/speed file
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@ -243,3 +243,11 @@ Description:
 		 - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
 		 - [h] means add/del hot file extension
 		 - [c] means add/del cold file extension
 What:		/sys/fs/f2fs/<disk>/unusable
 Date:		April 2019
 Contact:	"Daniel Rosenberg" <drosen@google.com>
 Description:
 		If checkpoint=disable, it displays the number of blocks that are unusable.
                 If checkpoint=enable it displays the number of blocks that would be unusable
                if checkpoint=disable were to be set.
--- a/Documentation/ABI/testing/sysfs-kernel-fscaps
+++ b/Documentation/ABI/testing/sysfs-kernel-fscaps
@ -2,7 +2,7 @@ What:		/sys/kernel/fscaps
 Date:		February 2011
 KernelVersion:	2.6.38
 Contact:	Ludwig Nussel <ludwig.nussel@suse.de>
-Description
+Description:
 		Shows whether file system capabilities are honored
 		when executing a binary
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@ -24,3 +24,12 @@ Description:    /sys/kernel/iommu_groups/reserved_regions list IOVA
 		region is described on a single line: the 1st field is
 		the base IOVA, the second is the end IOVA and the third
 		field describes the type of the region.
 What:		/sys/kernel/iommu_groups/reserved_regions
 Date: 		June 2019
 KernelVersion:  v5.3
 Contact: 	Eric Auger <eric.auger@redhat.com>
 Description:    In case an RMRR is used only by graphics or USB devices
 		it is now exposed as "direct-relaxable" instead of "direct".
 		In device assignment use case, for instance, those RMRR
 		are considered to be relaxable and safe.
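Since each reserved region is described on a single line with three fields (base IOVA, end IOVA, type), the file lends itself to simple line-based parsing. The sketch below assumes the fields are whitespace-separated and the two addresses are hexadecimal; neither detail is spelled out in the entry above, and the function names are hypothetical.

```python
def parse_reserved_regions(text):
    """Parse reserved_regions content into (base, end, type) tuples.

    Assumptions: fields are separated by whitespace and both addresses are
    hexadecimal, with or without a 0x prefix. Lines that do not have exactly
    three fields are ignored.
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        base, end, kind = fields
        regions.append((int(base, 16), int(end, 16), kind))
    return regions


def relaxable_regions(regions):
    """Keep only the regions reported with the "direct-relaxable" type."""
    return [r for r in regions if r[2] == "direct-relaxable"]
```

Filtering on the type string is how a tool could distinguish the new "direct-relaxable" regions from plain "direct" ones when deciding whether a device can be assigned.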
--- a/Documentation/ABI/testing/sysfs-kernel-uids
+++ b/Documentation/ABI/testing/sysfs-kernel-uids
@ -11,4 +11,4 @@ Description:
 		example would be, if User A has shares = 1024 and user
 		B has shares = 2048, User B will get twice the CPU
 		bandwidth user A will. For more details refer
-		Documentation/scheduler/sched-design-CFS.txt
+		Documentation/scheduler/sched-design-CFS.rst
--- a/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
+++ b/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
@ -4,7 +4,7 @@ KernelVersion:	2.6.24
 Contact:	Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
 		Kexec Mailing List <kexec@lists.infradead.org>
 		Vivek Goyal <vgoyal@redhat.com>
-Description
+Description:
 		Shows physical address and size of vmcoreinfo ELF note.
 		First value contains physical address of note in hex and
 		second value contains the size of note in hex. This ELF
--- a/Documentation/ABI/testing/sysfs-platform-asus-laptop
+++ b/Documentation/ABI/testing/sysfs-platform-asus-laptop
@ -31,7 +31,7 @@ Description:
 		To control the LED display, use the following :
 		    echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
 		where T controls the 3-letter display, and DDD the 3-digit display.
-		The DDD table can be found in Documentation/laptops/asus-laptop.txt
+		The DDD table can be found in Documentation/admin-guide/laptops/asus-laptop.rst
 What:		/sys/devices/platform/asus_laptop/bluetooth
 Date:		January 2007
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@ -36,3 +36,13 @@ KernelVersion:	3.5
 Contact:	"AceLan Kao" <acelan.kao@canonical.com>
 Description:
 		Resume on lid open. 1 means on, 0 means off.
 What:		/sys/devices/platform/<platform>/fan_boost_mode
 Date:		Sep 2019
 KernelVersion:	5.3
 Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
 Description:
 		Fan boost mode:
 			* 0 - normal,
 			* 1 - overboost,
 			* 2 - silent
--- a/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl
+++ b/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl
@ -1,7 +1,7 @@
 What:		/sys/devices/platform/<i2c-demux-name>/available_masters
 Date:		January 2016
 KernelVersion:	4.6
-Contact:	Wolfram Sang <wsa@the-dreams.de>
+Contact:	Wolfram Sang <wsa+renesas@sang-engineering.com>
 Description:
 		Reading the file will give you a list of masters which can be
 		selected for a demultiplexed bus. The format is
@ -12,7 +12,7 @@ Description:
 What:		/sys/devices/platform/<i2c-demux-name>/current_master
 Date:		January 2016
 KernelVersion:	4.6
-Contact:	Wolfram Sang <wsa@the-dreams.de>
+Contact:	Wolfram Sang <wsa+renesas@sang-engineering.com>
 Description:
 		This file selects/shows the active I2C master for a demultiplexed
 		bus. It uses the <index> value from the file 'available_masters'.
--- a/Documentation/ABI/testing/sysfs-platform-wilco-ec
+++ b/Documentation/ABI/testing/sysfs-platform-wilco-ec
@ -0,0 +1,40 @@
 What:		/sys/bus/platform/devices/GOOG000C\:00/boot_on_ac
 Date:		April 2019
 KernelVersion:	5.3
 Description:
 		Boot on AC is a policy which makes the device boot from S5
 		when AC power is connected. This is useful for users who
 		want to run their device headless or with a dock.
 		Input should be parseable by kstrtou8() to 0 or 1.
 What:          /sys/bus/platform/devices/GOOG000C\:00/build_date
 Date:          May 2019
 KernelVersion: 5.3
 Description:
               Display Wilco Embedded Controller firmware build date.
               Output will be a MM/DD/YY string.
 What:          /sys/bus/platform/devices/GOOG000C\:00/build_revision
 Date:          May 2019
 KernelVersion: 5.3
 Description:
               Display Wilco Embedded Controller build revision.
               Output will be a version string similar to the example below:
               d2592cae0
 What:          /sys/bus/platform/devices/GOOG000C\:00/model_number
 Date:          May 2019
 KernelVersion: 5.3
 Description:
               Display Wilco Embedded Controller model number.
               Output will be a version string similar to the example below:
               08B6
 What:          /sys/bus/platform/devices/GOOG000C\:00/version
 Date:          May 2019
 KernelVersion: 5.3
 Description:
               Display Wilco Embedded Controller firmware version.
               The format of the string is x.y.z, where x is major, y is minor
               and z is the build number. For example: 95.00.06
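A minimal userspace sketch of splitting the x.y.z version string into its three parts is shown below. The struct and function names are hypothetical, and the sketch assumes the fields are decimal, as in the example 95.00.06.

```c
#include <stdio.h>

/* Parsed firmware version (hypothetical type, not a kernel structure). */
struct ec_version {
	unsigned int major;
	unsigned int minor;
	unsigned int build;
};

/*
 * Parse "x.y.z", optionally followed by whitespace such as the newline
 * that a sysfs read returns. Returns 0 on success, -1 otherwise.
 */
static int parse_ec_version(const char *s, struct ec_version *v)
{
	char trailing;
	int n;

	/* The final " %c" only matches if non-whitespace junk follows. */
	n = sscanf(s, "%u.%u.%u %c", &v->major, &v->minor, &v->build,
		   &trailing);
	return n == 3 ? 0 : -1;
}
```

Leading zeros are dropped by the decimal conversion, so "00" and "06" parse as 0 and 6.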
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@ -300,4 +300,4 @@ Description:
 		attempt.
 		Using this sysfs file will override any values that were
-		set using the kernel command line for disk offset.
+		set using the kernel command line for disk offset.
--- a/Documentation/COPYING-logo
+++ b/Documentation/COPYING-logo
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@ -212,7 +212,7 @@ The standard 64-bit addressing device would do something like this::
 If the device only supports 32-bit addressing for descriptors in the
 coherent allocations, but supports full 64-bits for streaming mappings
-it would look like this:
+it would look like this::
 	if (dma_set_mask(dev, DMA_BIT_MASK(64))) {
 		dev_warn(dev, "mydev: No suitable DMA available\n");
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@ -198,7 +198,7 @@ call to set the mask to the value returned.
 ::
 	size_t
-	dma_direct_max_mapping_size(struct device *dev);
+	dma_max_mapping_size(struct device *dev);
 Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
--- a/Documentation/EDID/HOWTO.txt
+++ b/Documentation/EDID/HOWTO.txt
@ -1,49 +0,0 @@
 In the good old days when graphics parameters were configured explicitly
 in a file called xorg.conf, even broken hardware could be managed.
 Today, with the advent of Kernel Mode Setting, a graphics board is
 either correctly working because all components follow the standards -
 or the computer is unusable, because the screen remains dark after
 booting or it displays the wrong area. Cases when this happens are:
 - The graphics board does not recognize the monitor.
 - The graphics board is unable to detect any EDID data.
 - The graphics board incorrectly forwards EDID data to the driver.
 - The monitor sends no or bogus EDID data.
 - A KVM sends its own EDID data instead of querying the connected monitor.
 Adding the kernel parameter "nomodeset" helps in most cases, but causes
 restrictions later on.
 As a remedy for such situations, the kernel configuration item
 CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows an individually
 prepared or corrected EDID data set to be provided in the /lib/firmware
 directory, from where it is loaded via the firmware interface. The code
 (see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
 commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
 1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
 not contain code to create these data. In order to elucidate the origin
 of the built-in binary EDID blobs and to facilitate the creation of
 individual data for a specific misbehaving monitor, commented sources
 and a Makefile environment are given here.
 To create binary EDID and C source code files from the existing data
 material, simply type "make".
 If you want to create your own EDID file, copy the file 1024x768.S,
 replace the settings with your own data and add a new target to the
 Makefile. Please note that the EDID data structure expects the timing
 values in a different form from the standard X11 format.
 X11:
 HTimings:  hdisp hsyncstart hsyncend htotal
 VTimings:  vdisp vsyncstart vsyncend vtotal
 EDID:
 #define XPIX hdisp
 #define XBLANK htotal-hdisp
 #define XOFFSET hsyncstart-hdisp
 #define XPULSE hsyncend-hsyncstart
 #define YPIX vdisp
 #define YBLANK vtotal-vdisp
 #define YOFFSET vsyncstart-vdisp
 #define YPULSE vsyncend-vsyncstart
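The mapping above can be sketched as a small C helper that converts one axis of X11 timings into the four EDID values. The type and function names are hypothetical and chosen only for illustration; the arithmetic follows the #define lines above term by term.

```c
/* X11-style timings for one axis (horizontal or vertical). */
struct x11_timings {
	int disp;	/* hdisp or vdisp */
	int syncstart;	/* hsyncstart or vsyncstart */
	int syncend;	/* hsyncend or vsyncend */
	int total;	/* htotal or vtotal */
};

/* EDID-style values for the same axis. */
struct edid_timings {
	int pix;	/* XPIX or YPIX */
	int blank;	/* XBLANK or YBLANK */
	int offset;	/* XOFFSET or YOFFSET */
	int pulse;	/* XPULSE or YPULSE */
};

static struct edid_timings x11_to_edid(struct x11_timings t)
{
	struct edid_timings e;

	e.pix = t.disp;
	e.blank = t.total - t.disp;
	e.offset = t.syncstart - t.disp;
	e.pulse = t.syncend - t.syncstart;
	return e;
}
```

For horizontal X11 timings of 1024 1048 1184 1344, this gives XPIX 1024, XBLANK 320, XOFFSET 24 and XPULSE 136.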
--- a/Documentation/Kconfig
+++ b/Documentation/Kconfig
@ -0,0 +1,13 @@
 config WARN_MISSING_DOCUMENTS
 	bool "Warn if there's a missing documentation file"
 	depends on COMPILE_TEST
 	help
 	   It is not uncommon that a document gets renamed.
 	   This option makes the Kernel check for missing dependencies,
 	   warning when something is missing. Works only if the Kernel
 	   is built from a git tree.
 	   If unsure, select 'N'.
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@ -4,6 +4,11 @@
 subdir-y := devicetree/bindings/
 # Check for broken documentation file references
 ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
 $(shell $(srctree)/scripts/documentation-file-ref-check --warn)
 endif
 # You can set these variables from the command line.
 SPHINXBUILD   = sphinx-build
 SPHINXOPTS    =
@ -23,11 +28,13 @@ ifeq ($(HAVE_SPHINX),0)
 .DEFAULT:
 	$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
 	@echo
-	@./scripts/sphinx-pre-install
+	@$(srctree)/scripts/sphinx-pre-install
 	@echo "  SKIP    Sphinx $@ target."
 else # HAVE_SPHINX
 export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN')
 # User-friendly check for pdflatex and latexmk
 HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
 HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
@ -70,12 +77,14 @@ quiet_cmd_sphinx = SPHINX  $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
 	$(abspath $(BUILDDIR)/$3/$4)
 htmldocs:
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
 linkcheckdocs:
 	@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
 latexdocs:
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
 ifeq ($(HAVE_PDFLATEX),0)
@ -87,14 +96,17 @@ pdfdocs:
 else # HAVE_PDFLATEX
 pdfdocs: latexdocs
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
 endif # HAVE_PDFLATEX
 epubdocs:
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
 xmldocs:
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
 endif # HAVE_SPHINX
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@ -1,270 +0,0 @@
 		The MSI Driver Guide HOWTO
 	Tom L Nguyen tom.l.nguyen@intel.com
 			10/03/2003
 	Revised Feb 12, 2004 by Martine Silbermann
 		email: Martine.Silbermann@hp.com
 	Revised Jun 25, 2004 by Tom L Nguyen
 	Revised Jul  9, 2008 by Matthew Wilcox <willy@linux.intel.com>
 		Copyright 2003, 2008 Intel Corporation
 1. About this guide
 This guide describes the basics of Message Signaled Interrupts (MSIs),
 the advantages of using MSI over traditional interrupt mechanisms, how
 to change your driver to use MSI or MSI-X and some basic diagnostics to
 try if a device doesn't support MSIs.
 2. What are MSIs?
 A Message Signaled Interrupt is a write from the device to a special
 address which causes an interrupt to be received by the CPU.
 The MSI capability was first specified in PCI 2.2 and was later enhanced
 in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
 capability was also introduced with PCI 3.0.  It supports more interrupts
 per device than MSI and allows interrupts to be independently configured.
 Devices may support both MSI and MSI-X, but only one can be enabled at
 a time.
 3. Why use MSIs?
 There are three reasons why using MSIs can give an advantage over
 traditional pin-based interrupts.
 Pin-based PCI interrupts are often shared amongst several devices.
 To support this, the kernel must call each interrupt handler associated
 with an interrupt, which leads to reduced performance for the system as
 a whole.  MSIs are never shared, so this problem cannot arise.
 When a device writes data to memory, then raises a pin-based interrupt,
 it is possible that the interrupt may arrive before all the data has
 arrived in memory (this becomes more likely with devices behind PCI-PCI
 bridges).  In order to ensure that all the data has arrived in memory,
 the interrupt handler must read a register on the device which raised
 the interrupt.  PCI transaction ordering rules require that all the data
 arrive in memory before the value may be returned from the register.
 Using MSIs avoids this problem as the interrupt-generating write cannot
 pass the data writes, so by the time the interrupt is raised, the driver
 knows that all the data has arrived in memory.
 PCI devices can only support a single pin-based interrupt per function.
 Often drivers have to query the device to find out what event has
 occurred, slowing down interrupt handling for the common case.  With
 MSIs, a device can support more interrupts, allowing each interrupt
 to be specialised to a different purpose.  One possible design gives
 infrequent conditions (such as errors) their own interrupt which allows
 the driver to handle the normal interrupt handling path more efficiently.
 Other possible designs include giving one interrupt to each packet queue
 in a network card or each port in a storage controller.
 4. How to use MSIs
 PCI devices are initialised to use pin-based interrupts.  The device
 driver has to set up the device to use MSI or MSI-X.  Not all machines
 support MSIs correctly, and for those machines, the APIs described below
 will simply fail and the device will continue to use pin-based interrupts.
 4.1 Include kernel support for MSIs
 To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
 option enabled.  This option is only available on some architectures,
 and it may depend on some other options also being set.  For example,
 on x86, you must also enable X86_UP_APIC or SMP in order to see the
 CONFIG_PCI_MSI option.
 4.2 Using MSI
 Most of the hard work is done for the driver in the PCI layer.  The driver
 simply has to request that the PCI layer set up the MSI capability for this
 device.
 To automatically use MSI or MSI-X interrupt vectors, use the following
 function:
  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		unsigned int max_vecs, unsigned int flags);
 which allocates up to max_vecs interrupt vectors for a PCI device.  It
 returns the number of vectors allocated or a negative error.  If the device
 has a requirement for a minimum number of vectors the driver can pass a
 min_vecs argument set to this limit, and the PCI core will return -ENOSPC
 if it can't meet the minimum number of vectors.
 The flags argument is used to specify which type of interrupt can be used
 by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
 A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
 any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
 pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
 To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
 vectors, use the following function:
  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 Any allocated resources should be freed before removing the device using
 the following function:
  void pci_free_irq_vectors(struct pci_dev *dev);
 If a device supports both MSI-X and MSI capabilities, this API will use the
 MSI-X facilities in preference to the MSI facilities.  MSI-X supports any
 number of interrupts between 1 and 2048.  In contrast, MSI is restricted to
 a maximum of 32 interrupts (and must be a power of two).  In addition, the
 MSI interrupt vectors must be allocated consecutively, so the system might
 not be able to allocate as many vectors for MSI as it could for MSI-X.  On
 some platforms, MSI interrupts must all be targeted at the same set of CPUs
 whereas MSI-X interrupts can all be targeted at different CPUs.
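To illustrate the MSI limits stated above (at most 32 vectors, and a power of two), the following standalone sketch computes the largest count that satisfies both constraints for a given request. It is not kernel code and does not reproduce the PCI core's allocation logic, which also depends on what the device advertises and on the platform; the function name is hypothetical.

```c
/*
 * Largest vector count that is a power of two, no greater than 32,
 * and no greater than the requested count. Returns 0 for a request of 0.
 */
static unsigned int msi_usable_vectors(unsigned int requested)
{
	unsigned int n = 1;

	if (requested == 0)
		return 0;
	if (requested > 32)
		requested = 32;
	/* Double n while the next power of two still fits. */
	while (n * 2 <= requested)
		n *= 2;
	return n;
}
```

A driver does not need such a helper in practice, since pci_alloc_irq_vectors() returns the number of vectors actually allocated.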
 If a device supports neither MSI-X nor MSI it will fall back to a single
 legacy IRQ vector.
 The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
 as possible, likely up to the limit supported by the device.  If nvec is
 larger than the number supported by the device it will automatically be
 capped to the supported limit, so there is no need to query the number of
 vectors supported beforehand:
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
 	if (nvec < 0)
 		goto out_err;
 If a driver is unable or unwilling to deal with a variable number of MSI
 interrupts it can request a particular number of interrupts by passing that
 number to pci_alloc_irq_vectors() function as both 'min_vecs' and
 'max_vecs' parameters:
 	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
 		goto out_err;
 The most notorious example of the request type described above is enabling
 the single MSI mode for a device.  It could be done by passing two 1s as
 'min_vecs' and 'max_vecs':
 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
 		goto out_err;
 Some devices might not support using legacy line interrupts, in which case
 the driver can specify that only MSI or MSI-X is acceptable:
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
 	if (nvec < 0)
 		goto out_err;
 4.3 Legacy APIs
 The following old APIs to enable and disable MSI or MSI-X interrupts should
 not be used in new code:
  pci_enable_msi()		/* deprecated */
  pci_disable_msi()		/* deprecated */
  pci_enable_msix_range()	/* deprecated */
  pci_enable_msix_exact()	/* deprecated */
  pci_disable_msix()		/* deprecated */
 Additionally there are APIs to provide the number of supported MSI or MSI-X
 vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these
 should be avoided in favor of letting pci_alloc_irq_vectors() cap the
 number of vectors.  If you have a legitimate special use case for the count
 of vectors we might have to revisit that decision and add a
 pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
 4.4 Considerations when using MSIs
 4.4.1 Spinlocks
 Most device drivers have a per-device spinlock which is taken in the
 interrupt handler.  With pin-based interrupts or a single MSI, it is not
 necessary to disable interrupts (Linux guarantees the same interrupt will
 not be re-entered).  If a device uses multiple interrupts, the driver
 must disable interrupts while the lock is held.  If the device sends
 a different interrupt, the driver will deadlock trying to recursively
 acquire the spinlock.  Such deadlocks can be avoided by using
 spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
 and acquire the lock (see Documentation/kernel-hacking/locking.rst).
 4.5 How to tell whether MSI/MSI-X is enabled on a device
 Using 'lspci -v' (as root) may show some devices with "MSI", "Message
 Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
 has an 'Enable' flag which is followed with either "+" (enabled)
 or "-" (disabled).
 5. MSI quirks
 Several PCI chipsets or devices are known not to support MSIs.
 The PCI stack provides three ways to disable MSIs:
 1. globally
 2. on all devices behind a specific bridge
 3. on a single device
 5.1. Disabling MSIs globally
 Some host chipsets simply don't support MSIs properly.  If we're
 lucky, the manufacturer knows this and has indicated it in the ACPI
 FADT table.  In this case, Linux automatically disables MSIs.
 Some boards don't include this information in the table and so we have
 to detect them ourselves.  The complete list of these is found near the
 quirk_disable_all_msi() function in drivers/pci/quirks.c.
 If you have a board which has problems with MSIs, you can pass pci=nomsi
 on the kernel command line to disable MSIs on all devices.  It would be
 in your best interests to report the problem to linux-pci@vger.kernel.org
 including a full 'lspci -v' so we can add the quirks to the kernel.
 5.2. Disabling MSIs below a bridge
 Some PCI bridges are not able to route MSIs between busses properly.
 In this case, MSIs must be disabled on all devices behind the bridge.
 Some bridges allow you to enable MSIs by changing some bits in their
 PCI configuration space (especially the Hypertransport chipsets such
 as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
 Linux mostly knows about them and automatically enables MSIs if it can.
 If you have a bridge unknown to Linux, you can enable
 MSIs in configuration space using whatever method you know works, then
 enable MSIs on that bridge by doing:
       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
 where $bridge is the PCI address of the bridge you've enabled (eg
 0000:00:0e.0).
 To disable MSIs, echo 0 instead of 1.  Changing this value should be
 done with caution as it could break interrupt handling for all devices
 below this bridge.
 Again, please notify linux-pci@vger.kernel.org of any bridges that need
 special handling.
 5.3. Disabling MSIs on a single device
 Some devices are known to have faulty MSI implementations.  Usually this
 is handled in the individual device driver, but occasionally it's necessary
 to handle this with a quirk.  Some drivers have an option to disable use
 of MSI.  While this is a convenient workaround for the driver author,
 it is not good practice, and should not be emulated.
 5.4. Finding why MSIs are disabled on a device
 From the above three sections, you can see that there are many reasons
 why MSIs may not be enabled for a given device.  Your first step should
 be to examine your dmesg carefully to determine whether MSIs are enabled
 for your machine.  You should also check your .config to be sure you
 have enabled CONFIG_PCI_MSI.
 Then, 'lspci -t' gives the list of bridges above a device.  Reading
 /sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
 or disabled (0).  If 0 is found in any of the msi_bus files belonging
 to bridges between the PCI root and the device, MSIs are disabled.
 It is also worth checking the device driver to see whether it supports MSIs.
 For example, it may contain calls to pci_alloc_irq_vectors() with the
 PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
--- a/Documentation/PCI/PCIEBUS-HOWTO.txt
+++ b/Documentation/PCI/PCIEBUS-HOWTO.txt
@ -1,198 +0,0 @@
 		The PCI Express Port Bus Driver Guide HOWTO
 	Tom L Nguyen tom.l.nguyen@intel.com
 			11/03/2004
 1. About this guide
 This guide describes the basics of the PCI Express Port Bus driver
 and provides information on how to enable the service drivers to
 register/unregister with the PCI Express Port Bus Driver.
 2. Copyright 2004 Intel Corporation
 3. What is the PCI Express Port Bus Driver
 A PCI Express Port is a logical PCI-PCI Bridge structure. There
 are two types of PCI Express Port: the Root Port and the Switch
 Port. The Root Port originates a PCI Express link from a PCI Express
 Root Complex and the Switch Port connects PCI Express links to
 internal logical PCI buses. The Switch Port, which has its secondary
 bus representing the switch's internal routing logic, is called the
 switch's Upstream Port. The switch's Downstream Port is bridging from
 switch's internal routing bus to a bus representing the downstream
 PCI Express link from the PCI Express Switch.
 A PCI Express Port can provide up to four distinct functions,
 referred to in this document as services, depending on its port type.
 PCI Express Port's services include native hotplug support (HP),
 power management event support (PME), advanced error reporting
 support (AER), and virtual channel support (VC). These services may
 be handled by a single complex driver or be individually distributed
 and handled by corresponding service drivers.
 4. Why use the PCI Express Port Bus Driver?
 In existing Linux kernels, the Linux Device Driver Model allows a
 physical device to be handled by only a single driver. The PCI
 Express Port is a PCI-PCI Bridge device with multiple distinct
 services. To maintain a clean and simple solution each service
 may have its own software service driver. In this case several
 service drivers will compete for a single PCI-PCI Bridge device.
 For example, if the PCI Express Root Port native hotplug service
 driver is loaded first, it claims a PCI-PCI Bridge Root Port. The
 kernel therefore does not load other service drivers for that Root
 Port. In other words, it is impossible to have multiple service
 drivers load and run on a PCI-PCI Bridge device simultaneously
 using the current driver model.
 Running multiple service drivers simultaneously requires
 a PCI Express Port Bus driver, which manages all populated
 PCI Express Ports and distributes all provided service requests
 to the corresponding service drivers as required. Some key
 advantages of using the PCI Express Port Bus driver are listed below:
 	- Allow multiple service drivers to run simultaneously on
 	  a PCI-PCI Bridge Port device.
 	- Allow service drivers to be implemented in an independent,
 	  staged approach.
 	- Allow one service driver to run on multiple PCI-PCI Bridge
 	  Port devices.
 	- Manage and distribute resources of a PCI-PCI Bridge Port
 	  device to requested service drivers.
 5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
 5.1 Including the PCI Express Port Bus Driver Support into the Kernel
 Including the PCI Express Port Bus driver depends on whether the PCI
 Express support is included in the kernel config. The kernel will
 automatically include the PCI Express Port Bus driver as a kernel
 driver when the PCI Express support is enabled in the kernel.
 5.2 Enabling Service Driver Support
 PCI device drivers are implemented based on Linux Device Driver Model.
 All service drivers are PCI device drivers. As discussed above, it is
 impossible to load any service driver once the kernel has loaded the
 PCI Express Port Bus Driver. Meeting the PCI Express Port Bus Driver
 Model requires some minimal changes to existing service drivers that
 have no impact on the functionality of existing service drivers.
 A service driver is required to use the two APIs shown below to
 register its service with the PCI Express Port Bus driver (see
 section 5.2.1 & 5.2.2). It is important that a service driver
 initializes the pcie_port_service_driver data structure, included in
 header file /include/linux/pcieport_if.h, before calling these APIs.
 Failure to do so will result in an identity mismatch, which prevents
 the PCI Express Port Bus driver from loading a service driver.
 5.2.1 pcie_port_service_register
 int pcie_port_service_register(struct pcie_port_service_driver *new)
 This API replaces the Linux Driver Model's pci_register_driver API. A
 service driver should always call pcie_port_service_register at
 module init. Note that after the service driver is loaded, calls
 such as pci_enable_device(dev) and pci_set_master(dev) are no longer
 necessary since these calls are executed by the PCI Port Bus driver.
 5.2.2 pcie_port_service_unregister
 void pcie_port_service_unregister(struct pcie_port_service_driver *new)
 pcie_port_service_unregister replaces the Linux Driver Model's
 pci_unregister_driver. It is always called by the service driver when
 the module exits.
 5.2.3 Sample Code
 Below is sample service driver code to initialize the port service
 driver data structure.
 static struct pcie_port_service_id service_id[] = { {
 	.vendor = PCI_ANY_ID,
 	.device = PCI_ANY_ID,
 	.port_type = PCIE_RC_PORT,
 	.service_type = PCIE_PORT_SERVICE_AER,
 	}, { /* end: all zeroes */ }
 };
 static struct pcie_port_service_driver root_aerdrv = {
 	.name		= (char *)device_name,
 	.id_table	= &service_id[0],
 	.probe		= aerdrv_load,
 	.remove		= aerdrv_unload,
 	.suspend	= aerdrv_suspend,
 	.resume		= aerdrv_resume,
 };
 Below is sample code for registering/unregistering a service
 driver.
 static int __init aerdrv_service_init(void)
 {
 	int retval = 0;
 	retval = pcie_port_service_register(&root_aerdrv);
 	if (!retval) {
 		/*
 		 * FIX ME
 		 */
 	}
 	return retval;
 }
 static void __exit aerdrv_service_exit(void)
 {
 	pcie_port_service_unregister(&root_aerdrv);
 }
 module_init(aerdrv_service_init);
 module_exit(aerdrv_service_exit);
 6. Possible Resource Conflicts
 Since all service drivers of a PCI-PCI Bridge Port device are
 allowed to run simultaneously, a few possible resource conflicts
 are listed below with proposed solutions.
 6.1 MSI and MSI-X Vector Resource
 Once MSI or MSI-X interrupts are enabled on a device, it stays in this
 mode until they are disabled again.  Since service drivers of the same
 PCI-PCI Bridge port share the same physical device, if an individual
 service driver enables or disables MSI/MSI-X mode it may result in
 unpredictable behavior.
 To avoid this situation, service drivers are not permitted to
 switch the interrupt mode on their device. The PCI Express Port Bus driver
 is responsible for determining the interrupt mode and this should be
 transparent to service drivers. Service drivers need to know only
 the vector IRQ assigned to the field irq of struct pcie_device, which
 is passed in when the PCI Express Port Bus driver probes each service
 driver. Service drivers should use (struct pcie_device*)dev->irq to
 call request_irq/free_irq. In addition, the interrupt mode is stored
 in the field interrupt_mode of struct pcie_device.
 6.3 PCI Memory/IO Mapped Regions
 Service drivers for PCI Express Power Management (PME), Advanced
 Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
 PCI configuration space on the PCI Express port. In all cases the
 registers accessed are independent of each other. This patch assumes
 that all service drivers will be well behaved and not overwrite
 other service driver's configuration settings.
 6.4 PCI Config Registers
 Each service driver runs its PCI config operations on its own
 capability structure except the PCI Express capability structure, in
 which Root Control register and Device Control register are shared
 between PME and AER. This patch assumes that all service drivers
 will be well behaved and not overwrite other service driver's
 configuration settings.
--- a/Documentation/PCI/acpi-info.rst
+++ b/Documentation/PCI/acpi-info.rst
@ -0,0 +1,192 @@
 .. SPDX-License-Identifier: GPL-2.0
 ========================================
 ACPI considerations for PCI host bridges
 ========================================
 The general rule is that the ACPI namespace should describe everything the
 OS might use unless there's another way for the OS to find it [1, 2].
 For example, there's no standard hardware mechanism for enumerating PCI
 host bridges, so the ACPI namespace must describe each host bridge, the
 method for accessing PCI config space below it, the address space windows
 the host bridge forwards to PCI (using _CRS), and the routing of legacy
 INTx interrupts (using _PRT).
 PCI devices, which are below the host bridge, generally do not need to be
 described via ACPI.  The OS can discover them via the standard PCI
 enumeration mechanism, using config accesses to discover and identify
 devices and read and size their BARs.  However, ACPI may describe PCI
 devices if it provides power management or hotplug functionality for them
 or if the device has INTx interrupts connected by platform interrupt
 controllers and a _PRT is needed to describe those connections.
 ACPI resource description is done via _CRS objects of devices in the ACPI
 namespace [2].   The _CRS is like a generalized PCI BAR: the OS can read
 _CRS and figure out what resource is being consumed even if it doesn't have
 a driver for the device [3].  That's important because it means an old OS
 can work correctly even on a system with new devices unknown to the OS.
 The new devices might not do anything, but the OS can at least make sure no
 resources conflict with them.
 Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
 reserving address space.  The static tables are for things the OS needs to
 know early in boot, before it can parse the ACPI namespace.  If a new table
 is defined, an old OS needs to operate correctly even though it ignores the
 table.  _CRS allows that because it is generic and understood by the old
 OS; a static table does not.
 If the OS is expected to manage a non-discoverable device described via
 ACPI, that device will have a specific _HID/_CID that tells the OS what
 driver to bind to it, and the _CRS tells the OS and the driver where the
 device's registers are.
 PCI host bridges are PNP0A03 or PNP0A08 devices.  Their _CRS should
 describe all the address space they consume.  This includes all the windows
 they forward down to the PCI bus, as well as registers of the host bridge
 itself that are not forwarded to PCI.  The host bridge registers include
 things like secondary/subordinate bus registers that determine the bus
 range below the bridge, window registers that describe the apertures, etc.
 These are all device-specific, non-architected things, so the only way a
 PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain
 the device-specific details.  The host bridge registers also include ECAM
 space, since it is consumed by the host bridge.
 ACPI defines a Consumer/Producer bit to distinguish the bridge registers
 ("Consumer") from the bridge apertures ("Producer") [4, 5], but early
 BIOSes didn't use that bit correctly.  The result is that the current ACPI
 spec defines Consumer/Producer only for the Extended Address Space
 descriptors; the bit should be ignored in the older QWord/DWord/Word
 Address Space descriptors.  Consequently, OSes have to assume all
 QWord/DWord/Word descriptors are windows.
 Prior to the addition of Extended Address Space descriptors, the failure of
 Consumer/Producer meant there was no way to describe bridge registers in
 the PNP0A03/PNP0A08 device itself.  The workaround was to describe the
 bridge registers (including ECAM space) in PNP0C02 catch-all devices [6].
 With the exception of ECAM, the bridge register space is device-specific
 anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to
 know about it.  
 New architectures should be able to use "Consumer" Extended Address Space
 descriptors in the PNP0A03 device for bridge registers, including ECAM,
 although a strict interpretation of [6] might prohibit this.  Old x86 and
 ia64 kernels assume all address space descriptors, including "Consumer"
 Extended Address Space ones, are windows, so it would not be safe to
 describe bridge registers this way on those architectures.
 PNP0C02 "motherboard" devices are basically a catch-all.  There's no
 programming model for them other than "don't use these resources for
 anything else."  So a PNP0C02 _CRS should claim any address space that is
 (1) not claimed by _CRS under any other device object in the ACPI namespace
 and (2) should not be assigned by the OS to something else.
 The PCIe spec requires the Enhanced Configuration Access Method (ECAM)
 unless there's a standard firmware interface for config access, e.g., the
 ia64 SAL interface [7].  A host bridge consumes ECAM memory address space
 and converts memory accesses into PCI configuration accesses.  The spec
 defines the ECAM address space layout and functionality; only the base of
 the address space is device-specific.  An ACPI OS learns the base address
 from either the static MCFG table or a _CBA method in the PNP0A03 device.
 The MCFG table must describe the ECAM space of non-hot pluggable host
 bridges [8].  Since MCFG is a static table and can't be updated by hotplug,
 a _CBA method in the PNP0A03 device describes the ECAM space of a
 hot-pluggable host bridge [9].  Note that for both MCFG and _CBA, the base
 address always corresponds to bus 0, even if the bus range below the bridge
 (which is reported via _CRS) doesn't start at 0.
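The ECAM layout described above reduces to a small address calculation. The following is a minimal sketch, assuming the standard ECAM encoding (bus number in address bits 27:20, device in 19:15, function in 14:12, register offset in 11:0). The helper name ecam_addr and the example base address are hypothetical and are not taken from any firmware table.

```shell
# Compute the memory address of a config register in ECAM space.
# Hypothetical helper; arguments: base bus dev fn offset.
# The base is the address reported by MCFG or _CBA, which corresponds to
# bus 0 even if the bridge's bus range starts at a higher bus number.
ecam_addr() {
	printf '0x%x\n' $(( $1 + (($2 << 20) | ($3 << 15) | ($4 << 12) | $5) ))
}

# Bus 1, device 2, function 3, register 0x10, with an assumed base.
ecam_addr 0xe0000000 1 2 3 0x10
```

With the assumed base of 0xe0000000 this prints 0xe0113010. Because the base always corresponds to bus 0, the bus number is used as-is and is not made relative to the start of the bridge's bus range.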
 [1] ACPI 6.2, sec 6.1:
    For any device that is on a non-enumerable type of bus (for example, an
    ISA bus), OSPM enumerates the devices' identifier(s) and the ACPI
    system firmware must supply an _HID object ... for each device to
    enable OSPM to do that.
 [2] ACPI 6.2, sec 3.7:
    The OS enumerates motherboard devices simply by reading through the
    ACPI Namespace looking for devices with hardware IDs.
    Each device enumerated by ACPI includes ACPI-defined objects in the
    ACPI Namespace that report the hardware resources the device could
    occupy [_PRS], an object that reports the resources that are currently
    used by the device [_CRS], and objects for configuring those resources
    [_SRS].  The information is used by the Plug and Play OS (OSPM) to
    configure the devices.
 [3] ACPI 6.2, sec 6.2:
    OSPM uses device configuration objects to configure hardware resources
    for devices enumerated via ACPI.  Device configuration objects provide
    information about current and possible resource requirements, the
    relationship between shared resources, and methods for configuring
    hardware resources.
    When OSPM enumerates a device, it calls _PRS to determine the resource
    requirements of the device.  It may also call _CRS to find the current
    resource settings for the device.  Using this information, the Plug and
    Play system determines what resources the device should consume and
    sets those resources by calling the device’s _SRS control method.
    In ACPI, devices can consume resources (for example, legacy keyboards),
    provide resources (for example, a proprietary PCI bridge), or do both.
    Unless otherwise specified, resources for a device are assumed to be
    taken from the nearest matching resource above the device in the device
    hierarchy.
 [4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
    QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
      General Flags: Bit [0] Ignored
    Extended Address Space Descriptor (.4)
      General Flags: Bit [0] Consumer/Producer:
        * 1 – This device consumes this resource
        * 0 – This device produces and consumes this resource
 [5] ACPI 6.2, sec 19.6.43:
    ResourceUsage specifies whether the Memory range is consumed by
    this device (ResourceConsumer) or passed on to child devices
    (ResourceProducer).  If nothing is specified, then
    ResourceConsumer is assumed.
 [6] PCI Firmware 3.2, sec 4.1.2:
    If the operating system does not natively comprehend reserving the
    MMCFG region, the MMCFG region must be reserved by firmware.  The
    address range reported in the MCFG table or by _CBA method (see Section
    4.1.3) must be reserved by declaring a motherboard resource.  For most
    systems, the motherboard resource would appear at the root of the ACPI
    namespace (under \_SB) in a node with a _HID of EISAID (PNP0C02), and
    the resources in this case should not be claimed in the root PCI bus’s
    _CRS.  The resources can optionally be returned in Int15 E820 or
    EFIGetMemoryMap as reserved memory but must always be reported through
    ACPI as a motherboard resource.
 [7] PCI Express 4.0, sec 7.2.2:
    For systems that are PC-compatible, or that do not implement a
    processor-architecture-specific firmware interface standard that allows
    access to the Configuration Space, the ECAM is required as defined in
    this section.
 [8] PCI Firmware 3.2, sec 4.1.2:
    The MCFG table is an ACPI table that is used to communicate the base
    addresses corresponding to the non-hot removable PCI Segment Groups
    range within a PCI Segment Group available to the operating system at
    boot. This is required for the PC-compatible systems.
    The MCFG table is only used to communicate the base addresses
    corresponding to the PCI Segment Groups available to the system at
    boot.
 [9] PCI Firmware 3.2, sec 4.1.3:
    The _CBA (Memory mapped Configuration Base Address) control method is
    an optional ACPI object that returns the 64-bit memory mapped
    configuration base address for the hot plug capable host bridge. The
    base address returned by _CBA is processor-relative address. The _CBA
    control method evaluates to an Integer.
    This control method appears under a host bridge object. When the _CBA
    method appears under an active host bridge object, the operating system
    evaluates this structure to identify the memory mapped configuration
    base address corresponding to the PCI Segment Group for the bus number
    range specified in _CRS method. An ACPI name space object that contains
    the _CBA method must also contain a corresponding _SEG method.
--- a/Documentation/PCI/acpi-info.txt
+++ b/Documentation/PCI/acpi-info.txt
@ -1,187 +0,0 @@
 		ACPI considerations for PCI host bridges
 The general rule is that the ACPI namespace should describe everything the
 OS might use unless there's another way for the OS to find it [1, 2].
 For example, there's no standard hardware mechanism for enumerating PCI
 host bridges, so the ACPI namespace must describe each host bridge, the
 method for accessing PCI config space below it, the address space windows
 the host bridge forwards to PCI (using _CRS), and the routing of legacy
 INTx interrupts (using _PRT).
 PCI devices, which are below the host bridge, generally do not need to be
 described via ACPI.  The OS can discover them via the standard PCI
 enumeration mechanism, using config accesses to discover and identify
 devices and read and size their BARs.  However, ACPI may describe PCI
 devices if it provides power management or hotplug functionality for them
 or if the device has INTx interrupts connected by platform interrupt
 controllers and a _PRT is needed to describe those connections.
 ACPI resource description is done via _CRS objects of devices in the ACPI
 namespace [2].   The _CRS is like a generalized PCI BAR: the OS can read
 _CRS and figure out what resource is being consumed even if it doesn't have
 a driver for the device [3].  That's important because it means an old OS
 can work correctly even on a system with new devices unknown to the OS.
 The new devices might not do anything, but the OS can at least make sure no
 resources conflict with them.
 Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
 reserving address space.  The static tables are for things the OS needs to
 know early in boot, before it can parse the ACPI namespace.  If a new table
 is defined, an old OS needs to operate correctly even though it ignores the
 table.  _CRS allows that because it is generic and understood by the old
 OS; a static table does not.
 If the OS is expected to manage a non-discoverable device described via
 ACPI, that device will have a specific _HID/_CID that tells the OS what
 driver to bind to it, and the _CRS tells the OS and the driver where the
 device's registers are.
 PCI host bridges are PNP0A03 or PNP0A08 devices.  Their _CRS should
 describe all the address space they consume.  This includes all the windows
 they forward down to the PCI bus, as well as registers of the host bridge
 itself that are not forwarded to PCI.  The host bridge registers include
 things like secondary/subordinate bus registers that determine the bus
 range below the bridge, window registers that describe the apertures, etc.
 These are all device-specific, non-architected things, so the only way a
 PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain
 the device-specific details.  The host bridge registers also include ECAM
 space, since it is consumed by the host bridge.
 ACPI defines a Consumer/Producer bit to distinguish the bridge registers
 ("Consumer") from the bridge apertures ("Producer") [4, 5], but early
 BIOSes didn't use that bit correctly.  The result is that the current ACPI
 spec defines Consumer/Producer only for the Extended Address Space
 descriptors; the bit should be ignored in the older QWord/DWord/Word
 Address Space descriptors.  Consequently, OSes have to assume all
 QWord/DWord/Word descriptors are windows.
 Prior to the addition of Extended Address Space descriptors, the failure of
 Consumer/Producer meant there was no way to describe bridge registers in
 the PNP0A03/PNP0A08 device itself.  The workaround was to describe the
 bridge registers (including ECAM space) in PNP0C02 catch-all devices [6].
 With the exception of ECAM, the bridge register space is device-specific
 anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to
 know about it.  
 New architectures should be able to use "Consumer" Extended Address Space
 descriptors in the PNP0A03 device for bridge registers, including ECAM,
 although a strict interpretation of [6] might prohibit this.  Old x86 and
 ia64 kernels assume all address space descriptors, including "Consumer"
 Extended Address Space ones, are windows, so it would not be safe to
 describe bridge registers this way on those architectures.
 PNP0C02 "motherboard" devices are basically a catch-all.  There's no
 programming model for them other than "don't use these resources for
 anything else."  So a PNP0C02 _CRS should claim any address space that is
 (1) not claimed by _CRS under any other device object in the ACPI namespace
 and (2) should not be assigned by the OS to something else.
 The PCIe spec requires the Enhanced Configuration Access Method (ECAM)
 unless there's a standard firmware interface for config access, e.g., the
 ia64 SAL interface [7].  A host bridge consumes ECAM memory address space
 and converts memory accesses into PCI configuration accesses.  The spec
 defines the ECAM address space layout and functionality; only the base of
 the address space is device-specific.  An ACPI OS learns the base address
 from either the static MCFG table or a _CBA method in the PNP0A03 device.
 The MCFG table must describe the ECAM space of non-hot pluggable host
 bridges [8].  Since MCFG is a static table and can't be updated by hotplug,
 a _CBA method in the PNP0A03 device describes the ECAM space of a
 hot-pluggable host bridge [9].  Note that for both MCFG and _CBA, the base
 address always corresponds to bus 0, even if the bus range below the bridge
 (which is reported via _CRS) doesn't start at 0.
 [1] ACPI 6.2, sec 6.1:
    For any device that is on a non-enumerable type of bus (for example, an
    ISA bus), OSPM enumerates the devices' identifier(s) and the ACPI
    system firmware must supply an _HID object ... for each device to
    enable OSPM to do that.
 [2] ACPI 6.2, sec 3.7:
    The OS enumerates motherboard devices simply by reading through the
    ACPI Namespace looking for devices with hardware IDs.
    Each device enumerated by ACPI includes ACPI-defined objects in the
    ACPI Namespace that report the hardware resources the device could
    occupy [_PRS], an object that reports the resources that are currently
    used by the device [_CRS], and objects for configuring those resources
    [_SRS].  The information is used by the Plug and Play OS (OSPM) to
    configure the devices.
 [3] ACPI 6.2, sec 6.2:
    OSPM uses device configuration objects to configure hardware resources
    for devices enumerated via ACPI.  Device configuration objects provide
    information about current and possible resource requirements, the
    relationship between shared resources, and methods for configuring
    hardware resources.
    When OSPM enumerates a device, it calls _PRS to determine the resource
    requirements of the device.  It may also call _CRS to find the current
    resource settings for the device.  Using this information, the Plug and
    Play system determines what resources the device should consume and
    sets those resources by calling the device’s _SRS control method.
    In ACPI, devices can consume resources (for example, legacy keyboards),
    provide resources (for example, a proprietary PCI bridge), or do both.
    Unless otherwise specified, resources for a device are assumed to be
    taken from the nearest matching resource above the device in the device
    hierarchy.
 [4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
    QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
    General Flags: Bit [0] Ignored
    Extended Address Space Descriptor (.4)
    General Flags: Bit [0] Consumer/Producer:
 	1–This device consumes this resource
 	0–This device produces and consumes this resource
 [5] ACPI 6.2, sec 19.6.43:
    ResourceUsage specifies whether the Memory range is consumed by
    this device (ResourceConsumer) or passed on to child devices
    (ResourceProducer).  If nothing is specified, then
    ResourceConsumer is assumed.
 [6] PCI Firmware 3.2, sec 4.1.2:
    If the operating system does not natively comprehend reserving the
    MMCFG region, the MMCFG region must be reserved by firmware.  The
    address range reported in the MCFG table or by _CBA method (see Section
    4.1.3) must be reserved by declaring a motherboard resource.  For most
    systems, the motherboard resource would appear at the root of the ACPI
    namespace (under \_SB) in a node with a _HID of EISAID (PNP0C02), and
    the resources in this case should not be claimed in the root PCI bus’s
    _CRS.  The resources can optionally be returned in Int15 E820 or
    EFIGetMemoryMap as reserved memory but must always be reported through
    ACPI as a motherboard resource.
 [7] PCI Express 4.0, sec 7.2.2:
    For systems that are PC-compatible, or that do not implement a
    processor-architecture-specific firmware interface standard that allows
    access to the Configuration Space, the ECAM is required as defined in
    this section.
 [8] PCI Firmware 3.2, sec 4.1.2:
    The MCFG table is an ACPI table that is used to communicate the base
    addresses corresponding to the non-hot removable PCI Segment Groups
    range within a PCI Segment Group available to the operating system at
    boot. This is required for the PC-compatible systems.
    The MCFG table is only used to communicate the base addresses
    corresponding to the PCI Segment Groups available to the system at
    boot.
 [9] PCI Firmware 3.2, sec 4.1.3:
    The _CBA (Memory mapped Configuration Base Address) control method is
    an optional ACPI object that returns the 64-bit memory mapped
    configuration base address for the hot plug capable host bridge. The
    base address returned by _CBA is processor-relative address. The _CBA
    control method evaluates to an Integer.
    This control method appears under a host bridge object. When the _CBA
    method appears under an active host bridge object, the operating system
    evaluates this structure to identify the memory mapped configuration
    base address corresponding to the PCI Segment Group for the bus number
    range specified in _CRS method. An ACPI name space object that contains
    the _CBA method must also contain a corresponding _SEG method.
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@ -0,0 +1,13 @@
 .. SPDX-License-Identifier: GPL-2.0
 ======================
 PCI Endpoint Framework
 ======================
 .. toctree::
   :maxdepth: 2
   pci-endpoint
   pci-endpoint-cfs
   pci-test-function
   pci-test-howto
--- a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
@ -0,0 +1,118 @@
 .. SPDX-License-Identifier: GPL-2.0
 =======================================
 Configuring PCI Endpoint Using CONFIGFS
 =======================================
 :Author: Kishon Vijay Abraham I <kishon@ti.com>
 The PCI Endpoint Core exposes a configfs entry (pci_ep) to configure the
 PCI endpoint function and to bind the endpoint function
 with the endpoint controller. (To introduce other mechanisms for
 configuring the PCI Endpoint Function, refer to [1].)
 Mounting configfs
 =================
 The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
 directory. configfs can be mounted using the following command::
 	mount -t configfs none /sys/kernel/config
 Directory Structure
 ===================
 The pci_ep configfs has two directories at its root: controllers and
 functions. Every EPC device present in the system will have an entry in
 the *controllers* directory and every EPF driver present in the system
 will have an entry in the *functions* directory.
 ::
 	/sys/kernel/config/pci_ep/
 		.. controllers/
 		.. functions/
 Creating EPF Device
 ===================
 Every registered EPF driver will be listed in the functions directory. The
 entries corresponding to EPF driver will be created by the EPF core.
 ::
 	/sys/kernel/config/pci_ep/functions/
 		.. <EPF Driver1>/
 			... <EPF Device 11>/
 			... <EPF Device 21>/
 		.. <EPF Driver2>/
 			... <EPF Device 12>/
 			... <EPF Device 22>/
 In order to create a <EPF device> of the type probed by <EPF Driver>, the
 user has to create a directory inside <EPF DriverN>.
 Every <EPF device> directory consists of the following entries that can be
 used to configure the standard configuration header of the endpoint function.
 (These entries are created by the framework when any new <EPF Device> is
 created)
 ::
 		.. <EPF Driver1>/
 			... <EPF Device 11>/
 				... vendorid
 				... deviceid
 				... revid
 				... progif_code
 				... subclass_code
 				... baseclass_code
 				... cache_line_size
 				... subsys_vendor_id
 				... subsys_id
 				... interrupt_pin
 EPC Device
 ==========
 Every registered EPC device will be listed in controllers directory. The
 entries corresponding to EPC device will be created by the EPC core.
 ::
 	/sys/kernel/config/pci_ep/controllers/
 		.. <EPC Device1>/
 			... <Symlink EPF Device11>/
 			... <Symlink EPF Device12>/
 			... start
 		.. <EPC Device2>/
 			... <Symlink EPF Device21>/
 			... <Symlink EPF Device22>/
 			... start
 The <EPC Device> directory will have a list of symbolic links to
 <EPF Device>. These symbolic links should be created by the user to
 represent the functions present in the endpoint device.
 The <EPC Device> directory will also have a *start* field. Once
 "1" is written to this field, the endpoint device will be ready to
 establish the link with the host. This is usually done after
 all the EPF devices are created and linked with the EPC device.
 ::
 			 | controllers/
 				| <Directory: EPC name>/
 					| <Symbolic Link: Function>
 					| start
 			 | functions/
 				| <Directory: EPF driver>/
 					| <Directory: EPF device>/
 						| vendorid
 						| deviceid
 						| revid
 						| progif_code
 						| subclass_code
 						| baseclass_code
 						| cache_line_size
 						| subsys_vendor_id
 						| subsys_id
 						| interrupt_pin
 						| function
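The sequence described above (create an EPF device, configure its header, link it to an EPC device, then write to start) can be sketched as a shell helper. This is a minimal sketch under stated assumptions: the function name epf_setup is hypothetical, the vendorid and deviceid values are example values only, and the driver and controller names in the commented call are placeholders that depend on the platform.

```shell
# Hypothetical helper: create an EPF device, set two header fields,
# link it to an EPC device, and start the link.
# Arguments: <pci_ep root> <EPF driver> <EPF device> <EPC device>
epf_setup() {
	root=$1 drv=$2 dev=$3 epc=$4

	# Creating a directory inside <EPF Driver> creates the EPF device.
	mkdir "$root/functions/$drv/$dev" || return 1

	# Example header values only; use the IDs your function needs.
	echo 0x104c > "$root/functions/$drv/$dev/vendorid"
	echo 0xb500 > "$root/functions/$drv/$dev/deviceid"

	# The symlink binds the EPF device to the EPC device.
	ln -s "$root/functions/$drv/$dev" "$root/controllers/$epc/" || return 1

	# Writing 1 to start makes the endpoint ready to establish the link.
	echo 1 > "$root/controllers/$epc/start"
}

# Example call (needs root and a real EPC device; names are placeholders):
# epf_setup /sys/kernel/config/pci_ep pci_epf_test func1 51000000.pcie_ep
```

Taking the pci_ep root as an argument keeps the helper independent of where configfs is mounted. The write to start comes last, matching the note that this is usually done after all EPF devices are created and linked.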
 [1] :doc:`pci-endpoint`
--- a/Documentation/PCI/endpoint/pci-endpoint-cfs.txt
+++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.txt
@ -1,105 +0,0 @@
                   CONFIGURING PCI ENDPOINT USING CONFIGFS
                    Kishon Vijay Abraham I <kishon@ti.com>
 The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
 PCI endpoint function and to bind the endpoint function
 with the endpoint controller. (For introducing other mechanisms to
 configure the PCI Endpoint Function refer to [1]).
 *) Mounting configfs
 The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
 directory. configfs can be mounted using the following command.
 	mount -t configfs none /sys/kernel/config
 *) Directory Structure
 The pci_ep configfs has two directories at its root: controllers and
 functions. Every EPC device present in the system will have an entry in
 the *controllers* directory and every EPF driver present in the system
 will have an entry in the *functions* directory.
 /sys/kernel/config/pci_ep/
 	.. controllers/
 	.. functions/
 *) Creating EPF Device
 Every registered EPF driver will be listed in the functions directory. The
 entries corresponding to EPF driver will be created by the EPF core.
 /sys/kernel/config/pci_ep/functions/
 	.. <EPF Driver1>/
 		... <EPF Device 11>/
 		... <EPF Device 21>/
 	.. <EPF Driver2>/
 		... <EPF Device 12>/
 		... <EPF Device 22>/
 In order to create a <EPF device> of the type probed by <EPF Driver>, the
 user has to create a directory inside <EPF DriverN>.
 Every <EPF device> directory consists of the following entries that can be
 used to configure the standard configuration header of the endpoint function.
 (These entries are created by the framework when any new <EPF Device> is
 created)
 	.. <EPF Driver1>/
 		... <EPF Device 11>/
 			... vendorid
 			... deviceid
 			... revid
 			... progif_code
 			... subclass_code
 			... baseclass_code
 			... cache_line_size
 			... subsys_vendor_id
 			... subsys_id
 			... interrupt_pin
 *) EPC Device
 Every registered EPC device will be listed in controllers directory. The
 entries corresponding to EPC device will be created by the EPC core.
 /sys/kernel/config/pci_ep/controllers/
 	.. <EPC Device1>/
 		... <Symlink EPF Device11>/
 		... <Symlink EPF Device12>/
 		... start
 	.. <EPC Device2>/
 		... <Symlink EPF Device21>/
 		... <Symlink EPF Device22>/
 		... start
 The <EPC Device> directory will have a list of symbolic links to
 <EPF Device>. These symbolic links should be created by the user to
 represent the functions present in the endpoint device.
 The <EPC Device> directory will also have a *start* field. Once
 "1" is written to this field, the endpoint device will be ready to
 establish the link with the host. This is usually done after
 all the EPF devices are created and linked with the EPC device.
 			 | controllers/
 				| <Directory: EPC name>/
 					| <Symbolic Link: Function>
 					| start
 			 | functions/
 				| <Directory: EPF driver>/
 					| <Directory: EPF device>/
 						| vendorid
 						| deviceid
 						| revid
 						| progif_code
 						| subclass_code
 						| baseclass_code
 						| cache_line_size
 						| subsys_vendor_id
 						| subsys_id
 						| interrupt_pin
 						| function
 [1] -> Documentation/PCI/endpoint/pci-endpoint.txt
--- a/Documentation/PCI/endpoint/pci-endpoint.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint.rst
@ -0,0 +1,231 @@
 .. SPDX-License-Identifier: GPL-2.0
 :Author: Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to using the PCI Endpoint Framework to create an
 endpoint controller driver and an endpoint function driver, and to using the
 configfs interface to bind the function driver to the controller driver.
 Introduction
 ============
 Linux has a comprehensive PCI subsystem to support PCI controllers that
 operate in Root Complex mode. The subsystem can scan the PCI bus,
 assign memory resources and IRQ resources, load PCI drivers (based on
 vendor ID and device ID), and support other services like hot-plug, power
 management, advanced error reporting and virtual channels.
 However, the PCI controller IP integrated in some SoCs is capable of operating
 either in Root Complex mode or Endpoint mode. The PCI Endpoint Framework
 adds endpoint mode support to Linux. This makes it possible to run Linux in an
 EP system, which has a wide variety of use cases such as testing,
 validation, and co-processor acceleration.
 PCI Endpoint Core
 =================
 The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
 library, the Endpoint Function library, and the configfs layer to bind the
 endpoint function with the endpoint controller.
 PCI Endpoint Controller(EPC) Library
 ------------------------------------
 The EPC library provides APIs to be used by the controller that can operate
 in endpoint mode. It also provides APIs to be used by function driver/library
 in order to implement a particular endpoint function.
 APIs for the PCI controller Driver
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI controller driver.
 * devm_pci_epc_create()/pci_epc_create()
   The PCI controller driver should implement the following ops:
 	 * write_header: ops to populate configuration space header
 	 * set_bar: ops to configure the BAR
 	 * clear_bar: ops to reset the BAR
 	 * alloc_addr_space: ops to allocate in PCI controller address space
 	 * free_addr_space: ops to free the allocated address space
 	 * raise_irq: ops to raise a legacy, MSI or MSI-X interrupt
 	 * start: ops to start the PCI link
 	 * stop: ops to stop the PCI link
   The PCI controller driver can then create a new EPC device by invoking
   devm_pci_epc_create()/pci_epc_create().
 * devm_pci_epc_destroy()/pci_epc_destroy()
   The PCI controller driver can destroy the EPC device created by either
   devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
   pci_epc_destroy().
 * pci_epc_linkup()
   In order to notify all the function devices that the EPC device to which
   they are linked has established a link with the host, the PCI controller
   driver should invoke pci_epc_linkup().
 * pci_epc_mem_init()
   Initialize the pci_epc_mem structure used for allocating EPC addr space.
 * pci_epc_mem_exit()
   Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
 APIs for the PCI Endpoint Function Driver
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 * pci_epc_write_header()
   The PCI endpoint function driver should use pci_epc_write_header() to
   write the standard configuration header to the endpoint controller.
 * pci_epc_set_bar()
   The PCI endpoint function driver should use pci_epc_set_bar() to configure
   the Base Address Register in order for the host to assign PCI addr space.
   Register space of the function driver is usually configured
   using this API.
 * pci_epc_clear_bar()
   The PCI endpoint function driver should use pci_epc_clear_bar() to reset
   the BAR.
 * pci_epc_raise_irq()
   The PCI endpoint function driver should use pci_epc_raise_irq() to raise
   Legacy Interrupt, MSI or MSI-X Interrupt.
 * pci_epc_mem_alloc_addr()
   The PCI endpoint function driver should use pci_epc_mem_alloc_addr() to
   allocate a memory address from the EPC addr space, which is required to
   access the RC's buffer.
 * pci_epc_mem_free_addr()
   The PCI endpoint function driver should use pci_epc_mem_free_addr() to
   free the memory space allocated using pci_epc_mem_alloc_addr().
 Other APIs
 ~~~~~~~~~~
 There are other APIs provided by the EPC library. These are used for binding
 the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
 using these APIs.
 * pci_epc_get()
   Get a reference to the PCI endpoint controller based on the device name of
   the controller.
 * pci_epc_put()
   Release the reference to the PCI endpoint controller obtained using
   pci_epc_get().
 * pci_epc_add_epf()
   Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
   can have up to 8 functions according to the specification.
 * pci_epc_remove_epf()
   Remove the PCI endpoint function from PCI endpoint controller.
 * pci_epc_start()
   The PCI endpoint function driver should invoke pci_epc_start() once it
   has configured the endpoint function and wants to start the PCI link.
 * pci_epc_stop()
   The PCI endpoint function driver should invoke pci_epc_stop() to stop
   the PCI link.
 PCI Endpoint Function(EPF) Library
 ----------------------------------
 The EPF library provides APIs to be used by the function driver and the EPC
 library to provide endpoint mode functionality.
 APIs for the PCI Endpoint Function Driver
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 * pci_epf_register_driver()
   The PCI Endpoint Function driver should implement the following ops:
 	 * bind: ops to perform when an EPC device has been bound to an EPF device
 	 * unbind: ops to perform when a binding has been lost between an EPC
 	   device and an EPF device
 	 * linkup: ops to perform when the EPC device has established a
 	   connection with a host system
   The PCI Function driver can then register the PCI EPF driver by using
   pci_epf_register_driver().
 * pci_epf_unregister_driver()
   The PCI Function driver can unregister the PCI EPF driver by using
   pci_epf_unregister_driver().
 * pci_epf_alloc_space()
   The PCI Function driver can allocate space for a particular BAR using
   pci_epf_alloc_space().
 * pci_epf_free_space()
   The PCI Function driver can free the allocated space
   (using pci_epf_alloc_space) by invoking pci_epf_free_space().
 APIs for the PCI Endpoint Controller Library
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint controller library.
 * pci_epf_linkup()
   The PCI endpoint controller library invokes pci_epf_linkup() when the
   EPC device has established the connection to the host.
 Other APIs
 ~~~~~~~~~~
 There are other APIs provided by the EPF library. These are used to notify
 the function driver when the EPF device is bound to the EPC device.
 pci-ep-cfs.c can be used as reference for using these APIs.
 * pci_epf_create()
   Create a new PCI EPF device by passing the name of the PCI EPF device.
   This name will be used to bind the EPF device to an EPF driver.
 * pci_epf_destroy()
   Destroy the created PCI EPF device.
 * pci_epf_bind()
   pci_epf_bind() should be invoked when the EPF device has been bound to
   an EPC device.
 * pci_epf_unbind()
   pci_epf_unbind() should be invoked when the binding between EPC device
   and EPF device is lost.
--- a/Documentation/PCI/endpoint/pci-endpoint.txt
+++ b/Documentation/PCI/endpoint/pci-endpoint.txt
@ -1,215 +0,0 @@
 			    PCI ENDPOINT FRAMEWORK
 		    Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to use the PCI Endpoint Framework in order to create
 endpoint controller driver, endpoint function driver, and using configfs
 interface to bind the function driver to the controller driver.
 1. Introduction
 Linux has a comprehensive PCI subsystem to support PCI controllers that
 operates in Root Complex mode. The subsystem has capability to scan PCI bus,
 assign memory resources and IRQ resources, load PCI driver (based on
 vendor ID, device ID), support other services like hot-plug, power management,
 advanced error reporting and virtual channels.
 However the PCI controller IP integrated in some SoCs is capable of operating
 either in Root Complex mode or Endpoint mode. PCI Endpoint Framework will
 add endpoint mode support in Linux. This will help to run Linux in an
 EP system which can have a wide variety of use cases from testing or
 validation, co-processor accelerator, etc.
 2. PCI Endpoint Core
 The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
 library, the Endpoint Function library, and the configfs layer to bind the
 endpoint function with the endpoint controller.
 2.1 PCI Endpoint Controller(EPC) Library
 The EPC library provides APIs to be used by the controller that can operate
 in endpoint mode. It also provides APIs to be used by function driver/library
 in order to implement a particular endpoint function.
 2.1.1 APIs for the PCI controller Driver
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI controller driver.
 *) devm_pci_epc_create()/pci_epc_create()
   The PCI controller driver should implement the following ops:
 	 * write_header: ops to populate configuration space header
 	 * set_bar: ops to configure the BAR
 	 * clear_bar: ops to reset the BAR
 	 * alloc_addr_space: ops to allocate in PCI controller address space
 	 * free_addr_space: ops to free the allocated address space
 	 * raise_irq: ops to raise a legacy, MSI or MSI-X interrupt
 	 * start: ops to start the PCI link
 	 * stop: ops to stop the PCI link
   The PCI controller driver can then create a new EPC device by invoking
   devm_pci_epc_create()/pci_epc_create().
 *) devm_pci_epc_destroy()/pci_epc_destroy()
   The PCI controller driver can destroy the EPC device created by either
   devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
   pci_epc_destroy().
 *) pci_epc_linkup()
   In order to notify all the function devices that the EPC device to which
   they are linked has established a link with the host, the PCI controller
   driver should invoke pci_epc_linkup().
 *) pci_epc_mem_init()
   Initialize the pci_epc_mem structure used for allocating EPC addr space.
 *) pci_epc_mem_exit()
   Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
 2.1.2 APIs for the PCI Endpoint Function Driver
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 *) pci_epc_write_header()
   The PCI endpoint function driver should use pci_epc_write_header() to
   write the standard configuration header to the endpoint controller.
 *) pci_epc_set_bar()
   The PCI endpoint function driver should use pci_epc_set_bar() to configure
   the Base Address Register in order for the host to assign PCI addr space.
   Register space of the function driver is usually configured
   using this API.
 *) pci_epc_clear_bar()
   The PCI endpoint function driver should use pci_epc_clear_bar() to reset
   the BAR.
 *) pci_epc_raise_irq()
   The PCI endpoint function driver should use pci_epc_raise_irq() to raise
   Legacy Interrupt, MSI or MSI-X Interrupt.
 *) pci_epc_mem_alloc_addr()
   The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
   allocate memory address from EPC addr space which is required to access
   RC's buffer
 *) pci_epc_mem_free_addr()
   The PCI endpoint function driver should use pci_epc_mem_free_addr() to
   free the memory space allocated using pci_epc_mem_alloc_addr().
 2.1.3 Other APIs
 There are other APIs provided by the EPC library. These are used for binding
 the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
 using these APIs.
 *) pci_epc_get()
   Get a reference to the PCI endpoint controller based on the device name of
   the controller.
 *) pci_epc_put()
   Release the reference to the PCI endpoint controller obtained using
   pci_epc_get()
 *) pci_epc_add_epf()
   Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
   can have up to 8 functions according to the specification.
 *) pci_epc_remove_epf()
   Remove the PCI endpoint function from PCI endpoint controller.
 *) pci_epc_start()
   The PCI endpoint function driver should invoke pci_epc_start() once it
   has configured the endpoint function and wants to start the PCI link.
 *) pci_epc_stop()
   The PCI endpoint function driver should invoke pci_epc_stop() to stop
   the PCI LINK.
 2.2 PCI Endpoint Function(EPF) Library
 The EPF library provides APIs to be used by the function driver and the EPC
 library to provide endpoint mode functionality.
 2.2.1 APIs for the PCI Endpoint Function Driver
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
 *) pci_epf_register_driver()
   The PCI Endpoint Function driver should implement the following ops:
 	 * bind: ops to perform when a EPC device has been bound to EPF device
 	 * unbind: ops to perform when a binding has been lost between a EPC
 	   device and EPF device
 	 * linkup: ops to perform when the EPC device has established a
 	   connection with a host system
  The PCI Function driver can then register the PCI EPF driver by using
  pci_epf_register_driver().
 *) pci_epf_unregister_driver()
  The PCI Function driver can unregister the PCI EPF driver by using
  pci_epf_unregister_driver().
 *) pci_epf_alloc_space()
  The PCI Function driver can allocate space for a particular BAR using
  pci_epf_alloc_space().
 *) pci_epf_free_space()
  The PCI Function driver can free the allocated space
  (using pci_epf_alloc_space) by invoking pci_epf_free_space().
 2.2.2 APIs for the PCI Endpoint Controller Library
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint controller library.
 *) pci_epf_linkup()
   The PCI endpoint controller library invokes pci_epf_linkup() when the
   EPC device has established the connection to the host.
 2.2.2 Other APIs
 There are other APIs provided by the EPF library. These are used to notify
 the function driver when the EPF device is bound to the EPC device.
 pci-ep-cfs.c can be used as reference for using these APIs.
 *) pci_epf_create()
   Create a new PCI EPF device by passing the name of the PCI EPF device.
   This name will be used to bind the the EPF device to a EPF driver.
 *) pci_epf_destroy()
   Destroy the created PCI EPF device.
 *) pci_epf_bind()
   pci_epf_bind() should be invoked when the EPF device has been bound to
   a EPC device.
 *) pci_epf_unbind()
   pci_epf_unbind() should be invoked when the binding between EPC device
   and EPF device is lost.
--- a/Documentation/PCI/endpoint/pci-test-function.rst
+++ b/Documentation/PCI/endpoint/pci-test-function.rst
@ -0,0 +1,103 @@
 .. SPDX-License-Identifier: GPL-2.0
 =================
 PCI Test Function
 =================
 :Author: Kishon Vijay Abraham I <kishon@ti.com>
 Traditionally PCI RC has always been validated by using standard
 PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
 However, with the addition of EP-core in the Linux kernel, it is possible
 to configure a PCI controller that can operate in EP mode to work as
 a test device.
 The PCI endpoint test device is a virtual device (defined in software)
 used to test the endpoint functionality and serve as a sample driver
 for other PCI endpoint devices (to use the EP framework).
 The PCI endpoint test device has the following registers:
 	1) PCI_ENDPOINT_TEST_MAGIC
 	2) PCI_ENDPOINT_TEST_COMMAND
 	3) PCI_ENDPOINT_TEST_STATUS
 	4) PCI_ENDPOINT_TEST_SRC_ADDR
 	5) PCI_ENDPOINT_TEST_DST_ADDR
 	6) PCI_ENDPOINT_TEST_SIZE
 	7) PCI_ENDPOINT_TEST_CHECKSUM
 	8) PCI_ENDPOINT_TEST_IRQ_TYPE
 	9) PCI_ENDPOINT_TEST_IRQ_NUMBER
 * PCI_ENDPOINT_TEST_MAGIC
 This register will be used to test BAR0. A known pattern will be written
 and read back from MAGIC register to verify BAR0.
 * PCI_ENDPOINT_TEST_COMMAND
 This register will be used by the host driver to indicate the function
 that the endpoint device must perform.
 ========	================================================================
 Bitfield	Description
 ========	================================================================
 Bit 0		raise legacy IRQ
 Bit 1		raise MSI IRQ
 Bit 2		raise MSI-X IRQ
 Bit 3		read command (read data from RC buffer)
 Bit 4		write command (write data to RC buffer)
 Bit 5		copy command (copy data from one RC buffer to another RC buffer)
 ========	================================================================
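 The bit layout above can be written down as C masks. This is a minimal
 sketch for illustration only: the ``EPT_CMD_*`` macro names and the
 ``ept_cmd_is_defined()`` helper are hypothetical and are not taken from the
 host or endpoint driver; only the bit positions come from the table.

```c
#include <stdint.h>

/* Bit positions from the COMMAND table; the macro names are hypothetical. */
#define EPT_CMD_RAISE_LEGACY_IRQ (1u << 0)
#define EPT_CMD_RAISE_MSI_IRQ    (1u << 1)
#define EPT_CMD_RAISE_MSIX_IRQ   (1u << 2)
#define EPT_CMD_READ             (1u << 3)
#define EPT_CMD_WRITE            (1u << 4)
#define EPT_CMD_COPY             (1u << 5)

/* Every bit the table defines (bits 0..5). */
#define EPT_CMD_KNOWN_MASK       0x3fu

/*
 * Hypothetical sanity check: returns 1 if cmd requests at least one
 * operation and sets no bit outside the documented range, 0 otherwise.
 */
static int ept_cmd_is_defined(uint32_t cmd)
{
	if (cmd == 0)
		return 0;
	return (cmd & ~EPT_CMD_KNOWN_MASK) == 0;
}
```

 A host-side test might run such a check on a value before writing it to the
 COMMAND register, so that an undefined bit is caught early.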
 * PCI_ENDPOINT_TEST_STATUS
 This register reflects the status of the PCI endpoint device.
 ========	==============================
 Bitfield	Description
 ========	==============================
 Bit 0		read success
 Bit 1		read fail
 Bit 2		write success
 Bit 3		write fail
 Bit 4		copy success
 Bit 5		copy fail
 Bit 6		IRQ raised
 Bit 7		source address is invalid
 Bit 8		destination address is invalid
 ========	==============================
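 A value read from the STATUS register can be decoded bit by bit. The sketch
 below is illustrative only: the helper names are hypothetical, and treating
 the two "address is invalid" bits as errors alongside the "fail" bits is an
 assumption made for this example, not something the table states.

```c
#include <stddef.h>
#include <stdint.h>

/* Descriptions indexed by bit number, copied from the STATUS table. */
static const char *const ept_status_names[] = {
	"read success",
	"read fail",
	"write success",
	"write fail",
	"copy success",
	"copy fail",
	"IRQ raised",
	"source address is invalid",
	"destination address is invalid",
};

#define EPT_STATUS_NBITS \
	(sizeof(ept_status_names) / sizeof(ept_status_names[0]))

/* Bits 1, 3, 5 (fail) and 7, 8 (invalid address); grouping is an assumption. */
#define EPT_STATUS_ERROR_MASK \
	((1u << 1) | (1u << 3) | (1u << 5) | (1u << 7) | (1u << 8))

/* Returns 1 if any error bit is set in status, 0 otherwise. */
static int ept_status_has_error(uint32_t status)
{
	return (status & EPT_STATUS_ERROR_MASK) != 0;
}

/* Returns the description of the lowest set bit, or NULL if none is known. */
static const char *ept_status_first_name(uint32_t status)
{
	size_t i;

	for (i = 0; i < EPT_STATUS_NBITS; i++) {
		if (status & (1u << i))
			return ept_status_names[i];
	}
	return NULL;
}
```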
 * PCI_ENDPOINT_TEST_SRC_ADDR
 This register contains the source address (RC buffer address) for the
 COPY/READ command.
 * PCI_ENDPOINT_TEST_DST_ADDR
 This register contains the destination address (RC buffer address) for
 the COPY/WRITE command.
 * PCI_ENDPOINT_TEST_IRQ_TYPE
 This register contains the interrupt type (Legacy/MSI/MSI-X) triggered
 for the READ/WRITE/COPY and raise IRQ (Legacy/MSI/MSI-X) commands.
 Possible types:
 ======	==
 Legacy	0
 MSI	1
 MSI-X	2
 ======	==
 * PCI_ENDPOINT_TEST_IRQ_NUMBER
 This register contains the ID of the triggered interrupt.
 Admissible values:
 ======	===========
 Legacy	0
 MSI	[1 .. 32]
 MSI-X	[1 .. 2048]
 ======	===========
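 The IRQ_TYPE and IRQ_NUMBER tables together define which combinations are
 admissible. A minimal sketch of that check follows; the enum and function
 names are hypothetical, while the numeric values and ranges are the ones
 listed above.

```c
#include <stdint.h>

/* Values from the IRQ_TYPE table; the enumerator names are hypothetical. */
enum ept_irq_type {
	EPT_IRQ_LEGACY = 0,
	EPT_IRQ_MSI    = 1,
	EPT_IRQ_MSIX   = 2,
};

/*
 * Returns 1 if number is an admissible IRQ_NUMBER value for the given
 * IRQ_TYPE value, 0 otherwise (including for unknown types).
 */
static int ept_irq_number_valid(uint32_t type, uint32_t number)
{
	switch (type) {
	case EPT_IRQ_LEGACY:
		return number == 0;
	case EPT_IRQ_MSI:
		return number >= 1 && number <= 32;
	case EPT_IRQ_MSIX:
		return number >= 1 && number <= 2048;
	default:
		return 0;
	}
}
```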
--- a/Documentation/PCI/endpoint/pci-test-function.txt
+++ b/Documentation/PCI/endpoint/pci-test-function.txt
@ -1,87 +0,0 @@
 				PCI TEST
 		    Kishon Vijay Abraham I <kishon@ti.com>
 Traditionally PCI RC has always been validated by using standard
 PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
 However with the addition of EP-core in linux kernel, it is possible
 to configure a PCI controller that can operate in EP mode to work as
 a test device.
 The PCI endpoint test device is a virtual device (defined in software)
 used to test the endpoint functionality and serve as a sample driver
 for other PCI endpoint devices (to use the EP framework).
 The PCI endpoint test device has the following registers:
 	1) PCI_ENDPOINT_TEST_MAGIC
 	2) PCI_ENDPOINT_TEST_COMMAND
 	3) PCI_ENDPOINT_TEST_STATUS
 	4) PCI_ENDPOINT_TEST_SRC_ADDR
 	5) PCI_ENDPOINT_TEST_DST_ADDR
 	6) PCI_ENDPOINT_TEST_SIZE
 	7) PCI_ENDPOINT_TEST_CHECKSUM
 	8) PCI_ENDPOINT_TEST_IRQ_TYPE
 	9) PCI_ENDPOINT_TEST_IRQ_NUMBER
 *) PCI_ENDPOINT_TEST_MAGIC
 This register will be used to test BAR0. A known pattern will be written
 and read back from MAGIC register to verify BAR0.
 *) PCI_ENDPOINT_TEST_COMMAND:
 This register will be used by the host driver to indicate the function
 that the endpoint device must perform.
 Bitfield Description:
  Bit 0		: raise legacy IRQ
  Bit 1		: raise MSI IRQ
  Bit 2		: raise MSI-X IRQ
  Bit 3		: read command (read data from RC buffer)
  Bit 4		: write command (write data to RC buffer)
  Bit 5		: copy command (copy data from one RC buffer to another
 		  RC buffer)
 *) PCI_ENDPOINT_TEST_STATUS
 This register reflects the status of the PCI endpoint device.
 Bitfield Description:
  Bit 0		: read success
  Bit 1		: read fail
  Bit 2		: write success
  Bit 3		: write fail
  Bit 4		: copy success
  Bit 5		: copy fail
  Bit 6		: IRQ raised
  Bit 7		: source address is invalid
  Bit 8		: destination address is invalid
 *) PCI_ENDPOINT_TEST_SRC_ADDR
 This register contains the source address (RC buffer address) for the
 COPY/READ command.
 *) PCI_ENDPOINT_TEST_DST_ADDR
 This register contains the destination address (RC buffer address) for
 the COPY/WRITE command.
 *) PCI_ENDPOINT_TEST_IRQ_TYPE
 This register contains the interrupt type (Legacy/MSI) triggered
 for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
 Possible types:
 - Legacy	: 0
 - MSI		: 1
 - MSI-X	: 2
 *) PCI_ENDPOINT_TEST_IRQ_NUMBER
 This register contains the triggered ID interrupt.
 Admissible values:
 - Legacy	: 0
 - MSI		: [1 .. 32]
 - MSI-X	: [1 .. 2048]
--- a/Documentation/PCI/endpoint/pci-test-howto.rst
+++ b/Documentation/PCI/endpoint/pci-test-howto.rst
@ -0,0 +1,235 @@
 .. SPDX-License-Identifier: GPL-2.0
 ===================
 PCI Test User Guide
 ===================
 :Author: Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to help users use the pci-epf-test function driver
 and the pci_endpoint_test host driver for testing PCI. The steps to be
 followed on the host side and on the EP side are given below.
 Endpoint Device
 ===============
 Endpoint Controller Devices
 ---------------------------
 To find the list of endpoint controller devices in the system::
 	# ls /sys/class/pci_epc/
 	  51000000.pcie_ep
 If PCI_ENDPOINT_CONFIGFS is enabled::
 	# ls /sys/kernel/config/pci_ep/controllers
 	  51000000.pcie_ep
 Endpoint Function Drivers
 -------------------------
 To find the list of endpoint function drivers in the system::
 	# ls /sys/bus/pci-epf/drivers
 	  pci_epf_test
 If PCI_ENDPOINT_CONFIGFS is enabled::
 	# ls /sys/kernel/config/pci_ep/functions
 	  pci_epf_test
 Creating pci-epf-test Device
 ----------------------------
 A PCI endpoint function device can be created using configfs. To create a
 pci-epf-test device, the following commands can be used::
 	# mount -t configfs none /sys/kernel/config
 	# cd /sys/kernel/config/pci_ep/
 	# mkdir functions/pci_epf_test/func1
 The "mkdir func1" above creates the pci-epf-test function device that will
 be probed by the pci_epf_test driver.
 The PCI endpoint framework populates the directory with the following
 configurable fields::
 	# ls functions/pci_epf_test/func1
 	  baseclass_code	interrupt_pin	progif_code	subsys_id
 	  cache_line_size	msi_interrupts	revid		subsys_vendorid
 	  deviceid          	msix_interrupts	subclass_code	vendorid
 The PCI endpoint function driver populates these entries with default values
 when the device is bound to the driver. The pci-epf-test driver populates
 vendorid with 0xffff and interrupt_pin with 0x0001::
 	# cat functions/pci_epf_test/func1/vendorid
 	  0xffff
 	# cat functions/pci_epf_test/func1/interrupt_pin
 	  0x0001
 Configuring pci-epf-test Device
 -------------------------------
 The user can configure the pci-epf-test device using its configfs entries. To
 change the vendorid, the deviceid and the number of MSI and MSI-X interrupts
 used by the function device, the following commands can be used::
 	# echo 0x104c > functions/pci_epf_test/func1/vendorid
 	# echo 0xb500 > functions/pci_epf_test/func1/deviceid
 	# echo 16 > functions/pci_epf_test/func1/msi_interrupts
 	# echo 8 > functions/pci_epf_test/func1/msix_interrupts
 Binding pci-epf-test Device to EP Controller
 --------------------------------------------
 In order for the endpoint function device to be useful, it has to be bound to
 a PCI endpoint controller driver. Use the configfs to bind the function
 device to one of the controller drivers present in the system::
 	# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
 Once the above step is completed, the PCI endpoint is ready to establish a link
 with the host.
 Start the Link
 --------------
 In order for the endpoint device to establish a link with the host, the _start_
 field should be populated with '1'::
 	# echo 1 > controllers/51000000.pcie_ep/start
 RootComplex Device
 ==================
 lspci Output
 ------------
 Note that the devices listed here correspond to the values populated in the
 "Configuring pci-epf-test Device" section above::
 	00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
 	01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
 Using Endpoint Test function Device
 -----------------------------------
 pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
 tests. To compile this tool the following commands should be used::
 	# cd <kernel-dir>
 	# make -C tools/pci
 or if you desire to compile and install in your system::
 	# cd <kernel-dir>
 	# make -C tools/pci install
 The tool and script will be located in <rootfs>/usr/bin/.
 pcitest.sh Output
 ~~~~~~~~~~~~~~~~~
 ::
 	# pcitest.sh
 	BAR tests
 	BAR0:           OKAY
 	BAR1:           OKAY
 	BAR2:           OKAY
 	BAR3:           OKAY
 	BAR4:           NOT OKAY
 	BAR5:           NOT OKAY
 	Interrupt tests
 	SET IRQ TYPE TO LEGACY:         OKAY
 	LEGACY IRQ:     NOT OKAY
 	SET IRQ TYPE TO MSI:            OKAY
 	MSI1:           OKAY
 	MSI2:           OKAY
 	MSI3:           OKAY
 	MSI4:           OKAY
 	MSI5:           OKAY
 	MSI6:           OKAY
 	MSI7:           OKAY
 	MSI8:           OKAY
 	MSI9:           OKAY
 	MSI10:          OKAY
 	MSI11:          OKAY
 	MSI12:          OKAY
 	MSI13:          OKAY
 	MSI14:          OKAY
 	MSI15:          OKAY
 	MSI16:          OKAY
 	MSI17:          NOT OKAY
 	MSI18:          NOT OKAY
 	MSI19:          NOT OKAY
 	MSI20:          NOT OKAY
 	MSI21:          NOT OKAY
 	MSI22:          NOT OKAY
 	MSI23:          NOT OKAY
 	MSI24:          NOT OKAY
 	MSI25:          NOT OKAY
 	MSI26:          NOT OKAY
 	MSI27:          NOT OKAY
 	MSI28:          NOT OKAY
 	MSI29:          NOT OKAY
 	MSI30:          NOT OKAY
 	MSI31:          NOT OKAY
 	MSI32:          NOT OKAY
 	SET IRQ TYPE TO MSI-X:          OKAY
 	MSI-X1:         OKAY
 	MSI-X2:         OKAY
 	MSI-X3:         OKAY
 	MSI-X4:         OKAY
 	MSI-X5:         OKAY
 	MSI-X6:         OKAY
 	MSI-X7:         OKAY
 	MSI-X8:         OKAY
 	MSI-X9:         NOT OKAY
 	MSI-X10:        NOT OKAY
 	MSI-X11:        NOT OKAY
 	MSI-X12:        NOT OKAY
 	MSI-X13:        NOT OKAY
 	MSI-X14:        NOT OKAY
 	MSI-X15:        NOT OKAY
 	MSI-X16:        NOT OKAY
 	[...]
 	MSI-X2047:      NOT OKAY
 	MSI-X2048:      NOT OKAY
 	Read Tests
 	SET IRQ TYPE TO MSI:            OKAY
 	READ (      1 bytes):           OKAY
 	READ (   1024 bytes):           OKAY
 	READ (   1025 bytes):           OKAY
 	READ (1024000 bytes):           OKAY
 	READ (1024001 bytes):           OKAY
 	Write Tests
 	WRITE (      1 bytes):          OKAY
 	WRITE (   1024 bytes):          OKAY
 	WRITE (   1025 bytes):          OKAY
 	WRITE (1024000 bytes):          OKAY
 	WRITE (1024001 bytes):          OKAY
 	Copy Tests
 	COPY (      1 bytes):           OKAY
 	COPY (   1024 bytes):           OKAY
 	COPY (   1025 bytes):           OKAY
 	COPY (1024000 bytes):           OKAY
 	COPY (1024001 bytes):           OKAY
--- a/Documentation/PCI/endpoint/pci-test-howto.txt
+++ b/Documentation/PCI/endpoint/pci-test-howto.txt
@ -1,206 +0,0 @@
 			    PCI TEST USERGUIDE
 		    Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to help users use pci-epf-test function driver
 and pci_endpoint_test host driver for testing PCI. The list of steps to
 be followed in the host side and EP side is given below.
 1. Endpoint Device
 1.1 Endpoint Controller Devices
 To find the list of endpoint controller devices in the system:
 	# ls /sys/class/pci_epc/
 	  51000000.pcie_ep
 If PCI_ENDPOINT_CONFIGFS is enabled
 	# ls /sys/kernel/config/pci_ep/controllers
 	  51000000.pcie_ep
 1.2 Endpoint Function Drivers
 To find the list of endpoint function drivers in the system:
 	# ls /sys/bus/pci-epf/drivers
 	  pci_epf_test
 If PCI_ENDPOINT_CONFIGFS is enabled
 	# ls /sys/kernel/config/pci_ep/functions
 	  pci_epf_test
 1.3 Creating pci-epf-test Device
 PCI endpoint function device can be created using the configfs. To create
 pci-epf-test device, the following commands can be used
 	# mount -t configfs none /sys/kernel/config
 	# cd /sys/kernel/config/pci_ep/
 	# mkdir functions/pci_epf_test/func1
 The "mkdir func1" above creates the pci-epf-test function device that will
 be probed by pci_epf_test driver.
 The PCI endpoint framework populates the directory with the following
 configurable fields.
 	# ls functions/pci_epf_test/func1
 	  baseclass_code	interrupt_pin	progif_code	subsys_id
 	  cache_line_size	msi_interrupts	revid		subsys_vendorid
 	  deviceid          	msix_interrupts	subclass_code	vendorid
 The PCI endpoint function driver populates these entries with default values
 when the device is bound to the driver. The pci-epf-test driver populates
 vendorid with 0xffff and interrupt_pin with 0x0001
 	# cat functions/pci_epf_test/func1/vendorid
 	  0xffff
 	# cat functions/pci_epf_test/func1/interrupt_pin
 	  0x0001
 1.4 Configuring pci-epf-test Device
 The user can configure the pci-epf-test device using configfs entry. In order
 to change the vendorid and the number of MSI interrupts used by the function
 device, the following commands can be used.
 	# echo 0x104c > functions/pci_epf_test/func1/vendorid
 	# echo 0xb500 > functions/pci_epf_test/func1/deviceid
 	# echo 16 > functions/pci_epf_test/func1/msi_interrupts
 	# echo 8 > functions/pci_epf_test/func1/msix_interrupts
 1.5 Binding pci-epf-test Device to EP Controller
 In order for the endpoint function device to be useful, it has to be bound to
 a PCI endpoint controller driver. Use the configfs to bind the function
 device to one of the controller driver present in the system.
 	# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
 Once the above step is completed, the PCI endpoint is ready to establish a link
 with the host.
 1.6 Start the Link
 In order for the endpoint device to establish a link with the host, the _start_
 field should be populated with '1'.
 	# echo 1 > controllers/51000000.pcie_ep/start
 2. RootComplex Device
 2.1 lspci Output
 Note that the devices listed here correspond to the value populated in 1.4 above
 	00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
 	01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
 2.2 Using Endpoint Test function Device
 pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
 tests. To compile this tool the following commands should be used:
 	# cd <kernel-dir>
 	# make -C tools/pci
 or if you desire to compile and install in your system:
 	# cd <kernel-dir>
 	# make -C tools/pci install
 The tool and script will be located in <rootfs>/usr/bin/
 2.2.1 pcitest.sh Output
 	# pcitest.sh
 	BAR tests
 	BAR0:           OKAY
 	BAR1:           OKAY
 	BAR2:           OKAY
 	BAR3:           OKAY
 	BAR4:           NOT OKAY
 	BAR5:           NOT OKAY
 	Interrupt tests
 	SET IRQ TYPE TO LEGACY:         OKAY
 	LEGACY IRQ:     NOT OKAY
 	SET IRQ TYPE TO MSI:            OKAY
 	MSI1:           OKAY
 	MSI2:           OKAY
 	MSI3:           OKAY
 	MSI4:           OKAY
 	MSI5:           OKAY
 	MSI6:           OKAY
 	MSI7:           OKAY
 	MSI8:           OKAY
 	MSI9:           OKAY
 	MSI10:          OKAY
 	MSI11:          OKAY
 	MSI12:          OKAY
 	MSI13:          OKAY
 	MSI14:          OKAY
 	MSI15:          OKAY
 	MSI16:          OKAY
 	MSI17:          NOT OKAY
 	MSI18:          NOT OKAY
 	MSI19:          NOT OKAY
 	MSI20:          NOT OKAY
 	MSI21:          NOT OKAY
 	MSI22:          NOT OKAY
 	MSI23:          NOT OKAY
 	MSI24:          NOT OKAY
 	MSI25:          NOT OKAY
 	MSI26:          NOT OKAY
 	MSI27:          NOT OKAY
 	MSI28:          NOT OKAY
 	MSI29:          NOT OKAY
 	MSI30:          NOT OKAY
 	MSI31:          NOT OKAY
 	MSI32:          NOT OKAY
 	SET IRQ TYPE TO MSI-X:          OKAY
 	MSI-X1:         OKAY
 	MSI-X2:         OKAY
 	MSI-X3:         OKAY
 	MSI-X4:         OKAY
 	MSI-X5:         OKAY
 	MSI-X6:         OKAY
 	MSI-X7:         OKAY
 	MSI-X8:         OKAY
 	MSI-X9:         NOT OKAY
 	MSI-X10:        NOT OKAY
 	MSI-X11:        NOT OKAY
 	MSI-X12:        NOT OKAY
 	MSI-X13:        NOT OKAY
 	MSI-X14:        NOT OKAY
 	MSI-X15:        NOT OKAY
 	MSI-X16:        NOT OKAY
 	[...]
 	MSI-X2047:      NOT OKAY
 	MSI-X2048:      NOT OKAY
 	Read Tests
 	SET IRQ TYPE TO MSI:            OKAY
 	READ (      1 bytes):           OKAY
 	READ (   1024 bytes):           OKAY
 	READ (   1025 bytes):           OKAY
 	READ (1024000 bytes):           OKAY
 	READ (1024001 bytes):           OKAY
 	Write Tests
 	WRITE (      1 bytes):          OKAY
 	WRITE (   1024 bytes):          OKAY
 	WRITE (   1025 bytes):          OKAY
 	WRITE (1024000 bytes):          OKAY
 	WRITE (1024001 bytes):          OKAY
 	Copy Tests
 	COPY (      1 bytes):           OKAY
 	COPY (   1024 bytes):           OKAY
 	COPY (   1025 bytes):           OKAY
 	COPY (1024000 bytes):           OKAY
 	COPY (1024001 bytes):           OKAY
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@ -0,0 +1,18 @@
 .. SPDX-License-Identifier: GPL-2.0
 =======================
 Linux PCI Bus Subsystem
 =======================
 .. toctree::
   :maxdepth: 2
   :numbered:
   pci
   picebus-howto
   pci-iov-howto
   msi-howto
   acpi-info
   pci-error-recovery
   pcieaer-howto
   endpoint/index
--- a/Documentation/PCI/msi-howto.rst
+++ b/Documentation/PCI/msi-howto.rst
@ -0,0 +1,287 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 ==========================
 The MSI Driver Guide HOWTO
 ==========================
 :Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
 :Copyright: 2003, 2008 Intel Corporation
 About this guide
 ================
 This guide describes the basics of Message Signaled Interrupts (MSIs),
 the advantages of using MSI over traditional interrupt mechanisms, how
 to change your driver to use MSI or MSI-X and some basic diagnostics to
 try if a device doesn't support MSIs.
 What are MSIs?
 ==============
 A Message Signaled Interrupt is a write from the device to a special
 address which causes an interrupt to be received by the CPU.
 The MSI capability was first specified in PCI 2.2 and was later enhanced
 in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
 capability was also introduced with PCI 3.0.  It supports more interrupts
 per device than MSI and allows interrupts to be independently configured.
 Devices may support both MSI and MSI-X, but only one can be enabled at
 a time.
 Why use MSIs?
 =============
 There are three reasons why using MSIs can give an advantage over
 traditional pin-based interrupts.
 Pin-based PCI interrupts are often shared amongst several devices.
 To support this, the kernel must call each interrupt handler associated
 with an interrupt, which leads to reduced performance for the system as
 a whole.  MSIs are never shared, so this problem cannot arise.
 When a device writes data to memory, then raises a pin-based interrupt,
 it is possible that the interrupt may arrive before all the data has
 arrived in memory (this becomes more likely with devices behind PCI-PCI
 bridges).  In order to ensure that all the data has arrived in memory,
 the interrupt handler must read a register on the device which raised
 the interrupt.  PCI transaction ordering rules require that all the data
 arrive in memory before the value may be returned from the register.
 Using MSIs avoids this problem as the interrupt-generating write cannot
 pass the data writes, so by the time the interrupt is raised, the driver
 knows that all the data has arrived in memory.
 PCI devices can only support a single pin-based interrupt per function.
 Often drivers have to query the device to find out what event has
 occurred, slowing down interrupt handling for the common case.  With
 MSIs, a device can support more interrupts, allowing each interrupt
 to be specialised to a different purpose.  One possible design gives
 infrequent conditions (such as errors) their own interrupt which allows
 the driver to handle the normal interrupt handling path more efficiently.
 Other possible designs include giving one interrupt to each packet queue
 in a network card or each port in a storage controller.
 How to use MSIs
 ===============
 PCI devices are initialised to use pin-based interrupts.  The device
 driver has to set up the device to use MSI or MSI-X.  Not all machines
 support MSIs correctly, and for those machines, the APIs described below
 will simply fail and the device will continue to use pin-based interrupts.
 Include kernel support for MSIs
 -------------------------------
 To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
 option enabled.  This option is only available on some architectures,
 and it may depend on some other options also being set.  For example,
 on x86, you must also enable X86_UP_APIC or SMP in order to see the
 CONFIG_PCI_MSI option.
 Using MSI
 ---------
 Most of the hard work is done for the driver in the PCI layer.  The driver
 simply has to request that the PCI layer set up the MSI capability for this
 device.
 To automatically use MSI or MSI-X interrupt vectors, use the following
 function::
  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		unsigned int max_vecs, unsigned int flags);
 which allocates up to max_vecs interrupt vectors for a PCI device.  It
 returns the number of vectors allocated or a negative error.  If the device
 has a requirement for a minimum number of vectors the driver can pass a
 min_vecs argument set to this limit, and the PCI core will return -ENOSPC
 if it can't meet the minimum number of vectors.
 The flags argument is used to specify which type of interrupt can be used
 by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
 A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
 any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
 pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
 To get the Linux IRQ numbers to pass to request_irq() and free_irq() for the
 allocated vectors, use the following function::
  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 Any allocated resources should be freed before removing the device using
 the following function::
  void pci_free_irq_vectors(struct pci_dev *dev);
 If a device supports both MSI-X and MSI capabilities, this API will use the
 MSI-X facilities in preference to the MSI facilities.  MSI-X supports any
 number of interrupts between 1 and 2048.  In contrast, MSI is restricted to
 a maximum of 32 interrupts (and must be a power of two).  In addition, the
 MSI interrupt vectors must be allocated consecutively, so the system might
 not be able to allocate as many vectors for MSI as it could for MSI-X.  On
 some platforms, MSI interrupts must all be targeted at the same set of CPUs
 whereas MSI-X interrupts can all be targeted at different CPUs.
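 The limits quoted above can be written down as a pair of checks.  This is
 a minimal user-space sketch, not kernel code; the names
 model_msi_count_valid() and model_msix_count_valid() and the MODEL_*
 constants are invented for illustration and only restate the numbers given
 in the text:

```c
#include <stdbool.h>

/* Limits quoted in the text; the MODEL_ names are hypothetical. */
#define MODEL_MSI_MAX_VECS	32u
#define MODEL_MSIX_MAX_VECS	2048u

/* MSI: between 1 and 32 vectors, and the count must be a power of two. */
bool model_msi_count_valid(unsigned int n)
{
	/* A power of two has exactly one bit set, so n & (n - 1) is zero. */
	return n >= 1 && n <= MODEL_MSI_MAX_VECS && (n & (n - 1)) == 0;
}

/* MSI-X: any count between 1 and 2048. */
bool model_msix_count_valid(unsigned int n)
{
	return n >= 1 && n <= MODEL_MSIX_MAX_VECS;
}
```

 A request for three vectors is therefore acceptable for MSI-X but not for
 MSI in this model.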
 If a device supports neither MSI-X nor MSI, it will fall back to a single
 legacy IRQ vector.
 The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
 as possible, likely up to the limit supported by the device.  If nvec is
 larger than the number supported by the device it will automatically be
 capped to the supported limit, so there is no need to query the number of
 vectors supported beforehand::
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
 	if (nvec < 0)
 		goto out_err;
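 The capping and minimum rules described above can be modelled in a few
 lines of user-space C.  This is only a sketch of the documented return
 value, not the PCI core's implementation: model_alloc_irq_vectors() is a
 hypothetical name, and the 'supported' parameter stands in for whatever
 limit the device and system impose, which the real function works out on
 its own:

```c
#include <errno.h>

/*
 * Hypothetical model of the documented behaviour:
 *  - fail with -ENOSPC when the minimum cannot be met,
 *  - otherwise grant up to max_vecs, capped to the supported limit.
 */
int model_alloc_irq_vectors(unsigned int supported,
			    unsigned int min_vecs,
			    unsigned int max_vecs)
{
	unsigned int nvec = max_vecs;

	if (supported < min_vecs)
		return -ENOSPC;		/* minimum cannot be met */
	if (nvec > supported)
		nvec = supported;	/* cap to the supported limit */
	return (int)nvec;
}
```

 With a supported limit of 8, asking for 1 to 32 vectors yields 8 in this
 model, while insisting on at least 16 yields -ENOSPC.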
 If a driver is unable or unwilling to deal with a variable number of MSI
 interrupts it can request a particular number of interrupts by passing that
 number to pci_alloc_irq_vectors() function as both 'min_vecs' and
 'max_vecs' parameters::
 	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
 		goto out_err;
 The most notorious example of the request type described above is enabling
 the single MSI mode for a device.  It could be done by passing two 1s as
 'min_vecs' and 'max_vecs'::
 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
 	if (ret < 0)
 		goto out_err;
 Some devices might not support using legacy line interrupts, in which case
 the driver can specify that only MSI or MSI-X is acceptable::
 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
 	if (nvec < 0)
 		goto out_err;
 Legacy APIs
 -----------
 The following old APIs to enable and disable MSI or MSI-X interrupts should
 not be used in new code::
  pci_enable_msi()		/* deprecated */
  pci_disable_msi()		/* deprecated */
  pci_enable_msix_range()	/* deprecated */
  pci_enable_msix_exact()	/* deprecated */
  pci_disable_msix()		/* deprecated */
 Additionally there are APIs to provide the number of supported MSI or MSI-X
 vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these
 should be avoided in favor of letting pci_alloc_irq_vectors() cap the
 number of vectors.  If you have a legitimate special use case for the count
 of vectors we might have to revisit that decision and add a
 pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
 Considerations when using MSIs
 ------------------------------
 Spinlocks
 ~~~~~~~~~
 Most device drivers have a per-device spinlock which is taken in the
 interrupt handler.  With pin-based interrupts or a single MSI, it is not
 necessary to disable interrupts (Linux guarantees the same interrupt will
 not be re-entered).  If a device uses multiple interrupts, the driver
 must disable interrupts while the lock is held.  If the device sends
 a different interrupt, the driver will deadlock trying to recursively
 acquire the spinlock.  Such deadlocks can be avoided by using
 spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
 and acquire the lock (see Documentation/kernel-hacking/locking.rst).
 How to tell whether MSI/MSI-X is enabled on a device
 ----------------------------------------------------
 Using 'lspci -v' (as root) may show some devices with "MSI", "Message
 Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
 has an 'Enable' flag which is followed with either "+" (enabled)
 or "-" (disabled).
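 A script or test tool that scans 'lspci -v' output could classify a
 capability line with something like the following.  This is a rough
 sketch: model_msi_enable_flag() is a hypothetical helper, and it relies
 only on the "Enable+" / "Enable-" markers described above, not on any
 other detail of the lspci output format:

```c
#include <string.h>

/*
 * Returns 1 if the line contains "Enable+", 0 if it contains "Enable-",
 * and -1 if neither marker is present.
 */
int model_msi_enable_flag(const char *line)
{
	if (strstr(line, "Enable+"))
		return 1;
	if (strstr(line, "Enable-"))
		return 0;
	return -1;
}
```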
 MSI quirks
 ==========
 Several PCI chipsets or devices are known not to support MSIs.
 The PCI stack provides three ways to disable MSIs:
 1. globally
 2. on all devices behind a specific bridge
 3. on a single device
 Disabling MSIs globally
 -----------------------
 Some host chipsets simply don't support MSIs properly.  If we're
 lucky, the manufacturer knows this and has indicated it in the ACPI
 FADT table.  In this case, Linux automatically disables MSIs.
 Some boards don't include this information in the table and so we have
 to detect them ourselves.  The complete list of these is found near the
 quirk_disable_all_msi() function in drivers/pci/quirks.c.
 If you have a board which has problems with MSIs, you can pass pci=nomsi
 on the kernel command line to disable MSIs on all devices.  It would be
 in your best interests to report the problem to linux-pci@vger.kernel.org
 including a full 'lspci -v' so we can add the quirks to the kernel.
 Disabling MSIs below a bridge
 -----------------------------
 Some PCI bridges are not able to route MSIs between busses properly.
 In this case, MSIs must be disabled on all devices behind the bridge.
 Some bridges allow you to enable MSIs by changing some bits in their
 PCI configuration space (especially the Hypertransport chipsets such
 as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
 Linux mostly knows about them and automatically enables MSIs if it can.
 If you have a bridge unknown to Linux, you can enable
 MSIs in configuration space using whatever method you know works, then
 enable MSIs on that bridge by doing::
       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
 where $bridge is the PCI address of the bridge you've enabled (e.g.
 0000:00:0e.0).
 To disable MSIs, echo 0 instead of 1.  Changing this value should be
 done with caution as it could break interrupt handling for all devices
 below this bridge.
 Again, please notify linux-pci@vger.kernel.org of any bridges that need
 special handling.
 Disabling MSIs on a single device
 ---------------------------------
 Some devices are known to have faulty MSI implementations.  Usually this
 is handled in the individual device driver, but occasionally it's necessary
 to handle this with a quirk.  Some drivers have an option to disable use
 of MSI.  While this is a convenient workaround for the driver author,
 it is not good practice, and should not be emulated.
 Finding why MSIs are disabled on a device
 -----------------------------------------
 From the above three sections, you can see that there are many reasons
 why MSIs may not be enabled for a given device.  Your first step should
 be to examine your dmesg carefully to determine whether MSIs are enabled
 for your machine.  You should also check your .config to be sure you
 have enabled CONFIG_PCI_MSI.
 Then, 'lspci -t' gives the list of bridges above a device. Reading
 `/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
 or disabled (0).  If 0 is found in any of the msi_bus files belonging
 to bridges between the PCI root and the device, MSIs are disabled.
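 The rule for the msi_bus files amounts to a logical AND along the path
 from the root to the device.  As a small illustration (user-space C, with
 the hypothetical name model_msi_allowed_on_path(); the caller is assumed
 to have already read each bridge's msi_bus value into an array):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * msi_bus[i] holds the value read from the i-th bridge's msi_bus file,
 * ordered from the PCI root towards the device.  A single 0 anywhere on
 * the path means MSIs are disabled for the device.
 */
bool model_msi_allowed_on_path(const int *msi_bus, size_t n_bridges)
{
	for (size_t i = 0; i < n_bridges; i++) {
		if (msi_bus[i] == 0)
			return false;
	}
	return true;
}
```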
 It is also worth checking the device driver to see whether it supports MSIs.
 For example, it may contain calls to pci_alloc_irq_vectors() with the
 PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@ -0,0 +1,424 @@
 .. SPDX-License-Identifier: GPL-2.0
 ==================
 PCI Error Recovery
 ==================
 :Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>
 Many PCI bus controllers are able to detect a variety of hardware
 PCI errors on the bus, such as parity errors on the data and address
 buses, as well as SERR and PERR errors.  Some of the more advanced
 chipsets are able to deal with these errors; these include PCI-E chipsets,
 and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
 pSeries boxes. A typical action taken is to disconnect the affected device,
 halting all I/O to it.  The goal of a disconnection is to avoid system
 corruption; for example, to halt system memory corruption due to DMA's
 to "wild" addresses. Typically, a reconnection mechanism is also
 offered, so that the affected PCI device(s) are reset and put back
 into working condition. The reset phase requires coordination
 between the affected device drivers and the PCI controller chip.
 This document describes a generic API for notifying device drivers
 of a bus disconnection, and then performing error recovery.
 This API is currently implemented in the 2.6.16 and later kernels.
 Reporting and recovery are performed in several steps. First, when
 a PCI hardware error has resulted in a bus disconnect, that event
 is reported as soon as possible to all affected device drivers,
 including multiple instances of a device driver on multi-function
 cards. This allows device drivers to avoid deadlocking in spinloops,
 waiting for some i/o-space register to change, when it never will.
 It also gives the drivers a chance to defer incoming I/O as
 needed.
 Next, recovery is performed in several stages. Most of the complexity
 is forced by the need to handle multi-function devices, that is,
 devices that have multiple device drivers associated with them.
 In the first stage, each driver is allowed to indicate what type
 of reset it desires, the choices being a simple re-enabling of I/O
 or requesting a slot reset.
 If any driver requests a slot reset, that is what will be done.
 After a reset and/or a re-enabling of I/O, all drivers are
 again notified, so that they may then perform any device setup/config
 that may be required.  After these have all completed, a final
 "resume normal operations" event is sent out.
 The biggest reason for choosing a kernel-based implementation rather
 than a user-space implementation was the need to deal with bus
 disconnects of PCI devices attached to storage media, and, in particular,
 disconnects from devices holding the root file system.  If the root
 file system is disconnected, a user-space mechanism would have to go
 through a large number of contortions to complete recovery. Almost all
 of the current Linux file systems are not tolerant of disconnection
 from/reconnection to their underlying block device. By contrast,
 bus errors are easy to manage in the device driver. Indeed, most
 device drivers already handle very similar recovery procedures;
 for example, the SCSI-generic layer already provides significant
 mechanisms for dealing with SCSI bus errors and SCSI bus resets.
 Detailed Design
 ===============
 Design and implementation details below, based on a chain of
 public email discussions with Ben Herrenschmidt, circa 5 April 2005.
 The error recovery API support is exposed to the driver in the form of
 a structure of function pointers pointed to by a new field in struct
 pci_driver. A driver that fails to provide the structure is "non-aware",
 and the actual recovery steps taken are platform dependent.  The
 arch/powerpc implementation will simulate a PCI hotplug remove/add.
 This structure has the form::
 	struct pci_error_handlers
 	{
 		int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
 		int (*mmio_enabled)(struct pci_dev *dev);
 		int (*slot_reset)(struct pci_dev *dev);
 		void (*resume)(struct pci_dev *dev);
 	};
 The possible channel states are::
 	enum pci_channel_state {
 		pci_channel_io_normal,  /* I/O channel is in normal state */
 		pci_channel_io_frozen,  /* I/O to channel is blocked */
 		pci_channel_io_perm_failure, /* PCI card is dead */
 	};
 Possible return values are::
 	enum pci_ers_result {
 		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
 		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
 		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
 		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
 		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
 	};
 A driver does not have to implement all of these callbacks; however,
 if it implements any, it must implement error_detected(). If a callback
 is not implemented, the corresponding feature is considered unsupported.
 For example, if mmio_enabled() and resume() aren't there, then it
 is assumed that the driver is not doing any direct recovery and requires
 a slot reset.  Typically a driver will want to know about
 a slot_reset().
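 The "implement any, then implement error_detected()" rule can be expressed
 as a simple check.  The following is a user-space sketch only: struct
 model_error_handlers imitates the shape of the structure shown above, and
 model_handlers_valid() and the two sample callbacks are hypothetical names
 that do not exist in the kernel:

```c
#include <stdbool.h>
#include <stddef.h>

struct model_dev;	/* opaque stand-in for struct pci_dev */

/* Same shape as the callback structure shown above (model only). */
struct model_error_handlers {
	int  (*error_detected)(struct model_dev *dev, int state);
	int  (*mmio_enabled)(struct model_dev *dev);
	int  (*slot_reset)(struct model_dev *dev);
	void (*resume)(struct model_dev *dev);
};

/* Sample callbacks so the structure can be populated. */
int model_sample_error_detected(struct model_dev *dev, int state)
{
	(void)dev;
	(void)state;
	return 0;
}

int model_sample_slot_reset(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

/* Valid if no callback is set, or if error_detected() is among those set. */
bool model_handlers_valid(const struct model_error_handlers *h)
{
	bool any = h->error_detected || h->mmio_enabled ||
		   h->slot_reset || h->resume;

	return !any || h->error_detected != NULL;
}
```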
 The actual steps taken by a platform to recover from a PCI error
 event will be platform-dependent, but will follow the general
 sequence described below.
 STEP 0: Error Event
 -------------------
 A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
 is isolated, in that all I/O is blocked: all reads return 0xffffffff,
 all writes are ignored.
 STEP 1: Notification
 --------------------
 Platform calls the error_detected() callback on every instance of
 every driver affected by the error.
 At this point, the device might not be accessible anymore, depending on
 the platform (the slot will be isolated on powerpc). The driver may
 already have "noticed" the error because of a failing I/O, but this
 is the proper "synchronization point", that is, it gives the driver
 a chance to clean up, waiting for pending stuff (timers, whatever, etc...)
 to complete; it can take semaphores, schedule, etc... everything but
 touch the device. Within this function and after it returns, the driver
 shouldn't do any new IOs. Called in task context. This is sort of a
 "quiesce" point. See note about interrupts at the end of this doc.
 All drivers participating in this system must implement this call.
 The driver must return one of the following result codes:
  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled, below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.
 The next step taken will depend on the result codes returned by the
 drivers.
 If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
 then the platform should re-enable IOs on the slot (or do nothing in
 particular, if the platform doesn't isolate slots), and recovery
 proceeds to STEP 2 (MMIO Enabled).
 If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
 then recovery proceeds to STEP 4 (Slot Reset).
 If the platform is unable to recover the slot, the next step
 is STEP 6 (Permanent Failure).
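 The two explicit rules above (all drivers can recover, or any driver needs
 a reset) can be sketched as a small decision function.  This is a
 user-space model, not platform code: enum model_step and
 model_step_after_notification() are hypothetical, and every combination
 the text does not spell out is reported as "platform policy" rather than
 guessed at:

```c
#include <stddef.h>

/* Mirrors the result codes listed earlier in this document. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the next step. */
enum model_step {
	MODEL_STEP_PLATFORM_POLICY = -1,	/* not specified by the text */
	MODEL_STEP_MMIO_ENABLED = 2,
	MODEL_STEP_SLOT_RESET = 4,
};

enum model_step
model_step_after_notification(const enum pci_ers_result *res, size_t n)
{
	size_t can_recover = 0;

	for (size_t i = 0; i < n; i++) {
		/* Any driver asking for a reset forces STEP 4. */
		if (res[i] == PCI_ERS_RESULT_NEED_RESET)
			return MODEL_STEP_SLOT_RESET;
		if (res[i] == PCI_ERS_RESULT_CAN_RECOVER)
			can_recover++;
	}
	/* Only a unanimous CAN_RECOVER leads to STEP 2. */
	if (n > 0 && can_recover == n)
		return MODEL_STEP_MMIO_ENABLED;
	return MODEL_STEP_PLATFORM_POLICY;
}
```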
 .. note::
   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.)  This seems excessively
   complex and not worth implementing.
   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not.  I/O's will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.
 STEP 2: MMIO Enabled
 --------------------
 The platform re-enables MMIO to the device (but typically not the
 DMA), and then calls the mmio_enabled() callback on all affected
 device drivers.
 This is the "early recovery" call. IOs are allowed again, but DMA is
 not, with some restrictions. This is NOT a callback for the driver to
 start operations again, only to peek/poke at the device, extract diagnostic
 information, if any, and eventually do things like trigger a device local
 reset or some such, but not restart operations. This callback is made if
 all drivers on a segment agree that they can try to recover and if no automatic
 link reset was performed by the HW. If the platform can't just re-enable IOs
 without a slot reset or a link reset, it will not call this callback, and
 instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).
 .. note::
   The following is proposed; no platform implements this yet:
   Proposal: All I/O's should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.
 The driver should return one of the following result codes:
  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.
  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure, no recovery even after
      reset; driver dead. (To be defined more precisely)
 The next step taken depends on the results returned by the drivers.
 If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
 proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).
 If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
 proceeds to STEP 4 (Slot Reset).
 STEP 3: Link Reset
 ------------------
 The platform resets the link.  This is a PCI-Express specific step
 and is done whenever a fatal error has been detected that can be
 "solved" by resetting the link.
 STEP 4: Slot Reset
 ------------------
 In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
 platform will perform a slot reset on the requesting PCI device(s).
 The actual steps taken by a platform to perform a slot reset
 will be platform-dependent. Upon completion of slot reset, the
 platform will call the device slot_reset() callback.
 Powerpc platforms implement two levels of slot reset:
 soft reset(default) and fundamental(optional) reset.
 Powerpc soft reset consists of asserting the adapter #RST line and then
 restoring the PCI BAR's and PCI configuration header to a state
 that is equivalent to what it would be after a fresh system
 power-on followed by power-on BIOS/system firmware initialization.
 Soft reset is also known as hot-reset.
 Powerpc fundamental reset is supported by PCI Express cards only
 and resets the device's state machines, hardware logic, port states and
 configuration registers to their default conditions.
 For most PCI devices, a soft reset will be sufficient for recovery.
 Optional fundamental reset is provided to support a limited number
 of PCI Express devices for which a soft reset is not sufficient
 for recovery.
 If the platform supports PCI hotplug, then the reset might be
 performed by toggling the slot electrical power off/on.
 It is important for the platform to restore the PCI config space
 to the "fresh poweron" state, rather than the "last state". After
 a slot reset, the device driver will almost always use its standard
 device initialization routines, and an unusual config space setup
 may result in hung devices, kernel panics, or silent data corruption.
 This call gives drivers the chance to re-initialize the hardware
 (re-download firmware, etc.).  At this point, the driver may assume
 that the card is in a fresh state and is fully functional. The slot
 is unfrozen and the driver has full access to PCI config space,
 memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
 will also be available.
 Drivers should not restart normal I/O processing operations
 at this point.  If all device drivers report success on this
 callback, the platform will call resume() to complete the sequence,
 and let the driver restart normal I/O processing.
 A driver can still return a critical failure for this function if
 it can't get the device operational after reset.  If the platform
 previously tried a soft reset, it might now try a hard reset (power
 cycle) and then call slot_reset() again.  If the device still can't
 be recovered, there is nothing more that can be done;  the platform
 will typically report a "permanent failure" in such a case.  The
 device will be considered "dead" in this case.
 Drivers for multi-function cards will need to coordinate among
 themselves as to which driver instance will perform any "one-shot"
 or global device initialization. For example, the Symbios sym53cxx2
 driver performs device init only from PCI function 0::
 	+       if (PCI_FUNC(pdev->devfn) == 0)
 	+               sym_reset_scsi_bus(np, 0);
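 The function-0 test in the snippet above depends on how devfn packs the
 device and function numbers.  As a stand-alone illustration (user-space C;
 the MODEL_ macros reproduce the bit layout decoded by the kernel's
 PCI_SLOT() and PCI_FUNC() macros, and model_should_do_global_init() is a
 hypothetical helper):

```c
/*
 * devfn layout: bits 7:3 hold the device (slot) number,
 * bits 2:0 hold the function number.
 */
#define MODEL_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define MODEL_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Only function 0 performs the one-shot, card-wide initialization. */
int model_should_do_global_init(unsigned int devfn)
{
	return MODEL_PCI_FUNC(devfn) == 0;
}
```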
 Result codes:
 	- PCI_ERS_RESULT_DISCONNECT
 	  Same as above.
 Drivers for PCI Express cards that require a fundamental reset must
 set the needs_freset bit in the pci_dev structure in their probe function.
 For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
 PCI card types::
 	+	/* Set EEH reset type to fundamental if required by hba  */
 	+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
 	+		pdev->needs_freset = 1;
 	+
 Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
 Failure).
 .. note::
   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.
 STEP 5: Resume Operations
 -------------------------
 The platform will call the resume() callback on all affected device
 drivers if all drivers on the segment have returned
 PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
 The goal of this callback is to tell the driver to restart activity,
 that everything is back and running. This callback does not return
 a result code.
 At this point, if a new error happens, the platform will restart
 a new error recovery sequence.
 STEP 6: Permanent Failure
 -------------------------
 A "permanent failure" has occurred, and the platform cannot recover
 the device.  The platform will call error_detected() with a
 pci_channel_state value of pci_channel_io_perm_failure.
 The device driver should, at this point, assume the worst. It should
 cancel all pending I/O, refuse all new I/O, returning -EIO to
 higher layers. The device driver should then clean up all of its
 memory and remove itself from kernel operations, much as it would
 during system shutdown.
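 A driver's reaction to a permanent failure might be structured roughly as
 follows.  This is a user-space model under stated assumptions: struct
 model_dev, model_error_detected() and model_submit_io() are hypothetical,
 the enum repeats the channel states listed earlier, and only the
 permanent-failure case is handled:

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the channel states listed earlier in this document. */
enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Hypothetical per-device state for this sketch. */
struct model_dev {
	bool dead;		/* set once the device is given up on */
	unsigned int pending;	/* number of outstanding requests */
};

void model_error_detected(struct model_dev *d, enum pci_channel_state state)
{
	/* Other channel states are out of scope for this sketch. */
	if (state == pci_channel_io_perm_failure) {
		d->dead = true;
		d->pending = 0;	/* cancel all pending I/O */
	}
}

int model_submit_io(struct model_dev *d)
{
	if (d->dead)
		return -EIO;	/* refuse all new I/O */
	d->pending++;
	return 0;
}
```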
 The platform will typically notify the system operator of the
 permanent failure in some way.  If the device is hotplug-capable,
 the operator will probably want to remove and replace the device.
 Note, however, not all failures are truly "permanent". Some are
 caused by over-heating, some by a poorly seated card. Many
 PCI error events are caused by software bugs, e.g. DMA's to
 wild addresses or bogus split transactions due to programming
 errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
 for additional detail on real-life experience of the causes of
 software errors.
 Conclusion; General Remarks
 ---------------------------
 The way the callbacks are called is platform policy. A platform with
 no slot reset capability may want to just "ignore" drivers that can't
 recover (disconnect them) and try to let other cards on the same segment
 recover. Keep in mind that in most real life cases, though, there will
 be only one driver per segment.
 Now, a note about interrupts. If you get an interrupt and your
 device is dead or has been isolated, there is a problem :)
 The current policy is to turn this into a platform policy.
 That is, the recovery API only requires that:
 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.
 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NONE. It's up to the platform to deal with that
   condition, typically by masking the IRQ source during the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)
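 The second rule can be pictured as an interrupt handler that declines the
 interrupt once it has seen an error.  The sketch below is a user-space
 model, not a real handler: the MODEL_IRQ_ values stand in for the kernel's
 irqreturn_t, struct model_irq_dev is invented, and treating an all-ones
 status read as the error indication is an assumption made purely for the
 example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
};

struct model_irq_dev {
	bool in_error;		/* set once an error has been noticed */
	unsigned int serviced;	/* interrupts actually handled */
};

enum model_irqreturn
model_irq_handler(struct model_irq_dev *d, uint32_t status)
{
	/* Assumption: an all-ones status read indicates an isolated slot. */
	if (status == UINT32_MAX)
		d->in_error = true;
	if (d->in_error)
		return MODEL_IRQ_NONE;	/* cannot ack; leave it to the platform */
	d->serviced++;
	return MODEL_IRQ_HANDLED;
}
```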
 .. note::
   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.txt
   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":
   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c
   - drivers/net/qlge
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@ -1,413 +0,0 @@
                       PCI Error Recovery
                       ------------------
                        February 2, 2006
                 Current document maintainer:
             Linas Vepstas <linasvepstas@gmail.com>
          updated by Richard Lary <rlary@us.ibm.com>
       and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
 Many PCI bus controllers are able to detect a variety of hardware
 PCI errors on the bus, such as parity errors on the data and address
 buses, as well as SERR and PERR errors.  Some of the more advanced
 chipsets are able to deal with these errors; these include PCI-E chipsets,
 and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
 pSeries boxes. A typical action taken is to disconnect the affected device,
 halting all I/O to it.  The goal of a disconnection is to avoid system
 corruption; for example, to halt system memory corruption due to DMA's
 to "wild" addresses. Typically, a reconnection mechanism is also
 offered, so that the affected PCI device(s) are reset and put back
 into working condition. The reset phase requires coordination
 between the affected device drivers and the PCI controller chip.
 This document describes a generic API for notifying device drivers
 of a bus disconnection, and then performing error recovery.
 This API is currently implemented in the 2.6.16 and later kernels.
 Reporting and recovery is performed in several steps. First, when
 a PCI hardware error has resulted in a bus disconnect, that event
 is reported as soon as possible to all affected device drivers,
 including multiple instances of a device driver on multi-function
 cards. This allows device drivers to avoid deadlocking in spinloops,
 waiting for some i/o-space register to change, when it never will.
 It also gives the drivers a chance to defer incoming I/O as
 needed.
 Next, recovery is performed in several stages. Most of the complexity
 is forced by the need to handle multi-function devices, that is,
 devices that have multiple device drivers associated with them.
 In the first stage, each driver is allowed to indicate what type
 of reset it desires, the choices being a simple re-enabling of I/O
 or requesting a slot reset.
 If any driver requests a slot reset, that is what will be done.
 After a reset and/or a re-enabling of I/O, all drivers are
 again notified, so that they may then perform any device setup/config
 that may be required.  After these have all completed, a final
 "resume normal operations" event is sent out.
 The biggest reason for choosing a kernel-based implementation rather
 than a user-space implementation was the need to deal with bus
 disconnects of PCI devices attached to storage media, and, in particular,
 disconnects from devices holding the root file system.  If the root
 file system is disconnected, a user-space mechanism would have to go
 through a large number of contortions to complete recovery. Almost all
 of the current Linux file systems are not tolerant of disconnection
 from/reconnection to their underlying block device. By contrast,
 bus errors are easy to manage in the device driver. Indeed, most
 device drivers already handle very similar recovery procedures;
 for example, the SCSI-generic layer already provides significant
 mechanisms for dealing with SCSI bus errors and SCSI bus resets.
 Detailed Design
 ---------------
 Design and implementation details below, based on a chain of
 public email discussions with Ben Herrenschmidt, circa 5 April 2005.
 The error recovery API support is exposed to the driver in the form of
 a structure of function pointers pointed to by a new field in struct
 pci_driver. A driver that fails to provide the structure is "non-aware",
 and the actual recovery steps taken are platform dependent.  The
 arch/powerpc implementation will simulate a PCI hotplug remove/add.
 This structure has the form:
 struct pci_error_handlers
 {
 	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
 	int (*mmio_enabled)(struct pci_dev *dev);
 	int (*slot_reset)(struct pci_dev *dev);
 	void (*resume)(struct pci_dev *dev);
 };
 The possible channel states are:
 enum pci_channel_state {
 	pci_channel_io_normal,  /* I/O channel is in normal state */
 	pci_channel_io_frozen,  /* I/O to channel is blocked */
 	pci_channel_io_perm_failure, /* PCI card is dead */
 };
 Possible return values are:
 enum pci_ers_result {
 	PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
 	PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
 	PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
 	PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
 	PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
 };
 A driver does not have to implement all of these callbacks; however,
 if it implements any, it must implement error_detected(). If a callback
 is not implemented, the corresponding feature is considered unsupported.
 For example, if mmio_enabled() and resume() aren't there, then it
 is assumed that the driver is not doing any direct recovery and requires
 a slot reset.  Typically a driver will want to know about
 a slot_reset().
 The actual steps taken by a platform to recover from a PCI error
 event will be platform-dependent, but will follow the general
 sequence described below.
 STEP 0: Error Event
 -------------------
 A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
 is isolated, in that all I/O is blocked: all reads return 0xffffffff,
 all writes are ignored.
 STEP 1: Notification
 --------------------
 Platform calls the error_detected() callback on every instance of
 every driver affected by the error.
 At this point, the device might not be accessible anymore, depending on
 the platform (the slot will be isolated on powerpc). The driver may
 already have "noticed" the error because of a failing I/O, but this
 is the proper "synchronization point", that is, it gives the driver
 a chance to cleanup, waiting for pending stuff (timers, whatever, etc...)
 to complete; it can take semaphores, schedule, etc... everything but
 touch the device. Within this function and after it returns, the driver
 shouldn't do any new IOs. Called in task context. This is sort of a
 "quiesce" point. See note about interrupts at the end of this doc.
 All drivers participating in this system must implement this call.
 The driver must return one of the following result codes:
 		- PCI_ERS_RESULT_CAN_RECOVER:
 		  Driver returns this if it thinks it might be able to recover
 		  the HW by just banging IOs or if it wants to be given
 		  a chance to extract some diagnostic information (see
 		  mmio_enabled(), below).
 		- PCI_ERS_RESULT_NEED_RESET:
 		  Driver returns this if it can't recover without a
 		  slot reset.
 		- PCI_ERS_RESULT_DISCONNECT:
 		  Driver returns this if it doesn't want to recover at all.
 The next step taken will depend on the result codes returned by the
 drivers.
 If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
 then the platform should re-enable IOs on the slot (or do nothing in
 particular, if the platform doesn't isolate slots), and recovery
 proceeds to STEP 2 (MMIO Enable).
 If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
 then recovery proceeds to STEP 4 (Slot Reset).
 If the platform is unable to recover the slot, the next step
 is STEP 6 (Permanent Failure).
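 The decision described above can be sketched in plain C. This is only an
 illustration, not the code of the kernel or of any platform: the enum
 names and next_step_after_detect() are hypothetical stand-ins, and sending
 a DISCONNECT result straight to STEP 6 is an assumption, since that choice
 is platform policy.

```c
#include <stddef.h>

/* Local stand-ins for the kernel's enum pci_ers_result values. */
enum ers_result {
	ERS_RESULT_NONE,
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_DISCONNECT,
	ERS_RESULT_RECOVERED,
};

/* Recovery steps, numbered as in this document. */
enum recovery_step {
	STEP_MMIO_ENABLED = 2,
	STEP_SLOT_RESET = 4,
	STEP_PERMANENT_FAILURE = 6,
};

/*
 * Hypothetical helper: pick the step that follows error_detected()
 * from the result codes of all drivers on the segment.
 */
enum recovery_step next_step_after_detect(const enum ers_result *res, size_t n)
{
	int need_reset = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Assumption: one DISCONNECT fails the whole segment. */
		if (res[i] == ERS_RESULT_DISCONNECT)
			return STEP_PERMANENT_FAILURE;
		/* Assumption: anything but CAN_RECOVER asks for a reset. */
		if (res[i] != ERS_RESULT_CAN_RECOVER)
			need_reset = 1;
	}
	return need_reset ? STEP_SLOT_RESET : STEP_MMIO_ENABLED;
}
```

 A single NEED_RESET result is enough to move the whole segment to STEP 4,
 even if every other driver reported CAN_RECOVER.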
 >>> The current powerpc implementation assumes that a device driver will
 >>> *not* schedule or semaphore in this routine; the current powerpc
 >>> implementation uses one kernel thread to notify all devices;
 >>> thus, if one device sleeps/schedules, all devices are affected.
 >>> Doing better requires complex multi-threaded logic in the error
 >>> recovery implementation (e.g. waiting for all notification threads
 >>> to "join" before proceeding with recovery.)  This seems excessively
 >>> complex and not worth implementing.
 >>> The current powerpc implementation doesn't much care if the device
 >>> attempts I/O at this point, or not.  I/O's will fail, returning
 >>> a value of 0xff on read, and writes will be dropped. If more than
 >>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
 >>> assumes that the device driver has gone into an infinite loop
 >>> and prints an error to syslog.  A reboot is then required to
 >>> get the device working again.
 STEP 2: MMIO Enabled
 --------------------
 The platform re-enables MMIO to the device (but typically not the
 DMA), and then calls the mmio_enabled() callback on all affected
 device drivers.
 This is the "early recovery" call. IOs are allowed again (with some
 restrictions), but DMA is not. This is NOT a callback for the driver to
 start operations again; it exists only so the driver can peek/poke at the
 device, extract diagnostic information, if any, and possibly trigger a
 device-local reset or similar. This callback is made if all drivers on a
 segment agree that they can try to recover and if no automatic link reset
 was performed by the HW. If the platform can't just re-enable IOs without
 a slot reset or a link reset, it will not call this callback, and instead
 will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).
 >>> The following is proposed; no platform implements this yet:
 >>> Proposal: All I/O's should be done _synchronously_ from within
 >>> this callback, errors triggered by them will be returned via
 >>> the normal pci_check_whatever() API, no new error_detected()
 >>> callback will be issued due to an error happening here. However,
 >>> such an error might cause IOs to be re-blocked for the whole
 >>> segment, and thus invalidate the recovery that other devices
 >>> on the same segment might have done, forcing the whole segment
 >>> into one of the next states, that is, link reset or slot reset.
 The driver should return one of the following result codes:
 		- PCI_ERS_RESULT_RECOVERED
 		  Driver returns this if it thinks the device is fully
 		  functional and thinks it is ready to start
 		  normal driver operations again. There is no
 		  guarantee that the driver will actually be
 		  allowed to proceed, as another driver on the
 		  same segment might have failed and thus triggered a
 		  slot reset on platforms that support it.
 		- PCI_ERS_RESULT_NEED_RESET
 		  Driver returns this if it thinks the device is not
 		  recoverable in its current state and it needs a slot
 		  reset to proceed.
 		- PCI_ERS_RESULT_DISCONNECT
 		  Same as above. Total failure: no recovery is possible even
 		  after a reset, and the driver is dead. (To be defined more
 		  precisely.)
 The next step taken depends on the results returned by the drivers.
 If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
 proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).
 If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
 proceeds to STEP 4 (Slot Reset).
 STEP 3: Link Reset
 ------------------
 The platform resets the link.  This is a PCI-Express specific step
 and is done whenever a fatal error has been detected that can be
 "solved" by resetting the link.
 STEP 4: Slot Reset
 ------------------
 In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
 platform will perform a slot reset on the requesting PCI device(s).
 The actual steps taken by a platform to perform a slot reset
 will be platform-dependent. Upon completion of slot reset, the
 platform will call the device slot_reset() callback.
 Powerpc platforms implement two levels of slot reset:
 soft reset (default) and fundamental (optional) reset.
 Powerpc soft reset consists of asserting the adapter #RST line and then
 restoring the PCI BAR's and PCI configuration header to a state
 that is equivalent to what it would be after a fresh system
 power-on followed by power-on BIOS/system firmware initialization.
 Soft reset is also known as hot-reset.
 Powerpc fundamental reset is supported by PCI Express cards only
 and causes the device's state machines, hardware logic, port states and
 configuration registers to be initialized to their default conditions.
 For most PCI devices, a soft reset will be sufficient for recovery.
 Optional fundamental reset is provided to support a limited number
 of PCI Express devices for which a soft reset is not sufficient
 for recovery.
 If the platform supports PCI hotplug, then the reset might be
 performed by toggling the slot electrical power off/on.
 It is important for the platform to restore the PCI config space
 to the "fresh poweron" state, rather than the "last state". After
 a slot reset, the device driver will almost always use its standard
 device initialization routines, and an unusual config space setup
 may result in hung devices, kernel panics, or silent data corruption.
 This call gives drivers the chance to re-initialize the hardware
 (re-download firmware, etc.).  At this point, the driver may assume
 that the card is in a fresh state and is fully functional. The slot
 is unfrozen and the driver has full access to PCI config space,
 memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
 will also be available.
 Drivers should not restart normal I/O processing operations
 at this point.  If all device drivers report success on this
 callback, the platform will call resume() to complete the sequence,
 and let the driver restart normal I/O processing.
 A driver can still return a critical failure for this function if
 it can't get the device operational after reset.  If the platform
 previously tried a soft reset, it might now try a hard reset (power
 cycle) and then call slot_reset() again.  If the device still can't
 be recovered, there is nothing more that can be done;  the platform
 will typically report a "permanent failure" in such a case.  The
 device will be considered "dead" in this case.
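 The escalation described above (try a soft reset, then a hard reset, then
 give up) might look like the following sketch. It is plain C for
 illustration only; the reset levels, the callback type and all demo_*
 names are hypothetical and do not correspond to a kernel interface.

```c
/* Hypothetical reset levels a platform might try, mildest first. */
enum demo_reset { DEMO_RESET_SOFT, DEMO_RESET_HARD, DEMO_RESET_COUNT };

/*
 * "try_reset" performs one reset of the given level and reports whether
 * the driver's slot_reset() callback succeeded afterwards (non-zero).
 * Returns 1 if the device was recovered, 0 if it must be declared dead.
 */
int demo_reset_escalate(int (*try_reset)(enum demo_reset level))
{
	int level;

	for (level = DEMO_RESET_SOFT; level < DEMO_RESET_COUNT; level++) {
		if (try_reset((enum demo_reset)level))
			return 1;	/* proceed to STEP 5 */
	}
	return 0;			/* proceed to STEP 6 */
}

/* Example callback: pretend that only a power cycle revives the card. */
int demo_needs_hard_reset(enum demo_reset level)
{
	return level == DEMO_RESET_HARD;
}

/* Example callback: pretend that no reset helps. */
int demo_never_recovers(enum demo_reset level)
{
	(void)level;
	return 0;
}
```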
 Drivers for multi-function cards will need to coordinate among
 themselves as to which driver instance will perform any "one-shot"
 or global device initialization. For example, the Symbios sym53cxx2
 driver performs device init only from PCI function 0:
 +       if (PCI_FUNC(pdev->devfn) == 0)
 +               sym_reset_scsi_bus(np, 0);
 	Result codes:
 		- PCI_ERS_RESULT_DISCONNECT
 		Same as above.
 Drivers for PCI Express cards that require a fundamental reset must
 set the needs_freset bit in the pci_dev structure in their probe function.
 For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
 PCI card types:
 +	/* Set EEH reset type to fundamental if required by hba  */
 +	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
 +		pdev->needs_freset = 1;
 +
 Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
 Failure).
 >>> The current powerpc implementation does not try a power-cycle
 >>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
 >>> However, it probably should.
 STEP 5: Resume Operations
 -------------------------
 The platform will call the resume() callback on all affected device
 drivers if all drivers on the segment have returned
 PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
 The goal of this callback is to tell the driver to restart activity,
 that everything is back and running. This callback does not return
 a result code.
 At this point, if a new error happens, the platform will restart
 a new error recovery sequence.
 STEP 6: Permanent Failure
 -------------------------
 A "permanent failure" has occurred, and the platform cannot recover
 the device.  The platform will call error_detected() with a
 pci_channel_state value of pci_channel_io_perm_failure.
 The device driver should, at this point, assume the worst. It should
 cancel all pending I/O, refuse all new I/O, returning -EIO to
 higher layers. The device driver should then clean up all of its
 memory and remove itself from kernel operations, much as it would
 during system shutdown.
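 A minimal sketch of "refuse all new I/O" follows. It is plain C with a
 made-up device structure; struct demo_dev and the demo_* functions are
 hypothetical, and resetting the pending counter merely stands in for
 whatever cancellation a real driver would perform.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device driver state; not a kernel structure. */
struct demo_dev {
	bool dead;	/* set once a permanent failure is reported */
	int pending;	/* number of requests still in flight */
};

/* Called when the platform reports a permanent failure. */
void demo_mark_dead(struct demo_dev *d)
{
	d->dead = true;		/* refuse all new I/O from now on */
	d->pending = 0;		/* stands in for cancelling pending requests */
}

/* Entry point used by higher layers to queue a request. */
int demo_submit_io(struct demo_dev *d)
{
	if (d->dead)
		return -EIO;
	d->pending++;
	return 0;
}
```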
 The platform will typically notify the system operator of the
 permanent failure in some way.  If the device is hotplug-capable,
 the operator will probably want to remove and replace the device.
 Note, however, not all failures are truly "permanent". Some are
 caused by over-heating, some by a poorly seated card. Many
 PCI error events are caused by software bugs, e.g. DMA's to
 wild addresses or bogus split transactions due to programming
 errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
 for additional detail on real-life experience of the causes of
 software errors.
 Conclusion; General Remarks
 ---------------------------
 The way the callbacks are called is platform policy. A platform with
 no slot reset capability may want to just "ignore" drivers that can't
 recover (disconnect them) and try to let other cards on the same segment
 recover. Keep in mind that in most real life cases, though, there will
 be only one driver per segment.
 Now, a note about interrupts. If you get an interrupt and your
 device is dead or has been isolated, there is a problem :)
 The current policy is to turn this into a platform policy.
 That is, the recovery API only requires that:
 - There is no guarantee that interrupt delivery can proceed from any
 device on the segment starting from the error detection and until the
 slot_reset callback is called, at which point interrupts are expected
 to be fully operational.
 - There is no guarantee that interrupt delivery is stopped, that is,
 a driver that gets an interrupt after detecting an error, or that detects
 an error within the interrupt handler such that it prevents proper
 ack'ing of the interrupt (and thus removal of the source) should just
 return IRQ_NONE. It's up to the platform to deal with that
 condition, typically by masking the IRQ source during the duration of
 the error handling. It is expected that the platform "knows" which
 interrupts are routed to error-management capable slots and can deal
 with temporarily disabling that IRQ number during error processing (this
 isn't terribly complex). That means some IRQ latency for other devices
 sharing the interrupt, but there is simply no other way. High end
 platforms aren't supposed to share interrupts between many devices
 anyway :)
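 The second rule can be illustrated with a handler that reports "not
 handled" (IRQ_NONE in the kernel's irqreturn_t) when the device looks
 isolated. The sketch below is plain C; the enum is a local stand-in, and
 the idea of a status register in which 0 means "no interrupt pending" is
 an assumption about a hypothetical device.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's irqreturn_t values. */
enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/*
 * Hypothetical handler body: "status" is the value just read from the
 * device's interrupt status register.
 */
enum demo_irqreturn demo_irq(uint32_t status)
{
	/* All ones is what a read from an isolated slot returns on powerpc. */
	if (status == 0xffffffffu)
		return DEMO_IRQ_NONE;
	/* No bit set: the interrupt belongs to another device on the line. */
	if (status == 0)
		return DEMO_IRQ_NONE;
	/* ...acknowledge and service the interrupt here... */
	return DEMO_IRQ_HANDLED;
}
```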
 >>> Implementation details for the powerpc platform are discussed in
 >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt.
 >>> As of this writing, there is a growing list of device drivers with
 >>> patches implementing error recovery. Not all of these patches are in
 >>> mainline yet. These may be used as "examples":
 >>>
 >>> drivers/scsi/ipr
 >>> drivers/scsi/sym53c8xx_2
 >>> drivers/scsi/qla2xxx
 >>> drivers/scsi/lpfc
 >>> drivers/net/bnx2.c
 >>> drivers/net/e100.c
 >>> drivers/net/e1000
 >>> drivers/net/e1000e
 >>> drivers/net/ixgb
 >>> drivers/net/ixgbe
 >>> drivers/net/cxgb3
 >>> drivers/net/s2io.c
 >>> drivers/net/qlge
 The End
 -------
--- a/Documentation/PCI/pci-iov-howto.rst
+++ b/Documentation/PCI/pci-iov-howto.rst
@ -0,0 +1,172 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 ====================================
 PCI Express I/O Virtualization Howto
 ====================================
 :Copyright: |copy| 2009 Intel Corporation
 :Authors: - Yu Zhao <yu.zhao@intel.com>
          - Donald Dutile <ddutile@redhat.com>
 Overview
 ========
 What is SR-IOV
 --------------
 Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
 capability which makes one physical device appear as multiple virtual
 devices. The physical device is referred to as Physical Function (PF)
 while the virtual devices are referred to as Virtual Functions (VF).
 Allocation of the VF can be dynamically controlled by the PF via
 registers encapsulated in the capability. By default, this feature is
 not enabled and the PF behaves as a traditional PCIe device. Once it's
 turned on, each VF's PCI configuration space can be accessed by its own
 Bus, Device and Function Number (Routing ID). Each VF also has a PCI
 Memory Space, which is used to map its register set. The VF device driver
 operates on the register set so that the VF can be functional and appear
 as a real, existing PCI device.
 User Guide
 ==========
 How can I enable SR-IOV capability
 ----------------------------------
 Multiple methods are available for SR-IOV enablement.
 In the first method, the device driver (PF driver) will control the
 enabling and disabling of the capability via API provided by SR-IOV core.
 If the hardware has SR-IOV capability, loading its PF driver would
 enable it and all VFs associated with the PF.  Some PF drivers require
 a module parameter to be set to determine the number of VFs to enable.
 In the second method, a write to the sysfs file sriov_numvfs will
 enable and disable the VFs associated with a PCIe PF.  This method
 enables per-PF, VF enable/disable values versus the first method,
 which applies to all PFs of the same device.  Additionally, the
 PCI SRIOV core support ensures that enable/disable operations are
 valid to reduce duplication in multiple drivers for the same
 checks, e.g., check numvfs == 0 if enabling VFs, ensure
 numvfs <= totalvfs.
 The second method is the recommended method for new/future VF devices.
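 The checks mentioned above can be modelled as follows. This is a plain C
 sketch, not the SR-IOV core's code: demo_check_numvfs() is hypothetical,
 and both the error codes and the handling of a request equal to the
 current value are assumptions made for illustration.

```c
#include <errno.h>

/*
 * Hypothetical model of the checks made on a write to sriov_numvfs.
 * Returns 0 if the request may proceed, or a negative errno.
 */
int demo_check_numvfs(int cur_vfs, int req_vfs, int total_vfs)
{
	if (req_vfs < 0 || req_vfs > total_vfs)
		return -ERANGE;		/* numvfs <= totalvfs must hold */
	if (req_vfs == cur_vfs)
		return 0;		/* assumption: nothing to change */
	if (req_vfs == 0)
		return 0;		/* disabling is always acceptable */
	if (cur_vfs != 0)
		return -EBUSY;		/* numvfs must be 0 before enabling */
	return 0;
}
```

 With these rules, changing the number of VFs from 4 to 2 requires writing
 0 first and then 2.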
 How can I use the Virtual Functions
 -----------------------------------
 VFs are treated as hot-plugged PCI devices in the kernel, so they
 should be able to work in the same way as real PCI devices. A VF
 requires a device driver, just as a normal PCI device does.
 Developer Guide
 ===============
 SR-IOV API
 ----------
 To enable SR-IOV capability:
 (a) For the first method, in the driver::
 	int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 'nr_virtfn' is the number of VFs to be enabled.
 (b) For the second method, from sysfs::
 	echo 'nr_virtfn' > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 To disable SR-IOV capability:
 (a) For the first method, in the driver::
 	void pci_disable_sriov(struct pci_dev *dev);
 (b) For the second method, from sysfs::
 	echo  0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
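 For programs that prefer not to shell out to echo, the same sysfs write
 can be done from userspace C. The sketch below is only an illustration;
 demo_numvfs_path() and demo_write_numvfs() are hypothetical helper names,
 and the address is formatted in the usual DOMAIN:BUS:DEVICE.FUNCTION
 notation in hexadecimal.

```c
#include <stdio.h>

/*
 * Build the sysfs path of the sriov_numvfs attribute for one PF.
 * Returns the snprintf() result: the length needed, or negative on error.
 */
int demo_numvfs_path(char *buf, size_t len, unsigned int domain,
		     unsigned int bus, unsigned int dev, unsigned int fn)
{
	return snprintf(buf, len,
			"/sys/bus/pci/devices/%04x:%02x:%02x.%x/sriov_numvfs",
			domain, bus, dev, fn);
}

/* Write "numvfs" to the attribute at "path". Returns 0 or -1. */
int demo_write_numvfs(const char *path, unsigned int numvfs)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", numvfs) > 0;
	/* A rejected value may only be reported when the data is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```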
 To enable automatic probing of VFs by a compatible driver on the host,
 run the command below before enabling SR-IOV capabilities. This is the
 default behavior.
 ::
 	echo 1 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 To disable automatic probing of VFs by a compatible driver on the host,
 run the command below before enabling SR-IOV capabilities. Updating this
 entry will not affect VFs which are already probed.
 ::
 	echo  0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 Usage example
 -------------
 The following piece of code illustrates the usage of the SR-IOV API.
 ::
 	static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	{
 		pci_enable_sriov(dev, NR_VIRTFN);
 		...
 		return 0;
 	}
 	static void dev_remove(struct pci_dev *dev)
 	{
 		pci_disable_sriov(dev);
 		...
 	}
 	static int dev_suspend(struct pci_dev *dev, pm_message_t state)
 	{
 		...
 		return 0;
 	}
 	static int dev_resume(struct pci_dev *dev)
 	{
 		...
 		return 0;
 	}
 	static void dev_shutdown(struct pci_dev *dev)
 	{
 		...
 	}
 	static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
 	{
 		if (numvfs > 0) {
 			...
 			pci_enable_sriov(dev, numvfs);
 			...
 			return numvfs;
 		}
 		if (numvfs == 0) {
 			...
 			pci_disable_sriov(dev);
 			...
 			return 0;
 		}
 	}
 	static struct pci_driver dev_driver = {
 		.name =		"SR-IOV Physical Function driver",
 		.id_table =	dev_id_table,
 		.probe =	dev_probe,
 		.remove =	dev_remove,
 		.suspend =	dev_suspend,
 		.resume =	dev_resume,
 		.shutdown =	dev_shutdown,
 		.sriov_configure = dev_sriov_configure,
 	};
--- a/Documentation/PCI/pci-iov-howto.txt
+++ b/Documentation/PCI/pci-iov-howto.txt
@ -1,147 +0,0 @@
 		PCI Express I/O Virtualization Howto
 		Copyright (C) 2009 Intel Corporation
 		    Yu Zhao <yu.zhao@intel.com>
 		Update: November 2012
 			-- sysfs-based SRIOV enable-/disable-ment
 		Donald Dutile <ddutile@redhat.com>
 1. Overview
 1.1 What is SR-IOV
 Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
 capability which makes one physical device appear as multiple virtual
 devices. The physical device is referred to as Physical Function (PF)
 while the virtual devices are referred to as Virtual Functions (VF).
 Allocation of the VF can be dynamically controlled by the PF via
 registers encapsulated in the capability. By default, this feature is
 not enabled and the PF behaves as a traditional PCIe device. Once it's
 turned on, each VF's PCI configuration space can be accessed by its own
 Bus, Device and Function Number (Routing ID). Each VF also has a PCI
 Memory Space, which is used to map its register set. The VF device driver
 operates on the register set so that the VF can be functional and appear
 as a real, existing PCI device.
 2. User Guide
 2.1 How can I enable SR-IOV capability
 Multiple methods are available for SR-IOV enablement.
 In the first method, the device driver (PF driver) will control the
 enabling and disabling of the capability via API provided by SR-IOV core.
 If the hardware has SR-IOV capability, loading its PF driver would
 enable it and all VFs associated with the PF.  Some PF drivers require
 a module parameter to be set to determine the number of VFs to enable.
 In the second method, a write to the sysfs file sriov_numvfs will
 enable and disable the VFs associated with a PCIe PF.  This method
 enables per-PF, VF enable/disable values versus the first method,
 which applies to all PFs of the same device.  Additionally, the
 PCI SRIOV core support ensures that enable/disable operations are
 valid to reduce duplication in multiple drivers for the same
 checks, e.g., check numvfs == 0 if enabling VFs, ensure
 numvfs <= totalvfs.
 The second method is the recommended method for new/future VF devices.
 2.2 How can I use the Virtual Functions
 VFs are treated as hot-plugged PCI devices in the kernel, so they
 should be able to work in the same way as real PCI devices. A VF
 requires a device driver, just as a normal PCI device does.
 3. Developer Guide
 3.1 SR-IOV API
 To enable SR-IOV capability:
 (a) For the first method, in the driver:
 	int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 	'nr_virtfn' is the number of VFs to be enabled.
 (b) For the second method, from sysfs:
 	echo 'nr_virtfn' > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 To disable SR-IOV capability:
 (a) For the first method, in the driver:
 	void pci_disable_sriov(struct pci_dev *dev);
 (b) For the second method, from sysfs:
 	echo  0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 To enable automatic probing of VFs by a compatible driver on the host,
 run the command below before enabling SR-IOV capabilities. This is the
 default behavior.
 	echo 1 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 To disable automatic probing of VFs by a compatible driver on the host,
 run the command below before enabling SR-IOV capabilities. Updating this
 entry will not affect VFs which are already probed.
 	echo  0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
 3.2 Usage example
 The following piece of code illustrates the usage of the SR-IOV API.
 static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	pci_enable_sriov(dev, NR_VIRTFN);
 	...
 	return 0;
 }
 static void dev_remove(struct pci_dev *dev)
 {
 	pci_disable_sriov(dev);
 	...
 }
 static int dev_suspend(struct pci_dev *dev, pm_message_t state)
 {
 	...
 	return 0;
 }
 static int dev_resume(struct pci_dev *dev)
 {
 	...
 	return 0;
 }
 static void dev_shutdown(struct pci_dev *dev)
 {
 	...
 }
 static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
 {
 	if (numvfs > 0) {
 		...
 		pci_enable_sriov(dev, numvfs);
 		...
 		return numvfs;
 	}
 	if (numvfs == 0) {
 		...
 		pci_disable_sriov(dev);
 		...
 		return 0;
 	}
 }
 static struct pci_driver dev_driver = {
 	.name =		"SR-IOV Physical Function driver",
 	.id_table =	dev_id_table,
 	.probe =	dev_probe,
 	.remove =	dev_remove,
 	.suspend =	dev_suspend,
 	.resume =	dev_resume,
 	.shutdown =	dev_shutdown,
 	.sriov_configure = dev_sriov_configure,
 };
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@ -0,0 +1,578 @@
 .. SPDX-License-Identifier: GPL-2.0
 ==============================
 How To Write Linux PCI Drivers
 ==============================
 :Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>
 The world of PCI is vast and full of (mostly unpleasant) surprises.
 Since each CPU architecture implements different chip-sets and PCI devices
 have different requirements (erm, "features"), the result is that PCI support
 in the Linux kernel is not as trivial as one would wish. This short paper
 tries to introduce all potential driver authors to Linux APIs for
 PCI device drivers.
 A more complete resource is the third edition of "Linux Device Drivers"
 by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
 LDD3 is available for free (under Creative Commons License) from:
 http://lwn.net/Kernel/LDD3/.
 However, keep in mind that all documents are subject to "bit rot".
 Refer to the source code if things are not working as described here.
 Please send questions/comments/patches about Linux PCI API to the
 "Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
 Structure of PCI drivers
 ========================
 PCI drivers "discover" PCI devices in a system via pci_register_driver().
 Actually, it's the other way around. When the PCI generic code discovers
 a new device, the driver with a matching "description" will be notified.
 Details on this below.
 pci_register_driver() leaves most of the probing for devices to
 the PCI layer and supports online insertion/removal of devices [thus
 supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
 The pci_register_driver() call requires passing in a table of function
 pointers and thus dictates the high level structure of a driver.
 Once the driver knows about a PCI device and takes ownership, the
 driver generally needs to perform the following initialization:
  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc)
  - Enable DMA/processing engines
 When done using the device, for example when the module is about to be
 unloaded, the driver needs to take the following steps:
  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device
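 A common way to keep setup and teardown in mirrored order is a chain of
 goto labels that undoes only the steps already completed. The following
 is a plain C simulation of that pattern, not driver code: demo_probe(),
 its fail_at parameter and the one-letter log entries are all hypothetical,
 and each step merely stands in for the call named in its comment.

```c
#include <string.h>

/* Log of cleanup actions, so the unwind order can be inspected. */
static char demo_log[64];

static void undo(const char *tag)
{
	strcat(demo_log, tag);
}

/*
 * Hypothetical probe skeleton. "fail_at" forces the numbered step to
 * fail (0 means every step succeeds).
 */
int demo_probe(int fail_at)
{
	demo_log[0] = '\0';

	if (fail_at == 1)		/* enable the device */
		goto err;
	if (fail_at == 2)		/* request MMIO/IOP resources */
		goto err_disable;
	if (fail_at == 3)		/* set DMA masks, allocate shared data */
		goto err_release;
	if (fail_at == 4)		/* register the IRQ handler */
		goto err_free_dma;
	return 0;

err_free_dma:
	undo("D");			/* release DMA buffers */
err_release:
	undo("R");			/* release MMIO/IOP resources */
err_disable:
	undo("X");			/* disable the device */
err:
	return -1;
}
```

 A failure while registering the IRQ handler leaves "DRX" in the log: DMA
 buffers are released first and the device is disabled last, matching the
 order of the teardown list.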
 Most of these topics are covered in the following sections.
 For the rest look at LDD3 or <linux/pci.h> .
 If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
 the PCI functions described below are defined as inline functions that
 are either completely empty or just return an appropriate error code,
 to avoid lots of ifdefs in the drivers.
 pci_register_driver() call
 ==========================
 PCI device drivers call ``pci_register_driver()`` during their
 initialization with a pointer to a structure describing the driver
 (``struct pci_driver``):
 .. kernel-doc:: include/linux/pci.h
   :functions: pci_driver
 The ID table is an array of ``struct pci_device_id`` entries ending with an
 all-zero entry.  Definitions with static const are generally preferred.
 .. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id
 Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
 a pci_device_id table.
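 To make the role of the ID table concrete, here is a simplified userspace
 model of how an entry is compared with a device. It is a sketch under
 stated assumptions: struct demo_id, DEMO_DEVICE() and demo_match() are
 local stand-ins that only imitate the shape of struct pci_device_id and
 PCI_DEVICE(), and they omit driver_data.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_ANY_ID 0xffffffffu	/* plays the role of PCI_ANY_ID */

/* Simplified stand-in for struct pci_device_id. */
struct demo_id {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t class, class_mask;
};

/* Same idea as PCI_DEVICE(): match vendor/device, wildcard the rest. */
#define DEMO_DEVICE(v, d) { (v), (d), DEMO_ANY_ID, DEMO_ANY_ID, 0, 0 }

static int field_matches(uint32_t want, uint32_t have)
{
	return want == DEMO_ANY_ID || want == have;
}

/*
 * Return the first table entry matching "dev", or NULL.
 * The table ends with an all-zero entry.
 */
const struct demo_id *demo_match(const struct demo_id *tbl,
				 const struct demo_id *dev)
{
	for (; tbl->vendor || tbl->subvendor || tbl->class_mask; tbl++) {
		if (field_matches(tbl->vendor, dev->vendor) &&
		    field_matches(tbl->device, dev->device) &&
		    field_matches(tbl->subvendor, dev->subvendor) &&
		    field_matches(tbl->subdevice, dev->subdevice) &&
		    !((tbl->class ^ dev->class) & tbl->class_mask))
			return tbl;
	}
	return NULL;
}
```

 A class_mask of 0 makes the class comparison always succeed, which is why
 an entry built from vendor and device alone matches any class of device.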
 New PCI IDs may be added to a device driver pci_ids table at runtime
 as shown below::
  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id
 All fields are passed in as hexadecimal values (no leading 0x).
 The vendor and device fields are mandatory, the others are optional. Users
 need pass only as many optional fields as necessary:
  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL.
 Note that driver_data must match the value used by any of the pci_device_id
 entries defined in the driver. This makes the driver_data field mandatory
 if all the pci_device_id entries have a non-zero driver_data value.
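 The field rules above can be modelled with a small parser. This is an
 illustrative userspace sketch only; struct demo_new_id and
 demo_parse_new_id() are hypothetical names, and the -1 return value is an
 arbitrary choice for "mandatory fields missing".

```c
#include <stdio.h>

#define DEMO_ANY_ID 0xffffffffu	/* plays the role of PCI_ANY_ID */

struct demo_new_id {
	unsigned int vendor, device;
	unsigned int subvendor, subdevice;
	unsigned int class, class_mask;
	unsigned long driver_data;
};

/*
 * Parse one new_id line. Returns 0 on success, -1 if the mandatory
 * vendor and device fields are missing.
 */
int demo_parse_new_id(const char *line, struct demo_new_id *id)
{
	int fields;

	/* Defaults for the optional fields. */
	id->subvendor = DEMO_ANY_ID;
	id->subdevice = DEMO_ANY_ID;
	id->class = 0;
	id->class_mask = 0;
	id->driver_data = 0UL;

	/* Fields that are not present keep the defaults set above. */
	fields = sscanf(line, "%x %x %x %x %x %x %lx",
			&id->vendor, &id->device,
			&id->subvendor, &id->subdevice,
			&id->class, &id->class_mask, &id->driver_data);
	return fields < 2 ? -1 : 0;
}
```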
 Once added, the driver probe routine will be invoked for any unclaimed
 PCI devices listed in its (newly updated) pci_ids list.
 When the driver exits, it just calls pci_unregister_driver() and the PCI layer
 automatically calls the remove hook for all devices handled by the driver.
 "Attributes" for driver functions/data
 --------------------------------------
 Please mark the initialization and cleanup functions where appropriate
 (the corresponding macros are defined in <linux/init.h>):
 	======		=================================================
 	__init		Initialization code. Thrown away after the driver
 			initializes.
 	__exit		Exit code. Ignored for non-modular drivers.
 	======		=================================================
 Tips on when/where to use the above attributes:
 	- The module_init()/module_exit() functions (and all
 	  initialization functions called _only_ from these)
 	  should be marked __init/__exit.
 	- Do not mark the struct pci_driver.
 	- Do NOT mark a function if you are not sure which mark to use.
 	  Better to not mark the function than mark the function wrong.
 How to find PCI devices manually
 ================================
 PCI drivers should have a really good reason for not using the
 pci_register_driver() interface to search for PCI devices.
 The main reason PCI devices are controlled by multiple drivers
 is that one PCI device implements several different HW services,
 e.g. a combined serial/parallel port/floppy controller.
 A manual search may be performed using the following constructs:
 Searching by vendor and device ID::
 	struct pci_dev *dev = NULL;
 	while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)) != NULL)
 		configure_device(dev);
 Searching by class ID (iterate in a similar way)::
 	pci_get_class(CLASS_ID, dev)
 Searching by both vendor/device and subsystem vendor/device ID::
 	pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
 You can use the constant PCI_ANY_ID as a wildcard replacement for
 VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
 specific vendor, for example.
 These functions are hotplug-safe. They increment the reference count on
 the pci_dev that they return. You must eventually (possibly at module unload)
 decrement the reference count on these devices by calling pci_dev_put().
 Device Initialization Steps
 ===========================
 As noted in the introduction, most PCI drivers need the following steps
 for device initialization:
  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc)
  - Enable DMA/processing engines.
 The driver can access PCI config space registers at any time.
 (Well, almost. When running BIST, config space can go away...but
 that will just result in a PCI Bus Master Abort and config reads
 will return garbage).
 Enable the PCI device
 ---------------------
 Before touching any device registers, the driver needs to enable
 the PCI device by calling pci_enable_device(). This will:
  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).
 .. note::
   pci_enable_device() can fail! Check the return value.
 .. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_resources() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.
   This has been discussed before but not changed as of 2.6.19:
   http://lkml.org/lkml/2006/3/2/194
 pci_set_master() will enable DMA by setting the bus master bit
 in the PCI_COMMAND register. It also fixes the latency timer value if
 it's set to something bogus by the BIOS.  pci_clear_master() will
 disable DMA by clearing the bus master bit.
 If the PCI device can use the PCI Memory-Write-Invalidate transaction,
 call pci_set_mwi().  This enables the PCI_COMMAND bit for Mem-Wr-Inval
 and also ensures that the cache line size register is set correctly.
 Check the return value of pci_set_mwi() as not all architectures
 or chip-sets may support Memory-Write-Invalidate.  Alternatively,
 if Mem-Wr-Inval would be nice to have but is not required, call
 pci_try_set_mwi() to have the system do its best effort at enabling
 Mem-Wr-Inval.
 Request MMIO/IOP resources
 --------------------------
 Memory (MMIO) and I/O port addresses should NOT be read directly
 from the PCI device config space. Use the values in the pci_dev structure
 as the PCI "bus address" might have been remapped to a "host physical"
 address by the arch/chip-set specific kernel support.
 See Documentation/io-mapping.txt for how to access device registers
 or device memory.
 The device driver needs to call pci_request_region() to verify
 no other device is already using the same address resource.
 Conversely, drivers should call pci_release_region() AFTER
 calling pci_disable_device().
 The idea is to prevent two devices colliding on the same address range.
 .. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().
 Generic flavors of pci_request_region() are request_mem_region()
 (for MMIO ranges) and request_region() (for IO Port ranges).
 Use these for address resources that are not described by "normal" PCI
 BARs.
 Also see pci_request_selected_regions() below.
 Set the DMA mask size
 ---------------------
 .. note::
   If anything below doesn't make sense, please refer to
   Documentation/DMA-API.txt. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.
 While all drivers should explicitly indicate the DMA capability
 (e.g. 32 or 64 bit) of the PCI bus master, devices with more than
 32-bit bus master capability for streaming data need the driver
 to "register" this capability by calling pci_set_dma_mask() with
 appropriate parameters.  In general this allows more efficient DMA
 on systems where System RAM exists above 4G _physical_ address.
 Drivers for all PCI-X and PCIe compliant devices must call
 pci_set_dma_mask() as they are 64-bit DMA devices.
 Similarly, drivers must also "register" this capability if the device
 can directly address "consistent memory" in System RAM above 4G physical
 address by calling pci_set_consistent_dma_mask().
 Again, this includes drivers for all PCI-X and PCIe compliant devices.
 Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
 64-bit DMA capable for payload ("streaming") data but not control
 ("consistent") data.
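 To make the notion of a "DMA mask" concrete, here is a minimal userspace
 sketch. It assumes the mask is simply the set of address bits the device
 can drive, which is what the kernel's DMA_BIT_MASK(n) macro expresses. The
 function names dma_bit_mask() and dma_addr_reachable() are hypothetical and
 exist only for this illustration; they are not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low "bits" address bits set, e.g. 32 -> 0xffffffff. */
uint64_t dma_bit_mask(unsigned int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/*
 * True if the whole buffer [addr, addr + len - 1] lies at or below the
 * highest address the device can generate with the given mask.
 */
bool dma_addr_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
	return len != 0 && addr <= mask && len - 1 <= mask - addr;
}
```

 With a 32-bit mask a buffer at physical address 4G is out of reach, which
 is why a device limited to 32-bit DMA needs help (bounce buffers or an
 IOMMU) on systems with RAM above 4G.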
 Setup shared control data
 -------------------------
 Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
 memory.  See Documentation/DMA-API.txt for a full description of
 the DMA APIs. This section is just a reminder that it needs to be done
 before enabling DMA on the device.
 Initialize device registers
 ---------------------------
 Some drivers will need specific "capability" fields programmed
 or other "vendor specific" registers initialized or reset.
 E.g. clearing pending interrupts.
 Register IRQ handler
 --------------------
 While calling request_irq() is the last step described here,
 this is often just another intermediate step to initialize a device.
 This step can often be deferred until the device is opened for use.
 All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
 and use the devid to map IRQs to devices (remember that all PCI IRQ lines
 can be shared).
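 What "use the devid to map IRQs to devices" means for a shared line can be
 sketched in userspace. Everything here is a model under stated assumptions:
 struct model_dev, its irq_status field and the MODEL_IRQ_* values are
 hypothetical stand-ins for a device's private data, its interrupt status
 register and the kernel's IRQ_NONE/IRQ_HANDLED return codes.

```c
#include <stdint.h>

enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct model_dev {
	uint32_t irq_status;	/* hypothetical: non-zero if device raised IRQ */
	unsigned int handled;	/* number of interrupts serviced */
};

/*
 * Handler for a shared line: dev_id identifies which device this
 * registration belongs to. If that device did not raise the interrupt,
 * report "not mine" so the other handlers on the line get a chance.
 */
enum model_irqreturn model_irq_handler(int irq, void *dev_id)
{
	struct model_dev *dev = dev_id;

	(void)irq;
	if (dev->irq_status == 0)
		return MODEL_IRQ_NONE;

	dev->irq_status = 0;	/* acknowledge the interrupt */
	dev->handled++;
	return MODEL_IRQ_HANDLED;
}
```

 A real handler would read and acknowledge the status through MMIO rather
 than a struct field, but the decision it has to make is the same.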
 request_irq() will associate an interrupt handler and device handle
 with an interrupt number. Historically interrupt numbers represent
 IRQ lines which run from the PCI device to the Interrupt controller.
 With MSI and MSI-X (more below) the interrupt number is a CPU "vector".
 request_irq() also enables the interrupt. Make sure the device is
 quiesced and does not have any interrupts pending before registering
 the interrupt handler.
 MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
 which deliver interrupts to the CPU via a DMA write to a Local APIC.
 The fundamental difference between MSI and MSI-X is how multiple
 "vectors" get allocated. MSI requires contiguous blocks of vectors
 while MSI-X can allocate several individual ones.
 MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
 PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
 causes the PCI support to program CPU vector data into the PCI device
 capability registers. Many architectures, chip-sets, or BIOSes do NOT
 support MSI or MSI-X and a call to pci_alloc_irq_vectors with just
 the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
 specify PCI_IRQ_LEGACY as well.
 Drivers that have different interrupt handlers for MSI/MSI-X and
 legacy INTx should choose the right one based on the msi_enabled
 and msix_enabled flags in the pci_dev structure after calling
 pci_alloc_irq_vectors.
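 The reason for also passing PCI_IRQ_LEGACY can be shown with a small
 model. It assumes the PCI core prefers MSI-X, then MSI, then legacy INTx
 among the types that were both requested and are usable on the platform.
 model_pick_irq_type() is a hypothetical helper, not a kernel function, and
 unlike pci_alloc_irq_vectors() it returns the chosen type rather than a
 vector count.

```c
/* Flag values as used by pci_alloc_irq_vectors(). */
#define PCI_IRQ_LEGACY	(1 << 0)
#define PCI_IRQ_MSI	(1 << 1)
#define PCI_IRQ_MSIX	(1 << 2)

/*
 * Hypothetical model: "requested" is what the driver asked for,
 * "available" is what device and platform support. Returns the chosen
 * interrupt type, or 0 if the request cannot be satisfied.
 */
unsigned int model_pick_irq_type(unsigned int requested,
				 unsigned int available)
{
	unsigned int usable = requested & available;

	if (usable & PCI_IRQ_MSIX)
		return PCI_IRQ_MSIX;
	if (usable & PCI_IRQ_MSI)
		return PCI_IRQ_MSI;
	if (usable & PCI_IRQ_LEGACY)
		return PCI_IRQ_LEGACY;
	return 0;
}
```

 On a platform without MSI support, a request for only MSI and MSI-X yields
 nothing in this model, while the same request with PCI_IRQ_LEGACY added
 falls back to INTx.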
 There are (at least) two really good reasons for using MSI:
 1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.
 2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.
 See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
 of MSI/MSI-X usage.
 PCI device shutdown
 ===================
 When a PCI device driver is being unloaded, most of the following
 steps need to be performed:
  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and consistent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)
 Stop IRQs on the device
 -----------------------
 How to do this is chip/device specific. If it's not done, it opens
 the possibility of a "screaming interrupt" if (and only if)
 the IRQ is shared with another device.
 When the shared IRQ handler is "unhooked", the remaining devices
 using the same IRQ line will still need the IRQ enabled. Thus if the
 "unhooked" device asserts the IRQ line, the system will respond assuming
 one of the remaining devices asserted the IRQ line. Since none
 of the other devices will handle the IRQ, the system will "hang" until
 it decides the IRQ isn't going to get handled and masks the IRQ (100,000
 iterations later). Once the shared IRQ is masked, the remaining devices
 will stop functioning properly. Not a nice situation.
 This is another reason to use MSI or MSI-X if it's available.
 MSI and MSI-X are defined to be exclusive interrupts and thus
 are not susceptible to the "screaming interrupt" problem.
 Release the IRQ
 ---------------
 Once the device is quiesced (no more IRQs), one can call free_irq().
 This function will return control once any pending IRQs are handled,
 "unhook" the driver's IRQ handler from that IRQ, and finally release
 the IRQ if no one else is using it.
 Stop all DMA activity
 ---------------------
 It's extremely important to stop all DMA operations BEFORE attempting
 to deallocate DMA control data. Failure to do so can result in memory
 corruption, hangs, and on some chip-sets a hard crash.
 Stopping DMA after stopping the IRQs can avoid races where the
 IRQ handler might restart DMA engines.
 While this step sounds obvious and trivial, several "mature" drivers
 didn't get this step right in the past.
 Release DMA buffers
 -------------------
 Once DMA is stopped, clean up streaming DMA first.
 I.e. unmap data buffers and return buffers to "upstream"
 owners if there are any.
 Then clean up "consistent" buffers which contain the control data.
 See Documentation/DMA-API.txt for details on unmapping interfaces.
 Unregister from other subsystems
 --------------------------------
 Most low level PCI device drivers support some other subsystem
 like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
 driver isn't losing resources from that other subsystem.
 If this happens, typically the symptom is an Oops (panic) when
 the subsystem attempts to call into a driver that has been unloaded.
 Disable Device from responding to MMIO/IO Port addresses
 --------------------------------------------------------
 Unmap MMIO or IO Port resources with iounmap() (or pci_iounmap()) and then
 call pci_disable_device().
 This is the symmetric opposite of pci_enable_device().
 Do not access device registers after calling pci_disable_device().
 Release MMIO/IO Port Resource(s)
 --------------------------------
 Call pci_release_region() to mark the MMIO or IO Port range as available.
 Failure to do so usually results in the inability to reload the driver.
 How to access PCI config space
 ==============================
 You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
 space of a device represented by `struct pci_dev *`. All these functions return
 0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
 text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
 devices don't fail.
 If you don't have a struct pci_dev available, you can call
 `pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
 and function on that bus.
 If you access fields in the standard portion of the config header, please
 use symbolic names of locations and bits declared in <linux/pci.h>.
 If you need to access Extended PCI Capability registers, just call
 pci_find_capability() for the particular capability and it will find the
 corresponding register block for you.
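 As an illustration of what a capability lookup involves, the following
 userspace sketch walks the standard capability list in a 256-byte copy of
 config space held in an array. It is a model, not the kernel's
 implementation: cfg_read8(), cfg_read16() and model_find_capability() are
 hypothetical names, and the sketch covers only the standard capability
 list, not the PCIe extended capabilities above offset 0x100. The register
 offsets and capability IDs are those of the standard config header.

```c
#include <stdint.h>

#define PCI_STATUS		0x06	/* 16-bit status register */
#define PCI_STATUS_CAP_LIST	0x10	/* device has a capability list */
#define PCI_CAPABILITY_LIST	0x34	/* offset of first capability */
#define PCI_CAP_ID_PM		0x01	/* Power Management */
#define PCI_CAP_ID_MSI		0x05	/* Message Signaled Interrupts */
#define PCI_CAP_ID_MSIX		0x11	/* MSI-X */

uint8_t cfg_read8(const uint8_t *cfg, unsigned int off)
{
	return cfg[off];
}

/* Config space is little-endian. */
uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Returns the config space offset of the capability, or 0 if absent. */
int model_find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	int ttl = 48;	/* bound the walk in case the list loops */
	uint8_t pos;

	if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg_read8(cfg, PCI_CAPABILITY_LIST) & ~3;
	while (pos >= 0x40 && ttl--) {
		uint8_t id = cfg_read8(cfg, pos);

		if (id == 0xff)
			break;
		if (id == cap_id)
			return pos;
		pos = cfg_read8(cfg, pos + 1) & ~3;	/* next pointer */
	}
	return 0;
}
```

 In a driver the same lookup is a single pci_find_capability() call, and the
 returned offset is then used with the pci_read_config_*() accessors.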
 Other interesting functions
 ===========================
 =============================	================================================
 pci_get_domain_bus_and_slot()	Find pci_dev corresponding to given domain,
 				bus, and slot number. If the device is
 				found, its reference count is increased.
 pci_set_power_state()		Set PCI Power Management state (0=D0 ... 3=D3)
 pci_find_capability()		Find specified capability in device's capability
 				list.
 pci_resource_start()		Returns bus start address for a given PCI region
 pci_resource_end()		Returns bus end address for a given PCI region
 pci_resource_len()		Returns the byte length of a PCI region
 pci_set_drvdata()		Set private driver data pointer for a pci_dev
 pci_get_drvdata()		Return private driver data pointer for a pci_dev
 pci_set_mwi()			Enable Memory-Write-Invalidate transactions.
 pci_clear_mwi()			Disable Memory-Write-Invalidate transactions.
 =============================	================================================
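 Note that the end address of a region is inclusive. The relationship
 between the three pci_resource_*() values can be sketched as follows;
 model_resource_len() is a hypothetical userspace helper that assumes an
 unassigned region is reported with start and end both zero.

```c
#include <stdint.h>

/* Length in bytes of the region [start, end], with end inclusive. */
uint64_t model_resource_len(uint64_t start, uint64_t end)
{
	if (start == 0 && end == start)
		return 0;	/* region not assigned */
	return end - start + 1;
}
```

 For example, a region starting at 0xfebf0000 and ending at 0xfebfffff is
 0x10000 (64 KiB) long.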
 Miscellaneous hints
 ===================
 When displaying PCI device names to the user (for example when a driver wants
 to tell the user which card it has found), please use pci_name(pci_dev).
 Always refer to the PCI devices by a pointer to the pci_dev structure.
 All PCI layer functions use this identification and it's the only
 reasonable one. Don't use bus/slot/function numbers except for very
 special purposes -- on systems with multiple primary buses their semantics
 can be pretty complex.
 Don't try to turn on Fast Back to Back writes in your driver.  All devices
 on the bus need to be capable of doing it, so this is something which needs
 to be handled by platform and generic code, not individual drivers.
 Vendor and device identifications
 =================================
 Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
 are shared across multiple drivers.  You can add private definitions in
 your driver if they're helpful, or just use plain hex constants.
 The device IDs are arbitrary hex numbers (vendor controlled) and normally used
 only in a single location, the pci_device_id table.
 Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
 There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
 and https://github.com/pciutils/pciids.
 Obsolete functions
 ==================
 There are several functions which you might come across when trying to
 port an old driver to the new PCI interface.  They are no longer present
 in the kernel as they aren't compatible with hotplug or PCI domains or
 having sane locking.
 =================	===========================================
 pci_find_device()	Superseded by pci_get_device()
 pci_find_subsys()	Superseded by pci_get_subsys()
 pci_find_slot()		Superseded by pci_get_domain_bus_and_slot()
 pci_get_slot()		Superseded by pci_get_domain_bus_and_slot()
 =================	===========================================
 The alternative is the traditional PCI device driver that walks PCI
 device lists. This is still possible but discouraged.
 MMIO Space and "Write Posting"
 ==============================
 Converting a driver from using I/O Port space to using MMIO space
 often requires some additional changes. Specifically, "write posting"
 needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
 already do this. I/O Port space guarantees write transactions reach the PCI
 device before the CPU can continue. Writes to MMIO space allow the CPU
 to continue before the transaction reaches the PCI device. HW weenies
 call this "Write Posting" because the write completion is "posted" to
 the CPU before the transaction has reached its destination.
 Thus, timing sensitive code should add readl() where the CPU is
 expected to wait before doing other work.  The classic "bit banging"
 sequence works fine for I/O Port space::
       for (i = 8; i--; val >>= 1) {           /* 8 iterations, one per bit */
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }
 The same sequence for MMIO space should be::
       for (i = 8; i--; val >>= 1) {           /* 8 iterations, one per bit */
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }
 It is important that "safe_mmio_reg" not have any side effects that
 interfere with the correct operation of the device.
 Another case to watch out for is when resetting a PCI device. Use PCI
 Configuration space reads to flush the writel(). This will gracefully
 handle the PCI master abort on all platforms if the PCI device is
 expected to not respond to a readl().  Most x86 platforms will allow
 MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
 (e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
--- a/Documentation/PCI/pci.txt
+++ b/Documentation/PCI/pci.txt
@ -1,636 +0,0 @@
 			How To Write Linux PCI Drivers
 		by Martin Mares <mj@ucw.cz> on 07-Feb-2000
 	updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The world of PCI is vast and full of (mostly unpleasant) surprises.
 Since each CPU architecture implements different chip-sets and PCI devices
 have different requirements (erm, "features"), the result is the PCI support
 in the Linux kernel is not as trivial as one would wish. This short paper
 tries to introduce all potential driver authors to Linux APIs for
 PCI device drivers.
 A more complete resource is the third edition of "Linux Device Drivers"
 by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
 LDD3 is available for free (under Creative Commons License) from:
 	http://lwn.net/Kernel/LDD3/
 However, keep in mind that all documents are subject to "bit rot".
 Refer to the source code if things are not working as described here.
 Please send questions/comments/patches about Linux PCI API to the
 "Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
 0. Structure of PCI drivers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 PCI drivers "discover" PCI devices in a system via pci_register_driver().
 Actually, it's the other way around. When the PCI generic code discovers
 a new device, the driver with a matching "description" will be notified.
 Details on this below.
 pci_register_driver() leaves most of the probing for devices to
 the PCI layer and supports online insertion/removal of devices [thus
 supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
 pci_register_driver() call requires passing in a table of function
 pointers and thus dictates the high level structure of a driver.
 Once the driver knows about a PCI device and takes ownership, the
 driver generally needs to perform the following initialization:
 	Enable the device
 	Request MMIO/IOP resources
 	Set the DMA mask size (for both coherent and streaming DMA)
 	Allocate and initialize shared control data (dma_alloc_coherent())
 	Access device configuration space (if needed)
 	Register IRQ handler (request_irq())
 	Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
 	Enable DMA/processing engines
 When done using the device, and perhaps the module needs to be unloaded,
 the driver needs to take the following steps:
 	Disable the device from generating IRQs
 	Release the IRQ (free_irq())
 	Stop all DMA activity
 	Release DMA buffers (both streaming and coherent)
 	Unregister from other subsystems (e.g. scsi or netdev)
 	Release MMIO/IOP resources
 	Disable the device
 Most of these topics are covered in the following sections.
 For the rest look at LDD3 or <linux/pci.h> .
 If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
 the PCI functions described below are defined as inline functions either
 completely empty or just returning an appropriate error code to avoid
 lots of ifdefs in the drivers.
 1. pci_register_driver() call
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 PCI device drivers call pci_register_driver() during their
 initialization with a pointer to a structure describing the driver
 (struct pci_driver):
 	field name	Description
 	----------	------------------------------------------------------
 	id_table	Pointer to table of device ID's the driver is
 			interested in.  Most drivers should export this
 			table using MODULE_DEVICE_TABLE(pci,...).
 	probe		This probing function gets called (during execution
 			of pci_register_driver() for already existing
 			devices or later if a new device gets inserted) for
 			all PCI devices which match the ID table and are not
 			"owned" by the other drivers yet. This function gets
 			passed a "struct pci_dev *" for each device whose
 			entry in the ID table matches the device. The probe
 			function returns zero when the driver chooses to
 			take "ownership" of the device or an error code
 			(negative number) otherwise.
 			The probe function always gets called from process
 			context, so it can sleep.
 	remove		The remove() function gets called whenever a device
 			being handled by this driver is removed (either during
 			deregistration of the driver or when it's manually
 			pulled out of a hot-pluggable slot).
 			The remove function always gets called from process
 			context, so it can sleep.
 	suspend		Put device into low power state.
 	suspend_late	Put device into low power state.
 	resume_early	Wake device from low power state.
 	resume		Wake device from low power state.
 		(Please see Documentation/power/pci.txt for descriptions
 		of PCI Power Management and the related functions.)
 	shutdown	Hook into reboot_notifier_list (kernel/sys.c).
 			Intended to stop any idling DMA operations.
 			Useful for enabling wake-on-lan (NIC) or changing
 			the power state of a device before reboot.
 			e.g. drivers/net/e100.c.
 	err_handler	See Documentation/PCI/pci-error-recovery.txt
 The ID table is an array of struct pci_device_id entries ending with an
 all-zero entry.  Definitions with static const are generally preferred.
 Each entry consists of:
 	vendor,device	Vendor and device ID to match (or PCI_ANY_ID)
 	subvendor,	Subsystem vendor and device ID to match (or PCI_ANY_ID)
 	subdevice,
 	class		Device class, subclass, and "interface" to match.
 			See Appendix D of the PCI Local Bus Spec or
 			include/linux/pci_ids.h for a full list of classes.
 			Most drivers do not need to specify class/class_mask
 			as vendor/device is normally sufficient.
 	class_mask	limit which sub-fields of the class field are compared.
 			See drivers/scsi/sym53c8xx_2/ for example of usage.
 	driver_data	Data private to the driver.
 			Most drivers don't need to use driver_data field.
 			Best practice is to use driver_data as an index
 			into a static list of equivalent device types,
 			instead of using it as a pointer.
 Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up
 a pci_device_id table.
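 How an ID table entry is compared against a device can be sketched in
 userspace. The structures below are simplified, hypothetical stand-ins for
 struct pci_device_id and the identification fields of struct pci_dev, and
 model_id_matches() is not a kernel function. The sketch assumes each ID
 field matches either exactly or via PCI_ANY_ID, and that only the class
 bits selected by class_mask are compared.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_ANY_ID (~0U)

/* Simplified stand-in for one ID table entry. */
struct model_id {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t dev_class, class_mask;
};

/* Simplified stand-in for the identification fields of a device. */
struct model_dev_ids {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t dev_class;
};

bool model_id_matches(const struct model_id *id,
		      const struct model_dev_ids *dev)
{
	return (id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == PCI_ANY_ID || id->device == dev->device) &&
	       (id->subvendor == PCI_ANY_ID ||
		id->subvendor == dev->subvendor) &&
	       (id->subdevice == PCI_ANY_ID ||
		id->subdevice == dev->subdevice) &&
	       !((id->dev_class ^ dev->dev_class) & id->class_mask);
}
```

 An entry with vendor and device set, PCI_ANY_ID for the subsystem IDs and
 a class_mask of 0 corresponds to what PCI_DEVICE() generates. A class_mask
 of 0 means the class is ignored, which is why most drivers can leave it
 out.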
 New PCI IDs may be added to a device driver pci_ids table at runtime
 as shown below:
 echo "vendor device subvendor subdevice class class_mask driver_data" > \
 /sys/bus/pci/drivers/{driver}/new_id
 All fields are passed in as hexadecimal values (no leading 0x).
 The vendor and device fields are mandatory, the others are optional. Users
 need pass only as many optional fields as necessary:
 	o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
 	o class and class_mask fields default to 0
 	o driver_data defaults to 0UL.
 Note that driver_data must match the value used by any of the pci_device_id
 entries defined in the driver. This makes the driver_data field mandatory
 if all the pci_device_id entries have a non-zero driver_data value.
 Once added, the driver probe routine will be invoked for any unclaimed
 PCI devices listed in its (newly updated) pci_ids list.
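 The string written to new_id is plain text. A minimal sketch of building
 it from numeric IDs, assuming only the two mandatory fields are supplied,
 is shown below; model_format_new_id() is a hypothetical helper written for
 this example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "vendor device" as hexadecimal without a leading 0x, as
 * expected by the new_id file. Returns the snprintf() result, i.e. the
 * length of the formatted string.
 */
int model_format_new_id(char *buf, size_t len,
			unsigned int vendor, unsigned int device)
{
	return snprintf(buf, len, "%04x %04x", vendor, device);
}
```

 For vendor 0x8086 and device 0x10f5 this produces "8086 10f5", which can
 then be written to /sys/bus/pci/drivers/{driver}/new_id.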
 When the driver exits, it just calls pci_unregister_driver() and the PCI layer
 automatically calls the remove hook for all devices handled by the driver.
 1.1 "Attributes" for driver functions/data
 Please mark the initialization and cleanup functions where appropriate
 (the corresponding macros are defined in <linux/init.h>):
 	__init		Initialization code. Thrown away after the driver
 			initializes.
 	__exit		Exit code. Ignored for non-modular drivers.
 Tips on when/where to use the above attributes:
 	o The module_init()/module_exit() functions (and all
 	  initialization functions called _only_ from these)
 	  should be marked __init/__exit.
 	o Do not mark the struct pci_driver.
 	o Do NOT mark a function if you are not sure which mark to use.
 	  Better to not mark the function than mark the function wrong.
 2. How to find PCI devices manually
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 PCI drivers should have a really good reason for not using the
 pci_register_driver() interface to search for PCI devices.
 The main reason PCI devices are controlled by multiple drivers
 is because one PCI device implements several different HW services.
 E.g. combined serial/parallel port/floppy controller.
 A manual search may be performed using the following constructs:
 Searching by vendor and device ID:
 	struct pci_dev *dev = NULL;
 	while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
 		configure_device(dev);
 Searching by class ID (iterate in a similar way):
 	pci_get_class(CLASS_ID, dev)
 Searching by both vendor/device and subsystem vendor/device ID:
 	pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
 You can use the constant PCI_ANY_ID as a wildcard replacement for
 VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
 specific vendor, for example.
 These functions are hotplug-safe. They increment the reference count on
 the pci_dev that they return. You must eventually (possibly at module unload)
 decrement the reference count on these devices by calling pci_dev_put().
 3. Device Initialization Steps
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 As noted in the introduction, most PCI drivers need the following steps
 for device initialization:
 	Enable the device
 	Request MMIO/IOP resources
 	Set the DMA mask size (for both coherent and streaming DMA)
 	Allocate and initialize shared control data (dma_alloc_coherent())
 	Access device configuration space (if needed)
 	Register IRQ handler (request_irq())
 	Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
 	Enable DMA/processing engines.
 The driver can access PCI config space registers at any time.
 (Well, almost. When running BIST, config space can go away...but
 that will just result in a PCI Bus Master Abort and config reads
 will return garbage).
 3.1 Enable the PCI device
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 Before touching any device registers, the driver needs to enable
 the PCI device by calling pci_enable_device(). This will:
 	o wake up the device if it was in suspended state,
 	o allocate I/O and memory regions of the device (if BIOS did not),
 	o allocate an IRQ (if BIOS did not).
 NOTE: pci_enable_device() can fail! Check the return value.
 [ OS BUG: we don't check resource allocations before enabling those
  resources. The sequence would make more sense if we called
  pci_request_resources() before calling pci_enable_device().
  Currently, the device drivers can't detect the bug when two
  devices have been allocated the same range. This is not a common
  problem and unlikely to get fixed soon.
  This has been discussed before but not changed as of 2.6.19:
 	http://lkml.org/lkml/2006/3/2/194
 ]
 pci_set_master() will enable DMA by setting the bus master bit
 in the PCI_COMMAND register. It also fixes the latency timer value if
 it's set to something bogus by the BIOS.  pci_clear_master() will
 disable DMA by clearing the bus master bit.
 If the PCI device can use the PCI Memory-Write-Invalidate transaction,
 call pci_set_mwi().  This enables the PCI_COMMAND bit for Mem-Wr-Inval
 and also ensures that the cache line size register is set correctly.
 Check the return value of pci_set_mwi() as not all architectures
 or chip-sets may support Memory-Write-Invalidate.  Alternatively,
 if Mem-Wr-Inval would be nice to have but is not required, call
 pci_try_set_mwi() to have the system do its best effort at enabling
 Mem-Wr-Inval.
 3.2 Request MMIO/IOP resources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Memory (MMIO), and I/O port addresses should NOT be read directly
 from the PCI device config space. Use the values in the pci_dev structure
 as the PCI "bus address" might have been remapped to a "host physical"
 address by the arch/chip-set specific kernel support.
 See Documentation/io-mapping.txt for how to access device registers
 or device memory.
 The device driver needs to call pci_request_region() to verify
 no other device is already using the same address resource.
 Conversely, drivers should call pci_release_region() AFTER
 calling pci_disable_device().
 The idea is to prevent two devices colliding on the same address range.
 [ See OS BUG comment above. Currently (2.6.19), the driver can only
  determine MMIO and IO Port resource availability _after_ calling
  pci_enable_device(). ]
 Generic flavors of pci_request_region() are request_mem_region()
 (for MMIO ranges) and request_region() (for IO Port ranges).
 Use these for address resources that are not described by "normal" PCI
 BARs.
 Also see pci_request_selected_regions() below.
 3.3 Set the DMA mask size
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 [ If anything below doesn't make sense, please refer to
  Documentation/DMA-API.txt. This section is just a reminder that
  drivers need to indicate DMA capabilities of the device and is not
  an authoritative source for DMA interfaces. ]
 While all drivers should explicitly indicate the DMA capability
 (e.g. 32 or 64 bit) of the PCI bus master, devices with more than
 32-bit bus master capability for streaming data need the driver
 to "register" this capability by calling pci_set_dma_mask() with
 appropriate parameters.  In general this allows more efficient DMA
 on systems where System RAM exists above 4G _physical_ address.
 Drivers for all PCI-X and PCIe compliant devices must call
 pci_set_dma_mask() as they are 64-bit DMA devices.
 Similarly, drivers must also "register" this capability if the device
 can directly address "consistent memory" in System RAM above 4G physical
 address by calling pci_set_consistent_dma_mask().
 Again, this includes drivers for all PCI-X and PCIe compliant devices.
 Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
 64-bit DMA capable for payload ("streaming") data but not control
 ("consistent") data.
 3.4 Setup shared control data
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
 memory.  See Documentation/DMA-API.txt for a full description of
 the DMA APIs. This section is just a reminder that it needs to be done
 before enabling DMA on the device.
 3.5 Initialize device registers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Some drivers will need specific "capability" fields programmed
 or other "vendor specific" registers initialized or reset.
 E.g. clearing pending interrupts.
 3.6 Register IRQ handler
 ~~~~~~~~~~~~~~~~~~~~~~~~
 While calling request_irq() is the last step described here,
 this is often just another intermediate step to initialize a device.
 This step can often be deferred until the device is opened for use.
 All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
 and use the devid to map IRQs to devices (remember that all PCI IRQ lines
 can be shared).
 request_irq() will associate an interrupt handler and device handle
 with an interrupt number. Historically interrupt numbers represent
 IRQ lines which run from the PCI device to the Interrupt controller.
 With MSI and MSI-X (more below) the interrupt number is a CPU "vector".
 request_irq() also enables the interrupt. Make sure the device is
 quiesced and does not have any interrupts pending before registering
 the interrupt handler.
 MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
 which deliver interrupts to the CPU via a DMA write to a Local APIC.
 The fundamental difference between MSI and MSI-X is how multiple
 "vectors" get allocated. MSI requires contiguous blocks of vectors
 while MSI-X can allocate several individual ones.
 MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
 PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
 causes the PCI support to program CPU vector data into the PCI device
 capability registers. Many architectures, chip-sets, or BIOSes do NOT
 support MSI or MSI-X and a call to pci_alloc_irq_vectors with just
 the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
 specify PCI_IRQ_LEGACY as well.
 Drivers that have different interrupt handlers for MSI/MSI-X and
 legacy INTx should choose the right one based on the msi_enabled
 and msix_enabled flags in the pci_dev structure after calling
 pci_alloc_irq_vectors.
 There are (at least) two really good reasons for using MSI:
 1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.
 2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.
 See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
 of MSI/MSI-X usage.
 4. PCI device shutdown
 ~~~~~~~~~~~~~~~~~~~~~~~
 When a PCI device driver is being unloaded, most of the following
 steps need to be performed:
 	Disable the device from generating IRQs
 	Release the IRQ (free_irq())
 	Stop all DMA activity
 	Release DMA buffers (both streaming and consistent)
 	Unregister from other subsystems (e.g. scsi or netdev)
 	Disable device from responding to MMIO/IO Port addresses
 	Release MMIO/IO Port resource(s)
 4.1 Stop IRQs on the device
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 How to do this is chip/device specific. If it's not done, it opens
 the possibility of a "screaming interrupt" if (and only if)
 the IRQ is shared with another device.
 When the shared IRQ handler is "unhooked", the remaining devices
 using the same IRQ line will still need the IRQ enabled. Thus if the
 "unhooked" device asserts the IRQ line, the system will respond assuming
 one of the remaining devices asserted the IRQ line. Since none
 of the other devices will handle the IRQ, the system will "hang" until
 it decides the IRQ isn't going to get handled and masks the IRQ (100,000
 iterations later). Once the shared IRQ is masked, the remaining devices
 will stop functioning properly. Not a nice situation.
 This is another reason to use MSI or MSI-X if it's available.
 MSI and MSI-X are defined to be exclusive interrupts and thus
 are not susceptible to the "screaming interrupt" problem.
 4.2 Release the IRQ
 ~~~~~~~~~~~~~~~~~~~
 Once the device is quiesced (no more IRQs), one can call free_irq().
 This function will return control once any pending IRQs are handled,
 "unhook" the driver's IRQ handler from that IRQ, and finally release
 the IRQ if no one else is using it.
 4.3 Stop all DMA activity
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 It's extremely important to stop all DMA operations BEFORE attempting
 to deallocate DMA control data. Failure to do so can result in memory
 corruption, hangs, and on some chip-sets a hard crash.
 Stopping DMA after stopping the IRQs can avoid races where the
 IRQ handler might restart DMA engines.
 While this step sounds obvious and trivial, several "mature" drivers
 didn't get this step right in the past.
 4.4 Release DMA buffers
 ~~~~~~~~~~~~~~~~~~~~~~~
 Once DMA is stopped, clean up streaming DMA first.
 I.e. unmap data buffers and return buffers to "upstream"
 owners if there is one.
 Then clean up "consistent" buffers which contain the control data.
 See Documentation/DMA-API.txt for details on unmapping interfaces.
 4.5 Unregister from other subsystems
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Most low level PCI device drivers support some other subsystem
 like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
 driver isn't losing resources from that other subsystem.
 If this happens, typically the symptom is an Oops (panic) when
 the subsystem attempts to call into a driver that has been unloaded.
 4.6 Disable Device from responding to MMIO/IO Port addresses
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Unmap MMIO or IO Port resources (e.g. with iounmap() or pci_iounmap()) and then
 call pci_disable_device().
 This is the symmetric opposite of pci_enable_device().
 Do not access device registers after calling pci_disable_device().
 4.7 Release MMIO/IO Port Resource(s)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Call pci_release_region() to mark the MMIO or IO Port range as available.
 Failure to do so usually results in the inability to reload the driver.
 5. How to access PCI config space
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You can use pci_(read|write)_config_(byte|word|dword) to access the config
 space of a device represented by struct pci_dev *. All these functions return 0
 when successful or an error code (PCIBIOS_...) which can be translated to a text
 string by pcibios_strerror. Most drivers expect that accesses to valid PCI
 devices don't fail.
 If you don't have a struct pci_dev available, you can call
 pci_bus_(read|write)_config_(byte|word|dword) to access a given device
 and function on that bus.
 If you access fields in the standard portion of the config header, please
 use symbolic names of locations and bits declared in <linux/pci.h>.
 If you need to access Extended PCI Capability registers, just call
 pci_find_capability() for the particular capability and it will find the
 corresponding register block for you.
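 As a rough illustration of what "finding the register block" involves, the
 sketch below walks a capability list in a 256-byte snapshot of config space.
 It is a userspace model, not the kernel's pci_find_capability(): the name
 toy_find_capability() and the array-based interface are assumptions made for
 this example, while the register offsets are the standard config header ones.

```c
#include <stdint.h>

/* Standard config header offsets and bits (same names as in <linux/pci_regs.h>). */
#define PCI_STATUS		0x06	/* 16-bit status register */
#define PCI_STATUS_CAP_LIST	0x10	/* device implements a capability list */
#define PCI_CAPABILITY_LIST	0x34	/* offset of the first capability entry */
#define PCI_CAP_LIST_ID		0	/* capability ID, relative to the entry */
#define PCI_CAP_LIST_NEXT	1	/* pointer to the next entry */

/*
 * Hypothetical helper: return the config space offset of capability `cap`
 * in the snapshot `cfg`, or 0 if the device does not advertise it.
 */
uint8_t toy_find_capability(const uint8_t cfg[256], uint8_t cap)
{
	uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
	uint8_t pos;
	int ttl = 48;	/* bound the walk in case a broken device loops the list */

	if (!(status & PCI_STATUS_CAP_LIST))
		return 0;

	/* The two low bits of each pointer are reserved and must be masked. */
	pos = (uint8_t)(cfg[PCI_CAPABILITY_LIST] & ~3);
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos + PCI_CAP_LIST_ID] == cap)
			return pos;
		pos = (uint8_t)(cfg[pos + PCI_CAP_LIST_NEXT] & ~3);
	}
	return 0;
}
```

 In a driver, the equivalent lookup is a single pci_find_capability() call on
 the struct pci_dev; the model only shows why the result is an offset that
 subsequent config space reads are relative to.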
 6. Other interesting functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 pci_get_domain_bus_and_slot()	Find pci_dev corresponding to given domain,
 				bus, and slot/function (devfn) number. If the device is
 				found, its reference count is increased.
 pci_set_power_state()		Set PCI Power Management state (0=D0 ... 3=D3)
 pci_find_capability()		Find specified capability in device's capability
 				list.
 pci_resource_start()		Returns bus start address for a given PCI region
 pci_resource_end()		Returns bus end address for a given PCI region
 pci_resource_len()		Returns the byte length of a PCI region
 pci_set_drvdata()		Set private driver data pointer for a pci_dev
 pci_get_drvdata()		Return private driver data pointer for a pci_dev
 pci_set_mwi()			Enable Memory-Write-Invalidate transactions.
 pci_clear_mwi()			Disable Memory-Write-Invalidate transactions.
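 The slot and function number taken by pci_get_domain_bus_and_slot() is a
 single packed "devfn" value. The following userspace sketch re-declares the
 packing macros with the same encoding the kernel headers use and formats an
 address the way pci_name() prints it. The helper toy_format_bdf() is
 hypothetical and exists only for this illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Same encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/*
 * Hypothetical helper: write "domain:bus:slot.func" into buf.
 * Returns the snprintf() result (length that was or would have been written).
 */
int toy_format_bdf(char *buf, size_t len, unsigned int domain,
		   unsigned int bus, unsigned int devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%u",
			domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
}
```

 A driver would normally never build such strings itself; it would pass
 PCI_DEVFN(slot, func) to pci_get_domain_bus_and_slot() and print the result
 with pci_name().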
 7. Miscellaneous hints
 ~~~~~~~~~~~~~~~~~~~~~~
 When displaying PCI device names to the user (for example when a driver wants
 to tell the user which card it has found), please use pci_name(pci_dev).
 Always refer to the PCI devices by a pointer to the pci_dev structure.
 All PCI layer functions use this identification and it's the only
 reasonable one. Don't use bus/slot/function numbers except for very
 special purposes -- on systems with multiple primary buses their semantics
 can be pretty complex.
 Don't try to turn on Fast Back to Back writes in your driver.  All devices
 on the bus need to be capable of doing it, so this is something which needs
 to be handled by platform and generic code, not individual drivers.
 8. Vendor and device identifications
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
 are shared across multiple drivers.  You can add private definitions in
 your driver if they're helpful, or just use plain hex constants.
 The device IDs are arbitrary hex numbers (vendor controlled) and normally used
 only in a single location, the pci_device_id table.
 Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
 There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
 and https://github.com/pciutils/pciids.
 9. Obsolete functions
 ~~~~~~~~~~~~~~~~~~~~~
 There are several functions which you might come across when trying to
 port an old driver to the new PCI interface.  They are no longer present
 in the kernel as they aren't compatible with hotplug or PCI domains or
 having sane locking.
 pci_find_device()	Superseded by pci_get_device()
 pci_find_subsys()	Superseded by pci_get_subsys()
 pci_find_slot()		Superseded by pci_get_domain_bus_and_slot()
 pci_get_slot()		Superseded by pci_get_domain_bus_and_slot()
 The alternative is the traditional PCI device driver that walks PCI
 device lists. This is still possible but discouraged.
 10. MMIO Space and "Write Posting"
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Converting a driver from using I/O Port space to using MMIO space
 often requires some additional changes. Specifically, "write posting"
 needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
 already do this. I/O Port space guarantees write transactions reach the PCI
 device before the CPU can continue. Writes to MMIO space allow the CPU
 to continue before the transaction reaches the PCI device. HW weenies
 call this "Write Posting" because the write completion is "posted" to
 the CPU before the transaction has reached its destination.
 Thus, timing sensitive code should add readl() where the CPU is
 expected to wait before doing other work.  The classic "bit banging"
 sequence works fine for I/O Port space:
       for (i = 0; i < 8; i++, val >>= 1) {
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }
 The same sequence for MMIO space should be:
       for (i = 0; i < 8; i++, val >>= 1) {
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }
 It is important that "safe_mmio_reg" not have any side effects that
 interfere with the correct operation of the device.
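 To make the effect of the extra read concrete, here is a toy userspace model
 of a bridge that posts writes. It is only a sketch: struct toy_bridge and the
 toy_* functions are invented for this example and do not correspond to any
 kernel interface. The one property it models is that a read cannot pass
 earlier posted writes, so issuing a read forces them out to the device.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_POSTED_MAX 16

/* Toy model, not real hardware: writes wait in a buffer until drained. */
struct toy_bridge {
	uint8_t posted[TOY_POSTED_MAX];	/* accepted but not yet delivered */
	size_t nr_posted;
	uint8_t dev_reg;		/* last value that reached the device */
	unsigned int delivered;		/* number of writes the device has seen */
};

/* Deliver every buffered write to the device, in order. */
void toy_drain(struct toy_bridge *b)
{
	for (size_t i = 0; i < b->nr_posted; i++) {
		b->dev_reg = b->posted[i];
		b->delivered++;
	}
	b->nr_posted = 0;
}

/* Models writeb(): returns before the device has seen the value. */
void toy_writeb(struct toy_bridge *b, uint8_t val)
{
	if (b->nr_posted == TOY_POSTED_MAX)
		toy_drain(b);		/* a full buffer forces delivery */
	b->posted[b->nr_posted++] = val;
}

/* Models readb(): earlier posted writes are drained before the read completes. */
uint8_t toy_readb(struct toy_bridge *b)
{
	toy_drain(b);
	return b->dev_reg;
}
```

 In this model, a udelay() placed directly after toy_writeb() would start
 counting before the device has seen the bit, whereas one placed after
 toy_readb() starts only once the write has been delivered.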
 Another case to watch out for is when resetting a PCI device. Use PCI
 Configuration space reads to flush the writel(). This will gracefully
 handle the PCI master abort on all platforms if the PCI device is
 expected to not respond to a readl().  Most x86 platforms will allow
 MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
 (e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@ -0,0 +1,311 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 ===========================================================
 The PCI Express Advanced Error Reporting Driver Guide HOWTO
 ===========================================================
 :Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
          - Yanmin Zhang <yanmin.zhang@intel.com>
 :Copyright: |copy| 2006 Intel Corporation
 Overview
 ===========
 About this guide
 ----------------
 This guide describes the basics of the PCI Express Advanced Error
 Reporting (AER) driver and provides information on how to use it, as
 well as how to enable the drivers of endpoint devices to conform with
 the PCI Express AER driver.
 What is the PCI Express AER Driver?
 -----------------------------------
 PCI Express error signaling can occur on the PCI Express link itself
 or on behalf of transactions initiated on the link. PCI Express
 defines two error reporting paradigms: the baseline capability and
 the Advanced Error Reporting capability. The baseline capability is
 required of all PCI Express components providing a minimum defined
 set of error reporting requirements. Advanced Error Reporting
 capability is implemented with a PCI Express advanced error reporting
 extended capability structure providing more robust error reporting.
 The PCI Express AER driver provides the infrastructure to support PCI
 Express Advanced Error Reporting capability. The PCI Express AER
 driver provides three basic functions:
  - Gathers comprehensive error information when errors occur.
  - Reports errors to the users.
  - Performs error recovery actions.
 The AER driver only attaches to root ports that support the PCI Express AER
 capability.
 User Guide
 ==========
 Include the PCI Express AER Root Driver into the Linux Kernel
 -------------------------------------------------------------
 The PCI Express AER Root driver is a Root Port service driver attached
 to the PCI Express Port Bus driver. If a user wants to use it, the driver
 has to be compiled. Option CONFIG_PCIEAER supports this capability. It
 depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
 CONFIG_PCIEAER=y.
 Load PCI Express AER Root Driver
 --------------------------------
 Some systems have AER support in firmware. Enabling Linux AER support at
 the same time the firmware handles AER may result in unpredictable
 behavior. Therefore, Linux does not handle AER events unless the firmware
 grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
 Specification for details regarding _OSC usage.
 AER error output
 ----------------
 When a PCIe AER error is captured, an error message will be output to
 console. If it's a correctable error, it is output as a warning.
 Otherwise, it is printed as an error. Users can therefore choose a
 log level that filters out correctable error messages.
 An example is shown below::
  0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
  0000:50:00.0:   device [8086:0329] error status/mask=00100000/00000000
  0000:50:00.0:    [20] Unsupported Request    (First)
  0000:50:00.0:   TLP Header: 04000001 00200a03 05010000 00050100
 In the example, 'Requester ID' means the ID of the device that sent
 the error message to the root port. Please refer to the PCI Express
 specifications for the other fields.
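 Two of the fields in the example can be decoded with simple bit arithmetic.
 The sketch below is a userspace illustration with hypothetical helper names
 (toy_decode_req_id, toy_first_status_bit, toy_unmasked); it assumes the usual
 Requester ID layout of bus in bits 15:8, device in bits 7:3 and function in
 bits 2:0, and treats the status and mask words as plain 32-bit values.

```c
#include <stdint.h>

/* Bus/device/function parts of a 16-bit Requester ID. */
struct toy_req_id {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/* Hypothetical helper: split a Requester ID such as 0x0500. */
struct toy_req_id toy_decode_req_id(uint16_t id)
{
	struct toy_req_id r;

	r.bus = (uint8_t)(id >> 8);
	r.dev = (uint8_t)((id >> 3) & 0x1f);
	r.fn = (uint8_t)(id & 0x07);
	return r;
}

/* Hypothetical helper: index of the lowest set status bit, or -1 if none. */
int toy_first_status_bit(uint32_t status)
{
	for (int bit = 0; bit < 32; bit++)
		if (status & (UINT32_C(1) << bit))
			return bit;
	return -1;
}

/* Hypothetical helper: status bits that are set and not masked. */
uint32_t toy_unmasked(uint32_t status, uint32_t mask)
{
	return status & ~mask;
}
```

 Applied to the log above, id=0500 decodes to bus 05, device 00, function 0,
 and status 00100000 has only bit 20 set, which matches the
 "[20] Unsupported Request" line.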
 AER Statistics / Counters
 -------------------------
 When PCIe AER errors are captured, the counters / statistics are also exposed
 in the form of sysfs attributes which are documented at
 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 Developer Guide
 ===============
 Enabling AER-aware support requires a software driver to configure
 the AER capability structure within its device and to provide callbacks.
 To support AER well, developers first need to understand how AER works.
 PCI Express errors are classified into two types: correctable errors
 and uncorrectable errors. This classification is based on the impacts
 of those errors, which may result in degraded performance or function
 failure.
 Correctable errors have no impact on the functionality of the
 interface. The PCI Express protocol can recover without any software
 intervention or any loss of data. These errors are detected and
 corrected by hardware. Unlike correctable errors, uncorrectable
 errors impact functionality of the interface. Uncorrectable errors
 can cause a particular transaction or a particular PCI Express link
 to be unreliable. Depending on those error conditions, uncorrectable
 errors are further classified into non-fatal errors and fatal errors.
 Non-fatal errors cause the particular transaction to be unreliable,
 but the PCI Express link itself is fully functional. Fatal errors, on
 the other hand, cause the link to be unreliable.
 When AER is enabled, a PCI Express device will automatically send an
 error message to the PCIe root port above it when the device captures
 an error. The Root Port, upon receiving an error reporting message,
 internally processes and logs the error message in its PCI Express
 capability structure. Error information being logged includes storing
 the error reporting agent's requester ID into the Error Source
 Identification Registers and setting the error bits of the Root Error
 Status Register accordingly. If AER error reporting is enabled in Root
 Error Command Register, the Root Port generates an interrupt if an
 error is detected.
 Note that the errors as described above are related to the PCI Express
 hierarchy and links. These errors do not include any device specific
 errors because device specific errors will still get sent directly to
 the device driver.
 Configure the AER capability structure
 --------------------------------------
 AER-aware drivers of PCI Express components need to change the device
 control registers to enable AER. They can also change AER registers,
 including the mask and severity registers. The helper function
 pci_enable_pcie_error_reporting can be used to enable AER. See
 the "helper functions" section.
 Provide callbacks
 -----------------
 callback reset_link to reset pci express link
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This callback is used to reset the PCI Express physical link when a
 fatal error happens. The root port AER service driver provides a
 default reset_link function, but different upstream ports might
 have different specifications for resetting the PCI Express link, so all
 upstream ports should provide their own reset_link functions.
 In struct pcie_port_service_driver, a new pointer, reset_link, is
 added.
 ::
 	pci_ers_result_t (*reset_link) (struct pci_dev *dev);
 The "Non-correctable (non-fatal and fatal) errors" section provides more
 detailed information on when to call reset_link.
 PCI error-recovery callbacks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The PCI Express AER Root driver uses error callbacks to coordinate
 with downstream device drivers associated with a hierarchy in question
 when performing error recovery actions.
 The pci_driver struct has a pointer, err_handler, that points to a
 pci_error_handlers struct, which consists of several callback function
 pointers. The AER driver follows the rules defined in
 pci-error-recovery.txt except for the PCI Express specific parts (e.g.
 reset_link). Please refer to pci-error-recovery.txt for detailed
 definitions of the callbacks.
 The sections below specify when the error callback functions are called.
 Correctable errors
 ~~~~~~~~~~~~~~~~~~
 Correctable errors have no impact on the functionality of
 the interface. The PCI Express protocol can recover without any
 software intervention or any loss of data. These errors do not
 require any recovery actions. The AER driver clears the device's
 correctable error status register accordingly and logs these errors.
 Non-correctable (non-fatal and fatal) errors
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 If an error message indicates a non-fatal error, performing a link reset
 upstream is not required. The AER driver calls error_detected(dev,
 pci_channel_io_normal) for all drivers associated with the hierarchy in
 question. For example::
  EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
 If Upstream port A captures an AER error, the hierarchy consists of
 Downstream port B and EndPoint.
 A driver may return PCI_ERS_RESULT_CAN_RECOVER,
 PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
 whether it can recover without a reset, considers the device unrecoverable,
 or needs a reset. If it can recover, the AER driver calls mmio_enabled next.
 If an error message indicates a fatal error, the kernel broadcasts
 error_detected(dev, pci_channel_io_frozen) to all drivers within
 the hierarchy in question. A link reset upstream is then
 necessary. As different kinds of devices might use different approaches
 to reset the link, the AER port service driver is required to provide the
 function that resets the link. First, the kernel checks whether the upstream
 component has an AER driver. If it does, the kernel uses the reset_link
 callback of that driver. If the upstream component has no AER driver
 and the port is a downstream port, the kernel performs a hot reset by
 default, setting the Secondary Bus Reset bit of the Bridge Control
 register associated with the downstream port. Upstream ports
 should provide their own AER service drivers with a reset_link
 function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
 reset_link returns PCI_ERS_RESULT_RECOVERED, error handling proceeds
 to mmio_enabled.
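 The per-severity behaviour described in this section can be summarized as a
 small decision table. The following userspace sketch is only a model of the
 text above: the toy_* names are invented for the example, and the two channel
 states merely mirror pci_channel_io_normal and pci_channel_io_frozen. It does
 not reproduce the AER driver's actual code.

```c
#include <stdbool.h>

enum toy_severity {
	TOY_CORRECTABLE,
	TOY_NONFATAL,
	TOY_FATAL,
};

/* Stand-ins for pci_channel_io_normal and pci_channel_io_frozen. */
enum toy_channel_state {
	TOY_IO_NORMAL,
	TOY_IO_FROZEN,
};

struct toy_plan {
	bool notify_drivers;		/* call error_detected() on the hierarchy? */
	enum toy_channel_state state;	/* state passed to error_detected() */
	bool reset_link;		/* is an upstream link reset performed? */
};

/* Hypothetical helper: map an error severity to the documented actions. */
struct toy_plan toy_plan_recovery(enum toy_severity sev)
{
	struct toy_plan p = { false, TOY_IO_NORMAL, false };

	switch (sev) {
	case TOY_CORRECTABLE:
		/* Hardware already recovered; the status is cleared and logged. */
		break;
	case TOY_NONFATAL:
		/* The link still works, so drivers are notified without a reset. */
		p.notify_drivers = true;
		break;
	case TOY_FATAL:
		/* The link is unreliable: freeze I/O, notify, then reset the link. */
		p.notify_drivers = true;
		p.state = TOY_IO_FROZEN;
		p.reset_link = true;
		break;
	}
	return p;
}
```

 Keeping the mapping in one place makes the asymmetry visible: only a fatal
 error changes the channel state and involves reset_link.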
 helper functions
 ----------------
 ::
  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 pci_enable_pcie_error_reporting enables the device to send error
 messages to the root port when an error is detected. Note that devices
 don't enable error reporting by default, so device drivers need to
 call this function to enable it.
 ::
  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 pci_disable_pcie_error_reporting stops the device from sending error
 messages to the root port when an error is detected.
 ::
  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
 pci_cleanup_aer_uncorrect_error_status clears the uncorrectable
 error status register.
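 For orientation, the sketch below models what enabling and disabling error
 reporting amounts to at the register level. It assumes that the helpers
 operate on the four error reporting enable bits of the PCI Express Device
 Control register; the bit values match <linux/pci_regs.h>, but the toy_*
 functions and the TOY_AER_FLAGS name are made up for this userspace example
 and work on a plain 16-bit value instead of a device.

```c
#include <stdint.h>

/* Device Control error reporting enable bits (values as in <linux/pci_regs.h>). */
#define PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting Enable */
#define PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting Enable */

/* Assumption for this sketch: all four bits are switched together. */
#define TOY_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
			 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

/* Hypothetical helper: Device Control value with error reporting enabled. */
uint16_t toy_enable_error_reporting(uint16_t devctl)
{
	return (uint16_t)(devctl | TOY_AER_FLAGS);
}

/* Hypothetical helper: Device Control value with error reporting disabled. */
uint16_t toy_disable_error_reporting(uint16_t devctl)
{
	return (uint16_t)(devctl & ~TOY_AER_FLAGS);
}
```

 A real driver would simply call pci_enable_pcie_error_reporting(pdev) in its
 probe routine and pci_disable_pcie_error_reporting(pdev) on removal, leaving
 the register access to the PCI core.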
 Frequently Asked Questions
 --------------------------
 Q:
  What happens if a PCI Express device driver does not provide an
  error recovery handler (pci_driver->err_handler is equal to NULL)?
 A:
 The devices attached to the driver won't be recovered. If the
 error is fatal, the kernel will print warning messages. Please refer
 to the Developer Guide section for more information.
 Q:
  What happens if an upstream port service driver does not provide
  callback reset_link?
 A:
 Fatal error recovery will fail if the errors are reported by the
 upstream ports to which the service driver is attached.
 Q:
 How does this infrastructure deal with a driver that is not PCI
  Express aware?
 A:
  This infrastructure calls the error callback functions of the
  driver when an error happens. But if the driver is not aware of
  PCI Express, the device might not report its own errors to root
  port.
 Q:
  What modifications will that driver need to make it compatible
  with the PCI Express AER Root driver?
 A:
 It could call the helper functions to enable AER in devices and
 clear the uncorrectable error status register. Please refer to the
 "helper functions" section.
 Software error injection
 ========================
 Debugging PCIe AER error recovery code is quite difficult because it
 is hard to trigger real hardware errors. Software based error
 injection can be used to fake various kinds of PCIe errors.
 First, enable PCIe AER software error injection in the kernel
 configuration; that is, the following item should be in your .config.
 CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
 After rebooting with the new kernel or inserting the module, a device file named
 /dev/aer_inject should be created.
 Then, you need a user space tool named aer-inject, which can be obtained
 from:
    https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
 More information about aer-inject can be found in the documentation that comes
 with its source code.
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@ -1,267 +0,0 @@
   The PCI Express Advanced Error Reporting Driver Guide HOWTO
 		T. Long Nguyen	<tom.l.nguyen@intel.com>
 		Yanmin Zhang	<yanmin.zhang@intel.com>
 				07/29/2006
 1. Overview
 1.1 About this guide
 This guide describes the basics of the PCI Express Advanced Error
 Reporting (AER) driver and provides information on how to use it, as
 well as how to enable the drivers of endpoint devices to conform with
 the PCI Express AER driver.
 1.2 Copyright (C) Intel Corporation 2006.
 1.3 What is the PCI Express AER Driver?
 PCI Express error signaling can occur on the PCI Express link itself
 or on behalf of transactions initiated on the link. PCI Express
 defines two error reporting paradigms: the baseline capability and
 the Advanced Error Reporting capability. The baseline capability is
 required of all PCI Express components providing a minimum defined
 set of error reporting requirements. Advanced Error Reporting
 capability is implemented with a PCI Express advanced error reporting
 extended capability structure providing more robust error reporting.
 The PCI Express AER driver provides the infrastructure to support PCI
 Express Advanced Error Reporting capability. The PCI Express AER
 driver provides three basic functions:
 -	Gathers comprehensive error information when errors occur.
 -	Reports errors to the users.
 -	Performs error recovery actions.
 The AER driver only attaches to root ports that support the PCI Express AER
 capability.
 2. User Guide
 2.1 Include the PCI Express AER Root Driver into the Linux Kernel
 The PCI Express AER Root driver is a Root Port service driver attached
 to the PCI Express Port Bus driver. If a user wants to use it, the driver
 has to be compiled. Option CONFIG_PCIEAER supports this capability. It
 depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
 CONFIG_PCIEAER=y.
 2.2 Load PCI Express AER Root Driver
 Some systems have AER support in firmware. Enabling Linux AER support at
 the same time the firmware handles AER may result in unpredictable
 behavior. Therefore, Linux does not handle AER events unless the firmware
 grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
 Specification for details regarding _OSC usage.
 2.3 AER error output
 When a PCIe AER error is captured, an error message will be output to
 console. If it's a correctable error, it is output as a warning.
 Otherwise, it is printed as an error. Users can therefore choose a
 log level that filters out correctable error messages.
 An example is shown below:
 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
 0000:50:00.0:   device [8086:0329] error status/mask=00100000/00000000
 0000:50:00.0:    [20] Unsupported Request    (First)
 0000:50:00.0:   TLP Header: 04000001 00200a03 05010000 00050100
 In the example, 'Requester ID' means the ID of the device that sent
 the error message to the root port. Please refer to the PCI Express
 specifications for the other fields.
 2.4 AER Statistics / Counters
 When PCIe AER errors are captured, the counters / statistics are also exposed
 in the form of sysfs attributes which are documented at
 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 3. Developer Guide
 Enabling AER-aware support requires a software driver to configure
 the AER capability structure within its device and to provide callbacks.
 To support AER well, developers first need to understand how AER works.
 PCI Express errors are classified into two types: correctable errors
 and uncorrectable errors. This classification is based on the impacts
 of those errors, which may result in degraded performance or function
 failure.
 Correctable errors have no impact on the functionality of the
 interface. The PCI Express protocol can recover without any software
 intervention or any loss of data. These errors are detected and
 corrected by hardware. Unlike correctable errors, uncorrectable
 errors impact functionality of the interface. Uncorrectable errors
 can cause a particular transaction or a particular PCI Express link
 to be unreliable. Depending on those error conditions, uncorrectable
 errors are further classified into non-fatal errors and fatal errors.
 Non-fatal errors cause the particular transaction to be unreliable,
 but the PCI Express link itself is fully functional. Fatal errors, on
 the other hand, cause the link to be unreliable.
 When AER is enabled, a PCI Express device will automatically send an
 error message to the PCIe root port above it when the device captures
 an error. The Root Port, upon receiving an error reporting message,
 internally processes and logs the error message in its PCI Express
 capability structure. Error information being logged includes storing
 the error reporting agent's requester ID into the Error Source
 Identification Registers and setting the error bits of the Root Error
 Status Register accordingly. If AER error reporting is enabled in Root
 Error Command Register, the Root Port generates an interrupt if an
 error is detected.
 Note that the errors as described above are related to the PCI Express
 hierarchy and links. These errors do not include any device specific
 errors because device specific errors will still get sent directly to
 the device driver.
 3.1 Configure the AER capability structure
 AER-aware drivers of PCI Express components need to change the device
 control registers to enable AER. They can also change AER registers,
 including the mask and severity registers. The helper function
 pci_enable_pcie_error_reporting can be used to enable AER. See
 section 3.3.
 3.2. Provide callbacks
 3.2.1 callback reset_link to reset pci express link
 This callback is used to reset the PCI Express physical link when a
 fatal error happens. The root port AER service driver provides a
 default reset_link function, but different upstream ports might
 have different specifications for resetting the PCI Express link, so all
 upstream ports should provide their own reset_link functions.
 In struct pcie_port_service_driver, a new pointer, reset_link, is
 added.
 pci_ers_result_t (*reset_link) (struct pci_dev *dev);
 Section 3.2.2.2 provides more detailed info on when to call
 reset_link.
 3.2.2 PCI error-recovery callbacks
 The PCI Express AER Root driver uses error callbacks to coordinate
 with downstream device drivers associated with a hierarchy in question
 when performing error recovery actions.
 The pci_driver struct has a pointer, err_handler, that points to a
 pci_error_handlers struct, which consists of several callback function
 pointers. The AER driver follows the rules defined in
 pci-error-recovery.txt except for the PCI Express specific parts (e.g.
 reset_link). Please refer to pci-error-recovery.txt for detailed
 definitions of the callbacks.
 The sections below specify when the error callback functions are called.
 3.2.2.1 Correctable errors
 Correctable errors have no impact on the functionality of
 the interface. The PCI Express protocol can recover without any
 software intervention or any loss of data. These errors do not
 require any recovery actions. The AER driver clears the device's
 correctable error status register accordingly and logs these errors.
 3.2.2.2 Non-correctable (non-fatal and fatal) errors
 If an error message indicates a non-fatal error, performing a link reset
 upstream is not required. The AER driver calls error_detected(dev,
 pci_channel_io_normal) for all drivers associated with the hierarchy in
 question. For example,
 EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
 If Upstream port A captures an AER error, the hierarchy consists of
 Downstream port B and EndPoint.
 A driver may return PCI_ERS_RESULT_CAN_RECOVER,
 PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
 whether it can recover without a reset, considers the device unrecoverable,
 or needs a reset. If it can recover, the AER driver calls mmio_enabled next.
 If an error message indicates a fatal error, the kernel broadcasts
 error_detected(dev, pci_channel_io_frozen) to all drivers within
 the hierarchy in question. A link reset upstream is then
 necessary. As different kinds of devices might use different approaches
 to reset the link, the AER port service driver is required to provide the
 function that resets the link. First, the kernel checks whether the upstream
 component has an AER driver. If it does, the kernel uses the reset_link
 callback of that driver. If the upstream component has no AER driver
 and the port is a downstream port, the kernel performs a hot reset by
 default, setting the Secondary Bus Reset bit of the Bridge Control
 register associated with the downstream port. Upstream ports
 should provide their own AER service drivers with a reset_link
 function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
 reset_link returns PCI_ERS_RESULT_RECOVERED, error handling proceeds
 to mmio_enabled.
 3.3 helper functions
 3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 pci_enable_pcie_error_reporting enables the device to send error
 messages to the root port when an error is detected. Note that devices
 don't enable error reporting by default, so device drivers need to
 call this function to enable it.
 3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 pci_disable_pcie_error_reporting stops the device from sending error
 messages to the root port when an error is detected.
 3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
 pci_cleanup_aer_uncorrect_error_status clears the uncorrectable
 error status register.
 3.4 Frequently Asked Questions
 Q: What happens if a PCI Express device driver does not provide an
 error recovery handler (pci_driver->err_handler is equal to NULL)?
 A: The devices attached to the driver won't be recovered. If the
 error is fatal, the kernel will print warning messages. Please refer
 to section 3 for more information.
 Q: What happens if an upstream port service driver does not provide
 callback reset_link?
 A: Fatal error recovery will fail if the errors are reported by the
 upstream ports to which the service driver is attached.
 Q: How does this infrastructure deal with a driver that is not PCI
 Express aware?
 A: This infrastructure calls the error callback functions of the
 driver when an error happens. But if the driver is not aware of
 PCI Express, the device might not report its own errors to root
 port.
 Q: What modifications will that driver need to make it compatible
 with the PCI Express AER Root driver?
 A: It could call the helper functions to enable AER in devices and
 clear the uncorrectable error status register. Please refer to section 3.3.
 4. Software error injection
 Debugging PCIe AER error recovery code is quite difficult because it
 is hard to trigger real hardware errors. Software based error
 injection can be used to fake various kinds of PCIe errors.
 First, enable PCIe AER software error injection in the kernel
 configuration; that is, the following item should be in your .config.
 CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
 After rebooting with the new kernel or inserting the module, a device file named
 /dev/aer_inject should be created.
 Then, you need a user space tool named aer-inject, which can be obtained
 from:
    https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
 More information about aer-inject can be found in the documentation that comes
 with its source code.
--- a/Documentation/PCI/picebus-howto.rst
+++ b/Documentation/PCI/picebus-howto.rst
@ -0,0 +1,220 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 ===========================================
 The PCI Express Port Bus Driver Guide HOWTO
 ===========================================
 :Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
 :Copyright: |copy| 2004 Intel Corporation
 About this guide
 ================
 This guide describes the basics of the PCI Express Port Bus driver
 and provides information on how to enable the service drivers to
 register/unregister with the PCI Express Port Bus Driver.
 What is the PCI Express Port Bus Driver
 =======================================
 A PCI Express Port is a logical PCI-PCI Bridge structure. There
 are two types of PCI Express Port: the Root Port and the Switch
 Port. The Root Port originates a PCI Express link from a PCI Express
 Root Complex and the Switch Port connects PCI Express links to
 internal logical PCI buses. The Switch Port, which has its secondary
 bus representing the switch's internal routing logic, is called the
 switch's Upstream Port. The switch's Downstream Port is bridging from
 switch's internal routing bus to a bus representing the downstream
 PCI Express link from the PCI Express Switch.
 A PCI Express Port can provide up to four distinct functions,
 referred to in this document as services, depending on its port type.
 PCI Express Port's services include native hotplug support (HP),
 power management event support (PME), advanced error reporting
 support (AER), and virtual channel support (VC). These services may
 be handled by a single complex driver or be individually distributed
 and handled by corresponding service drivers.
 Why use the PCI Express Port Bus Driver?
 ========================================
 In existing Linux kernels, the Linux Device Driver Model allows a
 physical device to be handled by only a single driver. The PCI
 Express Port is a PCI-PCI Bridge device with multiple distinct
 services. To maintain a clean and simple solution each service
 may have its own software service driver. In this case several
 service drivers will compete for a single PCI-PCI Bridge device.
 For example, if the PCI Express Root Port native hotplug service
 driver is loaded first, it claims a PCI-PCI Bridge Root Port. The
 kernel therefore does not load other service drivers for that Root
 Port. In other words, it is impossible to have multiple service
 drivers load and run on a PCI-PCI Bridge device simultaneously
 using the current driver model.
 To enable multiple service drivers running simultaneously requires
 having a PCI Express Port Bus driver, which manages all populated
 PCI Express Ports and distributes all provided service requests
 to the corresponding service drivers as required. Some key
 advantages of using the PCI Express Port Bus driver are listed below:
  - Allow multiple service drivers to run simultaneously on
    a PCI-PCI Bridge Port device.
  - Allow service drivers implemented in an independent
    staged approach.
  - Allow one service driver to run on multiple PCI-PCI Bridge
    Port devices.
  - Manage and distribute resources of a PCI-PCI Bridge Port
    device to requested service drivers.
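 The dispatch idea behind these advantages can be modelled in user space.
 The sketch below is only an illustration: the model_* names and SVC_* bits
 are hypothetical and are not kernel APIs. It assumes each service driver
 handles exactly one service and each port advertises a set of services.

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical service bits: hotplug, PME, AER, virtual channel. */
enum { SVC_HP = 1 << 0, SVC_PME = 1 << 1, SVC_AER = 1 << 2, SVC_VC = 1 << 3 };

struct model_service_driver {
	const char *name;
	unsigned int service;	/* exactly one SVC_* bit */
	int bound_ports;	/* how many ports this driver was attached to */
};

struct model_port {
	unsigned int capabilities;	/* SVC_* bits the port provides */
};

/*
 * Attach every registered driver whose service the port provides.
 * Returns the number of drivers attached to this port.
 */
static int model_port_bus_probe(const struct model_port *port,
				struct model_service_driver *drivers,
				size_t ndrivers)
{
	int attached = 0;
	size_t i;

	for (i = 0; i < ndrivers; i++) {
		if (port->capabilities & drivers[i].service) {
			drivers[i].bound_ports++;
			attached++;
		}
	}
	return attached;
}
```

 In this model several drivers attach to one port, and one driver attaches
 to several ports, which is what a single-driver-per-device model cannot
 express.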
 Configuring the PCI Express Port Bus Driver vs. Service Drivers
 ===============================================================
 Including the PCI Express Port Bus Driver Support into the Kernel
 -----------------------------------------------------------------
 Including the PCI Express Port Bus driver depends on whether the PCI
 Express support is included in the kernel config. The kernel will
 automatically include the PCI Express Port Bus driver as a kernel
 driver when the PCI Express support is enabled in the kernel.
 Enabling Service Driver Support
 -------------------------------
 PCI device drivers are implemented based on Linux Device Driver Model.
 All service drivers are PCI device drivers. As discussed above, it is
 impossible to load any service driver once the kernel has loaded the
 PCI Express Port Bus Driver. Meeting the PCI Express Port Bus Driver
 Model requires some minimal changes to existing service drivers that
 have no impact on the functionality of existing service drivers.
 A service driver is required to use the two APIs shown below to
 register its service with the PCI Express Port Bus driver (see
 pcie_port_service_register and pcie_port_service_unregister below).
 It is important that a service driver initializes the
 pcie_port_service_driver data structure, included in
 header file /include/linux/pcieport_if.h, before calling these APIs.
 Failure to do so will result in an identity mismatch, which prevents
 the PCI Express Port Bus driver from loading a service driver.
 pcie_port_service_register
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 ::
  int pcie_port_service_register(struct pcie_port_service_driver *new)
 This API replaces the Linux Driver Model's pci_register_driver API. A
 service driver should always call pcie_port_service_register at
 module init. Note that after the service driver is loaded, calls
 such as pci_enable_device(dev) and pci_set_master(dev) are no longer
 necessary since these calls are executed by the PCI Port Bus driver.
 pcie_port_service_unregister
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ::
  void pcie_port_service_unregister(struct pcie_port_service_driver *new)
 pcie_port_service_unregister replaces the Linux Driver Model's
 pci_unregister_driver. It is always called by a service driver when
 the module exits.
 Sample Code
 ~~~~~~~~~~~
 Below is sample service driver code to initialize the port service
 driver data structure.
 ::
  static struct pcie_port_service_id service_id[] = { {
    .vendor = PCI_ANY_ID,
    .device = PCI_ANY_ID,
    .port_type = PCIE_RC_PORT,
    .service_type = PCIE_PORT_SERVICE_AER,
    }, { /* end: all zeroes */ }
  };
  static struct pcie_port_service_driver root_aerdrv = {
    .name		= (char *)device_name,
    .id_table	= &service_id[0],
    .probe		= aerdrv_load,
    .remove		= aerdrv_unload,
    .suspend	= aerdrv_suspend,
    .resume		= aerdrv_resume,
  };
 Below is sample code for registering/unregistering a service
 driver.
 ::
  static int __init aerdrv_service_init(void)
  {
    int retval = 0;
    retval = pcie_port_service_register(&root_aerdrv);
    if (!retval) {
      /*
      * FIX ME
      */
    }
    return retval;
  }
  static void __exit aerdrv_service_exit(void)
  {
    pcie_port_service_unregister(&root_aerdrv);
  }
  module_init(aerdrv_service_init);
  module_exit(aerdrv_service_exit);
 Possible Resource Conflicts
 ===========================
 Since all service drivers of a PCI-PCI Bridge Port device are
 allowed to run simultaneously, a few possible resource conflicts
 and their proposed solutions are listed below.
 MSI and MSI-X Vector Resource
 -----------------------------
 Once MSI or MSI-X interrupts are enabled on a device, it stays in this
 mode until they are disabled again.  Since service drivers of the same
 PCI-PCI Bridge port share the same physical device, if an individual
 service driver enables or disables MSI/MSI-X mode it may result in
 unpredictable behavior.
 To avoid this situation, service drivers are not permitted to
 switch the interrupt mode on their device. The PCI Express Port Bus driver
 is responsible for determining the interrupt mode and this should be
 transparent to service drivers. Service drivers need to know only
 the vector IRQ assigned to the field irq of struct pcie_device, which
 is passed in when the PCI Express Port Bus driver probes each service
 driver. Service drivers should use (struct pcie_device*)dev->irq to
 call request_irq/free_irq. In addition, the interrupt mode is stored
 in the field interrupt_mode of struct pcie_device.
 PCI Memory/IO Mapped Regions
 ----------------------------
 Service drivers for PCI Express Power Management (PME), Advanced
 Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
 PCI configuration space on the PCI Express port. In all cases the
 registers accessed are independent of each other. This patch assumes
 that all service drivers will be well behaved and not overwrite
 other service driver's configuration settings.
 PCI Config Registers
 --------------------
 Each service driver runs its PCI config operations on its own
 capability structure except the PCI Express capability structure, in
 which Root Control register and Device Control register are shared
 between PME and AER. This patch assumes that all service drivers
 will be well behaved and not overwrite other service driver's
 configuration settings.
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@ -0,0 +1,143 @@
 .. _up_doc:
 RCU on Uniprocessor Systems
 ===========================
 A common misconception is that, on UP systems, the call_rcu() primitive
 may immediately invoke its function.  The basis of this misconception
 is that since there is only one CPU, it should not be necessary to
 wait for anything else to get done, since there are no other CPUs for
 anything else to be happening on.  Although this approach will *sort of*
 work a surprising amount of the time, it is a very bad idea in general.
 This document presents three examples that demonstrate exactly how bad
 an idea this is.
 Example 1: softirq Suicide
 --------------------------
 Suppose that an RCU-based algorithm scans a linked list containing
 elements A, B, and C in process context, and can delete elements from
 this same list in softirq context.  Suppose that the process-context scan
 is referencing element B when it is interrupted by softirq processing,
 which deletes element B, and then invokes call_rcu() to free element B
 after a grace period.
 Now, if call_rcu() were to directly invoke its arguments, then upon return
 from softirq, the list scan would find itself referencing a newly freed
 element B.  This situation can greatly decrease the life expectancy of
 your kernel.
 This same problem can occur if call_rcu() is invoked from a hardware
 interrupt handler.
 Example 2: Function-Call Fatality
 ---------------------------------
 Of course, one could avert the suicide described in the preceding example
 by having call_rcu() directly invoke its arguments only if it was called
 from process context.  However, this can fail in a similar manner.
 Suppose that an RCU-based algorithm again scans a linked list containing
 elements A, B, and C in process context, but that it invokes a function
 on each element as it is scanned.  Suppose further that this function
 deletes element B from the list, then passes it to call_rcu() for deferred
 freeing.  This may be a bit unconventional, but it is perfectly legal
 RCU usage, since call_rcu() must wait for a grace period to elapse.
 Therefore, in this case, allowing call_rcu() to immediately invoke
 its arguments would cause it to fail to make the fundamental guarantee
 underlying RCU, namely that call_rcu() defers invoking its arguments until
 all RCU read-side critical sections currently executing have completed.
 Quick Quiz #1:
 	Why is it *not* legal to invoke synchronize_rcu() in this case?
 :ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
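 The first two examples can be reproduced with a toy, single-threaded model.
 The sketch below is not kernel code: the toy_* names are hypothetical, a
 "freed" flag stands in for the real free, and the victim is assumed not to
 be the list head. It contrasts immediate invocation with deferral until the
 end of a grace period.

```c
/* Toy single-threaded model; the toy_* names are not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_elem {
	struct toy_elem *next;
	bool freed;			/* stands in for the real free */
	struct toy_elem *cb_next;	/* link in the pending-callback queue */
};

static struct toy_elem *toy_pending;
static bool toy_invoke_immediately;	/* true models the broken shortcut */

/* Retire an element: either "free" it at once or queue it for later. */
static void toy_call_rcu(struct toy_elem *e)
{
	if (toy_invoke_immediately) {
		e->freed = true;
		return;
	}
	e->cb_next = toy_pending;
	toy_pending = e;
}

/* Called once no reader can still hold a reference. */
static void toy_grace_period_end(void)
{
	while (toy_pending) {
		struct toy_elem *e = toy_pending;

		toy_pending = e->cb_next;
		e->freed = true;
	}
}

/*
 * Scan the list; the per-element function unlinks and retires 'victim'.
 * Returns how many times the scan touched an already-freed element.
 */
static int toy_scan(struct toy_elem *head, struct toy_elem *victim)
{
	struct toy_elem *prev = NULL, *e;
	int touched_freed = 0;

	for (e = head; e; e = e->next) {
		if (e == victim) {
			if (prev)
				prev->next = e->next;	/* unlink, keep e->next */
			toy_call_rcu(e);
		}
		/* The reader still dereferences e after the function returns. */
		if (e->freed)
			touched_freed++;
		prev = e;
	}
	return touched_freed;
}
```

 With toy_invoke_immediately set, the scan touches the element after it has
 been freed. With deferral, the element stays valid until
 toy_grace_period_end() runs.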
 Example 3: Death by Deadlock
 ----------------------------
 Suppose that call_rcu() is invoked while holding a lock, and that the
 callback function must acquire this same lock.  In this case, if
 call_rcu() were to directly invoke the callback, the result would
 be self-deadlock.
 In some cases, it would be possible to restructure the code so that
 the call_rcu() is delayed until after the lock is released.  However,
 there are cases where this can be quite ugly:
 1.	If a number of items need to be passed to call_rcu() within
 	the same critical section, then the code would need to create
 	a list of them, then traverse the list once the lock was
 	released.
 2.	In some cases, the lock will be held across some kernel API,
 	so that delaying the call_rcu() until the lock is released
 	requires that the data item be passed up via a common API.
 	It is far better to guarantee that callbacks are invoked
 	with no locks held than to have to modify such APIs to allow
 	arbitrary data items to be passed back up through them.
 If call_rcu() directly invokes the callback, painful locking restrictions
 or API changes would be required.
 Quick Quiz #2:
 	What locking restriction must RCU callbacks respect?
 :ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
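 The self-deadlock can also be shown with a toy model. The sketch below is
 an assumption-laden illustration, not kernel code: toy_lock is a
 hypothetical non-recursive lock whose trylock fails where a real spinlock
 would spin forever, and the callback is deferred with a simple flag.

```c
/* Toy single-threaded model; the toy_* names are not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct toy_lock {
	bool held;
};

/* Non-recursive lock: a second acquisition by the holder cannot succeed. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin forever here */
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock toy_mylock;
static bool toy_cb_pending;

/* The callback needs the same lock that the updater holds. */
static bool toy_callback(void)
{
	if (!toy_trylock(&toy_mylock))
		return false;	/* would self-deadlock */
	toy_unlock(&toy_mylock);
	return true;
}

/* Returns true if the callback ran without deadlocking. */
static bool toy_update(bool invoke_immediately)
{
	bool ok = true;

	toy_trylock(&toy_mylock);
	if (invoke_immediately)
		ok = toy_callback();	/* lock is still held here */
	else
		toy_cb_pending = true;	/* defer until the lock is dropped */
	toy_unlock(&toy_mylock);

	if (toy_cb_pending) {
		toy_cb_pending = false;
		ok = toy_callback();	/* known environment, no locks held */
	}
	return ok;
}
```

 Here toy_update(true) reports the deadlock, while toy_update(false) runs
 the callback safely after the lock has been released.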
 Summary
 -------
 Permitting call_rcu() to immediately invoke its arguments breaks RCU,
 even on a UP system.  So do not do it!  Even on a UP system, the RCU
 infrastructure *must* respect grace periods, and *must* invoke callbacks
 from a known environment in which no locks are held.
 Note that it *is* safe for synchronize_rcu() to return immediately on
 UP systems, including !PREEMPT SMP builds running on UP systems.
 Quick Quiz #3:
 	Why can't synchronize_rcu() return immediately on UP systems running
 	preemptable RCU?
 .. _answer_quick_quiz_up:
 Answer to Quick Quiz #1:
 	Why is it *not* legal to invoke synchronize_rcu() in this case?
 	Because the calling function is scanning an RCU-protected linked
 	list, and is therefore within an RCU read-side critical section.
 	Therefore, the called function has been invoked within an RCU
 	read-side critical section, and is not permitted to block.
 Answer to Quick Quiz #2:
 	What locking restriction must RCU callbacks respect?
 	Any lock that is acquired within an RCU callback must be acquired
 	elsewhere using an _bh variant of the spinlock primitive.
 	For example, if "mylock" is acquired by an RCU callback, then
 	a process-context acquisition of this lock must use something
 	like spin_lock_bh() to acquire the lock.  Please note that
 	it is also OK to use _irq variants of spinlocks, for example,
 	spin_lock_irqsave().
 	If the process-context code were to simply use spin_lock(),
 	then, since RCU callbacks can be invoked from softirq context,
 	the callback might be called from a softirq that interrupted
 	the process-context critical section.  This would result in
 	self-deadlock.
 	This restriction might seem gratuitous, since very few RCU
 	callbacks acquire locks directly.  However, a great many RCU
 	callbacks do acquire locks *indirectly*, for example, via
 	the kfree() primitive.
 Answer to Quick Quiz #3:
 	Why can't synchronize_rcu() return immediately on UP systems
 	running preemptable RCU?
 	Because some other task might have been preempted in the middle
 	of an RCU read-side critical section.  If synchronize_rcu()
 	simply immediately returned, it would prematurely signal the
 	end of the grace period, which would come as a nasty shock to
 	that other thread when it started running again.
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@ -1,133 +0,0 @@
 RCU on Uniprocessor Systems
 A common misconception is that, on UP systems, the call_rcu() primitive
 may immediately invoke its function.  The basis of this misconception
 is that since there is only one CPU, it should not be necessary to
 wait for anything else to get done, since there are no other CPUs for
 anything else to be happening on.  Although this approach will -sort- -of-
 work a surprising amount of the time, it is a very bad idea in general.
 This document presents three examples that demonstrate exactly how bad
 an idea this is.
 Example 1: softirq Suicide
 Suppose that an RCU-based algorithm scans a linked list containing
 elements A, B, and C in process context, and can delete elements from
 this same list in softirq context.  Suppose that the process-context scan
 is referencing element B when it is interrupted by softirq processing,
 which deletes element B, and then invokes call_rcu() to free element B
 after a grace period.
 Now, if call_rcu() were to directly invoke its arguments, then upon return
 from softirq, the list scan would find itself referencing a newly freed
 element B.  This situation can greatly decrease the life expectancy of
 your kernel.
 This same problem can occur if call_rcu() is invoked from a hardware
 interrupt handler.
 Example 2: Function-Call Fatality
 Of course, one could avert the suicide described in the preceding example
 by having call_rcu() directly invoke its arguments only if it was called
 from process context.  However, this can fail in a similar manner.
 Suppose that an RCU-based algorithm again scans a linked list containing
 elements A, B, and C in process contexts, but that it invokes a function
 on each element as it is scanned.  Suppose further that this function
 deletes element B from the list, then passes it to call_rcu() for deferred
 freeing.  This may be a bit unconventional, but it is perfectly legal
 RCU usage, since call_rcu() must wait for a grace period to elapse.
 Therefore, in this case, allowing call_rcu() to immediately invoke
 its arguments would cause it to fail to make the fundamental guarantee
 underlying RCU, namely that call_rcu() defers invoking its arguments until
 all RCU read-side critical sections currently executing have completed.
 Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
 	this case?
 Example 3: Death by Deadlock
 Suppose that call_rcu() is invoked while holding a lock, and that the
 callback function must acquire this same lock.  In this case, if
 call_rcu() were to directly invoke the callback, the result would
 be self-deadlock.
 In some cases, it would possible to restructure to code so that
 the call_rcu() is delayed until after the lock is released.  However,
 there are cases where this can be quite ugly:
 1.	If a number of items need to be passed to call_rcu() within
 	the same critical section, then the code would need to create
 	a list of them, then traverse the list once the lock was
 	released.
 2.	In some cases, the lock will be held across some kernel API,
 	so that delaying the call_rcu() until the lock is released
 	requires that the data item be passed up via a common API.
 	It is far better to guarantee that callbacks are invoked
 	with no locks held than to have to modify such APIs to allow
 	arbitrary data items to be passed back up through them.
 If call_rcu() directly invokes the callback, painful locking restrictions
 or API changes would be required.
 Quick Quiz #2: What locking restriction must RCU callbacks respect?
 Summary
 Permitting call_rcu() to immediately invoke its arguments breaks RCU,
 even on a UP system.  So do not do it!  Even on a UP system, the RCU
 infrastructure -must- respect grace periods, and -must- invoke callbacks
 from a known environment in which no locks are held.
 Note that it -is- safe for synchronize_rcu() to return immediately on
 UP systems, including !PREEMPT SMP builds running on UP systems.
 Quick Quiz #3: Why can't synchronize_rcu() return immediately on
 	UP systems running preemptable RCU?
 Answer to Quick Quiz #1:
 	Why is it -not- legal to invoke synchronize_rcu() in this case?
 	Because the calling function is scanning an RCU-protected linked
 	list, and is therefore within an RCU read-side critical section.
 	Therefore, the called function has been invoked within an RCU
 	read-side critical section, and is not permitted to block.
 Answer to Quick Quiz #2:
 	What locking restriction must RCU callbacks respect?
 	Any lock that is acquired within an RCU callback must be
 	acquired elsewhere using an _irq variant of the spinlock
 	primitive.  For example, if "mylock" is acquired by an
 	RCU callback, then a process-context acquisition of this
 	lock must use something like spin_lock_irqsave() to
 	acquire the lock.
 	If the process-context code were to simply use spin_lock(),
 	then, since RCU callbacks can be invoked from softirq context,
 	the callback might be called from a softirq that interrupted
 	the process-context critical section.  This would result in
 	self-deadlock.
 	This restriction might seem gratuitous, since very few RCU
 	callbacks acquire locks directly.  However, a great many RCU
 	callbacks do acquire locks -indirectly-, for example, via
 	the kfree() primitive.
 Answer to Quick Quiz #3:
 	Why can't synchronize_rcu() return immediately on UP systems
 	running preemptable RCU?
 	Because some other task might have been preempted in the middle
 	of an RCU read-side critical section.  If synchronize_rcu()
 	simply immediately returned, it would prematurely signal the
 	end of the grace period, which would come as a nasty shock to
 	that other thread when it started running again.
--- a/Documentation/RCU/index.rst
+++ b/Documentation/RCU/index.rst
@ -0,0 +1,19 @@
 .. _rcu_concepts:
 ============
 RCU concepts
 ============
 .. toctree::
   :maxdepth: 1
   rcu
   listRCU
   UP
 .. only:: subproject and html
   Indices
   =======
   * :ref:`genindex`
--- a/Documentation/RCU/listRCU.rst
+++ b/Documentation/RCU/listRCU.rst
@ -0,0 +1,321 @@
 .. _list_rcu_doc:
 Using RCU to Protect Read-Mostly Linked Lists
 =============================================
 One of the best applications of RCU is to protect read-mostly linked lists
 ("struct list_head" in list.h).  One big advantage of this approach
 is that all of the required memory barriers are included for you in
 the list macros.  This document describes several applications of RCU,
 with the best fits first.
 Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
 ----------------------------------------------------------------------
 The best applications are cases where, if reader-writer locking were
 used, the read-side lock would be dropped before taking any action
 based on the results of the search.  The most celebrated example is
 the routing table.  Because the routing table is tracking the state of
 equipment outside of the computer, it will at times contain stale data.
 Therefore, once the route has been computed, there is no need to hold
 the routing table static during transmission of the packet.  After all,
 you can hold the routing table static all you want, but that won't keep
 the external Internet from changing, and it is the state of the external
 Internet that really matters.  In addition, routing entries are typically
 added or deleted, rather than being modified in place.
 A straightforward example of this use of RCU may be found in the
 system-call auditing support.  For example, a reader-writer locked
 implementation of audit_filter_task() might be as follows::
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		read_lock(&auditsc_lock);
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				read_unlock(&auditsc_lock);
 				return state;
 			}
 		}
 		read_unlock(&auditsc_lock);
 		return AUDIT_BUILD_CONTEXT;
 	}
 Here the list is searched under the lock, but the lock is dropped before
 the corresponding value is returned.  By the time that this value is acted
 on, the list may well have been modified.  This makes sense, since if
 you are turning auditing off, it is OK to audit a few extra system calls.
 This means that RCU can be easily applied to the read side, as follows::
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		rcu_read_lock();
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry_rcu(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				rcu_read_unlock();
 				return state;
 			}
 		}
 		rcu_read_unlock();
 		return AUDIT_BUILD_CONTEXT;
 	}
 The read_lock() and read_unlock() calls have become rcu_read_lock()
 and rcu_read_unlock(), respectively, and the list_for_each_entry() has
 become list_for_each_entry_rcu().  The _rcu() list-traversal primitives
 insert the read-side memory barriers that are required on DEC Alpha CPUs.
 The changes to the update side are also straightforward.  A reader-writer
 lock might be used as follows for deletion and insertion::
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		write_lock(&auditsc_lock);
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				list_del(&e->list);
 				write_unlock(&auditsc_lock);
 				return 0;
 			}
 		}
 		write_unlock(&auditsc_lock);
 		return -EFAULT;		/* No matching rule */
 	}
 	static inline int audit_add_rule(struct audit_entry *entry,
 					 struct list_head *list)
 	{
 		write_lock(&auditsc_lock);
 		if (entry->rule.flags & AUDIT_PREPEND) {
 			entry->rule.flags &= ~AUDIT_PREPEND;
 			list_add(&entry->list, list);
 		} else {
 			list_add_tail(&entry->list, list);
 		}
 		write_unlock(&auditsc_lock);
 		return 0;
 	}
 Following are the RCU equivalents for these two functions::
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		/* Do not use the _rcu iterator here, since this is the only
 		 * deletion routine. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				list_del_rcu(&e->list);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 	static inline int audit_add_rule(struct audit_entry *entry,
 					 struct list_head *list)
 	{
 		if (entry->rule.flags & AUDIT_PREPEND) {
 			entry->rule.flags &= ~AUDIT_PREPEND;
 			list_add_rcu(&entry->list, list);
 		} else {
 			list_add_tail_rcu(&entry->list, list);
 		}
 		return 0;
 	}
 Normally, the write_lock() and write_unlock() would be replaced by
 a spin_lock() and a spin_unlock().  In this case, however, all callers hold
 audit_netlink_sem, so no additional locking is required.  The auditsc_lock
 can therefore be eliminated, since use of RCU eliminates the need for
 writers to exclude readers.
 The list_del(), list_add(), and list_add_tail() primitives have been
 replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
 The _rcu() list-manipulation primitives add memory barriers that are
 needed on weakly ordered CPUs (most of them!).  The list_del_rcu()
 primitive omits the pointer poisoning debug-assist code that would
 otherwise cause concurrent readers to fail spectacularly.
 So, when readers can tolerate stale data and when entries are either added
 or deleted, without in-place modification, it is very easy to use RCU!
 Example 2: Handling In-Place Updates
 ------------------------------------
 The system-call auditing code does not update auditing rules in place.
 However, if it did, reader-writer-locked code to do so might look as
 follows (presumably, the field_count is only permitted to decrease,
 otherwise, the added fields would need to be filled in)::
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
 					 __u32 newaction,
 					 __u32 newfield_count)
 	{
 		struct audit_entry  *e;
 		struct audit_newentry *ne;
 		write_lock(&auditsc_lock);
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				e->rule.action = newaction;
 				e->rule.field_count = newfield_count;
 				write_unlock(&auditsc_lock);
 				return 0;
 			}
 		}
 		write_unlock(&auditsc_lock);
 		return -EFAULT;		/* No matching rule */
 	}
 The RCU version creates a copy, updates the copy, then replaces the old
 entry with the newly updated entry.  This sequence of actions, allowing
 concurrent reads while doing a copy to perform an update, is what gives
 RCU ("read-copy update") its name.  The RCU code is as follows::
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
 					 __u32 newaction,
 					 __u32 newfield_count)
 	{
 		struct audit_entry  *e;
 		struct audit_entry  *ne;
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
 				if (ne == NULL)
 					return -ENOMEM;
 				audit_copy_rule(&ne->rule, &e->rule);
 				ne->rule.action = newaction;
 				ne->rule.field_count = newfield_count;
 				list_replace_rcu(&e->list, &ne->list);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 Again, this assumes that the caller holds audit_netlink_sem.  Normally,
 the reader-writer lock would become a spinlock in this sort of code.
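 The copy, update, replace sequence can be tried outside the kernel with C11
 atomics. The sketch below is a user-space analogy under stated assumptions:
 the toy_* names are hypothetical, there is a single updater, and reclaiming
 the old version is left to the caller, which is the job call_rcu() does in
 the kernel.

```c
/* User-space sketch with C11 atomics; not the kernel's RCU API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_rule {
	unsigned int action;
	unsigned int field_count;
};

static _Atomic(struct toy_rule *) toy_current;

/* Reader: one acquire load, then only that snapshot is used. */
static struct toy_rule *toy_read(void)
{
	return atomic_load_explicit(&toy_current, memory_order_acquire);
}

/*
 * Updater (assumed to be the only one): copy the current version,
 * modify the copy, then publish it.  On success the old version is
 * handed back through oldp; it may be freed only once no reader can
 * still be using it.  Returns 0 on success, -1 on failure.
 */
static int toy_update(unsigned int action, unsigned int field_count,
		      struct toy_rule **oldp)
{
	struct toy_rule *old, *ne;

	old = atomic_load_explicit(&toy_current, memory_order_relaxed);
	if (!old)
		return -1;	/* nothing to update */

	ne = malloc(sizeof(*ne));
	if (!ne)
		return -1;

	*ne = *old;			/* copy */
	ne->action = action;		/* update the copy */
	ne->field_count = field_count;

	/* Release ordering makes the copy's contents visible before the pointer. */
	atomic_store_explicit(&toy_current, ne, memory_order_release);
	*oldp = old;
	return 0;
}
```

 A reader that loaded the pointer before the update keeps seeing the old,
 internally consistent values, and later readers see the new ones.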
 Example 3: Eliminating Stale Data
 ---------------------------------
 The auditing examples above tolerate stale data, as do most algorithms
 that are tracking external state.  Because there is a delay from the
 time the external state changes before Linux becomes aware of the change,
 additional RCU-induced staleness is normally not a problem.
 However, there are many examples where stale data cannot be tolerated.
 One example in the Linux kernel is the System V IPC (see the ipc_lock()
 function in ipc/util.c).  This code checks a "deleted" flag under a
 per-entry spinlock, and, if the "deleted" flag is set, pretends that the
 entry does not exist.  For this to be helpful, the search function must
 return holding the per-entry spinlock, as ipc_lock() does in fact do.
 Quick Quiz:
 	Why does the search function need to return holding the per-entry lock for
 	this deleted-flag technique to be helpful?
 :ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
 If the system-call audit module were to ever need to reject stale data,
 one way to accomplish this would be to add a "deleted" flag and a "lock"
 spinlock to the audit_entry structure, and modify audit_filter_task()
 as follows::
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		rcu_read_lock();
 		list_for_each_entry_rcu(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				spin_lock(&e->lock);
 				if (e->deleted) {
 					spin_unlock(&e->lock);
 					rcu_read_unlock();
 					return AUDIT_BUILD_CONTEXT;
 				}
 				rcu_read_unlock();
 				return state;
 			}
 		}
 		rcu_read_unlock();
 		return AUDIT_BUILD_CONTEXT;
 	}
 Note that this example assumes that entries are only added and deleted.
 Additional mechanism is required to deal correctly with the
 update-in-place performed by audit_upd_rule().  For one thing,
 audit_upd_rule() would need additional memory barriers to ensure
 that the list_add_rcu() was really executed before the list_del_rcu().
 The audit_del_rule() function would need to set the "deleted"
 flag under the spinlock as follows::
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		/* Do not need to use the _rcu iterator here, since this
 		 * is the only deletion routine. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				spin_lock(&e->lock);
 				list_del_rcu(&e->list);
 				e->deleted = 1;
 				spin_unlock(&e->lock);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 Summary
 -------
 Read-mostly list-based data structures that can tolerate stale data are
 the most amenable to use of RCU.  The simplest case is where entries are
 either added or deleted from the data structure (or atomically modified
 in place), but non-atomic in-place modifications can be handled by making
 a copy, updating the copy, then replacing the original with the copy.
 If stale data cannot be tolerated, then a "deleted" flag may be used
 in conjunction with a per-entry spinlock in order to allow the search
 function to reject newly deleted data.
 .. _answer_quick_quiz_list:
 Answer to Quick Quiz:
 	Why does the search function need to return holding the per-entry
 	lock for this deleted-flag technique to be helpful?
 	If the search function drops the per-entry lock before returning,
 	then the caller will be processing stale data in any case.  If it
 	is really OK to be processing stale data, then you don't need a
 	"deleted" flag.  If processing stale data really is a problem,
 	then you need to hold the per-entry lock across all of the code
 	that uses the value that was returned.
--- a/Documentation/RCU/listRCU.txt
+++ b/Documentation/RCU/listRCU.txt
@ -1,315 +0,0 @@
 Using RCU to Protect Read-Mostly Linked Lists
 One of the best applications of RCU is to protect read-mostly linked lists
 ("struct list_head" in list.h).  One big advantage of this approach
 is that all of the required memory barriers are included for you in
 the list macros.  This document describes several applications of RCU,
 with the best fits first.
 Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
 The best applications are cases where, if reader-writer locking were
 used, the read-side lock would be dropped before taking any action
 based on the results of the search.  The most celebrated example is
 the routing table.  Because the routing table is tracking the state of
 equipment outside of the computer, it will at times contain stale data.
 Therefore, once the route has been computed, there is no need to hold
 the routing table static during transmission of the packet.  After all,
 you can hold the routing table static all you want, but that won't keep
 the external Internet from changing, and it is the state of the external
 Internet that really matters.  In addition, routing entries are typically
 added or deleted, rather than being modified in place.
 A straightforward example of this use of RCU may be found in the
 system-call auditing support.  For example, a reader-writer locked
 implementation of audit_filter_task() might be as follows:
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		read_lock(&auditsc_lock);
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				read_unlock(&auditsc_lock);
 				return state;
 			}
 		}
 		read_unlock(&auditsc_lock);
 		return AUDIT_BUILD_CONTEXT;
 	}
 Here the list is searched under the lock, but the lock is dropped before
 the corresponding value is returned.  By the time that this value is acted
 on, the list may well have been modified.  This makes sense, since if
 you are turning auditing off, it is OK to audit a few extra system calls.
 This means that RCU can be easily applied to the read side, as follows:
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		rcu_read_lock();
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry_rcu(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				rcu_read_unlock();
 				return state;
 			}
 		}
 		rcu_read_unlock();
 		return AUDIT_BUILD_CONTEXT;
 	}
 The read_lock() and read_unlock() calls have become rcu_read_lock()
 and rcu_read_unlock(), respectively, and the list_for_each_entry() has
 become list_for_each_entry_rcu().  The _rcu() list-traversal primitives
 insert the read-side memory barriers that are required on DEC Alpha CPUs.
 The changes to the update side are also straightforward.  A reader-writer
 lock might be used as follows for deletion and insertion:
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		write_lock(&auditsc_lock);
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				list_del(&e->list);
 				write_unlock(&auditsc_lock);
 				return 0;
 			}
 		}
 		write_unlock(&auditsc_lock);
 		return -EFAULT;		/* No matching rule */
 	}
 	static inline int audit_add_rule(struct audit_entry *entry,
 					 struct list_head *list)
 	{
 		write_lock(&auditsc_lock);
 		if (entry->rule.flags & AUDIT_PREPEND) {
 			entry->rule.flags &= ~AUDIT_PREPEND;
 			list_add(&entry->list, list);
 		} else {
 			list_add_tail(&entry->list, list);
 		}
 		write_unlock(&auditsc_lock);
 		return 0;
 	}
 Following are the RCU equivalents for these two functions:
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		/* Do not use the _rcu iterator here, since this is the only
 		 * deletion routine. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				list_del_rcu(&e->list);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 	static inline int audit_add_rule(struct audit_entry *entry,
 					 struct list_head *list)
 	{
 		if (entry->rule.flags & AUDIT_PREPEND) {
 			entry->rule.flags &= ~AUDIT_PREPEND;
 			list_add_rcu(&entry->list, list);
 		} else {
 			list_add_tail_rcu(&entry->list, list);
 		}
 		return 0;
 	}
 Normally, the write_lock() and write_unlock() would be replaced by
 a spin_lock() and a spin_unlock(), but in this case, all callers hold
 audit_netlink_sem, so no additional locking is required.  The auditsc_lock
 can therefore be eliminated, since use of RCU eliminates the need for
 writers to exclude readers.
 The list_del(), list_add(), and list_add_tail() primitives have been
 replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
 The _rcu() list-manipulation primitives add memory barriers that are
 needed on weakly ordered CPUs (most of them!).  The list_del_rcu()
 primitive omits the pointer poisoning debug-assist code that would
 otherwise cause concurrent readers to fail spectacularly.
 So, when readers can tolerate stale data and when entries are either added
 or deleted, without in-place modification, it is very easy to use RCU!
 Example 2: Handling In-Place Updates
 The system-call auditing code does not update auditing rules in place.
 However, if it did, reader-writer-locked code to do so might look as
 follows (presumably, the field_count is only permitted to decrease,
 otherwise, the added fields would need to be filled in):
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
 					 __u32 newaction,
 					 __u32 newfield_count)
 	{
 		struct audit_entry  *e;
 		write_lock(&auditsc_lock);
 		/* Note: audit_netlink_sem held by caller. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				e->rule.action = newaction;
 				e->rule.field_count = newfield_count;
 				write_unlock(&auditsc_lock);
 				return 0;
 			}
 		}
 		write_unlock(&auditsc_lock);
 		return -EFAULT;		/* No matching rule */
 	}
 The RCU version creates a copy, updates the copy, then replaces the old
 entry with the newly updated entry.  This sequence of actions, allowing
 concurrent reads while doing a copy to perform an update, is what gives
 RCU ("read-copy update") its name.  The RCU code is as follows:
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
 					 __u32 newaction,
 					 __u32 newfield_count)
 	{
 		struct audit_entry  *e;
 		struct audit_entry  *ne;
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
 				if (ne == NULL)
 					return -ENOMEM;
 				audit_copy_rule(&ne->rule, &e->rule);
 				ne->rule.action = newaction;
 				ne->rule.field_count = newfield_count;
 				list_replace_rcu(&e->list, &ne->list);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 Again, this assumes that the caller holds audit_netlink_sem.  Normally,
 the reader-writer lock would become a spinlock in this sort of code.
 Example 3: Eliminating Stale Data
 The auditing examples above tolerate stale data, as do most algorithms
 that are tracking external state.  Because there is a delay from the
 time the external state changes before Linux becomes aware of the change,
 additional RCU-induced staleness is normally not a problem.
 However, there are many examples where stale data cannot be tolerated.
 One example in the Linux kernel is the System V IPC (see the ipc_lock()
 function in ipc/util.c).  This code checks a "deleted" flag under a
 per-entry spinlock, and, if the "deleted" flag is set, pretends that the
 entry does not exist.  For this to be helpful, the search function must
 return holding the per-entry spinlock, as ipc_lock() does in fact do.
 Quick Quiz:  Why does the search function need to return holding the
 	per-entry lock for this deleted-flag technique to be helpful?
 If the system-call audit module were to ever need to reject stale data,
 one way to accomplish this would be to add a "deleted" flag and a "lock"
 spinlock to the audit_entry structure, and modify audit_filter_task()
 as follows:
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
 		rcu_read_lock();
 		list_for_each_entry_rcu(e, &audit_tsklist, list) {
 			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 				spin_lock(&e->lock);
 				if (e->deleted) {
 					spin_unlock(&e->lock);
 					rcu_read_unlock();
 					return AUDIT_BUILD_CONTEXT;
 				}
 				rcu_read_unlock();
 				return state;
 			}
 		}
 		rcu_read_unlock();
 		return AUDIT_BUILD_CONTEXT;
 	}
 Note that this example assumes that entries are only added and deleted.
 Additional mechanism is required to deal correctly with the
 update-in-place performed by audit_upd_rule().  For one thing,
 audit_upd_rule() would need additional memory barriers to ensure
 that the list_add_rcu() was really executed before the list_del_rcu().
 The audit_del_rule() function would need to set the "deleted"
 flag under the spinlock as follows:
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
 	{
 		struct audit_entry  *e;
 		/* Do not need to use the _rcu iterator here, since this
 		 * is the only deletion routine. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				spin_lock(&e->lock);
 				list_del_rcu(&e->list);
 				e->deleted = 1;
 				spin_unlock(&e->lock);
 				call_rcu(&e->rcu, audit_free_rule);
 				return 0;
 			}
 		}
 		return -EFAULT;		/* No matching rule */
 	}
 Summary
 Read-mostly list-based data structures that can tolerate stale data are
 the most amenable to use of RCU.  The simplest case is where entries are
 either added or deleted from the data structure (or atomically modified
 in place), but non-atomic in-place modifications can be handled by making
 a copy, updating the copy, then replacing the original with the copy.
 If stale data cannot be tolerated, then a "deleted" flag may be used
 in conjunction with a per-entry spinlock in order to allow the search
 function to reject newly deleted data.
 Answer to Quick Quiz
 	Why does the search function need to return holding the per-entry
 	lock for this deleted-flag technique to be helpful?
 	If the search function drops the per-entry lock before returning,
 	then the caller will be processing stale data in any case.  If it
 	is really OK to be processing stale data, then you don't need a
 	"deleted" flag.  If processing stale data really is a problem,
 	then you need to hold the per-entry lock across all of the code
 	that uses the value that was returned.
--- a/Documentation/RCU/rcu.rst
+++ b/Documentation/RCU/rcu.rst
@ -0,0 +1,92 @@
 .. _rcu_doc:
 RCU Concepts
 ============
 The basic idea behind RCU (read-copy update) is to split destructive
 operations into two parts, one that prevents anyone from seeing the data
 item being destroyed, and one that actually carries out the destruction.
 A "grace period" must elapse between the two parts, and this grace period
 must be long enough that any readers accessing the item being deleted have
 since dropped their references.  For example, an RCU-protected deletion
 from a linked list would first remove the item from the list, wait for
 a grace period to elapse, then free the element.  See the
 Documentation/RCU/listRCU.rst file for more information on using RCU with
 linked lists.
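The two-part split described above can be illustrated with a toy, single-threaded model. It is a sketch under stated assumptions, not kernel code: struct node and the list_insert(), list_unlink(), list_contains() and reclaim_after_grace_period() helpers are hypothetical, there are no concurrent readers, and the grace period is represented only by the caller choosing when to invoke the reclaim step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int key;
	struct node *next;	/* list linkage that readers follow */
	struct node *next_free;	/* linkage on the deferred-free queue */
};

static struct node *head;	/* the list visible to readers */
static struct node *pending;	/* unlinked nodes awaiting a grace period */
static int freed_count;		/* number of nodes actually freed */

static int list_insert(int key)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return -1;
	n->key = key;
	n->next = head;
	n->next_free = NULL;
	head = n;
	return 0;
}

static int list_contains(int key)
{
	for (struct node *n = head; n != NULL; n = n->next)
		if (n->key == key)
			return 1;
	return 0;
}

/* Part 1: hide the node from new readers, but do not free it yet. */
static int list_unlink(int key)
{
	for (struct node **pp = &head; *pp != NULL; pp = &(*pp)->next) {
		struct node *n = *pp;

		if (n->key == key) {
			*pp = n->next;	/* n->next stays valid for old readers */
			n->next_free = pending;
			pending = n;
			return 0;
		}
	}
	return -1;
}

/* Part 2: call only after every pre-existing reader has finished. */
static void reclaim_after_grace_period(void)
{
	while (pending != NULL) {
		struct node *n = pending;

		pending = n->next_free;
		free(n);
		freed_count++;
	}
	assert(pending == NULL);
}
```

Between list_unlink() and reclaim_after_grace_period() the node is unreachable from the list head yet still allocated, which is the window that a real grace period covers.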
 Frequently Asked Questions
 --------------------------
 - Why would anyone want to use RCU?
  The advantage of RCU's two-part approach is that RCU readers need
  not acquire any locks, perform any atomic instructions, write to
  shared memory, or (on CPUs other than Alpha) execute any memory
  barriers.  The fact that these operations are quite expensive
  on modern CPUs is what gives RCU its performance advantages
  in read-mostly situations.  The fact that RCU readers need not
  acquire locks can also greatly simplify deadlock-avoidance code.
 - How can the updater tell when a grace period has completed
  if the RCU readers give no indication when they are done?
  Just as with spinlocks, RCU readers are not permitted to
  block, switch to user-mode execution, or enter the idle loop.
  Therefore, as soon as a CPU is seen passing through any of these
  three states, we know that that CPU has exited any previous RCU
  read-side critical sections.  So, if we remove an item from a
  linked list, and then wait until all CPUs have switched context,
  executed in user mode, or executed in the idle loop, we can
  safely free up that item.
  Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
  same effect, but require that the readers manipulate CPU-local
  counters.  These counters allow limited types of blocking within
  RCU read-side critical sections.  SRCU also uses CPU-local
  counters, and permits general blocking within RCU read-side
  critical sections.  These variants of RCU detect grace periods
  by sampling these counters.
 - If I am running on a uniprocessor kernel, which can only do one
  thing at a time, why should I wait for a grace period?
  See the Documentation/RCU/UP.rst file for more information.
 - How can I see where RCU is currently used in the Linux kernel?
  Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
  "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
  "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
  "synchronize_srcu", and the other RCU primitives.  Or grab one
  of the cscope databases from:
  (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
 - What guidelines should I follow when writing code that uses RCU?
  See the checklist.txt file in this directory.
 - Why the name "RCU"?
  "RCU" stands for "read-copy update".  The file Documentation/RCU/listRCU.rst
  has more information on where this name came from, search for
  "read-copy update" to find it.
 - I hear that RCU is patented?  What is with that?
  Yes, it is.  There are several known patents related to RCU,
  search for the string "Patent" in RTFP.txt to find them.
  Of these, one was allowed to lapse by the assignee, and the
  others have been contributed to the Linux kernel under GPL.
  There are now also LGPL implementations of user-level RCU
  available (http://liburcu.org/).
 - I hear that RCU needs work in order to support realtime kernels?
  Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
  kernel configuration parameter.
 - Where can I find more information on RCU?
  See the RTFP.txt file in this directory.
  Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
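The FAQ answer on detecting grace periods (waiting until every CPU has been seen in a quiescent state) can be modelled with a simple bitmask. This is a deliberately simplified sketch: struct gp_tracker, gp_start(), gp_note_quiescent() and gp_done() are hypothetical names, the model supports at most 32 CPUs, and the kernel's actual implementation is considerably more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i set: CPU i has not yet passed through a quiescent state. */
struct gp_tracker {
	uint32_t waiting;
};

/* Begin a grace period that must wait for CPUs 0..ncpus-1. */
static void gp_start(struct gp_tracker *gp, unsigned int ncpus)
{
	assert(ncpus >= 1 && ncpus <= 32);
	gp->waiting = (ncpus == 32) ? UINT32_MAX
				    : ((UINT32_C(1) << ncpus) - 1);
}

/*
 * Record that the given CPU was seen in a quiescent state (context
 * switch, user-mode execution, or the idle loop).
 */
static void gp_note_quiescent(struct gp_tracker *gp, unsigned int cpu)
{
	assert(cpu < 32);
	gp->waiting &= ~(UINT32_C(1) << cpu);
}

/* The grace period is over once no CPU remains in the waiting set. */
static bool gp_done(const struct gp_tracker *gp)
{
	return gp->waiting == 0;
}
```

An updater in this model would unlink an item, call gp_start(), and free the item only once gp_done() returns true.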
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@ -1,89 +0,0 @@
 RCU Concepts
 The basic idea behind RCU (read-copy update) is to split destructive
 operations into two parts, one that prevents anyone from seeing the data
 item being destroyed, and one that actually carries out the destruction.
 A "grace period" must elapse between the two parts, and this grace period
 must be long enough that any readers accessing the item being deleted have
 since dropped their references.  For example, an RCU-protected deletion
 from a linked list would first remove the item from the list, wait for
 a grace period to elapse, then free the element.  See the listRCU.txt
 file for more information on using RCU with linked lists.
 Frequently Asked Questions
 o	Why would anyone want to use RCU?
 	The advantage of RCU's two-part approach is that RCU readers need
 	not acquire any locks, perform any atomic instructions, write to
 	shared memory, or (on CPUs other than Alpha) execute any memory
 	barriers.  The fact that these operations are quite expensive
 	on modern CPUs is what gives RCU its performance advantages
 	in read-mostly situations.  The fact that RCU readers need not
 	acquire locks can also greatly simplify deadlock-avoidance code.
 o	How can the updater tell when a grace period has completed
 	if the RCU readers give no indication when they are done?
 	Just as with spinlocks, RCU readers are not permitted to
 	block, switch to user-mode execution, or enter the idle loop.
 	Therefore, as soon as a CPU is seen passing through any of these
 	three states, we know that that CPU has exited any previous RCU
 	read-side critical sections.  So, if we remove an item from a
 	linked list, and then wait until all CPUs have switched context,
 	executed in user mode, or executed in the idle loop, we can
 	safely free up that item.
 	Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
 	same effect, but require that the readers manipulate CPU-local
 	counters.  These counters allow limited types of blocking within
 	RCU read-side critical sections.  SRCU also uses CPU-local
 	counters, and permits general blocking within RCU read-side
 	critical sections.  These variants of RCU detect grace periods
 	by sampling these counters.
 o	If I am running on a uniprocessor kernel, which can only do one
 	thing at a time, why should I wait for a grace period?
 	See the UP.txt file in this directory.
 o	How can I see where RCU is currently used in the Linux kernel?
 	Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
 	"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
 	"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
 	"synchronize_srcu", and the other RCU primitives.  Or grab one
 	of the cscope databases from:
 	http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html
 o	What guidelines should I follow when writing code that uses RCU?
 	See the checklist.txt file in this directory.
 o	Why the name "RCU"?
 	"RCU" stands for "read-copy update".  The file listRCU.txt has
 	more information on where this name came from, search for
 	"read-copy update" to find it.
 o	I hear that RCU is patented?  What is with that?
 	Yes, it is.  There are several known patents related to RCU,
 	search for the string "Patent" in RTFP.txt to find them.
 	Of these, one was allowed to lapse by the assignee, and the
 	others have been contributed to the Linux kernel under GPL.
 	There are now also LGPL implementations of user-level RCU
 	available (http://liburcu.org/).
 o	I hear that RCU needs work in order to support realtime kernels?
 	Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
 	kernel configuration parameter.
 o	Where can I find more information on RCU?
 	See the RTFP.txt file in this directory.
 	Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.txt
@ -12,6 +12,7 @@ please read on.
 Reference counting on elements of lists which are protected by traditional
 reader/writer spinlocks or semaphores is straightforward:
 CODE LISTING A:
 1.				2.
 add()				search_and_reference()
 {				{
@ -28,7 +29,8 @@ add()				search_and_reference()
 release_referenced()			delete()
 {					{
    ...					    write_lock(&list_lock);
-    atomic_dec(&el->rc, relfunc)	    ...
+    if(atomic_dec_and_test(&el->rc))	    ...
 	kfree(el);
    ...					    remove_element
 }					    write_unlock(&list_lock);
 					    ...
@ -44,6 +46,7 @@ search_and_reference() could potentially hold reference to an element which
 has already been deleted from the list/array.  Use atomic_inc_not_zero()
 in this scenario as follows:
 CODE LISTING B:
 1.					2.
 add()					search_and_reference()
 {					{
@ -79,6 +82,7 @@ search_and_reference() code path.  In such cases, the
 atomic_dec_and_test() may be moved from delete() to el_free()
 as follows:
 CODE LISTING C:
 1.					2.
 add()					search_and_reference()
 {					{
@ -114,6 +118,17 @@ element can therefore safely be freed.  This in turn guarantees that if
 any reader finds the element, that reader may safely acquire a reference
 without checking the value of the reference counter.
 A clear advantage of the RCU-based pattern in listing C over the one
 in listing B is that any call to search_and_reference() that locates
 a given object will succeed in obtaining a reference to that object,
 even given a concurrent invocation of delete() for that same object.
 Similarly, a clear advantage of both listings B and C over listing A is
 that a call to delete() is not delayed even if there are an arbitrarily
 large number of calls to search_and_reference() searching for the same
 object that delete() was invoked on.  Instead, all that is delayed is
 the eventual invocation of kfree(), which is usually not a problem on
 modern computer systems, even the small ones.
 In cases where delete() can sleep, synchronize_rcu() can be called from
 delete(), so that el_free() can be subsumed into delete as follows:
@ -130,3 +145,7 @@ delete()
    	kfree(el);
    ...
 }
 As additional examples in the kernel, the pattern in listing C is used by
 reference counting of struct pid, while the pattern in listing B is used by
 struct posix_acl.
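The check that listing B relies on, taking a reference only if the count has not already reached zero, can be sketched in user space with C11 atomics. This is an analogue for illustration only: ref_get_unless_zero() and ref_put() are hypothetical names, and the code is not the kernel's implementation of atomic_inc_not_zero() or atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Analogue of atomic_inc_not_zero(): increment the count and return
 * true, unless the count is already zero, in which case the object is
 * on its way to being freed and no reference may be taken.
 */
static bool ref_get_unless_zero(atomic_int *rc)
{
	int old = atomic_load(rc);

	while (old != 0) {
		/* On failure, old is updated to the current value. */
		if (atomic_compare_exchange_weak(rc, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Analogue of atomic_dec_and_test(): drop one reference and return
 * true if the caller dropped the last one and must free the object.
 */
static bool ref_put(atomic_int *rc)
{
	int prev = atomic_fetch_sub(rc, 1);

	assert(prev > 0);
	return prev == 1;
}
```

A search_and_reference() built on this sketch would call ref_get_unless_zero() inside the read-side critical section and treat a false return as "element not found".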