linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
James Morse	ee2eb3d4ee	ACPI / APEI: Generalise the estatus queue's notify code Refactor the estatus queue's pool notification routine from NOTIFY_NMI's handlers. This will allow another notification method to use the estatus queue without duplicating this code. Add rcu_read_lock()/rcu_read_unlock() around the list list_for_each_entry_rcu() walker. These aren't strictly necessary as the whole nmi_enter/nmi_exit() window is a spooky RCU read-side critical section. in_nmi_queue_one_entry() is separate from the rcu-list walker for a later caller that doesn't need to walk a list. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> [ rjw: Drop unnecessary err variable in two places ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	5cc6c68287	ACPI / APEI: Don't update struct ghes' flags in read/clear estatus ghes_read_estatus() sets a flag in struct ghes if the buffer of CPER records needs to be cleared once the records have been processed. This flag value is a problem if a struct ghes can be processed concurrently, as happens at probe time if an NMI arrives for the same error source. The NMI clears the flag, meaning the interrupted handler may never do the ghes_estatus_clear() work. The GHES_TO_CLEAR flags is only set at the same time as buffer_paddr, which is now owned by the caller and passed to ghes_clear_estatus(). Use this value as the flag. A non-zero buf_paddr returned by ghes_read_estatus() means ghes_clear_estatus() should clear this address. ghes_read_estatus() already checks for a read of error_status_address being zero, so CPER records cannot be written here. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	7d49f2c75a	ACPI / APEI: Remove spurious GHES_TO_CLEAR check ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going on to __process_error(). This is pointless as ghes_read_estatus() will always set this flag if it returns success, which was checked earlier in the loop. Remove it. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	eeb2555779	ACPI / APEI: Don't store CPER records physical address in struct ghes When CPER records are found the address of the records is stashed in the struct ghes. Once the records have been processed, this address is overwritten with zero so that it won't be processed again without being re-populated by firmware. This goes wrong if a struct ghes can be processed concurrently, as can happen at probe time when an NMI occurs. If the NMI arrives on another CPU, the probing CPU may call ghes_clear_estatus() on the records before the handler had finished with them. Even on the same CPU, once the interrupted handler is resumed, it will call ghes_clear_estatus() on the NMIs records, this memory may have already been re-used by firmware. Avoid this stashing by letting the caller hold the address. A later patch will do away with the use of ghes->flags in the read/clear code too. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	fb7be08f1a	ACPI / APEI: Make estatus pool allocation a static size Adding new NMI-like notifications duplicates the calls that grow and shrink the estatus pool. This is all pretty pointless, as the size is capped to 64K. Allocate this for each ghes and drop the code that grows and shrinks the pool. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	e147133a42	ACPI / APEI: Make hest.c manage the estatus memory pool ghes.c has a memory pool it uses for the estatus cache and the estatus queue. The cache is initialised when registering the platform driver. For the queue, an NMI-like notification has to grow/shrink the pool as it is registered and unregistered. This is all pretty noisy when adding new NMI-like notifications, it would be better to replace this with a static pool size based on the number of users. As a precursor, move the call that creates the pool from ghes_init(), into hest.c. Later this will take the number of ghes entries and consolidate the queue allocations. Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put this. The pool is now initialised as part of ACPI's subsys_initcall(): (acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init()) Before this patch it happened later as a GHES specific device_initcall(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:45 +01:00
James Morse	0ac234be1a	ACPI / APEI: Switch estatus pool to use vmalloc memory The ghes code is careful to parse and round firmware's advertised memory requirements for CPER records, up to a maximum of 64K. However when ghes_estatus_pool_expand() does its work, it splits the requested size into PAGE_SIZE granules. This means if firmware generates 5K of CPER records, and correctly describes this in the table, __process_error() will silently fail as it is unable to allocate more than PAGE_SIZE. Switch the estatus pool to vmalloc() memory. On x86 vmalloc() memory may fault and be fixed up by vmalloc_fault(). To prevent this call vmalloc_sync_all() before an NMI handler could discover the memory. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:44 +01:00
James Morse	93066e9aef	ACPI / APEI: Remove silent flag from ghes_read_estatus() Subsequent patches will split up ghes_read_estatus(), at which point passing around the 'silent' flag gets annoying. This is to suppress prink() messages, which prior to commit `42a0bb3f71` ("printk/nmi: generic solution for safe printk in NMI"), were unsafe in NMI context. This is no longer necessary, remove the flag. printk() messages are batched in a per-cpu buffer and printed via irq-work, or a call back from panic(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:44 +01:00
James Morse	78b0b690f6	ACPI / APEI: Don't wait to serialise with oops messages when panic()ing oops_begin() exists to group printk() messages with the oops message printed by die(). To reach this caller we know that platform firmware took this error first, then notified the OS via NMI with a 'panic' severity. Don't wait for another CPU to release the die-lock before panic()ing, our only goal is to print this fatal error and panic(). This code is always called in_nmi(), and since commit `42a0bb3f71` ("printk/nmi: generic solution for safe printk in NMI"), it has been safe to call printk() from this context. Messages are batched in a per-cpu buffer and printed via irq-work, or a call back from panic(). Link: https://patchwork.kernel.org/patch/10313555/ Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-02-07 23:10:44 +01:00
Lenny Szubowicz	98cff8b23e	ACPI/APEI: Clear GHES block_status before panic() In __ghes_panic() clear the block status in the APEI generic error status block for that generic hardware error source before calling panic() to prevent a second panic() in the crash kernel for exactly the same fatal error. Otherwise ghes_probe(), running in the crash kernel, would see an unhandled error in the APEI generic error status block and panic again, thereby precluding any crash dump. Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com> Signed-off-by: David Arcari <darcari@redhat.com> Tested-by: Tyler Baicar <baicar.tyler@gmail.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-12-20 21:35:45 +01:00
Alexandru Gagniuc	305d0e006a	EDAC, ghes: Remove unused argument to ghes_edac_report_mem_error() The use of the @ghes argument was removed in a previous commit, but function signature was not updated to reflect this. Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2018-05-12 10:59:15 +02:00
Borislav Petkov	cc7f3f1326	ghes, EDAC: Fix ghes_edac registration Tony reported seeing "Internal error: Can't find EDAC structure" when injecting correctable errors due to the fact that ghes_edac would still load even if the whitelist won't hit. Drop the pr_err() in ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac depends on ghes.c. While at it, make ghes_edac_register() return an error if it doesn't hit in the whitelist as it is the only sensible thing to do in that situation. Furthermore, move the call to it to happen last in ghes_probe() so that GHES initializing properly does not depend on ghes_edac init at all as latter is only reporting errors and not required for GHES's proper functioning. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Tested-by: Sughosh Ganu <sughosh.ganu@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk	2018-05-02 13:57:30 +02:00
Linus Torvalds	d4173023e6	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo cleanups from Eric Biederman: "Long ago when 2.4 was just a testing release copy_siginfo_to_user was made to copy individual fields to userspace, possibly for efficiency and to ensure initialized values were not copied to userspace. Unfortunately the design was complex, it's assumptions unstated, and humans are fallible and so while it worked much of the time that design failed to ensure unitialized memory is not copied to userspace. This set of changes is part of a new design to clean up siginfo and simplify things, and hopefully make the siginfo handling robust enough that a simple inspection of the code can be made to ensure we don't copy any unitializied fields to userspace. The design is to unify struct siginfo and struct compat_siginfo into a single definition that is shared between all architectures so that anyone adding to the set of information shared with struct siginfo can see the whole picture. Hopefully ensuring all future si_code assignments are arch independent. The design is to unify copy_siginfo_to_user32 and copy_siginfo_from_user32 so that those function are complete and cope with all of the different cases documented in signinfo_layout. I don't think there was a single implementation of either of those functions that was complete and correct before my changes unified them. The design is to introduce a series of helpers including force_siginfo_fault that take the values that are needed in struct siginfo and build the siginfo structure for their callers. Ensuring struct siginfo is built correctly. The remaining work for 4.17 (unless someone thinks it is post -rc1 material) is to push usage of those helpers down into the architectures so that architecture specific code will not need to deal with the fiddly work of intializing struct siginfo, and then when struct siginfo is guaranteed to be fully initialized change copy siginfo_to_user into a simple wrapper around copy_to_user. Further there is work in progress on the issues that have been documented requires arch specific knowledge to sort out. The changes below fix or at least document all of the issues that have been found with siginfo generation. Then proceed to unify struct siginfo the 32 bit helpers that copy siginfo to and from userspace, and generally clean up anything that is not arch specific with regards to siginfo generation. It is a lot but with the unification you can of siginfo you can already see the code reduction in the kernel" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits) signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr mm/memory_failure: Remove unused trapno from memory_failure signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal: Helpers for faults with specialized siginfo layouts signal: Add send_sig_fault and force_sig_fault signal: Replace memset(info,...) with clear_siginfo for clarity signal: Don't use structure initializers for struct siginfo signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered ptrace: Use copy_siginfo in setsiginfo and getsiginfo signal: Unify and correct copy_siginfo_to_user32 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 signal: Unify and correct copy_siginfo_from_user32 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h signal/powerpc: Remove redefinition of NSIGTRAP on powerpc signal: Move addr_lsb into the _sigfault union for clarity ...	2018-01-30 14:18:52 -08:00
Eric W. Biederman	83b57531c5	mm/memory_failure: Remove unused trapno from memory_failure Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc, powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO (alpha, metag, sparc, and tile). These two sets of architectures do not interesect so remove the trapno paramater to remove confusion. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-23 12:17:42 -06:00
Colin Ian King	24bc8f03be	ACPI / APEI: remove redundant variables len and node_len Variables len and node_len are redundant and can be removed. Cleans up clang warning: node_len = GHES_ESTATUS_NODE_LEN(len); Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-12-05 02:11:50 +01:00
Tyler Baicar	9852ce9ae2	ACPI: APEI: call into AER handling regardless of severity Currently the GHES code only calls into the AER driver for recoverable type errors. This is incorrect because errors of other severities do not get logged by the AER driver and do not get exposed to user space via the AER trace event. So, call into the AER driver for PCIe errors regardless of the severity Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-12-05 02:10:13 +01:00
Tyler Baicar	3c5b977f06	ACPI: APEI: handle PCIe AER errors in separate function Move PCIe AER error handling code into a separate function. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-12-05 02:10:13 +01:00
Linus Torvalds	04ed510988	ACPI updates for v4.15-rc1 - Update the ACPICA code to upstream revision 20170831 including * PDTT table header support (Bob Moore). * Cleanup and extension of internal string-to-integer conversion functions (Bob Moore). * Support for 64-bit hardware accesses (Lv Zheng). * ACPI PM Timer code adjustment to deal with 64-bit return values of acpi_hw_read() (Bob Moore). * Support for deferred table verification in acpiexec (Lv Zheng). - Fix APEI to use the fixmap instead of ioremap_page_range() which cannot work correctly the way the code in there attempted to use it and drop some code that's not necessary any more after that change (James Morse). - Clean up the APEI support code and make it use 64-bit timestamps (Arnd Bergmann, Dongjiu Geng, Jan Beulich). - Add operation region driver for TI PMIC TPS68470 (Rajmohan Mani). - Add support for PCC subspace IDs to the ACPI CPPC driver (George Cherian). - Fix an ACPI EC driver regression related to the handling of EC events during the "noirq" phases of system suspend/resume (Lv Zheng). - Delay the initialization of the lid state in the ACPI button driver to fix issues appearing on some systems (Hans de Goede). - Extend the KIOX000A "device always present" quirk to cover all affected BIOS versions (Hans de Goede). - Clean up some code in the ACPI core and drivers (Colin Ian King, Gustavo Silva). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJaCg33AAoJEILEb/54YlRxTe0P/jEFsSXCmAussc0DoqcXuep/ GEzsMHZLBU59oVTVqiji19vkVEiJldANmnFniMTr3sJ52QSgLQH4Wtv5QGzTCmUq C3VzfSye5QS726f/Fk4tgZIFy5WL3EzweEbPmrcsFQvShU/vNHzvGUNcnPy9IWXE O+kISx8YTB6z4laa9cJLjTMEuDgRUpyubb9dZBBvXC7RIuHstk8+GyLvvPImKGBL sk5PNChP0WGLLSG7BayOUG3/7Q2RaFpbgjCos2dounPAJW5TXmMJUsZ0gvtXy0Z8 ZoPmqgPlYYlHVBlpy7oO4WGFLYJ+KZ+w27aEN1n0C3n9BU9AqWBKw8nkxfpCgPxy 3p2dwuh1igHsCAEVaaGjw02bewszIdl68q3ZfC7xujE401SG+Py7VCnTAbkffC0M nXP8RlGg4V3blwvNM47g3Hh8VG7vJgsW2fvBdSQa/Za7ML8aqxkvtk1BzhDCN19X tIqn9RMLWoPSnrEdqSi4HK88iHRvagPJncemFyDQl4LE+V5rWBCUqNumLLjL2i4L uBxqlK3tBVWKM0iKmISDHpjUZHaqM3g/Lmyo3aExWTog06OB81hMG3b57RrbWj9t 1PIbQOtAazhqM4Scdg1mWTaRNR1p40V9RyA6YvqTIbjDRDPkxEfIUECvRBFwgWkd JtFkKwR65EFkH8bGNXQi =7mrU -----END PGP SIGNATURE----- Merge tag 'acpi-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update ACPICA to upstream revision 20170831, fix APEI to use the fixmap instead of ioremap_page_range(), add an operation region driver for TI PMIC TPS68470, add support for PCC subspace IDs to the ACPI CPPC driver, fix a few assorted issues and clean up some code. Specifics: - Update the ACPICA code to upstream revision 20170831 including * PDTT table header support (Bob Moore). * Cleanup and extension of internal string-to-integer conversion functions (Bob Moore). * Support for 64-bit hardware accesses (Lv Zheng). * ACPI PM Timer code adjustment to deal with 64-bit return values of acpi_hw_read() (Bob Moore). * Support for deferred table verification in acpiexec (Lv Zheng). - Fix APEI to use the fixmap instead of ioremap_page_range() which cannot work correctly the way the code in there attempted to use it and drop some code that's not necessary any more after that change (James Morse). - Clean up the APEI support code and make it use 64-bit timestamps (Arnd Bergmann, Dongjiu Geng, Jan Beulich). - Add operation region driver for TI PMIC TPS68470 (Rajmohan Mani). - Add support for PCC subspace IDs to the ACPI CPPC driver (George Cherian). - Fix an ACPI EC driver regression related to the handling of EC events during the "noirq" phases of system suspend/resume (Lv Zheng). - Delay the initialization of the lid state in the ACPI button driver to fix issues appearing on some systems (Hans de Goede). - Extend the KIOX000A "device always present" quirk to cover all affected BIOS versions (Hans de Goede). - Clean up some code in the ACPI core and drivers (Colin Ian King, Gustavo Silva)" * tag 'acpi-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) ACPI: Mark expected switch fall-throughs ACPI / LPSS: Remove redundant initialization of clk ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs mailbox: PCC: Move the MAX_PCC_SUBSPACES definition to header file ACPI / sysfs: Make function param_set_trace_method_name() static ACPI / button: Delay acpi_lid_initialize_state() until first user space open ACPI / EC: Fix regression related to triggering source of EC event handling APEI / ERST: use 64-bit timestamps ACPI / APEI: Remove arch_apei_flush_tlb_one() arm64: mm: Remove arch_apei_flush_tlb_one() ACPI / APEI: Remove ghes_ioremap_area ACPI / APEI: Replace ioremap_page_range() with fixmap ACPI / APEI: remove the unused dead-code for SEA/NMI notification type ACPI / x86: Extend KIOX000A quirk to cover all affected BIOS versions ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq() ACPICA: Update version to 20170831 ACPICA: Update acpi_get_timer for 64-bit interface to acpi_hw_read ACPICA: String conversions: Update to add new behaviors ACPICA: String conversions: Cleanup/format comments. No functional changes ACPICA: Restructure/cleanup all string-to-integer conversion functions ...	2017-11-13 20:08:22 -08:00
James Morse	520e18a508	ACPI / APEI: Remove ghes_ioremap_area Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>	2017-11-07 12:13:21 +01:00
James Morse	4f89fa286f	ACPI / APEI: Replace ioremap_page_range() with fixmap Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Tested-by: Toshi Kani <toshi.kani@hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon@arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>	2017-11-07 12:12:44 +01:00
Kees Cook	d5272003b8	ACPI / APEI: Convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tyler Baicar <tbaicar@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org> Cc: Shiju Jose <shiju.jose@huawei.com> Cc: linux-acpi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Tyler Baicar <tbaicar@codeaurora.org>	2017-11-02 15:50:14 -07:00
Dongjiu Geng	c49870e89f	ACPI / APEI: remove the unused dead-code for SEA/NMI notification type For the SEA notification, the two functions ghes_sea_add() and ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA is defined. If not, it will return errors in the ghes_probe() and not continue. If the probe is failed, the ghes_sea_remove() also has no chance to be called. Hence, remove the unnecessary handling when CONFIG_ACPI_APEI_SEA is not defined. For the NMI notification, it has the same issue as SEA notification, so also remove the unused dead-code for it. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-10-23 13:05:21 +02:00
Jan Beulich	095f613c6b	ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq() Match up with what `7edda0886b` ("acpi: apei: handle SEA notification type for ARMv8") did for ghes_ioremap_pfn_nmi(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-10-11 01:44:42 +02:00
Tyler Baicar	aaf2c2fb0f	ACPI / APEI: clear error status before acknowledging the error Currently we acknowledge errors before clearing the error status. This could cause a new error to be populated by firmware in-between the error acknowledgment and the error status clearing which would cause the second error's status to be cleared without being handled. So, clear the error status before acknowledging the errors. Also, make sure to acknowledge the error if the error status read fails. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-09-27 23:13:06 +02:00
Punit Agrawal	e931d0dab4	ACPI / APEI: Suppress message if HEST not present According to the ACPI specification, firmware is not required to provide the Hardware Error Source Table (HEST). When HEST is not present, the following superfluous message is printed to the kernel boot log - [ 3.460067] GHES: HEST is not enabled! Extend hest_disable variable to track whether the firmware provides this table and if it is not present skip any log output. The existing behaviour is preserved in all other cases. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-30 03:11:21 +02:00
Loc Ho	bdb9458a33	ACPI: APEI: Enable APEI multiple GHES source to share a single external IRQ X-Gene platforms describe multiple GHES error sources with the same hardware error notification type (external interrupt) and interrupt number. Change the GHES interrupt request to support sharing the same IRQ. This change includs contributions from Tuan Phan <tphan@apm.com>. Signed-off-by: Loc Ho <lho@apm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-31 13:09:39 +02:00
Linus Torvalds	55a7b2125c	arm64 updates for 4.13: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJZWiuVAAoJELescNyEwWM0g/gIAIRpVEzjE61zfm/KCsVuIu4O p6F/HrvF/ApvlFcth8LDpTDYUholzT1e9wmx/O0Ll37UvFUrReT03R5MMJ02WU8s hRg0N4izdg2BPa9zuaP/XE5i6WmFfRAwFsv6PzX77FjNGk0M4zhW8acNpWHYMBQT DwXT/xCvg6045Sj6CuwfcIqqVHrz6/kpBmvdbW7G3/WpIHpUGIWM9EO3mkuLGMj0 j0VSCxfAVJvWwmKEBdFExLNjqxvSlVAMOIEAw7yBNLjuheiL+afK+Y1BggB00oe8 14+6viOgW6L97VmPpYVn0YDseqeGg5DqlNF3NqjTqdmzWH/ApAvL4WXN7SL2jbU= =RNzb -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ...	2017-07-05 17:09:27 -07:00
Linus Torvalds	4422d80ed7	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Thomas Gleixner: "The RAS updates for the 4.13 merge window: - Cleanup of the MCE injection facility (Borsilav Petkov) - Rework of the AMD/SMCA handling (Yazen Ghannam) - Enhancements for ACPI/APEI to handle new notitication types (Shiju Jose) - atomic_t to refcount_t conversion (Elena Reshetova) - A few fixes and enhancements all over the place" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS/CEC: Check the correct variable in the debugfs error handling x86/mce: Always save severity in machine_check_poll() x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise x86/mce: Update bootlog description to reflect behavior on AMD x86/mce: Don't disable MCA banks when offlining a CPU on AMD x86/mce/mce-inject: Preset the MCE injection struct x86/mce: Clean up include files x86/mce: Get rid of register_mce_write_callback() x86/mce: Merge mce_amd_inj into mce-inject x86/mce/AMD: Use saved threshold block info in interrupt handler x86/mce/AMD: Use msr_stat when clearing MCA_STATUS x86/mce/AMD: Carve out SMCA bank configuration x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t RAS: Make local function parse_ras_param() static ACPI/APEI: Handle GSIV and GPIO notification types	2017-07-03 18:33:03 -07:00
Tyler Baicar	77b246b32b	acpi: apei: check for pending errors when probing GHES entries Check for pending errors when probing GHES entries. It is possible that a fatal error is already pending at this point, so we should handle it as soon as the driver is probed. This also avoids a potential issue if there was an interrupt that was already cleared for an error since the GHES driver wasn't present. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 20:19:59 +01:00
Tyler Baicar	621f48e40e	arm/arm64: KVM: add guest SEA support Currently external aborts are unsupported by the guest abort handling. Add handling for SEAs so that the host kernel reports SEAs which occur in the guest kernel. When an SEA occurs in the guest kernel, the guest exits and is routed to kvm_handle_guest_abort(). Prior to this patch, a print message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 18:22:05 +01:00
Tyler Baicar	e9279e83ad	trace, ras: add ARM processor error trace event Currently there are trace events for the various RAS errors with the exception of ARM processor type errors. Add a new trace event for such errors so that the user will know when they occur. These trace events are consistent with the ARM processor error section type defined in UEFI 2.6 spec section N.2.4.4. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 18:22:05 +01:00
Tyler Baicar	297b64c743	ras: acpi / apei: generate trace event for unrecognized CPER section The UEFI spec includes non-standard section type support in the Common Platform Error Record. This is defined in section N.2.3 of UEFI version 2.5. Currently if the CPER section's type (UUID) does not match any section type that the kernel knows how to parse, a trace event is not generated. Generate a trace event which contains the raw error data for non-standard section type error records. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Tested-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 18:22:04 +01:00
Jonathan (Zhixiong) Zhang	2fb5853e44	acpi: apei: panic OS with fatal error status block Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 18:22:04 +01:00
Tyler Baicar	7edda0886b	acpi: apei: handle SEA notification type for ARMv8 ARM APEI extension proposal added SEA (Synchronous External Abort) notification type for ARMv8. Add a new GHES error source handling function for SEA. If an error source's notification type is SEA, then this function can be registered into the SEA exception handler. That way GHES will parse and report SEA exceptions when they occur. An SEA can interrupt code that had interrupts masked and is treated as an NMI. To aid this the page of address space for mapping APEI buffers while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods to find the prot_t to map with in the same way as ghes_ioremap_pfn_irq(). Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 18:22:03 +01:00
Tyler Baicar	bbcc2e7b64	ras: acpi/apei: cper: add support for generic data v3 structure The ACPI 6.1 spec adds a new revision of the generic error data entry structure. Add support to handle the new structure as well as properly verify and iterate through the generic data entries. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 15:43:47 +01:00
Tyler Baicar	42aa560446	acpi: apei: read ack upon ghes record consumption A RAS (Reliability, Availability, Serviceability) controller may be a separate processor running in parallel with OS execution, and may generate error records for consumption by the OS. If the RAS controller produces multiple error records, then they may be overwritten before the OS has consumed them. The Generic Hardware Error Source (GHES) v2 structure introduces the capability for the OS to acknowledge the consumption of the error record generated by the RAS controller. A RAS controller supporting GHESv2 shall wait for the acknowledgment before writing a new error record, thus eliminating the race condition. Add support for parsing of GHESv2 sub-tables as well. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2017-06-22 15:43:47 +01:00
Andy Shevchenko	5b53696a30	ACPI / APEI: Switch to use new generic UUID API There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-05 19:42:02 +02:00
Shiju Jose	7bf130e4a0	ACPI/APEI: Handle GSIV and GPIO notification types System Controller Interrupts are received by ACPI's error device, which in turn notifies the GHES code. The same is true of APEI's GSIV and GPIO notification types. Add support for GSIV and GPIO sharing the SCI register/unregister/notifier code. Rename the list and notifier to show this is no longer just SCI, but anything from the Hardware Error Device. Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ Rewrite commit log. ] Signed-off-by: James Morse <james.morse@arm.com> [ Some small cleanups ontop. ] Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Link: http://lkml.kernel.org/r/86258A5CC0A3704780874CF6004BA8A62E695201@FRAEML521-MBX.china.huawei.com Cc: "Guohanjun (Hanjun Guo)" <guohanjun@huawei.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: "Zhengqiang (turing)" <zhengqiang10@huawei.com> Cc: "fu.wei@linaro.org" <fu.wei@linaro.org> Cc: "xuwei (O)" <xuwei5@hisilicon.com> Cc: Gabriele Paoloni <gabriele.paoloni@huawei.com> Cc: Geliang Tang <geliangtang@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: Len Brown <lenb@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: linux-acpi@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:55:12 +02:00
Linus Torvalds	3dee9fb2a4	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - add the 'Corrected Errors Collector' kernel feature which collect and monitor correctable errors statistics and will preemptively (soft-)offline physical pages that have a suspiciously high error count. - handle MCE errors during kexec() more gracefully - factor out and deprecate the /dev/mcelog driver - ... plus misc fixes and cleanpus" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only ACPI/APEI: Use setup_deferrable_timer() x86/mce: Update notifier priority check x86/mce: Enable PPIN for Knights Landing/Mill x86/mce: Do not register notifiers with invalid prio x86/mce: Factor out and deprecate the /dev/mcelog driver RAS: Add a Corrected Errors Collector x86/mce: Rename mce_log to mce_log_buffer x86/mce: Rename mce_log()'s argument x86/mce: Init some CPU features early x86/mce: Handle broadcasted MCE gracefully with kexec	2017-05-01 20:48:33 -07:00
Geliang Tang	7237c75b27	ACPI/APEI: Use setup_deferrable_timer() Use setup_deferrable_timer() instead of init_timer_deferrable() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/3afa5498142ef68256023257dad37b9f8352e65e.1489060803.git.geliangtang@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-19 12:00:14 +02:00
James Morse	7d64f82cce	ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal When removing a GHES device notified by SCI, list_del_rcu() is used, ghes_remove() should call synchronize_rcu() before it goes on to call kfree(ghes), otherwise concurrent RCU readers may still hold this list entry after it has been freed. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Fixes: `81e88fdc43` (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-03-28 23:43:42 +02:00
Ingo Molnar	e601757102	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:27 +01:00
Prarit Bhargava	a545715d2d	ACPI / APEI: Fix NMI notification handling When removing and adding cpu 0 on a system with GHES NMI the following stack trace is seen when re-adding the cpu: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+ Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2 Call Trace: dump_stack+0x63/0x8e __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 setup_local_APIC+0x275/0x370 apic_ap_setup+0xe/0x20 start_secondary+0x48/0x180 set_init_arg+0x55/0x55 early_idt_handler_array+0x120/0x120 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x13d/0x14c During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms that something has set the IRQ_WORK_VECTOR line prior to the APIC being initialized. Commit `2383844d48` ("GHES: Elliminate double-loop in the NMI handler") incorrectly modified the behavior such that the handler returns NMI_HANDLED only if an error was processed, and incorrectly runs the ghes work queue for every NMI. This patch modifies the ghes_proc_irq_work() to run as it did prior to `2383844d48` ("GHES: Elliminate double-loop in the NMI handler") by properly returning NMI_HANDLED and only calling the work queue if NMI_HANDLED has been set. Fixes: `2383844d48` (GHES: Elliminate double-loop in the NMI handler) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-02 00:20:02 +01:00
Punit Agrawal	806487a8fc	ACPI / APEI: Fix incorrect return value of ghes_proc() Although ghes_proc() tests for errors while reading the error status, it always return success (0). Fix this by propagating the return value. Fixes: `d334a49113` (ACPI, APEI, Generic Hardware Error Source memory error support) Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-24 14:32:14 +02:00
Tyler Baicar	2458d66b24	ACPI / APEI: Send correct severity to calculate AER severity Currently the AER severity is calculated by calling cper_severity_to_aer(), but the parameter sent is actually the GHES severity. This causes the AER severity to be incorrect. Fix the parameter to be the CPER severity instead of the GHES severity. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Borislav Petkov <bp@suse.de>	2016-09-20 14:36:59 -05:00
Paul Gortmaker	020bf066a6	drivers/acpi: make apei/ghes.c more explicitly non-modular The Kconfig currently controlling compilation of this code is: config ACPI_APEI_GHES bool "APEI Generic Hardware Error Source" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We replace module.h with moduleparam.h as we are keeping the pre-existing module_param that the file has, as currently that is the easiest way to maintain compatibility with the existing boot arg use cases. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 23:46:07 +01:00
Ingo Molnar	c7d77a7980	Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-14 16:05:18 +02:00
Jonathan (Zhixiong) Zhang	8ece249a81	acpi/apei: Use appropriate pgprot_t to map GHES memory If the ACPI APEI firmware handles hardware error first (called "firmware first handling"), the firmware updates the GHES memory region with hardware error record (called "generic hardware error record"). Essentially the firmware writes hardware error records in the GHES memory region, triggers an NMI/interrupt, then the GHES driver goes off and grabs the error record from the GHES region. The kernel currently maps the GHES memory region as cacheable (PAGE_KERNEL) for all architectures. However, on some arm64 platforms, there is a mismatch between how the kernel maps the GHES region (PAGE_KERNEL) and how the firmware maps it (EFI_MEMORY_UC, ie. uncacheable), leading to the possibility of the kernel GHES driver reading stale data from the cache when it receives the interrupt. With stale data being read, the kernel is unaware there is new hardware error to be handled when there actually is; this may lead to further damage in various scenarios, such as error propagation caused data corruption. If uncorrected error (such as double bit ECC error) happened in memory operation and if the kernel is unaware of such an event happening, errorneous data may be propagated to the disk. Instead GHES memory region should be mapped with page protection type according to what is returned from arch_apei_get_mem_attribute(). Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Small stylistic tweaks. ] Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441372302-23242-3-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-09-14 11:40:04 +02:00
Jarkko Nikula	4c62dbbce9	ACPI: Remove FSF mailing addresses There is no need to carry potentially outdated Free Software Foundation mailing address in file headers since the COPYING file includes it. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-08 02:27:32 +02:00
Jiri Kosina	6fe9e7c26a	GHES: Make NMI handler have a single reader Since GHES sources are global, we theoretically need only a single CPU reading them per NMI instead of a thundering herd of CPUs waiting on a spinlock in NMI context for no reason at all. Do that. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-04-27 21:35:33 +02:00

1 2 3 4

153 commits