linux/drivers/ras/Kconfig

# SPDX-License-Identifier: GPL-2.0-only
menuconfig RAS
	bool "Reliability, Availability and Serviceability (RAS) features"
	help
	  Reliability, availability and serviceability (RAS) is a computer
	  hardware engineering term. Computers designed with higher levels
	  of RAS have a multitude of features that protect data integrity
	  and help them stay available for long periods of time without
	  failure.

	  Reliability can be defined as the probability that the system will
	  produce correct outputs up to some given time. Reliability is
	  enhanced by features that help to avoid, detect and repair hardware
	  faults.

	  Availability is the probability that a system is operational at a
	  given time, i.e. the amount of time a device is actually operating
	  as a percentage of the total time it should be operating.

	  Serviceability or maintainability is the simplicity and speed with
	  which a system can be repaired or maintained; if the time to repair
	  a failed system increases, then availability will decrease.

	  Note that Reliability and Availability are distinct concepts:
	  Reliability is a measure of the ability of a system to function
	  correctly, including avoiding data corruption, whereas Availability
	  measures how often it is available for use, even though it may not
	  be functioning correctly. For example, a server may run forever and
	  so have ideal availability, but may be unreliable, with frequent
	  data corruption.

if RAS

source "arch/x86/ras/Kconfig"
source "drivers/ras/amd/atl/Kconfig"

config RAS_FMPM
	tristate "FRU Memory Poison Manager"
	default m
	depends on AMD_ATL && ACPI_APEI
	help
	  Support saving and restoring memory error information across reboot
	  using ACPI ERST as persistent storage. Error information is saved with
	  the UEFI CPER "FRU Memory Poison" section format.

	  Memory is retired at boot time and at run time, depending on
	  platform-specific policies.

endif