In the Linux kernel, the following vulnerability has been resolved:
x86/sgx: Prevent attempts to reclaim poisoned pages
TL;DR: SGX page reclaim touches the page to copy its contents to secondary storage. SGX instructions do not gracefully handle machine checks. Despite this, the existing SGX code will try to reclaim pages that it knows are poisoned. Avoid even trying to reclaim poisoned pages.
The longer story:
Pages used by an enclave only get epc_page->poison set in arch_memory_failure() but they currently stay on sgx_active_page_list until sgx_encl_release(), with the SGX_EPC_PAGE_RECLAIMER_TRACKED flag untouched.
epc_page->poison is not checked in the reclaimer logic meaning that, if other conditions are met, an attempt will be made to reclaim an EPC page that was poisoned. This is bad because 1. we don't want that page to end up added to another enclave and 2. it is likely to cause one core to shut down and the kernel to panic.
Specifically, reclaiming uses microcode operations including "EWB" which accesses the EPC page contents to encrypt and write them out to non-SGX memory. Those operations cannot handle MCEs in their accesses other than by putting the executing core into a special shutdown state (affecting both threads with HT.) The kernel will subsequently panic on the remaining cores seeing the core didn't enter MCE handler(s) in time.
Call sgx_unmark_page_reclaimable() to remove the affected EPC page from sgx_active_page_list on memory error to stop it being considered for reclaiming.
Testing epc_page->poison in sgx_reclaim_pages() would also work but I assume it's better to add code in the less likely paths.
The affected EPC page is not added to &node->sgx_poison_page_list until later in sgx_encl_release()->sgx_free_epc_page() when it is EREMOVEd. Membership on other lists doesn't change to avoid changing any of the lists' semantics except for sgx_active_page_list. There's a "TBD" comment in arch_memory_failure() about pre-emptive actions, the goal here is not to address everything that it may imply.
This also doesn't completely close the time window when a memory error notification will be fatal (for a not previously poisoned EPC page) -- the MCE can happen after sgx_reclaim_pages() has selected its candidates or even inside a microcode operation (actually easy to trigger due to the amount of time spent in them.)
The spinlock in sgx_unmark_page_reclaimable() is safe because memory_failure() runs in process context and no spinlocks are held, explicitly noted in a mm/memory-failure.c comment.
| Software | From | Fixed in |
|---|---|---|
| linux / linux_kernel | 5.11 | 6.1.142 |
| linux / linux_kernel | 6.2 | 6.6.95 |
| linux / linux_kernel | 6.7 | 6.12.35 |
| linux / linux_kernel | 6.13 | 6.15.4 |
| debian / debian_linux | 11.0 | 11.0.x |
A security vulnerability is a weakness in software, hardware, or configuration that can be exploited to compromise confidentiality, integrity, or availability. Many vulnerabilities are tracked as CVEs (Common Vulnerabilities and Exposures), which provide a standardized identifier so teams can coordinate patching, mitigation, and risk assessment across tools and vendors.
CVSS (Common Vulnerability Scoring System) estimates technical severity, but it doesn't automatically equal business risk. Prioritize using context like internet exposure, affected asset criticality, known exploitation (proof-of-concept or in-the-wild), and whether compensating controls exist. A "Medium" CVSS on an exposed, production system can be more urgent than a "Critical" on an isolated, non-production host.
A vulnerability is the underlying weakness. An exploit is the method or code used to take advantage of it. A zero-day is a vulnerability that is unknown to the vendor or has no publicly available fix when attackers begin using it. In practice, risk increases sharply when exploitation becomes reliable or widespread.
Recurring findings usually come from incomplete Asset Discovery, inconsistent patch management, inherited images, and configuration drift. In modern environments, you also need to watch the software supply chain: dependencies, containers, build pipelines, and third-party services can reintroduce the same weakness even after you patch a single host. Unknown or unmanaged assets (often called Shadow IT) are a common reason the same issues resurface.
Use a simple, repeatable triage model: focus first on externally exposed assets, high-value systems (identity, VPN, email, production), vulnerabilities with known exploits, and issues that enable remote code execution or privilege escalation. Then enforce patch SLAs and track progress using consistent metrics so remediation is steady, not reactive.
SynScan combines attack surface monitoring and continuous security auditing to keep your inventory current, flag high-impact vulnerabilities early, and help you turn raw findings into a practical remediation plan.