Crawl4AI: Arbitrary file write (path traversal) in crawler downloads can lead to RCE

Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

Summary

When the crawler saved a downloaded file, the destination filename was taken from attacker-influenced input and joined to the downloads directory with no confinement. A filename containing an absolute path (e.g. /etc/cron.d/evil) or ../ traversal sequences escaped the downloads directory, giving an arbitrary file write. Because the written bytes are also attacker-controlled, this escalates to remote code execution, for example by overwriting a shell rc file, ~/.ssh/authorized_keys, a cron entry, or a Python module on the import path.
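The escape follows from how path joining behaves. The following is a minimal sketch, not the Crawl4AI code: the /srv/downloads root and the naive_download_path helper are assumptions made for illustration, and posixpath is used so the result is the same on any platform.

```python
import posixpath

DOWNLOADS_DIR = "/srv/downloads"  # assumed downloads root, for illustration only


def naive_download_path(downloads_dir: str, filename: str) -> str:
    """Hypothetical vulnerable pattern: join an untrusted name with no confinement."""
    return posixpath.normpath(posixpath.join(downloads_dir, filename))


def escapes_root(downloads_dir: str, filename: str) -> bool:
    """Return True if the joined path lands outside the downloads root."""
    root = posixpath.normpath(downloads_dir)
    target = naive_download_path(root, filename)
    return posixpath.commonpath([root, target]) != root


# An absolute name discards the downloads directory entirely.
print(naive_download_path(DOWNLOADS_DIR, "/etc/cron.d/evil"))       # /etc/cron.d/evil
# Traversal components climb out of the downloads directory.
print(naive_download_path(DOWNLOADS_DIR, "../../etc/cron.d/evil"))  # /etc/cron.d/evil
# An ordinary name stays inside it.
print(naive_download_path(DOWNLOADS_DIR, "report.pdf"))             # /srv/downloads/report.pdf
```

Both hostile names resolve to the same location outside the root, which is why checking only for ../ is not enough: an absolute path needs no traversal at all.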

Affected paths

Two download sinks in crawl4ai/async_crawler_strategy.py:

  • HTTP crawler (AsyncHTTPCrawlerStrategy): the filename is parsed from the response Content-Disposition header by _extract_filename() and written via aiofiles.open(filepath, 'wb'). Reachable directly via the SDK, and via the unauthenticated Docker /crawl endpoint when an HTTPCrawlerConfig is supplied.
  • Browser crawler (AsyncPlaywrightCrawlerStrategy): the download's suggested_filename (controllable by the visited page) is joined to downloads_path and written via download.save_as().

The HTTP-strategy sink is reachable pre-auth on the default Docker deployment; both sinks are reachable for SDK users simply by crawling an attacker-controlled URL. A default Playwright crawl that does not trigger a download is unaffected.
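To see why a response header is attacker-influenced input, consider how a filename travels in Content-Disposition. The advisory does not show the project's _extract_filename(), so this sketch substitutes Python's standard-library header parser; the filename_from_content_disposition helper is hypothetical.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Hypothetical stand-in for a filename extractor, built on the stdlib parser."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns the header's filename parameter verbatim,
    # with no sanitisation of path separators or ".." components.
    return msg.get_filename()


hostile = 'attachment; filename="../../etc/cron.d/evil"'
print(filename_from_content_disposition(hostile))  # ../../etc/cron.d/evil
```

Whatever parser is used, the value is chosen by the server being crawled, so it has to be treated as untrusted before it reaches any filesystem call.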

Impact

Arbitrary file write with attacker-controlled content as the user running the crawler, escalating to remote code execution.

Fix

Both sinks now resolve the destination through a single hardened helper (_safe_download_filepath) that reduces the attacker-influenced name to a bare basename (dropping absolute paths and .. components) and re-checks, via realpath, that the resolved path stays inside the downloads root (defeating symlink/TOCTOU escapes). A traversal attempt is rejected; normal downloads are unchanged.
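The two steps described above (basename reduction, then realpath containment) can be sketched as follows. This is an illustration of the technique under stated assumptions, not the project's _safe_download_filepath: the safe_download_path name, the backslash normalisation, and the choice to raise ValueError are all hypothetical.

```python
import os


def safe_download_path(downloads_dir: str, untrusted_name: str) -> str:
    """Hypothetical sketch: confine an untrusted filename to downloads_dir."""
    # Step 1: keep only the final path component. This drops absolute
    # prefixes and any ../ segments. Backslashes are normalised first so
    # Windows-style separators are stripped too (an assumption of this sketch).
    name = os.path.basename(untrusted_name.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unusable download filename: {untrusted_name!r}")

    # Step 2: resolve symlinks on both sides and verify containment.
    root = os.path.realpath(downloads_dir)
    candidate = os.path.realpath(os.path.join(root, name))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"download path escapes root: {untrusted_name!r}")
    return candidate
```

With this sketch, a name such as ../../etc/cron.d/evil is reduced to evil inside the downloads root, while a bare .. or a name that matches a symlink pointing outside the root raises ValueError. Note that a containment check performed before the write narrows, but cannot by itself eliminate, races in which the directory contents change between the check and the open.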

Workarounds

  • Upgrade to the patched version (0.9.0).
  • Run the crawler as an unprivileged user with a dedicated, isolated downloads directory on a volume with no sensitive paths writable.
  • Enable authentication (CRAWL4AI_API_TOKEN) on the Docker server.

Credits

Y4tacker - reported the Content-Disposition path traversal in the HTTP crawler with a clear PoC and a basename + realpath-containment fix recommendation.

  • Published: Jun 18, 2026
  • Updated: Jun 19, 2026
  • GHSA: GHSA-2jq4-q6vv-4cp3
  • Severity: Critical

CVSS v3:

  • Severity: Critical
  • Score: 9.6
  • Vector: AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
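The score can be reproduced from the vector. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the advisory says only "v3"); the base_score and roundup helper names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# CVSS v3.1 base metric weights (assumed version; the advisory says "v3").
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined by the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"

    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


print(base_score("AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 9.6
```

The UI:R term reflects that a crawl of the attacker-controlled URL must be triggered, and S:C (changed scope) applies the 1.08 multiplier that lifts the result to 9.6.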
