Apache Parquet – Critical RCE via Deserialization
Summary
On April 5, 2025, a critical deserialization vulnerability (CVE-2025-30065) affecting Apache Parquet was disclosed. Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It is widely used in big data environments and supported by many programming languages and analytics tools. This vulnerability can be exploited to achieve remote code execution (RCE) under specific conditions, making it a high-priority issue for organizations using the affected versions.
The flaw was discovered by independent security researchers, who noted that insecure deserialization paths within the Parquet engine can be triggered by specially crafted input files. It is especially dangerous where Parquet files are ingested and processed automatically without strict validation. Deserialization flaws are a well-known vulnerability class; this one stands out because Parquet is integrated so broadly across data processing tools.
Affected Systems and Applications
CVE-2025-30065 impacts Apache Parquet libraries and any systems that utilize them to ingest or process untrusted Parquet files. This includes:
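- Apache Parquet Java library (parquet-avro module) <= 1.15.0
- Frameworks and applications such as Apache Spark, Apache Hive, and Presto/Trino when they bundle a vulnerable version of the Java library. Python data libraries (e.g., PyArrow, Pandas with Parquet integration) are affected only where their Parquet handling passes through that Java library.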
- Potentially Delta Lake, if built on top of affected Parquet versions.
- Any pipeline that automatically consumes Parquet files from unmaintained or untrusted sources; these are at heightened risk.
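One way to check a host against the version range above is to look for Parquet jars on disk. The following is a minimal sketch, assuming jars follow the conventional `parquet-<module>-<version>.jar` naming; the function name `find_vulnerable_jars` and the directory to scan are hypothetical, and jars shaded into an uber-jar will not be found this way.

```python
import re
from pathlib import Path

# First release named as fixed in this advisory.
FIXED_VERSION = (1, 15, 1)

# Matches names such as parquet-avro-1.15.0.jar or parquet-hadoop-1.13.1.jar
# (assumed naming convention; repackaged jars may not follow it).
JAR_PATTERN = re.compile(r"^(parquet-[a-z0-9-]+?)-(\d+)\.(\d+)\.(\d+)[^/]*\.jar$")


def find_vulnerable_jars(root):
    """Return (path, version) pairs for Parquet jars older than FIXED_VERSION."""
    hits = []
    for jar in sorted(Path(root).rglob("parquet-*.jar")):
        match = JAR_PATTERN.match(jar.name)
        if not match:
            continue  # file name does not carry a parseable version
        version = tuple(int(part) for part in match.group(2, 3, 4))
        if version < FIXED_VERSION:
            hits.append((str(jar), ".".join(map(str, version))))
    return hits
```

Calling `find_vulnerable_jars("/opt/spark/jars")` (a hypothetical path) would return the jars that still need upgrading. A file-name check is only a first pass; dependency manifests and software composition analysis give a more complete picture.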
Technical Details / Attack Overview
This vulnerability arises from unsafe Java object deserialization logic in Apache Parquet’s file parsing mechanisms. An attacker can craft a Parquet file with metadata fields that contain serialized Java objects. When this metadata is parsed, it may be deserialized without proper validation, potentially leading to remote code execution.
In affected versions, deserialization is performed without validation or sandboxing, so arbitrary code can execute if suitable classes (gadgets) are present on the application's classpath. The attack chain typically involves:
- File – An attacker creates a malicious Parquet file with hidden harmful code inside its metadata.
- Upload – That file is uploaded or exposed to a system that processes data.
- Trigger – A job or query reads the malicious metadata from the file.
- Run – The embedded code runs on the system, allowing the attacker to take control.
Exploitation requires that the runtime environment allows for the execution of the deserialized object. Tools like ysoserial can be used to generate PoCs. This mirrors similar historic attacks in Java ecosystems, including those seen in Apache Commons Collections.
Temporary Workarounds and Mitigations
Organizations should upgrade immediately and, until the patch is applied, reduce exposure with the remaining measures:
- Upgrade to Apache Parquet 1.15.1, which fixes the issue.
- Avoid processing Parquet files from untrusted or unknown sources.
- Apply strict allowlists to file ingestion workflows.
- Use sandboxing techniques when processing files (e.g., containers, or the Java SecurityManager on runtimes that still support it; it has been deprecated for removal since Java 17).
- Review classpath configurations to prevent deserialization of arbitrary classes.
Unfortunately, no quick mitigation fully removes risk aside from applying the official patch.
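The ingestion allowlist mentioned above can be sketched as a default-deny check applied before a file reaches any Parquet reader. This is an illustration only: the bucket, host, and path prefixes in `TRUSTED_SOURCES` and the function name `is_trusted_source` are hypothetical and would need to match the real pipeline.

```python
from urllib.parse import urlparse

# Hypothetical allowlist entries: (scheme, bucket or host, required path prefix).
TRUSTED_SOURCES = [
    ("s3", "analytics-curated", "/exports/"),
    ("hdfs", "namenode.internal", "/warehouse/"),
]


def is_trusted_source(uri, allowlist=TRUSTED_SOURCES):
    """Return True only if the URI matches an allowlist entry (default deny)."""
    parsed = urlparse(uri)
    # Reject path traversal so a trusted prefix cannot be escaped with "..".
    if ".." in parsed.path.split("/"):
        return False
    for scheme, host, prefix in allowlist:
        if (
            parsed.scheme == scheme
            and parsed.hostname == host
            and parsed.path.startswith(prefix)
        ):
            return True
    return False
```

A pipeline would call `is_trusted_source(uri)` before scheduling a read and quarantine anything that fails. This narrows exposure but does not replace the upgrade, since a trusted location can still receive a malicious file.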
Detection Guidance
Detection strategies include:
- Monitoring data ingestion workflows for anomalies, especially newly introduced or unexpected Parquet files.
- Paying close attention to unmaintained sources that use a vulnerable version of Apache Parquet.
- Reviewing historical data ingestion logs for signs of prior exploitation.
- Following the guidance for handling CVE-2025-30065 using Microsoft Security capabilities (listed under References).
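To support review of newly arrived or historical files, a lightweight triage pass can inspect a file's footer metadata without invoking any Parquet or Avro parser. The sketch below relies on the documented Parquet layout (a `PAR1` magic at both ends and a 4-byte little-endian footer length before the trailing magic). The marker strings are assumptions, not an official signature set: `parquet.avro.schema` and `avro.schema` are taken to be the keys under which an Avro schema is stored, and `java-class` is taken to be a schema property that names a Java class. A hit means "review manually", not "malicious".

```python
import struct

MAGIC = b"PAR1"
# Assumed markers (heuristic only): footer keys that carry an Avro schema,
# and an Avro schema property that references a Java class by name.
SCHEMA_KEYS = (b"parquet.avro.schema", b"avro.schema")
SUSPICIOUS_MARKERS = (b"java-class",)


def read_footer(data):
    """Return the raw footer bytes of an unencrypted Parquet file, or None."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        return None
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        return None  # declared length does not fit inside the file
    return data[-8 - footer_len:-8]


def triage(data):
    """Classify file bytes using a plain byte search over the footer."""
    footer = read_footer(data)
    if footer is None:
        return "unreadable"
    if not any(key in footer for key in SCHEMA_KEYS):
        return "no-avro-schema"
    if any(marker in footer for marker in SUSPICIOUS_MARKERS):
        return "review"
    return "avro-schema-present"
```

Because this is a raw byte search, it never instantiates anything from the file, which makes it safe to run on quarantined input. Files with encrypted footers or unusual encodings fall into "unreadable" and need separate handling.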
What the Cyber Fusion Center is Doing
A vulnerability plugin for Tenable has been listed but is still under development, and vulnerability scanning definitions for Qualys are being tracked closely.
The CFC is currently investigating whether large-scale threat hunting rules can be deployed across environments.
References
- https://cve.org/CVERecord?id=CVE-2025-30065
- https://nvd.nist.gov/vuln/detail/CVE-2025-30065
- https://parquet.apache.org/
- https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data
- https://github.com/frohoff/ysoserial
- https://www.tenable.com/cve/CVE-2025-30065/plugins
- https://techcommunity.microsoft.com/blog/microsoftdefendercloudblog/guidance-for-handling-cve-2025-30065-using-microsoft-security-capabilities/4401362