Cosmian maintains a Python UDF library that enables efficient, large-scale cryptographic operations with the Cosmian KMS.
These User Defined Functions (UDFs) are designed specifically for:
- Data processing platforms like PySpark, Databricks, Snowflake, and Kafka
- Implementing application-level encryption in Big Data environments
- Protecting sensitive data by encrypting it before storage and only decrypting during processing
- Minimizing clear-text exposure of sensitive information
The library provides:
- Batch processing capabilities for efficient encryption/decryption
- Parallelization to optimize throughput
- Support for multiple encryption contexts and key identifiers
- Performance optimizations for Big Data workloads
Expected throughput is in the range of 5 million decryption in 5 seconds with a 10 vCPU KMS instance.
The UDFs support several NIST-approved and RFC-standardized algorithms:
- AES-GCM (NIST SP 800-38D): Standard authenticated encryption
- AES-GCM-SIV (RFC 8452): Deterministic authenticated encryption with nonce misuse resistance
- AES-XTS (NIST SP 800-38E): Storage encryption for fixed-length data
- ChaCha20-Poly1305 (RFC 8439): High-performance authenticated encryption
Key performance factors include:
- Network latency between the UDF execution environment and KMS server
- Batch size optimization for your specific workload
- Algorithm selection based on security requirements
- Potential use of deterministic encryption (like AES-GCM-SIV) for performance-critical applications
Check the encryption and decryption a scale section for details.
For implementation details, examples, and best practices, refer to
the GitHub repository, particularly the tests directory which
contains practical code samples for various use cases.