Support image tar, without accessing Docker daemon #256
vulyon wants to merge 6 commits into goldmann:main from vulyon:support-image-tar
@lyon-v I've fixed the runners and rebased. However I think you need to run
@rnc Hi there! Apologies for the delayed response. I've fixed the code formatting and it has passed the checks now.
rnc left a comment
In general I think this is a great idea. However there are some areas I have questions/comments on. And tests are also required please. Thanks very much for the PR!
README.rst (Outdated)
::

    $ python -m docker_squash.cli --input-tar source.tar --tag jboss/wildfly:squashed -f 8 --output-path squashed.tar --load-image false
I think this should be run without the -f parameter, as the log output below shows the squashed image larger than the original, which is a confusing result for a README. Also, if both the docker squash and tar squash examples show the same result, IMHO it's more intuitive.
That is because the jboss/wildfly:latest image has changed.
(base) root@master:~# docker pull jboss/wildfly:latest
latest: Pulling from jboss/wildfly
f87ff222252e: Pull complete
8116b2f7ca5a: Pull complete
0b43aea4eeb1: Pull complete
13776e8da872: Pull complete
f26d32e28c29: Pull complete
Digest: sha256:35320abafdec6d360559b411aff466514d5741c3c527221445f48246350fdfe5
Status: Downloaded newer image for jboss/wildfly:latest
docker.io/jboss/wildfly:latest
(base) root@master:~# docker history jboss/wildfly:latest
IMAGE CREATED CREATED BY SIZE COMMENT
35320abafdec 3 years ago /bin/sh -c #(nop) CMD ["/opt/jboss/wildfly/… 0B
3 years ago /bin/sh -c #(nop) EXPOSE 8080 0B
3 years ago /bin/sh -c #(nop) USER jboss 0B
3 years ago /bin/sh -c #(nop) ENV LAUNCH_JBOSS_IN_BACKG… 0B
3 years ago /bin/sh -c cd $HOME && curl -L -O https:… 270MB
3 years ago /bin/sh -c #(nop) USER root 0B
3 years ago /bin/sh -c #(nop) ENV JBOSS_HOME=/opt/jboss… 0B
3 years ago /bin/sh -c #(nop) ENV WILDFLY_SHA1=238e67f4… 0B
3 years ago /bin/sh -c #(nop) ENV WILDFLY_VERSION=25.0.… 0B
4 years ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/jv… 0B
4 years ago /bin/sh -c #(nop) USER jboss 0B
4 years ago /bin/sh -c yum -y install java-11-openjdk-de… 239MB
4 years ago /bin/sh -c #(nop) USER root 0B
4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
4 years ago /bin/sh -c #(nop) USER jboss 0B
4 years ago /bin/sh -c #(nop) WORKDIR /opt/jboss 0B
4 years ago /bin/sh -c groupadd -r jboss -g 1000 && user… 406kB
4 years ago /bin/sh -c yum update -y && yum -y install x… 33.5MB
4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
5 years ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
5 years ago /bin/sh -c #(nop) LABEL org.label-schema.sc… 0B
5 years ago /bin/sh -c #(nop) ADD file:61908381d3142ffba… 222MB
    parser.add_argument(
        "--input-tar",
        help="Path to tar file created by 'docker save'. Process tar file directly without requiring Docker daemon.",
    )
I think we should investigate using mutually exclusive groups for argparse, as that has built-in support for accepting either the --input-tar or the image option and would avoid the manual checks below.
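A minimal sketch of what such a group could look like, assuming the image name is a positional argument. The parser name, the help strings, and the `build_parser` helper are illustrative and are not taken from the PR.

```python
import argparse


def build_parser():
    """Build a parser where the image name and --input-tar exclude each other."""
    parser = argparse.ArgumentParser(prog="docker-squash-sketch")  # hypothetical name
    # required=True makes argparse insist that one of the two sources is given.
    source = parser.add_mutually_exclusive_group(required=True)
    # A positional can only join an exclusive group if it is optional (nargs="?").
    source.add_argument("image", nargs="?", help="Image to squash via the Docker daemon")
    source.add_argument("--input-tar", help="Path to a tar file created by 'docker save'")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--input-tar", "source.tar"])
    print(args.image, args.input_tar)
```

With this layout argparse itself rejects a command line that names both an image and a tar file, or neither, so the hand-written validation becomes unnecessary.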
Also, I think it's valid for output-path to be the same as input-tar (?). In tar mode, should this be the default?
Great! I have made the code changes.
    def __init__(
        self, log, tar_path, from_layer=None, tmp_dir=None, tag=None, comment=""
    ):
TarImage derives from Image (which is good) but isn't calling super. Further, I think it duplicates some code from image.py (and potentially v2_image). Could more be done to normalise the code and avoid duplication?
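As a rough illustration of the super call being asked for, the sketch below uses a stand-in `Image` base class. Its constructor signature is an assumption made for the example and may not match the real class in image.py.

```python
class Image:
    """Stand-in for the shared base class; the real signature may differ."""

    def __init__(self, log, from_layer=None, tmp_dir=None, tag=None, comment=""):
        self.log = log
        self.from_layer = from_layer
        self.tmp_dir = tmp_dir
        self.tag = tag
        self.comment = comment


class TarImage(Image):
    """Tar-backed image that adds only tar-specific state."""

    def __init__(
        self, log, tar_path, from_layer=None, tmp_dir=None, tag=None, comment=""
    ):
        # Let the base class set up the shared attributes once.
        super().__init__(
            log, from_layer=from_layer, tmp_dir=tmp_dir, tag=tag, comment=comment
        )
        self.tar_path = tar_path
```

Keeping the shared attributes in the base constructor means a later change to them only has to be made in one place.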
- Works in CI/CD pipelines and restricted environments
- Supports both Docker format and OCI format images
- Maintains complete layer history compatibility
- Can process images on systems where Docker is not installed
I would imagine that it's helpful when working with podman as well.
Absolutely! That's a great point. The --input-tar feature is indeed very helpful for Podman users.
Since Podman uses podman save to export images in the same tar format as docker save, users can now:
    # Export image with Podman
    podman save myimage:latest -o image.tar
    # Squash with docker-squash (no Docker daemon required)
    docker-squash --input-tar image.tar --tag myimage:squashed --output-path squashed.tar
    # Import back to Podman
    podman load -i squashed.tar
This workflow is particularly valuable in environments where:
- Only Podman is available (no Docker daemon)
- Running in CI/CD pipelines with Podman
- Working in rootless containers or restricted environments
- Processing images offline without any container runtime
Should I add a Podman example to the documentation to highlight this use case?
        self.log.info("Detected Docker format image")
        self.oci_format = False
    else:
        raise SquashError("Unable to detect image format - missing manifest files")
Is this duplicating v2_image::_get_manifest?
You're absolutely right! There is indeed duplication with v2_image::_get_manifest. Both methods:
- Check for index.json to detect OCI format
- Set self.oci_format = True/False
- Handle manifest file reading
I should refactor this to reuse the existing logic. A few options:
Option 1: Extract common logic to base class
    # In Image base class
    def detect_image_format(self):
        if os.path.exists(os.path.join(self.old_image_dir, "index.json")):
            self.oci_format = True
            return "oci"
        elif os.path.exists(os.path.join(self.old_image_dir, "manifest.json")):
            self.oci_format = False
            return "docker"
        else:
            raise SquashError("Unable to detect image format")
Option 2: Have TarImage reuse v2_image's get_manifest
    # In TarImage
    def detect_image_format(self):
        try:
            # This will set self.oci_format as a side effect
            self.manifest = self.get_manifest()  # Inherit from v2_image logic
        except SquashError:
            raise SquashError("Unable to detect image format")
I lean toward Option 1 as it's cleaner separation of concerns. What do you think?
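To make Option 1 easier to test in isolation, the check could also be written as a plain function over a directory path. This is a minimal sketch: the `SquashError` class here is a stand-in for the project's own exception, and the function takes the unpacked image directory as an argument rather than reading `self.old_image_dir`.

```python
import os
import tempfile


class SquashError(Exception):
    """Stand-in for the project's own error type."""


def detect_image_format(image_dir):
    """Return "oci" or "docker" depending on which manifest file is present."""
    # index.json is checked first, matching the order used in Option 1.
    if os.path.exists(os.path.join(image_dir, "index.json")):
        return "oci"
    if os.path.exists(os.path.join(image_dir, "manifest.json")):
        return "docker"
    raise SquashError("Unable to detect image format - missing manifest files")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An empty manifest.json is enough to exercise the Docker branch.
        open(os.path.join(tmp, "manifest.json"), "w").close()
        print(detect_image_format(tmp))
```

A base-class method could then set `self.oci_format` from the returned string, which keeps the file-system probing separate from the object state.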
    self.log.info(
        "💡 Tip: Consider using --tag to specify a name for your squashed image"
    )
    self.log.info(" Example: --tag myimage:squashed")
Does a tag make sense for an output tar? It is probably only relevant if --load-image has been specified?
I respectfully disagree with this assessment. The --tag parameter is meaningful for output tar files regardless of the --load-image setting. Here's why:
Tag is part of image metadata in tar format:
- Docker/Podman tar format stores tags in manifest.json under the RepoTags field
- This metadata becomes part of the squashed tar file
Tag is useful in all scenarios:
- --load-image true: Image gets loaded with the specified tag
- --load-image false + --output-path: The output tar contains tag metadata, so when someone later runs docker load -i squashed.tar, the image will have the proper tag
- Distribution: Tagged tar files are more useful when shared with others
Without --tag, the consequences are significant:
    # Without tag - image loads but has no name
    $ docker load -i squashed.tar
    Loaded image ID: sha256:abc123...
    $ docker images
    REPOSITORY   TAG      IMAGE ID
    <none>       <none>   sha256:abc123...   # Hard to identify!

    # With tag - much more usable
    $ docker load -i squashed.tar
    Loaded image: myapp:squashed
    $ docker images
    REPOSITORY   TAG        IMAGE ID
    myapp        squashed   sha256:abc123...   # Clear identification
The tip message encourages good practices for tar-based workflows, not just --load-image scenarios. The tag becomes part of the portable tar artifact.
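One way to confirm that a tag really travels inside the tar is to read RepoTags straight out of manifest.json. The sketch below assumes the Docker-format layout, where manifest.json sits at the archive root and holds a JSON list of image entries. Both `read_repo_tags` and `write_demo_tar` are hypothetical helpers written for this illustration.

```python
import io
import json
import os
import tarfile
import tempfile


def read_repo_tags(tar_path):
    """Return all RepoTags listed in manifest.json of a 'docker save' style tar."""
    with tarfile.open(tar_path) as tar:
        member = tar.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
    tags = []
    for entry in manifest:
        # RepoTags may be missing or null for an untagged image.
        tags.extend(entry.get("RepoTags") or [])
    return tags


def write_demo_tar(tar_path, repo_tags):
    """Write a tar holding only a manifest.json, for trying out read_repo_tags."""
    payload = json.dumps(
        [{"Config": "config.json", "RepoTags": repo_tags, "Layers": []}]
    ).encode("utf-8")
    info = tarfile.TarInfo(name="manifest.json")
    info.size = len(payload)
    with tarfile.open(tar_path, "w") as tar:
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "squashed.tar")
        write_demo_tar(demo, ["myapp:squashed"])
        print(read_repo_tags(demo))
```

An empty result for a squashed tar would correspond to the untagged case shown in the console output above.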
@lyon-v Did you wish to discuss any of the comments?
Sir, my apologies for the slow response. I've been swamped with work lately, but I'll reply to or fix these issues shortly.
Enable Docker Daemon-Free Image Squashing
This PR directly addresses and resolves Issue #24: "Make it possible to run squashing without accessing Docker daemon" by introducing the ability for docker-squash to directly process Docker images from tar files. This eliminates the need for a running Docker daemon, significantly enhancing flexibility for image optimization in CI/CD pipelines, air-gapped environments, and systems without Docker installed.
Key Benefits
How to Use
Export the Image:
Squash from Tar:
- --input-tar for your source image file.
- --tag is recommended for the new image name.
- --output-path specifies where to save the squashed tar file.
- --load-image false prevents the tool from attempting to load the image directly into a Docker daemon.
Load into Docker (Optional):
This enhancement significantly broadens docker-squash's utility, making image size optimization more accessible across diverse development and deployment scenarios.