Overview of env
env is used for several things right now:
- arguments for the kernel (setting features)
- arguments for penguin init (setting features)
- environment variables passed to programs
- adjusting the output of /proc/cmdline, which is parsed by some systems
All of these values are passed as key/value pairs on the kernel cmdline:
    # XXX HACK: normally panda args are set at the constructor. But we want to load
    # our plugins first and these need a handle to panda. So after we've constructed
    # our panda object, we'll directly insert our args into panda.panda_args in
    # the string entry after the "-append" argument which is a string list of
    # the kernel append args. We put our values at the start of this list

    # Find the argument after '-append' in the list and re-render it based on updated env
    append_idx = panda.panda_args.index("-append") + 1

    config_args = [
        f"{k}" + (f"={v}" if v is not None else "") for k, v in conf["env"].items()
    ]
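To make the rendering step concrete, here is a minimal standalone sketch of how an env mapping could end up at the start of the -append string. The excerpt stops before the join, so the `prepend_to_append` helper, the example argument list, and the example env values are all assumptions for illustration, not the actual penguin code.

```python
def render_env_args(env):
    """Render an env mapping into kernel cmdline tokens.

    A value of None yields a bare key (e.g. "nokaslr"); anything else
    yields "key=value", mirroring the list comprehension in the excerpt.
    """
    return [f"{k}" + (f"={v}" if v is not None else "") for k, v in env.items()]


def prepend_to_append(qemu_args, env):
    """Hypothetical helper: put the env tokens at the start of the -append string."""
    args = list(qemu_args)  # copy so the caller's list is not modified
    append_idx = args.index("-append") + 1
    args[append_idx] = " ".join(render_env_args(env) + [args[append_idx]])
    return args


# Example values are made up for illustration.
example_args = ["-kernel", "zImage", "-append", "console=ttyS0 root=/dev/vda"]
example_env = {"SHARED_DIR": "/igloo/shared", "nokaslr": None}
print(prepend_to_append(example_args, example_env))
```

Note that every key in env lands on the same cmdline as the kernel's own arguments, which is what makes the shadowing and size problems described later possible.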
The timeline looks like:
On boot, some parameters (e.g. console=ttyS0) are handled by kernel param parsers and dropped from that point on.
Our igloobase module has some of these as well (dropped at this point): https://github.com/rehosting/linux/blob/996b6807416e345bd242774ef7808c63e274b524/drivers/igloobase/igloo_args.c#L17-L29
Next, any values that have not been consumed by an earlier step are passed to the penguin init process as environment variables.
In our init process we use and then unset some of these values:
    if [ ! -z "${SHARED_DIR}" ]; then
      unset SHARED_DIR
Everything else in env that is not matched by one of the previous cases is passed along to the new init task as an environment variable.
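The timeline above can be modeled as a first-match classification. This is only a sketch: the three key sets are hypothetical and contain just the names mentioned in this issue (console, igloo_task_size, SHARED_DIR), and placing igloo_task_size in the igloobase set is an assumption. Real membership depends on the kernel build, igloo_args.c, and the init scripts.

```python
# Hypothetical key sets; only these three names appear in the issue text.
KERNEL_PARAMS = {"console"}
IGLOOBASE_PARAMS = {"igloo_task_size"}
INIT_CONSUMED = {"SHARED_DIR"}


def classify_env(env):
    """Map each env key to the first stage that would consume it."""
    stages = {}
    for key in env:
        if key in KERNEL_PARAMS:
            stages[key] = "kernel"
        elif key in IGLOOBASE_PARAMS:
            stages[key] = "igloobase"
        elif key in INIT_CONSUMED:
            stages[key] = "penguin-init"
        else:
            # Nothing earlier claimed it, so it reaches the new init task.
            stages[key] = "userland"
    return stages


print(classify_env({"console": "ttyS0", "SHARED_DIR": "/x", "FOO": "bar"}))
```

A key the user meant for userland but which lands in any stage other than "userland" is exactly the shadowing case.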
Is your feature request related to a problem? Please describe.
There are several problems with this approach:
- limits on kernel cmdline size - this depends on the architecture, but the limit is often 1024 bytes or 4K bytes. Running into this limit is very hard to debug.
- variable shadowing - if a key happens to match a parameter consumed by an earlier stage, it won't get passed through (e.g. it is impossible to set console=X in env)
- confusing stages/differentiation - it is not obvious which stage consumes a given value
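As a stopgap for the size problem, a check like the following could warn before boot. It is a sketch under assumptions: the default limit of 1024 is just one of the two figures mentioned above, the right value per architecture would have to be looked up, and treating a cmdline of exactly `limit` bytes as too long (to leave room for a string terminator) is my own conservative choice.

```python
import warnings


def check_cmdline(cmdline, limit=1024):
    """Return True if cmdline fits under limit bytes, otherwise warn and return False."""
    size = len(cmdline.encode("utf-8"))
    # Assumption: reserve one byte for a terminator, so size must stay below limit.
    if size >= limit:
        warnings.warn(f"kernel cmdline is {size} bytes, limit is {limit}")
        return False
    return True


print(check_cmdline("console=ttyS0 root=/dev/vda"))
```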
Describe the solution you'd like
I'd like us to make all these applications distinct.
- Non-IGLOO kernel arguments
These should be allowed to be specified in our config. They need to support bare flags that are not in key=value format (e.g. nokaslr). They also need to work alongside our default arguments, with warnings when a user value changes a default.
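One possible shape for this is a list of raw tokens in the config that gets merged over our defaults. The sketch below assumes that layout; the function names and the warning text are hypothetical. It keys arguments by name, so it does not handle parameters that may legitimately repeat on the cmdline.

```python
def parse_arg(token):
    """Split "key=value" into (key, value); a bare flag becomes (key, None)."""
    key, sep, value = token.partition("=")
    return key, (value if sep else None)


def merge_kernel_args(defaults, user):
    """Merge user tokens over default tokens and collect override warnings."""
    merged = dict(parse_arg(t) for t in defaults)
    notes = []
    for token in user:
        key, value = parse_arg(token)
        if key in merged and merged[key] != value:
            notes.append(f"overriding default {key}={merged[key]} with {value}")
        merged[key] = value
    # Re-render: bare flags stay bare, everything else is key=value.
    rendered = [k if v is None else f"{k}={v}" for k, v in merged.items()]
    return rendered, notes


args, notes = merge_kernel_args(
    ["console=ttyS0", "root=/dev/vda"], ["nokaslr", "console=ttyAMA0"]
)
print(args, notes)
```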
- IGLOO kernel arguments
Values like igloo_task_size have no reason to be specified on the cmdline when we have better ways of passing information through our kernel.
I'd like us to add a hypercall on module insertion that will gather all our configuration values for the driver and igloobase and set them accordingly.
These would likely appear in the config as arguments to igloomodule.
- IGLOO init variables
Similarly, these values need not be passed as environment variables.
A reasonable model here could involve a program making a hypercall for either individual or overall config settings for IGLOO.
This would likely save the igloo boot config settings so that they could be better understood and looked up at runtime.
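The hypercall interface itself is not specified here, so the following only sketches the host-side lookup that such a hypercall handler might call into. The class name, the section names ("igloomodule", "init"), and the example values are all hypothetical.

```python
class IglooConfigStore:
    """Host-side store answering per-key or whole-section config queries."""

    def __init__(self, config):
        # Copy each section so later changes to the caller's dicts do not leak in.
        self._config = {section: dict(values) for section, values in config.items()}

    def get(self, section, key, default=None):
        """Look up a single setting, as an individual-value query would."""
        return self._config.get(section, {}).get(key, default)

    def get_all(self, section):
        """Return a copy of a whole section, as an overall-config query would."""
        return dict(self._config.get(section, {}))


store = IglooConfigStore(
    {
        "igloomodule": {"igloo_task_size": "0x1000"},  # made-up value
        "init": {"SHARED_DIR": "/igloo/shared"},  # made-up value
    }
)
print(store.get("init", "SHARED_DIR"), store.get_all("igloomodule"))
```

Because nothing here goes through the kernel cmdline, these values cannot be shadowed by kernel parameters and do not count against the cmdline size limit.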
- Environment variables
env should exclusively contain values explicitly understood to be passed to userland programs.
Some of these (like LD_PRELOAD) can be set by penguin to pass through values. We should make sure these do not conflict with values set by userland.
A simple implementation here would make these part of the igloo configuration that the init process looks up. These would be set before calling init.
A more complex but potentially advantageous approach would be to modify the kernel to append these required variables to the end of env as processes start. This would help us if a particular program ever decided to unset one of our values.
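Both ideas can be sketched in userland terms. `build_userland_env` models the simple approach, with one possible deconfliction policy (refuse keys that penguin also sets); `enforce_required` models what the kernel-side approach would do to an environment list at process start. Both function names, the policy, and the example path are assumptions, not an agreed design.

```python
def build_userland_env(user_env, penguin_env):
    """Combine env sources, refusing silent overrides of penguin-set keys."""
    conflicts = sorted(set(user_env) & set(penguin_env))
    if conflicts:
        raise ValueError(f"env keys reserved by penguin: {conflicts}")
    return {**user_env, **penguin_env}


def enforce_required(envp, required):
    """Return envp with required entries appended, dropping any earlier values for them."""
    kept = [entry for entry in envp if entry.split("=", 1)[0] not in required]
    return kept + [f"{k}={v}" for k, v in required.items()]


# "/igloo/lib.so" is a made-up path for illustration.
print(enforce_required(["PATH=/bin", "LD_PRELOAD=x"], {"LD_PRELOAD": "/igloo/lib.so"}))
```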
Source of the kernel cmdline excerpt: penguin/src/penguin/penguin_run.py, lines 476 to 488 in e8d1008
Source of the init script excerpt: penguin/src/resources/source.d/40_mount_shared_dir.sh, lines 1 to 2 in e8d1008