From 8f1505a08ec3fc80e9e6f3b7649c410e9c92e7d3 Mon Sep 17 00:00:00 2001
From: Luke Craig <lacraig3@gmail.com>
Date: Wed, 10 Jun 2026 20:50:58 -0400
Subject: [PATCH] docs: drop stale llm_knowledge_base + its dead OpenAI
 consumer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

docs/llm_knowledge_base/{playbook,plugins}.md have drifted out of sync with the
tool and actively mislead an LLM agent reading them:
- wrong output filenames: the docs refer to console.txt, env_missing.txt,
  pseudofiles_failures.txt, pseudofiles_modeled.txt, and base/fs.tar; the
  real files are console.log, env_missing.yaml, pseudofiles_failures.yaml,
  pseudofiles_modeled.yaml, and base/fs.tar.gz.
- references makeuboot.py, which does not exist in the tree.
- describes per-change "tools" (add_pseudofile, environment_variable) that don't
  exist — Penguin is driven by YAML config edits + `penguin run`, not a tool API.
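
The stale-to-current filename mapping above can be checked mechanically in any
other agent-facing text. This is a minimal sketch. `find_stale_refs` and
`STALE_TO_CURRENT` are hypothetical names and are not part of Penguin. The
mapping covers only the five renames listed in this message.

```python
import re

STALE_TO_CURRENT = {
    "console.txt": "console.log",
    "env_missing.txt": "env_missing.yaml",
    "pseudofiles_failures.txt": "pseudofiles_failures.yaml",
    "pseudofiles_modeled.txt": "pseudofiles_modeled.yaml",
    "base/fs.tar": "base/fs.tar.gz",
}


def find_stale_refs(text: str) -> list[tuple[str, str]]:
    """Return (stale, current) pairs for each stale filename mentioned in text."""
    hits = []
    for stale, current in STALE_TO_CURRENT.items():
        # Reject matches glued to a word or an extension, so "base/fs.tar"
        # does not match inside "base/fs.tar.gz", but a sentence-ending
        # period after a stale name still counts as a match.
        pattern = r"(?<!\w)" + re.escape(stale) + r"(?!\.?\w)"
        if re.search(pattern, text):
            hits.append((stale, current))
    return hits
```

For example, `find_stale_refs(open("some_doc.md").read())` lists every stale
name still present in that file, paired with its current replacement.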

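Its only consumer was src/penguin/llm.py (an OpenAI gpt-4o Assistants/vector-store
integration), reached via GraphSearch.select_best_config_llm(). That path is dead
and broken:
- the sole caller, in graphs.py, is commented out;
- select_best_config_llm() calls llm.select_best_config(), but llm.py defines
  select_best_config() only as an AssistantManager method, not at module level;
- upload_knowledge_files() reads self.KNOWLEDGE_DIR, but KNOWLEDGE_DIR is a
  module-level constant, so the lookup would raise AttributeError;
- llm.py uses deprecated openai.beta APIs, and `openai` isn't a declared
  dependency even though graphs.py imports llm unconditionally.

Removing llm.py together with the docs.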
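
The self.KNOWLEDGE_DIR failure can be reproduced without openai installed. The
snippet below is a stripped-down stand-in, not the original class.
`AssistantManagerSketch` and `knowledge_dir()` are hypothetical names that mirror
only the attribute lookup in upload_knowledge_files().

```python
KNOWLEDGE_DIR = "/docs/llm_knowledge_base"


class AssistantManagerSketch:
    """Stand-in for llm.AssistantManager with the OpenAI parts removed."""

    def knowledge_dir(self) -> str:
        # Same lookup as upload_knowledge_files(): the constant is read from
        # the instance, but it only exists at module scope.
        return self.KNOWLEDGE_DIR


try:
    AssistantManagerSketch().knowledge_dir()
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

Reading the bare name (or `llm.KNOWLEDGE_DIR`) instead of `self.KNOWLEDGE_DIR`
would have avoided this error.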

- rm docs/llm_knowledge_base/
- rm src/penguin/llm.py
- graphs.py: drop the `from . import llm` import, the dead select_best_config_llm
  method, and the commented-out caller. (UUID import retained; still used.)
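
To confirm nothing else still references the removed code, one option is a
quick scan of the tree. This is a sketch under assumptions. `find_leftovers`,
`LEFTOVER_MARKERS`, and the suffix filter are illustrative choices, not project
tooling. The check is a plain substring match, so it can over-report (for
example, a hypothetical `from . import llm_utils`).

```python
from pathlib import Path

LEFTOVER_MARKERS = (
    "from . import llm",
    "select_best_config_llm",
    "llm_knowledge_base",
)


def find_leftovers(root: str) -> list[tuple[str, str]]:
    """Return (path, marker) pairs for files under root that still name removed code."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at text formats that could plausibly reference the removed code.
        if not path.is_file() or path.suffix not in {".py", ".md", ".yaml", ".toml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for marker in LEFTOVER_MARKERS:
            if marker in text:
                hits.append((str(path), marker))
    return hits
```

Running `find_leftovers(".")` from the repository root after applying this
patch should return an empty list, apart from any false positives.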

The maintained agent-facing guidance now lives in the penguin-pilot context pack.
---
 docs/llm_knowledge_base/playbook.md | 324 ----------------------------
 docs/llm_knowledge_base/plugins.md  | 136 ------------
 src/penguin/graphs.py               |  49 -----
 src/penguin/llm.py                  | 138 ------------
 4 files changed, 647 deletions(-)
 delete mode 100644 docs/llm_knowledge_base/playbook.md
 delete mode 100644 docs/llm_knowledge_base/plugins.md
 delete mode 100644 src/penguin/llm.py

diff --git a/docs/llm_knowledge_base/playbook.md b/docs/llm_knowledge_base/playbook.md
deleted file mode 100644
index 4d95efe22..000000000
--- a/docs/llm_knowledge_base/playbook.md
+++ /dev/null
@@ -1,324 +0,0 @@
-# Penguin Playbook
-
-As you go through your rehosting loop, editing configs and seeing what happens when
-they're run, you're trying to mitigate observed failures and improve system health.
-There are three key choices you'll want to focus on at the start:
-
-* init program selection
-* pseudofile modeling
-* kernel environment variables
-
-## Init program selection
-In a Linux-based system, the __init program__ is the first program (script or binary)
-run by the kernel. This program is responsible for starting all other programs on
-the system. If the wrong program is selected, it might crash, error, or even run
-successfully and exit. An init program should never do any of these things: you want
-your init program to run until the system shuts down.
-
-**Penguin configs set the init program in the `env` section as the `igloo_init` field.**
-
-Your initial rehosting configuration is automatically populated with a 
-potentially-correct init binary. However this may be incorrect and you might
-want to change it.
-
-### When to change init
-If you firmware kernel panics when init exits with a code of 0, you probably have
-the wrong init binary. In this case towards the end of your `console.txt` you'll
-see a line like this:
-```
-[   63.581671] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 
-```
-
-If you instead see a similar line with a non-zero exit code, this could indicate
-that you have the wrong init selected or that something else is going wrong causing
-the correct init program to crash. You could either try another init or try
-tracking down other failures and seeing what changes.
-
-
-### Potential init programs to choose from
-Penguin's initial static analysis populates `<project_dir>/base` with a file
-`env.txt`. Within this file, there will typically be one or more
-statically-identified init programs, listed under `igloo_init`.
-Note that this list isn't comprehensive (it's just finding executables that contain
-`start` or `init`), but it will usually find the right binary.
-
-### How to change init:
-In your config file, change the `igloo_init` key under the `env` section:
-
-```yaml
-env:
-  igloo_init: /your/desired/init
-```
-
-## Pseudofile modeling
-Unlike regular files stored on disk, files in `/proc` (procfs) and `/dev` (devtmpfs)
-aren't really a part of your filesystem. Instead these files just a way for user
-space applications and the kernel to communicate. Through this interface applications
-can learn about the hardware state of a system and interact with attached peripherals.
-
-In the rehosting context, many changes will be visible here. Many hardware peripherals
-a system expects to run with will no longer be present or, if they are present,
-behave differently than expected. While many of these changes are acceptable, some
-will be fatal and must be handled in your rehosting config.
-
-### When to create pseudofile models
-If you see errors in your `console.txt` about missing devices or it seems like
-services are crashing or failing to start, pseudofiles are a good place to begin
-making changes.
-
-Example error: Failed to open /dev/example: No such file or directory
-Solution: here you could invoke your add_pseudofile tool to address this issue.
-
-### Potential pseudofiles to add
-After each run, the `pseudofiles` plugin will populate the `pseudofiles_failures.txt`
-file in the output directory. The first layer of keys show the filenames that programs
-tried and failed to access during execution. Within each of those values, you'll get
-the name of the interaction as well as a count of the number of times it was attempted.
-
-Note that you don't generally need to model every pseudofile you see here. These models
-are generally low quality shims just trying to get a program to stop crashing. But if 
-applications are behaving correctly when a pseudofile is missing, you may be better off
-by leaving that alone instead of creating a model and trying to make it correct.
-
-First, you'll see failed accesses to a pseudofile and add this file into your
-config. Then, you can run with the new config and see if and how applications try to
-interact with the newly added pseudofile.
-
-### Modeling pseudofiles
-Beyond allowing you to add pseudofiles into a system, penguin also allows
-you to specify how `read`s, `write`s, and `ioctl`s of these files should be modeled.
-
-After adding a pseudofile to a config and running it, you might see
-guest applications try to interact with this newly created psueodfile. The
-`pseudofiles` plugin will collect the details of these accesses in the `pseudofiles_modeled.txt`.
-
-In this file, you'll see keys of device paths with a list of interactions that
-were modeled on that device. You'll see details for `read`s, `write`s, and `ioctl`s.
-
-Each of these three behaviors can be modeled in various ways. By default, the `default`
-model is used. The [docs/schema_doc.md](docs/schema_doc.md) contains auto-generated 
-documentation for these fields which will always be the most up to date. However, the
-following descriptions may be of value.
-
-#### Read modeling
-
-**default**: Return an empty string with return value `-EINVAL`.
-
-**const_buf**: Given some string in `val` model the file as containing that data.
-
-**const_map**: Given a list of constants, model the file as a large buffer with just the specified values set.
-Arguments: `size` for total size, `pad` for what to place between values (byte value as int or a character).
-Provide a dictionary in `vals` with keys as an integer offset into the buffer and values as strings or lists of byte values.
-
-**const_map_file**: like const\_map, but adds a `filename` field. Will create this file on your host (container) if it doesn't exist with contents based on the const\_map details and then read from the host file.
-
-**from_file**: Given a host (container) file path in `filename` read from that file
-
-#### Write modeling
-
-**default**: Return value `-EINVAL`
-
-**discard**: Do nothing with the value and return as if the write was successful.
-
-**to_file**: Given a host (container) file path in `filename` write to that file
-
-#### IOCTL modeling
-IOCTLs have a command number and each command can be modeled distinctly. A wildcard `*` can be used as a command number to indicate that all other ioctls should be modeled in a given way.
-
-**default** Return `-ENOTTY`
-
-**return_const**: Return the specified `val`
-
-**return_symex**: Coming soon.
-
-### How to model pseudofiles:
-In your config file, you'll insert new keys udner `pseudofiles` for each file you want to model. By specifying a key (which must start with `/dev/` or `/proc/`), you'll
-change the system so that a pseudofile is present at the specified location. If this is
-all you wish to do, you'll specify the key as having a value of `{}`.
-Otherwise, if you'd like to model the behavior of the pseudofile, you'll add one or more subkeys of `read`, `write,` and `ioctl` and specify the model details.
-
-To just add `/dev/example` into the filesystem:
-
-```
-pseudofiles:
-  /dev/example: {}
-```
-
-To actually model what happens once this device is accessed we could expand this so
-
-* reads return the string hello world, 
-* writes appear to work but actually do nothing, and
-* IOCTLs all return 0
-
-
-```
-pseudofiles:
-  /dev/example:
-    read:
-        model: const_buf
-        val: "hello world"
-    write:
-        model: discard
-    ioctls:
-        '*':
-            model: return_const
-            val: 0
-```
-
-
-## Kernel environment variables
-Before the Linux kernel is launched, a bootloader typically sets up a system with some
-initial state. There are two key sets of environment variables we may care about,
-the first is much more common than the second.
-
-**Linux kernel boot arguments**: these are arguments passed to the Linux kernel at
-boot time. These control things such as where the root filesystem is and there
-serial console configuration. A system's bootloader may be configured to pass
-nonstandard arguments through these arguments.
-
-The init program will be given many of these values in its environment while regular applications may examine these arguemnts by reading from `/proc/cmdline`.
-
-**U-Boot Environment**: If the `U-Boot` bootloader is used, it may have its own set
-of environment variables that can be passed through to the Linux kernel through
-one of the `/dev/mtdX` devices (where X is a number).
-These are typically stored with a hash followed by
-null-terminated key=value pairs: `[crc32][key=value]\0[key=value]...`.
-
-A mapping of MTD device names to the corresponding `/dev/mtdX` is available
-at `/proc/mtd`. Applications may hardcode which MTD devices they try to
-read from or dynamically search `/proc/mtd` to find which device a
-given name corresponds to.
-
-Typically a custom binary is used to access U-Boot environments with
-support for getting and setting keys. These are often based off the
-open source [fw_env](https://github.com/ARM-software/u-boot/blob/master/tools/env/fw_env.c) program.
-
-### When to set boot arguments
-After running a configuration, examine `env_missing.txt` to find a list of
-environment variables that were searched for in `/proc/cmdline`.
-
-If you encounter the following error message in console.txt:
-"<var_name> not found", dross-check with 'env_missing.txt' to see if this is an 
-environment variable that is missing. If so, you can add it using you environment_variable tool.
-
-If you're unsure of what values you might want to set a key to,
-try the magic value `DYNVALDYNVALDYNVAL` in your config and run again.
-Then examine the generated `env_cmp.txt` output file
-which will report strings that this magic value was compared against.
-
-Alteratively, you might want to search through the filesystem to identify which
-binaries or scripts are parsing this environment variable and reverse engineer them
-to determine a good value. For identifying such binaries, extract the
-`<project_dir>/base/fs.tar` file and then `grep` through all files. For example,
-to find programs that reference `myvar` you could do:
-
-```
-# Extract filesystem
-mkdir /tmp/fs
-tar xvf base/fs.tar -C /tmp/fs
-
-# Find binaries and scripts that reference `myvar`
-grep --binary 'myvar' /tmp/fs'
-```
-
-Matching binaries can be analyzed with a tool like Ghidra while scripts can be
-analyzed with any text editor.
-
-### How to set boot arguments (environment variables):
-In your config file add new values into the `env` section as key-value pairs.
-
-```
-env:
-  my_env_name: my_value
-```
-
-### When to set U-Boot Environment variables
-
-**WARNING: this interface is subject to change and the documentation may be outdated**
-
-If you see output in `env_mtd.txt`, penguin detected an application searching
-for an MTD device with a specified name. When you see this, you may wish to add
-a new MTD device by adding an `mtdparts` env variable specifying values for the `0.flash` device. After this device name, you'll craft a comma-seperated list of `0xsize(name)` values. Your sizes should be multiples of 0x4000.
-
-```
-env:
-    mtdparts: 0.flash:0x4000(yourname),0x8000(anothername)
-```
-
-After adding such an entry, the guest will be configured to have new MTD devices named `yourname` and `anothername` and new `/dev/mtdX` files will also be created.
-
-Generally the values of `X` should correspond to the order in which you've specified these devices (e.g., `yourname` is `/dev/mtd0`, `anothername` is `/dev/mtd1`).
-You can confirm this by connecting to the root shell and examining `/proc/mtd`
-which will list the mapping from name to device file.
-
-If you'd like to then control the contents of that mtd device, use the `pseudofiles`
-plugin. If you'd like to create a valid u-boot environment with arbitrary key-value pairs, check out the `makeuboot.py` script.
-
-Alternatively, if you add the variable `MTD_PLACEHOLDER` and set it to 1 in your config's `env`, penguin will automatically set up `/dev/mtdX` for all X in 1 to 10 with a placeholder value. When running in this mode, penguin will analyze accesses to these
-devices and track variable names searched for. These names will be logged in the output
-file `env_uboot.txt`
-
-After setting these values, you'll need to customize `makeuboot.py` to generate a
-valid uboot key-value store then customize your `pseudofile` config to pass this
-file through on reads of the relevant device. Bringing this together, you might
-create the file `/results/mtd.flash` (abusing the shared `results` directory to share
-something that isn't a result) with `makeuboot.py` and then pass it through
-to your firmware with a config with elements like this:
-
-```yaml
-env:
-    mtdparts: 0.flash:0x4000(flash)
-
-pseudofiles:
-    /dev/mtd0:
-        read:
-            model: from_file
-            filename: /results/mtd.flash
-        write:
-            model: to_file
-            filename: /results/mtd.flash
-        ioctl:
-            '*':
-                model: return_const
-                val: 0
-```
-
-## Advanced debugging
-If you've tried selecting the correct init program, modeling psueodfiles, and
-adding environment variables but things are still failing, you'll need to
-try some more involved debugging.
-
-First examine the console output for error messages that relate to processes
-being killed, missing files, bad arguments, and so on. Examine scripts and binaries
-as necessary.
-
-Next examine output from other plugins. The `shell_cov_trace.csv` script may
-be particularly useful as it shows each line from shell scripts executed in the
-order they were run with concrete values listed along with each variable.
-For example if a script `foo.sh` has a 10th line of  `if [ -e $myfile ]; then` and the `myfile` variable was set to `/root/myfile`, the log would show this as:
-
-```
-foo.sh:10,if [ -e $(myfile=>/root/myfile)]
-```
-
-Next enable the root shell by changing your config's `base` section's `root_shell`
-value to be `true`. Then run your target and connect with `telnet as described
-in the Penguin output to get a root shell.
-
-After launching this, press enter a few times and perhaps wait ~10s.
-You should then get a root prompt and be able to run shell commands.
-
-From this shell you can try running `strace` on various guest
-applications to see how they behave dynamically.
-You can examine running processes with `ps aux` and then connect
-strace to a running process with `strace -p [PID]` 
-
-If your guest is kernel panicking and shutting down or you'd just like
-more explicit control of what's being run, you can change your config
-to skip running the right init program and instead just launch a shell
-that doesn't exit by setting:
-
-```yaml
-env:
-    igloo_init: /igloo/utils/sh
-```
\ No newline at end of file
diff --git a/docs/llm_knowledge_base/plugins.md b/docs/llm_knowledge_base/plugins.md
deleted file mode 100644
index 72faab1b7..000000000
--- a/docs/llm_knowledge_base/plugins.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Penguin Plugins
-The following penguin plugins are currently supported. Each is documented below.
-* [Coverage](#coverage): Track block-level coverage of binaries
-* [Env](#env): Track usage of boot arguments, environment variable accesses, and environment variable comparisons.
-* [Health](#health): Track system health metrics including processes run.
-* [Interfaces](#interfaces): Track network interfaces referenced
-* [Lifeguard](#lifeguard): Track and block signals
-* [Mounts](#mounts): Track attempts to mount file systems
-* [NVRAM2](#nvram2): Tracks accesses to NVRAM
-* [Netbinds](#netbinds): Track network listening guest processes
-* [Nmap](#nmap): Network scanning for guest applications that bind to TCP ports
-* [Pseudofiles](#pseudofiles): Model and monitor interactions to devices in `/dev` `/proc` and `/sys`
-* [Shell](#shell): Track behavior of shell scripts including lines executed
-* [VPNguin](#vpnguin): Bridge network connections to networked guest processes
-* [Zap](#zap): **Currently disabled** Network scanning of guest web applications
-
-## Coverage
-This plugin tracks the module and offset block level coverage of all binaries
-in the system. These results are reported in `coverage.csv`.
-The file `coverage_tree.csv` stores this information with parent/child
-relationships to visualize as a tree. The file `coverage_transitions.csv`
-records all context switches between processes.
-
-## Env
-The `env` plugin dynamically tracks linux environment variables accessed through
-`/proc/cmdline` and calls to `getenv`. It also tracks accesses to `/proc/mtd`
-as well as `/dev/mtdX` to identify accesses to u-boot environment variables.
-
-If an env value is set to the magic string `DYNVALDYNVALDYNVAL` a dynamic analysis
-to detect comparisons between this magic string and any other string will be enabled.
-The results of this analysis will be stored in `env_cmp_py.txt`. On the next run, set
-the environment variable to the first of these concrete values.
-
-In a config file, a user may add key-value pairs into the `env` filed to set new
-values into the linux environment. Note that a number of required internal variables
-(e.g., `root=/dev/vda`) will added to the system's arguments _after_ any arguments you specify here.
-
-## Health
-The `health` plugin tracks the system health over time. `health.csv` tracks counts
-of various behaviors of interest over time while `health_final.yaml` just reports
-these values at the end of execution.
-
-The plugin also creates `health_procs.txt` as a sorted list of processes run and
-`health_procs_with_args.txt` as a sorted list of processes with their arguments.
-
-## Interfaces
-Track network interfaces referenced in executed commands. Results are
-reported in `iface.log`.
-
-## Lifeguard
-Track and block signals sent between processes. Results are stored in
-`lifeguard.csv`.
-
-Lifeguard suppresses configured signals sent through supported signal-send
-syscalls by skipping the syscall before the kernel sends the signal. Supported
-syscalls are defined in one table in the plugin and resolved through the syscall
-prototype argument helpers supplied by the driver, falling back to fixed indexes
-only if the prototype names are unavailable. It also consumes `signal_monitor`
-for configured signals other than `SIGKILL` and `SIGSTOP`, allowing delivery
-paths to be observed and dropped when the driver hook is effective.
-
-Delivery-time drops are not equivalent to preventing a signal from being sent.
-The driver hooks the kernel dequeue path, so it can drop catchable signal
-deliveries after the kernel has selected a target. Default-fatal signals that do
-not have a userspace handler may start process or thread-group exit before that
-dequeue path is reached. For process-preserving behavior, the preferred path is
-still to block the sending syscall before the kernel applies signal semantics.
-`SIGKILL` and `SIGSTOP` cannot be caught or ignored by Linux processes, so
-Lifeguard treats them as syscall-only and only suppresses instances generated by
-supported signal-send syscalls.
-
-The syscall path covers the supported signal-send syscalls currently listed by
-Lifeguard: `kill`, `tkill`, `tgkill`, `rt_sigqueueinfo`,
-`rt_tgsigqueueinfo`, and `pidfd_send_signal`. Signals sent by other kernel paths
-or unsupported syscalls may be visible through delivery monitoring when the
-driver hooks them, but the syscall fallback will not prevent them before send
-time.
-
-Some signals also require guest state repair when dropped. For example, blindly
-dropping synchronous fault signals such as `SIGILL`, `SIGSEGV`, `SIGBUS`,
-`SIGFPE`, `SIGTRAP`, or `SIGSYS` can immediately re-enter the same faulting
-instruction. Use a purpose-built `signal_monitor` consumer, such as
-`sigill_bypass.py`, when the handler needs to advance the PC, emulate an
-instruction, or otherwise repair guest state.
-
-## Mounts
-Track which file systems are mounted (or attempted to be mounted) at which paths.
-Results stored in `mounts.csv`. Note this plugin will track some penguin-internal
-initialization logic with mounts in the `/igloo` directory.
-
-## NVRAM2
-This plugin tracks accesses to keys and values stored in NVRAM. Results
-are stored in `nvram.csv`
-
-## Netbinds
-This plugin detects and logs network binds by guest processes. The results
-are logged into `netbinds.csv` and include a `time` column indicating how
-many seconds after boot until the bind occurred.
-
-## Nmap
-This plugin runs nmap scans on all network-listening services.
-It depends on the VPN plugin to establish network connections to guest services.
-Logs are written to `nmap_{protocol}_{port}.log`
-
-## Pseudofiles
-This plugin tracks accesses and interactions with files in `/dev/` and `/proc/`.
-In `pseudofiles_failures.yaml` details of failed interactions are reported.
-
-Users can add pseudofiles and configure models for reads, writes, and IOCTLs on
-these files by adding entries into the `pseudofiles` config section.
-
-## Shell
-This plugin tracks the behavior of shell scripts, capturing coverage in `shell_cov.csv`, environment variable values in `shell_env.csv` and a combined trace in `shell_cov_trace.csv`.
-
-## VPNguin
-This plugin detects network binds and configures a custom VPN to bridge
-network connections to guest services. The mappings between guest
-network services and what port the VPN exposes them on are listed
-in `vpn_bridges.csv` For example, if the file contains:
-
-```
-procname,ipvn,domain,guest_ip,guest_port,host_port
-lighttpd,ipv4,tcp,127.0.0.1,80,80
-lighttpd,ipv4,tcp,192.168.0.1,80,48823
-```
-
-This means `lighttpd` started listening on port 80 on the loopback interface as well as another IP address.
-To talk to the service as if you were connecting via loopback, you'd connect to the relevant `host_port`, here 80.
-To talk to the service as if you were connecting via the other IP address, you'd connect to the other `host_port`, here 48823.
-Note these are ports within your container, not on your host, so you must connect to the appropriate IP address to reach
-the container.
-
-## ZAP
-**Currently disabled**
-This plugin runs the [zap web application scanner](https://github.com/zaproxy/) to crawl and interact with guest
-web applications listening on TCP port 80. Logs are written to `zap.log` and `zap_tcp_80.log`.
diff --git a/src/penguin/graphs.py b/src/penguin/graphs.py
index 99709b116..ec56da247 100644
--- a/src/penguin/graphs.py
+++ b/src/penguin/graphs.py
@@ -7,8 +7,6 @@
 import networkx as nx
 from pyvis.network import Network
 
-from . import llm
-
 
 def get_global_mitigation_weight(mitigation_type: str) -> float:
     """
@@ -1369,13 +1367,6 @@ def run_exploration_cycle(
 
         with self.lock:
             config_to_run, weight = self.select_best_config()
-            # XXX: work in progress to improve this selection
-            # uncomment below for llm-based selection
-            # config2, weight2 = self.select_best_config_llm()
-            # if config_to_run != config2:
-            #     print(f'=== DIFFERENCE ===')
-            #     print(f'config1: {config_to_run}, score: {weight}')
-            #     print(f'config2: {config2}, score: {weight2}')
 
             if config_to_run:
                 self.pending_runs.add(config_to_run)
@@ -1398,46 +1389,6 @@ def run_exploration_cycle(
             self.pending_runs.remove(config_to_run)
         return config_to_run
 
-    def select_best_config_llm(self) -> Tuple[Optional[Configuration], float]:
-        """
-        TODO
-        """
-        print('===== LLM Finding Best Config to Run =====')
-        target_configs = []
-        unexplored_configs = self.graph.find_unexplored_configurations()
-
-        # TODO: for each config, create llm assistant, upload files, and ask to summarize failures
-        # store this inside the configuration object (so we dont reduntantly do this)
-        for config in unexplored_configs:
-            if config not in self.pending_runs:
-                target_configs.append((config, self.graph.calculate_expected_config_health(config)))
-
-        # if not len(target_configs):
-        #     # Nothing to do. Other threads are working or we're all out of work
-        #     return None, 0
-
-        # # Now we have a list of (health, config) tuples. Sort by health
-        # results = sorted(target_configs, key=lambda x: x[0], reverse=True)
-
-        # TODO: for each unexplored config, get its summary and provide to the final LLM: select_best_config
-
-        graph = self.stringify_state()
-        if "unexplored" not in graph:
-            return (None, 0)
-
-        uid = llm.select_best_config(graph, llm.PROMPTS["config_graph"])
-        if uid == 'None':
-            return (None, 0)
-
-        uid_obj = UUID(uid)
-        try:
-            best_config = self.graph.get_node(uid_obj)
-        except ValueError as e:
-            print(e)
-            return (None, 0)
-
-        return (best_config, self.graph.calculate_expected_config_health(best_config))
-
     def select_best_config(self) -> Tuple[Optional[Configuration], float]:
         """
         First try finding an un-run+non-pending config that's derived from a mitigation
diff --git a/src/penguin/llm.py b/src/penguin/llm.py
deleted file mode 100644
index 5a626f90e..000000000
--- a/src/penguin/llm.py
+++ /dev/null
@@ -1,138 +0,0 @@
-import os
-import openai
-from typing import List, Optional
-
-GPT_MODEL = "gpt-4o"
-KNOWLEDGE_DIR = '/docs/llm_knowledge_base'
-openai.api_key = os.getenv("OPENAI_API_KEY")
-
-PROMPTS = {
-    "config_graph": """Here is a configuration graph. Choose the best, unexplored config to run next. Simply return the UID string and nothing else. If no UID is present in the graph, return 'None'"""
-}
-
-
-class AssistantManager:
-    """
-    A class to manage OpenAI assistants, threads, and vector stores.
-    """
-
-    def __init__(self):
-        """
-        Initialize the AssistantManager with empty client, assistant, thread, and vector_store.
-        """
-        self.client: Optional[openai.OpenAI] = None
-        self.assistant: Optional[openai.types.Assistant] = None
-        self.thread: Optional[openai.types.Thread] = None
-        self.vector_store: Optional[openai.types.VectorStore] = None
-
-    def exists_client(self) -> bool:
-        """
-        Check if the OpenAI client exists.
-
-        Returns:
-            bool: True if the client exists, False otherwise.
-        """
-        return self.client is not None
-
-    def exists_assistant(self) -> bool:
-        """
-        Check if the assistant exists.
-
-        Returns:
-            bool: True if the assistant exists, False otherwise.
-        """
-        return self.assistant is not None
-
-    def create_assistant(self, name: str, instructions: str, tools: Optional[List[dict]] = None, model: str = GPT_MODEL):
-        """
-        Create an assistant if it doesn't already exist.
-
-        Args:
-            name (str): The name of the assistant.
-            instructions (str): Instructions for the assistant.
-            tools (Optional[List[dict]]): List of tools for the assistant.
-            model (str): The model to use for the assistant.
-        """
-        if self.exists_assistant():
-            return
-
-        self.assistant = self.client.beta.assistants.create(
-            name=name,
-            instructions=instructions,
-            tools=tools,
-            model=model,
-        )
-
-    def create_run(self, prompt: str) -> str:
-        """
-        Create a new thread, add a message, and start a run.
-
-        Args:
-            prompt (str): The prompt to send to the assistant.
-
-        Returns:
-            str: The ID of the created run.
-        """
-        self.thread = self.client.beta.threads.create()
-        self.client.beta.threads.messages.create(thread_id=self.thread.id, role="user", content=prompt)
-        run = self.client.beta.threads.runs.create_and_poll(
-            thread_id=self.thread.id,
-            assistant_id=self.assistant.id,
-        )
-        len_msgs = len(list(self.client.beta.threads.messages.list(self.thread.id)))
-        msg = self.client.beta.threads.messages.list(self.thread.id, run_id=run.id).data[0].content[0].text.value
-        print(f'===== MESSAGES [{len_msgs}] =====\n{msg}\n')
-        return run.id
-
-    def upload_knowledge_files(self):
-        """
-        Upload knowledge files to the vector store and update the assistant.
-        """
-        print('===== Knowledge Files =====')
-        file_paths = [os.path.join(self.KNOWLEDGE_DIR, fn) for fn in os.listdir(self.KNOWLEDGE_DIR)]
-        for fp in file_paths:
-            print(f'[FILE] {fp}')
-
-        file_streams = [open(path, 'rb') for path in file_paths]
-        self.vector_store = self.client.beta.vector_stores.create(name="Penguin Tool Documentation")
-        file_batch = self.client.beta.vector_stores.file_batches.upload_and_poll(
-            vector_store_id=self.vector_store.id,
-            files=file_streams
-        )
-        print(f'\tUPLOAD STATUS: {file_batch.status} => {file_batch.file_counts}\n')
-        # XXX API bug, status shows that upload failed, but openai playground displays files correctly
-
-        self.assistant = self.client.beta.assistants.update(
-            assistant_id=self.assistant.id,
-            tool_resources={"file_search": {"vector_store_ids": [self.vector_store.id]}},
-        )
-
-    def select_best_config(self, graph: str, prompt: str) -> str:
-        """
-        Select the best configuration based on the given graph and prompt.
-
-        Args:
-            graph (str): The graph to use for configuration selection.
-            prompt (str): The prompt to send to the assistant.
-
-        Returns:
-            str: The selected configuration as a string.
-        """
-        if not self.exists_client():
-            self.client = openai.OpenAI()
-
-        if not self.exists_assistant():
-            llm_name = "llm_rehoster"
-            instructions = ""
-            self.create_assistant(
-                name=llm_name,
-                instructions=instructions,
-                tools=[{"type": "file_search"}]
-            )
-            self.upload_knowledge_files()
-
-        full_prompt = f'{prompt}\n\n{graph}'
-        print(f'===== PROMPT =====\n{full_prompt}\n')
-        run_id = self.create_run(full_prompt)
-        uid_str = self.client.beta.threads.messages.list(self.thread.id, run_id=run_id).data[0].content[0].text.value
-        return uid_str