Merged
197 changes: 3 additions & 194 deletions README.md
@@ -1,198 +1,7 @@
# aes-stream-drivers

**[Documentation](https://slaclab.github.io/aes-stream-drivers)** | [DOE Code](https://www.osti.gov/doecode/biblio/8043)
Common repository for streaming kernel drivers (datadev, gpuDirect, Yocto, etc).

Common repository for streaming kernel drivers (datadev, gpuDirect, Yocto, etc)
See the **[full documentation](https://slaclab.github.io/aes-stream-drivers)** for build instructions, GPUDirect setup, Yocto integration, CI testing, and reference material.

<!--- ########################################################################################### -->

#### common/

Contains shared kernel and application libraries

#### data\_dev/

Contains driver and application code for TID-AIR generic DAQ PCIe cards, optionally with GPUDirect RDMA support (for use with NVIDIA GPUs)

/etc/modprobe.d/datadev.conf

options datadev cfgTxCount=1024 cfgRxCount=1024 cfgSize=131072 cfgMode=1 cfgCont=1

#### include/

Contains top level application include files for all drivers

#### rce\_hp\_buffers/

Contains driver that allocates memory blocks for use in a pure firmware dma engine

#### rce\_stream/

Contains driver and application code for the RCE AXI stream DMA

#### Yocto/

Contains BitBake recipes for the aximemorymap and axistreamdma drivers.

<!--- ########################################################################################### -->

## Local CI Testing

The repository includes a local CI runner that validates the full Phase 3 test suite without requiring `sudo` on the host. It boots a QEMU virtual machine under TCG emulation (no KVM, no `/dev/kvm` access needed), loads the `datadev_emulator` and `datadev` kernel modules inside the VM, runs the test suite, and reports pass/fail. The same test scripts run in GitHub Actions CI (`.github/workflows/ci_pipeline.yml`), so local and CI behavior are identical.

> **Host kernel must match the guest cloud-image kernel.** `run_local_ci.sh` builds the `.ko` modules on the host against `linux-headers-$(uname -r)` and then `insmod`s them inside the Ubuntu 24.04 cloud-image VM. Kernel-module `vermagic` must match, so this flow only works when the host and guest run the same kernel series (Ubuntu 24.04 cloud images ship a `6.8.x` generic kernel). If the host kernel differs — for example an Ubuntu 22.04 host on `5.15`, a Rocky/RHEL 9 host, or any system whose `uname -r` disagrees with the cloud image — use the KVM-based local runner under [`scripts/ci-local/`](scripts/ci-local/) instead. That runner installs guest-matching headers inside the VM and builds there, eliminating the host/guest kernel dependency (see [`scripts/LOCAL_CI_TESTING.md`](scripts/LOCAL_CI_TESTING.md)).

### Prerequisites

Ubuntu / Debian:

```bash
sudo apt-get install qemu-system-x86 qemu-utils cloud-image-utils \
build-essential linux-headers-$(uname -r)
```

RHEL / CentOS / Fedora:

```bash
sudo dnf install qemu-kvm qemu-img genisoimage make gcc gcc-c++ kernel-devel
```

Required tools:

- `qemu-system-x86_64` — QEMU full-system emulator (TCG mode, no KVM needed)
- `qemu-img` — overlay image creation
- `cloud-localds` **or** `genisoimage` **or** `mkisofs` — builds the cloud-init seed ISO
- `curl` or `wget` — one-time cloud image download
- `make`, `gcc`, `g++` — build kernel modules and test binaries
- Linux kernel headers matching your host kernel

### Usage

From the repository root:

```bash
./run_local_ci.sh
```

This will:

1. Check prerequisites (exits with guidance if anything is missing)
2. Build the `nvidia_p2p_stub` module (loaded first to satisfy the emulator's `emu_gpu_register_drain_cb` symbol dependency), the emulator kernel module, the `datadev` driver, and all test binaries (`dmaLoopTest`, `dmaRate`, `dmaIoctlTest`, `dmaFileOpsTest`, `dmaErrorTest`)
3. Download the Ubuntu 24.04 cloud image (first run only, ~600 MB, cached at `~/.cache/aes-stream-local-ci/`)
4. Boot a QEMU VM with the project directory shared via 9p virtfs at `/mnt/host` in the guest
5. Inside the VM: `insmod` the three modules in order (`nvidia_p2p_stub` → `datadev_emulator` → `datadev`), run `tests/run_tests.sh` then `tests/test_params.sh`, then unload in reverse order
6. Capture the VM exit code and print overall PASS / FAIL

Exit code `0` means all tests passed. Non-zero means at least one test failed — see the VM console output for details.

### Environment Variables

Override the defaults by exporting these before running the script:

| Variable | Default | Purpose |
|----------|---------|---------|
| `VM_MEM` | `2G` | VM memory size |
| `VM_CPUS` | `2` | VM vCPU count |
| `VM_TIMEOUT` | `600` | QEMU wall-clock timeout in seconds |
| `CLOUD_IMG_URL` | Ubuntu 24.04 cloud image URL | Change distro/version |
| `CACHE_DIR` | `~/.cache/aes-stream-local-ci` | Where the base cloud image is cached |

### Troubleshooting

- **"qemu-system-x86_64 not found"** — install QEMU (see Prerequisites above)
- **"cloud-localds not found"** — install `cloud-image-utils` (Debian/Ubuntu) or `genisoimage` (RHEL/Fedora)
- **VM boot timeout** — TCG emulation is slow, especially on first cloud-init run (~1–2 min). Raise the limit with `VM_TIMEOUT=1200 ./run_local_ci.sh`
- **Tests fail in VM but pass in GitHub Actions** — TCG-emulated throughput is much lower than native, so timing-sensitive tests may behave differently. The test suite is designed to tolerate this; report any persistent failures.
- **"VM did not record exit code"** — usually a boot failure or a timeout before cloud-init `runcmd` finished. Check the serial console output printed to your terminal.

<!--- ########################################################################################### -->

## Continuous Integration

A single unified GitHub Actions workflow runs on every `push` event:

[![CI Pipeline](https://github.com/slaclab/aes-stream-drivers/actions/workflows/ci_pipeline.yml/badge.svg)](https://github.com/slaclab/aes-stream-drivers/actions/workflows/ci_pipeline.yml)

| Workflow | Purpose | Runner Environment |
|----------|---------|--------------------|
| [`ci_pipeline.yml`](.github/workflows/ci_pipeline.yml) — **CI Pipeline** | Unified CI combining repo integration and emulator/runtime validation: documentation + lint/static checks, multi-distro kernel-module build + load + test matrix (`ubuntu:22.04`, `ubuntu:24.04`, `rockylinux:9`, `debian:experimental`, `fedora:rawhide`) for both the CPU and GPU stacks, end-to-end Phase 3 and Phase 4 test coverage against the `datadev_emulator` + `nvidia_p2p_stub` pair, `dmesg` scanning for oops/panic/BUG, DKMS tarball smoke + full-install validation, and release packaging on tagged releases | `ubuntu-24.04` hosted runner; every matrix cell executes in a containerized distro image with host kernel headers bind-mounted. The CPU/GPU load + test steps are gated by `CI_HOST_MATCH=1` so they only fire on cells whose kernel matches the host runner (passwordless `sudo` + `CAP_SYS_MODULE` for `insmod`); other cells are compile-only. |

### Single unified workflow

`ci_pipeline.yml` replaces the previously separate `aes_ci.yml` (compile matrix) and `emu_ci.yml` (emulator runtime tests). One workflow means one badge, one summary, one place to look when something is red, and no possibility of the two drifting relative to each other. Broad compile coverage across distros, static analysis / lint checks, and runtime tests that require loading kernel modules now all live in the same pipeline.

Repo maintainers can still require specific job names from the unified workflow as branch-protection gates — for example `test_and_document`, `cpu_test (ubuntu:24.04)`, and `gpu_test (ubuntu:24.04)`.

### Runtime tests share code with the local VM runner

The workflow invokes the same `tests/run_tests.sh` + `tests/test_params.sh` scripts that `./run_local_ci.sh` runs inside a QEMU VM (see **Local CI Testing** above). Behaviour in CI and the local VM is therefore identical — if a test passes locally via `./run_local_ci.sh`, it should pass in CI, and vice versa (with the caveat that TCG emulation used locally is much slower than the hosted runners, so timing-sensitive assertions are intentionally tolerant).

### How to interpret a CI failure

When `ci_pipeline.yml` reports red on a push:

1. Open the failing run and click the **Summary** tab. Each test-suite step renders a PASS/FAIL count table; failing tests are listed in a fenced block.
2. Scroll the workflow view for red **`::error::` annotations** — each failing test emits one with the test name and exit code (e.g. `run_tests.sh: error_paths failed (exit=1)`).
3. Download the `cpu-ci-diag-*` or `gpu-ci-diag-*` artifact from the run's **Artifacts** panel. It contains `dmesg.txt`, the saved test-suite logs (`/tmp/phase3_tests.log`, `/tmp/phase4_tests.log`, `/tmp/test_params.log`), any `dma_loop_output*.txt`, and the built `.ko` modules — enough to reproduce a post-mortem without re-running CI.

<!--- ########################################################################################### -->

# How to build the data\_dev driver

```bash
# Go to the base directory
$ cd aes-stream-drivers

# Build the drivers
$ make driver

# Build the applications
$ make app
```

## How to load the data\_dev driver

```bash
# Go to the base directory
$ cd aes-stream-drivers

# Load the driver for the current kernel
$ sudo insmod install/$(uname -r)/datadev.ko
```

<!--- ########################################################################################### -->

# How to use the Yocto recipes

The Yocto recipes can be trivially included in your Yocto project via symlink.

```bash
$ ln -s $aes_stream_drivers/Yocto/recipes-kernel $myproject/sources/meta-user/recipes-kernel
```

Make sure to set the following variables in your local.conf:
```bash
# Substitute these values with your own desired settings
DMA_TX_BUFF_COUNT = 128
DMA_RX_BUFF_COUNT = 128
DMA_BUFF_SIZE = 131072
```

For a practical example of how to integrate these recipes into a Yocto project, see [axi-soc-ultra-plus-core](https://github.com/slaclab/axi-soc-ultra-plus-core).

<!--- ########################################################################################### -->

# How to build the RCE drivers

```bash
# Go to the base directory
$ cd aes-stream-drivers

# Source the setup script (required for cross-compiling)
$ source /sdf/group/faders/tools/xilinx/2016.4/Vivado/2016.4/settings64.sh

# Build the drivers
$ make rce
```

<!--- ########################################################################################### -->
[DOE Code Record](https://www.osti.gov/doecode/biblio/8043)
2 changes: 1 addition & 1 deletion Yocto/README.md
@@ -57,7 +57,7 @@ echo "MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS += \"aximemorymap\"" >> $proj_dir/conf
echo "KERNEL_MODULE_AUTOLOAD += \"aximemorymap\"" >> $proj_dir/conf/layer.conf

# Step 5 - No action required for device tree
# Note: axi_memory_map does NOT require an entire for the device-tree
# Note: axi_memory_map does NOT require an entry for the device-tree

# Step 6 - Build the module
bitbake core-image-minimal
12 changes: 9 additions & 3 deletions Yocto/recipes-kernel/axistreamdma/files/Makefile
@@ -19,16 +19,22 @@ NAME := axi_stream_dma
# Directory containing the Makefile
HOME := $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))

# Trust the worktree even when owned by a different user — required when
# /etc/rc.local rebuilds drivers as root from a tree cloned by an
# unprivileged user. System/global git config is suppressed so a hostile
# core.fsmonitor / pager / credential helper there cannot ride along.
GIT := GIT_CONFIG_SYSTEM=/dev/null GIT_CONFIG_GLOBAL=/dev/null git -c safe.directory='*'

# Automatically determine the git version (override via GITV=... from caller, e.g. recipe)
ifndef GITV
GITT := $(shell cd $(HOME); git describe --tags 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) describe --tags 2>/dev/null)
ifeq ($(GITT),)
GITT := $(shell cd $(HOME); git rev-parse --short HEAD 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) rev-parse --short HEAD 2>/dev/null)
endif
ifeq ($(GITT),)
GITT := emulator
endif
GITD := $(shell cd $(HOME); git status --short -uno 2>/dev/null | wc -l)
GITD := $(shell cd $(HOME); $(GIT) status --short -uno 2>/dev/null | wc -l)
GITV := $(if $(filter $(GITD),0),$(GITT),$(GITT)-dirty)
endif

2 changes: 1 addition & 1 deletion common/app/dmaLoopTest.cpp
@@ -63,7 +63,7 @@ static char doc[] = "";

static struct argp_option options[] = {
{ "path", 'p', "PATH", 0, "Path of pgpcard device to use. Default=" DEFAULT_AXI_DEVICE ".", 0},
{ "dest", 'm', "LIST", 0, "Comman seperated list of destinations.", 0},
{ "dest", 'm', "LIST", 0, "Comma separated list of destinations.", 0},
{ "prbsdis", 'd', 0, 0, "Disable PRBS checking.", 0},
{ "size", 's', "SIZE", 0, "Size for transmitted frames.", 0},
{ "indexen", 'i', 0, 0, "Use index based receive buffers.", 0},
2 changes: 1 addition & 1 deletion common/app/dmaRead.cpp
@@ -61,7 +61,7 @@ static char doc[] = "";

static struct argp_option options[] = {
{ "path", 'p', "PATH", 0, "Path of pgpcard device to use. Default=" DEFAULT_AXI_DEVICE ".", 0},
{ "dest", 'm', "LIST", 0, "Comma seperated list of destinations.", 0},
{ "dest", 'm', "LIST", 0, "Comma separated list of destinations.", 0},
{ "prbsen", 'e', 0, 0, "Enable PRBS checking.", 0},
{ "indexen", 'i', 0, 0, "Use index based receive buffers.", 0},
{ "rawEn", 'r', "COUNT", 0, "Show raw data up to count.", 0},
2 changes: 1 addition & 1 deletion common/driver/axis_gen1.c
@@ -139,7 +139,7 @@ irqreturn_t AxisG1_Irq(int irq, void *dev_id) {
desc = NULL;
}

// Return entry to FPGA if destc is not open
// Return entry to FPGA if dest is not open
if ( desc == NULL ) {
if ( dev->debug > 0 ) {
dev_info(dev->device, "Irq: Port not open return to free list.\n");
12 changes: 9 additions & 3 deletions data_dev/driver/Makefile
@@ -53,16 +53,22 @@ ifneq ($(NVIDIA_DRIVERS),)
KBUILD_EXTRA_SYMBOLS := $(NVIDIA_DRIVERS)/Module.symvers
endif

# Trust the worktree even when owned by a different user — required when
# /etc/rc.local rebuilds drivers as root from a tree cloned by an
# unprivileged user. System/global git config is suppressed so a hostile
# core.fsmonitor / pager / credential helper there cannot ride along.
GIT := GIT_CONFIG_SYSTEM=/dev/null GIT_CONFIG_GLOBAL=/dev/null git -c safe.directory='*'

# Automatically determine the git version
ifndef GITV
GITT := $(shell cd $(HOME); git describe --tags 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) describe --tags 2>/dev/null)
ifeq ($(GITT),)
GITT := $(shell cd $(HOME); git rev-parse --short HEAD 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) rev-parse --short HEAD 2>/dev/null)
endif
ifeq ($(GITT),)
GITT := emulator
endif
GITD := $(shell cd $(HOME); git status --short -uno 2>/dev/null | wc -l)
GITD := $(shell cd $(HOME); $(GIT) status --short -uno 2>/dev/null | wc -l)
GITV := $(if $(filter $(GITD),0),$(GITT),$(GITT)-dirty)
endif

4 changes: 2 additions & 2 deletions data_dev/driver/README.md
@@ -67,9 +67,9 @@ $ sudo reboot

<!--- ######################################################## -->

## How to build and load the nvidia and datadev drviers
## How to build and load the nvidia and datadev drivers

After you completed all the "System Configuration" configuration steps above, run the following script to build and load the nvidia and datadev drviers
After completing all the "System Configuration" steps above, run the following script to build and load the nvidia and datadev drivers

```bash
$ sudo ./comp_and_load_drivers.sh
2 changes: 1 addition & 1 deletion docs/reference/dma-legacy.rst
@@ -37,7 +37,7 @@ This document describes the ``aes-stream-drivers`` user space API for DMA.
``include/AxisDriver.h``

- axisSetFlags - Set the flags for a DMA transfer.
- axisGetFuser - Get the file user bits associated with the DMA transer.
- axisGetFuser - Get the file user bits associated with the DMA transfer.
- axisGetLuser - Get the last user bits associated with the DMA transfer.
- axisGetCont - Get the continue bit; set when there is another DMA transfer.
- axisReadAck - Acknowledge that a DMA transfer has been completed by the application.
18 changes: 10 additions & 8 deletions include/GpuAsyncLib.h
@@ -32,7 +32,7 @@
#include "DmaDriver.h"
#include "GpuAsync.h"

#ifdef __NVCC__
#ifdef __CUDACC__
#define globalFunc __global__
#define hostFunc __host__
#define deviceFunc __device__
@@ -184,13 +184,15 @@ void gpuDestroyBufferState(GpuBufferState_t* b);
* Layout matches AxiStreamDmaV2Write.vhd.
*/
struct __attribute__((packed)) AxiWrDesc64_t {
uint32_t result : 2;
uint32_t overflow : 1; /**< Overflow bit. */
uint32_t cont : 1; /**< Continue bit. */
uint32_t reserved0 : 12;
uint32_t lastUser : 8;
uint32_t firstUser : 8;
uint32_t size;
uint32_t flags;
uint32_t size;

// Accessors for frame flags. Cannot use bit-fields because their layout is implementation-defined and may differ between compilers.
deviceFunc hostFunc inline uint32_t result() const { return flags & 0x3; }
deviceFunc hostFunc inline uint32_t overflow() const { return !!(flags & 0x4); }
deviceFunc hostFunc inline uint32_t cont() const { return !!(flags & 0x8); }
deviceFunc hostFunc inline uint32_t lastUser() const { return (flags >> 16) & 0xFF; }
deviceFunc hostFunc inline uint32_t firstUser() const { return (flags >> 24) & 0xFF; }
};

static_assert(sizeof(AxiWrDesc64_t) == 8, "AxiWrDesc64_t must be 64-bits (8-bytes)");
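
The mask-and-shift accessors in the hunk above can be checked against the bit positions implied by the removed bit-field declaration (`result` in bits 0-1, `overflow` in bit 2, `cont` in bit 3, `lastUser` in bits 16-23, `firstUser` in bits 24-31). The following is a minimal host-side sketch, not code from this PR: the struct is a local copy of the new layout made for illustration, the empty fallback definitions of `deviceFunc` / `hostFunc` outside CUDA compilation are an assumption, and `packFlags` is a hypothetical helper that exists only to build a test word.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: when not compiled as CUDA, the qualifier macros expand to nothing.
#ifdef __CUDACC__
#define deviceFunc __device__
#define hostFunc __host__
#else
#define deviceFunc
#define hostFunc
#endif

// Local copy of the new descriptor layout, for illustration only.
struct __attribute__((packed)) AxiWrDesc64_t {
    uint32_t flags;
    uint32_t size;

    // Each accessor extracts its field with an explicit mask and shift,
    // so the result does not depend on compiler bit-field allocation.
    deviceFunc hostFunc inline uint32_t result() const { return flags & 0x3; }
    deviceFunc hostFunc inline uint32_t overflow() const { return !!(flags & 0x4); }
    deviceFunc hostFunc inline uint32_t cont() const { return !!(flags & 0x8); }
    deviceFunc hostFunc inline uint32_t lastUser() const { return (flags >> 16) & 0xFF; }
    deviceFunc hostFunc inline uint32_t firstUser() const { return (flags >> 24) & 0xFF; }
};

static_assert(sizeof(AxiWrDesc64_t) == 8, "AxiWrDesc64_t must be 64-bits (8-bytes)");

// Hypothetical helper (not part of the driver headers): builds a flags word
// using the same bit positions that the accessors above read back.
static inline uint32_t packFlags(uint32_t result, bool overflow, bool cont,
                                 uint8_t lastUser, uint8_t firstUser) {
    return (result & 0x3u)
         | (overflow ? 0x4u : 0u)
         | (cont ? 0x8u : 0u)
         | (static_cast<uint32_t>(lastUser) << 16)
         | (static_cast<uint32_t>(firstUser) << 24);
}
```

Packing a word with `packFlags` and reading it back through the accessors should round-trip every field. Bits 4-15, which the old declaration marked as `reserved0`, are ignored by all five accessors.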
2 changes: 1 addition & 1 deletion include/GpuAsyncRegs.h
@@ -113,7 +113,7 @@ GPU_ASYNC_DEF_REG(WrLatencyV4, 0x58, 0, 0xFFFFFFFF);
GPU_ASYNC_DEF_REG(RemoteWriteMaxSizeV4, 0x60, 0, 0xFFFFFFFF);


// The following register defintiions are firmware specific. GpuAsyncCore can have up to 16 buffers, but defaults to 4.
// The following register definitions are firmware specific. GpuAsyncCore can have up to 16 buffers, but defaults to 4.
// You must check the MaxBuffers register for the true value

/*********************** Write Buffers ************************/
12 changes: 9 additions & 3 deletions rce_hp_buffers/driver/Makefile
@@ -21,10 +21,16 @@ OBJS := $(patsubst %.c,%.o,$(SRCS))
GCC := $(COMP)gcc
TEST := $(shell which $(GCC) 2> /dev/null)

# Trust the worktree even when owned by a different user — required when
# /etc/rc.local rebuilds drivers as root from a tree cloned by an
# unprivileged user. System/global git config is suppressed so a hostile
# core.fsmonitor / pager / credential helper there cannot ride along.
GIT := GIT_CONFIG_SYSTEM=/dev/null GIT_CONFIG_GLOBAL=/dev/null git -c safe.directory='*'

ifndef GITV
GITT := $(shell cd $(HOME); git describe --tags 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) describe --tags 2>/dev/null)
ifeq ($(GITT),)
GITT := $(shell cd $(HOME); git rev-parse --short HEAD 2>/dev/null)
GITT := $(shell cd $(HOME); $(GIT) rev-parse --short HEAD 2>/dev/null)
endif
# Final fallback so -DGITV=\"$(GITV)\" always produces a well-formed
# string literal when building from a non-git source tree (release
@@ -34,7 +40,7 @@ ifndef GITV
ifeq ($(GITT),)
GITT := unknown
endif
GITD := $(shell cd $(HOME); git status --short -uno 2>/dev/null | wc -l)
GITD := $(shell cd $(HOME); $(GIT) status --short -uno 2>/dev/null | wc -l)
GITV := $(if $(filter $(GITD),0),$(GITT),$(GITT)-dirty)
endif
