[Nexthop] Documentation for debugging utilities #929

Open
anna-nexthop wants to merge 1 commit into facebook:main from nexthop-ai:anna-nexthop.debug-documentation
@anna-nexthop
Contributor

Pre-submission checklist

  • I've run the linters locally and fixed the lint errors in the files I modified in this PR. To install the linters, run pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

This PR adds documentation for debugging FBOSS when the CLI is not available, typically because the hardware agent (fboss_hw_agent) is down or not responding.

The new documentation covers:

Service Status and Logs

  • Checking FBOSS service status with systemd
  • Viewing service logs with journalctl
  • Additional log file locations (boot history, snapshots)
  • Service dependency tree
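
As a rough illustration of the systemd and journalctl checks listed above, the sketch below prints a few triage commands instead of running them, so it is safe on any machine. The unit name fboss_hw_agent is an assumption taken from the process name in this summary and may differ per platform; print_service_triage is a hypothetical helper, not part of FBOSS.

```shell
#!/bin/sh
# Hypothetical helper: prints systemd triage commands for one unit.
# The default unit name is an assumption based on the process name
# fboss_hw_agent; confirm the real name with `systemctl list-units`.
print_service_triage() {
    unit="${1:-fboss_hw_agent}"
    # Current state, last exit status, and the most recent log lines
    echo "systemctl status ${unit} --no-pager"
    # Last 200 journal entries for the unit from the current boot
    echo "journalctl -u ${unit} -b -n 200 --no-pager"
    # Units this service depends on, shown as a tree
    echo "systemctl list-dependencies ${unit}"
}

print_service_triage "$@"
```

To run the commands rather than print them, pipe the output to sh.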

Crash Dumps and State Files

  • Crash dump locations and analysis
  • Core dump management with coredumpctl
  • Using gdb for debugging
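
The coredumpctl workflow mentioned above might look roughly like the following sketch, which only prints the commands. The executable name and the output path under /tmp are assumptions for illustration, and print_core_triage is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints coredumpctl commands for one executable.
# The executable name and output path are illustrative assumptions.
print_core_triage() {
    exe="${1:-fboss_hw_agent}"
    core_file="${2:-/tmp/${exe}.core}"
    # List recorded core dumps that match the executable name
    echo "coredumpctl list ${exe}"
    # Show detailed metadata (signal, timestamp) for matching dumps
    echo "coredumpctl info ${exe}"
    # Extract the most recent matching core to a file for offline analysis
    echo "coredumpctl -o ${core_file} dump ${exe}"
}

print_core_triage "$@"
```

The extracted file can then be opened in gdb together with the matching binary.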

Direct Hardware Access Utilities

  • wedge_qsfp_util - Transceiver debugging with --direct-i2c
  • weutil - EEPROM information
  • fw_util - Firmware versions
  • fixmyfboss - Automated diagnostic tool
  • showtech - System information collection with --details options
  • Other diagnostic utilities

Important Runtime Files and Directories

  • Configuration files
  • Platform information (including dmidecode for system info)
  • Warm boot state
  • SDK dumps

Debug Builds

  • Building with debug symbols
  • Using debug builds with gdb

GDB Debugging

  • Attaching to running processes
  • Debugging core dumps

Common Troubleshooting Scenarios

  • Hardware agent won't start
  • Service crashes immediately
  • No logs available

Test Plan

  • Verified all commands on an actual FBOSS device
  • Tested documentation rendering in Docusaurus dev server
  • Confirmed all internal links work correctly
  • Validated that generic examples don't expose real hardware identifiers

@meta-cla meta-cla bot added the CLA Signed label Feb 11, 2026