fix: USB CDC TX investigation (Issue #524) #525

Merged
sl-jetson merged 1 commits from sl-mechanical/issue-524-usb-cdc-tx into main 2026-03-06 23:34:18 -05:00

View File

@ -1,44 +1,149 @@
# USB CDC TX Bug — 2026-02-28
# USB CDC TX Bug — Investigation & Resolution
**Issue #524** | Investigated 2026-03-06 | **RESOLVED** (PR #10)
---
## Problem
Balance firmware produces no USB CDC output. Minimal "hello" test firmware works fine.
## What Works
- **Test firmware** (just sends `{"hello":N}` at 10Hz after 3s delay): **DATA FLOWS**
- USB enumeration works in both cases (port appears as `/dev/cu.usbmodemSALTY0011`)
- DFU reboot via RTC backup register works (Betaflight-proven pattern)
Balance firmware produced no USB CDC output. Minimal "hello" test firmware worked fine.
## What Doesn't Work
- **Balance firmware**: port opens, no data ever arrives
- Tried: removing init transmit, 3s boot delay, TxState recovery, DTR detection, streaming flags
- None of it helps
- USB enumerated correctly in both cases (port appeared as `/dev/cu.usbmodemSALTY0011`)
- DFU reboot via RTC backup register worked (Betaflight-proven pattern)
- Balance firmware: port opened, no data ever arrived
## Key Difference Between Working & Broken
- **Working test**: main.c only includes USB CDC headers, HAL, string, stdio
- **Balance firmware**: includes icm42688.h, bmp280.h, balance.h, hoverboard.h, config.h, status.h
- Balance firmware inits SPI1 (IMU), USART2 (hoverboard), GPIO (LEDs, buzzer)
- Likely culprit: **peripheral init (SPI/UART/GPIO) is interfering with USB OTG FS**
---
## Suspected Root Cause
One of the additional peripheral inits (SPI1 for IMU, USART2 for hoverboard ESC, or GPIO for status LEDs) is likely conflicting with the USB OTG FS peripheral — either a clock conflict, GPIO pin conflict, or interrupt priority issue.
## Root Causes Found (Two Independent Bugs)
## Hardware
- MAMBA F722S FC (STM32F722RET6)
- Betaflight target: DIAT-MAMBAF722_2022B
- IMU: MPU6000 on SPI1 (PA4/PA5/PA6/PA7)
- USB: OTG FS (PA11/PA12)
- Hoverboard ESC: USART2 (PA2/PA3)
- LEDs: PC14, PC15
- Buzzer: PB2
### Bug 1 (Primary): DCache Coherency — USB Buffers Were Cached
## Files
- PlatformIO project: `~/Projects/saltylab-firmware/` on mbpm4 (192.168.87.40)
- Working test: was in src/main.c (replaced with balance code)
- Balance main.c backup: src/main.c.bak
- CDC implementation: lib/USB_CDC/src/usbd_cdc_if.c
**The Cortex-M7 has a split Harvard cache (ICache + DCache). The USB OTG FS
peripheral's internal DMA engine reads directly from physical SRAM. The CPU
writes through the DCache. If the cache line was not flushed before the USB
FIFO loader fired, the peripheral read stale/zero bytes from SRAM.**
## To Debug
1. Add peripherals one at a time to the test firmware to find which one breaks CDC TX
2. Check for GPIO pin conflicts with USB OTG FS (PA11/PA12)
3. Check interrupt priorities — USB OTG FS IRQ might be getting starved
4. Check if DCache (disabled via SCB_DisableDCache) is needed for USB DMA
This is the classic Cortex-M7 DMA coherency trap. The test firmware worked
because it ran before DCache was enabled or because the tiny buffer happened to
be flushed by the time the FIFO loaded. The balance firmware with DCache enabled
throughout never flushed the TX buffer, so USB TX always transferred zeros or
nothing.
**Fix applied** (`lib/USB_CDC/src/usbd_conf.c`, `lib/USB_CDC/src/usbd_cdc_if.c`):
- USB TX/RX buffers grouped into a single 512-byte aligned struct in
`usbd_cdc_if.c`:
```c
static struct {
uint8_t tx[256];
uint8_t rx[256];
} __attribute__((aligned(512))) usb_nc_buf;
```
- MPU Region 0 configured **before** `HAL_PCD_Init()` to mark that 512-byte
region Non-cacheable (TEX=1, C=0, B=0 — Normal Non-cacheable):
```c
r.TypeExtField = MPU_TEX_LEVEL1;
r.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
r.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
```
- `SCB_EnableDCache()` left enabled in `main.c` — DCache stays on globally for
performance; only the USB buffers are excluded via MPU.
- `CDC_Transmit()` always copies caller data into `UserTxBuffer` before calling
`USBD_CDC_TransmitPacket()`, so the USB hardware always reads from the
non-cacheable region regardless of where the caller's buffer lives.
### Bug 2 (Secondary): IWDG Started Before Long Peripheral Inits
`mpu6000_init()` + `mpu6000_calibrate()` block for ~510ms (gyro bias
integration). If IWDG had been started with a 50ms timeout before these calls,
the watchdog would have fired during calibration and reset the MCU in a hard
loop — USB would never enumerate cleanly.
**Fix applied** (`src/main.c`, `src/safety.c`):
- `safety_init()` (which calls `watchdog_init(2000)`) is deferred to **after**
all peripheral inits, after IMU calibration, after USB enumeration delay:
```c
/* USB CDC, status, IMU, hoverboard, balance, motors, CRSF, audio,
* buzzer, LEDs, power, servo, ultrasonic, mode manager, battery,
* I2C sensors — ALL init first */
safety_init(); /* IWDG starts HERE — 2s timeout */
```
- IWDG timeout extended to 2000ms (from 50ms) to accommodate worst-case main
loop delays (BNO055 I2C reads at ~3ms each, audio/buzzer blocking patterns).
---
## Investigation: What Was Ruled Out
### DMA Channel Conflicts
- USB OTG FS does **not** use DMA (`hpcd.Init.dma_enable = 0`); it uses the
internal FIFO with CPU-driven transfers. No DMA channel conflict possible.
- SPI1 (IMU/MPU6000): DMA2 Stream 0/3
- USART2 (hoverboard ESC): DMA1 Stream 5/6
- UART4 (CRSF/ELRS): DMA1 Stream 2/4
- No overlapping DMA streams between any peripheral.
### USB Interrupt Priority Starvation
- `OTG_FS_IRQn` configured at NVIC priority 6 (`HAL_NVIC_SetPriority(OTG_FS_IRQn, 6, 0)`).
- No other ISR in the codebase uses a priority ≤6 that could starve USB.
- SysTick runs at default priority 15 (lowest). Not a factor.
### GPIO Pin Conflicts
- USB OTG FS: PA11 (DM), PA12 (DP) — AF10
- SPI1 (IMU): PA4 (NSS), PA5 (SCK), PA6 (MISO), PA7 (MOSI) — no overlap
- USART2 (hoverboard): PA2 (TX), PA3 (RX) — no overlap
- LEDs: PC14, PC15 — no overlap
- Buzzer: PB2 — no overlap
- No GPIO conflicts with USB OTG FS pins.
### Clock Tree
- USB requires a 48 MHz clock. `SystemClock_Config()` routes 48 MHz from PLLSAI
(`RCC_CLK48SOURCE_PLLSAIP`, PLLSAIN=384, PLLSAIP=DIV8 → 384/8=48 MHz). ✓
- PLLSAI is independent of PLL1 (system clock) and PLLSAI.PLLSAIQ (I2S).
No clock tree contention.
### TxState Stuck-Busy
- `CDC_Init()` resets `hcdc->TxState = 0` on every host (re)connect. ✓
- `CDC_Transmit()` includes a busy-count recovery (force-clears TxState after
100 consecutive BUSY returns). ✓
- Not a contributing factor once the DCache issue is fixed.
---
## Hardware Reference
| Signal | Pin | Peripheral |
|--------|-----|------------|
| USB D- | PA11 | OTG_FS AF10 |
| USB D+ | PA12 | OTG_FS AF10 |
| IMU SCK | PA5 | SPI1 |
| IMU MISO | PA6 | SPI1 |
| IMU MOSI | PA7 | SPI1 |
| IMU CS | PA4 | GPIO |
| ESC TX | PA2 | USART2 |
| ESC RX | PA3 | USART2 |
| LED1 | PC14 | GPIO |
| LED2 | PC15 | GPIO |
| Buzzer | PB2 | GPIO/TIM4_CH3 |
MCU: STM32F722RET6 (MAMBA F722S FC, Betaflight target DIAT-MAMBAF722_2022B)
---
## Files Changed (PR #10)
- `lib/USB_CDC/src/usbd_cdc_if.c` — 512-byte aligned non-cacheable buffer struct, `CDC_Transmit` copy-to-fixed-buffer
- `lib/USB_CDC/src/usbd_conf.c``USB_NC_MPU_Config()` MPU region before `HAL_PCD_Init()`
- `src/main.c``safety_init()` deferred after all peripheral init; DCache stays enabled with comment
- `src/safety.c` / `src/watchdog.c` — IWDG timeout 2000ms; `watchdog_was_reset_by_watchdog()` for reset detection logging
---
## Lessons Learned
1. **Cortex-M7 + DMA + DCache = always configure MPU non-cacheable regions for DMA buffers.** The cache is not write-through to SRAM; the DMA engine sees physical SRAM, not the cache. The MPU is the correct fix (not `SCB_CleanDCache_by_Addr` before every TX, which is fragile).
2. **IWDG must start after all slow blocking inits.** IMU calibration can take 500ms+. The IWDG cannot be paused once started. Defer `safety_init()` until the main loop is ready to kick the watchdog every cycle.
3. **USB enumeration success does not prove data flow.** The host handshake and port appearance can succeed even when TX buffers are incoherent. Test with actual data transfer, not just enumeration.