saltylab-firmware/USB_CDC_BUG.md
sl-mechanical dd4eabf218 fix: Investigate USB CDC TX failure (Issue #524)
Root causes confirmed from code audit:
1. DCache coherency: USB OTG FS reads physical SRAM while CPU writes through
   DCache. Fix: MPU Region 0 marks 512B aligned USB buffer struct non-cacheable
   (TEX=1, C=0, B=0) before HAL_PCD_Init(). DCache stays enabled globally.
2. IWDG ordering: safety_init() (IWDG start) deferred after all peripheral inits
   to avoid watchdog reset during mpu6000_calibrate() (~510ms blocking).

DMA conflicts, GPIO conflicts, clock tree, and interrupt priorities all ruled out
with evidence. Full findings documented in USB_CDC_BUG.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 23:01:00 -05:00

6.1 KiB

USB CDC TX Bug — Investigation & Resolution

Issue #524 | Investigated 2026-03-06 | RESOLVED (PR #10)


Problem

Balance firmware produced no USB CDC output. Minimal "hello" test firmware worked fine.

  • USB enumerated correctly in both cases (port appeared as /dev/cu.usbmodemSALTY0011)
  • DFU reboot via RTC backup register worked (Betaflight-proven pattern)
  • Balance firmware: port opened, no data ever arrived

Root Causes Found (Two Independent Bugs)

Bug 1 (Primary): DCache Coherency — USB Buffers Were Cached

The Cortex-M7 has a split Harvard cache (ICache + DCache). The USB OTG FS peripheral's internal DMA engine reads directly from physical SRAM. The CPU writes through the DCache. If the cache line was not flushed before the USB FIFO loader fired, the peripheral read stale/zero bytes from SRAM.

This is the classic Cortex-M7 DMA coherency trap. The test firmware worked because it ran before DCache was enabled or because the tiny buffer happened to be flushed by the time the FIFO loaded. The balance firmware with DCache enabled throughout never flushed the TX buffer, so USB TX always transferred zeros or nothing.

Fix applied (lib/USB_CDC/src/usbd_conf.c, lib/USB_CDC/src/usbd_cdc_if.c):

  • USB TX/RX buffers grouped into a single 512-byte aligned struct in usbd_cdc_if.c:
    static struct {
        uint8_t tx[256];
        uint8_t rx[256];
    } __attribute__((aligned(512))) usb_nc_buf;
    
  • MPU Region 0 configured before HAL_PCD_Init() to mark that 512-byte region Non-cacheable (TEX=1, C=0, B=0 — Normal Non-cacheable):
    r.TypeExtField  = MPU_TEX_LEVEL1;
    r.IsCacheable   = MPU_ACCESS_NOT_CACHEABLE;
    r.IsBufferable  = MPU_ACCESS_NOT_BUFFERABLE;
    
  • SCB_EnableDCache() left enabled in main.c — DCache stays on globally for performance; only the USB buffers are excluded via MPU.
  • CDC_Transmit() always copies caller data into UserTxBuffer before calling USBD_CDC_TransmitPacket(), so the USB hardware always reads from the non-cacheable region regardless of where the caller's buffer lives.

Bug 2 (Secondary): IWDG Started Before Long Peripheral Inits

mpu6000_init() + mpu6000_calibrate() block for ~510ms (gyro bias integration). If IWDG had been started with a 50ms timeout before these calls, the watchdog would have fired during calibration and reset the MCU in a hard loop — USB would never enumerate cleanly.

Fix applied (src/main.c, src/safety.c):

  • safety_init() (which calls watchdog_init(2000)) is deferred to after all peripheral inits, after IMU calibration, after USB enumeration delay:
    /* USB CDC, status, IMU, hoverboard, balance, motors, CRSF, audio,
     * buzzer, LEDs, power, servo, ultrasonic, mode manager, battery,
     * I2C sensors — ALL init first */
    safety_init();  /* IWDG starts HERE — 2s timeout */
    
  • IWDG timeout extended to 2000ms (from 50ms) to accommodate worst-case main loop delays (BNO055 I2C reads at ~3ms each, audio/buzzer blocking patterns).

Investigation: What Was Ruled Out

DMA Channel Conflicts

  • USB OTG FS does not use DMA (hpcd.Init.dma_enable = 0); it uses the internal FIFO with CPU-driven transfers. No DMA channel conflict possible.
  • SPI1 (IMU/MPU6000): DMA2 Stream 0/3
  • USART2 (hoverboard ESC): DMA1 Stream 5/6
  • UART4 (CRSF/ELRS): DMA1 Stream 2/4
  • No overlapping DMA streams between any peripheral.

USB Interrupt Priority Starvation

  • OTG_FS_IRQn configured at NVIC priority 6 (HAL_NVIC_SetPriority(OTG_FS_IRQn, 6, 0)).
  • No other ISR in the codebase uses a priority ≤6 that could starve USB.
  • SysTick runs at default priority 15 (lowest). Not a factor.

GPIO Pin Conflicts

  • USB OTG FS: PA11 (DM), PA12 (DP) — AF10
  • SPI1 (IMU): PA4 (NSS), PA5 (SCK), PA6 (MISO), PA7 (MOSI) — no overlap
  • USART2 (hoverboard): PA2 (TX), PA3 (RX) — no overlap
  • LEDs: PC14, PC15 — no overlap
  • Buzzer: PB2 — no overlap
  • No GPIO conflicts with USB OTG FS pins.

Clock Tree

  • USB requires a 48 MHz clock. SystemClock_Config() routes 48 MHz from PLLSAI (RCC_CLK48SOURCE_PLLSAIP, PLLSAIN=384, PLLSAIP=DIV8 → 384/8=48 MHz). ✓
  • PLLSAI is independent of PLL1 (system clock) and PLLSAI.PLLSAIQ (I2S). No clock tree contention.

TxState Stuck-Busy

  • CDC_Init() resets hcdc->TxState = 0 on every host (re)connect. ✓
  • CDC_Transmit() includes a busy-count recovery (force-clears TxState after 100 consecutive BUSY returns). ✓
  • Not a contributing factor once the DCache issue is fixed.

Hardware Reference

Signal Pin Peripheral
USB D- PA11 OTG_FS AF10
USB D+ PA12 OTG_FS AF10
IMU SCK PA5 SPI1
IMU MISO PA6 SPI1
IMU MOSI PA7 SPI1
IMU CS PA4 GPIO
ESC TX PA2 USART2
ESC RX PA3 USART2
LED1 PC14 GPIO
LED2 PC15 GPIO
Buzzer PB2 GPIO/TIM4_CH3

MCU: STM32F722RET6 (MAMBA F722S FC, Betaflight target DIAT-MAMBAF722_2022B)


Files Changed (PR #10)

  • lib/USB_CDC/src/usbd_cdc_if.c — 512-byte aligned non-cacheable buffer struct, CDC_Transmit copy-to-fixed-buffer
  • lib/USB_CDC/src/usbd_conf.cUSB_NC_MPU_Config() MPU region before HAL_PCD_Init()
  • src/main.csafety_init() deferred after all peripheral init; DCache stays enabled with comment
  • src/safety.c / src/watchdog.c — IWDG timeout 2000ms; watchdog_was_reset_by_watchdog() for reset detection logging

Lessons Learned

  1. Cortex-M7 + DMA + DCache = always configure MPU non-cacheable regions for DMA buffers. The cache is not write-through to SRAM; the DMA engine sees physical SRAM, not the cache. The MPU is the correct fix (not SCB_CleanDCache_by_Addr before every TX, which is fragile).

  2. IWDG must start after all slow blocking inits. IMU calibration can take 500ms+. The IWDG cannot be paused once started. Defer safety_init() until the main loop is ready to kick the watchdog every cycle.

  3. USB enumeration success does not prove data flow. The host handshake and port appearance can succeed even when TX buffers are incoherent. Test with actual data transfer, not just enumeration.