Skip to content

Full support of J1850, new starter algo, bugfixes, pin code etc#57

Open
Pugnator wants to merge 46 commits into
release/F103from
bugfix/code_refactoring
Open

Full support of J1850, new starter algo, bugfixes, pin code etc#57
Pugnator wants to merge 46 commits into
release/F103from
bugfix/code_refactoring

Conversation

@Pugnator

@Pugnator Pugnator commented May 30, 2026

Copy link
Copy Markdown
Owner

This PR brings the J1850-based engine-state starter lock (#54), full targeted per-module DTC clearing (MIL + SIL), ADC voltage calibration, and a series of J1850 bus analysis fixes validated on a real Harley.
Closes #12, closes #13, closes #14, closes #15, closes #16, closes #17, closes #18, closes #19, closes #20, closes #21, closes #22, closes #23, closes #24, closes #25, closes #26, closes #27, closes #28, closes #29, closes #30, closes #31, closes #32, closes #33, closes #34, closes #35, closes #36, closes #37, closes #38, closes #39, closes #40, closes #41, closes #42, closes #43, closes #44, closes #45, closes #46, closes #47, closes #48, closes #49, closes #50, closes #51, closes #52, closes #53, closes #54

Pugnator added 30 commits May 2, 2024 11:48
The macro was named after TIM9 but the active blinker timer is TIM4
(BLINKER_TIMER). Rename to reflect actual usage and document that the
value must stay in sync with the CubeMX-generated timer configuration.

Fixes #7
The blinker timer period (110 ms) is larger than DEBOUNCE_MIN_TIME
(100 ms), so the first time this branch could be evaluated pressDuration
is already greater than the threshold and the check never fires.

The DEBUG_LOG line also had a format-string mismatch (one %u, two
arguments). Removing the dead code eliminates both problems.

Fixes #8
The 'startTime > currentTime' guard incorrectly discarded button events
near the 32-bit HAL_GetTick() wraparound (~49.7 days uptime). Unsigned
subtraction already gives the correct elapsed time across the wrap, so
remove the buggy guard.

Fixes #6
If the user pressed the second button within the debounce window
(intending hazard), the EXTI handler called startBlinkerTimer() again,
clobbering startTime and timerHitCounter set by the first press. The
elapsed-time calculation for the original press was lost.

Only start the timer when neither button event is already armed, so the
existing debounce window keeps running for the combined press.

Fixes #9
After toggling hazard the code entered the same waitLongPress state used
for single-side turn signals. Holding both buttons for more than 1s then
flipped overtakeMode as a side effect, which could affect the next
single-side press.

Hazard mode has no notion of overtake/long-press, so simply reset the
event and return.

Fixes #11
The 'timerHitCounter > 2' branch fired about 220 ms after a single-side
toggle, setting overtakeMode=true if both buttons were already released.
This made any brisk tap on a turn-signal button look as if the signal
'barely turned on' because overtakeMode auto-cancels after a few blinks.

The decision is redundant with the existing LONG_PRESS_COUNT check a
few lines below: that path already classifies an early release as
overtake. Drop the early-exit so the FSM always waits the full
long-press window before classifying tap vs. hold, and document the
intended policy.

Also use '<' instead of '!=' for the counter compare to be robust if
the counter ever overshoots.

Fixes #10
leftEnabled, rightEnabled, hazardEnabled, overtakeMode and blinkCounter
are written from ISR context (HAL_GPIO_EXTI_Callback,
HAL_TIM_PeriodElapsedCallback) and read from the main loop in tsm.cc.
Without 'volatile' the compiler is free to cache these across function
boundaries under -O3 -flto, which can make the main loop miss an ISR
state change or vice-versa.

Add 'volatile' to both the definitions in turn_ctrl.cc and the extern
declarations in tsm.h to keep them consistent.

Fixes #5
Sensor mounting is X=back, Y=right, Z=up. Conventional body frame for a ground vehicle is X=forward, Y=right, Z=up - a 180 deg rotation about Z. Apply x_body = -x_sensor once at the float layer in readAccelAxis() / readGyroAxis() so all downstream consumers see a consistent right-handed body frame.

Bias-capture paths still operate in sensor frame (they write back to the IMU's internal bias registers, which use raw sensor axes), so they are unchanged.

Fixes #29
…6-DoF

The previous (ax && ay && az) guard intended to skip the gradient-descent correction when the accelerometer reading was invalid. In practice, on a level, stationary device any of the three components routinely equals 0.0f exactly, and under engine vibration the components oscillate through zero - both cases silently dropped legitimate samples and left the filter integrating gyro open-loop.

Replace with a squared-magnitude check (> 1e-6f) which correctly rejects only the all-zero / division-by-zero pathology.

Fixes #21
data[] was declared 8 bytes but only 6 bytes were read from the device. The next two bytes (data[6], data[7]) were uninitialised stack memory, yet they were interpreted as a temperature value and written into chipTemperature_ on every accelerometer sample. The TEMP_OUT registers are not adjacent to the accel block on the MPU9250 anyway, so the value would have been garbage even with a longer burst read.

Drop the bogus temperature write entirely; a correct readChipTemperature() is tracked separately.

Fixes #23\nRefs #32
The previous unconditional 'while (interruptStatus() != DataReady);' busy-wait monopolised the SPI/I2C bus with back-to-back INT_STATUS register reads and would freeze the firmware indefinitely if the IMU INT signal ever became stuck (mis-wire, ESD event, IMU power-down).

Replace with a HAL_GetTick-based deadline of 3 ms.  At the 100 Hz fixed update rate the IMU always has fresh data within the first poll, so this is generous; on timeout we return the previous quaternion and try again on the next tick.  A future change can replace polling with an EXTI handler on the IMU INT line.

Fixes #20
These functions were never invoked - there is no UI surface (button, menu, serial command) that can trigger a calibration sequence on the deployed motorcycle device, and the device cannot be hand-rotated through such a sequence in place. The MPU9250 driver already performs a 1000-sample gyro bias capture at boot via gyroCurrentBias(), which covers the legitimate intent of staticCalibration() for the gyro. A separate ZUPT-based online bias refinement is tracked in #30.

Removing them also eliminates an infinite-loop bug (i >= 0 with uint32_t) and another instance of the unbounded DATA_READY busy-wait.

Fixes #14
Mahony 9-DoF requires a magnetometer, but the magnetometer is permanently disabled on this product (steel motorcycle frame produces extreme, non-stationary hard-iron offsets that cannot be calibrated out). Without mag aiding, Mahony degrades to a gyro+accel filter that is strictly worse than Madgwick-6-DoF for the same compute cost.

Drop the mahony9DoF declaration and definition along with the file-scope eInt[3] and deltat globals that only Mahony used (deltat was never written, so the integration step was dead code anyway).  Keep beta since Madgwick still uses it.

Fixes #22
The magnetometer is permanently unusable on this product (steel motorcycle frame produces a multi-hundred-microtesla hard-iron offset that varies with engine heat and ignition coil EMI).  DISABLE_MAGNETOMETER=1 is set and will stay set.

Previously configureMagnetometer() and setMagnetometerAsSlave() were called unconditionally from both the SPI and I2C IMU constructors, which: pulled the AK8963 into 100 Hz continuous-conversion mode for nothing, ran a buggy magCurrentBias() routine (int32 accumulator on float, 10 samples), and added ~100 ms of blocking delay to boot.

Wrap both calls in #if !DISABLE_MAGNETOMETER and delete the now-unreachable magCurrentBias().  Saves ~870 bytes of flash and removes one int-on-float UB site.

Fixes #24
MagneticJammingDetector::sample() was a stub that computed the magnetic field magnitude and discarded the result.  No caller used the detection output - the only invocation site in ahrs.cc was already commented out, and the one in fusion.cc lived inside a /* */ block.

Even if completed, the feature has no purpose in this deployment: the magnetometer is disabled, and a steel motorcycle frame would produce constant strong 'jamming' that the detector could not distinguish from real interference.

Delete antijam.cc, antijam.h, the std::unique_ptr<MagneticJammingDetector> member of AhrsBase, the heap allocation in the constructor, the commented-out call sites, and the Makefile entry.  Saves ~200 bytes and one boot-time heap allocation.

Fixes #27
sampleFreq_ was read before the first sampleQuant() update path wrote to it, feeding garbage into the Madgwick integrator's 1/dt for the first sample.  Initialise it to AHRS_UPDATE_RATE in the AhrsBase ctor.

Also clamp the non-fixed-rate elapsed-time computation to >= 1 ms so two samples that land in the same SysTick millisecond cannot divide by zero.  The fixed-rate path is unaffected; this only matters if FIXED_AHRS_UPDATE_RATE is later turned off.

Fixes #25

Fixes #19
VectorFloat::normalize() divided each component by the magnitude unconditionally; on a zero vector this produced NaN that then propagated through every downstream consumer (getRotated, getGravity, ...).  Bail out early when the magnitude is zero so the vector stays at the origin.

Fixes #18
The Quake III magic-constant fast inverse square root used 'long*' casts to reinterpret float bits as integers and back.  This violates strict aliasing and the behaviour is undefined; even with -fno-strict-aliasing locally, -flto can constant-fold through the cast incorrectly when the function is inlined into another TU.

Replace the casts with std::memcpy() into a std::uint32_t and back.  GCC lowers memcpy() of a register-sized scalar to no instructions, so generated code is identical for well-formed input while becoming defined behaviour.

Dormant under current SPEED_MATH=0 build (FAST_INV_SQRT expands to 1.0/sqrt), but the function is included from a header and would be live the moment fast math is re-enabled.

Fixes #16
The log2-based bit-hack approximation in _fastSqrt() left ~5% relative error -- enough to noticeably skew Quaternion::getMagnitude() and any other consumer.  Add one Newton-Raphson step (y' = 0.5 * (y + z/y)) which brings the relative error below 0.1% for the entire positive range.

Also guard against z <= 0 to avoid the division by zero introduced by the Newton step, and switch the bit reinterpretation to std::memcpy to match the strict-aliasing-safe pattern already used by _fastInvSqrt.

Dormant under SPEED_MATH=0 but kept correct for the path that does use it.

Fixes #17
The polynomial approximation diverges sharply outside [-1, +1]; on a gravity vector that has just been normalised, rounding routinely produces values like 1.0000001f which would feed the polynomial out of its valid domain.  Clamp explicitly and return +/- pi/2 at the limits.

Dormant under SPEED_MATH=0 (FAST_ASIN expands to std::asin).

Fixes #26
Multiplying each component by the magnitude inflated the vector by |v|^2 -- the exact opposite of what normalise() is supposed to do.  Divide instead.

Currently dead code (no live caller), but the matching VectorFloat::normalize() is on the hot AHRS path and they should not diverge.

Fixes #12
Quaternion::normalize() returns true on success, but getNormalized() returned the identity quaternion on success and the un-normalised input on failure -- exact opposite of what was intended.  Invert the test.

Dead code today, but the same struct is the canonical Quaternion type, so the helper has to be correct before any future caller relies on it.

Fixes #13
accelCurrentBias() had multiple defects: uninitialised maskBit[] when the input bit was 0, indices off by one in the read-back (maskBit[2]/maskBit[3] for entries that only hold three elements), and writes that ANDed maskBit[i] back into the byte instead of just clearing/setting bit 0.  The only call site was already commented out in configureAccelerometer().

Delete the function and its declaration.  A correctly-written runtime accel bias routine can be reintroduced later if needed, but the deployment relies on the factory trims plus the existing AHRS-side bias accumulators, so nothing is lost.

Fixes #15
After the Mahony cleanup the only remaining file-scope global in fusion.cc was 'volatile float beta = betaDef;'.  It was never modified, but its file-scope storage made it shared across template instantiations rather than belonging to the filter, and 'volatile' blocked the optimiser from folding the multiplications by 0.1.

Promote to 'constexpr float beta = 0.1f;' inside the anonymous namespace and drop the betaDef macro.

Fixes #28
The default A_DLPF_CFG=0x03 (41 Hz bandwidth) admits the entire engine-vibration spectrum (12-67 Hz fundamentals, 20-80 Hz frame resonances) into the orientation integrator.  Madgwick has no way to tell that energy apart from real orientation change, which is the root cause behind the axis-truthiness drops fixed by issue #21 and the noisy lean estimates the autocancel logic has to fight.

Introduce ACCEL_DLPF_CFG in ahrs_config.h (default 0x05 -> 10.2 Hz) and consume it from configureAccelerometer().  Turn-signal autocancel decisions operate on second-scale events, so 10 Hz of accel bandwidth is comfortably enough; gyro DLPF stays at 41 Hz since lean transients are genuinely fast.

Fixes #31
Pugnator added 14 commits May 27, 2026 11:34
Adds Mpu9250base::readChipTemperature(float&) which reads TEMP_OUT_H/L (registers 0x41/0x42) and converts per datasheet section 4.18: T_C = (raw / 333.87) + 21.  Replaces the bogus over-read from readAccelAxis() that was tracked under #23.

sampleQuant() now refreshes chipTemperature_ every 100th sample (1 Hz at the 100 Hz fixed AHRS rate), keeping the existing getTemperature() accessor populated with a real value.  Provides the thermal covariate the ZUPT bias model in #30 needs.

Fixes #32
While the bike is stationary (per-axis |gyro_residual| < 0.5 deg/s and ||accel|-1g| < 30 mg sustained for 0.5 s), update an EMA of the gyro reading with alpha=0.01 per sample.  The estimate is subtracted from every live gyro sample before it reaches Madgwick, so thermal drift (~0.005 deg/s/degC * 60 degC = ~0.3 deg/s over a ride) and any vibration captured by the boot-time hardware bias get cancelled in the field.

All thresholds live in ahrs_config.h (ZUPT_GYRO_THRESH_DPS, ZUPT_ACCEL_THRESH_G, ZUPT_HOLD_SAMPLES, ZUPT_BIAS_ALPHA) and the whole feature is gated by ENABLE_ZUPT (default 1).  Epoch start/end are logged on RTT along with the current chip-die temperature for offline drift modelling.

Coexists with the boot-time IMU register bias (gyroCurrentBias()): the IMU still holds the coarse offset, this is the online refinement on top.

Fixes #30
SEGGER RTT was previously up-only.  Adds cliPoll(), called once per
main-loop iteration, that drains the down-channel-0 byte queue into a
192-byte line buffer, echoes input back on the up-channel, handles
backspace, and dispatches whole lines to a tiny command table.

v1 command set: help, ver (build tag + CPU id + feature flags), ahrs
(live YPR + chip temperature), j1850 (RX byte/frame counters +
message-ready flag), reset (NVIC_SystemReset with a short drain delay
so the RTT FIFO flushes first).

The CLI stays decoupled from the AHRS templates via two extern-C hooks
(cliGetChipTemperatureC, cliGetYprDeg) that tsm.cc implements over a
file-static raw pointer to the unique_ptr-owned AHRS instance.  Default
weak overrides in cli.cc keep the file buildable when MEMS_ENABLED=0.

Unblocks J1850 phases 2-5 in docs/J1850_PLAN.md: future j1850 tx/spoof
commands can hang off the same dispatcher without touching the bus-side
code.
- settings.h: J1850_ENABLED 0 -> 1; IC interrupt + EOF timer now start at boot.
- tsm.cc main loop: call J1850VPW::parseFrame() on every collected message
  so rpms/kph/mil/sil/trip mirrors stay live; printFrame() now gated by the
  runtime flag J1850VPW::j1850TraceEnabled (default off, quiet log).
- j1850.h / j1850parser.cc: introduce J1850VPW::j1850TraceEnabled (volatile
  bool, namespace-scoped).
- cli.cc / cli.h: extend `j1850` command with sub-commands
    * `j1850`              -> status (rx ctr, frames, trace flag, MIL/SIL)
    * `j1850 trace on|off` -> toggle auto RX dump
    * `j1850 tx HH HH ...` -> bench transmit (CRC appended, max 10 bytes)
    * `j1850 clear`        -> 6C 00 F1 14 (KWP clear-DTC request)
  Hex parser tolerates whitespace-separated or run-on byte pairs.
- j1850vpw.cc: cliJ1850TxRaw() strong override.  Briefly masks TIM2_CH2
  input-capture IT, resets the RX state machine, drives sendFrame(),
  re-arms IT, reports OK / BAD_FRAME / LOST_ARB on the CLI.
- docs/J1850_PLAN.md: record confirmed hardware (MCZ33390 populated,
  PA1=RX/TIM2_CH2, PA2=TX bit-bang), 2005 carb Sportster Delphi target;
  mark phase 2 done; rewrite open questions list to ECM-revision /
  bus-terminator / OEM capture.

No autonomous TX yet -- heartbeat is phase 3.

Refs #2
- tim.c / TSM_103.ioc: TIM2 prescaler 63 -> 71.  Gives a 1 MHz capture tick
  so the SAE-J1850 pulse-width thresholds in j1850.h read directly as us.
  Without this the SOF band (164-239 us) is reached as 184-269 counts and
  any SOF longer than ~213 us was being rejected.  Fixes #34.
- j1850vpw.cc: IC ISR bounds check `j1850RXctr > J1850_PAYLOAD_SIZE`
  changed to `>=` so the next bit write cannot land one past the end of
  payloadJ1850[].  Fixes #35.
- j1850parser.cc: printFrame() rewritten to a single-line dump
  `[tick] #N OK|BAD nB pri=P DST<-SRC : HH HH ...` so a busy bus stays
  readable on the RTT log.  Verbose multi-line header dump removed.
- j1850parser.cc: j1850TraceEnabled now defaults to true so RX frames
  appear immediately on boot for bench verification; turn off with
  'j1850 trace off' once the link is validated.
- switch_ctrl.cc: corrected stale comment (TIM5/TIM6 -> TIM2/TIM3, the
  actual timers used for J1850 capture + EOF detection).

Other findings from the audit are tracked in #36 (1-byte-header frames
silently dropped), #37 (stale DMA config on TIM2), #38 (asymmetric
wrap-around guard in onRisingEdge).
Fixes #39 - Assign distinct preempt priorities so TIM2 (J1850 IC) can
preempt all other ISRs:
  TIM2 (J1850 IC)   : 0  <- highest
  TIM3 (J1850 EOF)  : 1
  EXTI15_10 (buttons): 2
  ADC1_2 / DMA1_Ch1/7: 3
  TIM4 (blinker FSM) : 4  <- lowest

Fixes #38 - Add symmetric wrap-around guard in onRisingEdge:
  if (fallEdgeTime > riseEdgeTime) { messageReset(); return; }
  (mirrors the existing guard in onFallingEdge)

Fixes #41 - Remove HAL_ADC_Stop_DMA() from HAL_ADC_ConvCpltCallback.
ISR now only sets adcDMAcompleted flag. adcHandler() in the main loop
calls Stop_DMA just before restarting the conversion.
Bug 1 (critical): early returns in the voltage FSM bypassed
adcDMAcompleted=false and HAL_ADC_Start_DMA. After the first threshold
crossing (either direction) DMA was never restarted; adcHandler()
reprocessed stale buffer data forever and the ISR never fired again.
Fix: move Stop/Start_DMA to right after smoothedAverage is computed,
before the state machine, so every return path is safe.

Bug 2 (hazard): HAL_ADC_ConvCpltCallback gated adcDMAcompleted on
hazardEnabled. After hazard ended, adcDMAcompleted was false and DMA
was not running, so voltage monitoring never resumed.
Fix: always set adcDMAcompleted in ISR; gate the FRL/starter state
machine in adcHandler() on hazardEnabled instead.

Bug 3 (minor): buffer average loop started at i=1 (skipping [0]) but
divided by ADC_DMA_BUF_SIZE (128 not 127) -> 0.78% systematic under-
estimate. Fix: start at i=0, include all 128 samples.

Bug 4: adcDMAcompleted declared as plain bool; ISR writes it and main
loop reads it under -O3/-flto. Fix: volatile bool in definition and
extern declaration.
ADC_12_2V_VALUE (2919) and ADC_12_8V_VALUE (3522) names did not match
the voltages computed by the divider formula in the code (R1=10k,
R2=2.7k, Vref=3.3V, 12-bit ADC):
  2919 counts -> 11.1 V
  3522 counts -> 13.4 V

Renamed to ADC_11_1V_VALUE and ADC_13_4V_VALUE. No functional change.
…R refactor

Fixes #36 - J1850 parser: reject 1-byte-header frames instead of
silently returning true. WARN_LOG emitted with rx byte count.

Fixes #37 - Remove stale TIM2 DMA configuration. TIM2 operates in IC
interrupt mode; DMA was initialised but never started. Removed:
  - DMA_HandleTypeDef hdma_tim2_ch2_ch4 global in tim.c
  - DMA init + __HAL_LINKDMA block in HAL_TIM_Base_MspInit (tim.c)
  - DMA1_Channel7 NVIC config in dma.c
  - extern + DMA1_Channel7_IRQHandler in stm32f1xx_it.c
  - TIM2_CH2/CH4 DMA entries in TSM_103.ioc

Fixes #40 - Move TIM4 blinker FSM out of HAL_TIM_PeriodElapsedCallback.
TIM4 ISR now only sets blinkerTick = true. blinkerTimerFSM() (the old
80-line ISR body) runs in the main loop via blinkerHandler().

Fixes #42 - EXTI button ISR reduced to raw flag set only:
  leftButtonRawEvent = true / rightButtonRawEvent = true
All debounce logic (startBlinkerTimer, DEBUG_LOG, event arm guard) moved
to processButtonEvents(), called from blinkerHandler() in the main loop.
printFrame() used INFO_LOG which compiles to nothing when DEBUG=0.
Since the function is already gated by j1850TraceEnabled, switch all
output in the J1850 parser to PrintF (unconditional SEGGER RTT).

- printFrame(): guard moved inside the function, uses PrintF
- parseFrame(): RPM/speed readout uses PrintF instead of INFO_LOG
- parseFrame(): CRC mismatch now logs got/expected values
- parseFrame(): 1-byte header rejection uses PrintF (was WARN_LOG=empty)
- tsm.cc: remove now-redundant j1850TraceEnabled outer check
- tsm.cc: unconditional PrintF at startup showing MPU9250 SPI init result
- tsm.cc: add cliGetImuOk() accessor (returns gAhrs_->ok())
- cli.h: declare cliGetImuOk()
- cli.cc: ahrs command now shows "IMU: OK / FAILED" before YPR

Previously all IMU init diagnostics were DEBUG_LOG (silent in production)
so a SPI/hardware failure was invisible.
@Pugnator Pugnator self-assigned this May 30, 2026
@Pugnator Pugnator added this to the v1.0 milestone May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment