chore: optimize avatar bone matrix calculation pipeline #7604

Open
dalkia wants to merge 1 commit into dev from chore/opti-transform-job

@dalkia dalkia commented Mar 17, 2026

Pull Request Description

What does this PR change?

Improves the avatar bone matrix calculation pipeline with three key optimizations:

1. TransformAccessArray for parallel bone gathering

Bone localToWorldMatrix reads are now performed on worker threads via TransformAccessArray + IJobParallelForTransform (dedicated BoneGatherJob and AvatarRootGatherJob), instead of iterating every bone per avatar on the main thread. This eliminates a significant main-thread bottleneck that scaled linearly with avatar count — 62 Transform.localToWorldMatrix calls per avatar, each involving a managed-to-native transition.
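
As a rough illustration of the gather step described above, a transform job that only reads each bone's matrix might look like the following. This is a minimal sketch, not the PR's actual `BoneGatherJob`: the struct name `BoneGatherSketchJob`, the field name `BoneLocalToWorld`, and the choice to store `float4x4` at gather time are all assumptions.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine.Jobs;

// Hypothetical sketch of a read-only bone gather job (names are illustrative).
[BurstCompile]
public struct BoneGatherSketchJob : IJobParallelForTransform
{
    // One slot per transform in the TransformAccessArray the job is scheduled with.
    [WriteOnly] public NativeArray<float4x4> BoneLocalToWorld;

    public void Execute(int index, TransformAccess transform)
    {
        // Runs on a worker thread; the Matrix4x4 is converted to float4x4
        // once here (assumption) so later jobs can stay in Unity.Mathematics types.
        BoneLocalToWorld[index] = transform.localToWorldMatrix;
    }
}
```

A job of this shape could be scheduled with `ScheduleReadOnly(transformAccessArray, batchSize)`, which lets Unity split read-only transform access into batches across workers; whether the PR uses `Schedule` or `ScheduleReadOnly` is not stated in the description.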

2. Dedicated main player pipeline to unblock InterpolateCharacterSystem

The main player avatar is processed in its own separate small pipeline (62 bones + 1 root) that schedules and completes immediately within StartAvatarMatricesCalculationSystem. This releases the TransformAccessArray lock on the main player's transforms before InterpolateCharacterSystem runs in ChangeCharacterPositionGroup, preventing the job system from blocking the main thread when the character controller needs to write to the player transform hierarchy. Remote avatars use a separate batched pipeline with deferred completion in PreRenderingSystemGroup.
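
The "schedule and complete immediately" behavior of the main player pipeline can be sketched as below. The class `MainPlayerGatherSketch`, its nested job, and the batch size are hypothetical; only the idea (complete the handle inside the same system so no transform job is still in flight when a later system writes to the player hierarchy) comes from the description above.

```csharp
using System;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical sketch of a small pipeline dedicated to the main player.
public sealed class MainPlayerGatherSketch : IDisposable
{
    [BurstCompile]
    private struct GatherJob : IJobParallelForTransform
    {
        [WriteOnly] public NativeArray<float4x4> Output;

        public void Execute(int index, TransformAccess transform)
        {
            Output[index] = transform.localToWorldMatrix;
        }
    }

    private TransformAccessArray bones;
    private NativeArray<float4x4> boneMatrices;

    public MainPlayerGatherSketch(Transform[] boneTransforms)
    {
        bones = new TransformAccessArray(boneTransforms);
        boneMatrices = new NativeArray<float4x4>(boneTransforms.Length, Allocator.Persistent);
    }

    // Schedules the gather and waits for it right away, so the transforms are
    // no longer held by a job when this method returns.
    public NativeArray<float4x4> GatherNow()
    {
        var job = new GatherJob { Output = boneMatrices };
        JobHandle handle = job.ScheduleReadOnly(bones, 16); // batch size is an arbitrary choice
        handle.Complete();
        return boneMatrices;
    }

    public void Dispose()
    {
        if (bones.isCreated) bones.Dispose();
        if (boneMatrices.IsCreated) boneMatrices.Dispose();
    }
}
```

For the remote pipeline, the same kind of handle would instead be stored and completed later (per the description, in PreRenderingSystemGroup), which is what makes the completion deferred.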

3. Per-avatar Burst-optimized bone calculation (from PR #7230)

BoneMatrixCalculationJob is now parallelized per avatar rather than per bone. Each job task loads the avatar matrix once and loops over 62 bones using math.mul on float4x4 natively, a tight sequential range that Burst can auto-vectorize. This also eliminates casts between Matrix4x4 and float4x4 throughout the pipeline. Additionally, TransformAccessArrays are rebuilt lazily (only when avatars are added/removed), and the register-once pattern (RegisterAvatar / RegisterMainPlayerAvatar) avoids redundant per-frame main-thread work.
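
To make the per-avatar layout concrete, here is a minimal sketch of a job with one `Execute` call per avatar and an inner loop over that avatar's contiguous bone range. It is not the PR's `BoneMatrixCalculationJob`: the struct and field names are hypothetical, and the sketch assumes the "avatar matrix" is the avatar root's world-to-local matrix and that bones are stored contiguously, 62 per avatar.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical sketch: one parallel-for index per avatar, not per bone.
[BurstCompile]
public struct PerAvatarBoneMatrixSketchJob : IJobParallelFor
{
    public const int BonesPerAvatar = 62;

    // One matrix per avatar (assumed here to be the root's world-to-local matrix).
    [ReadOnly] public NativeArray<float4x4> AvatarMatrix;

    // BonesPerAvatar entries per avatar, laid out contiguously.
    [ReadOnly] public NativeArray<float4x4> BoneLocalToWorld;

    // Each Execute call writes a whole bone range rather than a single index,
    // so the default parallel-for write restriction has to be lifted.
    [NativeDisableParallelForRestriction]
    public NativeArray<float4x4> Result;

    public void Execute(int avatarIndex)
    {
        // Load the avatar matrix once and reuse it for every bone.
        float4x4 avatarMatrix = AvatarMatrix[avatarIndex];
        int first = avatarIndex * BonesPerAvatar;

        for (int i = first; i < first + BonesPerAvatar; i++)
            Result[i] = math.mul(avatarMatrix, BoneLocalToWorld[i]);
    }
}
```

Such a job would be scheduled with `Schedule(avatarCount, innerloopBatchCount)`, where `avatarCount` is the number of registered avatars; ranges written by different avatars never overlap, which is what makes lifting the write restriction safe in this sketch.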

Files changed:

  • BoneMatrixCalculationJob.cs — IJobParallelFor per-avatar with float4x4/math.mul
  • TransformGatherJobs.cs — New BoneGatherJob + AvatarRootGatherJob (IJobParallelForTransform)
  • AvatarTransformMatrixJobWrapper.cs — Dual pipeline (main player immediate + remote deferred), lazy TAA rebuild
  • AvatarTransformMatrixComponent.cs — Added IsMainPlayer flag for pipeline routing
  • StartAvatarMatricesCalculationSystem.cs — Split query: PlayerComponent → main player pipeline, others → remote
  • FinishAvatarMatricesCalculationSystem.cs — Routes to correct result array based on IsMainPlayer
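
The lazy TransformAccessArray rebuild mentioned for AvatarTransformMatrixJobWrapper.cs can be pictured as a dirty flag around the native array. The wrapper below is a sketch under that assumption; the class name and its methods are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical sketch: rebuild the TransformAccessArray only after the set of
// registered transforms has changed, instead of every frame.
public sealed class LazyTransformAccessArraySketch : IDisposable
{
    private readonly List<Transform> transforms = new List<Transform>();
    private TransformAccessArray array;
    private bool dirty = true;

    public void Add(Transform transform)
    {
        transforms.Add(transform);
        dirty = true;
    }

    public bool Remove(Transform transform)
    {
        bool removed = transforms.Remove(transform);
        dirty |= removed;
        return removed;
    }

    // Call on the main thread with no job still using the previous array.
    public TransformAccessArray GetOrRebuild()
    {
        if (!dirty) return array;

        if (array.isCreated) array.Dispose();
        array = new TransformAccessArray(transforms.ToArray());
        dirty = false;
        return array;
    }

    public void Dispose()
    {
        if (array.isCreated) array.Dispose();
    }
}
```

With this shape, frames in which no avatar is added or removed pay only for the `dirty` check, which matches the intent of the register-once pattern described above.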

Performance comparison

With 500 avatars in an isolated world, there are clear gains. Left is dev, right is this PR.

image

InterpolateCharacterSystem doesn't show the bottleneck originally presented in this issue. The difference between dev and this branch is negligible.

image

Test Instructions

Prerequisites

  • Use a build with GPU skinning enabled
  • Have access to a scene with multiple avatars (10+ recommended)

Test Steps

  1. Enter world and confirm your own avatar renders correctly with all wearables
  2. Move to a populated area with 10+ other avatars visible
  3. Verify all remote avatars render and animate correctly
  4. Teleport between parcels — confirm no avatar flickering or missing meshes
  5. Walk around continuously — verify character movement is smooth with no microstuttering
  6. Open the Unity Profiler and check StartAvatarMatricesCalculationSystem — confirm the main player gather jobs complete within the system's frame and no TransformAccessArray sync points appear during InterpolateCharacterSystem
  7. Change your wearables — confirm avatar re-instantiates correctly
  8. Verify emotes still play correctly on both local and remote avatars

Additional Testing Notes

  • Compare Profiler traces before/after: main-thread time in StartAvatarMatricesCalculationSystem should be significantly reduced
  • Pay special attention to character movement smoothness — the main player pipeline separation specifically targets stutter caused by job system locks
  • Test with both low (1-5) and high (50+) avatar counts to verify both pipelines perform well
  • Edge case: rapidly teleporting in/out of crowded areas (avatar registration/release churn)

Quality Checklist

  • Changes have been tested locally
  • Documentation has been updated (if required)
  • Performance impact has been considered
  • For SDK features: Test scene is included

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dalkia dalkia requested review from a team as code owners March 17, 2026 15:29
