chore: optimize avatar bone matrix calculation pipeline #7604
Open
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull Request Description
What does this PR change?
Improves the avatar bone matrix calculation pipeline with three key optimizations:
1. TransformAccessArray for parallel bone gathering
Bone `localToWorldMatrix` reads are now performed on worker threads via `TransformAccessArray` + `IJobParallelForTransform` (dedicated `BoneGatherJob` and `AvatarRootGatherJob`), instead of iterating every bone per avatar on the main thread. This eliminates a significant main-thread bottleneck that scaled linearly with avatar count: 62 `Transform.localToWorldMatrix` calls per avatar, each involving a managed-to-native transition.

2. Dedicated main player pipeline to unblock InterpolateCharacterSystem
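For readers unfamiliar with the pattern this item builds on, here is a minimal sketch of gathering bone matrices through `IJobParallelForTransform` and completing the job immediately. It is not the PR's code: `BoneGatherSketchJob`, `MainPlayerGatherSketch`, and `GatherNow` are hypothetical names, and building the `TransformAccessArray` on every call is done only for brevity (the PR describes rebuilding it lazily).

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical sketch: reads each bone's world matrix on a worker thread.
[BurstCompile]
public struct BoneGatherSketchJob : IJobParallelForTransform
{
    [WriteOnly] public NativeArray<float4x4> BoneLocalToWorld;

    public void Execute(int index, TransformAccess transform)
    {
        // TransformAccess exposes a Matrix4x4; convert once at the gather step.
        BoneLocalToWorld[index] = (float4x4)transform.localToWorldMatrix;
    }
}

public static class MainPlayerGatherSketch
{
    // Schedules the gather and completes it right away, so the transforms
    // are no longer held by a running job when later systems write to them.
    // The caller owns the returned array and must Dispose() it.
    public static NativeArray<float4x4> GatherNow(Transform[] bones)
    {
        // Assumption: a real pipeline would cache this array between frames.
        var access = new TransformAccessArray(bones);
        var result = new NativeArray<float4x4>(bones.Length, Allocator.TempJob);

        JobHandle handle = new BoneGatherSketchJob { BoneLocalToWorld = result }
            .Schedule(access);
        handle.Complete();

        access.Dispose();
        return result;
    }
}
```

Completing the handle in the same system trades a short main-thread wait for the guarantee that nothing later in the frame stalls on the player's transforms, which is the trade-off this item describes for the 62 bones + 1 root of the main player.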
The main player avatar is processed in its own small pipeline (62 bones + 1 root) that schedules and completes immediately within `StartAvatarMatricesCalculationSystem`. This releases the `TransformAccessArray` lock on the main player's transforms before `InterpolateCharacterSystem` runs in `ChangeCharacterPositionGroup`, preventing the job system from blocking the main thread when the character controller needs to write to the player transform hierarchy. Remote avatars use a separate batched pipeline with deferred completion in `PreRenderingSystemGroup`.

3. Per-avatar Burst-optimized bone calculation (from PR #7230)
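As a rough illustration of the per-avatar layout this item refers to, the sketch below runs one `IJobParallelFor` task per avatar over a contiguous range of 62 bones. It is an assumption-laden example, not the actual `BoneMatrixCalculationJob`: the struct name, field names, flat array layout, and the exact meaning of the "avatar matrix" are all hypothetical.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical sketch: one parallel task per avatar, sequential loop over its bones.
[BurstCompile]
public struct PerAvatarBoneSketchJob : IJobParallelFor
{
    public const int BonesPerAvatar = 62;

    // One matrix per avatar (assumed layout).
    [ReadOnly] public NativeArray<float4x4> AvatarMatrix;

    // BonesPerAvatar matrices per avatar, stored back to back (assumed layout).
    [ReadOnly] public NativeArray<float4x4> BoneLocalToWorld;

    // Each task writes a whole range of indices, not just its own index,
    // so the default parallel-for write restriction has to be lifted.
    [NativeDisableParallelForRestriction]
    public NativeArray<float4x4> Result;

    public void Execute(int avatarIndex)
    {
        // Load the avatar matrix once, then reuse it for every bone.
        float4x4 avatarMatrix = AvatarMatrix[avatarIndex];
        int start = avatarIndex * BonesPerAvatar;

        for (int i = 0; i < BonesPerAvatar; i++)
        {
            Result[start + i] = math.mul(avatarMatrix, BoneLocalToWorld[start + i]);
        }
    }
}
```

A job shaped like this would be scheduled with something like `job.Schedule(avatarCount, 1)`, so that each worker picks up whole avatars rather than individual bones.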
`BoneMatrixCalculationJob` is now parallelized per avatar rather than per bone. Each job task loads the avatar matrix once and loops over 62 bones using `math.mul` on `float4x4` natively, a tight sequential range that Burst can auto-vectorize. This also eliminates `Matrix4x4` ↔ `float4x4` casts throughout the pipeline. Additionally, `TransformAccessArray`s are rebuilt lazily (only when avatars are added/removed), and the register-once pattern (`RegisterAvatar` / `RegisterMainPlayerAvatar`) avoids redundant per-frame main-thread work.

Files changed:
- `BoneMatrixCalculationJob.cs`: `IJobParallelFor` per-avatar with `float4x4` / `math.mul`
- `TransformGatherJobs.cs`: New `BoneGatherJob` + `AvatarRootGatherJob` (`IJobParallelForTransform`)
- `AvatarTransformMatrixJobWrapper.cs`: Dual pipeline (main player immediate + remote deferred), lazy TAA rebuild
- `AvatarTransformMatrixComponent.cs`: Added `IsMainPlayer` flag for pipeline routing
- `StartAvatarMatricesCalculationSystem.cs`: Split query, `PlayerComponent` → main player pipeline, others → remote
- `FinishAvatarMatricesCalculationSystem.cs`: Routes to correct result array based on `IsMainPlayer`

Performance comparison
With 500 avatars in an isolated world, there are clear gains. Left is `dev`, right is this PR.

`InterpolateCharacterSystem` does not show the bottleneck originally reported in this issue. The difference between `dev` and this branch is negligible.

Test Instructions
Prerequisites
Test Steps
`StartAvatarMatricesCalculationSystem`: confirm the main player gather jobs complete within the system's frame and no `TransformAccessArray` sync points appear during `InterpolateCharacterSystem`

Additional Testing Notes
`StartAvatarMatricesCalculationSystem` should be significantly reduced

Quality Checklist
🤖 Generated with Claude Code