I've tried depth-pro on my own images at 320x320. Before inference, I set the focal length according to my camera intrinsics.
The result is unstable. Although my camera is stationary (only camera noise changes the image from frame to frame), consecutive frames show quite different depth:
