Currently, detecting when the user stops speaking relies on a fixed audio threshold. This works in current testing because I'm developing in a quiet room.
To be more flexible and work in places with more background noise, we should calculate the silence threshold dynamically, based on something like a rolling mean of recent audio levels.
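
A minimal sketch of what that could look like, not tied to the current code. It assumes each audio frame has already been reduced to a single loudness value (for example its RMS), and that the first few frames after startup contain only background noise. The class name `RollingSilenceThreshold` and the `window_size` and `margin` parameters are hypothetical, and their defaults are untuned placeholders.

```python
from collections import deque


class RollingSilenceThreshold:
    """Derives a silence threshold from a rolling mean of recent quiet frames."""

    def __init__(self, window_size=50, margin=1.5):
        # Both defaults are illustrative assumptions, not tuned values.
        self.levels = deque(maxlen=window_size)
        self.margin = margin

    def threshold(self):
        # With no data yet, everything counts as silence (matches the warm-up below).
        if not self.levels:
            return float("inf")
        return self.margin * sum(self.levels) / len(self.levels)

    def is_silent(self, level):
        if len(self.levels) < self.levels.maxlen:
            # Warm-up: assume these frames are background noise and just record them.
            self.levels.append(level)
            return True
        silent = level < self.threshold()
        if silent:
            # Only quiet frames update the window, so speech does not raise the threshold.
            self.levels.append(level)
        return silent


detector = RollingSilenceThreshold(window_size=4, margin=2.0)
for noise_level in [100, 100, 100, 100]:
    detector.is_silent(noise_level)   # warm-up frames, treated as background noise

print(detector.threshold())           # 200.0 (2.0 * mean of the four noise frames)
print(detector.is_silent(1000))       # False: well above the noise floor
print(detector.is_silent(150))        # True: below the current threshold
```

Because only quiet frames feed the window, a sudden and lasting jump in background noise would be read as speech and the threshold would never catch up. A real version would probably need a recalibration step or a slow update from all frames to handle that.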