I experimented a bit and tried to fix the offset in an MP4 rendered with connector 0.5.1 using ffmpeg's -itsoffset option:
ffmpeg.exe -i voukoder.mp4 -itsoffset 0.044 -i voukoder.mp4 -map 0:0 -map 1:1 -acodec copy -vcodec copy synced.mp4
ffprobe synced.mp4:
start_pts=1056
start_time=0.022000
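The two ffprobe values are related by start_time = start_pts × time_base. Here is a minimal Python check. It assumes the audio stream's time_base is 1/48000 (48 kHz audio). That value is not printed above, but it is what the two numbers imply. The helper names are illustrative only.

```python
from fractions import Fraction

# Assumption: 48 kHz audio with time_base = 1/48000 (inferred, not printed by ffprobe above).
TIME_BASE = Fraction(1, 48000)


def pts_to_seconds(pts: int, time_base: Fraction) -> Fraction:
    """start_time = start_pts * time_base, kept exact with Fraction."""
    return pts * time_base


def seconds_to_pts(seconds: Fraction, time_base: Fraction) -> Fraction:
    """Inverse conversion: how many time_base ticks a duration spans."""
    return seconds / time_base


start_time = pts_to_seconds(1056, TIME_BASE)
requested_ticks = seconds_to_pts(Fraction("0.044"), TIME_BASE)

print(float(start_time))      # 0.022
print(int(requested_ticks))   # 2112
```

Under that assumption, the reported start_pts of 1056 ticks is half of the 2112 ticks that the -itsoffset 0.044 value corresponds to.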
However, synced.mp4 behaves in VP exactly like a file rendered with connector 0.7.1: if so4 is disabled (the default up to VP14), the audio stream is shifted one frame behind compared to when so4 is enabled (the default in VP15+).
So it looks like VP's decoders decode audio streams differently depending on the stream's start_time.
I suspect that Happy Otter Scripts is also affected.
As a result, an offset alone can't fix this; the only proper fix for VP is to pad the beginning of the audio with silence or to trim it.
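For illustration, here is a sketch of both approaches as ffmpeg invocations, built from Python. `build_fix_command` is a hypothetical helper. The 0.044 s value is just the offset from the experiment above, used as an example. Which approach applies, and by how much, depends on the direction and size of the shift in VP. The pad variant uses the `adelay` filter with `all=1`, which assumes a reasonably recent ffmpeg build. Because the audio goes through a filter, it has to be re-encoded and cannot use `-acodec copy`.

```python
import shlex


def build_fix_command(src: str, dst: str, offset_s: float, mode: str) -> list[str]:
    """Hypothetical helper: build an ffmpeg command that shifts the audio
    by padding silence at the start ("pad") or cutting the start ("trim")."""
    if offset_s <= 0:
        raise ValueError("offset_s must be positive")
    if mode == "pad":
        # adelay takes milliseconds; all=1 applies the same delay to every channel.
        audio_filter = f"adelay={round(offset_s * 1000)}:all=1"
    elif mode == "trim":
        # Drop the first offset_s seconds, then rebase audio timestamps to zero.
        audio_filter = f"atrim=start={offset_s},asetpts=PTS-STARTPTS"
    else:
        raise ValueError("mode must be 'pad' or 'trim'")
    return [
        "ffmpeg", "-i", src,
        "-map", "0:v:0", "-map", "0:a:0",
        "-c:v", "copy",      # video stream is copied untouched
        "-af", audio_filter,
        "-c:a", "aac",       # filtered audio must be re-encoded
        dst,
    ]


pad_cmd = build_fix_command("voukoder.mp4", "padded.mp4", 0.044, "pad")
trim_cmd = build_fix_command("voukoder.mp4", "trimmed.mp4", 0.044, "trim")
print(shlex.join(pad_cmd))
print(shlex.join(trim_cmd))
```

Building the argument list in code keeps the two variants identical except for the audio filter, so they can be compared directly in VP with so4 on and off.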
In any case, the current implementation looks better, since the audio shift is smaller.