You are right. This is a rare case (and a good example) where the video encoding actually is the bottleneck.
[17:12:11] Frame #233: vRender: 20 us, vProcess: 10567 us, vEncoding: 29849 us, aRender: 63 us, aEncoding: 793 us, Latency: 42414 us
[17:12:11] Frame #234: vRender: 20 us, vProcess: 10604 us, vEncoding: 29452 us, aRender: 63 us, aEncoding: 442 us, Latency: 41711 us
[17:12:11] Frame #235: vRender: 19 us, vProcess: 10465 us, vEncoding: 29521 us, aRender: 64 us, aEncoding: 462 us, Latency: 41812 us
[17:12:11] Frame #236: vRender: 20 us, vProcess: 10410 us, vEncoding: 29694 us, aRender: 61 us, aEncoding: 10 us, Latency: 41313 us
[17:12:11] Frame #237: vRender: 19 us, vProcess: 10273 us, vEncoding: 29524 us, aRender: 62 us, aEncoding: 446 us, Latency: 41454 us
As far as I can see, you are encoding in 10 bit. That means:
| Step (per frame) | Time (in ms) | Theor. FPS |
|---|---|---|
| vRender - Premiere renders in "VUYA 4:4:4:4 (Float)" | 0.02 | 50,000 |
| vProcess - Pixel format conversion to "p010le" | + 10.50 | 95 |
| vEncoding - Actual GPU encoding | + 29.50 | 24 |
| Total | 41.50 | 24 |
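
If you want to check these numbers yourself, here is a minimal Python sketch. It assumes that the FPS column is based on the running total of the steps (that is my reading of the "+" signs), and the function names are made up for illustration. The timings are the rounded values from the log.

```python
# Sketch: turn per-frame step timings (in ms) into theoretical frame rates.
# Assumption: the FPS column uses the running total of all steps so far.

def theoretical_fps(ms_per_frame: float) -> float:
    """Frames per second achievable if one frame takes ms_per_frame ms."""
    return 1000.0 / ms_per_frame

# Rounded per-frame timings taken from the log output.
steps_ms = {
    "vRender": 0.02,
    "vProcess": 10.50,
    "vEncoding": 29.50,
}

def cumulative_fps(steps):
    """Return (step name, running total in ms, FPS at that total) per step."""
    rows = []
    total = 0.0
    for name, ms in steps.items():
        total += ms
        rows.append((name, total, theoretical_fps(total)))
    return rows

for name, total, fps in cumulative_fps(steps_ms):
    print(f"{name:10s} {total:6.2f} ms  {fps:9.1f} fps")

# The measured latency (about 41.5 ms) also contains audio and other overhead.
print(f"{'Total':10s} {41.50:6.2f} ms  {theoretical_fps(41.50):9.1f} fps")
```

The three video steps alone add up to about 40 ms, so the measured latency of about 41.5 ms is almost entirely video processing and encoding.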
You should get a huge performance boost if you render in 8 bit. Do you have to use 10 bit?
As far as I know, the old version of Voukoder did not do a proper 10 bit encode. That's most likely why it was faster.