Analyzing the Performance
This document describes how Voukoder processes data and how it measures the time needed to perform each step of an exporting process.
A host application ("Host") is the application the plugin runs in, for example Adobe Premiere or VEGAS Pro.
First we need to talk about some basic facts.
- Voukoder does not necessarily make exports faster! It simply provides access to different encoders. These let you choose between three encoding priorities: encoding speed, video quality, or file size. You can't have all of them at once!
- Voukoder does not speed up rendering! If the host application is slow at rendering / delivering frames, Voukoder can't make it go faster, and the export is limited to that speed, no matter how fast the encoder / GPU is!
It's not that easy ...
It is quite complicated to analyze performance issues for various reasons ...
- There are millions of different computer configurations and the slightest difference could have an impact on the measurements
- Comparing encoders is complicated as hell! One cannot simply say "Voukoder is better than xyz!", because:
  - What do you compare? Encoding speed? File size? Visual quality?
  - What is actually being measured? Standalone FFmpeg will always be faster than Voukoder, because a Voukoder export also includes the rendering part of the Host.
  - What settings or presets do you use? Are the settings optimized for the CPU / GPU?
Breaking it down
In the end, most of you are interested in the encoding speed: how many frames per second (fps) can be exported. Let's see how the average fps shown at the end of the encoding process is calculated. This value is derived from the average time (latency) of all encoded frames. So let's take a look at a single frame, like this line from the log file:
[12:31:35] Frame #920: vRender: 31 µs, vProcess: 9 µs, vEncoding: 13274 µs, aRender: 106 µs, aEncoding: 806 µs, Latency: 14378 µs
The exporting process basically executes the tasks above from left to right:
Render > Process > Encoding
These three steps can be broken down into "substeps":
- Single, compressed frames are acquired from the demuxer
- These frames get decompressed
- Layers, effects and transitions get rendered to a single, uncompressed frame
- Depending on the host application, pixel format conversions need to be done
- Images may need to be flipped vertically
- Small, secondary pixel format conversions may be required
- The filter chain gets processed
- The final frame gets encoded and is sent to the muxer (and to disk)
Each of these steps takes a certain amount of time. In the example above, rendering the frame takes 31 µs. That is pretty fast. So if one frame takes 31 µs, we could theoretically have 32258 frames per second (1 / time in seconds), right? Awesome, but so far the Host has only rendered the frame. It has not been processed, encoded or written to disk yet. The times of all steps add up, so the export of a single frame can never be faster than the sum of the steps before it allows.
|Step||Duration||Cumulative time||Resulting fps|
|Render video frame||31 µs||31 µs||32258 fps|
|Process video frame||9 µs||40 µs||25000 fps|
|Encoding video frame||13274 µs||13314 µs||75 fps|
|Render audio frame||106 µs||13420 µs||74 fps|
|Encode audio frame||806 µs||14226 µs||70 fps|
|Total frame-to-frame latency||-||14378 µs||69 fps|
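The table above can be reproduced from the log line with a few lines of Python. This is only a minimal sketch: `parse_frame_line` and `cumulative_fps` are hypothetical helper names, not part of Voukoder, and the regular expression assumes the log format shown in the example line.

```python
import re

# Hypothetical helper, not part of Voukoder: matches "name: value µs" pairs.
# The space before the unit is optional; both micro-sign code points are accepted.
TIMING = re.compile(r"([A-Za-z]+):\s*(\d+)\s*(?:\u00b5|\u03bc)s")

# Pipeline order as printed in the log line (assumed from the example).
STEPS = ["vRender", "vProcess", "vEncoding", "aRender", "aEncoding"]


def parse_frame_line(line):
    """Return the timings of one frame as {name: microseconds}."""
    return {name: int(value) for name, value in TIMING.findall(line)}


def cumulative_fps(timings):
    """Return (step, cumulative µs, fps) rows, with fps rounded down."""
    rows = []
    total = 0
    for step in STEPS:
        total += timings[step]
        rows.append((step, total, 1_000_000 // total))
    latency = timings["Latency"]
    rows.append(("Latency", latency, 1_000_000 // latency))
    return rows


line = ("[12:31:35] Frame #920: vRender: 31 µs, vProcess: 9 µs, "
        "vEncoding: 13274 µs, aRender: 106 µs, aEncoding: 806 µs, "
        "Latency: 14378 µs")

for step, total, fps in cumulative_fps(parse_frame_line(line)):
    print(f"{step:<10} {total:>6} µs {fps:>6} fps")
```

The fps values are rounded down, which matches the numbers in the table. The reported latency is a little higher than the sum of the five steps because it also includes glue code.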
That's how you end up with about 69 fps (if the frames of the project average out to that latency). But what if your project is UHD and has lots of filters and effects? Your log could look a bit different, because the Host has a lot of work to do rendering all the effects, text layers and images before handing the frame over to Voukoder:
[22:14:01] Frame #515: vRender: 12891µs, vProcess: 110 µs, vEncoding: 2912 µs, aRender: 118 µs, aEncoding: 791 µs, Latency: 16972 µs
For this frame, the table would look like this:
|Step||Duration||Cumulative time||Resulting fps|
|Render video frame||12891 µs||12891 µs||77 fps|
|Process video frame||110 µs||13001 µs||76 fps|
|Encoding video frame||2912 µs||15913 µs||62 fps|
|Render audio frame||118 µs||16031 µs||62 fps|
|Encode audio frame||791 µs||16822 µs||59 fps|
|Total frame-to-frame latency||-||16972 µs||58 fps|
In this example you see a slightly lower fps. But while you could improve the first example by adding hardware (GPU) encoding to accelerate the "Encoding video frame" step, this would not help in the second example: most of the time is spent rendering in the Host, so the GPU usage percentage would be very low. This means Voukoder is not able to accelerate this.
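To see which step limits a frame, it can help to look at each step's share of the total latency. The following is a rough sketch using the values of the two example frames; `bottleneck` is a hypothetical helper, not something Voukoder provides.

```python
STEPS = ["vRender", "vProcess", "vEncoding", "aRender", "aEncoding"]


def bottleneck(timings):
    """Return (step, share of latency) for the slowest step of one frame."""
    step = max(STEPS, key=lambda name: timings[name])
    return step, timings[step] / timings["Latency"]


# Values in microseconds, copied from the two example log lines.
frame_920 = {"vRender": 31, "vProcess": 9, "vEncoding": 13274,
             "aRender": 106, "aEncoding": 806, "Latency": 14378}
frame_515 = {"vRender": 12891, "vProcess": 110, "vEncoding": 2912,
             "aRender": 118, "aEncoding": 791, "Latency": 16972}

for name, frame in (("Frame #920", frame_920), ("Frame #515", frame_515)):
    step, share = bottleneck(frame)
    print(f"{name}: {step} takes {share:.0%} of the latency")
```

If the slowest step is vEncoding, a faster encoder can help. If it is vRender or aRender, the time is spent in the Host and changing the encoder will not change much.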
These metrics are only available when video encoding is enabled.
vRender is the time the host application requires to render the frame. Voukoder has no impact on this (with a few exceptions), as all of this happens in the host application. It is the combined value of several tasks:
- Decoding the source frame to uncompressed values
- Applying filters, effects and all layers
- Converting it to the requested pixel format
CUDA can have a significant impact on vRender, depending on your project structure. It will most likely limit your export speed to a certain value (you can test this with the VRPT-Tool), but it will also accelerate your effects and filters.
As a rule of thumb: if your project makes use of lots of effects and filters, turn CUDA on. If not, turn it off. It can be changed in the project settings.
The Host can use the hardware decoding support of an integrated GPU on Intel systems. This sounds like it would be faster than CPU decoding, but in general this is not the case. On slower CPUs it can have a positive effect; on faster CPUs it might be better to disable it. Again, you can use the VRPT-Tool to test which is faster on your system / project.
vProcess is the time Voukoder needs to prepare the frame data for use with FFmpeg/libav. With YUV 4:2:0 (8 bit) data this value should be pretty low, as no conversion is necessary. With other pixel formats the time needed for this task can increase drastically.
After rendering and processing the raw frame data, this is the final step. The value of vEncoding is the time the software or hardware encoder needs to compress the frame and write it to disk. This also includes the processing of all video filters.
These metrics are only available when audio encoding is enabled.
Just like vRender, aRender is the time the host application needs to render the audio samples. It is also the combined value of several tasks:
- Decoding the source audio to uncompressed values
- Applying filters, effects and all layers
Voukoder has no influence on this value and cannot accelerate it.
After rendering the raw sample data, this is the final step. The value of aEncoding is the time the encoder needs to compress the data and write it to disk. This also includes the processing of all audio filters.
Latency is the overall end-to-end time needed to export a frame. This includes all steps above as well as the required glue code. 1 / Latency (in seconds) equals the frames per second.
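The average fps of a whole export can be estimated from the Latency values of all frames. This is a minimal sketch that assumes the average is taken over the per-frame latencies, as described above; `average_fps` is a hypothetical name, and the two input values are simply the latencies of the two example frames on this page, used as sample data.

```python
from statistics import mean


def average_fps(latencies_us):
    """Average frames per second for a list of per-frame latencies in µs."""
    # 1 / Latency (in seconds), applied to the mean latency of all frames.
    return 1_000_000 / mean(latencies_us)


latencies = [14378, 16972]  # µs, from the two example log lines
print(f"{average_fps(latencies):.1f} fps")
```

Averaging the latencies first and converting to fps afterwards is not the same as averaging the per-frame fps values, so the order of the two operations matters.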