Camera sensors can generate a lot of data, and quickly!
How the data is recorded greatly affects the requirements of the storage system. For example, recording the raw data from the sensor at 60+ frames/second not only uses the most storage, it also demands the highest sustained write rate.
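A quick back-of-envelope calculation shows why. The figures below are illustrative assumptions (a 1080p sensor producing 12-bit raw samples), not numbers from any particular camera:

```python
# Raw (uncompressed) data rate for a sensor stream.
# Assumed example parameters: 1920x1080, 12 bits/pixel raw, 60 fps.
def raw_rate_bytes_per_sec(width, height, bits_per_pixel, fps):
    """Bytes per second needed to record the sensor output uncompressed."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * fps / 8

rate = raw_rate_bytes_per_sec(1920, 1080, 12, 60)
print(f"{rate / 1e6:.0f} MB/s")  # ~187 MB/s, sustained, for as long as we record
```

At roughly 187 MB/s, a minute of recording is over 11 GB; storage fills fast.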
But most frames contain a lot of self-similarity, and most of each image is usually nearly identical to the previous one. These similarities and differences can be represented mathematically, letting us trade space for computation. This is called compression.
For compression we need an original image to start with. This original frame is denoted an I-frame (intra-coded picture). It may itself be compressed, but only with reference to its own contents (like a JPEG image). Subsequent blocks of data describe how each successive frame changes from one or more previous frames. These are predicted frames, denoted P-frames. If we also allow a frame to reference future frames, not just previous ones, even higher compression can be achieved. These are called bi-directional predicted pictures, or B-frames.
We can achieve even higher rates of compression if we allow the algorithm to be a little loose and make some minor mistakes, losing some detail or accuracy. This is called lossy compression.
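Quantization is the classic way this looseness is introduced: snap sample values to a coarser scale so they need fewer bits, accepting that the fine detail cannot be recovered. A minimal sketch (step size 8 is an arbitrary example):

```python
# Lossy compression in miniature: quantize samples to a coarser scale.
# Coarse codes need fewer bits, but the fine detail is gone for good.
def quantize(samples, step):
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    return [c * step for c in codes]

orig = [12, 13, 15, 200, 203]
codes = quantize(orig, 8)        # [2, 2, 2, 25, 25] - small ints, cheap to store
restored = dequantize(codes, 8)  # [16, 16, 16, 200, 200] - close, not exact
```

Note that three slightly different input values collapsed to the same code: that is the "minor mistake" we traded for a smaller file.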
As well, the human visual perception system is more sensitive to brightness (luminance) than to color (chrominance). We can take advantage of this by spending more of our limited bandwidth on luminance information than on chrominance. This is called chroma subsampling and is written as a three-part ratio, J:a:b — J is the width in pixels of the reference block (usually 4), a is the number of chroma samples in its first row, and b is the number of chroma samples in its second row. Full chroma quality is usually written as 4:4:4 chroma subsampling. Full information here and here.
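The storage savings fall straight out of the ratios. For 8-bit Y'CbCr, each pixel has one luma sample, plus two chroma planes that are kept whole (4:4:4), halved horizontally (4:2:2), or halved in both directions (4:2:0):

```python
# Bytes per frame for 8-bit Y'CbCr under common subsampling schemes.
# Fraction = how much of the full chroma resolution each scheme keeps.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}

def frame_bytes(width, height, scheme):
    luma = width * height                        # one luma sample per pixel
    chroma = 2 * luma * CHROMA_FRACTION[scheme]  # two chroma planes, subsampled
    return int(luma + chroma)

for scheme in CHROMA_FRACTION:
    print(scheme, frame_bytes(1920, 1080, scheme))
```

So 4:2:0 halves the frame size relative to 4:4:4 before any other compression is applied, which is why it is so common in delivery formats.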
Note: the most natural measure of compression would be the ratio between the original amount of data and the amount after compression. In practice, however, it's rarely specified directly. Instead we tell the "smart compressor" what our constraints are, usually a maximum bit rate, and let it adjust to meet them.
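The effective ratio then falls out after the fact. Both figures below are illustrative assumptions (8-bit 4:4:4 at 1080p60, with a 20 Mb/s target handed to the encoder):

```python
# Given a raw source rate and a target bit rate handed to the encoder,
# the effective compression ratio is just their quotient.
def compression_ratio(raw_bits_per_sec, target_bits_per_sec):
    return raw_bits_per_sec / target_bits_per_sec

raw = 1920 * 1080 * 24 * 60          # 8-bit 4:4:4 at 60 fps, in bits/s
ratio = compression_ratio(raw, 20_000_000)  # 20 Mb/s target
print(f"{ratio:.0f}:1")              # roughly 149:1
```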
There is a fine balance to be struck between intended use and other constraints. For example, we could achieve very high compression by starting with a single I-frame and encoding the rest of the video as P-frames. However, if any data is missing or corrupted, we lose all of the video from that point forward.
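This fragility is easy to demonstrate. In the toy below each frame is reduced to a single brightness value reconstructed from the previous one, so one corrupted delta poisons every frame after it:

```python
# With one I-frame and only P-frames, each frame is rebuilt from the
# previous one, so a single bad delta corrupts everything downstream.
def decode_deltas(i_frame, deltas):
    value, out = i_frame, [i_frame]
    for d in deltas:
        value += d
        out.append(value)
    return out

clean = decode_deltas(100, [1, 1, 1, 1])    # [100, 101, 102, 103, 104]
corrupt = decode_deltas(100, [1, 9, 1, 1])  # 2nd delta corrupted:
                                            # [100, 101, 110, 111, 112]
```

This is why real encoders insert fresh I-frames at regular intervals: each one is a recovery point that stops the error from propagating further.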
Some example compression algorithms are: MPEG-2, H.264, H.265, VP9.
Once we have the compressed data, we need an agreed way of determining where the data for a given frame starts and stops. This is the job of the container format. Some example containers are: MPEG-2 program/transport streams, MP4, and AVI (DivX, often listed alongside these, is really a codec commonly stored inside AVI).
The type of compression (and container) can heavily affect editing: going backwards in a clip ('rewinding') may require decoding forward from the last I-frame to the desired point. It can also make certain types of editing difficult, or impossible. For example, 4:2:0 and 4:1:1 chroma subsampling can make green-screen (chroma key) work difficult. See this article.
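The rewind cost can be sketched concretely. Assuming a hypothetical GOP length of 30 (one I-frame every 30 frames), showing an arbitrary frame means decoding forward from the nearest preceding I-frame:

```python
# Seeking in a long-GOP stream: to display frame `target`, find the
# nearest preceding I-frame and decode every frame from there onward.
def frames_to_decode(i_frame_positions, target):
    start = max(p for p in i_frame_positions if p <= target)
    return list(range(start, target + 1))

gop = [0, 30, 60, 90]               # hypothetical I-frame positions
work = frames_to_decode(gop, 45)    # decode frames 30..45 to show frame 45
print(len(work), "frames decoded to display one")
```

Editing-oriented codecs such as ProRes make every frame an I-frame precisely so this work never exceeds one frame.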
Where, and how, the video is compressed can greatly affect the cost of the final camera.
Another thing to consider is recording time limits. These may be dictated by:
- Storage capacity
- Heat (due to computing the compression)
- Tax: some jurisdictions impose higher tax rates on video cameras; typically the capability to record 30 minutes or more is the cut-off. See this article.
For our design we have decided that:
- we do not need advanced recording options on the camera
- we will perform encoding and storage on a connected PC, which should have faster and larger storage and more computational capability
- it's desirable to be able to save higher chroma sampling rates (e.g. 4:4:4 or 4:2:2) and different compression and encoding schemes, e.g. H.264 if little editing is expected, ProRes or similar if a lot of editing is expected
- Cameras must be able to be powered by a plug pack/AC adapter during recording, preferably with the battery removed