Sofus Albert Høgsbro Rose 73cbcae2c9 | ||
---|---|---|
gen | ||
res | ||
src | ||
.gitignore | ||
LICENSE.txt | ||
README.md | ||
compile.sh |
README.md
Fast HDR-->SDR Conversion Without a GPU
A slightly insane solution to a devilishly irritating problem.
Problem: What's the Problem?
I like to master my 3D work in HDR, because, well, I can. I'm a color nerd without a reference monitor.
When I want to show my friends, I try to stream this HDR content using the wonderful Jellyfin (Emby fork). Problems ensue: https://github.com/jellyfin/jellyfin/issues/415 . This isn't the first time somebody has wanted to do something like this, and had trouble: https://www.verizondigitalmedia.com/blog/best-practices-for-live-4k-video-ingestion-and-transcoding/ .
Long story short? The colors look WRONG. It's all wrong. It irritates me.
I am a normal person: I don't own an HDR screen, so like any normal person I spend days writing software to solve my irritations!
Plus, my screen just doesn't do BT.2020. I'm not sure any screen really does.
Problem: Isn't This Solved in VLC/mpv/etc. ?
It is indeed! They do a very nice 2020/PQ --> 709 on the GPU, in real time, and with very nice tone mapping.
However, ffmpeg cannot do this in realtime. This means, no live transcoding --> streaming of HDR clips through Jellyfin - and none of the other fun realtime imaging one wants to do on a when one isn't plugged into a loud ASIC more power-hungry than my refrigerator.
Problem: How to Solve It?
My approach is as follows:
- Piping: We keep
ffmpeg
to decode, but have it spit out raw YUV444 data, which we process and spit back out toffmpeg
for encoding. - Threaded I/O: Read, Write, and Process happen in different threads - the slowest determines the throughput.
- 8-Bit LUT: The actual imaging operations are first precomputed into a
256x256x256x3
LUT. These arbitrarily complex global operations are then applied with a simple lookup (no interpolation - just pure 8-bit madness!).
Problem: So - Was it Worth It?
Well - all the optimization and reverse engineering aside, the colors in my HDR content looks right through Jellyfin. So? All is right with the world again!
Though, to be honest, none of my non-technical peers quite understand where I've been the past couple days...
Features
How can fast-hdr
make your day brighter?
- [Hackable] Want to easily implement arbitrarily complex global imaging operations, whilst seeing them executed with decent accuracy in real time? It's not like there's a lot of code to sift through - just spice up
hdr_sdr_gen.cpp
with your custom tonemapping function, or whatever you want! - [CPU-Based] There's no GPU here, which means flexibility. Run it on your Raspberry Pi! Run it on a phone! Run it cheaply in - dare I say it - the cloud!
- [Fast] With FFMPEG as a decoder piped in, I get
30
FPS at 4K on aThreadripper 1950X (16-core)
.- Of course, not all CPUs in the world are Threadrippers, so the
ffmpeg
wrapper sets the max resolution to1080p
, which works like smooth butter;80
FPS on my less-powerful-but-still-kinda-beefy server. Comment it out if you don't like it :P
- Of course, not all CPUs in the world are Threadrippers, so the
- [Jellyfin-oriented
ffmpeg
Wrapper] Designed for Jellyfin, there's a Python wrapper aroundffmpeg
which injects the HDR-->SDR conversion into any complexffmpeg
invocation!- Everyone uses
ffmpeg
- therefore,fast-hdr
can run everywhere :) !
- Everyone uses
- [Realtime, Arbitrarily Complex Global Operations] With a
3 * 256^3
sized 8-Bit LUT (just 50MB in memory!) precomputed at compile-time byhdr_sdr_gen.cpp
, you can perform arbitrarily complex global operations, and apply them to an image (each 8-bit YUV triplet has a corresponding triplet) without any interpolation whatsoever. In this case, that's just an inverse PQ operator, followed by a global tonemap, followed by an sRGB correction. - [UNIX Philosophy]
fast-hdr
does processing, and that's it. It's just a brick. You can use any decoder / encoder you want - as long as it spits out / gobbles up YUV444p data tostdout
! Hell, I don't know, go crazy with OpenHEVC, or whatever shiny new thing ILM cooked up this time. (Though I'd probably useffmpeg
as it's less hard and at least tested).
How To Do
- Make sure you're on Linux, and have
python3
andgcc
(and can invokeg++
). - Get your hands on a standalone
ffmpeg
andffprobe
, and put it inres
. - Run
compile.sh
. It might take a sec, it has to precompute the.lutd
.
Then, let it play with some footage! For example, here's a complex ffmpeg invocation (run by Jellyfin):
rm -rf .tmp
FFMPEG="./dist/ffmpeg-custom"
FFMPEG_PY="./dist/ffmpeg"
HDR_SDR="./dist/hdr_sdr"
LUT_PATH="./dist/cnv.lut8"
# Test on a nice shot of a cute little catterpiller.
TIME="00:01:28"
FILE="./Life Untouched 4K Demo.mp4"
HLS_SEG=".tmp/transcodes/hash_of_your_hdr_file_no_need_to_replace%d.ts"
FILE_OUT=".tmp/transcodes/hash_of_your_hdr_file_no_need_to_replace.m3u8"
TEST=true "$FFMPEG_PY" -ss "$TIME" -f mp4 -i file:"$FILE" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -pix_fmt yuv420p -preset veryfast -crf 23 -maxrate 34541128 -bufsize 69082256 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames:0 "expr:gte(t,0+n_forced*3)" -g 72 -keyint_min 72 -sc_threshold 0 -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -start_at_zero -vsync -1 -codec:a:0 aac -strict experimental -ac 2 -ab 384000 -af "volume=2" -copyts -avoid_negative_ts disabled -f hls -max_delay 5000000 -hls_time 3 -individual_header_trailer 0 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "$HLS_SEG" -hls_playlist_type vod -hls_list_size 0 -y "$FILE_OUT"
How To: Jellyfin
To get Jellyfin to convert HDR footage when transcoding for web playback, follow the How To steps and make sure it works locally.
Then:
0. Make sure the python script ffmpeg
, the actual binaries ffmpeg-custom
and ffprobe
, the compiled binary hdr_sdr
, and the generated LUT cnd_lut8
are in dist
.
- Copy
dist
to somewhere on your server owned by thejellyfin
user. Probably a good idea tochmod
it too. - In the Jellyfin interface, in Playback ->
ffmpeg
Path, point it at the Python scriptffmpeg
. It's critical thatffprobe
is there too, otherwise Jellyfin will be unable to read header info about your files, like audio tracks or subtitle tracks.
Things to be aware of:
- Seeking is super slow, as for some reason the wrapper doesn't understand keyframes, and will try to encode its way to wherever you seeked to. So don't seek :) Resuming works fine, however.
- Occasionally, you might have to
killall ffmpeg-custom && killall hdr_sdr
, or they'll keep eating CPU cycles after you've stopped watching. I'm still not sure why. I have no idea why.
Testing / Stability / Modularization / Any Kind of Good Software Development Practices
No
Contributing
Is that a malloc
I see in your C++? In this devout Stroustrup'ian neighborhood? Blasphemy...
I like you, you insane monkey - if this horrible hack truly is actually useful to you then I'm speechless!
I'm happy to help you make it work, help with bugs (make an Issue!), and/or (probably) accept any kind of PR. God knows there's enough wrong with this piece of software that it needs some fixing up...
Development happens at https://git.sofusrose.com/so-rose/fast-hdr .
TODO
It gets wilder!
- Seeking in Jellyfin: One cannot seek from the Jellyfin interface. This may have something to do with the
ffmpeg
wrapper not catching theq
to quit. - Better Profiling: Measuring performance characteristics of a threaded application like this isn't super easy, but probably worth it.
- Dithering: There's a reason nobody precomputes transforms on every possible 8-bit YUV value: Posterization. We can solve this to a quite reasonable degree by dithering while applying the LUT!
- More Robust
ffmpeg
Wrapper: It's a little touchy right now. Like, it throws an exception if you give it a wrong-f
... - More Vivid Image: Personal preference, I like vivid! This is just a matter of tweaking the image processing pipeline.
- Verify Color Science: Right now, it's all done a bit by trial and error. It does, however, match VLC's output quite well.
- 10-bit LUT: Who cares if this needs a 3GB buffer to compute? It solves posterization issues, and allows directly processing 10-bit footage to boot!
- Gamut Rendering: Currently, there's no gamut mapping or rendering intent management. It coule be nice to have.
- Better Tonemapping: The present tonemapping is arbitrarily chosen, even though it does look nice.
- Variable Input: YUV444p isn't nirvana. Plus, imagine how fast clever LUT'ing directly on 4K YUV420p data might be.