[RFC] Whether RGB <-> XYZ conversion should be made faster or be kept simple #351
Comments
Are generated functions designed for this situation?
Am I correct in understanding that the approximate expressions (e.g. polynomials) can be generated at compile time?
I agree, I wouldn't do this with a generated function. I really like the speedup. I might consider using a continued fraction representation instead. Some such representations are here. I am not certain one has to get full mathematical precision, since our ability to discriminate colors has limits. But of course precision is convenient, especially for tests, so we shouldn't give it up too lightly.
Yeah, tests have long been a sore point. The JLD-based tests were an attempt to improve the situation, but more could be done.
Interestingly,
julia> @benchmark x^(1/2.4) setup=(x=max(rand(),0.0031308))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 57.274 ns (0.00% GC)
median time: 57.884 ns (0.00% GC)
mean time: 60.855 ns (0.00% GC)
maximum time: 77.416 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 983
julia> @benchmark exp(1/2.4 * log(x)) setup=(x=max(rand(),0.0031308))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.910 ns (0.00% GC)
median time: 15.917 ns (0.00% GC)
mean time: 18.759 ns (0.00% GC)
maximum time: 38.437 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> @benchmark exp2(1/2.4 * log2(x)) setup=(x=max(rand(),0.0031308))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 24.874 ns (0.00% GC)
median time: 25.476 ns (0.00% GC)
mean time: 25.388 ns (0.00% GC)
maximum time: 47.442 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
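The three benchmarked expressions compute the same quantity, because x^y = exp(y*log(x)) = exp2(y*log2(x)) for x > 0. A minimal sketch that checks how closely they agree over the benchmarked range is given here; the function names are hypothetical and only `Base` is used.

```julia
# Three ways to evaluate x^(1/2.4) for x > 0 (hypothetical names).
pow_direct(x)   = x^(1/2.4)
pow_explog(x)   = exp(1/2.4 * log(x))
pow_exp2log2(x) = exp2(1/2.4 * log2(x))

# Largest relative difference between `f` and the direct form on a uniform
# grid over [0.0031308, 1], the range used in the benchmarks above.
function max_rel_diff(f; n=1000)
    lo = 0.0031308
    worst = 0.0
    for k in 0:n
        x = lo + (1 - lo) * k / n
        ref = pow_direct(x)
        worst = max(worst, abs(f(x) - ref) / ref)
    end
    return worst
end

println(max_rel_diff(pow_explog))
println(max_rel_diff(pow_exp2log2))
```

The differences should be on the order of a few ulps in `Float64`, so the choice among the three forms is mostly about speed rather than accuracy.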
I know that
pow3_4(x) = (y = sqrt(x); y*sqrt(y))
pow5_12(x) = pow3_4(x) / cbrt(x)
On the other hand, as
Edit:
Edit2:
function pow4_5(x::Float32)
t = 0.2f0x + 0.8f0x/sqrt(sqrt(x))
abs(1-t) > 0.02f0 && (t = 0.2f0t + 0.8f0x/sqrt(sqrt(t)))
abs(1-t) > 0.52f0 && (t = 0.2f0t + 0.8f0x/sqrt(sqrt(t)))
t = Float64(t)
0.2t + 0.8x/sqrt(sqrt(t))
end
pow12_5(x::Float32) = (y = pow4_5(x); y^3)
However, the speedup effect is not so great.
Edit3: Line 281 in 43d93ff
So, we can use quadratic/cubic splines by means of the LUT.
const invert_srgb_compand_n0f8 = [invert_srgb_compand(v/255.0) for v = 0:257] # +2
function invert_srgb_compand(v::Float32)
i = unsafe_trunc(Int, v * 255)
(i < 12 || i > 255) && return invert_srgb_compand(Float64(v))
y0,y1,y2,y3 = view(invert_srgb_compand_n0f8, i:i+3)
dv = v * 255.0 - i
dv == 0.0 && return y1
if v < 0.38857287f0 # `≈srgb_compand(1/8)`
return y1+0.5*dv*(-1/3*y3+2y2- y1-2/3*y0 +
dv*( y2-2y1+ y0 +
dv*(+1/3*y3- y2+ y1-1/3*y0)))
else
return y1+0.5*dv*(-y3+4y2-3y1+dv*(y3-2y2+y1))
end
end
As far as I can think of, this spline technique with the LUT is almost the fastest for
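For reference, the update `t = 0.2t + 0.8x/sqrt(sqrt(t))` in `pow4_5` earlier in this comment is Newton's method applied to f(t) = t^(5/4) - x, whose root is t = x^(4/5). The sketch below is a simplified illustration of that iteration, not the code from the comment: it works in `Float64` throughout and uses a fixed iteration count (an assumption) instead of the conditional `Float32` steps. The names `pow4_5_newton` and `pow12_5_newton` are hypothetical.

```julia
# Approximate x^(4/5) for x > 0 with Newton's method on f(t) = t^(5/4) - x.
# Newton step: t - f(t)/f'(t) = t - (t^(5/4) - x) / (5/4 * t^(1/4))
#                             = 0.2t + 0.8x / t^(1/4)
function pow4_5_newton(x::Float64; iters::Int=6)
    t = x                                  # initial guess
    for _ in 1:iters
        t = 0.2t + 0.8x / sqrt(sqrt(t))    # t^(1/4) via two square roots
    end
    return t
end

# x^(12/5) = (x^(4/5))^3
pow12_5_newton(x::Float64) = pow4_5_newton(x)^3

println(pow12_5_newton(0.5), " vs ", 0.5^2.4)
```

Each step costs two square roots and a division, which may help explain why the speedup over `exp`/`log` reported above was modest.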
Benchmark (snip.)
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
before (43d93ff)
julia> @benchmark convert(RGB{Float64}, x) setup=(x=XYZ{Float64}(rand(),rand(),rand()))
minimum time: 19.658 ns (0.00% GC)
median time: 45.737 ns (0.00% GC)
mean time: 50.024 ns (0.00% GC)
maximum time: 176.429 ns (0.00% GC)
julia> @benchmark convert(RGB{Float32}, x) setup=(x=XYZ{Float32}(rand(),rand(),rand()))
minimum time: 21.465 ns (0.00% GC)
median time: 45.135 ns (0.00% GC)
mean time: 48.481 ns (0.00% GC)
maximum time: 147.642 ns (0.00% GC)
julia> @benchmark convert(XYZ{Float64}, x) setup=(x=RGB{Float64}(rand(),rand(),rand()))
minimum time: 20.583 ns (0.00% GC)
median time: 54.618 ns (0.00% GC)
mean time: 55.775 ns (0.00% GC)
maximum time: 143.372 ns (0.00% GC)
julia> @benchmark convert(XYZ{Float32}, x) setup=(x=RGB{Float32}(rand(),rand(),rand()))
minimum time: 23.193 ns (0.00% GC)
median time: 59.839 ns (0.00% GC)
mean time: 61.497 ns (0.00% GC)
maximum time: 153.012 ns (0.00% GC)
after (
Umm... :confused:
function invert_srgb_compand(v::Float32)
i = unsafe_trunc(Int32, v * 255)
(i < 13 || i > 255) && return invert_srgb_compand(Float64(v))
@inbounds y = view(invert_srgb_compand_n0f8, i:i+3)
dv = v * 255.0 - i
dv == 0.0 && @inbounds return y[2]
if v < 0.38857287f0
return @fastmath(y[2]+0.5*dv*((-2/3*y[1]- y[2])+(2y[3]-1/3*y[4])+
dv*(( y[1]-2y[2])+ y[3]-
dv*(( 1/3*y[1]- y[2])+( y[3]-1/3*y[4]) ))))
else
return @fastmath(y[2]+0.5*dv*((4y[3]-3y[2])-y[4]+dv*((y[4]-y[3])+(y[2]-y[3]))))
end
end
Edit:
julia> @benchmark convert(XYZ{Float32}, x) setup=(x=RGB{Float32}(rand(),rand(),rand()))
minimum time: 13.713 ns (0.00% GC)
median time: 17.317 ns (0.00% GC)
mean time: 18.173 ns (0.00% GC)
maximum time: 59.859 ns (0.00% GC)
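Both table-based versions above follow the same pattern: sample the exact function at the 256 `N0f8` levels, then interpolate between neighbouring samples. The sketch below shows that pattern in its simplest form, with linear interpolation swapped in for the quadratic/cubic splines used above, so it is less accurate than the code in this thread. `invert_srgb_ref`, `INVERT_SRGB_LUT`, and `invert_srgb_lut` are hypothetical names, and the reference function is assumed to be the standard piecewise sRGB definition.

```julia
# Reference inverse sRGB companding (standard piecewise definition).
invert_srgb_ref(v) = v <= 0.04045 ? v / 12.92 : ((v + 0.055) / 1.055)^2.4

# Samples at k/255 for k = 0..256; the extra entry keeps the upper
# neighbour valid when v == 1.
const INVERT_SRGB_LUT = [invert_srgb_ref(k / 255) for k in 0:256]

function invert_srgb_lut(v::Float64)
    # Outside [0, 1] the table does not apply; fall back to the exact form.
    (0.0 <= v <= 1.0) || return invert_srgb_ref(v)
    s = v * 255
    i = unsafe_trunc(Int, s)            # 0-based cell index, 0..255
    dv = s - i                          # position inside the cell, in [0, 1)
    y0 = INVERT_SRGB_LUT[i + 1]         # sample at i/255 (arrays are 1-based)
    y1 = INVERT_SRGB_LUT[i + 2]         # sample at (i+1)/255
    return y0 + dv * (y1 - y0)          # linear interpolation
end

# Worst absolute error on a fine grid.
println(maximum(abs(invert_srgb_lut(v) - invert_srgb_ref(v)) for v in 0:0.001:1))
```

With a step of 1/255 the linear version is only accurate to roughly 1e-5, which is one reason higher-order interpolation is attractive when the result should be close to `Float32` precision.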
Will this PR be influenced by JuliaMath/FixedPointNumbers.jl#125 ?
The answer is probably no, because the RGB <--> XYZ conversions mainly use
I learned a little about Julia's inlining. cf. JuliaMath/FixedPointNumbers.jl#129 (comment)
# `pow12_5` is called from `invert_srgb_compand`.
# x^y ≈ exp(y*log(x)) ≈ exp2(y*log2(x)); the middle form is faster
@noinline pow12_5(x) = x^2 * exp(0.4 * log(x)) # 12/5 == 2.4 == 2 + 0.4
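The definition above relies on splitting the exponent: x^2.4 = x^2 * x^0.4, with only the fractional part going through `exp` and `log`. A small sketch that checks this decomposition against the direct power for positive inputs follows; `pow12_5_split` and `max_rel_err_split` are hypothetical names, and `@noinline` is left out because it affects code generation rather than the result.

```julia
# x^(12/5) split into an integer power and a fractional power.
pow12_5_split(x) = x^2 * exp(0.4 * log(x))

# Largest relative error against the direct power on a grid in (0, 1].
function max_rel_err_split(; n=1000)
    worst = 0.0
    for k in 1:n
        x = k / n
        ref = x^2.4
        worst = max(worst, abs(pow12_5_split(x) - ref) / ref)
    end
    return worst
end

println(max_rel_err_split())
```

Keeping the argument of `exp` small (0.4*log(x) rather than 2.4*log(x)) also limits how much the rounding error in `log` is amplified.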
I'm going to change the number of significant digits in the sRGB matrices from ∼7 to ∼16. (cf. #355 (comment)) Paradoxically, this supports the argument that the accuracy can be lowered for types with lower precision than
Is there a problem?
Sounds good to me. Once we've made one breaking change (and recent changes in FixedPointNumbers probably qualify; I have a PR I should submit for ColorVectorSpace that fixes one of the tests), there's little reason to hold back. Might as well get things "right."
BTW, even if there is not enough color gamut support, the conversion matrices can be obtained dynamically (during precompilation) once the color primaries are defined. (cf. #372 (comment))
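As an illustration of deriving the matrix from the primaries, the sketch below uses the textbook construction: convert each primary's xy chromaticity to XYZ with Y = 1, then scale the columns so that RGB (1, 1, 1) maps to the white point. It is not the package's implementation; the function names are hypothetical, and the chromaticities are the standard sRGB primaries and D65 white point.

```julia
using LinearAlgebra

# xy chromaticity -> XYZ tristimulus vector normalised to Y = 1.
xy_to_XYZ(x, y) = [x / y, 1.0, (1 - x - y) / y]

# Linear RGB -> XYZ matrix from the chromaticities of the three primaries
# (r, g, b) and the white point w, each given as an (x, y) tuple.
function rgb_to_xyz_matrix(r, g, b, w)
    P = hcat(xy_to_XYZ(r...), xy_to_XYZ(g...), xy_to_XYZ(b...))
    S = P \ xy_to_XYZ(w...)     # per-primary scale so that RGB (1,1,1) -> white
    return P * Diagonal(S)
end

# sRGB primaries and D65 white point.
const M_SRGB = rgb_to_xyz_matrix((0.64, 0.33), (0.30, 0.60), (0.15, 0.06),
                                 (0.3127, 0.3290))

display(M_SRGB)        # RGB -> XYZ
display(inv(M_SRGB))   # XYZ -> RGB
```

Because the matrix is computed in `Float64` from a handful of constants, it carries full double precision rather than the ∼7 digits of a hard-coded table.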
This is not an issue report, and I have not yet settled on a vision of what it should be.
I was interested in the following comment in another issue.
Originally posted by @timholy in #349 (comment)
And then, I gave it a try eliminating `^` with `Float32` precision. (I bet there's a better way.)
Since the sRGB matrix transformation is (substantially) always done in `Float64`, I changed the type of the matrix elements to `Float32`. (Formally, it may be better to keep them in `Float64` for `XYZ{Float64}`.)
Then, it certainly worked as I expected.
Since the RGB <-> XYZ conversion is used frequently, I think it is worth specializing. However, I also think that such an ad hoc function is not elegant.
Do you have any thoughts?