This is a minimal example, involving only the forward pass, on Flux's master:
```julia
using Flux
using Statistics, Random
using CUDA

function train_mlp()
    d_in = 128
    d_out = 128
    batch_size = 128
    num_iters = 10

    device = gpu_device()
    model = Dense(d_in => d_out) |> device
    x = randn(Float32, d_in, batch_size) |> device

    for iter in 1:num_iters
        ŷ = model(x)
        @info iter
        # GC.gc(true)
        CUDA.pool_status()
    end
end

train_mlp()
# GC.gc(true)
# CUDA.reclaim()
```
The issue is likely that the GC is not GPU-aware and does not finalize GPU arrays in time, so the memory just keeps growing even though only a fraction of it is actually used.
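For illustration, the forward-only loop above stays flat if the intermediate output is freed eagerly instead of waiting for its finalizer. A minimal sketch, assuming the `model`, `x`, and `num_iters` from the example and CUDA.jl's `CUDA.unsafe_free!`:

```julia
for iter in 1:num_iters
    ŷ = model(x)
    # Return ŷ's buffer to the pool right away instead of waiting for the GC
    # to run the array's finalizer at some unpredictable later time.
    # Only safe here because nothing uses ŷ after this point.
    CUDA.unsafe_free!(ŷ)
    CUDA.pool_status()
end
```

This is not a general fix (intermediates created inside Zygote's pullbacks are not under user control), but it illustrates that the growth comes from delayed finalization rather than from live data.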
Maybe with EscapeAnalysis and things like JuliaLang/julia#55990 the situation can be improved, but I'm not sure whether that can work effectively with things like Zygote (maybe @aviatesk can clarify).
The situation is worse if you render your desktop and run computations on the same GPU: when you run out of memory in Julia, you can also crash your desktop environment.
I recently experimented in AMDGPU with letting users define a region of code in which all GPU allocations are recorded, and then bulk-freeing them once the program leaves that region.
Example:
```julia
θ = <parameters>

AMDGPU.record_memory!(true)
∇ = gradient(θ) do θ
    ...
end
apply!(θ, ∇)                  # in-place parameter update
AMDGPU.record_memory!(false)  # bulk-free all allocations that happened during recording
```
Setting a memory limit does cap the maximum memory usage, but it does not really improve performance: when you hit the limit we manually trigger the GC, and that can easily take 600+ ms where only ~10-20 ms would be spent actually freeing GPU memory. So recording memory allocations and bulk-freeing them also helps with this.
This issue has emerged multiple times on Discourse:
https://discourse.julialang.org/t/memory-usage-increasing-with-each-epoch/121798
https://discourse.julialang.org/t/flux-memory-usage-high-in-srcnn/115174
https://discourse.julialang.org/t/out-of-memory-using-flux-cnn-during-back-propagation-phase/24492
https://discourse.julialang.org/t/flux-gpu-memory-problems/79783
and it could be related to #828, #302, #736, and JuliaGPU/CUDA.jl#137.
Running `train_mlp()` multiple times, the memory usage keeps ever increasing and more and more memory stays reserved (as reported by `CUDA.pool_status()`). Mitigation strategies are to set a memory limit or to manually run the garbage collector (both sketched below), but the latter slows things down a lot if done every iteration.
This behavior is highly problematic because training runs quickly fill the GPU, and one cannot run other GPU processes alongside.
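For concreteness, a minimal sketch of the two mitigations mentioned above, assuming CUDA.jl's memory-limit environment variables and `CUDA.reclaim`; the limit values are purely illustrative:

```julia
# Mitigation 1: cap CUDA.jl's memory pool via environment variables
# (set before CUDA.jl initializes the device; values are illustrative).
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "4GiB"   # allocations beyond this fail
ENV["JULIA_CUDA_SOFT_MEMORY_LIMIT"] = "2GiB"   # pool is trimmed back towards this

using CUDA

# Mitigation 2: periodically run the GC and return freed blocks to the driver.
# Doing this every iteration is what makes training so slow.
GC.gc()
CUDA.reclaim()
```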
cc @maleadt