v22.4.1 and v18.14.2 report abortIncoming (node:_http_server:806:17) when uploading a very large file (20 GB+), but the TCP socket is not torn down #55944

Open
navegador5 opened this issue Nov 21, 2024 · 1 comment

Version

Node.js v22.4.1.

Platform

Linux dev 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04 LTS
Release:        22.04
Codename:       jammy


# ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 63498
max locked memory           (kbytes, -l) 2046520
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 63498
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = reno cubic
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_autocorking = 1
net.ipv4.tcp_available_congestion_control = reno cubic
net.ipv4.tcp_available_ulp = espintcp mptcp tls
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_challenge_ack_limit = 1000
net.ipv4.tcp_comp_sack_delay_ns = 1000000
net.ipv4.tcp_comp_sack_nr = 44
net.ipv4.tcp_comp_sack_slack_ns = 100000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_demux = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_ecn_fallback = 1
net.ipv4.tcp_fack = 0
net.ipv4.tcp_fastopen = 1
net.ipv4.tcp_fastopen_blackhole_timeout_sec = 0
net.ipv4.tcp_fastopen_key = 3d6fb714-9821d678-15c2886c-1bbdd4b0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_frto = 2
net.ipv4.tcp_fwmark_accept = 0
net.ipv4.tcp_invalid_ratelimit = 500
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_l3mdev_accept = 0
net.ipv4.tcp_limit_output_bytes = 1048576
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_max_reordering = 300
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 360000
net.ipv4.tcp_mem = 786432       2097152 26777216
net.ipv4.tcp_migrate_req = 0
net.ipv4.tcp_min_rtt_wlen = 300
net.ipv4.tcp_min_snd_mss = 48
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probe_floor = 48
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_no_ssthresh_metrics_save = 1
net.ipv4.tcp_notsent_lowat = 4294967295
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_pacing_ca_ratio = 120
net.ipv4.tcp_pacing_ss_ratio = 200
net.ipv4.tcp_probe_interval = 600
net.ipv4.tcp_probe_threshold = 8
net.ipv4.tcp_recovery = 1
net.ipv4.tcp_reflect_tos = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096        16384   33554432
net.ipv4.tcp_rx_skb_cache = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tx_skb_cache = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096        16384   33554432
net.ipv4.tcp_workaround_signed_windows = 0
net.mptcp.add_addr_timeout = 120
net.mptcp.allow_join_initial_addr_port = 1
net.mptcp.checksum_enabled = 0
net.mptcp.enabled = 1
net.mptcp.stale_loss_cnt = 4

Subsystem

No response

What steps will reproduce the bug?

  1. Create a simple HTTP server with http.createServer.
  2. In Chrome, select a file with an <input type="file"> and POST the File object with fetch. The file must be large (20 GB+).
  3. On the server, pipe req into fs.createWriteStream("xxxxx").
  4. Wait. After roughly 10-12 GB has been uploaded, you may get 【abortIncoming (node:_http_server:806:17)】.
  5. This does not reproduce every time, but 2-3 attempts are usually enough to hit the error.
  6. With another HTTP server (such as uWebSockets), everything works.
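
For reference, steps 1 and 3 can be sketched roughly as follows. This is a minimal sketch, not the exact server used in this report: `createUploadServer`, the fallback name `upload.bin`, and the timestamp-based file naming are hypothetical. The `name` request header is the one visible in the logged headers below.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns an HTTP server that streams every POST body
// into a new file inside `dir`.
function createUploadServer(dir) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405).end();
      return;
    }
    console.log('recv post', req.headers);

    // Use only the base name of the client-supplied `name` header.
    const name = path.basename(String(req.headers.name || 'upload.bin'));
    const target = path.join(dir, `${Date.now()}-${name}`);
    const out = fs.createWriteStream(target);

    // Step 3 of the report: pipe the request straight into the file.
    req.pipe(out);
    out.on('finish', () => res.end('ok'));

    // An aborted upload surfaces here; the partial file is left on disk.
    req.on('error', (err) => console.error('upload failed', err));
  });
}
```

The server would then be started with something like `createUploadServer('./file').listen(65535)`. On the browser side, step 2 amounts to `fetch(url, { method: 'POST', body: file })`, where `file` is the File taken from the input element.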

See below:
`【Request from the client (Chrome or Edge); the client just uses XMLHttpRequest or fetch to POST a File object】
recv post {
host: '192.168.1.140:65535',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
'content-length': '20981630881',
accept: '*/*',
'accept-encoding': 'gzip, deflate',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
batchseq: '0',
'cache-control': 'no-cache',
'content-type': 'application/octet-stream',
name: 'paligemma-jax-paligemma-3b-pt-224-v1.tar.gz',
origin: 'http://192.168.1.140:65535',
pragma: 'no-cache',
referer: 'http://192.168.1.140:65535/',
size: '20981630881',
type: 'application%2Fx-gzip',
uiseq: 'events'
}
Transferred 20981630881 bytes, total time 314.401s; file located at /home/cs6666-upld-srv/file/2024-11-21T12:56:35.146Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz
[
false,
Error: aborted
at abortIncoming (node:_http_server:806:17)
at socketOnClose (node:_http_server:800:3)
at Socket.emit (node:events:532:35)
at TCP.<anonymous> (node:net:339:12) {
code: 'ECONNRESET'
}
]

//------------【
After the abort message, the server reports that it received a second POST from the client (Chrome or Edge),
but the client actually sent only one upload.
//-------------】
recv post {
host: '192.168.1.140:65535',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
'content-length': '20981630881',
accept: '*/*',
'accept-encoding': 'gzip, deflate',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
batchseq: '0',
'cache-control': 'no-cache',
'content-type': 'application/octet-stream',
name: 'paligemma-jax-paligemma-3b-pt-224-v1.tar.gz',
origin: 'http://192.168.1.140:65535',
pragma: 'no-cache',
referer: 'http://192.168.1.140:65535/',
size: '20981630881',
type: 'application%2Fx-gzip',
uiseq: 'events'
}

top - 21:33:52 up 113 days, 12:51, 12 users, load average: 3.21, 3.11, 2.17
Tasks: 295 total, 1 running, 294 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 2.0 sy, 0.0 ni, 76.3 id, 21.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15988.4 total, 163.1 free, 15663.0 used, 162.4 buff/cache
MiB Swap: 4096.0 total, 794.5 free, 3301.5 used. 42.8 avail Mem

PID    USER PR NI VIRT  RES   SHR  S %CPU %MEM TIME+   COMMAND
306856 root 20 0  19.1g 14.7g 7248 D 8.3  94.2 4:44.11 node --------------------【node is using nearly all of the memory】

【-----------------------------
The client uploads ONE file,
but the server receives two POSTs (for each POST I create a new file):
-> the server (req, res) handler is triggered by the POST
-> abortIncoming (node:_http_server:806:17)
-> the server (req, res) handler is triggered by a POST again
The TCP socket is the same.

ls -l file/

total 14874992
-rw-r--r-- 1 root root 12621333720 Nov 21 21:01 2024-11-21T12:56:35.146Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz
-rw-r--r-- 1 root root 2610647040 Nov 21 21:03 2024-11-21T13:01:49.563Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz`
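
One way to double-check the "same TCP socket" observation is to tag every accepted connection and log the tag with each request. The following is only a sketch under that assumption: `trackSockets` and `describeRequest` are hypothetical helper names and are not part of the server used above.

```javascript
const http = require('node:http');

let nextSocketId = 0;
const socketIds = new WeakMap();

// Hypothetical helper: give every accepted TCP connection a numeric id
// and log when that connection is closed.
function trackSockets(server) {
  server.on('connection', (socket) => {
    const id = ++nextSocketId;
    socketIds.set(socket, id);
    socket.on('close', (hadError) => {
      console.log(`socket #${id} closed, hadError=${hadError}`);
    });
  });
  return server;
}

// Hypothetical helper: summarize which connection a request arrived on.
function describeRequest(req) {
  const s = req.socket;
  return {
    socketId: socketIds.get(s),
    remote: `${s.remoteAddress}:${s.remotePort}`,
    bytesRead: s.bytesRead,
  };
}
```

Calling `console.log(describeRequest(req))` at the top of the request handler shows whether the second POST carries the same `socketId` and remote port as the aborted one, or whether it arrived on a new connection.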

How often does it reproduce? Is there a required condition?

You need to upload a big file (20 GB+) to trigger it.

It does not happen every time, but it happens often (within 2-3 attempts).

What is the expected behavior? Why is that the expected behavior?

If 'ECONNRESET' is triggered, Node.js should tear down the TCP socket.
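
Until that is clarified, a handler can enforce this itself. The sketch below is a defensive workaround under stated assumptions, not a fix for the underlying behavior: `saveUpload` is a hypothetical helper, and it uses `stream.pipeline` instead of the plain `req.pipe()` from the reproduction steps so that a failure on either side reaches one callback.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream');

// Hypothetical helper: stream `req` into `target`. If the upload fails
// (for example with "aborted" / ECONNRESET), destroy the socket explicitly
// and delete the partial file before reporting the error through `done`.
function saveUpload(req, res, target, done) {
  const out = fs.createWriteStream(target);
  pipeline(req, out, (err) => {
    if (err) {
      // Make sure nothing else can be parsed from this connection.
      req.socket.destroy();
      fs.rm(target, { force: true }, () => done(err));
      return;
    }
    res.end('ok');
    done(null);
  });
}
```

Inside the request handler this would be called as `saveUpload(req, res, target, (err) => { if (err) console.error(err); })`. Calling `destroy()` on a socket that is already closed is harmless.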

What do you see instead?

When 'ECONNRESET' is triggered, the Node.js HTTP server does not tear down the socket and wrongly reports that it received a new request.

Additional information

It does not trigger 100% of the time. You may need to try several times, possibly on different machines.

@navegador5
Author

[9809629.918039] Out of memory: Killed process 306856 (node) total-vm:21659964kB, anon-rss:15465044kB, file-rss:2268kB, shmem-rss:0kB, UID:0 pgtables:45924kB oom_score_adj:0
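
To see when memory starts to grow relative to the bytes received, a pass-through stream can be placed between `req` and the file stream. This is a diagnostic sketch only: `createMeter` and its parameters are hypothetical, and it makes no claim about where the memory actually goes.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: a pass-through stream that counts bytes and logs
// process memory at most once per `everyBytes` bytes received.
function createMeter(label, everyBytes = 1024 ** 3, log = console.log) {
  let total = 0;
  let nextReport = everyBytes;
  const meter = new Transform({
    transform(chunk, _encoding, callback) {
      total += chunk.length;
      if (total >= nextReport) {
        const { rss, arrayBuffers } = process.memoryUsage();
        log(`${label}: ${total} bytes, rss=${rss}, arrayBuffers=${arrayBuffers}`);
        nextReport = total + everyBytes;
      }
      // Forward the chunk unchanged.
      callback(null, chunk);
    },
  });
  // Expose the running byte count for a final comparison with content-length.
  meter.bytes = () => total;
  return meter;
}
```

It would be used as `pipeline(req, createMeter('upload'), fs.createWriteStream(target), callback)`, with `pipeline` taken from `node:stream`. Comparing `meter.bytes()` with the `content-length` header at the end also shows how much of the upload arrived before the abort.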
