Transmission Control Protocol (TCP)

Introduction

Reliable Communication Mechanisms

  • Confirm delivery
    • Sender gets Acknowledgement (ACK) Packets from the receiver
  • No loss at receiver
    • Flow Control
    • Sliding Window
  • Detect corrupted packets
    • Checksum
  • Detect lost packets
    • Set up a timer on the sender.
    • Retransmission Timeout (RTO)
  • Recover from lost packets
    • Re-transmit packets
    • Automatic Repeat Request (ARQ)
  • Detect duplicates
    • Add a Sequence Number to each packet.
  • In-order delivery
    • Add a Sequence Number to each packet.
  • Multiplexing and De-multiplexing
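
As an illustration of the 'Detect corrupted packets' item above, here is a minimal sketch of a 16-bit ones' complement checksum in the style of the Internet checksum. It is a simplification: real TCP also covers a pseudo-header and the TCP header, and the function names `internet_checksum` and `is_intact` are chosen only for this example.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute the checksum and compare."""
    return internet_checksum(data) == checksum


payload = b"hello tcp"
checksum = internet_checksum(payload)
print(is_intact(payload, checksum))       # True
print(is_intact(b"hellp tcp", checksum))  # False, one byte was corrupted
```

A mismatch only tells the receiver that the segment is damaged; recovery still relies on the re-transmission mechanisms listed above.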

Important Terms

  • CWND
    • Congestion Window
  • RAWND
    • Receiver Advertised Window
    • Available buffer space on the receiver, advertised to the sender.
  • SWS
    • Sender Window Size
    • SWS = min(CWND, RAWND)
  • ACK
    • Acknowledgement Packet/Datagram
    • Packet that acknowledges the receipt of another packet.
  • MSS
    • Maximum Segment Size
    • Maximum payload (data) size per segment/datagram (Transport Layer).
  • MTU
    • Maximum Transmission Unit
    • Maximum payload (data) size per frame (Data Link Layer).
  • RTT
    • Round Trip Time
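    • Time from sending a packet to receiving its ACK.
    • When reasoning about a whole window, one RTT is taken as the time from sending the first packet of the current CWND to receiving the ACK of the last packet sent from the current CWND.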
  • Capacity/Bandwidth ≥ Throughput ≥ Goodput
    • Capacity/Bandwidth
      • The maximum transmission/sending rate that the link supports.
      • Measured in bps (Bits per Second)
    • Throughput
      • Actual transmission/sending rate achieved on the link, which losses and congestion keep below the capacity.
      • Includes new data and retransmitted data.
      • Measured in bps (Bits per Second)
    • Goodput
      • The transmission/sending rate of new data.
      • Measured in bps (Bits per Second)
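
The relations between these terms can be checked with a little arithmetic. The sketch below uses made-up numbers (a 2-second transfer with 900,000 bytes of new data and 100,000 bytes of retransmissions); the helper names `sender_window` and `rate_bps` are assumptions for this example only.

```python
def sender_window(cwnd: int, rawnd: int) -> int:
    """SWS = min(CWND, RAWND): the sender obeys the tighter of the two limits."""
    return min(cwnd, rawnd)


def rate_bps(num_bytes: int, seconds: float) -> float:
    """Convert a byte count over a duration into bits per second."""
    return num_bytes * 8 / seconds


# Hypothetical measurements, for illustration only.
new_bytes = 900_000
retransmitted_bytes = 100_000
duration_s = 2.0

throughput = rate_bps(new_bytes + retransmitted_bytes, duration_s)  # new + retransmitted data
goodput = rate_bps(new_bytes, duration_s)                           # new data only

print(sender_window(cwnd=10, rawnd=4))  # 4
print(throughput)                       # 4000000.0
print(goodput)                          # 3600000.0
```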

Congestion Control Algorithms

Read from Section 3.6 and Section 3.7 of the 'Computer Networking - A Top-Down Approach' book.

Slow Start (SS)

  • Exponential growth (Doubling) to rapidly increase sending rate
  • 'SS Threshold'
    • CWND size at which SS stops.
  • CWND
    • Initial
      • CWND = 1 MSS
      • SS threshold = Large value
    • Incrementing CWND
      • CWND size = CWND size + 1 MSS per ACK, which implies doubling the CWND size every RTT.
      • Exponential growth stops once CWND reaches min(SS Threshold, RAWND).
        • SS Threshold itself is only lowered when a packet loss is detected (SS Threshold = CWND size / 2).
      • The SS phase stops when the first of the following occurs:
        • Packet loss occurs
          • Implies congestion
          • SS Threshold = CWND size / 2
          • Packet loss indicators
            • RTO expiring (Timeout)
              • Heavy congestion
            • Duplicate ACKs
              • Low to moderate congestion
        • SS Threshold is reached
        • RAWND is reached
          • Rate of sending becomes constant
  • Ramps up sending rate faster than AIMD.
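
The growth rule above can be traced with a small simulation. This is a minimal sketch that counts CWND in whole MSS units, assumes no loss, and assumes one ACK per segment; `slow_start` is a name made up for this example.

```python
from typing import List


def slow_start(ssthresh: int, rawnd: int, initial_cwnd: int = 1) -> List[int]:
    """Return the CWND (in MSS) at the start of each RTT until Slow Start stops."""
    limit = min(ssthresh, rawnd)  # growth stops at whichever bound is hit first
    cwnd = initial_cwnd
    trace = [cwnd]
    while cwnd < limit:
        acks = cwnd                     # one ACK per segment sent during this RTT
        cwnd = min(cwnd + acks, limit)  # +1 MSS per ACK, which doubles CWND per RTT
        trace.append(cwnd)
    return trace


print(slow_start(ssthresh=16, rawnd=64))  # [1, 2, 4, 8, 16]
print(slow_start(ssthresh=64, rawnd=12))  # [1, 2, 4, 8, 12]
```

In the second call the receiver's window is the tighter bound, so the sending rate levels off at 12 MSS before the SS Threshold is reached.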

Congestion Avoidance

  • The sending rate increases linearly rather than exponentially (as in Slow Start), because Congestion Avoidance probes slowly for the congestion point.
    • Probing for the congestion point is needed to operate at the optimal throughput (just below link capacity).
  • Congestion Avoidance practices AIMD (Additive Increase, Multiplicative Decrease).
  • Starts after 'SS Threshold' is hit in Slow Start.
  • AIMD
    • Additive Increase (AI)
      • CWND size = CWND size + MSS × (MSS / CWND size) per ACK, which increases the CWND size by about one MSS every RTT.
      • Linear increase
    • Multiplicative Decrease (MD)
      • On detecting a packet loss: CWND size = CWND size / 2
      • This is a multiplicative decrease, as CWND is multiplied by a factor of 1 / 2.
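
To see why a small per-ACK increment adds up to roughly one MSS per RTT, the sketch below applies the commonly used byte-based update `cwnd += MSS * MSS / cwnd` once per ACK. The MSS value of 1460 bytes and the function names are assumptions for illustration.

```python
MSS = 1460  # bytes; an assumed, typical value for Ethernet paths


def additive_increase_on_ack(cwnd: float, mss: int = MSS) -> float:
    """Per-ACK growth in Congestion Avoidance: a fraction MSS/cwnd of one MSS."""
    return cwnd + mss * mss / cwnd


def multiplicative_decrease(cwnd: float) -> float:
    """Halve the window when a loss is detected."""
    return cwnd / 2


def one_rtt_of_acks(cwnd: float, mss: int = MSS) -> float:
    """Apply the per-ACK rule for every segment ACKed in one RTT."""
    acks = int(cwnd // mss)  # one ACK per full segment in the window
    for _ in range(acks):
        cwnd = additive_increase_on_ack(cwnd, mss)
    return cwnd


start = 10 * MSS
grown = one_rtt_of_acks(start)
print((grown - start) / MSS)               # a little under 1.0 MSS gained in one RTT
print(multiplicative_decrease(grown) / MSS)  # about half of the grown window
```

The gain is slightly below one MSS because each ACK divides by a window that has already grown a little.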

Fast Retransmit

  • On receiving three consecutive duplicate ACKs, the sender immediately re-transmits the packet that is presumed lost.
  • This keeps the channel utilized: without it, the sender would sit idle until the RTO expires and triggers the re-transmission.
  • The packet might not be lost and might simply arrive late, but Fast Retransmit accepts the risk of an unnecessary re-transmission to speed up the transfer and use the link capacity well.
    • This is reasonable, as duplicate ACKs show that later packets are still reaching the receiver, so the network is less congested than when a loss is detected by an RTO expiring (which suggests that little or nothing is getting through).
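
The trigger condition can be sketched as a duplicate-ACK counter on the sender. The class below is a minimal illustration, not a real TCP stack: it assumes cumulative ACK numbers, and the class name, method name, and threshold constant are chosen for this example.

```python
from typing import Optional

DUP_ACK_THRESHOLD = 3  # three duplicates, i.e. the fourth identical ACK in a row


class FastRetransmitDetector:
    """Counts duplicate cumulative ACKs and signals when to retransmit."""

    def __init__(self) -> None:
        self.last_ack: Optional[int] = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> Optional[int]:
        """Return the sequence number to retransmit, or None if nothing to do."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                # The cumulative ACK names the next byte the receiver expects,
                # which is the start of the segment presumed lost.
                return ack_no
        else:
            self.last_ack = ack_no
            self.dup_count = 0
        return None


detector = FastRetransmitDetector()
for ack in [100, 200, 200, 200, 200, 200]:
    print(ack, detector.on_ack(ack))  # only the fourth ACK for 200 returns 200
```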

Fast Recovery

Explicit Congestion Notification

Versions of TCP

  • TCP Tahoe
  • TCP Reno
  • TCP NewReno
  • TCP CUBIC
  • TCP Vegas
  • TCP BBR (TCP Bottleneck Bandwidth and RTT)
  • CTCP (Compound TCP)
  • FAST TCP (FAST Active Queue Management Scalable TCP)
  • TCP Veno
  • TCP Westwood
  • TCP BIC
  • H-TCP (TCP Hamilton)
  • HS-TCP (Highspeed TCP)
  • TCP Hybla
  • TCP Illinois
  • TCP SACK
  • DCTCP (Data Center TCP)

and more...

TCP Tahoe

  • A Loss-based Congestion Control Algorithm.
  • Congestion Control algorithms used
    • Slow Start
    • Congestion Avoidance
    • Fast Retransmit
  • Every detected loss is handled like a timeout: CWND size = 1 MSS and Slow Start restarts, whether the loss was signalled by an RTO expiry or by duplicate ACKs.
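
The sawtooth shape of the Tahoe graph can be reproduced with a short simulation. This is a sketch under simplifying assumptions: CWND is counted in whole MSS units, time advances one RTT per step, RAWND is ignored, and loss events are supplied by hand. `tahoe_trace` and its parameters are hypothetical names.

```python
from typing import List, Set


def tahoe_trace(rtts: int, loss_at: Set[int], initial_ssthresh: int = 16) -> List[int]:
    """Return the CWND (in MSS) at each RTT for a Tahoe-style sender."""
    cwnd, ssthresh = 1, initial_ssthresh
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            ssthresh = max(cwnd // 2, 1)    # remember half the window (floor of 1 is an assumption)
            cwnd = 1                        # restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # Slow Start: double per RTT
        else:
            cwnd += 1                       # Congestion Avoidance: +1 MSS per RTT
    return trace


print(tahoe_trace(rtts=12, loss_at={6}, initial_ssthresh=8))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]
```

After the loss at RTT 6 the window collapses to 1 MSS, and the second Slow Start phase ends earlier because the SS Threshold has been lowered to 5.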

TCP Tahoe Time vs CWND size graph

TCP Reno

TCP Reno Time vs CWND size graph

TCP CUBIC

  • A Loss-based Congestion Control Algorithm.
  • Similar to TCP Reno, but has changes in the Congestion Avoidance phase.
  • More info

TCP Vegas

  • A Delay-based Congestion Control Algorithm.
  • It compares the actual Throughput with the expected Throughput, that is, the Throughput measured when the link was uncongested (smallest RTT seen), and adjusts the sending rate based on the difference.
  • More info
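
The comparison can be sketched as a once-per-RTT window update. The thresholds `alpha` and `beta` below are illustrative defaults rather than values taken from this document, CWND is counted in segments, and `vegas_update` is a name made up for this example.

```python
def vegas_update(cwnd: float, base_rtt: float, current_rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    """Return the next CWND (in segments) for a Vegas-style sender."""
    expected = cwnd / base_rtt    # throughput if the path were uncongested
    actual = cwnd / current_rtt   # throughput actually being achieved
    queued = (expected - actual) * base_rtt  # estimated segments sitting in queues
    if queued < alpha:
        return cwnd + 1  # little queueing: there is room to send more
    if queued > beta:
        return cwnd - 1  # queues are building: back off before a loss occurs
    return cwnd          # within the target band: hold the rate


print(vegas_update(20, base_rtt=0.100, current_rtt=0.100))  # 21
print(vegas_update(20, base_rtt=0.100, current_rtt=0.115))  # 20
print(vegas_update(20, base_rtt=0.100, current_rtt=0.150))  # 19
```

Unlike the loss-based algorithms above, the window shrinks as soon as the RTT rises, without waiting for a packet to be dropped.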

TCP BBR

DCTCP

Enabling a TCP Congestion Control Algorithm

Instructions for Linux.

  • Check available TCP Congestion Control algorithms

    $ sysctl net.ipv4.tcp_available_congestion_control
    net.ipv4.tcp_available_congestion_control = reno cubic
  • Check the current TCP Congestion Control algorithm

    $ sysctl net.ipv4.tcp_congestion_control
    net.ipv4.tcp_congestion_control = cubic
  • List all loadable Linux kernel modules whose names contain `tcp` (most of the `net/ipv4/tcp_*.ko` entries are TCP Congestion Control modules; the rest are unrelated)

    $ find /lib/modules/$(uname -r) -type f -name '*.ko*' | grep tcp
    /lib/modules/4.15.0-169-generic/kernel/net/netfilter/xt_tcpudp.ko
    /lib/modules/4.15.0-169-generic/kernel/net/netfilter/xt_tcpmss.ko
    /lib/modules/4.15.0-169-generic/kernel/net/rds/rds_tcp.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_dctcp.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_hybla.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_vegas.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_bic.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_nv.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_cdg.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_veno.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_diag.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_bbr.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_illinois.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_westwood.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_yeah.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_probe.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_highspeed.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_scalable.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_htcp.ko
    /lib/modules/4.15.0-169-generic/kernel/net/ipv4/tcp_lp.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/usb/typec/tcpm.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/atm/atmtcp.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/rapidio/switches/idtcps.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/scsi/libiscsi_tcp.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/scsi/iscsi_tcp.ko
    /lib/modules/4.15.0-169-generic/kernel/drivers/staging/typec/tcpci.ko
  • Load the DCTCP Linux kernel module

    $ sudo modprobe tcp_dctcp
  • Check available TCP Congestion Control algorithms again

    $ sysctl net.ipv4.tcp_available_congestion_control
    net.ipv4.tcp_available_congestion_control = reno cubic dctcp
  • The current TCP Congestion Control algorithm can be changed as well

    $ sudo vim /etc/sysctl.conf # Add `net.ipv4.tcp_congestion_control=dctcp` to the last line of the file.
    $ sudo sysctl -p # Load the configuration (from `/etc/sysctl.conf`) to apply the changes
    
    # OR, to apply the change immediately (it does not persist across reboots)
    
    $ sudo sysctl -w net.ipv4.tcp_congestion_control=dctcp
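
The same checks can be scripted. The sketch below parses a line in the `sysctl` output format shown above and, on Linux, reads the corresponding files under `/proc/sys/net/ipv4`; the helper names are assumptions for this example, and `read_sysctl` only works on a Linux host.

```python
from pathlib import Path
from typing import List

PROC_DIR = Path("/proc/sys/net/ipv4")  # Linux-only; backs the net.ipv4.* sysctl keys


def parse_sysctl_line(line: str) -> List[str]:
    """Split 'key = v1 v2 ...' (the sysctl output format) into its values."""
    _, _, values = line.partition("=")
    return values.split()


def read_sysctl(name: str, proc_dir: Path = PROC_DIR) -> List[str]:
    """Read a net.ipv4 setting directly, e.g. 'tcp_congestion_control'."""
    return (proc_dir / name).read_text().split()


def can_switch_to(algorithm: str, available: List[str]) -> bool:
    """An algorithm can only be selected once its module is loaded and listed."""
    return algorithm in available


line = "net.ipv4.tcp_available_congestion_control = reno cubic"
available = parse_sysctl_line(line)
print(available)                          # ['reno', 'cubic']
print(can_switch_to("dctcp", available))  # False until `modprobe tcp_dctcp` is run
```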

TCP Head-of-Line Blocking

  • HoLB: Head-of-Line Blocking
  • How does HTTP2 solve Head of Line blocking (HOL) issue
  • When multiple messages are multiplexed over a single TCP connection (as in HTTP/2), a single lost packet at the start of the Congestion Window (CWND) blocks everything behind it: the receiver buffers all later packets and does not hand them to their streams further up the networking stack until the re-transmission arrives, even when those packets belong to other streams and are already complete.
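
The effect can be reproduced with a toy in-order receiver. This is a minimal sketch that numbers whole segments instead of bytes and tags each one with a stream label; `deliver_in_order` is a hypothetical helper, not part of any TCP API.

```python
from typing import Dict, List, Tuple


def deliver_in_order(arrivals: List[Tuple[int, str]]) -> List[List[Tuple[int, str]]]:
    """For each arriving (seq, stream) segment, list what is handed to the application."""
    buffered: Dict[int, str] = {}
    next_expected = 0
    log = []
    for seq, stream in arrivals:
        buffered[seq] = stream
        delivered = []
        # Only a contiguous run starting at next_expected may be released.
        while next_expected in buffered:
            delivered.append((next_expected, buffered.pop(next_expected)))
            next_expected += 1
        log.append(delivered)
    return log


# Segment 0 (stream A) is lost and arrives last, after its re-transmission.
for step in deliver_in_order([(1, "B"), (2, "C"), (0, "A")]):
    print(step)
# []
# []
# [(0, 'A'), (1, 'B'), (2, 'C')]
```

Streams B and C had all of their data at the receiver after the first two arrivals, yet nothing was delivered until stream A's segment showed up.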

TCP Segmentation Offload

Resources