1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
linux/net/ipv4
Eric Dumazet 65466904b0 tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.

The simple heuristic of using sk_pacing_rate/1024 has worked
well, but can lead to too small packets for hosts in the same
rack/cluster, when thousands of flows compete for the bottleneck.

Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt()

Indeed, for local flows, sending bigger bursts is better
to reduce cpu costs, as occasional losses can be repaired
quite fast.

This patch is based on Neal Cardwell implementation
done more than two years ago.
bbr is adjusting max_pacing_rate based on measured bandwidth,
while cubic would over estimate max_pacing_rate.

/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.

Tested:

100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.

Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)

~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         65,945.29 msec task-clock                #    2.845 CPUs utilized
         1,314,632      context-switches          # 19935.279 M/sec
             5,292      cpu-migrations            #   80.249 M/sec
           940,641      page-faults               # 14264.023 M/sec
   201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
    17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
   136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
    53,809,530,436      instructions              #    0.27  insn per cycle
                                                  #    2.54  stalled cycles per insn  (83.36%)
     9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
       153,008,621      branch-misses             #    1.69% of all branches          (83.32%)

      23.182970846 seconds time elapsed

TcpInSegs                       15648792           0.0
TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered              58654791           0.0
TcpExtTCPDeliveredCE            19                 0.0

After patch:

~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,982.58 msec task-clock                #    2.104 CPUs utilized
           186,014      context-switches          # 3797.599 M/sec
             3,109      cpu-migrations            #   63.472 M/sec
           941,180      page-faults               # 19214.814 M/sec
   153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
    12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
   120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
    36,803,672,106      instructions              #    0.24  insn per cycle
                                                  #    3.27  stalled cycles per insn  (83.18%)
     5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
        87,984,616      branch-misses             #    1.48% of all branches          (83.43%)

      23.281200256 seconds time elapsed

TcpInSegs                       1434706            0.0
TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered              58878971           0.0
TcpExtTCPDeliveredCE            9664               0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:05:44 -08:00
..
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter netfilter: conntrack: pptp: use single option structure 2022-02-04 06:30:28 +01:00
af_inet.c gso: do not skip outer ip header in case of ipip and net_failover 2022-02-21 11:41:30 +00:00
ah4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
arp.c net: neigh: add skb drop reasons to arp_error_report() 2022-02-26 12:53:59 +00:00
bpf_tcp_ca.c bpf: reject program if a __user tagged memory accessed in kernel way 2022-01-27 12:03:46 -08:00
cipso_ipv4.c NET: IPV4: fix error "do not initialise globals to 0" 2021-09-19 12:43:56 +01:00
datagram.c net/ipv4/datagram.c: remove superfluous header files from datagram.c 2021-09-29 11:39:33 +01:00
devinet.c net: Add new protocol attribute to IP addresses 2022-02-18 21:20:06 -08:00
esp4.c Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6" 2022-01-27 07:34:06 +01:00
esp4_offload.c net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
fib_frontend.c ipv4: Invalidate neighbour for broadcast address upon address addition 2022-02-21 11:44:30 +00:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c ipv4: Reject again rules with high DSCP values 2022-02-10 15:33:33 +00:00
fib_semantics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fou.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
icmp.c ipv4: do not use per netns icmp sockets 2022-01-25 11:25:21 +00:00
igmp.c ipv4: drop unused assignment 2021-11-14 12:20:44 +00:00
inet_connection_sock.c tcp: Use BPF timeout setting for SYN ACK RTO 2022-02-02 14:45:18 +00:00
inet_diag.c inet_diag: fix kernel-infoleak for UDP sockets 2021-12-10 21:14:49 -08:00
inet_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
inet_hashtables.c tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. 2022-02-09 21:28:36 -08:00
inet_timewait_sock.c tcp: allocate tcp_death_row outside of struct netns_ipv4 2022-01-26 19:00:31 -08:00
inetpeer.c inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold 2021-03-01 13:32:12 -08:00
ip_forward.c net: Add skb_clear_tstamp() to keep the mono delivery_time 2022-03-03 14:38:48 +00:00
ip_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
ip_gre.c gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst() 2022-01-11 20:36:08 -08:00
ip_input.c net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally 2022-03-03 14:38:48 +00:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() 2022-03-03 14:38:48 +00:00
ip_sockglue.c ipv4: Exposing __ip_sock_set_tos() in ip.h 2021-11-20 14:11:00 +00:00
ip_tunnel.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ip_tunnel_core.c net: ip_tunnel: clean up endianness conversions 2021-01-08 19:25:35 -08:00
ip_vti.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipcomp.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
ipconfig.c net: ipconfig: Release the rtnl_lock while waiting for carrier 2021-10-28 14:36:41 +01:00
ipip.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-10 17:29:56 -08:00
ipmr_base.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
Kconfig net: ipv4: remove duplicate "the the" phrase in Kconfig text 2020-08-18 16:02:16 -07:00
Makefile bpf: Clean up sockmap related Kconfigs 2021-02-26 12:28:03 -08:00
metrics.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
netfilter.c netfilter: Dissect flow after packet mangling 2021-04-18 22:04:16 +02:00
netlink.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nexthop.c nexthop: change nexthop_net_exit() to nexthop_net_exit_batch() 2022-02-08 20:41:33 -08:00
ping.c ping: remove pr_err from ping_lookup 2022-02-24 09:18:29 -08:00
proc.c tcp: allocate tcp_death_row outside of struct netns_ipv4 2022-01-26 19:00:31 -08:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw.c Networking fixes for 5.17-rc2, including fixes from netfilter and can. 2022-01-27 20:58:39 +02:00
raw_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
syncookies.c net: align static siphash keys 2021-11-16 19:07:54 -08:00
sysctl_net_ipv4.c tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
tcp.c tcp: autocork: take MSG_EOR hint into consideration 2022-03-09 20:05:20 -08:00
tcp_bbr.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser() 2022-01-05 20:43:08 +01:00
tcp_cdg.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_cong.c net: Only allow init netns to set default tcp cong to a restricted algo 2021-05-04 11:58:28 -07:00
tcp_cubic.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_dctcp.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c net/ipv4/tcp_fastopen.c: remove superfluous header files from tcp_fastopen.c 2021-09-20 13:09:06 +01:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_illinois.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_input.c net: tcp: use tcp_drop_reason() for tcp_data_queue_ofo() 2022-02-20 13:55:31 +00:00
tcp_ipv4.c tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
tcp_lp.c ipv4: tcp_lp.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
tcp_metrics.c fixes-v5.11 2020-12-14 16:40:27 -08:00
tcp_minisocks.c tcp: Use BPF timeout setting for SYN ACK RTO 2022-02-02 14:45:18 +00:00
tcp_nv.c net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c 2021-09-27 12:47:39 +01:00
tcp_offload.c net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
tcp_output.c tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
tcp_rate.c tcp: tracking packets with CE marks in BW rate sample 2021-09-24 14:16:40 +01:00
tcp_recovery.c tcp: more accurately check DSACKs to grow RACK reordering window 2021-07-27 20:07:21 +01:00
tcp_scalable.c net: ipv4: delete repeated words 2020-08-24 17:31:20 -07:00
tcp_timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c tcp: use semicolons rather than commas to separate statements 2020-10-13 17:11:52 -07:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_yeah.c tcp_yeah: check struct yeah size at compile time 2021-06-29 11:54:36 -07:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp.c net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb() 2022-02-07 11:18:49 +00:00
udp_bpf.c net: Implement ->sock_is_readable() for UDP and AF_UNIX 2021-10-26 12:29:33 -07:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
udp_tunnel_core.c net/ipv4/udp_tunnel_core.c: remove superfluous header files from udp_tunnel_core.c 2021-09-21 10:17:20 +01:00
udp_tunnel_nic.c udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() 2022-02-23 12:35:00 +00:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udplite.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_input.c xfrm: state: remove extract_input indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_output.c xfrm: fix unused variable warning if CONFIG_NETFILTER=n 2020-05-11 15:12:27 +02:00
xfrm4_policy.c xfrm: use net device refcount tracker helpers 2021-12-09 11:51:45 -08:00
xfrm4_protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_state.c xfrm: remove output_finish indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_tunnel.c net/ipv4/xfrm4_tunnel.c: remove superfluous header files from xfrm4_tunnel.c 2021-09-23 10:10:00 +02:00