Branch: refs/heads/doxycov Home: https://github.com/kronosnet/kronosnet Commit: 16a9190616b3875232c7ad26efde6b2eb0cb2c1b https://github.com/kronosnet/kronosnet/commit/16a9190616b3875232c7ad26efde6b... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-08-12 (Mon, 12 Aug 2019)
Changed paths: M libknet/Makefile.am M libknet/crypto.c M libknet/crypto_model.h M libknet/crypto_nss.c M libknet/crypto_openssl.c M libknet/internals.h M libknet/links.c A libknet/onwire.c M libknet/onwire.h M libknet/tests/Makefile.am M libknet/tests/api_knet_send_crypto.c A libknet/tests/fun_pmtud_crypto.c M libknet/threads_common.c M libknet/threads_pmtud.c
Log Message: ----------- [PMTUd] rework the whole math to calculate MTU
internal changes:
- drop the concept of sec_header_size that was completely wrong and unnecessary
- bump crypto API to version 3 due to the above change
- clarify the difference between link->proto_overhead and link->status->proto_overhead. We cannot rename the status one as it would also change ABI.
- add onwire.c with documentation on the packet format and what various len(s) mean in context.
- add 3 new functions to calculate MTUs back and forth and use them around, hopefully with enough clarification on why things are done in a given way.
- heavily change threads_pmtud.c to use those new facilities.
- fix major calculation issues when using crypto (non-crypto was not affected by the problem).
- fix checks around to make sure they match the new math.
- fix padding calculation.
- add functional PMTUd crypto test. This test can take several hours (12+) and should be executed on a controlled environment since it automatically changes loopback MTU to run tests.
- fix way the lowest MTU is calculated during a PMTUd run to avoid spurious double notifications.
- drop redundant checks.

user visible changes:
- Global MTU is now calculated properly when using crypto and values will be in general bigger than before due to incorrect padding calculation in the previous implementation.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 434299300a2f23acd96f2f287939549ab2944411 https://github.com/kronosnet/kronosnet/commit/434299300a2f23acd96f2f28793954... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-08-13 (Tue, 13 Aug 2019)
Changed paths: M libknet/internals.h M libknet/links.c M libknet/links.h M libknet/threads_pmtud.c
Log Message: ----------- [PMTUd] add dynamic pong timeout when using crypto
Problem originally reported by the proxmox community: users observed that, under pressure, the MTU would flap back and forth between 2 values because the other node's responses timed out.
Implement a dynamic timeout multiplier when using crypto, which should solve the problem in a more flexible fashion.
When a timeout hits, those new logs will show:
[knet]: [info] host: host: 1 (passive) best link: 0 (pri: 0)
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (4) for host 1 link: 0
[knet]: [info] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 65429
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [info] pmtud: Global data MTU changed to: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (8) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (16) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (32) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (128) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
and when the latency reduces and it is safe to be more responsive again:
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Decreasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
....
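The behaviour visible in the log excerpts (the multiplier doubles on a response timeout and halves again once latency drops) could be sketched roughly as follows. This is not the code from threads_pmtud.c: the function names and both bounds are assumptions, picked only to mirror the values that appear in the logs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds, chosen only to match the log excerpts. */
#define MULT_MIN 2u
#define MULT_MAX 128u

/*
 * Return the next timeout multiplier: double it after a PMTUd response
 * timeout, halve it after a run that saw no timeout. The value is kept
 * within [MULT_MIN, MULT_MAX].
 */
static uint8_t next_multiplier(uint8_t cur, bool timed_out)
{
	if (timed_out) {
		return (cur < MULT_MAX) ? (uint8_t)(cur * 2) : cur;
	}
	return (cur > MULT_MIN) ? (uint8_t)(cur / 2) : cur;
}

/* Effective timeout, in the same unit as the base adjustment. */
static uint64_t effective_timeout(uint64_t pong_timeout_adj, uint8_t mult)
{
	return pong_timeout_adj * mult;
}
```

With the hardcoded base of 30 used for testing, a multiplier of 4 would give an effective timeout of 120 in this sketch.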
testing this patch on normal hosts is a bit challenging tho.
Patch was tested by hardcoding a super low timeout here:
diff --git a/libknet/threads_pmtud.c b/libknet/threads_pmtud.c
index 4f0ba0f..5e2b89b 100644
--- a/libknet/threads_pmtud.c
+++ b/libknet/threads_pmtud.c
@@ -261,7 +271,8 @@ retry:
 		/*
 		 * crypto, under pressure, is a royal PITA
 		 */
-		pong_timeout_adj_tmp = dst_link->pong_timeout_adj * 2;
+		//pong_timeout_adj_tmp = dst_link->pong_timeout_adj * dst_link->pmtud_crypto_timeout_multiplier;
+		pong_timeout_adj_tmp = 30 * dst_link->pmtud_crypto_timeout_multiplier;
 	} else {
 		pong_timeout_adj_tmp = dst_link->pong_timeout_adj;
 	}
and using a long running version of api_knet_send_crypto_test with a short PMTUd setfreq (10 sec).
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 5a49e03f40c85ae6bf0d0b0ab9e16c8128a53c36 https://github.com/kronosnet/kronosnet/commit/5a49e03f40c85ae6bf0d0b0ab9e16c... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-08-20 (Tue, 20 Aug 2019)
Changed paths: M libknet/handle.c M libknet/internals.h M libknet/libknet.h M libknet/tests/api-check.mk A libknet/tests/api_knet_handle_pmtud_set.c M libknet/threads_pmtud.c M man/Makefile.am
Log Message: ----------- [PMTUd] add ability to manually override MTU and disable PMTUd
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 657f43fd4345afed3546181361bae6889c3c40d8 https://github.com/kronosnet/kronosnet/commit/657f43fd4345afed3546181361bae6... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-08-21 (Wed, 21 Aug 2019)
Changed paths: M libknet/Makefile.am M libknet/crypto.c M libknet/crypto_model.h M libknet/crypto_nss.c M libknet/crypto_openssl.c M libknet/handle.c M libknet/internals.h M libknet/libknet.h M libknet/links.c M libknet/links.h A libknet/onwire.c M libknet/onwire.h M libknet/tests/Makefile.am M libknet/tests/api-check.mk A libknet/tests/api_knet_handle_pmtud_set.c M libknet/tests/api_knet_send_crypto.c A libknet/tests/fun_pmtud_crypto.c M libknet/threads_common.c M libknet/threads_pmtud.c M man/Makefile.am
Log Message: ----------- Merge pull request #245 from kronosnet/pmtud-fixes
[PMTUd] rework the whole math to calculate MTU
Commit: 1c306982de5ad3718655df9378db462a434d3fc9 https://github.com/kronosnet/kronosnet/commit/1c306982de5ad3718655df9378db46... Author: Jan Friesse jfriesse@redhat.com Date: 2019-08-26 (Mon, 26 Aug 2019)
Changed paths: M libknet/libknet.h
Log Message: ----------- [man] Fix priority description of POLICY_PASSIVE
... to match source code.
Signed-off-by: Jan Friesse jfriesse@redhat.com
Commit: 4601f8e30e35296bcd542959417b35f3fe7a6cd4 https://github.com/kronosnet/kronosnet/commit/4601f8e30e35296bcd542959417b35... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-08-27 (Tue, 27 Aug 2019)
Changed paths: M libknet/libknet.h
Log Message: ----------- Merge pull request #248 from jfriesse/fix-prio-description
[man] Fix priority description of POLICY_PASSIVE
Commit: f4a73758443fe52de22c7240110cdbca31ac13e9 https://github.com/kronosnet/kronosnet/commit/f4a73758443fe52de22c7240110cdb... Author: Jan Friesse jfriesse@redhat.com Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths: M libknet/compat.c M libknet/crypto.c M libknet/onwire.c
Log Message: ----------- [common] Include correct errno.h
sys/errno.h is a system-specific path; errno.h should be used instead.
Signed-off-by: Jan Friesse jfriesse@redhat.com
Commit: 55d8c75abe5d7a1e4029a076f71117c7de82bc32 https://github.com/kronosnet/kronosnet/commit/55d8c75abe5d7a1e4029a076f71117... Author: Jan Friesse jfriesse@redhat.com Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths: M configure.ac M libknet/common.c
Log Message: ----------- [common] Conditionalize RTLD_DI_ORIGIN
RTLD_DI_ORIGIN is used to get the absolute path of a plugin. It is used only for logging useful info and is not strictly needed, so use it only when it is defined (musl is the only libc known to the author of the patch that does not define it).
Signed-off-by: Jan Friesse jfriesse@redhat.com
Commit: 80a329ad17ae4093e89b96cf8b1516c116b659f5 https://github.com/kronosnet/kronosnet/commit/80a329ad17ae4093e89b96cf8b1516... Author: Jan Friesse jfriesse@redhat.com Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths: M libknet/handle.c M libknet/internals.h
Log Message: ----------- [handle] Set thread stack size on create
Musl libc has a small default stack size for threads. Knet needs ~300KiB (tested at the time when this patch was created). Glibc seems to use ~8MiB. As a compromise, 1MiB is used.
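Setting an explicit stack size at thread creation uses the standard pthread attribute calls. The helper below is a minimal sketch of that pattern, not the code added to handle.c; the function and macro names are made up for illustration, and only the 1MiB value comes from the commit message.

```c
#include <pthread.h>
#include <stddef.h>

/* 1MiB, the compromise value mentioned in the commit message. */
#define THREAD_STACK_SIZE ((size_t)1024 * 1024)

/* Trivial thread body used only to exercise the helper. */
static void *worker(void *arg)
{
	return arg;
}

/*
 * Create a thread with an explicit stack size instead of relying on
 * the libc default. Returns 0 on success or an errno-style code.
 */
static int create_thread_with_stack(pthread_t *thread,
				    void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	int err;

	err = pthread_attr_init(&attr);
	if (err) {
		return err;
	}

	err = pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);
	if (!err) {
		err = pthread_create(thread, &attr, fn, arg);
	}

	/* The attribute object is no longer needed once the thread exists. */
	pthread_attr_destroy(&attr);
	return err;
}
```

A caller would pass its own thread function in place of worker() and join or detach the thread as usual.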
Signed-off-by: Jan Friesse jfriesse@redhat.com
Commit: 512e433b0b3d8bf14818dcb8c8eb7748d7e899be https://github.com/kronosnet/kronosnet/commit/512e433b0b3d8bf14818dcb8c8eb77... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths: M configure.ac M libknet/common.c M libknet/compat.c M libknet/crypto.c M libknet/handle.c M libknet/internals.h M libknet/onwire.c
Log Message: ----------- Merge pull request #250 from jfriesse/musl_fix
Fix compilation and running on Linux distribution with musl libc
Commit: 0f67ee86745d52d68f376c92e96e1dd6661e9f5d https://github.com/kronosnet/kronosnet/commit/0f67ee86745d52d68f376c92e96e1d... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-06 (Fri, 06 Sep 2019)
Changed paths: M libknet/threads_heartbeat.c M libknet/threads_rx.c
Log Message: ----------- [links] stabilize latency calculation when nodes are not responsive
The following scenario is more of a corner case than normal, but this change allows knet to deal with it better:

1) 2 nodes cluster (corosync) (node A and node B)
2) kill -stop $(pidof corosync) on node A
3) node B will continue to send ping packets to node A
4) node A is accumulating those ping packets in the kernel network socket
5) wait some seconds and unpause node A
6) node A will start processing the ping packets in the queue and send pong replies to node B
7) node B will see an extreme increase of latency due to those "obsoleted" ping/pong packets
8) node B, as latency increases, will take longer and longer to notice that node A is down due to the pong_timeout adjustment for latency (required for initial cluster spike).
the solution:
1) Use average latency to calculate pong_timeout_adj vs latency_max. Average latency will go down again in time, while latency_max is never reset.
2) RX thread will filter out all pong packets that have higher latency than the currently configured pong_timeout. This barrier should have been in place even before.
This solution reduces the latency spike on node B to a perfectly reasonable level, and the value eventually stabilizes as more latency samples are collected and the average latency goes down.
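The two parts of the solution can be illustrated with a small sketch: samples slower than the configured pong_timeout are dropped, and the remaining ones feed an average rather than a maximum. The struct, its field names and the plain arithmetic mean are assumptions for illustration; the averaging actually used by knet may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link latency state; field names are illustrative. */
struct link_latency {
	uint64_t avg_usec;		/* mean of the accepted samples */
	uint64_t samples;		/* number of accepted samples */
	uint64_t pong_timeout_usec;	/* configured pong timeout */
};

/*
 * Feed one ping/pong round-trip sample. A sample with latency higher
 * than the configured pong timeout is filtered out (returns false) and
 * does not influence the average.
 */
static bool latency_add_sample(struct link_latency *l, uint64_t rtt_usec)
{
	if (rtt_usec > l->pong_timeout_usec) {
		return false;
	}

	l->samples++;
	l->avg_usec = (l->avg_usec * (l->samples - 1) + rtt_usec) / l->samples;
	return true;
}
```

Because stale pongs never enter the average, a burst of "obsoleted" replies after a pause cannot inflate the value used for the timeout adjustment.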
Please be aware that using a pong_timeout smaller than latency will simply mark the link down now.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 28e2b563e8acb0ac0eeb7a4c39efc6e7bf54ec53 https://github.com/kronosnet/kronosnet/commit/28e2b563e8acb0ac0eeb7a4c39efc6... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-09 (Mon, 09 Sep 2019)
Changed paths: M libknet/threads_heartbeat.c M libknet/threads_rx.c
Log Message: ----------- Merge pull request #251 from kronosnet/latency-fixes
[links] stabilize latency calculation when nodes are not responsive
Commit: f45e4c67902b95bcd212275f5f6081fa31311793 https://github.com/kronosnet/kronosnet/commit/f45e4c67902b95bcd212275f5f6081... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-09 (Mon, 09 Sep 2019)
Changed paths: M libknet/threads_pmtud.c
Log Message: ----------- [pmtud] switch to use async version of dstcache update due to locking context (read vs write)
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 21105dab21e5bac0f01dd80565ce86f35587196b https://github.com/kronosnet/kronosnet/commit/21105dab21e5bac0f01dd80565ce86... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-10 (Tue, 10 Sep 2019)
Changed paths: M libknet/threads_pmtud.c
Log Message: ----------- Merge pull request #252 from kronosnet/lock-fix
[pmtud] switch to use async version of dstcache update due to locking…
Commit: 80ea7979ff02ec81e781c3d094d54323700c00c4 https://github.com/kronosnet/kronosnet/commit/80ea7979ff02ec81e781c3d094d543... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-12 (Thu, 12 Sep 2019)
Changed paths: M libnozzle/libnozzle.c
Log Message: ----------- [nozzle] fix tapX range on newer FreeBSD
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: d9a5e4564db3c0a5977511a35ad1903c7f0013da https://github.com/kronosnet/kronosnet/commit/d9a5e4564db3c0a5977511a35ad190... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-12 (Thu, 12 Sep 2019)
Changed paths: M libnozzle/libnozzle.c
Log Message: ----------- Merge pull request #253 from kronosnet/bsd-fixes
[nozzle] fix tapX range on newer FreeBSD
Commit: b4239fa2833a316c94bc0f7e9ccbc532bcc889d1 https://github.com/kronosnet/kronosnet/commit/b4239fa2833a316c94bc0f7e9ccbc5... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-13 (Fri, 13 Sep 2019)
Changed paths: M libnozzle/tests/test-common.c
Log Message: ----------- [tests] fix ip generation boundaries
https://ci.kronosnet.org/job/knet-build-all-voting/1450/knet-build-all-votin...
and similar failures: when pid = 255, the secondary IP would hit 256, which is of course invalid.
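One way to keep a pid-derived octet and its "+1" secondary within valid bounds is to clamp with a modulo, as in the sketch below. This is an assumption about the general shape of the fix, not the code in test-common.c; both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Map a process id to a host octet in 1..253, leaving room for the
 * secondary address, which uses the next value.
 */
static uint8_t primary_octet(unsigned long pid)
{
	return (uint8_t)(pid % 253 + 1);
}

/* Secondary octet: always primary + 1, so it stays within 2..254. */
static uint8_t secondary_octet(unsigned long pid)
{
	return (uint8_t)(primary_octet(pid) + 1);
}
```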
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 954c06950b3594918bd82a93859370d5b5cea627 https://github.com/kronosnet/kronosnet/commit/954c06950b3594918bd82a93859370... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-13 (Fri, 13 Sep 2019)
Changed paths: M libknet/tests/api_knet_handle_pmtud_set.c
Log Message: ----------- [tests] give PMTUd more time to redetect MTU
The ideal fix would be to use the PMTUd callback, but that requires a lot of extra test infrastructure. For now just work around the problem.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 63d5a3eb52e1fe1efae56bdbbd81181e88266c36 https://github.com/kronosnet/kronosnet/commit/63d5a3eb52e1fe1efae56bdbbd8118... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-13 (Fri, 13 Sep 2019)
Changed paths: M libknet/tests/api_knet_handle_pmtud_set.c M libnozzle/tests/test-common.c
Log Message: ----------- Merge pull request #254 from kronosnet/test-fixes
Test fixes
Commit: 93f3df56ce1008c362df679b2768edbf2e5a860a https://github.com/kronosnet/kronosnet/commit/93f3df56ce1008c362df679b2768ed... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-19 (Thu, 19 Sep 2019)
Changed paths: M libknet/links.c
Log Message: ----------- [links] fix memory corruption of link structure
the index would overflow the buffer and overwrite data in the link structure. Depending on what was written, the cluster could fall apart in many ways, from crashing to hanging.
Fixes: https://github.com/kronosnet/kronosnet/issues/255
thanks to the proxmox developers and community for reporting the issue and for all the help reproducing / debugging the problem.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 89213e429ef7eb8241606313b99ff413afdb8662 https://github.com/kronosnet/kronosnet/commit/89213e429ef7eb8241606313b99ff4... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-19 (Thu, 19 Sep 2019)
Changed paths: M libknet/links.c
Log Message: ----------- Merge pull request #257 from kronosnet/netload-fixes
[links] fix memory corruption of link structure
Commit: eba47802dc84f096b0dca3c6963baa1d94bed6a2 https://github.com/kronosnet/kronosnet/commit/eba47802dc84f096b0dca3c6963baa... Author: Ferenc Wágner wferi@debian.org Date: 2019-09-25 (Wed, 25 Sep 2019)
Changed paths: M libknet/tests/api_knet_send.c
Log Message: ----------- tests: skip the SCTP test if SCTP is not supported by the kernel
For example, module loading is disabled on Debian build daemons. (In the vein of c5aa1c3343703455b480cef5c173f471e1bb020f.)
Signed-off-by: Ferenc Wágner wferi@debian.org
Commit: 1c5d845851028e8e423095dab5a2d87c8eff1437 https://github.com/kronosnet/kronosnet/commit/1c5d845851028e8e423095dab5a2d8... Author: Ferenc Wágner wferi@debian.org Date: 2019-09-25 (Wed, 25 Sep 2019)
Changed paths: M libknet/transport_sctp.c
Log Message: ----------- Fix typo: trasport -> transport
Signed-off-by: Ferenc Wágner wferi@debian.org
Commit: 9f10144d0274c89423d8422ef826a02459e602d8 https://github.com/kronosnet/kronosnet/commit/9f10144d0274c89423d8422ef826a0... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-25 (Wed, 25 Sep 2019)
Changed paths: M libknet/tests/api_knet_send.c M libknet/transport_sctp.c
Log Message: ----------- Merge pull request #258 from kronosnet/wferi/fixes
Assorted small fixups
Commit: 728ca4fb953992be7100a33c5720496ae5fef5c1 https://github.com/kronosnet/kronosnet/commit/728ca4fb953992be7100a33c572049... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-26 (Thu, 26 Sep 2019)
Changed paths: M libknet/tests/api_knet_handle_pmtud_set.c M libknet/tests/api_knet_link_set_enable.c M libknet/tests/test-common.c M libknet/tests/test-common.h
Log Message: ----------- [tests] add common function to sleep based on how the test suite is running
Address issue while waiting for host to be up and PMTUd first run.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: a0128b93d4a6b5637cbc5885a0a64e333d93a414 https://github.com/kronosnet/kronosnet/commit/a0128b93d4a6b5637cbc5885a0a64e... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-09-26 (Thu, 26 Sep 2019)
Changed paths: M libknet/tests/api_knet_handle_pmtud_set.c M libknet/tests/api_knet_link_set_enable.c M libknet/tests/test-common.c M libknet/tests/test-common.h
Log Message: ----------- Merge pull request #260 from kronosnet/test-suite
[tests] add common function to sleep based on how the test suite is r…
Commit: f2bb002911d669f1b8c07cba5f86c580d4e30bf3 https://github.com/kronosnet/kronosnet/commit/f2bb002911d669f1b8c07cba5f86c5... Author: Thomas Lamprecht t.lamprecht@proxmox.com Date: 2019-10-08 (Tue, 08 Oct 2019)
Changed paths: M man/doxyxml.c
Log Message: ----------- doxyxml: print_param: fix heap-buffer-overflow on read
in read_struct we can get the pi->paramtype assigned with:
pi->paramtype = type?strdup(type):strdup("");
And in print_param we then always check the last character by getting the strlen and subtracting one. But in the case where either type was NULL and we assigned an empty string, or type wasn't NULL but pointed to an empty string, we ran into a read-heap-buffer-overflow, as here strlen is zero and so the first if branch evaluated to
if (pi->paramtype[-1] == '*') {
which isn't valid. Depending on the OS, protection of surrounding area due to said OS or the compiler, this can crash the program.
A similar issue affected the next check, for double pointers, here for all strings with strlen < 2.
To solve this, get the strlen early and check that we cannot underflow before doing the real read.
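The guard described here amounts to checking the length before indexing from the end of the string. A minimal sketch, with helper names that are illustrative and not taken from doxyxml.c:

```c
#include <stdbool.h>
#include <string.h>

/* True if the type string ends in '*', e.g. "char *". */
static bool ends_with_pointer(const char *type)
{
	size_t len = strlen(type);

	/* len >= 1 guarantees type[len - 1] is inside the string. */
	return len >= 1 && type[len - 1] == '*';
}

/* True if the type string ends in "**", e.g. "char **". */
static bool ends_with_double_pointer(const char *type)
{
	size_t len = strlen(type);

	/* len >= 2 guarantees both reads are inside the string. */
	return len >= 2 && type[len - 1] == '*' && type[len - 2] == '*';
}
```

For an empty string both helpers return false without touching any byte before the start of the buffer.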
Signed-off-by: Thomas Lamprecht t.lamprecht@proxmox.com
Commit: f2f1fe9162ca82d45187ab0b26009207932686f0 https://github.com/kronosnet/kronosnet/commit/f2f1fe9162ca82d45187ab0b260092... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-09 (Wed, 09 Oct 2019)
Changed paths: M man/doxyxml.c
Log Message: ----------- Merge pull request #262 from ThomasLamprecht/fix-doxyxml-overflow
doxyxml: print_param: fix heap-buffer-overflow on read
Commit: 38e40998ec6b843218251a2b56cb056d5b9fbc6e https://github.com/kronosnet/kronosnet/commit/38e40998ec6b843218251a2b56cb05... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-09 (Wed, 09 Oct 2019)
Changed paths: M configure.ac M man/Makefile.am
Log Message: ----------- [build] add --with-sanitizers= option for sanitizer builds
this option is strictly meant for runtime debugging purposes. do NOT use in production.
check gcc/clang man pages on how to use ASAN/UBSAN/TSAN.
Also allow users to specify SANITIZERS_CFLAGS and SANITIZERS_LDFLAGS for advanced use.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 7c470fe6fe90d7a523ca0bfc238ed89e0948f940 https://github.com/kronosnet/kronosnet/commit/7c470fe6fe90d7a523ca0bfc238ed8... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-09 (Wed, 09 Oct 2019)
Changed paths: M configure.ac M man/Makefile.am
Log Message: ----------- Merge pull request #263 from kronosnet/runtime-debug
[build] add --with-sanitizers= option for sanitizer builds
Commit: a081ff7fab6bb98f9ff2a88b1593776b0c27b6f1 https://github.com/kronosnet/kronosnet/commit/a081ff7fab6bb98f9ff2a88b159377... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/host.c
Log Message: ----------- [host] rename variables to make it easier to read the code
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 3fb0166ebd14e37a2fb9fe7aeae53d09e0e66b74 https://github.com/kronosnet/kronosnet/commit/3fb0166ebd14e37a2fb9fe7aeae53d... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/host.c
Log Message: ----------- [host] fix defrag buffers reclaim logic
The problem:
- let's assume a 2 nodes (A and B) cluster setup
- node A sends fragmented packets to node B and there is packet loss on the network.
- node B receives all those fragments and attempts to reassemble them.
- node A sends packet seq_num X in Y fragments.
- node B receives only part of the fragments and stores them in a defrag buf.
- packet loss stops.
- node A continues to send packets and a seq_num roll-over takes place.
- node A sends a new packet seq_num X in Y fragments.
- node B gets confused here because the parts of the old packet seq_num X are still stored and the buffer has not been reclaimed.
- node B continues to rebuild packet seq_num X with old stale data and new data from after the roll-over.
- node B completes reassembling the packet and delivers junk to the application.
The solution:
Add a much stronger buffer reclaim logic that will apply on each received packet and not only when defrag buffers are needed, as there might be a mix of fragmented and not fragmented packets in-flight.
The new logic creates a window of N packets that can be handled at the same time (based on the number of buffers) and clears everything else.
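The window idea can be sketched as follows, assuming a 16-bit sequence number that wraps around and a window as wide as the number of buffers. This is a simplified illustration, not the logic in host.c: the struct and function names are hypothetical, and it treats any buffer not within the last N sequence numbers (including ones "ahead" of the latest packet) as stale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical defrag buffer bookkeeping. */
struct defrag_buf {
	bool in_use;
	uint16_t seq_num;
};

/*
 * Distance from seq up to latest, counting forward. Unsigned 16-bit
 * subtraction gives the right answer across a seq_num roll-over.
 */
static uint16_t seq_distance(uint16_t latest, uint16_t seq)
{
	return (uint16_t)(latest - seq);
}

/*
 * Run on every received packet: release each buffer whose sequence
 * number falls outside the window of the last n packets. Returns the
 * number of buffers reclaimed.
 */
static size_t defrag_reclaim(struct defrag_buf *bufs, size_t n, uint16_t latest)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < n; i++) {
		if (bufs[i].in_use &&
		    seq_distance(latest, bufs[i].seq_num) >= n) {
			bufs[i].in_use = false;
			reclaimed++;
		}
	}
	return reclaimed;
}
```

Running the reclaim on every packet, fragmented or not, means a partial packet left over from before a roll-over is cleared long before the same seq_num comes around again.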
Fixes https://github.com/kronosnet/kronosnet/issues/261
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 8b2863b392d275ea50fa19e8e8bebe40a6134707 https://github.com/kronosnet/kronosnet/commit/8b2863b392d275ea50fa19e8e8bebe... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/threads_rx.c
Log Message: ----------- [rx] copy data into the defrag buffer only if we know the size of the frame
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: d39c189900ef1a5647c7264e799f793ab9fd93e2 https://github.com/kronosnet/kronosnet/commit/d39c189900ef1a5647c7264e799f79... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/tests/knet_bench.c
Log Message: ----------- [test] add ability to knet_bench to specify a fixed packet size for perf test
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 34c08ae7e7903d79f06569b4f506a00c15af0238 https://github.com/kronosnet/kronosnet/commit/34c08ae7e7903d79f06569b4f506a0... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/threads_pmtud.c
Log Message: ----------- [PMTUd] invalidate MTU for a link if the value is lower than minimum
Under heavy network load and packet loss, the calculated MTU can be too small. In that case we need to invalidate the link MTU, which removes the link from the rotation (and traffic) and gives PMTUd time to get the right MTU in the next round.
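As a rough illustration of the check, a PMTUd result below the minimum is stored as "no valid MTU" rather than as a usable value. The struct, the function name and the idea of passing the minimum as a parameter are assumptions made for this sketch; the real threshold and state live in the knet sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link MTU state. */
struct link_mtu_state {
	uint32_t mtu;		/* last accepted MTU, 0 when invalid */
	bool has_valid_mtu;	/* false keeps the link out of rotation */
};

/*
 * Store the outcome of a PMTUd run. A value below min_mtu invalidates
 * the link MTU so that the next run can detect it again.
 */
static void pmtud_store_result(struct link_mtu_state *l,
			       uint32_t detected, uint32_t min_mtu)
{
	if (detected < min_mtu) {
		l->mtu = 0;
		l->has_valid_mtu = false;
		return;
	}

	l->mtu = detected;
	l->has_valid_mtu = true;
}
```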
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 292b2e06380d75001f8991648eebf9764c102d24 https://github.com/kronosnet/kronosnet/commit/292b2e06380d75001f8991648eebf9... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-15 (Tue, 15 Oct 2019)
Changed paths: M libknet/host.c M libknet/tests/knet_bench.c M libknet/threads_pmtud.c M libknet/threads_rx.c
Log Message: ----------- Merge pull request #264 from kronosnet/netload-fixes
Netload fixes
Commit: aaea6cbe4b1d2dfc1eba266ec94fd785d6aa28bd https://github.com/kronosnet/kronosnet/commit/aaea6cbe4b1d2dfc1eba266ec94fd7... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths: M libknet/tests/knet_bench.c
Log Message: ----------- [test] add packet verification option to knet_bench
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 42b9b2c80b7ca160ee89f219ef8e66821f9f66bd https://github.com/kronosnet/kronosnet/commit/42b9b2c80b7ca160ee89f219ef8e66... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths: M libknet/tests/knet_bench.c
Log Message: ----------- Merge pull request #265 from kronosnet/netload-fixes
[test] add packet verification option to knet_bench
Commit: 86665cb6f74cffc05affe54e7135f9a5174312cd https://github.com/kronosnet/kronosnet/commit/86665cb6f74cffc05affe54e7135f9... Author: Ferenc Wágner wferi@debian.org Date: 2019-10-18 (Fri, 18 Oct 2019)
Changed paths: M libknet/tests/api_knet_send.c
Log Message: ----------- [test] append newline to knet_send timeout message
Commit: 7f6846dd36c0eef7dc5e824756c320fa9cc4a520 https://github.com/kronosnet/kronosnet/commit/7f6846dd36c0eef7dc5e824756c320... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-18 (Fri, 18 Oct 2019)
Changed paths: M libknet/tests/api_knet_send.c
Log Message: ----------- Merge pull request #266 from kronosnet/wferi/newline
[test] append newline to knet_send timeout message
Commit: ff26b372bd6e31c8e3bf9c1df18debf219a80127 https://github.com/kronosnet/kronosnet/commit/ff26b372bd6e31c8e3bf9c1df18deb... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-18 (Fri, 18 Oct 2019)
Changed paths: M libknet/threads_rx.c
Log Message: ----------- [RX] Discard incoming packets if knet cannot reply back.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 41401bc005f40256af8bfe6c47e51afa4106ac36 https://github.com/kronosnet/kronosnet/commit/41401bc005f40256af8bfe6c47e51a... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-18 (Fri, 18 Oct 2019)
Changed paths: M libknet/tests/api_knet_send.c M libknet/threads_tx.c
Log Message: ----------- [TX] discard too big packets when reading from socketpairs
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: decf2dfdb25f5ff01faf0e4ac67fd31fbe0674cb https://github.com/kronosnet/kronosnet/commit/decf2dfdb25f5ff01faf0e4ac67fd3... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-19 (Sat, 19 Oct 2019)
Changed paths: M libknet/threads_rx.c
Log Message: ----------- [RX] handle short write to the application properly
this change affects only applications that are not using knet generated socketpairs to deliver/receive data to/from knet.
If an application uses a fd that is not SOCK_SEQPACKET (basically streaming), we have to handle short writes accordingly, and knet will continue delivering as long as there is progress.
The application is responsible for verifying that the data packet is complete, as the delivery is not guaranteed to be complete. The application can either embed the size of the packet in its data structure or use the socket error notification callback, which will be invoked in case of errors or 0 data delivery.
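Handling short writes on a stream-style fd usually means looping until everything is written or a call stops making progress. The function below is a generic sketch of that pattern using plain write(); it is not the code in threads_rx.c and its name is made up.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to fd, continuing after short writes as long as each
 * call makes progress. Returns the number of bytes actually written,
 * which is less than len if write() fails or writes 0 bytes.
 */
static size_t write_all_progress(int fd, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n < 0 && errno == EINTR) {
			continue;	/* interrupted, try again */
		}
		if (n <= 0) {
			break;		/* error or no progress, stop */
		}
		done += (size_t)n;
	}
	return done;
}
```

A return value smaller than len is the case where the receiving application has to detect an incomplete packet by itself, for example through a length field in its own header.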
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: d0fb48b50d67ec521f896b0fb879b9bf7a595c65 https://github.com/kronosnet/kronosnet/commit/d0fb48b50d67ec521f896b0fb879b9... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-19 (Sat, 19 Oct 2019)
Changed paths: M libknet/threads_rx.c
Log Message: ----------- [RX] silence defrag buffer expiration debug error
when using active-active links, it is simply too noisy and doesn't provide very useful information.
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 6ecd353a409241ee82a8cf3d72880adb71cd92dd https://github.com/kronosnet/kronosnet/commit/6ecd353a409241ee82a8cf3d72880a... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-23 (Wed, 23 Oct 2019)
Changed paths: M libknet/tests/api_knet_send.c M libknet/threads_rx.c M libknet/threads_tx.c
Log Message: ----------- Merge pull request #268 from kronosnet/netload-fixes
Netload fixes
Commit: 04be484f53f7da59d83aaa4ef9058b8474c01477 https://github.com/kronosnet/kronosnet/commit/04be484f53f7da59d83aaa4ef9058b... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-27 (Sun, 27 Oct 2019)
Changed paths: M configure.ac M libknet/crypto_openssl.c
Log Message: ----------- [build] fix openssl version detection when not using pkg-config
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 7f11ac2c6805667464d31e54212ee038e1278fb1 https://github.com/kronosnet/kronosnet/commit/7f11ac2c6805667464d31e54212ee0... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-27 (Sun, 27 Oct 2019)
Changed paths: M configure.ac M libknet/crypto_openssl.c
Log Message: ----------- Merge pull request #270 from kronosnet/bsd-build-fix
[build] fix openssl version detection when not using pkg-config
Commit: 21472f5d0a91916fa816ae3e3111a6fa91aa4c9a https://github.com/kronosnet/kronosnet/commit/21472f5d0a91916fa816ae3e3111a6... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-29 (Tue, 29 Oct 2019)
Changed paths: M libknet/handle.c
Log Message: ----------- [handle] make sure to unlock config handle on failure
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 5c368dfa49882b9a935c3074dc98d71d0148751e https://github.com/kronosnet/kronosnet/commit/5c368dfa49882b9a935c3074dc98d7... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-10-29 (Tue, 29 Oct 2019)
Changed paths: M libknet/handle.c
Log Message: ----------- Merge pull request #272 from kronosnet/cov-scan-errors
[handle] make sure to unlock config handle on failure
Commit: 93db765f404113b37b4a93501f83988b1a7fe88d https://github.com/kronosnet/kronosnet/commit/93db765f404113b37b4a93501f8398... Author: wferi wferi@debian.org Date: 2019-11-03 (Sun, 03 Nov 2019)
Changed paths: M libknet/handle.c
Log Message: ----------- [handle] fix typo in error log message
Commit: 1b46617722b9351f97582044b57b22bcca3d8f0c https://github.com/kronosnet/kronosnet/commit/1b46617722b9351f97582044b57b22... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-11-04 (Mon, 04 Nov 2019)
Changed paths: M libknet/handle.c
Log Message: ----------- Merge pull request #273 from kronosnet/wferi-patch-1
[handle] fix typo in error log message
Commit: 9b34354b9a1d713e850b80d0432c4a746802b86d https://github.com/kronosnet/kronosnet/commit/9b34354b9a1d713e850b80d0432c4a... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-11-20 (Wed, 20 Nov 2019)
Changed paths: M libknet/tests/int_links_acl_ip.c
Log Message: ----------- [tests] mark array as static
fixes an odd segfault when running the test on ppc when built with clang
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: d50049ca2e36494165701d678d856b4857ae6f9c https://github.com/kronosnet/kronosnet/commit/d50049ca2e36494165701d678d856b... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2019-11-20 (Wed, 20 Nov 2019)
Changed paths: M libknet/tests/int_links_acl_ip.c
Log Message: ----------- Merge pull request #274 from kronosnet/ppc-clang
[tests] mark array as static
Commit: eccdcf5e4354d105ec1877185d226d6820e508cb https://github.com/kronosnet/kronosnet/commit/eccdcf5e4354d105ec1877185d226d... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-22 (Wed, 22 Jan 2020)
Changed paths: M libknet/host.c
Log Message: ----------- [host] use KNET_MAX_HOST_LEN consistently
detected by gcc10
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: b546e216c43bf6e25438fc7c6136bb36e7444b59 https://github.com/kronosnet/kronosnet/commit/b546e216c43bf6e25438fc7c6136bb... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-22 (Wed, 22 Jan 2020)
Changed paths: M libnozzle/internals.h M libnozzle/libnozzle.c
Log Message: ----------- [nozzle] use interface name size consistently and drop strncpy in favour of memmove
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
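The change above can be illustrated with a minimal sketch. It is not the libnozzle code: DEMO_IFNAMSIZ and copy_ifname() are hypothetical names, and the buffer size of 16 is an assumption chosen for the demo. The idea is to check the length against one buffer size and then copy the string together with its terminator, instead of relying on strncpy(), which can leave the destination unterminated when the source is too long.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical buffer size for this demo; the real code uses its own constant. */
#define DEMO_IFNAMSIZ 16

/*
 * Copy an interface name into a fixed-size buffer.
 * Returns 0 on success, or -1 with errno set to ENAMETOOLONG when the
 * name (plus its NUL terminator) does not fit in DEMO_IFNAMSIZ bytes.
 */
int copy_ifname(char dst[DEMO_IFNAMSIZ], const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src);
    if (len >= DEMO_IFNAMSIZ) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* len + 1 copies the terminating NUL as well, so dst is always a valid string. */
    memmove(dst, src, len + 1);
    return 0;
}
```

Because the length is validated before the copy, an over-long name is rejected rather than silently truncated.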
Commit: fa6565c72aa7ecab9d812cd7711dace7e5a5626b https://github.com/kronosnet/kronosnet/commit/fa6565c72aa7ecab9d812cd7711dac... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-22 (Wed, 22 Jan 2020)
Changed paths: M libknet/host.c M libnozzle/internals.h M libnozzle/libnozzle.c
Log Message: ----------- Merge pull request #277 from kronosnet/gcc10
Fix errors detected by gcc10
Commit: d2a30ac0135c15eebce25a89f943e9062e1b3f6c https://github.com/kronosnet/kronosnet/commit/d2a30ac0135c15eebce25a89f943e9... Author: Christine Caulfield ccaulfie@redhat.com Date: 2020-01-24 (Fri, 24 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- [udp] don't make socket spin if a network I/F is down
UDP treats ENETUNREACH as a temporary error and just retries, but this causes the TX thread to spin in sendto(), blocking all other traffic.
(To reproduce this, try starting corosync with 2 links configured in corosync.conf but only one of them configured to the 'right' address - it will spin in a tight loop and need to be killed with -9)
SCTP does not seem to suffer from this.
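The spin described above can be sketched in isolation. This is a simulation only, not the knet transport code: fake_sendto_iface_down() is a hypothetical stand-in for sendto() on a socket whose interface is down, and send_attempts() is a hypothetical send loop. The max_attempts cap exists only so the demo terminates; the loop in the bug report had no such cap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for sendto() when the interface is down:
 * it always fails and reports ENETUNREACH. */
static int fake_sendto_iface_down(void)
{
    errno = ENETUNREACH;
    return -1;
}

/*
 * Try to send one packet and return how many send attempts were made.
 * retry_on_unreach = true models "ENETUNREACH is temporary, retry at once";
 * retry_on_unreach = false models giving up on the packet instead.
 */
int send_attempts(bool retry_on_unreach, int max_attempts)
{
    int attempts = 0;

    assert(max_attempts > 0);

    while (attempts < max_attempts) {
        attempts++;
        if (fake_sendto_iface_down() == 0) {
            break; /* packet sent */
        }
        if (errno == ENETUNREACH && retry_on_unreach) {
            continue; /* immediate retry: this is the tight loop */
        }
        break; /* stop retrying this packet */
    }
    return attempts;
}
```

With the retry policy the loop uses every attempt it is allowed, which is what starves other traffic on the TX thread; without it, the loop exits after a single failed call.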
Commit: e40c7cb51e25af25bc967210e4ec0a7fb5d46b42 https://github.com/kronosnet/kronosnet/commit/e40c7cb51e25af25bc967210e4ec0a... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-24 (Fri, 24 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- Merge pull request #278 from kronosnet/dont-spin-enetunreach
[udp] don't make socket spin if a network I/F is down
Commit: 312db03e67bd0938743b03ccc605a005a64b74ea https://github.com/kronosnet/kronosnet/commit/312db03e67bd0938743b03ccc605a0... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-25 (Sat, 25 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- [udp] simplify code (same logic)
Signed-off-by: Fabio M. Di Nitto fdinitto@redhat.com
Commit: 30a9e0a20e745dd031cd2cd8f19a33c8aa420c61 https://github.com/kronosnet/kronosnet/commit/30a9e0a20e745dd031cd2cd8f19a33... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-27 (Mon, 27 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- Merge pull request #279 from kronosnet/nitpick
[udp] simplify code (same logic)
Commit: b84b48c61368544d14324d6173d3918d24df9837 https://github.com/kronosnet/kronosnet/commit/b84b48c61368544d14324d6173d391... Author: Christine Caulfield ccaulfie@redhat.com Date: 2020-01-30 (Thu, 30 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- [udp] Better fix for -ENETUNREACH
This fix for the ENETUNREACH problem works better than the last one in that it also works with Linux kernels > 5.0.0 (which return -ENETUNREACH) if an interface is brought down, and also on FreeBSD, which returns ENETDOWN.
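One way to cover both platforms named above is to classify the error code in a single helper. The sketch below is an assumption about the general approach, not the code from the commit: is_iface_down_errno() and demo_self_check() are hypothetical names, and accepting negated values is a convenience added for the demo.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Return true when err indicates that the outgoing interface or network
 * is unavailable. Both the positive errno form and the negated form
 * (for example -ENETUNREACH) are accepted.
 */
bool is_iface_down_errno(int err)
{
    if (err < 0) {
        err = -err;
    }
    return err == ENETUNREACH || err == ENETDOWN;
}

/* Small usage example: which codes count as "interface down". */
void demo_self_check(void)
{
    assert(is_iface_down_errno(ENETUNREACH));  /* reported on Linux per the log */
    assert(is_iface_down_errno(ENETDOWN));     /* reported on FreeBSD per the log */
    assert(!is_iface_down_errno(EAGAIN));      /* ordinary transient error */
}
```

Keeping the platform differences in one predicate means the send path only has to ask a single question before deciding whether to retry.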
Commit: 49e3b23c0f536c2dad65ed4661dd819d6b06f680 https://github.com/kronosnet/kronosnet/commit/49e3b23c0f536c2dad65ed4661dd81... Author: Fabio M. Di Nitto fdinitto@redhat.com Date: 2020-01-30 (Thu, 30 Jan 2020)
Changed paths: M libknet/transport_udp.c
Log Message: ----------- Merge pull request #282 from kronosnet/better-eunreach-patch
[udp] Better fix for -ENETUNREACH
Commit: dc749a2017cfc87cec54501607b2b3a46c0a38bc https://github.com/kronosnet/kronosnet/commit/dc749a2017cfc87cec54501607b2b3... Author: Chrissie Caulfield ccaulfie@redhat.com Date: 2020-01-30 (Thu, 30 Jan 2020)
Changed paths: M configure.ac M libknet/Makefile.am M libknet/common.c M libknet/compat.c M libknet/crypto.c M libknet/crypto_model.h M libknet/crypto_nss.c M libknet/crypto_openssl.c M libknet/handle.c M libknet/host.c M libknet/internals.h M libknet/libknet.h M libknet/links.c M libknet/links.h A libknet/onwire.c M libknet/onwire.h M libknet/tests/Makefile.am M libknet/tests/api-check.mk A libknet/tests/api_knet_handle_pmtud_set.c M libknet/tests/api_knet_link_set_enable.c M libknet/tests/api_knet_send.c M libknet/tests/api_knet_send_crypto.c A libknet/tests/fun_pmtud_crypto.c M libknet/tests/int_links_acl_ip.c M libknet/tests/knet_bench.c M libknet/tests/test-common.c M libknet/tests/test-common.h M libknet/threads_common.c M libknet/threads_heartbeat.c M libknet/threads_pmtud.c M libknet/threads_rx.c M libknet/threads_tx.c M libknet/transport_sctp.c M libknet/transport_udp.c M libnozzle/internals.h M libnozzle/libnozzle.c M libnozzle/tests/test-common.c M man/Makefile.am M man/doxyxml.c
Log Message: ----------- Merge branch 'master' into doxycov
Compare: https://github.com/kronosnet/kronosnet/compare/f973f098a235...dc749a2017cf