Branch: refs/heads/rhel8
Home:
https://github.com/kronosnet/kronosnet
Commit: f1a5de2141a73716c09566f294e3873add5c3ff3
https://github.com/kronosnet/kronosnet/commit/f1a5de2141a73716c09566f294e38…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/links.c
Log Message:
-----------
[links] fix memory corruption of link structure
The index would overflow the buffer and overwrite data in the link
structure. Depending on what was written, the cluster could fall
apart in many ways, from crashing to hanging.
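The class of bug described above can be illustrated with a minimal sketch. The struct, field names and array size below (`link_sketch`, `latency_samples`, `LATENCY_SAMPLES`) are hypothetical and do not reproduce the actual libknet link structure or the exact fix. The point is only that an index into a fixed-size array embedded in a struct must stay strictly below the array size; otherwise writes land on whatever fields follow the array.

```c
#include <stddef.h>
#include <stdint.h>

#define LATENCY_SAMPLES 8 /* hypothetical array size */

/* Hypothetical structure: an array followed by other link state. */
struct link_sketch {
	uint64_t latency_samples[LATENCY_SAMPLES];
	size_t   latency_index; /* next slot to write */
	int      status;        /* would be clobbered by an overflowing write */
};

/*
 * Record a sample in a circular buffer. The index is wrapped with
 * modulo arithmetic, so it always stays in [0, LATENCY_SAMPLES) and
 * the write can never spill into latency_index or status.
 */
static void record_latency(struct link_sketch *link, uint64_t sample)
{
	link->latency_samples[link->latency_index] = sample;
	link->latency_index = (link->latency_index + 1) % LATENCY_SAMPLES;
}
```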
Fixes:
https://github.com/kronosnet/kronosnet/issues/255
Thanks to the Proxmox developers and community for reporting the issue
and for all the help reproducing / debugging the problem.
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: b67c63101246b400c7512cb1adbc590ac06cb6ee
https://github.com/kronosnet/kronosnet/commit/b67c63101246b400c7512cb1adbc5…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/crypto.c
Log Message:
-----------
[crypto] fix log information
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: a89c2cd6d3863abe0f3ae0165239177a7461ee5e
https://github.com/kronosnet/kronosnet/commit/a89c2cd6d3863abe0f3ae01652391…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/transport_udp.c
Log Message:
-----------
[udp] log information about detected kernel MTU
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 650ef6d26e83dd7827b2e913c52a1fac67ea60d4
https://github.com/kronosnet/kronosnet/commit/650ef6d26e83dd7827b2e913c52a1…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[docs] add knet packet layout
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: dbed772f0cb9070826eac6524646bd2ea7cce8c0
https://github.com/kronosnet/kronosnet/commit/dbed772f0cb9070826eac6524646b…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[PMTUd] fix MTU calculation when using crypto and add docs
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: a9460c72fafe452b7cb584598aa43a87b44428f0
https://github.com/kronosnet/kronosnet/commit/a9460c72fafe452b7cb584598aa43…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/Makefile.am
M libknet/crypto.c
M libknet/crypto_model.h
M libknet/crypto_nss.c
M libknet/crypto_openssl.c
M libknet/internals.h
M libknet/links.c
A libknet/onwire.c
M libknet/onwire.h
M libknet/tests/Makefile.am
M libknet/tests/api_knet_send_crypto.c
A libknet/tests/fun_pmtud_crypto.c
M libknet/threads_common.c
M libknet/threads_pmtud.c
Log Message:
-----------
[PMTUd] rework the whole math to calculate MTU
internal changes:
- drop the concept of sec_header_size that was completely wrong
and unnecessary
- bump crypto API to version 3 due to the above change
- clarify the difference between link->proto_overhead and
link->status->proto_overhead. We cannot rename the status
one as it would also change ABI.
- add onwire.c with documentation on the packet format
and what various len(s) mean in context.
- add 3 new functions to calculate MTUs back and forth
and use them around, hopefully with enough clarification
on why things are done in a given way.
- heavily change thread_pmtud.c to use those new facilities.
- fix major calculation issues when using crypto (non-crypto
was not affected by the problem).
- fix checks around to make sure they match the new math.
- fix padding calculation.
- add functional PMTUd crypto test
this test can take several hours (12+) and should be executed
in a controlled environment, since it automatically changes the
loopback MTU to run tests.
- fix the way the lowest MTU is calculated during a PMTUd run
to avoid spurious double notifications.
- drop redundant checks.
user visible changes:
- Global MTU is now calculated properly when using crypto,
and values will generally be larger than before, because the
previous implementation calculated padding incorrectly.
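To illustrate why padding matters when converting between link MTU and data MTU, here is a minimal sketch. It assumes a block cipher with PKCS#7-style padding (which always adds between 1 and `block` bytes), a fixed per-packet crypto overhead outside the padded region (e.g. crypto header plus HMAC), and a knet header that is encrypted together with the data. All names and numbers are hypothetical; these are not the functions added in libknet/onwire.c.

```c
#include <stddef.h>

/*
 * Ciphertext length for `plain` bytes with PKCS#7-style padding:
 * always rounds up to the next full block, adding 1..block bytes.
 */
static size_t padded_len(size_t plain, size_t block)
{
	if (block <= 1)
		return plain; /* stream cipher or no padding */
	return (plain / block + 1) * block;
}

/*
 * Largest plaintext whose padded ciphertext fits in `space` bytes.
 * Returns 0 when not even one full block fits.
 */
static size_t max_plain_len(size_t space, size_t block)
{
	if (block <= 1)
		return space;
	if (space < block)
		return 0;
	return (space / block) * block - 1;
}

/*
 * Data MTU available to the application, given the bytes left on the
 * link after IP/UDP headers, the crypto overhead that is not padded,
 * the cipher block size and the (encrypted) knet header size.
 * Returns 0 if nothing fits.
 */
static size_t data_mtu(size_t link_payload, size_t crypto_overhead,
		       size_t block, size_t knet_header)
{
	size_t plain;

	if (link_payload <= crypto_overhead)
		return 0;
	plain = max_plain_len(link_payload - crypto_overhead, block);
	if (plain <= knet_header)
		return 0;
	return plain - knet_header;
}
```

Going the other way, `padded_len(data + knet_header, block) + crypto_overhead` must not exceed the link payload; keeping both directions in small helpers makes that invariant easy to check.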
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 499f589404db791d8e68c84c8ba3a857aeea5083
https://github.com/kronosnet/kronosnet/commit/499f589404db791d8e68c84c8ba3a…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/internals.h
M libknet/links.c
M libknet/links.h
M libknet/threads_pmtud.c
Log Message:
-----------
[PMTUd] add dynamic pong timeout when using crypto
Problem originally reported by the Proxmox community: users
observed that, under pressure, the MTU would flap back and forth
between 2 values due to response timeouts from the other node.
Implement a dynamic timeout multiplier when using crypto, which
should solve the problem in a more flexible fashion.
When a timeout hits, the new log messages look like this:
[knet]: [info] host: host: 1 (passive) best link: 0 (pri: 0)
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (4) for host 1 link: 0
[knet]: [info] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 65429
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [info] pmtud: Global data MTU changed to: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (8) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (16) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (32) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (128) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
And when latency drops and it is safe to be more responsive again:
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Decreasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
....
Testing this patch on normal hosts is a bit challenging, though.
The patch was tested by hardcoding a very low timeout here:
diff --git a/libknet/threads_pmtud.c b/libknet/threads_pmtud.c
index 4f0ba0f..5e2b89b 100644
--- a/libknet/threads_pmtud.c
+++ b/libknet/threads_pmtud.c
@@ -261,7 +271,8 @@ retry:
/*
* crypto, under pressure, is a royal PITA
*/
- pong_timeout_adj_tmp = dst_link->pong_timeout_adj * 2;
+ //pong_timeout_adj_tmp = dst_link->pong_timeout_adj * dst_link->pmtud_crypto_timeout_multiplier;
+ pong_timeout_adj_tmp = 30 * dst_link->pmtud_crypto_timeout_multiplier;
} else {
pong_timeout_adj_tmp = dst_link->pong_timeout_adj;
}
and using a long-running version of api_knet_send_crypto_test with a short PMTUd setfreq
(10 sec).
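The multiplier behaviour visible in the logs (doubling on each timeout, halving once replies are fast again) can be sketched as follows. The bounds 2 and 128 are assumptions chosen to match the old fixed `* 2` and the highest value in the logs; the rule for decreasing (the reply arrived within half of the timeout the next lower multiplier would give) is also an assumption. Names are hypothetical, not the libknet implementation.

```c
#include <stdint.h>

#define MULT_MIN 2   /* assumed lower bound (matches the old fixed "* 2") */
#define MULT_MAX 128 /* assumed upper bound (highest value in the logs) */

struct pmtud_timeout_sketch {
	uint64_t     base_timeout_ms; /* hypothetical non-crypto pong timeout */
	unsigned int multiplier;      /* starts at MULT_MIN */
};

static uint64_t current_timeout_ms(const struct pmtud_timeout_sketch *t)
{
	return t->base_timeout_ms * t->multiplier;
}

/* A probe timed out: back off by doubling, capped at MULT_MAX. */
static void on_timeout(struct pmtud_timeout_sketch *t)
{
	if (t->multiplier < MULT_MAX)
		t->multiplier *= 2;
}

/*
 * A reply arrived after rtt_ms. If it would comfortably (within half)
 * have fit in the timeout of the next lower multiplier, become more
 * responsive by halving, down to MULT_MIN.
 */
static void on_reply(struct pmtud_timeout_sketch *t, uint64_t rtt_ms)
{
	if (t->multiplier > MULT_MIN &&
	    rtt_ms * 2 < t->base_timeout_ms * (t->multiplier / 2))
		t->multiplier /= 2;
}
```

Doubling on failure and halving on clear success keeps the timeout tight on healthy links while still tolerating crypto-induced latency under load.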
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 5f3476849523e9ee486481b429b471a1ab3cac20
https://github.com/kronosnet/kronosnet/commit/5f3476849523e9ee486481b429b47…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/handle.c
Log Message:
-----------
[handle] make sure that the pmtud buf contains at least knet header size
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 3b3b6d2a7e1fee7eb41c6bacc1005ff90f7dd5cb
https://github.com/kronosnet/kronosnet/commit/3b3b6d2a7e1fee7eb41c6bacc1005…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/tests/knet_bench.c
Log Message:
-----------
[tests] fix knet_bench coverity errors
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: d74380a82c00716aafb780f5602182fce90d381f
https://github.com/kronosnet/kronosnet/commit/d74380a82c00716aafb780f560218…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[PMTUd] do not double unlock global read lock
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 01242c683b18b813a67c13d3fc0546fec34f9f7c
https://github.com/kronosnet/kronosnet/commit/01242c683b18b813a67c13d3fc054…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[pmtud] switch to use async version of dstcache update due to locking context (read vs write)
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: a70f0adf0d4d38ed614bf2eef1a4e66fec2f2c92
https://github.com/kronosnet/kronosnet/commit/a70f0adf0d4d38ed614bf2eef1a4e…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libnozzle/tests/test-common.c
Log Message:
-----------
[tests] fix ip generation boundaries
https://ci.kronosnet.org/job/knet-build-all-voting/1450/knet-build-all-voti…
and similar: when pid = 255, the secondary IP would hit 256, which is of course invalid.
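A minimal sketch of keeping PID-derived address octets valid: the last octet is folded into 1..253, so adding 1 for the secondary address yields at most 254 and never 255 or 256. The subnet prefix, the function names and the exact folding are hypothetical; the actual fix in test-common.c may differ.

```c
#include <stdio.h>

#define TEST_NET_PREFIX "192.168.1." /* hypothetical test subnet */

/* Fold a PID into [1, 253] so that octet + 1 is at most 254. */
static unsigned int test_ip_octet(unsigned int pid)
{
	return pid % 253 + 1;
}

/* Build primary and secondary test addresses from a PID. */
static void make_test_ips(unsigned int pid, char *primary,
			  char *secondary, size_t len)
{
	unsigned int octet = test_ip_octet(pid);

	snprintf(primary, len, TEST_NET_PREFIX "%u", octet);
	snprintf(secondary, len, TEST_NET_PREFIX "%u", octet + 1);
}
```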
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 63567e1e6b6ebb91fe1df43b910d6b9bd78d528f
https://github.com/kronosnet/kronosnet/commit/63567e1e6b6ebb91fe1df43b910d6…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[PMTUd] invalidate MTU for a link if the value is lower than minimum
Under heavy network load and packet loss, the calculated MTU can be
too small. In that case, we need to invalidate the link MTU, which
removes the link from the rotation (and traffic) and gives PMTUd
time to find the right MTU in the next round.
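A minimal sketch of such a check, assuming a hypothetical `link_state` struct, a hypothetical minimum `MIN_LINK_MTU` (576 is only an example value, not knet's constant), and the convention that an MTU of 0 marks the link as unusable.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_LINK_MTU 576 /* hypothetical minimum, example value only */

struct link_state {
	size_t mtu;    /* 0 means "unknown / invalid" */
	bool   usable; /* whether the link is in the TX rotation */
};

/*
 * Apply the result of a PMTUd round. A value below the minimum is
 * treated as a measurement artifact (load, packet loss): the MTU is
 * invalidated and the link leaves the rotation until a later round
 * produces a sane value.
 */
static void apply_pmtud_result(struct link_state *link, size_t measured)
{
	if (measured < MIN_LINK_MTU) {
		link->mtu = 0;
		link->usable = false;
		return;
	}
	link->mtu = measured;
	link->usable = true;
}
```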
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: db21da87bba6017c8343f9c6f255b21813ffd5d0
https://github.com/kronosnet/kronosnet/commit/db21da87bba6017c8343f9c6f255b…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/host.c
Log Message:
-----------
[host] rename variables to make it easier to read the code
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 1e473cf26d55c2b6ff8d5bfaa5aa689554de803c
https://github.com/kronosnet/kronosnet/commit/1e473cf26d55c2b6ff8d5bfaa5aa6…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/host.c
Log Message:
-----------
[host] fix defrag buffers reclaim logic
The problem:
- assume a two-node (A and B) cluster setup
- node A sends fragmented packets to node B and there is
packet loss on the network.
- node B receives all those fragments and attempts to
reassemble them.
- node A sends packet seq_num X in Y fragments.
- node B receives only part of the fragments and stores
them in a defrag buf.
- packet loss stops.
- node A continues to send packets and a seq_num
roll-over takes place.
- node A sends a new packet seq_num X in Y fragments.
- node B gets confused here because the parts of the old
packet seq_num X are still stored and the buffer
has not been reclaimed.
- node B continues to rebuild packet seq_num X with
old stale data and new data from after the roll-over.
- node B completes reassembling the packet and delivers
junk to the application.
The solution:
Add much stronger buffer reclaim logic that applies to each
received packet, not only when defrag buffers are needed,
since there might be a mix of fragmented and non-fragmented
packets in flight.
The new logic creates a window of N packets that can be
handled at the same time (based on the number of buffers)
and clears everything else.
Fixes:
https://github.com/kronosnet/kronosnet/issues/261
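The windowing idea can be sketched with 16-bit sequence numbers and a small fixed pool of defrag buffers. On every received packet, any in-use buffer whose sequence number is not among the last N sequence numbers (computed with wrap-around arithmetic) is cleared, so stale fragments from before a roll-over cannot be merged with new ones. The buffer count, the struct and the exact window rule are simplifying assumptions for illustration (for instance, this sketch does not try to keep packets that arrive ahead of the current one), not the code in libknet/host.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEFRAG_BUFS 4 /* hypothetical number of defrag buffers = window size */

struct defrag_buf_sketch {
	bool     in_use;
	uint16_t seq_num;
	uint8_t  frags_received;
};

/* Distance from `older` to `newer` modulo 2^16 (handles roll-over). */
static uint16_t seq_distance(uint16_t newer, uint16_t older)
{
	return (uint16_t)(newer - older);
}

/*
 * Called for EVERY received packet, fragmented or not. Keeps only the
 * buffers whose seq_num lies in the window (seq - DEFRAG_BUFS, seq];
 * everything else is considered stale and is reset.
 */
static void reclaim_defrag_bufs(struct defrag_buf_sketch bufs[DEFRAG_BUFS],
				uint16_t seq)
{
	for (int i = 0; i < DEFRAG_BUFS; i++) {
		if (!bufs[i].in_use)
			continue;
		if (seq_distance(seq, bufs[i].seq_num) >= DEFRAG_BUFS)
			memset(&bufs[i], 0, sizeof(bufs[i]));
	}
}
```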
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: 5bd88ebd63af20577095c2c98975f0f1781ba46a
https://github.com/kronosnet/kronosnet/commit/5bd88ebd63af20577095c2c98975f…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/threads_rx.c
Log Message:
-----------
[rx] copy data into the defrag buffer only if we know the size of the frame
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: cd59986900510119d8e7b63d33ad35466d480858
https://github.com/kronosnet/kronosnet/commit/cd59986900510119d8e7b63d33ad3…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/tests/knet_bench.c
Log Message:
-----------
[test] add ability to knet_bench to specify a fixed packet size for perf test
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Commit: e28e2ea7c7e8139a6792ec1508215d4560b53e65
https://github.com/kronosnet/kronosnet/commit/e28e2ea7c7e8139a6792ec1508215…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-10-16 (Wed, 16 Oct 2019)
Changed paths:
M libknet/tests/knet_bench.c
Log Message:
-----------
[test] add packet verification option to knet_bench
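A common way to implement such verification is to fill each payload with a pattern derived from a per-packet sequence number and have the receiver regenerate and compare it. The layout below (a 32-bit sequence number in host byte order followed by a byte pattern) and the pattern function are hypothetical; this is not the format knet_bench actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pattern byte at offset i for packet `seq` (hypothetical scheme). */
static uint8_t pattern_byte(uint32_t seq, size_t i)
{
	return (uint8_t)(seq * 31u + i);
}

/*
 * Sender: store seq (host byte order, so both ends are assumed to
 * share endianness) followed by the pattern. len must be >= 4.
 */
static void fill_packet(uint8_t *buf, size_t len, uint32_t seq)
{
	memcpy(buf, &seq, sizeof(seq));
	for (size_t i = sizeof(seq); i < len; i++)
		buf[i] = pattern_byte(seq, i);
}

/* Receiver: regenerate the pattern from the embedded seq and compare. */
static bool verify_packet(const uint8_t *buf, size_t len)
{
	uint32_t seq;

	if (len < sizeof(seq))
		return false;
	memcpy(&seq, buf, sizeof(seq));
	for (size_t i = sizeof(seq); i < len; i++)
		if (buf[i] != pattern_byte(seq, i))
			return false;
	return true;
}
```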
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Compare:
https://github.com/kronosnet/kronosnet/compare/f1a5de2141a7%5E...e28e2ea7c7…