Branch: refs/heads/latency-fixes
Home: https://github.com/kronosnet/kronosnet
Commit: 0f67ee86745d52d68f376c92e96e1dd6661e9f5d
https://github.com/kronosnet/kronosnet/commit/0f67ee86745d52d68f376c92e96e1…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-09-06 (Fri, 06 Sep 2019)
Changed paths:
M libknet/threads_heartbeat.c
M libknet/threads_rx.c
Log Message:
-----------
[links] stabilize latency calculation when nodes are not responsive
The following scenario is more of a corner case than the norm, but
this change handles it better:
1) 2 nodes cluster (corosync) (node A and node B)
2) kill -stop $(pidof corosync) on node A
3) node B will continue to send ping packets to node A
4) node A is accumulating those ping packets in the kernel network socket
5) wait some seconds and unpause node A
6) node A will start processing the ping packets in the queue
and send pong replies to node B
7) node B will see an extreme increase in latency due to
those "obsoleted" ping/pong packets
8) node B, as latency increases, will take longer and longer
to notice that node A is down due to the pong_timeout adjustment
for latency (required for initial cluster spike).
The solution:
1) Use average latency, rather than latency_max, to calculate pong_timeout_adj.
Average latency will go down again over time, while latency_max is never
reset.
2) RX thread will filter out all pong packets that have higher latency
than the currently configured pong_timeout. This barrier should have
been in place all along.
This solution reduces the latency spike on node B to a perfectly
reasonable level, and it will eventually stabilize over time
as the number of latency samples increases and the average latency drops.
Please be aware that a pong_timeout smaller than the link latency will
now simply cause the link to be marked down.
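
The two parts of the solution can be illustrated with a minimal sketch. This is not the libknet code: the struct, the function names, the plain cumulative average, and the formula pong_timeout + average latency for the adjusted timeout are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-link state; field names are illustrative, not libknet's. */
struct link_latency {
    uint64_t pong_timeout_usec; /* configured pong timeout */
    uint64_t latency_avg_usec;  /* running average of accepted samples */
    uint64_t latency_max_usec;  /* highest accepted sample, never reset */
    uint64_t samples;           /* number of accepted samples */
};

/* RX side: a pong is usable only if its round trip fits in pong_timeout. */
static bool pong_is_fresh(const struct link_latency *l, uint64_t rtt_usec)
{
    assert(l != NULL);
    return rtt_usec <= l->pong_timeout_usec;
}

/*
 * Record one pong. Stale pongs (for example those queued while the peer
 * was stopped) are dropped and do not touch the statistics.
 * The cumulative average used here is an assumption of this sketch.
 */
static bool record_pong(struct link_latency *l, uint64_t rtt_usec)
{
    if (!pong_is_fresh(l, rtt_usec)) {
        return false;
    }
    l->latency_avg_usec =
        (l->latency_avg_usec * l->samples + rtt_usec) / (l->samples + 1);
    l->samples++;
    if (rtt_usec > l->latency_max_usec) {
        l->latency_max_usec = rtt_usec;
    }
    return true;
}

/*
 * Heartbeat side: derive the adjusted timeout from the average, which can
 * recover over time, instead of from the maximum, which cannot.
 * The exact formula is assumed, not taken from the patch.
 */
static uint64_t pong_timeout_adj_usec(const struct link_latency *l)
{
    assert(l != NULL);
    return l->pong_timeout_usec + l->latency_avg_usec;
}
```

With a pong_timeout of 2000000 usec, accepted samples of 1000 and 3000 usec give an average of 2000 usec, while a pong that arrives 5000000 usec late is rejected and leaves both the average and the maximum unchanged.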
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Branch: refs/heads/stable1-proposed
Home: https://github.com/kronosnet/kronosnet
Commit: c798671216f6c8ecbc9bd35a8fce51d95d3ff21c
https://github.com/kronosnet/kronosnet/commit/c798671216f6c8ecbc9bd35a8fce5…
Author: Jan Friesse <jfriesse(a)redhat.com>
Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths:
M libknet/compat.c
M libknet/crypto.c
M libknet/onwire.c
Log Message:
-----------
[common] Include correct errno.h
sys/errno.h is a system-specific path; errno.h should be used instead.
Signed-off-by: Jan Friesse <jfriesse(a)redhat.com>
Commit: f907ee45485ee05a61c5c227e354a7b2a07ff47c
https://github.com/kronosnet/kronosnet/commit/f907ee45485ee05a61c5c227e354a…
Author: Jan Friesse <jfriesse(a)redhat.com>
Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths:
M configure.ac
M libknet/common.c
Log Message:
-----------
[common] Conditionalize RTLD_DI_ORIGIN
RTLD_DI_ORIGIN is used to get the absolute path of a plugin. It is used only
for logging useful info and is not strictly needed, so use it only when it
is defined (musl is the only libc known to the author of the patch where it is not).
Signed-off-by: Jan Friesse <jfriesse(a)redhat.com>
Commit: eba56bb9a905d1cfa23faf2fbb0defb525e7b373
https://github.com/kronosnet/kronosnet/commit/eba56bb9a905d1cfa23faf2fbb0de…
Author: Jan Friesse <jfriesse(a)redhat.com>
Date: 2019-09-03 (Tue, 03 Sep 2019)
Changed paths:
M libknet/handle.c
M libknet/internals.h
Log Message:
-----------
[handle] Set thread stack size on create
Musl libc has a small default stack size for threads. Knet needs ~300KiB (tested
at the time when this patch was created). Glibc seems to use ~8MiB. As a
compromise, 1MiB is used.
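
Setting an explicit stack size at thread creation is done through POSIX thread attributes. The following is a minimal sketch of that technique, not the code from this patch: the macro name, the helper, and the worker function are hypothetical, and only the 1MiB value comes from the message above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* 1 MiB, the compromise value named in the commit message (macro name is hypothetical). */
#define SKETCH_THREAD_STACK_SIZE ((size_t)1024 * 1024)

/* Trivial worker used only to show that the thread actually runs. */
static void *sketch_worker(void *arg)
{
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/*
 * Create a thread with an explicit stack size instead of relying on the
 * libc default. Returns 0 on success or a pthread error number.
 */
static int create_thread_with_stack(pthread_t *thread, size_t stack_size,
                                    void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err;

    assert(thread != NULL && fn != NULL);

    err = pthread_attr_init(&attr);
    if (err != 0) {
        return err;
    }

    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0) {
        err = pthread_create(thread, &attr, fn, arg);
    }

    /* The attribute object is no longer needed once the thread is created. */
    pthread_attr_destroy(&attr);
    return err;
}
```

A caller would pass SKETCH_THREAD_STACK_SIZE and sketch_worker to create_thread_with_stack() and then join the thread with pthread_join(). Requesting the size explicitly keeps the behaviour the same across libc implementations, whatever their defaults are.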
Signed-off-by: Jan Friesse <jfriesse(a)redhat.com>
Compare: https://github.com/kronosnet/kronosnet/compare/7a836938cc27...eba56bb9a905