Branch: refs/heads/lock-fix
Home: https://github.com/kronosnet/kronosnet
Commit: f45e4c67902b95bcd212275f5f6081fa31311793
https://github.com/kronosnet/kronosnet/commit/f45e4c67902b95bcd212275f5f608…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-09-09 (Mon, 09 Sep 2019)
Changed paths:
M libknet/threads_pmtud.c
Log Message:
-----------
[pmtud] switch to use async version of dstcache update due to locking context (read vs write)
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Branch: refs/heads/stable1-proposed
Home: https://github.com/kronosnet/kronosnet
Commit: 4df82e5fd847423b164f4fba70e20fd0026639ce
https://github.com/kronosnet/kronosnet/commit/4df82e5fd847423b164f4fba70e20…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-09-09 (Mon, 09 Sep 2019)
Changed paths:
M libknet/threads_heartbeat.c
M libknet/threads_rx.c
Log Message:
-----------
[links] stabilize latency calculation when nodes are not responsive
The following scenario is more of a corner case than normal, but
this change handles it better:
1) 2-node cluster (corosync) (node A and node B)
2) kill -stop $(pidof corosync) on node A
3) node B will continue to send ping packets to node A
4) node A is accumulating those ping packets in the kernel network socket
5) wait some seconds and unpause node A
6) node A will start processing the ping packets in the queue
and send pong replies to node B
7) node B will see an extreme increase of latency due
to those "obsoleted" ping/pong packets
8) node B, as latency increases, will take longer and longer
to notice that node A is down due to the pong_timeout adjustment
for latency (required for initial cluster spike).
The solution:
1) Use average latency to calculate pong_timeout_adj vs latency_max.
Average latency will go down again in time, while latency_max is never
reset.
2) RX thread will filter out all pong packets that have higher latency
than the currently configured pong_timeout. This barrier should have
been in place even before.
This solution reduces the latency spike on node B to a perfectly
reasonable level, and latency will eventually stabilize over time
as more samples are collected and the average decreases.
Please be aware that using a pong_timeout smaller than latency will
simply mark the link down now.
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
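For illustration only, a minimal C sketch of point 1 of the solution
described in the log message above. This is not the code changed in
libknet/threads_heartbeat.c: the struct, field and function names are
hypothetical, and the cumulative mean is an assumed averaging scheme. It
only shows why a timeout adjustment derived from an average recovers after
a burst of stale pongs, while one derived from a never-reset maximum does
not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-link statistics; names are illustrative, not libknet's. */
struct link_stats {
    uint64_t latency_avg_us; /* running average, recovers over time */
    uint64_t latency_max_us; /* high-water mark, never reset */
    uint64_t samples;        /* number of samples folded in so far */
};

/* Fold one latency sample (microseconds) into the statistics. */
static void link_add_sample(struct link_stats *s, uint64_t latency_us)
{
    assert(s != NULL);

    s->samples++;
    /* Cumulative mean with integer arithmetic (assumed scheme). */
    s->latency_avg_us =
        (s->latency_avg_us * (s->samples - 1) + latency_us) / s->samples;

    if (latency_us > s->latency_max_us) {
        s->latency_max_us = latency_us;
    }
}

/* Adjusted timeout based on the average: shrinks again after a spike. */
static uint64_t timeout_from_avg(uint64_t pong_timeout_us,
                                 const struct link_stats *s)
{
    return pong_timeout_us + s->latency_avg_us;
}

/* Adjusted timeout based on the maximum: stays inflated forever. */
static uint64_t timeout_from_max(uint64_t pong_timeout_us,
                                 const struct link_stats *s)
{
    return pong_timeout_us + s->latency_max_us;
}
```

Feeding a single multi-second sample followed by normal samples leaves
latency_max_us pinned at the spike, while latency_avg_us, and with it the
value returned by timeout_from_avg(), drifts back down with every new
sample.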
Branch: refs/heads/master
Home: https://github.com/kronosnet/kronosnet
Commit: 0f67ee86745d52d68f376c92e96e1dd6661e9f5d
https://github.com/kronosnet/kronosnet/commit/0f67ee86745d52d68f376c92e96e1…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-09-06 (Fri, 06 Sep 2019)
Changed paths:
M libknet/threads_heartbeat.c
M libknet/threads_rx.c
Log Message:
-----------
[links] stabilize latency calculation when nodes are not responsive
The following scenario is more of a corner case than normal, but
this change handles it better:
1) 2-node cluster (corosync) (node A and node B)
2) kill -stop $(pidof corosync) on node A
3) node B will continue to send ping packets to node A
4) node A is accumulating those ping packets in the kernel network socket
5) wait some seconds and unpause node A
6) node A will start processing the ping packets in the queue
and send pong replies to node B
7) node B will see an extreme increase of latency due
to those "obsoleted" ping/pong packets
8) node B, as latency increases, will take longer and longer
to notice that node A is down due to the pong_timeout adjustment
for latency (required for initial cluster spike).
The solution:
1) Use average latency to calculate pong_timeout_adj vs latency_max.
Average latency will go down again in time, while latency_max is never
reset.
2) RX thread will filter out all pong packets that have higher latency
than the currently configured pong_timeout. This barrier should have
been in place even before.
This solution reduces the latency spike on node B to a perfectly
reasonable level, and latency will eventually stabilize over time
as more samples are collected and the average decreases.
Please be aware that using a pong_timeout smaller than latency will
simply mark the link down now.
Signed-off-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
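For illustration only, a minimal C sketch of the RX-side barrier from
point 2 of the log message above. It is not the code changed in
libknet/threads_rx.c: the function names, the microsecond unit and the use
of struct timespec timestamps are assumptions. A pong whose round-trip
time exceeds the configured pong_timeout is rejected, so it can neither
refresh the link nor feed the latency statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Elapsed microseconds between two timestamps taken from the same
 * monotonic clock. Returns 0 if 'end' is earlier than 'start'.
 */
static uint64_t elapsed_us(const struct timespec *start,
                           const struct timespec *end)
{
    assert(start != NULL && end != NULL);

    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

    return ns < 0 ? 0 : (uint64_t)(ns / 1000);
}

/*
 * Hypothetical RX-side check: accept a pong only if its latency does not
 * exceed the configured pong timeout. Stale pongs, for example those
 * generated from pings queued while the peer was stopped, are dropped.
 */
static bool pong_is_acceptable(const struct timespec *ping_sent,
                               const struct timespec *pong_received,
                               uint64_t pong_timeout_us)
{
    return elapsed_us(ping_sent, pong_received) <= pong_timeout_us;
}
```

With a 2 second timeout, a pong arriving 0.5 seconds after its ping passes
the check, while one answering a ping that sat in the peer's socket queue
for 8 seconds is discarded.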
Commit: 28e2b563e8acb0ac0eeb7a4c39efc6e7bf54ec53
https://github.com/kronosnet/kronosnet/commit/28e2b563e8acb0ac0eeb7a4c39efc…
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
Date: 2019-09-09 (Mon, 09 Sep 2019)
Changed paths:
M libknet/threads_heartbeat.c
M libknet/threads_rx.c
Log Message:
-----------
Merge pull request #251 from kronosnet/latency-fixes
[links] stabilize latency calculation when nodes are not responsive
Compare: https://github.com/kronosnet/kronosnet/compare/512e433b0b3d...28e2b563e8ac