Hey Chrissie,
I found the bugs in totemknet that made knet so slow, and I am now getting up to 50% more cpgbench performance (19 MB/sec on 64-byte packets) vs udpu (13 MB/sec).
The first bug is in the dst_host_filter. This is not your fault: your thinking was sound, but it didn't do the right thing. We need to fix the knet API doc, and possibly I can also fix the code to be smarter.
The issue is that for RX packets you make an exception: if we are sending packets to self, you return that the packet is unicast. This confuses the packet deduplicator, because it will try to match mcast packets against unicast seq nums.
The dst_host_filter shouldn't change unicast/mcast; it should just return data from the packet. In theory this can be simplified, because that information is already inside the on-wire packet.
I'll review that soon enough because it's confusing and incorrect.
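To illustrate the point about the filter, here is a minimal sketch of a destination filter that derives unicast/mcast purely from the packet header and treats TX and RX identically. Everything in it is an assumption made for illustration: the `msg_header` layout, the `node_id_t` type, the `NOTIFY_*` constants and the function signature are simplified stand-ins, not the real libknet or corosync definitions. The return convention (0 = unicast, 1 = mcast, negative = error) is assumed as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real libknet/corosync definitions. */
typedef uint16_t node_id_t;

enum { NOTIFY_TX = 0, NOTIFY_RX = 1 };

/* Assumed on-wire header: target_nodeid == 0 means "send to everyone". */
struct msg_header {
	uint32_t target_nodeid;
};

/*
 * Returns 0 for unicast, 1 for mcast, -1 on a short packet.
 * The answer depends only on the header, never on tx_rx.
 */
static int dst_host_filter_sketch(const unsigned char *data, size_t data_len,
				  uint8_t tx_rx, node_id_t this_host_id,
				  node_id_t *dst_host_ids,
				  size_t *dst_host_ids_entries)
{
	struct msg_header header;

	(void)tx_rx;        /* deliberately unused: no RX special case */
	(void)this_host_id; /* deliberately unused: no "already home" shortcut */

	assert(dst_host_ids != NULL && dst_host_ids_entries != NULL);

	if (data == NULL || data_len < sizeof(header)) {
		return -1;
	}
	memcpy(&header, data, sizeof(header)); /* avoid unaligned access */

	if (header.target_nodeid != 0) {
		dst_host_ids[0] = (node_id_t)header.target_nodeid;
		*dst_host_ids_entries = 1;
		return 0; /* unicast */
	}
	*dst_host_ids_entries = 0;
	return 1; /* mcast */
}
```

Because `tx_rx` and `this_host_id` are ignored, a packet addressed to everyone is reported as mcast on both paths, so a deduplicator would always compare it against the mcast sequence numbers.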
Anyway, to unblock testing:
+#if 0
     if (tx_rx == KNET_NOTIFY_RX) {
         dst_host_ids[0] = this_host_id;
         *dst_host_ids_entries = 1;
         res = 0; /* already home */
     }
     else {
+#endif
and remove the stray } ;)
Second is the MTU calculation, which is generating unnecessarily fragmented packets.
 extern void totemknet_net_mtu_adjust (void *knet_context, struct totem_config *totem_config) {
-    fprintf(stderr, "MTU = %d\n", totem_config->net_mtu);
-    totem_config->net_mtu = 1444;
+    // fabbione: need to export libknet header size from libknet somehow
+    totem_config->net_mtu -= totemip_udpip_header_size(AF_INET) + 23;
 }
This calculation is correct for AF_INET interfaces (we still need to fix it for IPv6 and such, but it's a start).
This also unveiled a less important bug in knet: it doesn't export all MTU information to applications. It doesn't affect corosync directly, but that hard-coded 23 should be an API call.
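The arithmetic behind that adjustment can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the IP sizes are the fixed header sizes (20 bytes for IPv4 without options, 40 for IPv6 without extension headers) plus 8 bytes of UDP, and the 23-byte knet header is the hard-coded value from the patch, not something read from libknet. The fragment count is a toy model, not knet's actual fragmentation logic.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Value hard-coded in the patch above; ideally libknet would export it. */
#define KNET_HEADER_SIZE_ASSUMED 23

/* Hypothetical helper: fixed IP + UDP header bytes for an address family. */
static size_t udpip_header_size_sketch(int family)
{
	const size_t udp_header = 8;

	switch (family) {
	case AF_INET:
		return 20 + udp_header;
	case AF_INET6:
		return 40 + udp_header;
	default:
		return 0; /* unknown family: caller must treat this as an error */
	}
}

/* Bytes left for the application; 0 if the family is unsupported
 * or the link MTU cannot even hold the headers. */
static size_t payload_mtu_sketch(size_t link_mtu, int family)
{
	size_t ip_udp = udpip_header_size_sketch(family);
	size_t overhead = ip_udp + KNET_HEADER_SIZE_ASSUMED;

	if (ip_udp == 0 || link_mtu <= overhead) {
		return 0;
	}
	return link_mtu - overhead;
}

/* Toy model: how many on-wire packets a message of msg_len bytes needs. */
static size_t fragments_needed_sketch(size_t msg_len, size_t payload_mtu)
{
	assert(payload_mtu > 0);
	return (msg_len + payload_mtu - 1) / payload_mtu;
}
```

Under these assumptions a 1500-byte link MTU leaves 1500 - 28 - 23 = 1449 bytes over AF_INET and 1429 bytes over AF_INET6. A message just one byte over that budget would already need a second packet in this model, which is the kind of unnecessary fragmentation the patch is meant to avoid.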
Combining those two fixes gives me 14 to 15 MB/sec on 64-byte packets in cpgbench, using the default netmtu of 1500.
The last boost comes from netmtu changes. Bumping netmtu in corosync.conf to 4096, I get to 19 MB/sec.
Now, there is a bug in corosync's netmtu handling somewhere. I didn't bother to investigate because it's Sunday.
I tried 8192 and corosync started to misbehave, as if it's not sending packets until a buffer is full (or that's the feeling I got).
Bumping netmtu to 64000 makes corosync hang.
So there is definitely some work to be done to fix it and, most likely, to get better performance out of the system.
Cheers Fabio
On 05/15/2016 07:42 AM, Fabio M. Di Nitto wrote:
[...]
I also found the issue that's causing membership to take a long time to form and to change randomly. It is somehow related to the PMTUd thread, which is doing something funky. I have an idea of what the root problem is, but I'll need to do proper debugging / investigation.
Fabio
On 05/15/2016 08:14 AM, Fabio M. Di Nitto wrote:
[...]
Just to be precise, udpu with netmtu set to 4096 reaches almost 29 MB/sec. I realized I was comparing apples and oranges.
So in short (using cpgbench and 64-byte packets):
udpu (1500): 13/14 MB/sec
knet (1500): 14/15 MB/sec
udpu (4096): 29 MB/sec
knet (4096): 19 MB/sec
Fabio
Devel mailing list Devel@lists.kronosnet.org http://lists.kronosnet.org/mailman/listinfo/devel
On 05/15/2016 08:14 AM, Fabio M. Di Nitto wrote:
[...]
So the PMTUd thread is somehow busted. I have attached a disable-pmtud patch that expects the MTU to be 1500 on the ethernet interface.
With this patch I don't see any wonky membership changes, only occasional retransmits (especially while membership is forming), and on a 4-node cluster cpgbench can run smoothly over and over without issues.
Fabio
Fix is now pushed to master. Not perfect, but very stable.
Fabio
On 05/28/2016 07:17 AM, Fabio M. Di Nitto wrote:
[...]