On 05/15/2016 08:14 AM, Fabio M. Di Nitto wrote:
On 05/15/2016 07:42 AM, Fabio M. Di Nitto wrote:
> Hey Chrissie,
>
> I found the bugs in totemknet that made knet so slow, and now I am
> getting up to 50% more cpgbench performance (19MB/sec on 64-byte pckts)
> vs udpu (13MB/sec).
>
> The first bug is in the dst_host_filter. This is not your fault; your
> thinking was sound, but it didn't do the right thing. We need to fix the
> knet API doc, and possibly I can also fix the code to be smarter.
>
> The issue is that for RX packets you make an exception: if we are
> sending pckts to self, you return that the pckt is unicast. This
> confuses the packet deduplicator, because it will try to match mcast
> pckts against the unicast seq num.
>
> The dst_host_filter shouldn't change unicast/mcast, but should instead
> return the data from the pckt. In theory this can be simplified, because
> that information is already inside the on-wire packet.
>
> I'll review that soon enough because it's confusing and incorrect.
>
> Anyway, to unblock testing:
>
> +#if 0
> if (tx_rx == KNET_NOTIFY_RX) {
> dst_host_ids[0] = this_host_id;
> *dst_host_ids_entries = 1;
> res = 0; /* already home */
> }
> else {
> +#endif
>
> and remove the stray } ;)
>
> The second is the MTU calculation, which is generating unnecessarily
> fragmented pckts.
>
> extern void totemknet_net_mtu_adjust (void *knet_context, struct
> totem_config *totem_config)
> {
> - fprintf(stderr, "MTU = %d\n", totem_config->net_mtu);
> - totem_config->net_mtu = 1444;
> + // fabbione: need to export libknet header size from libknet somehow
> + totem_config->net_mtu -= totemip_udpip_header_size(AF_INET) + 23;
> }
>
> This calculation is correct for AF_INET interfaces (we need to fix it
> for IPv6 and such, but it's a start).
>
> This also uncovered a less important bug in knet: it doesn't export all
> MTU information to applications. It doesn't affect corosync directly
> (that hard-coded 23 should be an API call).
>
> Using the combination of those 2 fixes gives me 14 to 15 MB/sec on 64-byte
> packets in cpgbench, using the default netmtu of 1500.
>
> The last boost comes from netmtu changes. Bumping netmtu in corosync.conf
> to 4096, I get to 19MB/sec.
Just to be precise, udpu with netmtu set at 4096 is almost 29MB/sec. I
realized I was comparing apples and oranges.
So in short (using cpgbench and 64-byte pckts):
udpu (1500): 13/14 MB/sec
knet (1500): 14/15 MB/sec
udpu (4096): 29 MB/sec
knet (4096): 19 MB/sec
Fabio
Now, there is a bug somewhere in corosync's netmtu handling. I didn't
bother to investigate because it's Sunday, but I tried 8192 and corosync
started to misbehave, as if it's not sending pckts until the buffer is
full (or that's the feeling I got).
Bumping netmtu to 64000 makes corosync hang.
So there is definitely some work to be done to fix it, and most likely
to get better performance out of the system.
I also found the issue that's causing membership to take a long time to
form and to change randomly. It is somehow related to the PMTUd thread,
which is doing something funky. I have an idea of what the root problem
is, but I'll need to do proper debugging/investigation.
Fabio
_______________________________________________
Devel mailing list
Devel(a)lists.kronosnet.org
http://lists.kronosnet.org/mailman/listinfo/devel