All,
I have been playing around with SCTP as a transport for knet for a while
now, as a possible way to pipe DLM traffic via knet.
The PoC patch is working fine, but the performance vs UDP is rather
discouraging (at least using cpgbench), with SCTP handling just a bit
more than 50% of UDP throughput.
Before I invest too much time into it, the question becomes: do we really
care about it for a potential knet 1.0?
My take here is to either postpone or drop SCTP for now and re-evaluate
it for 2.0. Part of the idea is also to make it easier to plug in
different network protocols. Right now the code is very much designed
around a single network protocol, and the patch is very invasive in
some areas with little flexibility (meaning we would have to break
API/ABI every time we add a new protocol).
My suggestion would be to refactor the network code to be more modular;
once we have a more dynamic, better-designed API/ABI, we can add N
protocols without suffering too much.
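Just to give an idea of the direction (a rough sketch only, all names
made up), I am thinking of a per-transport ops table inside libknet, so
the core only ever talks to a transport through function pointers:

#include <stdint.h>
#include <sys/types.h>

/*
 * rough sketch, names made up: each transport (udp, sctp, ...) fills in
 * one of these tables; the knet core never touches sockets directly and
 * only goes through the ops.
 */
struct knet_transport_ops {
	const char *name;			/* "udp", "sctp", ... */
	int (*init)(void *knet_h);		/* create sockets and per-transport state */
	int (*link_enable)(void *knet_h, void *link);
	ssize_t (*tx)(void *knet_h, void *link,
		      const unsigned char *buf, ssize_t buflen);
	ssize_t (*rx)(void *knet_h, int sockfd,
		      unsigned char *buf, ssize_t buflen);
	int (*fini)(void *knet_h);		/* tear it all down */
};

Adding a new protocol then becomes a matter of adding one table, without
breaking the public API/ABI every time.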
agree? disagree?
Fabio
Hey Chrissie,
I found the bugs in totemknet that made knet so slow, and now I am
getting up to 50% more cpgbench performance (19MB/sec on 64-byte packets)
vs udpu (13MB/sec).
The first bug is in the dst_host_filter. This is not your fault; your
thinking was sound, but it didn't do the right thing. We need to fix the
knet API doc, and possibly I can also make the code smarter.
The issue is that for RX packets you make an exception: when we are
sending packets to self, you return that the packet is unicast. This
confuses the packet deduplicator, because it will try to match multicast
packets against unicast sequence numbers.
The dst_host_filter shouldn't change unicast/mcast; it should only return
data based on the packet. In theory this can be simplified because that
information is already inside the onwire packet.
I'll review that soon enough because it's confusing and incorrect.
Anyway, to unblock testing:
+#if 0
	if (tx_rx == KNET_NOTIFY_RX) {
		dst_host_ids[0] = this_host_id;
		*dst_host_ids_entries = 1;
		res = 0; /* already home */
	}
	else {
+#endif
and remove the stray } ;)
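For the proper fix, what I have in mind is roughly this (a sketch only;
I am assuming the destination node id can be read straight out of the
totem message header, and the exact callback signature may not match
what is in the tree right now):

/*
 * rough sketch: decide unicast vs multicast purely from the packet
 * contents, with no special case for packets addressed to ourselves,
 * so the deduplicator always sees the right sequence number space.
 */
static int dst_host_filter_callback_fn (
	const unsigned char *outdata, ssize_t outdata_len,
	uint8_t tx_rx, uint16_t this_host_id,
	uint16_t *dst_host_ids, size_t *dst_host_ids_entries)
{
	const struct totem_message_header *header =
		(const struct totem_message_header *)outdata;

	if (header->target_nodeid) {
		/* unicast: single destination, taken from the packet itself */
		dst_host_ids[0] = header->target_nodeid;
		*dst_host_ids_entries = 1;
		return 0;	/* unicast */
	}
	/* multicast: no destination list, knet delivers to all hosts */
	*dst_host_ids_entries = 0;
	return 1;	/* multicast */
}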
The second bug is in the MTU calculation, which is generating
unnecessarily fragmented packets.
extern void totemknet_net_mtu_adjust (void *knet_context, struct totem_config *totem_config)
{
-	fprintf(stderr, "MTU = %d\n", totem_config->net_mtu);
-	totem_config->net_mtu = 1444;
+	// fabbione: need to export libknet header size from libknet somehow
+	totem_config->net_mtu -= totemip_udpip_header_size(AF_INET) + 23;
}
This calculation is correct for AF_INET interfaces (we need to fix it
for IPv6 and such, but it's a start).
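If I read totemip.c correctly, totemip_udpip_header_size(AF_INET) is 28
(20 bytes IPv4 + 8 bytes UDP), so with the default netmtu of 1500 we get
1500 - 28 - 23 = 1449 bytes of totem payload, and payload + knet header +
UDP/IP header fits exactly back into 1500 bytes on the wire, hence no
fragmentation.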
This also unveiled a less important bug in knet: it doesn't export all
MTU information to applications. It doesn't affect corosync directly,
but that hardcoded 23 should be an API call.
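Something along these lines (name completely made up) would be enough:

/* hypothetical libknet call: export the per-packet knet header overhead
 * so callers don't have to hardcode the 23 above */
int knet_handle_get_header_size(knet_handle_t knet_h, size_t *header_size);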
The combination of those two fixes gives me 14 to 15 MB/sec on 64-byte
packets in cpgbench using the default netmtu of 1500.
The last boost comes from netmtu changes. Bumping netmtu in corosync.conf
to 4096, I get to 19MB/sec.
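For reference, that is just the netmtu directive in the totem section of
corosync.conf:

totem {
	netmtu: 4096
	# rest of the totem section unchanged
}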
Now, there is a bug in corosync netmtu handling somewhere. I didn't
bother to investigate because it's Sunday.
With netmtu 8192, corosync started to misbehave, as if it were not
sending packets until a buffer fills up (or that's the feeling I got).
Bumping netmtu to 64000 makes corosync hang.
So there is definitely some work to be done to fix it and most likely
get better performance out of the system.
Cheers
Fabio