On 11/28/2017 6:23 PM, Ferenc Wágner wrote:
"Fabio M. Di Nitto" fdinitto@redhat.com writes:
On 11/28/2017 10:00 AM, Ferenc Wágner wrote:
"Fabio M. Di Nitto" fdinitto@redhat.com writes:
let me stop you a minute here.
Thanks for looking at this experiment proactively! It reached the point of viability right now, so let's chat about them if you have the time.
Yeps, I have got time :-)
Apparently more than me... I'm at home with a sick child, sorry for the slow reaction.
No worries. I have got 2 kids as well. Family first.
what is the end goal of this work?
I tried to summarize it in 96a92847:
Our current practice of dlopening foreign shared libraries is problematic for several reasons:

* not portable: modules and shared libraries can be different object types
* dependency information is invisible (our canaries mostly solve this)
* hardwiring SONAMES breaks on transitions (KNET_PKG_SONAME solves this)
* symbol versioning information is lost (theoretically solvable)

The preferred way out is generating dynamically loaded private modules from the main source, which then rely on the dynamic linker to load the external symbols as usual.
For a longer version please refer to my mail at https://lists.debian.org/debian-mentors/2017/11/msg00200.html and the links included by Guillem Jover.
Most of those concerns appear to be coming from a packaging perspective. I understand them, and some of them are valid.
I think it's a sharp observation. While knet is evolving faster than all the plugin dependencies taken together, keeping dlopen at the outer boundary can be advantageous.
In the end, if any of the external APIs change, we will notice one way or another. That is why we have the daily CI job running _exactly_ to detect breakage generated by our build dependencies.
https://ci.kronosnet.org/job/knet-all-daily/
https://docs.google.com/document/d/1q6OZD97H8ZF1WEDTJq6p84dR-6AuL_HrxDLssSdl...
We originally didn't implement the modules model because it has several downsides for little benefit, based on the experience we had with corosync modules in the past.
I'd be interested to learn about these downsides, could you please provide some hints or keywords to search for?
I don't think we have recorded them, but here is the rundown:
- technically speaking, using modules is no different from dlopening a shared library.
Yes, on ELF platforms that's true.
In that respect, we are simply moving the problem somewhere else. I could agree (based on the threads above) that containing the problem within the same upstream is probably saner.
It's the only way I know of to avoid hardwiring foreign library sonames, which differ from platform to platform. Detecting them (as we currently do) works well enough in simple cases, but NSS is a pig already, having its symbols spread across several shared objects. That would be rather complicated to emulate with dlopens, not to mention symbol versioning. Honestly, I haven't had the courage to delve into that yet.
See my last email. Let's move to modules and solve the problem at once.
- it enforces a strict internal API/ABI between the main libknet and the plugins, making changes to those very complex, especially during updates (see also point 4). Right now we want to keep those API/ABI free to change and expand.
I'd think keeping both sides in the same upstream makes the internal API/ABI a non-issue at the upstream level. Packaging can use strict versioned dependencies between the plugin packages and the main library to force upgrading them together (this only exposes the need to restart the application after library/plugin upgrades).
Packaging only helps you maintain on-disk compatibility between versions, though. The main library might be loaded by an application and not restarted on package upgrade.
Other solutions are module versioning (like library versioning, but independent) or plugin directory versioning, which enable coexistence of different module ABI versions (again, the module ABI could change keeping the library ABI undisturbed). See also below.
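A minimal sketch of both ideas, module versioning and plugin directory versioning, under stated assumptions: MODULE_ABI_VERSION, struct module_info, module_abi_check() and module_path() are hypothetical names made up for illustration, not existing libknet symbols, and the directory layout is only one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: module ABI version the main library was built against.
 * It changes independently of the public library version. */
#define MODULE_ABI_VERSION 2U

/* Hypothetical descriptor that every module would export. */
struct module_info {
	unsigned int abi_version;	/* ABI the module was built for */
	const char *name;
};

/* Module versioning: refuse a module built for a different ABI.
 * Returns 0 if the module may be used, -1 otherwise. */
int module_abi_check(const struct module_info *info)
{
	assert(info != NULL);
	return info->abi_version == MODULE_ABI_VERSION ? 0 : -1;
}

/* Plugin directory versioning: modules of different ABI versions live
 * in separate subdirectories, so an already running process keeps
 * finding the modules that match it after a package upgrade.
 * Returns 0 on success, -1 if the path does not fit into buf. */
int module_path(char *buf, size_t len, const char *plugin_dir,
		const char *name)
{
	int n = snprintf(buf, len, "%s/%u/%s.so", plugin_dir,
			 MODULE_ABI_VERSION, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With either approach the library ABI stays undisturbed while the module ABI changes; the directory variant additionally lets old and new modules coexist on disk.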
right, same as I suggested in the other email.
- it's very difficult to debug modules due to the symbol resolving mechanism (Honza in CC can provide more details).
For what it's worth, gdb works fine for me...
wferi@lant:~/ha/kronosnet/kronosnet/libknet/tests$ LD_LIBRARY_PATH=../.libs gdb .libs/int_crypto_test
[...]
Reading symbols from .libs/int_crypto_test...done.
(gdb) b encrypt_nss
Function "encrypt_nss" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (encrypt_nss) pending.
(gdb) r
Starting program: /home/wferi/ha/kronosnet/kronosnet/libknet/tests/.libs/int_crypto_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5054700 (LWP 1856)]
[New Thread 0x7ffff4853700 (LWP 1857)]
[New Thread 0x7ffff4052700 (LWP 1858)]
[New Thread 0x7ffff3851700 (LWP 1859)]
[New Thread 0x7ffff3050700 (LWP 1860)]
[New Thread 0x7ffff284f700 (LWP 1861)]
[New Thread 0x7ffff204e700 (LWP 1862)]
Test knet_handle_crypto with nss/aes128/sha1 and normal key
knet logs: [info] common: crypto_nss.so has been loaded from /home/wferi/ha/kronosnet/kronosnet/libknet/tests/../.libs/crypto_nss.so
knet logs: [debug] crypto: Initizializing crypto module [nss/aes128/sha1]
knet logs: [debug] nsscrypto: Initizializing nss crypto module [aes128/sha1]
knet logs: [debug] crypto: security network overhead: 68
Source Data: Encrypt me!

Thread 1 "int_crypto_test" hit Breakpoint 1, nsscrypto_encrypt_and_signv (iovcnt_in=1, buf_out_len=0x7fffffffd0b0, buf_out=0x5555579090e0 "", iov_in=0x7fffffffd030, knet_h=0x7ffff5055010) at crypto_nss.c:625
625         if (encrypt_nss(knet_h, iov_in, iovcnt_in, buf_out, buf_out_len) < 0) {
(gdb)
Let's see how it goes. Apparently the plugin module used in corosync was much more complex and caused problems that theoretically we should not see here (according to Jan).
#3 I am not happy about, especially given that the project is still in its "early" days. This specifically has proven very challenging in some environments.
Nothing threatening comes to my mind at the moment, but I'd be happy to look into any concrete problems brought up.
agreed.
- runtime operations (updates) can be nasty. You could have application X that has loaded libknet 1.0 with modules 1.0. Then apt-get update... you get libknet 1.1 or whatever, with new plugins; the application tries to load a module and kaboom.
Well, yes. But at least it's in one hand: with modules you've got a small internal ABI to watch out for, with direct dlopens you've got several foreign ABIs to follow. Yes, development speed matters, but as knet matures, the balance will inevitably tip, if it hasn't already.
#4 can be solved by adding some kind of hashing/signing mechanism of the modules (aka load only modules that match the build).
That sounds like somewhat of an overkill... why not just use a version number if we really must? Or introduce an extensible module ABI (with an explicit size at the front of the module definition) and stay safe for a longer term? Just thinking out loud...
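To make the "explicit size at the front" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: struct module_ops, struct module_ops_v1, module_ops_import() and example_init() are hypothetical and do not reflect the actual knet module definition. The rule it relies on is that new members are only ever appended at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical first revision of the module definition. */
struct module_ops_v1 {
	size_t struct_size;	/* sizeof the struct as the module saw it */
	int (*init)(void);
};

/* Hypothetical current revision: members are only appended. */
struct module_ops {
	size_t struct_size;
	int (*init)(void);
	int (*encrypt)(const char *in);	/* added later (illustrative) */
};

/* Illustrative init function for an old-style module. */
int example_init(void)
{
	return 0;
}

/* Import a definition of any known-or-older size: copy only as many
 * bytes as the module declared and leave the missing members NULL,
 * so callers can test for optional features.
 * Returns 0 on success, -1 if the declared size is implausible. */
int module_ops_import(struct module_ops *dst, const void *src)
{
	size_t src_size;

	assert(dst != NULL && src != NULL);
	memcpy(&src_size, src, sizeof(src_size));
	if (src_size < sizeof(src_size))
		return -1;

	*dst = (struct module_ops){0};
	memcpy(dst, src, src_size < sizeof(*dst) ? src_size : sizeof(*dst));
	dst->struct_size = sizeof(*dst);
	return 0;
}
```

A library built with the newer definition can then load a module built against the older one and simply check whether encrypt is NULL before calling it.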
Yeah we got to the same conclusion. See again my other reply :-)
As for the code I have seen so far, please find another way to pass log_msg down to the plugins. I am not going to export it to the world :-)
Yeah, that's a wart. I don't know how to make that symbol accessible in the usual way in the modules without including its code. And I wonder what other internal symbols may need to be exposed later. Does the format used in the log pipe constitute ABI? This might even be a show stopper.
It's no different from installing a callback. See for example:

int knet_handle_enable_pmtud_notify(knet_handle_t knet_h,
                                    void *pmtud_notify_fn_private_data,
                                    void (*pmtud_notify_fn) (
                                        void *private_data,
                                        unsigned int data_mtu));

The module init would need an extra parameter to pass log_msg in, along the lines of void (*log_msg_fn) (.....).
At init time, it will store the pointer to log_msg somewhere internal and it needs to do that only once. Should be fairly straightforward.
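A minimal sketch of that callback approach, with loud caveats: the real log_msg prototype is not shown in this thread, so the log_msg_fn_t signature below is a made-up placeholder, and module_init() and example_log() are hypothetical names rather than existing knet functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logger signature; the real log_msg prototype differs. */
typedef void (*log_msg_fn_t)(void *private_data, const char *fmt, ...);

/* Module side: the pointer is stored once at init time, so log_msg
 * never has to be exported from the main library. */
static log_msg_fn_t module_log;
static void *module_log_data;

int module_init(void *log_private_data, log_msg_fn_t log_fn)
{
	if (log_fn == NULL)
		return -1;
	module_log = log_fn;
	module_log_data = log_private_data;
	module_log(module_log_data, "module initialized");
	return 0;
}

/* Library side: an example logger that prints to stderr and counts
 * the messages it received through the optional private_data. */
void example_log(void *private_data, const char *fmt, ...)
{
	int *count = private_data;
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	if (count != NULL)
		(*count)++;
}
```

The library would call module_init(&state, example_log) right after loading the module; from then on the module logs only through the stored pointer.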
Fabio