This is the third update on the status of developing Mitogen for Ansible.
Too long, didn’t read
A beta is coming soon! Aside from async tasks, the master branch is looking great. Since the last update there have been many features and fixes, but there are important forks in the road ahead, particularly around efficient support for many-host runs. Read on..
Just tuning in?
Done: File Transfer
File transfer previously worked by constructing one RPC representing the complete file, which for large files resulted in an explosion in memory usage on each machine as the message was enqueued and transferred, with communication at each hop blocked until the message was delivered. A rewrite has been needed since the original code was written, but a simple solution proved elusive.
Today file transfer is all but solved: files are streamed in 128KiB-sized messages, using a dedicated service that aggregates pending transfers by their most directly connected stream, serving one file at a time before progressing to the next transfer. An initial burst of 128KiB chunks is generated to fill a link with a 1MiB BDP, with further chunks sent as acknowledgements begin to arrive from the receiver. As an optimization, files 32KiB or smaller are still delivered in a single RPC, avoiding one roundtrip in a common scenario.
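To make the flow control concrete, here is a minimal sketch of the windowing idea described above. The names (stream_file, send_chunk, recv_ack) and structure are mine for illustration, not Mitogen's actual implementation; only the constants come from the text.

```python
# Illustrative sketch of the streaming scheme described above; the helper
# names and overall shape are hypothetical, not Mitogen's real code.

CHUNK_SIZE = 128 * 1024          # each message carries 128KiB of file data
INITIAL_WINDOW = 1024 * 1024     # initial burst sized for a ~1MiB BDP link
SINGLE_RPC_MAX = 32 * 1024       # small files are sent as a single RPC

def stream_file(fp, size, send_chunk, recv_ack):
    """Send `fp` as a series of chunks, keeping roughly INITIAL_WINDOW
    bytes in flight and sending one further chunk per acknowledgement."""
    if size <= SINGLE_RPC_MAX:
        send_chunk(fp.read())    # fast path: one RPC, one fewer round trip
        return

    in_flight = 0
    while True:
        # Fill the window with an initial burst, then send one chunk per ack.
        while in_flight < INITIAL_WINDOW:
            data = fp.read(CHUNK_SIZE)
            if not data:
                return
            send_chunk(data)
            in_flight += len(data)
        recv_ack()               # blocks until the receiver acknowledges
        in_flight -= CHUNK_SIZE
```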
Compared to sftp(1) or scp(1), the new service has vastly lower setup overhead (1 RTT vs. 5) and far better safety properties, ensuring concurrent use of the API by unrelated ansible-playbook runs cannot create a situation where an inconsistent file may be observed by users, or a corrupt file is deployed with no indication a problem exists.
Since file transfer is implemented in terms of Mitogen's message bus, it is agnostic to Connection Delegation, allowing streaming file transfers between proxied targets regardless of how the connection is set up.
Some minor problems remain: the scheduler cannot detect a timed-out transfer, risking a cascading hang when Connection Delegation is in use. This is not a regression, since Ansible does not support this mode of operation. Either way, during normal operation the timeout will eventually be noticed when the underlying SSH connection times out.
Connection Delegation
Connection Delegation enables Ansible to use one or more intermediary machines to reach a target machine or container, with connections and code uploads deduplicated at each hop in the path. For an Ansible run against many containers on one target host, only one SSH connection to the target need exist, and module code need only be uploaded once on that connection.
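The deduplication at each hop can be pictured as a cache of established contexts keyed by the path used to reach them. The sketch below is a hypothetical simplification with an invented connect() primitive, intended only to show the shape of the idea, not the extension's actual code.

```python
# Hypothetical sketch of per-hop connection deduplication: contexts are
# cached by the full path of connection specs leading to them, so two
# targets sharing an intermediary reuse the same established connection.
_context_cache = {}

def get_context(route, connect):
    """`route` is a tuple of hashable connection specs, e.g.
    (('ssh', 'bastion'), ('ssh', 'docker-host'), ('docker', 'web1')).
    `connect(spec, via)` establishes one hop and returns a context."""
    via = None
    for i in range(len(route)):
        key = route[:i + 1]
        if key not in _context_cache:
            _context_cache[key] = connect(route[i], via)
        via = _context_cache[key]
    return via
```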
This feature exists today and works well, however some important functionality is still missing. Presently intermediary connection setup is single-threaded, non-Python (i.e. Ansible) module uploads are duplicated, and the code to infer intermediary connection configurations using the APIs available in Ansible is.. hairy at best.
Fixing deduplication and single-threaded connection setup entails starting a service thread pool within each interpreter that will act as an intermediary. This requires some reworking of the nascent service framework, which also makes it easier to use from non-Ansible programs, and lays the groundwork for Topology-aware File Synchronization.
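The shape I have in mind is mundane: a handful of worker threads in each intermediary draining a queue of setup requests, so several downstream connections can be established at once rather than serially. A minimal sketch under those assumptions, with invented names:

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class ServicePool(object):
    """Hypothetical sketch: N worker threads draining a queue of deferred
    calls, so several downstream connections can be set up concurrently."""
    def __init__(self, size=4):
        self._queue = queue.Queue()
        for _ in range(size):
            t = threading.Thread(target=self._worker)
            t.daemon = True
            t.start()

    def _worker(self):
        while True:
            func, args = self._queue.get()
            func(*args)

    def defer(self, func, *args):
        # e.g. pool.defer(establish_connection, spec) from the router thread.
        self._queue.put((func, args))
```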
Custom module_utils
From the department of surprises, this one is a true classic. Ansible supports an undocumented (upstream docs patch) but nonetheless commonly used mechanism for bundling third party modules and overriding built-in support modules as part of the ZIP file deployed to the target. It implements this by virtualizing a core Ansible package namespace, ansible.module_utils, causing what Python finds there to vary on a per-task basis, and crucially, to have its implementation diverge entirely from the equivalent import in the Ansible controller process.
Suffice it to say I nearly lost my mind on discovering this "feature", not due to the functionality it provides, but the manner in which it opts to provide it. Rather than loading a core package namespace as a regular Python package using Mitogen's built-in mechanism, every Ansible module must undergo additional dependency scanning using its unique search path, and any dependencies found must correctly override existing loaded modules appearing in the target interpreter's namespace at runtime.
Given Mitogen's intended single-reusable-interpreter design, there is no way to support this without tempting strange behaviours appearing across tasks whose ansible.module_utils search path varies. While it is easy to arrange for ansible.module_utils.third_party_module to be installed, it is impossible to uninstall it while ensuring every reference to the previous implementation, including instances of every type defined by it, is extricated from the reusable interpreter post-execution, which is necessary if the next module to use the interpreter imports an entirely distinct implementation of ansible.module_utils.third_party_module.
Today the interpreter instead forks when an extended or overridden module is found, and a custom importer is used to implement the overrides. This introduces an unavoidable inefficiency when the feature is in use, but it is still far better than always forking, or running the risk of varying module_utils search paths causing unfixable crashes.
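For the curious, the override mechanism amounts to installing an importer in the forked child that serves alternate source for selected names under ansible.module_utils. The sketch below is my own stripped-down illustration of that idea using the standard Python 3 import hooks; it is not Mitogen's actual importer, and the override mapping shown is hypothetical.

```python
import importlib.abc
import importlib.util
import sys

class ModuleUtilsOverrider(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical sketch: serve alternate source for selected
    ansible.module_utils submodules, inside a forked child only."""
    def __init__(self, overrides):
        # Mapping of fully-qualified module name -> source code string.
        self._overrides = overrides

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._overrides:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        exec(self._overrides[module.__name__], module.__dict__)

# Installed at the front of sys.meta_path so the override wins over the
# regular package contents; only sensible in a fork, never in the shared
# reusable interpreter.
sys.meta_path.insert(0, ModuleUtilsOverrider({
    'ansible.module_utils.third_party_module': 'GREETING = "hello"',
}))
```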
Container Connections
To aid a common use-case for Connection Delegation, new connection types were added to support Linux containers and FreeBSD jails. It is now possible to run Ansible within a remote container reached via SSH, solving a common upstream feature request.
Presently the container must have Python installed, matching Ansible's existing behaviour, but it occurred to me that when the host machine has Python installed, there is no reason why Python needs to exist within the container. This would be a powerful feature made easy by Mitogen's design, and in a common use case it would support running auditing/compliance playbooks against app containers that were otherwise never customized for use with Ansible.
Su Become Method Support
Low-hanging fruit from the original crowdfunding plan. Now su(1) may be used for privilege escalation as easily as sudo(1).
Sudo/Su Connection Types
To support testing and somewhat uncommon use cases where a large number of user accounts may be targeted for parallel deployment on a small number of machines, there now exist explicit mitogen_sudo and mitogen_su connection types that, in combination with Connection Delegation, allow a single SSH connection to exist to a remote machine while exposing user accounts as individual (and therefore parallelizable) targets in Ansible inventory.
This sits somewhere between "hack" and "gorgeous", and I really have no idea which, however it does make it simple to exploit Ansible's parallelism in certain setups, such as traditional web hosting where each customer exists as a UNIX account on a small number of machines.
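In terms of the core library this is just ordinary context chaining: one real SSH connection with many children hanging off it. A hedged sketch of how that looks with the Mitogen core API as I understand it (method and parameter names follow my reading of the docs and may differ; treat this as illustrative rather than definitive):

```python
import getpass
import mitogen.master

broker = mitogen.master.Broker()
router = mitogen.master.Router(broker)
try:
    # One real SSH connection to the shared machine...
    host = router.ssh(hostname='shared-hosting-box')

    # ...and one lightweight child per customer account, each reached via
    # that single connection and addressable as its own parallel target.
    customers = [
        router.sudo(username=name, via=host)
        for name in ('alice', 'bob', 'carol')
    ]
    for ctx in customers:
        print(ctx.call(getpass.getuser))
finally:
    broker.shutdown()
```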
Security
Unidirectional Routing exists and is always enabled for Ansible. This prohibits a communication style that was previously newly available to targets: although ideally benign and potentially very powerful, it fundamentally altered Ansible's security model and risked acceptance of the solution. Targets could send each other messages, and although permission checks occur on reception and so this should be harmless, it represented the ability for otherwise air-gapped networks to be temporarily bridged for the duration of a run.
Secrets Masking
Mitogen supports new Blob() and Secret() string wrappers whose repr() contains a substitute for the actual value. These are employed in the Ansible extension, ensuring passwords and bulk file transfer data are no longer logged when verbose output is enabled. The types are preserved on deserialization, ensuring log messages generated by targets receive identical treatment.
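The idea is simply that the wrapper's repr() never yields the wrapped value, so anything that stringifies a message for logging sees a placeholder instead. A toy illustration of that behaviour (my own stand-in, not the library's actual classes):

```python
class MaskedSecret(str):
    """Toy stand-in for the Secret() idea: behaves as the underlying
    string, but its repr() hides the value from logs and tracebacks."""
    def __repr__(self):
        return '[secret]'

password = MaskedSecret('hunter2')
print('connecting with %r' % (password,))   # -> connecting with [secret]
print(password == 'hunter2')                # -> True: still usable as a str
```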
User/misc bug fixes
Asynchronous Tasks (.. again, and again)
Ongoing work on the asynchronous task implementation has caused it to evolve once again, this time to make use of a new subtree detachment feature in the core library. The new approach is about 70% of what is needed for the final design, with one major hitch remaining.
Since an asynchronous task must outlive its parent, it must have a copy of every dependency needed by the module it will execute prior to disconnecting from the parent. This is exorbitantly fiddly work, interacting with many aspects including not least custom module_utils, and represents the last major obstacle in producing a functionally complete extension release.
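The ordering constraint can be stated in a few lines: every import the module could make must be forced across the connection before the child detaches, since afterwards there is no parent left to serve them. The sketch below only illustrates that ordering; the helpers are trivial placeholders I invented, not real Mitogen machinery.

```python
# Sketch of the ordering constraint only; the helpers are placeholders.

def scan_dependencies(module_source):
    # Placeholder: a real implementation walks the module's import graph,
    # including any custom module_utils overrides.
    return ['json', 'platform']

def detach_from_parent():
    # Placeholder: in reality the child unregisters from its parent so it
    # can outlive the ansible-playbook process that spawned it.
    pass

def start_async_task(module_source):
    # 1. Force every import the module could need across the connection
    #    while the parent is still reachable to serve them.
    for name in scan_dependencies(module_source):
        __import__(name)
    # 2. Only then detach; after this point no import may reach upstream.
    detach_from_parent()
    # 3. Finally execute the module in its own namespace.
    exec(module_source, {'__name__': '__main__'})
```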
Industrial grade multiplexing
Mitogen now supports swapping select(2) for epoll(7) or kqueue(2) depending on the host operating system, blasting through the maximum file descriptor limit of select(2), and ensuring this is no longer a hindrance for many-target runs. Children initially use the select(2) multiplexer (tiny and guaranteed available) until they become parents, when the implementation is transparently swapped for the real deal.
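The selection itself is mundane: pick the best facility the operating system exposes once a context becomes a parent, and otherwise stay on plain select(). A minimal illustration of that decision using only the standard library (Mitogen's real poller classes are more involved than this):

```python
import select

def make_poller(is_parent):
    """Children keep the small, always-available select()-based poller; a
    context that becomes a parent upgrades to the best facility the OS
    offers, escaping select(2)'s file descriptor limit."""
    if is_parent and hasattr(select, 'epoll'):
        return select.epoll()        # Linux
    if is_parent and hasattr(select, 'kqueue'):
        return select.kqueue()       # FreeBSD, macOS
    return None                      # caller falls back to select.select()
```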
In future some interface tweaks are desirable to make full use of the new multiplexers: at least epoll(7) supports options that significantly reduce the system calls necessary to configure it. Although I have not measured a performance regression due to these calls, their presence is bothersome.
Many-Target Performance
Some expected growing pains appeared when real multiplexing was implemented. For testing I adopted a network of VMs running DebOps common.yml, with a quota for up to 500 targets, but so far it is not possible to approach that number without drowning in the kinks that start to appear. While some of these almost certainly lie on the Mitogen side, when profiling with only 40 targets enabled, inefficiencies in Mitogen are buried in the report by extreme inefficiencies present in Ansible itself.
Among the problems:
And with that we reach a nexus: we have almost exhausted what can be accomplished working from the bottom up; profiling on a micro scale is no longer sufficient to meet project goals, while fixing problems identified through profiling on a macro scale exceeds the project scope. Therefore, (lightning bolts, wild cackles), a new plan emerges..
Branching for a beta
With the exception of async tasks I consider the master branch to be in excellent health - for smaller target counts. For larger runs, wider-reaching work is necessary, but it does not make sense to disrupt the existing design because of it. Therefore master will be branched, with the new branch kept open for fixes, not least the final pieces of async, while work continues in parallel on a new increment.
Extension v2
Vanilla Ansible forks each time it executes a task, with the corresponding action plug-in gaining control of the main thread until completion, upon which all state aside from the task result is lost. When running under the extension, a connection multiplexer process is forked once at startup, and a separate broker thread exists in each forked task subprocess that connects back to the connection multiplexer process over a UNIX socket - necessary in the current design to have a persistent location to manage connections.
The new design comes in the form of a complete reworking of the Ansible linear strategy. Today's extension wraps Ansible's strategies while preserving their process and execution model. To implement the enhancements above sensibly, additional persistence is required and it becomes necessary to tackle a strategy implementation head-on.
The old desire for per-CPU connection multiplexers is incorporated, but moves those multiplexers back into Ansible, much like the pre-crowdfund extension. The top-level controller process gains a Mitogen broker thread with per-CPU forked children acting as connection multiplexers, and hosting service threads on which action plug-ins can sleep. Unlike vanilla Ansible, these processes exist for the duration of the run rather than per-task.
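The process layout is easier to see than to describe: the controller forks one long-lived multiplexer child per CPU at startup, and each target is pinned to one of them for the whole run. A rough structural sketch under those assumptions, with invented names and all the real broker/service machinery elided:

```python
import multiprocessing

def multiplexer_main(cpu_index, request_queue):
    # Hypothetical long-lived per-CPU child: in reality this would run a
    # broker/service loop and own a share of the SSH connections.
    while True:
        target, task = request_queue.get()
        print('cpu %d: would run %r on %r' % (cpu_index, task, target))

def start_multiplexers():
    queues = []
    for cpu in range(multiprocessing.cpu_count()):
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=multiplexer_main, args=(cpu, q))
        p.daemon = True    # lives for the duration of the run, not per-task
        p.start()
        queues.append(q)
    return queues

def queue_for_target(target_name, queues):
    # Stable target<->multiplexer affinity: the same host always lands on
    # the same child, so its connections are reused across tasks.
    return queues[hash(target_name) % len(queues)]
```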
From the vantage point of only $ncpus processes, it is easy to fix template precompilation, plug-in path caching, connection caching, and target<->worker affinity, and to parallelize task variable generation. Some sizeable obstacles exist, not least:
- Liberal shared data structure mutation in the task executor that must be fixed to handle threading, mostly contained to PlayContext.
- Preserving the existing callback plug-in model. Callbacks must always fire in the top-level process.
- Synchronization or serialization overhead, pick one. Either the strategy logic runs duplicated in each child (requiring coordination with the top-level process), or it runs once in the parent and configuration must be serialized for every task.
Can't this be done upstream?
It should, but I've experimented and there simply isn't time. If more than a week is considered reasonable to land a missing documentation patch, there is no hope real patches will land before full-time work must conclude. For upstreaming to happen the onus lies with the 20+ strong permanent team; it is simply not possible for me to commit unbounded time to landing even trivial changes, a far cry from sending occasional patches to a privately controlled repository.
At least 16k words have been spent since conversations started around September 2017, and while they bore some fruit over time, few actionable outcomes have resulted, and the detectable level of team-originated engagement with the work has been minimal. There is no expectation of fireworks, however it may be helpful to realize that after 3 months no evidence exists of any member testing the code and experiencing success or failure, let alone a report of such.
It is sufficient to say that after so long I find this increasingly troublesome. While I cannot hope to understand internal priorities, as an outside contributor funded by end users, soliciting engagement on a well-documented enhancement that in some scenarios nets an order of magnitude performance improvement to a commercial product, some rather basic questions come to mind.
Code Quality
There is a final uneasy aspect to upstreaming: being left with the task of cleaning up, with no guarantee the mess won't simply return. Some of this code is in an abject (253 LOC, 37 locals) state (279 LOC, 24 locals) of sin (306 LOC, 38 locals), remarkable for 2018 and for a product less than 72 months old that has been funded almost since inception. While I have begun refactoring the strategy plug-in within the confines of the Mitogen repository, responsibility for benefitting from that work in mainline rests with others.
Until next time!