<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>sweetness.hmmz.org</title><link>https://sweetness.hmmz.org/</link><description></description><lastBuildDate>Mon, 28 Oct 2019 12:30:00 +0000</lastBuildDate><item><title>Operon: Extreme Performance For Ansible</title><link>https://sweetness.hmmz.org/2019-10-28-operon.html</link><description>&lt;p&gt;&lt;base target="_blank"&gt;&lt;/p&gt;
&lt;p&gt;I'm very excited to unveil &lt;strong&gt;&lt;a href="https://networkgenomics.com/operon/"&gt;Operon&lt;/a&gt;&lt;/strong&gt;, a
high performance replacement for Ansible&amp;reg; Engine, tailored for large
installations and offered by subscription. Operon runs your existing playbooks,
modules, plug-ins and third party tools without modification using an upgraded
engine, dramatically increasing the practical number of nodes addressable in a
single run, and potentially saving hours on every invocation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operon can be installed independently or side-by-side&lt;/strong&gt; with Ansible Engine,
enabling it to be gradually introduced to your existing projects or employed on
a per-run basis.&lt;/p&gt;
&lt;p&gt;Here is the runtime for 416 tasks of common.yml from
&lt;a href="https://debops.org/"&gt;DebOps&lt;/a&gt; 0.7.2 deployed via SSH:&lt;/p&gt;
&lt;p&gt;
&lt;center&gt;&lt;img src="/images/operon/1node-debops-docker-operon-mitogen.svg" style="padding-top: 1ex"&gt;&lt;/center&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Operon reduces runtime by around 60% compared to Ansible for a single node, but
things really heat up for large runs. See how runtime scales using a 24 GiB, 8
core Xeon E5530 deploying to Google Cloud VMs over an &lt;nobr&gt;18 ms&lt;/nobr&gt; SSH
connection:&lt;/p&gt;
&lt;p&gt;
&lt;center&gt;&lt;img src="/images/operon/run-scaling.svg" style="padding-top: 1ex"&gt;&lt;/center&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Each run executed 479 tasks per node, including loop items. In the 1,024 node
run, 490,496 tasks executed in 54 minutes, giving an average throughput of
&lt;strong&gt;&lt;nobr&gt;151 tasks per second&lt;/nobr&gt;&lt;/strong&gt;. Linear scaling is apparent,
with just under 4x time increase moving from 256 to 1,024 nodes.&lt;/p&gt;
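&lt;p&gt;The throughput figure follows directly from the run parameters. Below is a small sketch of the arithmetic. It assumes only the numbers quoted above, with 479 tasks per node derived as 490,496 / 1,024. &lt;code&gt;tasks_per_second()&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
def tasks_per_second(tasks_per_node, nodes, minutes):
    """Return (total tasks, average tasks per second) for one run."""
    total_tasks = tasks_per_node * nodes
    return total_tasks, total_tasks / (minutes * 60)


# 1,024 node run, 54 minutes wall clock.
total, rate = tasks_per_second(479, 1024, 54)
print(total, round(rate))  # 490496 151
```
&lt;/pre&gt;&lt;/div&gt;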
&lt;p&gt;The 256 node Ansible run was cancelled following a lengthy period with no
output, after many re-runs to iteratively reduce forks from 40 to 10, so
Ansible would not exceed RAM. A 13 fork run may have succeeded, but further
attempts were abandoned after consuming two days' worth of compute time.&lt;/p&gt;
&lt;p&gt;In the final run, Ansible completed 89% of tasks in 6h 13m prior to
cancellation:&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;p&gt;
&lt;font size="16" color="#7f0000"&gt;&lt;b&gt;256 Nodes, DebOps common.yml&lt;/b&gt;&lt;/font&gt;&lt;br&gt;
&lt;img src="/images/operon/ansible-256-cancelled-v1.svg"&gt;
&lt;br&gt;
&lt;/p&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Operon deployed to all nodes in parallel for every run presented. Operon has
imperceptible overhead executing &lt;strong&gt;&lt;nobr&gt;1,024 forks given 8 cores&lt;/nobr&gt;&lt;/strong&gt; and
cleanly scales to at least 6,144 given 24 cores. Had these results been
recorded using 16 cores rather than 8, we expect the 1,024 node run would
complete in &lt;nobr&gt;27 minutes&lt;/nobr&gt; rather than &lt;nobr&gt;54 minutes&lt;/nobr&gt;.&lt;/p&gt;
&lt;p&gt;Memory usage is highly predictable and significantly decoupled from forks. With
256 forks, Operon uses &lt;strong&gt;&lt;nobr&gt;4x less RAM&lt;/nobr&gt;&lt;/strong&gt; than Ansible
uses for 10 forks, while consuming at least 15x less controller CPU time to
achieve the same outcome.&lt;/p&gt;
&lt;p&gt;
&lt;center&gt;&lt;img src="/images/operon/ram-scaling.svg" style="padding-top: 1ex"&gt;&lt;/center&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This graph is skewed, as the 64 node Ansible run executed with 40 forks, while
the 256 node run executed with 10 forks. Ansible required 1.6 GiB per fork
for the 256 node run, placing a severe constraint on achievable parallelism
regardless of available RAM.&lt;/p&gt;
&lt;!--
Despite the improvements presented, widespread inefficiency in the Ansible code
inherited by Operon likely means RAM usage remains up to 4x and runtime up to
10x higher than in a correct design.
--&gt;

&lt;p&gt;Operon is the progression of a design approach first debuted in &lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen for
Ansible&lt;/a&gt;. It inherits massive low-level
efficiency improvements from that work, already depended on by thousands of
users:&lt;/p&gt;
&lt;p&gt;
&lt;center&gt;&lt;img src="/images/operon/bandwidth-consumption.svg" style="padding-top: 1ex"&gt;&lt;/center&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;center&gt;&lt;img src="/images/operon/target-cpu-usage.svg" style="padding-top: 1ex"&gt;&lt;/center&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;h3&gt;Beyond software&lt;/h3&gt;
&lt;p&gt;&lt;img width="180" height="180" src="/images/operon/no-deprecations.svg" align=right style="padding-left: 15px;"&gt;&lt;/p&gt;
&lt;p&gt;Performance is a secondary effect of a &lt;strong&gt;culture shift&lt;/strong&gt; towards stronger user
friendliness, compatibility and cost internalization. There is a lot to reveal
here, but to offer a taste of what's planned, I'm pleased to announce a
&lt;strong&gt;forwards-compatible playbook syntax guarantee&lt;/strong&gt;, in addition to restoration
of specific Ansible Engine constructs marked deprecated.&lt;/p&gt;
&lt;style&gt; 
.fonk {
    border-spacing: 15px;
    border-collapse: separate;
}

.fonk1 td { padding-right: 15px; padding-bottom: 15px; }
.fonk td { width: 50%; }&lt;/style&gt;

&lt;table class=fonk width=100%&gt;
&lt;tr&gt;

&lt;td class=fonk1&gt;
    &lt;strong&gt;include:&lt;/strong&gt;
    &lt;pre&gt;- include: "i-will-always-work.yml"

&lt;/pre&gt;

&lt;td&gt;
    &lt;strong&gt;"with" loops&lt;/strong&gt;
    &lt;pre&gt;- debug: msg={{item}}
  with_items: ["i", "will", "always", "work"]&lt;/pre&gt;

&lt;tr&gt;
&lt;td class=fonk1&gt;
  &lt;strong&gt;"squash actions"&lt;/strong&gt;

  &lt;pre&gt;- apt:
    name: "{{item}}"
  with_items: ["i", "will",
               "always", "work"]&lt;/pre&gt;

&lt;td&gt;
  &lt;strong&gt;hyphens in group names&lt;/strong&gt;

  &lt;pre&gt;
  $ cat hosts
  [i-will-always-work.us.mycorp.com]
  host1

&lt;/pre&gt;

&lt;tr&gt;

&lt;td class=fonk1 colspan=2&gt;
  &lt;strong&gt;hash merging&lt;/strong&gt;

  &lt;pre&gt;
  # I will always work
  [defaults]
  hash_behaviour = merge&lt;/pre&gt;


&lt;/table&gt;

&lt;p&gt;The Ansible 2.9-compatible syntax Operon ships &lt;strong&gt;will always be supported&lt;/strong&gt;,
and future syntax deprecations in Ansible Engine do not apply in Operon.
Changes like these harm working configurations without improving capability,
and are a major source of error-prone labour during upgrades.&lt;/p&gt;
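&lt;p&gt;Some readers may be unfamiliar with &lt;code&gt;hash_behaviour = merge&lt;/code&gt;. Under that setting, a higher-precedence dictionary variable does not replace a lower-precedence one wholesale. Instead, nested dictionaries are combined key by key. The following is a rough approximation of those semantics, written for illustration and not taken from Ansible's own implementation. It assumes lists are replaced rather than merged. &lt;code&gt;deep_merge()&lt;/code&gt; and the variable names are hypothetical:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
def deep_merge(base, override):
    """Return a new dict: override's keys win, nested dicts merge recursively."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value  # scalars and lists are replaced outright
    return result


group_vars = {"ntp": {"servers": ["a", "b"], "enabled": True}}
host_vars = {"ntp": {"enabled": False}}

# With the default "replace" behaviour, host_vars["ntp"] would simply win.
print(deep_merge(group_vars, host_vars))
# {'ntp': {'servers': ['a', 'b'], 'enabled': False}}
```
&lt;/pre&gt;&lt;/div&gt;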
&lt;p&gt;Over time this guarantee will progressively extend to engine semantics and
outwards.&lt;/p&gt;
&lt;h3&gt;How can I get this?&lt;/h3&gt;
&lt;p&gt;Operon is initially distributed with &lt;a href="https://networkgenomics.com/operon/"&gt;support from Network
Genomics&lt;/a&gt;, backed by experience and
dedication to service unavailable elsewhere. If your team are gridlocked by
deployments or fatigued by years of breaking upgrades, &lt;strong&gt;&lt;a href="https://networkgenomics.com/operon/"&gt;consider
requesting an evaluation&lt;/a&gt;&lt;/strong&gt;, and
don't hesitate to &lt;nobr&gt;&lt;a href="mailto:dw@networkgenomics.com"&gt;drop me an
e-mail&lt;/a&gt;&lt;/nobr&gt; with any questions and concerns.&lt;/p&gt;
&lt;p&gt;Software is always better in the
open, so a public release will happen when some level of free support can be
provided. &lt;a href="https://networkgenomics.com/mail/operon-announce/subscribe/"&gt;Subscribe to the
operon-announce&lt;/a&gt;
mailing list to learn about future releases.&lt;/p&gt;
&lt;h3&gt;Will Operon help Windows performance?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Yes&lt;/strong&gt;. If you're struggling with performance deploying to Windows, please get
in touch.&lt;/p&gt;
&lt;h3&gt;Will Operon help network device performance?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Yes&lt;/strong&gt;. Operon features an architectural redesign that extends far beyond the
transport layer and applies to all connection types equally.&lt;/p&gt;
&lt;h3&gt;Is Operon a fork of Ansible?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;No&lt;/strong&gt;. Operon is an incremental rewrite of the engine, a small component of
around 60k code lines, of which around a quarter are replaced. Every Ansible
installation includes around 715k lines, of which the vast majority is
independently maintained by the wider Ansible community, just as Operon is.&lt;/p&gt;
&lt;h3&gt;Will Operon help improve Ansible Engine?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Yes&lt;/strong&gt;. Operon is already promoting improvement within Ansible Engine, and
since Ansible Engine remains its upstream, an incentive exists to contribute code
upstream where practical.&lt;/p&gt;
&lt;h3&gt;Is Operon free software?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Yes&lt;/strong&gt;. Operon is covered by the same GPL license that covers Ansible, and you are
free to make use of the code to the full extent of that license.&lt;/p&gt;
&lt;h3&gt;Does Operon break compatibility?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;No&lt;/strong&gt;. Operon does not break compatibility with the standard module
collection, plug-in interfaces, or the surrounding Ansible ecosystem, and never
plans to. Compatibility is a primary deliverable, both in keeping pace with
future improvements and in backwards compatibility, such as improved playbook
syntax stability.&lt;/p&gt;
&lt;h3&gt;I target only one node, what can Operon do for me?&lt;/h3&gt;
&lt;p&gt;Operon will help ensure the continued marketability of skills you have heavily
invested in. It offers a powerful new flexibility that previously could not
exist: your freedom to choose an engine. Whether you use it directly or not,
you already benefit from Operon.&lt;/p&gt;
&lt;p&gt;&lt;br&gt;
David&lt;/p&gt;
&lt;style&gt;
    h3 { color: #7f0000; padding-top: 1em; }
&lt;/style&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Mon, 28 Oct 2019 12:30:00 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2019-10-28:/2019-10-28-operon.html</guid></item><item><title>Mitogen v0.2.8 released</title><link>https://sweetness.hmmz.org/2019-08-18-mitogen-v0-2-8.html</link><description>&lt;p&gt;&lt;img src="/images/mito1/mitogen.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mitogen.networkgenomics.com/changelog.html#v0-2-8-2019-08-18"&gt;Mitogen for
Ansible&lt;/a&gt;
v0.2.8 has been released. This version (finally) supports Ansible 2.8, comes
with a supercharged replacement &lt;code&gt;fetch&lt;/code&gt; module, and includes roughly
85% of what is needed to implement fully asynchronous connect.&lt;/p&gt;
&lt;p&gt;As usual, a huge slew of fixes is included. This is a bumper release, running
to over 20k lines of diff. Get it while it's hot, and as always, &lt;a href="https://github.com/dw/mitogen/issues/new/"&gt;bug
reports&lt;/a&gt; are welcome!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Sun, 18 Aug 2019 21:45:59 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2019-08-18:/2019-08-18-mitogen-v0-2-8.html</guid></item><item><title>Threadless mode in Mitogen 0.3</title><link>https://sweetness.hmmz.org/2019-02-16-mitogen-threadless.html</link><description>&lt;p&gt;&lt;img src="/images/mito1/mitogen.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen&lt;/a&gt; has been explicitly
multi-threaded since the design was first conceived. This choice is hard to
regret, as it aligns well with the needs of &lt;a href="https://mitogen.networkgenomics.com/howitworks.html#use-of-threads"&gt;operating systems like
Windows&lt;/a&gt;,
makes background tasks like proxying possible, and allows painless integration
with existing programs where the user doesn't have to care how communication is
implemented. Easy blocking APIs simply work as documented from any context, and
magical timeouts, file transfers and routing happen in the background without
effort.&lt;/p&gt;
&lt;p&gt;The story has for the most part played out well, but as work on the Ansible
extension revealed, this thread-centric worldview is more than somewhat
idealized, and scenarios exist where background threads are not only
problematic, but a serious hazard that works against us.&lt;/p&gt;
&lt;p&gt;For that reason a new operating mode will hopefully soon be included, one where
relatively minor structural restrictions are traded for no background thread at
all. This article documents the reasoning behind threadless mode, and a strange
set of circumstances that allow such a major feature to be supported with the
same blocking API as exists today, and surprisingly minimal disruption to
existing code.&lt;/p&gt;
&lt;h3&gt;Recap&lt;/h3&gt;
&lt;p&gt;&lt;img width="100%" src="https://mitogen.networkgenomics.com/_images/layout.svg"&gt;&lt;/p&gt;
&lt;p&gt;Above is a rough view of Mitogen's process model, revealing a desirable
symmetry as it currently exists. In the master program and replicated children,
the user's code maintains full control of the main thread, with library
communication requirements handled by a background thread using an identical
implementation in every process.&lt;/p&gt;
&lt;p&gt;Keeping the user in control of the main thread is important, as it possesses
certain magical privileges. In Python it is the only thread from which signal
handlers can be installed or executed, and on Linux some niche system
interfaces require its participation.&lt;/p&gt;
&lt;p&gt;When a method like &lt;code&gt;remote_host.call(myfunc)&lt;/code&gt; is invoked, an outgoing message
is constructed and enqueued with the Broker thread, and a callback handler is
installed to cause any return value response message to be posted to another
queue created especially to receive it. Meanwhile the thread that invoked
&lt;code&gt;Context.call(..)&lt;/code&gt; sleeps waiting for a message on the call's dedicated reply
queue.&lt;/p&gt;
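&lt;p&gt;That round trip can be modelled with standard-library queues. This is a toy under stated assumptions rather than Mitogen's code. &lt;code&gt;ToyBroker&lt;/code&gt; is hypothetical, and &lt;code&gt;queue.Queue&lt;/code&gt; stands in for the reply queue. The "remote" function simply runs on the broker thread instead of in another process:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import itertools
import queue
import threading


class ToyBroker:
    """Hypothetical stand-in for a broker thread servicing call()."""

    def __init__(self):
        self._outbox = queue.Queue()   # messages enqueued for the broker
        self._handlers = {}            # reply id -> callback handler
        self._ids = itertools.count()
        threading.Thread(target=self._loop, daemon=True).start()

    def call(self, func, *args):
        reply_id = next(self._ids)
        reply_queue = queue.Queue()    # created especially for this call
        # Callback handler: posts the eventual response to reply_queue.
        self._handlers[reply_id] = reply_queue.put
        self._outbox.put((reply_id, func, args))
        # The calling thread now sleeps until the broker posts a reply.
        return reply_queue.get()

    def _loop(self):
        while True:
            reply_id, func, args = self._outbox.get()
            # A real broker would transmit the message and later decode the
            # response; here the "remote" function runs on this thread.
            result = func(*args)
            self._handlers.pop(reply_id)(result)


broker = ToyBroker()
print(broker.call(pow, 2, 10))  # 1024
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Even in this toy, every call needs two threads to take turns. The caller sleeps while the broker works, and then the broker must wake the caller.&lt;/p&gt;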
&lt;h3&gt;Latches&lt;/h3&gt;
&lt;p&gt;Those queues aren't simply &lt;code&gt;Queue.Queue&lt;/code&gt;, but a &lt;a href="https://mitogen.networkgenomics.com/howitworks.html#waking-sleeping-threads"&gt;custom
reimplementation&lt;/a&gt;
added early during Ansible extension development, as deficiencies in Python 2.x
threading began to manifest. Python 2 permits a choice between up to 50 ms
latency added to each &lt;code&gt;Queue.get()&lt;/code&gt;, and waits that execute with UNIX signals
masked, thus preventing CTRL+C from interrupting the program. Given these
options a reimplementation made plentiful sense.&lt;/p&gt;
&lt;p&gt;The custom queue is called &lt;code&gt;Latch&lt;/code&gt;, a name chosen simply because it was short
and vaguely fitting. To say its existence is a great discomfort would be an
understatement: reimplementing synchronization was never desired, even if just
by leveraging OS facilities. True to tribal wisdom, the folly of &lt;code&gt;Latch&lt;/code&gt; has
been a vast time sink, costing many days hunting races and subtle
misbehaviours, yet without it good performance and usability are not possible
on Python 2, and so it remains.&lt;/p&gt;
&lt;p&gt;Due to this, when any thread blocks waiting for a result from a remote process,
it always does so within &lt;code&gt;Latch&lt;/code&gt;, a detail that will soon become important.&lt;/p&gt;
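&lt;p&gt;The general technique can be sketched as follows. This is a simplified, hypothetical &lt;code&gt;PipeLatch&lt;/code&gt;, not Mitogen's actual &lt;code&gt;Latch&lt;/code&gt;, and it is written for Python 3 for brevity. A waiting thread sleeps in &lt;code&gt;select()&lt;/code&gt; on the read end of a pipe, and &lt;code&gt;put()&lt;/code&gt; writes a byte to the other end to wake it (the UNIX self-pipe trick). Sleeping in &lt;code&gt;select()&lt;/code&gt; needs neither a polling interval nor masked signals:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import collections
import os
import select
import threading


class PipeLatch:
    """Hypothetical, simplified latch built on the self-pipe trick."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = collections.deque()
        self._rfd, self._wfd = os.pipe()
        os.set_blocking(self._rfd, False)

    def put(self, item):
        with self._lock:
            self._items.append(item)
        os.write(self._wfd, b"x")  # one byte makes the read end readable

    def get(self, timeout=None):
        while True:
            with self._lock:
                if self._items:
                    return self._items.popleft()
            # Sleep in select() rather than polling; timeout applies per wait.
            readable, _, _ = select.select([self._rfd], [], [], timeout)
            if not readable:
                raise TimeoutError("nothing was posted to the latch")
            try:
                os.read(self._rfd, 1)  # consume one wake-up byte
            except BlockingIOError:
                pass  # another waiter took it; re-check the queue

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)


latch = PipeLatch()
threading.Timer(0.05, latch.put, args=("reply",)).start()
print(latch.get(timeout=5))  # reply
latch.close()
```
&lt;/pre&gt;&lt;/div&gt;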
&lt;h3&gt;The Broker&lt;/h3&gt;
&lt;p&gt;Threading requirements are mostly due to &lt;code&gt;Broker&lt;/code&gt;, a thread that has often
changed role over time. Today its main function is to run an I/O multiplexer,
like &lt;a href="https://twistedmatrix.com/trac/"&gt;Twisted&lt;/a&gt; or
&lt;a href="https://docs.python.org/3/library/asyncio.html"&gt;asyncio&lt;/a&gt;. Except for some
local file IO in master processes, broker thread code is asynchronous,
regardless of whether it is communicating with a remote machine via an SSH
subprocess or a local thread via a &lt;code&gt;Latch&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When a user's thread is blocked on a reply queue, that thread isn't really
blocked on a remote process - it is waiting for the broker thread to receive
and decode any reply, then post it to the queue (or &lt;code&gt;Latch&lt;/code&gt;) the thread is
sleeping on.&lt;/p&gt;
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;p&gt;Having a dedicated IO thread in a multi-threaded environment simplifies
reasoning about communication, as events like unexpected disconnection always
occur in a consistent location far from user code. But as is evident, it means
every IO requires interaction of two threads in the local process, and when
that communication is with a remote Mitogen process, a further two in the
remote process.&lt;/p&gt;
&lt;p&gt;It may come as no surprise that poor interaction with the OS scheduler often
manifests: load balancing pushes related communicating threads out across
distinct cores, where their execution schedule bears no resemblance to the
inherent lock-step communication pattern caused by the request-reply structure
of RPCs, and between threads of the same process due to the &lt;a href="https://en.wikipedia.org/wiki/Global_interpreter_lock"&gt;Global Interpreter
Lock&lt;/a&gt;. The range of
undesirable effects defies simple description; it is sufficient to say that
poor behaviour here can be disastrous.&lt;/p&gt;
&lt;p&gt;To cope with this, the Ansible extension introduced &lt;a href="https://github.com/dw/mitogen/blob/06c116257fdc2bd24fcd7b308738f3e06e6a433f/ansible_mitogen/affinity.py#L30"&gt;CPU
pinning&lt;/a&gt;.
This feature locks related threads to one core, so that as a user thread enters
a wait on the broker after sending it a message, the broker has much higher
chance of being scheduled expediently, and for its use of shared resources
(like the GIL) to be uncontended and exist in the cache of the CPU it runs on.&lt;/p&gt;
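&lt;p&gt;On Linux, the primitive underneath pinning is &lt;code&gt;sched_setaffinity(2)&lt;/code&gt;, which Python 3 exposes as &lt;code&gt;os.sched_setaffinity()&lt;/code&gt;. The sketch below is not Mitogen's &lt;code&gt;affinity.py&lt;/code&gt;; it assumes Linux and only shows the mechanism. With a pid of 0, the call applies to the calling thread, and threads started afterwards inherit its mask. A user thread and a broker thread it later starts would therefore share one core. &lt;code&gt;pin_current_thread()&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import os
import threading


def pin_current_thread():
    """Pin the calling thread to the lowest-numbered CPU it may run on."""
    allowed = os.sched_getaffinity(0)
    cpu = min(allowed)
    os.sched_setaffinity(0, {cpu})  # pid 0: the calling thread on Linux
    return allowed, cpu


allowed, cpu = pin_current_thread()

# A thread started after pinning inherits the calling thread's mask.
seen = []
worker = threading.Thread(target=lambda: seen.append(os.sched_getaffinity(0)))
worker.start()
worker.join()
print(seen[0] == {cpu})  # True

os.sched_setaffinity(0, allowed)  # undo: let the scheduler balance again
```
&lt;/pre&gt;&lt;/div&gt;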
&lt;table width="100%" class="tbl"&gt;
    &lt;caption&gt;
        Runs of &lt;a
        href="https://github.com/dw/mitogen/blob/master/tests/bench/roundtrip.py"&gt;tests/bench/roundtrip.py&lt;/a&gt;
        with and without pinning.
    &lt;/caption&gt;

    &lt;tr&gt;
    &lt;th style="text-align: center;"&gt;Pinned?&lt;/th&gt;
    &lt;th style="text-align: center;" colspan=2&gt;Round-trip delay&lt;/th&gt;

    &lt;tr&gt;
    &lt;td rowspan=3 style="vertical-align: middle; text-align: center"&gt;No
    &lt;td style="text-align: center;"&gt;960 usec
    &lt;td rowspan=3 style="vertical-align: middle; text-align: center"&gt;
        Average 848 usec ± 111 usec
    &lt;tr&gt;&lt;td style="text-align: center;"&gt;782 usec
    &lt;tr&gt;&lt;td style="text-align: center;"&gt;803 usec

    &lt;tr&gt;
    &lt;td rowspan=3 style="vertical-align: middle; text-align: center"&gt;Yes
    &lt;td style="text-align: center;"&gt;198 usec
    &lt;td rowspan=3 style="vertical-align: middle; text-align: center"&gt;
        Average 197 usec ± 1 usec
    &lt;tr&gt;&lt;td style="text-align: center;"&gt;197 usec
    &lt;tr&gt;&lt;td style="text-align: center;"&gt;197 usec
&lt;/table&gt;

&lt;p&gt;It is hard to overstate the value of pinning, as revealed by the 20% speedup
&lt;a href="https://github.com/dw/mitogen/commit/c6d5aa29bafb8ff2e857b6a5caf8e20f1cd4e86b"&gt;visible in this stress
test&lt;/a&gt;,
but enabling it is a double-edged sword, as the scheduler loses the freedom to
migrate processes to balance load, and no general pinning strategy is possible
that does not approach the complexity of an entirely new scheduler. As a simple
example, if two uncooperative processes (such as Ansible and, say, a database
server) were to pin their busiest workers to the same CPU, both would suffer
disastrous contention for resources that a scheduler could alleviate if it were
permitted.&lt;/p&gt;
&lt;p&gt;While performance loss due to scheduling could be considered a scheduler bug,
it could be argued that expecting consistently low latency lock-step
communication between arbitrary threads is unreasonable, and so it is desirable
that threading rather than scheduling be considered at fault, especially as one
and not the other is within our control.&lt;/p&gt;
&lt;p&gt;The desire is not to remove threading entirely, but instead provide an option
to disable it where it makes sense. For example, in Ansible it would be possible to
almost halve the running threads if worker processes were switched to a
threadless implementation, since there is no benefit in the otherwise
single-threaded &lt;code&gt;WorkerProcess&lt;/code&gt; from having a distinct broker thread.&lt;/p&gt;
&lt;h3&gt;UNIX fork()&lt;/h3&gt;
&lt;p&gt;In its UNIX manifestation, &lt;code&gt;fork()&lt;/code&gt; is a defective abstraction surviving
through symbolism and dogma, conceived at a time long predating the 1984
actualization of the problem it failed to solve. It has remained obsolete ever
since. A full description of this exceeds any one paragraph, and an article in
drafting since October already in excess of 8,000 words has not yet succeeded
in fully capturing it.&lt;/p&gt;
&lt;p&gt;For our purposes it is sufficient to know that, just as &lt;code&gt;fork()&lt;/code&gt; combines badly with most UNIX
facilities, &lt;a href="https://rachelbythebay.com/w/2011/06/07/forked/"&gt;mixing &lt;code&gt;fork()&lt;/code&gt; with threads is extremely
unsafe&lt;/a&gt;, but many UNIX
programs presently rely on it, such as in Ansible's forking of per-task worker
processes. For that reason in the Ansible extension, Mitogen cannot be
permanently active in the top-level process, but only after fork within a
"connection multiplexer" subprocess, and within the per-task workers.&lt;/p&gt;
&lt;p&gt;In upcoming work, there is a renewed desire for a broker to be active in the
top-level process, but this is extremely difficult while remaining compatible
with Ansible's existing forking model. A threadless mode would be immediately
helpful there.&lt;/p&gt;
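&lt;p&gt;The core hazard of mixing &lt;code&gt;fork()&lt;/code&gt; with threads is easy to reproduce. &lt;code&gt;fork()&lt;/code&gt; copies only the calling thread, so any lock another thread held at that instant stays locked forever in the child. The following is a minimal sketch, assuming a UNIX host and Python 3. &lt;code&gt;child_finds_lock_stuck()&lt;/code&gt; is a hypothetical helper. Python 3.12 and later also emit a &lt;code&gt;DeprecationWarning&lt;/code&gt; when forking a process that has other threads running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import os
import threading


def child_finds_lock_stuck():
    """Fork while another thread holds a lock; report if the child is stuck."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            holding.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    holding.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread was not copied, so nothing will ever
        # release the lock. Exit 0 if acquiring it times out.
        os._exit(0 if not lock.acquire(timeout=0.5) else 1)

    release.set()
    thread.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


print(child_finds_lock_stuck())  # True
```
&lt;/pre&gt;&lt;/div&gt;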
&lt;h3&gt;Python 2.4&lt;/h3&gt;
&lt;p&gt;Another manifestation of &lt;code&gt;fork()&lt;/code&gt; trouble comes in Python 2.4, where the
youthful implementation makes no attempt to repair its threading state after
fork, leading to incurable deadlocks across the board. For this reason when
running on Python 2.4, the Ansible extension disables its internal use of fork
for isolation of certain tasks, but it is not enough, as deadlocks while
starting subprocesses are also possible.&lt;/p&gt;
&lt;p&gt;A common idea would be to forget about Python 2.4 as it is too old, much as it
is tempting to imagine HTTP 0.9 does not exist, but as in that case, Python is
treated not just as a language runtime, but as an established network protocol
that must be implemented in order to communicate with infrastructure that will
continue to exist long into the future.&lt;/p&gt;
&lt;h3&gt;Implementation Approach&lt;/h3&gt;
&lt;p&gt;Recall it is not possible for a user thread to block without waiting on a
&lt;code&gt;Latch&lt;/code&gt;.  With threadless mode, we can instead reinterpret the presence of a
waiting &lt;code&gt;Latch&lt;/code&gt; as the user's indication some network IO is pending, and since
the user cannot become unblocked until that IO is complete, and has given up
forward execution in favour of waiting, &lt;code&gt;Latch.get()&lt;/code&gt; becomes the only location
where the IO loop must run, and only until the &lt;code&gt;Latch&lt;/code&gt; that caused it to run
has some result posted to it by the previous iteration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;socket&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;mitogen&lt;/span&gt;

&lt;span class="c1"&gt;# threadless=True is the proposed option described in this article.&lt;/span&gt;
&lt;span class="nv"&gt;@mitogen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threadless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;host1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a.b.c&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;host2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hostname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;c.b.a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;# Neither call wakes a broker thread: the messages are only enqueued.&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;call1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;host1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gethostname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;call2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;host2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gethostname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;# The I/O loop runs inside get(), until that call's reply arrives.&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;unpickle&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;unpickle&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;In the example, after the (presently blocking) connection procedure completes,
neither &lt;code&gt;call_async()&lt;/code&gt; wakes any broker thread, as none exists. Instead they
enqueue messages for the broker to run, but the broker implementation does not
start execution until &lt;code&gt;call1.get()&lt;/code&gt;, where &lt;code&gt;get()&lt;/code&gt; is internally synchronized
using &lt;code&gt;Latch&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The broker loop ceases after a result becomes available for the &lt;code&gt;Latch&lt;/code&gt; that is
executing it, only to be restarted again for &lt;code&gt;call2.get()&lt;/code&gt;, where it again runs
until its result is available. In this way asynchronous execution progresses
opportunistically, and only when the calling thread has indicated it cannot
progress until a result is available.&lt;/p&gt;
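&lt;p&gt;The control flow can be modelled in a few lines of single-threaded Python. This is a sketch under stated assumptions, not the prototype. &lt;code&gt;ThreadlessBroker&lt;/code&gt;, &lt;code&gt;Latch&lt;/code&gt; and &lt;code&gt;call_async&lt;/code&gt; here are hypothetical toys. The standard &lt;code&gt;selectors&lt;/code&gt; module stands in for Mitogen's poller, and a socket pair stands in for each connection. The I/O loop only ever runs inside &lt;code&gt;Latch.get()&lt;/code&gt;, and it stops as soon as that latch has a result:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import collections
import selectors
import socket


class ThreadlessBroker:
    """Hypothetical broker with no thread: it runs only when asked to."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def run_once(self):
        # One iteration of the I/O loop: dispatch every readable stream.
        for key, _ in self.selector.select():
            key.data(key.fileobj)


class Latch:
    def __init__(self, broker):
        self.broker = broker
        self.items = collections.deque()

    def put(self, item):
        self.items.append(item)

    def get(self):
        # The caller cannot progress without a result, so run the broker
        # here, only until something has been posted to this latch.
        while not self.items:
            self.broker.run_once()
        return self.items.popleft()


def call_async(broker, sock, request):
    """Send a request and return a Latch that will receive the reply."""
    latch = Latch(broker)

    def on_readable(s):
        broker.selector.unregister(s)
        latch.put(s.recv(4096).decode())

    broker.selector.register(sock, selectors.EVENT_READ, on_readable)
    # A real broker would queue this for the I/O loop; the toy writes directly.
    sock.sendall(request.encode())
    return latch


broker = ThreadlessBroker()
local1, remote1 = socket.socketpair()
local2, remote2 = socket.socketpair()

call1 = call_async(broker, local1, "hostname")
call2 = call_async(broker, local2, "hostname")

# Stand-ins for the two remote processes having answered.
remote1.sendall(b"host1")
remote2.sendall(b"host2")

result1 = call1.get()     # one loop iteration handles both replies...
pending = len(call2.items)
result2 = call2.get()     # ...so this returns without running the loop
print(result1, pending, result2)  # host1 1 host2
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy, the reply for &lt;code&gt;call2&lt;/code&gt; is processed during &lt;code&gt;call1.get()&lt;/code&gt;, so progress on every connection is made whenever any caller blocks.&lt;/p&gt;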
&lt;p&gt;Owing to the inconvenient existence of &lt;code&gt;Latch&lt;/code&gt;, an initial prototype was
functional with only a 30 line change. In this way, an ugly and undesirable
custom synchronization primitive has accidentally become the centrepiece of an
important new feature.&lt;/p&gt;
&lt;h3&gt;Size Benefit&lt;/h3&gt;
&lt;p&gt;The intention is that threadless mode will become the new default in a future
version. As it has much lower synchronization requirements, it becomes possible
to move large pieces of code out of the bootstrap, including any relating to
implementing the UNIX self-pipe trick, as required by &lt;code&gt;Latch&lt;/code&gt;, and to wake the
broker thread from user threads.&lt;/p&gt;
&lt;p&gt;Instead this code can be moved to a new &lt;code&gt;mitogen.threads&lt;/code&gt; module, where it can
progressively upgrade an existing threadless &lt;code&gt;mitogen.core&lt;/code&gt;, much like
&lt;code&gt;mitogen.parent&lt;/code&gt; already progressively upgrades it with an industrial-strength
&lt;code&gt;Poller&lt;/code&gt; as required.&lt;/p&gt;
&lt;p&gt;Any code that can be removed from the bootstrap has an immediate benefit on
cold start performance with large numbers of targets, as the bottleneck during
cold start is often a restriction on bandwidth.&lt;/p&gt;
&lt;h3&gt;Performance Benefit&lt;/h3&gt;
&lt;p&gt;Threadless mode ties in well with existing desires to lower latency and
resource consumption, such as &lt;a href="https://sweetness.hmmz.org/2018-08-30-mitogen-03-xmit.html"&gt;the plan to reduce context
switches&lt;/a&gt;.&lt;/p&gt;
&lt;style&gt;
    .right-aligned td,
    .right-aligned th {
        text-align: right;
    }
&lt;/style&gt;

&lt;table width="80%" class="tbl right-aligned"&gt;
    &lt;caption&gt;
        Runs of &lt;a
        href="https://github.com/dw/mitogen/blob/master/tests/bench/roundtrip.py"&gt;tests/bench/roundtrip.py&lt;/a&gt;
        with and without threadless
    &lt;/caption&gt;

    &lt;tr&gt;
    &lt;th&gt;&lt;/th&gt;
    &lt;th&gt;Threaded+Pinned
    &lt;th&gt;Threadless

    &lt;tr&gt;
    &lt;td&gt;Average Round-trip Time
    &lt;td&gt;201 usec
    &lt;td&gt;131 usec (-34.82%)

    &lt;tr&gt;
    &lt;td&gt;Elapsed Time
    &lt;td&gt;4.220 sec
    &lt;td&gt;3.243 sec (-23.15%)

    &lt;tr&gt;
    &lt;td&gt;Context Switches
    &lt;td&gt;304,330
    &lt;td&gt;40,037 (-86.84%)

    &lt;tr&gt;
    &lt;td&gt;Instructions
    &lt;td&gt;10,663,813,051
    &lt;td&gt;8,876,096,105 (-16.76%)

    &lt;tr&gt;
    &lt;td&gt;Branches
    &lt;td&gt;2,146,781,967
    &lt;td&gt;1,784,930,498 (-16.86%)

    &lt;tr&gt;
    &lt;td&gt;Page Faults
    &lt;td&gt;6,412
    &lt;td&gt;17,529 (+173.37%)
&lt;/table&gt;

&lt;p&gt;Because no broker thread exists, no system calls are required to wake it when a
message is enqueued, nor are any necessary to wake the user thread when a reply
is received, nor any
&lt;a href="http://man7.org/linux/man-pages/man2/futex.2.html"&gt;futex()&lt;/a&gt; calls due to one
just-woken thread contending on a GIL that has not yet been released by a
just-about-to-sleep peer. The effect across two communicating processes is a
huge reduction in kernel/user mode switches, contributing to vastly reduced
round-trip latency.&lt;/p&gt;
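&lt;p&gt;The cost of cross-thread hand-offs can be observed directly. The following is a rough, Linux-oriented sketch, and it is not the benchmark used for the table above. It counts the context switches reported by &lt;code&gt;resource.getrusage()&lt;/code&gt; in two cases: while bouncing a value between two threads through queues, and while making the same call inline. Absolute numbers will vary by machine and load. The helper names are hypothetical:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import queue
import resource
import threading


def context_switches():
    """Voluntary plus involuntary context switches for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw + usage.ru_nivcsw


def echo(value):
    return value


def threaded_round_trips(count):
    requests, replies = queue.Queue(), queue.Queue()

    def broker():
        for _ in range(count):
            replies.put(echo(requests.get()))

    thread = threading.Thread(target=broker)
    before = context_switches()
    thread.start()
    for i in range(count):
        requests.put(i)   # wake the other thread...
        replies.get()     # ...then sleep until it answers
    thread.join()
    return context_switches() - before


def inline_round_trips(count):
    before = context_switches()
    for i in range(count):
        echo(i)           # no hand-off: the caller does the work itself
    return context_switches() - before


threaded = threaded_round_trips(10000)
inline = inline_round_trips(10000)
print("threaded:", threaded, "inline:", inline)
```
&lt;/pre&gt;&lt;/div&gt;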
&lt;p&gt;In the table an as-yet undiagnosed jump in page faults is visible. One
possibility is that either the Python or C library allocator employs a
different strategy in the absence of threads; the other is that a memory leak
exists in the prototype.&lt;/p&gt;
&lt;h3&gt;Restrictions&lt;/h3&gt;
&lt;p&gt;Naturally, this places some restrictions on execution. Transparent routing
will no longer be quite so transparent, as it is not possible to execute a
function call in a remote process that is also acting as a proxy to another
process: proxying will not run while &lt;code&gt;Dispatcher&lt;/code&gt; is busy executing the
function call.&lt;/p&gt;
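&lt;p&gt;A minimal single-threaded loop shows why. If the same loop both forwards messages and runs calls, a message that only needs forwarding must wait for any call queued ahead of it to finish. The &lt;code&gt;ToyDispatcher&lt;/code&gt; class below is hypothetical and much simpler than Mitogen's &lt;code&gt;Dispatcher&lt;/code&gt;.&lt;/p&gt;

```python
import collections
import time


class ToyDispatcher:
    """Hypothetical single-threaded loop that both routes and executes."""

    def __init__(self):
        self.inbox = collections.deque()
        self.log = []

    def run(self):
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "call":
                # While the call runs, nothing else in the inbox is
                # serviced, including messages that only need forwarding.
                payload()
                self.log.append(("call", time.monotonic()))
            elif kind == "forward":
                self.log.append(("forward:" + payload, time.monotonic()))


def slow_call():
    time.sleep(0.2)  # Simulates a long-running function call.
```

&lt;p&gt;If a call to &lt;code&gt;slow_call()&lt;/code&gt; is queued ahead of a forwarded message &lt;code&gt;m1&lt;/code&gt;, then &lt;code&gt;m1&lt;/code&gt; is only forwarded after the call returns, roughly 0.2 seconds later.&lt;/p&gt;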
&lt;p&gt;One simple solution is to start an additional child of the proxying process in
which function calls will run, leaving its parent dedicated just to routing,
i.e. exclusively dedicated to running what was previously the broker thread. It
is expected this will require only a few lines of additional code to support in
the Ansible extension.&lt;/p&gt;
&lt;p&gt;For children of a threadless master, &lt;code&gt;import&lt;/code&gt; statements will hang while the
master is otherwise busy, but this is not much of a problem, since &lt;code&gt;import&lt;/code&gt;
statements usually happen once shortly after the first parent-&amp;gt;child call, when
the master will be waiting in a &lt;code&gt;Latch&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For threadless children, no background thread exists to notice that the parent has
disconnected and to ensure the process shuts down gracefully if the main
thread has hung. Several options are possible, including starting a subprocess for
the task, or supporting &lt;code&gt;SIGIO&lt;/code&gt;-based asynchronous IO, so the broker's logic can
run from the signal handler and notice the parent is gone.&lt;/p&gt;
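&lt;p&gt;Here is a rough sketch of the &lt;code&gt;SIGIO&lt;/code&gt; option. It assumes Linux and a UNIX socket to the parent, and uses only the standard library. The socket is put in &lt;code&gt;O_ASYNC&lt;/code&gt; mode with &lt;code&gt;F_SETOWN&lt;/code&gt; naming the current process, so the kernel raises &lt;code&gt;SIGIO&lt;/code&gt; when the socket becomes readable, including when the peer closes it. The &lt;code&gt;watch_parent()&lt;/code&gt; and &lt;code&gt;demo()&lt;/code&gt; helpers are hypothetical, and the handler discards any data it reads. Python only lets the main thread install signal handlers.&lt;/p&gt;

```python
import fcntl
import os
import signal
import socket
import time


def watch_parent(sock, on_gone):
    """Hypothetical helper: request SIGIO whenever `sock` becomes readable,
    and call on_gone() once a read reports EOF (the parent closed its end).
    Returns the previously installed SIGIO handler."""
    sock.setblocking(False)

    def handler(signum, frame):
        # Standard signals coalesce, so drain everything available.
        while True:
            try:
                data = sock.recv(4096)
            except BlockingIOError:
                return  # Drained; the parent is still connected.
            if not data:
                on_gone()
                return
            # A real implementation would hand `data` to the router here.

    # Install the handler first: SIGIO's default action kills the process.
    previous = signal.signal(signal.SIGIO, handler)
    fd = sock.fileno()
    fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)
    return previous


def demo(timeout=2.0):
    """Close the 'parent' end of a socket pair and wait for the signal."""
    parent, child = socket.socketpair()
    gone = []
    previous = watch_parent(child, lambda: gone.append(True))
    try:
        parent.close()
        deadline = time.monotonic() + timeout
        while not gone and deadline > time.monotonic():
            time.sleep(0.01)  # The Python-level handler runs between sleeps.
        return bool(gone)
    finally:
        child.close()
        signal.signal(signal.SIGIO, previous)
```

&lt;p&gt;On Linux, &lt;code&gt;demo()&lt;/code&gt; should return &lt;code&gt;True&lt;/code&gt; once the handler has observed EOF on the child's end.&lt;/p&gt;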
&lt;p&gt;Another restriction is that when threadless mode is enabled, Mitogen primitives
cannot be used from multiple threads. Supporting that is possible, but after some
consideration it does not seem worth the complexity, and it would prevent the
aforementioned reduction of bootstrap code size.&lt;/p&gt;
&lt;h3&gt;Ongoing Work&lt;/h3&gt;
&lt;p&gt;Mitogen has quite an ugly concept of
&lt;a href="https://mitogen.networkgenomics.com/services.html"&gt;Services&lt;/a&gt;, added in a
hurry during the initial Ansible extension development. Services represent a
bundle of a callable method exposed to the network, a security policy
determining who may call it, and an execution policy governing its
concurrency requirements. Service execution always happens in a background
thread pool, and is used to implement things like file transfer in the Ansible
extension.&lt;/p&gt;
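&lt;p&gt;The three parts of that bundle can be illustrated with a small, entirely hypothetical sketch. It is built on the standard library rather than on &lt;code&gt;mitogen.service&lt;/code&gt;, and none of its names are Mitogen's actual API. An &lt;code&gt;expose()&lt;/code&gt; decorator attaches a security policy to a method. A &lt;code&gt;ToyServicePool&lt;/code&gt; checks that policy and then runs the method on a bounded thread pool, which stands in for the execution policy.&lt;/p&gt;

```python
import concurrent.futures


def expose(policy):
    """Mark a method as callable over the network, guarded by `policy`."""
    def decorator(func):
        func.policy = policy
        return func
    return decorator


def allow_parent(caller_id, parent_id):
    # Example security policy: only the parent context may call.
    return caller_id == parent_id


class FileService:
    """Hypothetical service exposing one method."""

    @expose(policy=allow_parent)
    def fetch(self, path):
        return "contents of " + path


class ToyServicePool:
    """Runs exposed methods on a bounded thread pool (execution policy)."""

    def __init__(self, services, parent_id, max_workers=4):
        self._services = {type(s).__name__: s for s in services}
        self._parent_id = parent_id
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers)

    def dispatch(self, caller_id, service_name, method_name, **kwargs):
        service = self._services[service_name]
        method = getattr(service, method_name)
        policy = getattr(method, "policy", None)
        if policy is None:
            raise PermissionError(method_name + " is not exposed")
        if not policy(caller_id, self._parent_id):
            raise PermissionError("caller %r refused" % (caller_id,))
        return self._pool.submit(method, **kwargs)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

&lt;p&gt;With a pool created as &lt;code&gt;ToyServicePool([FileService()], parent_id=0)&lt;/code&gt;, a &lt;code&gt;dispatch()&lt;/code&gt; from caller 0 returns a future. The same call from any other caller ID raises &lt;code&gt;PermissionError&lt;/code&gt; before anything is queued.&lt;/p&gt;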
&lt;p&gt;Despite heavy use, it has always been an ugly feature, as it partially
duplicates the normal parent-&amp;gt;child function call mechanism. Viewing
services from the perspective of threadless mode suggests the notion of a
"threadless service", which looks even more like a plain function call than
services did before.&lt;/p&gt;
&lt;p&gt;It is possible that as part of the threadless work, the unification of function
calls and services may finally happen, although no design for it is certain yet.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;There are doubtlessly many edge cases left to discover, but threadless mode
looks very doable, and promises to make Mitogen suitable in even more scenarios
than before.&lt;/p&gt;
&lt;p&gt;Until next time!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2017-09-15: &lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-03-06: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-07-10: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-07-10-mitogen-released.html"&gt;Mitogen released!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Sat, 16 Feb 2019 22:00:00 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2019-02-16:/2019-02-16-mitogen-threadless.html</guid></item><item><title>Mitogen v0.2.4 released</title><link>https://sweetness.hmmz.org/2019-02-10-mitogen-v0-2-4.html</link><description>&lt;p&gt;&lt;img src="/images/mito1/mitogen.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mitogen.networkgenomics.com/changelog.html#v0-2-5-2019-02-14"&gt;Mitogen for
Ansible&lt;/a&gt;
&lt;s&gt;v0.2.4&lt;/s&gt; v0.2.5 has been released. This version is noteworthy as it
contains major refinements to the core library and the Ansible extension that
improve behaviour during larger Ansible runs.&lt;/p&gt;
&lt;p&gt;Work on scalability is far from complete, as it progresses towards inclusion of
a patch held back since last summer to introduce per-CPU multiplexers. The
current idea is to exhaust profiling gains from a single process before landing
it, as all single-CPU gains continue to apply in that case, and there is much
less risk of inefficiency being hidden in noise created by multiple multiplexer
processes.&lt;/p&gt;
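&lt;p&gt;One plausible way to spread targets over several multiplexer processes is a stable hash of each target's name modulo the CPU count, so a given host always lands on the same multiplexer. This is purely an assumption, not the held-back patch. &lt;code&gt;zlib.crc32()&lt;/code&gt; is used because Python's built-in &lt;code&gt;hash()&lt;/code&gt; of a string is randomized per process. The helper names are hypothetical.&lt;/p&gt;

```python
import os
import zlib


def pick_multiplexer(target, n_muxes=None):
    """Return the index of the multiplexer that should own `target`."""
    if n_muxes is None:
        n_muxes = os.cpu_count() or 1
    # crc32 gives the same answer in every process, unlike hash(str).
    return zlib.crc32(target.encode("utf-8")) % n_muxes


def assign(targets, n_muxes):
    """Group targets by the multiplexer index chosen for each."""
    groups = {}
    for target in targets:
        groups.setdefault(pick_multiplexer(target, n_muxes), []).append(target)
    return groups
```

&lt;p&gt;With 8 multiplexers, &lt;code&gt;assign()&lt;/code&gt; splits a host list into at most 8 groups, and each host's group stays the same between runs.&lt;/p&gt;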
&lt;p&gt;Please kick the tires, and as always, &lt;a href="https://github.com/dw/mitogen/issues/new/"&gt;bug
reports&lt;/a&gt; are welcome!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2017-09-15: &lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-03-06: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-07-10: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-07-10-mitogen-released.html"&gt;Mitogen released!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Sun, 10 Feb 2019 23:59:59 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2019-02-10:/2019-02-10-mitogen-v0-2-4.html</guid></item><item><title>Transmit optimisations in Mitogen 0.3</title><link>https://sweetness.hmmz.org/2018-08-30-mitogen-03-xmit.html</link><description>&lt;p&gt;An early goal for Mitogen was to make it simple to retrofit, avoiding any
"opinionated" choices likely to cause needless or impossible changes in
downstream code. Despite being internally asynchronous, a blocking and mostly
thread-safe API is exposed, with management of the asynchrony punted to a
thread, making integrating with a deployment script hopefully as easy as with a
GUI.&lt;/p&gt;
&lt;p&gt;Although rough edges remain due to this, such as struggles with subprocess
reaping, based on experience working on &lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen for
Ansible&lt;/a&gt; and ignoring complexities unique
to that environment, the design appears to mostly function as intended.&lt;/p&gt;
&lt;p&gt;The operative word is &lt;em&gt;mostly&lt;/em&gt;: due to the API choice, and despite gains already
witnessed in the extension, some internals remain overly simplistic. As has
been the lesson throughout, that inevitably means inefficient:
horrifyingly, crying-in-the-shower inefficient.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sweetness.hmmz.org/2018-08-27-fork-in-the-road.html"&gt;After recently&lt;/a&gt;
attacking some of Ansible's grosser naiveties, and now that the excesses of continuous
forking are gone, dirty laundry is again visible on Mitogen's side. This post
describes one offender: message transmission and routing, how it looks today,
why it is a tragedy, and how things will improve.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2017-09-15: &lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-03-06: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-07-10: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-07-10-mitogen-released.html"&gt;Mitogen released!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-08-27: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-08-27-fork-in-the-road.html"&gt;A fork in the road for Mitogen&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To recap, communication with a bootstrapped child is
&lt;a href="https://mitogen.networkgenomics.com/howitworks.html#stream-protocol"&gt;message-oriented&lt;/a&gt;
to escape the limitations of stream-oriented IO. When an application makes a
call, the sending thread enqueues a message with the broker thread, which is
responsible for all IO, then sleeps waiting for the broker to deliver a reply.&lt;/p&gt;
&lt;p&gt;This has many benefits: mutually ignorant threads can share a child without
coordination, since a central broker exists behind the scenes. Errors can only
occur on the broker thread, so handling is not spread throughout user code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Message Transmission&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Examining just the Mitogen aspects of transmission for an SSH-connected Ansible
target, below are the rough steps repeated for every message in the stable
branch.&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/xmitpath1.svg" style="width: 80%; padding-left: 10%;"&gt;&lt;br&gt;
Despite removing most system calls to fit things in one diagram, there is still
plenty to absorb, and clearly many parts to what is conceptually a simple task.
A component called &lt;code&gt;Waker&lt;/code&gt; abstracts waking the broker thread. This
implements a variant of the &lt;a href="https://cr.yp.to/docs/selfpipe.html"&gt;UNIX self-pipe
trick&lt;/a&gt;, waking it by writing to a pipe it
is sleeping on.&lt;/p&gt;
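&lt;p&gt;Here is a stripped-down illustration of the self-pipe trick. It is a sketch of the same idea, not Mitogen's &lt;code&gt;Waker&lt;/code&gt;. Other threads append a function to a locked queue and write one byte to a pipe. The broker thread sleeps in &lt;code&gt;select()&lt;/code&gt; on the pipe's read end; when woken, it drains the pipe and runs the deferred functions. The &lt;code&gt;broker_loop()&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

```python
import collections
import os
import select
import threading


class Waker:
    """Minimal self-pipe waker (a sketch, not Mitogen's class)."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.wfd, False)
        self._lock = threading.Lock()
        self._deferred = collections.deque()

    def defer(self, func, *args):
        """Called from any thread: queue func(*args) and wake the broker."""
        with self._lock:
            self._deferred.append((func, args))
        try:
            os.write(self.wfd, b"\0")
        except BlockingIOError:
            pass  # Pipe is full, so the broker is certain to wake anyway.

    def on_receive(self):
        """Called on the broker thread when rfd becomes readable."""
        os.read(self.rfd, 4096)  # Drain the wake-up bytes.
        with self._lock:
            pending, self._deferred = self._deferred, collections.deque()
        for func, args in pending:
            func(*args)

    def close(self):
        os.close(self.rfd)
        os.close(self.wfd)


def broker_loop(waker, stop):
    """Hypothetical broker: sleep in select() until woken, then run work."""
    while not stop.is_set():
        readable, _, _ = select.select([waker.rfd], [], [])
        if waker.rfd in readable:
            waker.on_receive()
```

&lt;p&gt;Each &lt;code&gt;defer()&lt;/code&gt; therefore costs at least one &lt;code&gt;write()&lt;/code&gt; by the sender, plus a thread wake-up, a &lt;code&gt;select()&lt;/code&gt; return and a &lt;code&gt;read()&lt;/code&gt; in the broker.&lt;/p&gt;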
&lt;p&gt;When the broker wakes, it calls waker's &lt;code&gt;on_receive&lt;/code&gt; handler,
causing any deferred functions to execute on its thread. Here the asynchronous
half of the router runs, picking a stream to forward the message.&lt;/p&gt;
&lt;p&gt;The stream responds by asking the broker to tell it when the SSH stream becomes
writeable, which is implemented differently depending on OS, but in most cases
it entails yet more system calls.&lt;/p&gt;
&lt;p&gt;Since usually the SSH input buffer is empty, the broker immediately wakes again
to call the stream's &lt;code&gt;on_transmit&lt;/code&gt; handler, finally passing the
message to SSH before marking the stream unwriteable again. At this point
execution moves to SSH, which does little more than read from a socket, do some
crypto and write to another socket.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Better Message Transmission&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In total, transmission requires at least 2 task switches, 2 loop iterations, at
least 5 reads/writes, and 2 poller reconfigurations.&lt;/p&gt;
&lt;p&gt;While superficially logical, one problem is already obvious: transmitting
always entails waking a thread, a nontrivial operation on UNIX. Another is that the
biggest performance bottleneck, the IO loop, is forced to iterate twice for
every transmission, in part to cope with the possibility that the SSH input buffer
is full.&lt;/p&gt;
&lt;p&gt;What if we were more optimistic, assuming an error won't occur and the SSH input buffer
probably has space? Since we aren't expecting to clean up a failure, there is no
reason to involve the broker either. The new sequence:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/xmitpath2.svg" style="width: 60%; padding-left: 10%;"&gt;&lt;br&gt;
Coordination is replaced with a lock, and the sending thread writes directly to
SSH. We no longer check for writeability: simply try the write, and if it fails
or buffered data already exists, defer to the broker as before.&lt;/p&gt;
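&lt;p&gt;The optimistic path might be sketched as follows. The sketch assumes a non-blocking descriptor and a &lt;code&gt;defer&lt;/code&gt; callable that asks the broker to run a function later. &lt;code&gt;OptimisticStream&lt;/code&gt; is hypothetical. The broker is assumed to keep calling &lt;code&gt;on_transmit()&lt;/code&gt; whenever the descriptor is writeable, until &lt;code&gt;pending()&lt;/code&gt; returns zero.&lt;/p&gt;

```python
import os
import threading


class OptimisticStream:
    """Hypothetical sketch of optimistic transmission."""

    def __init__(self, fd, defer):
        self.fd = fd          # Non-blocking descriptor, e.g. SSH's stdin.
        self.defer = defer    # Asks the broker thread to call a function.
        self._lock = threading.Lock()
        self._buffer = bytearray()

    def _write_some(self, data):
        """Write what the kernel will accept right now; never block."""
        try:
            return os.write(self.fd, data)
        except BlockingIOError:
            return 0

    def send(self, data):
        """Called from any thread: write directly unless data is queued."""
        with self._lock:
            if not self._buffer:
                # Nothing queued ahead of us, so ordering is preserved.
                data = data[self._write_some(data):]
            if not data:
                return  # Fast path: no broker involvement at all.
            first_backlog = not self._buffer
            self._buffer += data
        if first_backlog:
            # Slow path: the broker takes over flushing, like before.
            self.defer(self.on_transmit)

    def on_transmit(self):
        """Called on the broker thread when the descriptor is writeable."""
        with self._lock:
            written = self._write_some(bytes(self._buffer))
            del self._buffer[:written]

    def pending(self):
        with self._lock:
            return len(self._buffer)
```

&lt;p&gt;In the common case, &lt;code&gt;send()&lt;/code&gt; costs one lock acquire and release plus one &lt;code&gt;write()&lt;/code&gt;. The broker only becomes involved when the kernel buffer is already full.&lt;/p&gt;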
&lt;p&gt;Now we have 1 task switch, 0 loop iterations, 2 lock operations, 3
reads/writes, and 0 poller reconfigurations, but still there is that unsightly
task switch.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Even Better Message Transmission&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The Ansible extension and &lt;a href="https://sweetness.hmmz.org/2018-08-27-fork-in-the-road.html"&gt;new strategy
work&lt;/a&gt; both offer
something Ansible previously relied on SSH multiplexing to provide: a process
where connection state persists during a run. As persistence is under our
control, one final step becomes possible. Simply move SSH in-process:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/xmitpath3.svg" style="width: 60%; padding-left: 10%;"&gt;&lt;br&gt;
Now we have 0 task switches, 0 loop iterations, 2 lock operations, 1 write, and
0 poller reconfigurations, or simply put, the minimum possible to support a
threaded program communicating via SSH.&lt;/p&gt;
&lt;p&gt;Some exciting possibilities emerge: passwords can be typed without allocating a
PTY. Since usually &lt;a href="https://github.com/dw/mitogen/issues/337"&gt;Linux only supports 4,096
PTYs&lt;/a&gt;, this raises the scalability
upper bound while reducing resource usage. Much better buffering is possible,
eliminating Mitogen's own buffer, and optimally sizing SSH sockets to &lt;a href="https://mitogen.networkgenomics.com/internals.html#mitogen.core.CHUNK_SIZE"&gt;support
file
transfers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Of course downsides exist: unlike &lt;a href="https://www.libssh.org/"&gt;libssh&lt;/a&gt; or
&lt;a href="https://libssh2.org/"&gt;libssh2&lt;/a&gt;, OpenSSH is part of a typical workflow,
supports every authentication style, and it is common to stash configuration in
&lt;code&gt;~/.ssh/config&lt;/code&gt;. Although libssh supports SSH configuration parsing,
it's unclear how well it works in practice, and at least the author of
&lt;a href="https://parallel-ssh.org/"&gt;ParallelSSH&lt;/a&gt; (and wrappers for both libraries)
appears to have chosen libssh2 over it for reasons I'd like to discover.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Routing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For completeness, and since the diagrams exist already, here is routing between
two SSH children from the context of their parent on the stable branch:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/route1.svg" style="width: 90%; padding-left: 5%;"&gt;&lt;br&gt;
While internal switching is avoided, those nasty loop iterations are visible,
as are the surrounding task switches. Optimistic sending benefits routing too:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/route2.svg" style="width: 90%; padding-left: 5%;"&gt;&lt;br&gt;
Now the loop iterates once. Finally, with an in-process SSH client:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/mitogen-03-xmit/route3.svg" style="width: 90%; padding-left: 5%;"&gt;&lt;br&gt;
A single thread is woken, receives the message to be forwarded, delivers it,
and sleeps again, all on one stack.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Complexity is fractal, but shying from it just leads to mediocre software. Both
improvements exist as branches, and both will be supported by the Ansible
extension in addition to the new work.&lt;/p&gt;
&lt;p&gt;Until next time!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Thu, 30 Aug 2018 19:12:00 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-08-30:/2018-08-30-mitogen-03-xmit.html</guid></item><item><title>A fork in the road for Mitogen</title><link>https://sweetness.hmmz.org/2018-08-27-fork-in-the-road.html</link><description>&lt;p&gt;&lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen for Ansible&lt;/a&gt;'s original plan
described facets of a scheme centered on features made possible by a rigorous
single cohesive distributed program model, but of those facets, it quickly
became clear that most users are really only interested in the big one: a much
faster Ansible.&lt;/p&gt;
&lt;p&gt;While I'd prefer feature work, this priority is fine: better performance
usually entails enhancements that benefit the overall scheme, and improving
people's lives in this manner is highly rewarding, so incentives remain
aligned. It is impossible not to find renewed energy when faced with comments
like this:&lt;/p&gt;
&lt;blockquote&gt;
    Enabling the mitogen plugin in ansible feels like switching from floppy to SSD&lt;br&gt;
    &lt;a href="https://t.co/nCshkioX9h"&gt;https://t.co/nCshkioX9h&lt;/a&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although feedback on the project has been very positive, the existing solution
is sometimes not enough. Limitations in the extension and Ansible really bite,
most often manifesting when running against many targets. In these scenarios,
it is heartbreaking to see the work fail to help those who could benefit from
it most, and that's what I'd like to talk about.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Controller-side Performance&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some time ago I began refactoring Ansible's
&lt;a href="https://docs.ansible.com/ansible/2.5/user_guide/playbooks_strategies.html"&gt;linear&lt;/a&gt;
strategy, aiming to get it to where controller-side enhancements might exist
without adding more spaghetti, while becoming familiar with requirements for
later features. To recap, the strategy plugin is responsible for almost every
post-parsing task, including worker management. It is in many ways the beating
heart at the core of every Ansible run.&lt;/p&gt;
&lt;p&gt;After some months, and one particularly enlightening conversation, that work
resumed, eventually subsuming all of the remaining strategy support and result
processing code, forming one huge refactor of a big chunk of upstream that has
been gathering dust for almost a month.&lt;/p&gt;
&lt;p&gt;The result exists today and is truly wonderful. It integrates Mitogen into the
heart of Ansible without baking it in, introduces a carefully designed process
model with strong persistence properties, eliminating most bottlenecks endured
by the extension and vanilla Ansible, and provides an architectural basis for
the next planned iteration of scalability work, Windows compatibility, some
features mentioned, and quite a few that have been kept quiet.&lt;/p&gt;
&lt;p&gt;With the new strategy it is possible to almost perfectly saturate an 8 vCPU
machine given 100 targets, with minimal loss of speedup compared to
single-target. Regarding single target, simple loops against localhost are up
to 4x faster than the current stable extension.&lt;/p&gt;
&lt;p&gt;While there are at least 2 obvious additional enhancements possible with this
work, development reached a natural break to allow stabilizing one
piece of the puzzle at a time. Once this is done, it is clear exactly where to
pick things up next.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deep Cuts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img width="162" height="405" src="/images/ansible-stack.svg" align=right style="padding-left: 15px;"&gt;&lt;/p&gt;
&lt;p&gt;There's just a small hitch: this work goes deep, entailing changes that, while
so far would be possible as monkey-patches, are highly version-specific, and
unlikely to remain monkey-patchable as the branch receives real-world usage.
There must be a mechanism to ship unknown future patches to upstream code.&lt;/p&gt;
&lt;p&gt;It was hoped it could land after Ansible 2.7, benefiting from related changes
planned upstream, but they appear to have been delayed or abandoned, and so a
situation exists where improvements cannot be shipped for at least another 4-6
months, assuming the related changes finally arrive in Ansible 2.8.&lt;/p&gt;
&lt;p&gt;To the right is a rough approximation of components involved in executing a
playbook. Those modified or replaced by the stable extension are green, while those
replaced by the branch-in-waiting are yellow. Finally, in orange are components
affected by planned features and optimizations.&lt;/p&gt;
&lt;p&gt;Although there are tens of thousands of lines of surrounding code, as should
hopefully be clear, the number of untouched major components involved in a run
has been dwindling fast. Put simply, the existing mechanism for delivering
improvements is reaching its limit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The F Word&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Any seasoned developer, especially those familiar with the size of the Ansible
code base, will hopefully understand the predicament. There is no problem
delivering improvements today, assuming an unsupported one-off code dump was
all anyone wanted, but that is never the case.&lt;/p&gt;
&lt;p&gt;The problem lies in entering an unsustainable permanent marriage with a large
project, not forgetting this outcome was an explicit non-goal from the start.
Simultaneously over the months significant trust has been garnered to deliver
these kinds of improvements, and abandoning one of the best yet would seem
foolish.&lt;/p&gt;
&lt;p&gt;Something of a many-variabled optimization process has recently come to an end,
and I've found a solution that I am comfortable with. While making a release
needs more time and may still not be definite, it seemed worth documenting at
least some of the reasoning behind it before it comes.&lt;/p&gt;
&lt;p&gt;Even though this outcome was undesirable, and although the solution in mind is
not without restraint, it is still a cloud with many silver linings. For
instance, new user configuration steps can be reduced to almost zero, core
features can be added with minimal friction, and creative limitations are
significantly uncapped.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What About The Extension?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The planned structure keeps the extension front-and-centre, so regardless of
outcome it will continue to receive the majority of feature work and
maintenance. It is definitely not going away.&lt;/p&gt;
&lt;p&gt;With a third stable release looming, it's probably high time for a quick
update. Many bugs were squashed since July, with stable work recently centered
around problems with Ansible 2.6. This involved some changes to temporary file
handling, and in the process, discovery of a huge missed optimization.&lt;/p&gt;
&lt;p&gt;v0.2.3 will need only 2 roundtrips for each
&lt;a href="https://docs.ansible.com/ansible/latest/modules/copy_module.html"&gt;copy&lt;/a&gt; and
&lt;a href="https://docs.ansible.com/ansible/latest/modules/template_module.html"&gt;template&lt;/a&gt;,
or in terms of a 250ms transcontinental link, 10 seconds to copy 20 files vs.
30 seconds previously, or 2 minutes compared to vanilla's best configuration.
This work is delayed somewhat as a new RPC chaining mechanism is added to
better support all similar future changes, and identical situations likely to
appear in similar tools.&lt;/p&gt;
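&lt;p&gt;The arithmetic behind those figures is simple under a rough assumption: ignore transfer time and local processing, and charge each roundtrip one full 250 ms RTT. Dividing the quoted times back out implies about 6 roundtrips per file previously and about 24 for vanilla Ansible. Those two counts are inferred here, not measured, and the helper names are illustrative.&lt;/p&gt;

```python
def copy_time(files, roundtrips_per_file, rtt_seconds):
    """Seconds spent waiting on the network, ignoring bandwidth."""
    return files * roundtrips_per_file * rtt_seconds


def implied_roundtrips(total_seconds, files, rtt_seconds):
    """Invert copy_time(): roundtrips per file implied by a total."""
    return total_seconds / (files * rtt_seconds)
```

&lt;p&gt;For example, &lt;code&gt;copy_time(20, 2, 0.25)&lt;/code&gt; gives 10.0 seconds, matching the v0.2.3 figure.&lt;/p&gt;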
&lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2017-09-15: &lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-03-06: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-03-28: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-28-crowdfunding-mitogen-day-23.html"&gt;Crowdfunding Mitogen: day 23&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-04-20: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-04-20-crowfunding-mitogen-day-46.html"&gt;Crowdfunding Mitogen: day 46&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-05-23: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-05-23-mitogen-for-ansible-status-23-may.html"&gt;Mitogen for Ansible status, 23 May&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2018-07-10: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-07-10-mitogen-released.html"&gt;Mitogen released!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Until next time!
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Mon, 27 Aug 2018 02:00:00 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-08-27:/2018-08-27-fork-in-the-road.html</guid></item><item><title>Mitogen released!</title><link>https://sweetness.hmmz.org/2018-07-10-mitogen-released.html</link><description>
    &lt;img src="/images/mito1/cell_division.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;
    &lt;!-- &lt;img style="padding: 4px; float: right;" src="/images/mito1/cell_division.svg" width="232"&gt; --&gt;

    &lt;p&gt;
    After 4 months of development, a design phase stretching back 10 years, and
    more than 1,300 commits, I am pleased to finally announce the first stable
    series of Mitogen and the Mitogen for Ansible extension.
    &lt;/p&gt;

    &lt;p&gt;
    &lt;a target="_blank" href="https://mitogen.networkgenomics.com/"&gt;Mitogen is a Python zero-deploy
    distributed programming library&lt;/a&gt; designed to drastically increase the
    functional capability of infrastructure software operating via SSH. &lt;a target="_blank" href="https://networkgenomics.com/ansible/"&gt;Mitogen for Ansible&lt;/a&gt; is a
    drop-in replacement for Ansible's lower tiers, netting huge speed and
    efficiency improvements for common playbooks.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;What's There&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    This initial series covers a widely compatible drop-in Ansible extension on
    Python 2.6, 2.7, and 3.6, a preview of the first value-added functionality
    for Ansible (&lt;a target="_blank" href="https://mitogen.networkgenomics.com/ansible_detailed.html#connection-delegation"&gt;Connection
    Delegation&lt;/a&gt;), and a freeze of the underlying library required to support
    it.
    &lt;/p&gt;

    &lt;p&gt;
    With the exception of some gotchas listed in the &lt;a target="_blank" href="https://mitogen.networkgenomics.com/changelog.html"&gt;release
    notes&lt;/a&gt;, you should expect the Ansible extension to &lt;em&gt;just work&lt;/em&gt;,
    and if it doesn't &lt;a target="_blank" href="https://goo.gl/yLKZiJ"&gt;please let me know via
    GitHub&lt;/a&gt;.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Demo&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Refer to the posts under &lt;em&gt;Just Tuning In?&lt;/em&gt; below for a 1000 foot
    view of the direction this work is heading, but for an idea of how things
    are today, watch the first minute of this recording, demonstrating a
    loop-heavy configuration of Mitogen's tests executing against the local
    machine.
    &lt;/p&gt;

    &lt;p&gt;
    &lt;iframe src="https://player.vimeo.com/video/283272293?title=0&amp;amp;byline=0&amp;amp;portrait=0" width="680" height="415" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    To install Mitogen for Ansible, just follow the &lt;strong&gt;&lt;a target="_blank" href="https://mitogen.networkgenomics.com/ansible_detailed.html#installation"&gt;5
    easy steps&lt;/a&gt;&lt;/strong&gt; in the documentation. For non-Ansible users the
    library is available from PyPI via &lt;code&gt;pip install mitogen&lt;/code&gt;.
    Introductory documentation for the library is very weak right now; it will
    improve over the course of the stable series.
    &lt;/p&gt;

    &lt;!--
    &lt;p&gt;&lt;strong&gt;Early User Feedback&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;"With mitogen &lt;strong&gt;my playbook runtime went from 45 minutes to just
        under 3 minutes&lt;/strong&gt;. Awesome work!"&lt;/li&gt;
    &lt;li&gt;"The runtime was reduced from &lt;strong&gt;1.5 hours on 4 servers to just
        under 3 minutes&lt;/strong&gt;. Thanks!"&lt;/li&gt;
    &lt;li&gt;"Oh, performance improvement using Mitogen is &lt;strong&gt;huge&lt;/strong&gt;. As
        mentioned before, running with Mitogen enables takes 7m36 (give or take
        a few seconds). Without Mitogen, the same run takes 19m49! &lt;strong&gt;I'm not
        even deploying without Mitogen anymore&lt;/strong&gt; :)"&lt;/li&gt;
    &lt;li&gt;"&lt;strong&gt;Works like a charm&lt;/strong&gt;, thank you for your quick response"&lt;/li&gt;
    &lt;li&gt;"I tried it out. &lt;strong&gt;He is not kidding about the speed
        increase&lt;/strong&gt;."
    &lt;li&gt;"I don't know what kind of dark magic @dmw_83 has done, but his Mitogen
        strategy took Clojars' Ansible runs from &lt;strong&gt;14 minutes to 2
        minutes&lt;/strong&gt;. I still can't quite believe it."&lt;/li&gt;
    &lt;/ul&gt;
    --&gt;

    &lt;p&gt;&lt;strong&gt;Thanks to all the supporters!&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Mitogen development in 2018 was sponsored by a
    &lt;strong&gt;&lt;a target="_blank" href="https://mitogen.networkgenomics.com/contributors.html"&gt;fabulous
    group of individuals and businesses&lt;/a&gt;&lt;/strong&gt; through a crowdfunding
    campaign launched in February. Thanks to everyone who participated by
    pledging, testing, writing bug reports, and helping with upfront planning.
    A huge special thanks to the primary sponsor:
    &lt;/p&gt;

    &lt;div style="background: #efefef; padding: 16px; margin: 1.5em 0;"&gt;
        &lt;div style="float: left; padding: 8px 32px 16px 8px;"&gt;
            &lt;img src="/images/mito-released/cgi.svg" height="110" width="238"&gt;
        &lt;/div&gt;
        &lt;div&gt;
            &lt;p&gt;
            Founded in 1976, CGI is one of the world’s largest IT and business
            consulting services firms, helping clients achieve their goals,
            including becoming customer-centric digital organizations.
            &lt;/p&gt;

            &lt;p&gt;
            &lt;br clear="all"&gt;
            For career opportunities, please visit &lt;a target="_blank" href="https://cgi-group.co.uk/defence-and-intelligence-opportunities"&gt;cgi-group.co.uk/defence-and-intelligence-opportunities&lt;/a&gt;.
            &lt;/p&gt;

            &lt;p style="margin-bottom: 0px;"&gt;
            To &lt;a target="_blank" href="https://cgi.njoyn.com/CGI/xweb/XWeb.asp?page=jobdetails&amp;amp;CLID=21001&amp;amp;SBDID=21814&amp;amp;jobid=J0118-0787"&gt;directly
            apply&lt;/a&gt; to a UK team currently using Mitogen, contact us
            regarding &lt;a target="_blank" href="https://cgi.njoyn.com/CGI/xweb/XWeb.asp?page=jobdetails&amp;amp;CLID=21001&amp;amp;SBDID=21814&amp;amp;jobid=J0118-0787"&gt;Open
            Source Developer/DevOps&lt;/a&gt; opportunities.
            &lt;/p&gt;
        &lt;/div&gt;
    &lt;/div&gt;

    &lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Feature work will resume after most issues are ironed out of the stable
    series -- in particular I'm expecting more bugs around Python 3 and cross
    2/3 interoperability. Once 0.2.x looks solid, one important goal is a
    complete and widely compatible Connection Delegation feature, including a
    rewrite of the &lt;code&gt;fakessh&lt;/code&gt; component to support transparent use of
    the &lt;code&gt;synchronize&lt;/code&gt; module.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;2017-09-15: &lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-06: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-28: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-28-crowdfunding-mitogen-day-23.html"&gt;Crowdfunding Mitogen: day 23&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-04-20: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-04-20-crowfunding-mitogen-day-46.html"&gt;Crowdfunding Mitogen: day 46&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-05-23: &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-05-23-mitogen-for-ansible-status-23-may.html"&gt;Mitogen for Ansible status, 23 May&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;
    Until next time!
    &lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Tue, 10 Jul 2018 18:08:00 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-07-10:/2018-07-10-mitogen-released.html</guid></item><item><title>Mitogen for Ansible status, 23 May</title><link>https://sweetness.hmmz.org/2018-05-23-mitogen-for-ansible-status-23-may.html</link><description>
    &lt;p&gt;This is the third update on the status of developing &lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen for Ansible&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Too long, didn&amp;rsquo;t read&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    A beta is coming soon! Aside from async tasks, the &lt;code&gt;master&lt;/code&gt; branch
    is looking great. Since the last update there have been many features and
    fixes, but there are important forks in the road ahead, particularly around
    efficient support for many-host runs. Read on...
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;2017-09-15: &lt;a href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-06: &lt;a href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-28: &lt;a href="https://sweetness.hmmz.org/2018-03-28-crowdfunding-mitogen-day-23.html"&gt;Crowdfunding Mitogen: day 23&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-04-20: &lt;a href="https://sweetness.hmmz.org/2018-04-20-crowfunding-mitogen-day-46.html"&gt;Crowdfunding Mitogen: day 46&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;


    &lt;p&gt;&lt;strong&gt;Done: File Transfer&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;a target="_blank" href="http://mitogen.networkgenomics.com/ansible_detailed.html#file-transfer"&gt; File
    transfer&lt;/a&gt; previously worked by constructing one RPC representing the
    complete file, which for large files resulted in an explosion in memory usage
    on each machine as the message was enqueued and transferred, with communication
    at each hop blocked until the message was delivered. This has required a
    rewrite since the original code was written, but a simple solution proved
    elusive.
    &lt;/p&gt;

    &lt;p&gt;
    Today file transfer is all but solved: files are streamed in 128KiB-sized
    messages, using a dedicated service that aggregates pending transfers by their
    most directly connected stream, serving one file at a time before progressing
    to the next transfer. An initial burst of 128KiB chunks is generated to fill a
    link with a 1MiB &lt;a target="_blank" href="https://en.wikipedia.org/wiki/Bandwidth-delay_product"&gt;BDP&lt;/a&gt;, with
    further chunks sent as acknowledgements begin to arrive from the receiver. As
    an optimization, files 32KiB or smaller are still delivered in a single RPC,
    avoiding one roundtrip in a common scenario.
    &lt;/p&gt;
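    &lt;p&gt;
    The flow-control scheme just described can be sketched in a few lines of
    Python. This is a hypothetical illustration, not Mitogen's code: the chunk,
    window and small-file sizes come from the text above, while
    &lt;code&gt;plan_transfer()&lt;/code&gt;, &lt;code&gt;WindowedSender&lt;/code&gt; and the
    &lt;code&gt;send&lt;/code&gt; callback are invented names standing in for the real
    service and message bus.
    &lt;/p&gt;

```python
# Sizes taken from the description above; all names are hypothetical.
CHUNK_SIZE = 128 * 1024          # 128KiB per streamed message
WINDOW_BYTES = 1024 * 1024       # enough chunks to fill a 1MiB BDP link
SMALL_FILE_LIMIT = 32 * 1024     # files this small travel in a single RPC


def plan_transfer(data):
    """Decide how a file is sent.

    Returns ('single', [data]) for small files, else ('stream', chunks).
    """
    if SMALL_FILE_LIMIT >= len(data):
        return 'single', [data]
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return 'stream', chunks


class WindowedSender:
    """Send an initial burst of chunks, then one new chunk per acknowledgement."""

    def __init__(self, chunks, send):
        self.pending = list(chunks)
        self.send = send              # puts one message on the wire
        self.in_flight = 0

    def start(self):
        burst = WINDOW_BYTES // CHUNK_SIZE    # 8 chunks in flight initially
        for _ in range(min(burst, len(self.pending))):
            self._send_next()

    def on_ack(self):
        # The receiver consumed a chunk: slide the window forward by one.
        self.in_flight -= 1
        if self.pending:
            self._send_next()

    def _send_next(self):
        self.send(self.pending.pop(0))
        self.in_flight += 1
```

    &lt;p&gt;
    With these sizes the initial burst is 1MiB / 128KiB = 8 chunks; afterwards
    each acknowledgement releases exactly one more, so no more than 1MiB is in
    flight for any one transfer.
    &lt;/p&gt;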

    &lt;p&gt;
    Compared to &lt;em&gt;sftp(1)&lt;/em&gt; or &lt;em&gt;scp(1)&lt;/em&gt;, the new service has vastly
    lower setup overhead (1 RTT vs. 5) and far better safety properties, ensuring
    concurrent use of the API by unrelated &lt;code&gt;ansible-playbook&lt;/code&gt; runs
    cannot create a situation where an inconsistent file may be observed by users,
    or a &lt;a target="_blank" href="https://github.com/ansible/proposals/issues/25#issuecomment-385208218"&gt;corrupt
    file is deployed with no indication a problem exists&lt;/a&gt;.
    &lt;/p&gt;

    &lt;p&gt;
    Since file transfer is implemented in terms of Mitogen's message bus, it is
    agnostic to Connection Delegation, allowing streaming file transfers between
    proxied targets regardless of how the connection is set up.
    &lt;/p&gt;

    &lt;p&gt;
    Some minor problems remain: the scheduler cannot detect a timed out transfer,
    risking a cascading hang when Connection Delegation is in use. This is not a
    regression, as Ansible does not support this mode of operation at all. In
    either case, during normal operation the timeout will eventually be noticed
    when the underlying SSH connection times out.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Connection Delegation&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;a target="_blank" href="http://mitogen.networkgenomics.com/ansible_detailed.html#connection-delegation"&gt;Connection
    Delegation&lt;/a&gt; enables Ansible to use one or more intermediary machines to
    reach a target machine or container, with connections and code uploads
    deduplicated at each hop in the path. For an Ansible run against many
    containers on one target host, only one SSH connection to the target need
    exist, and module code need only be uploaded once on that connection.
    &lt;/p&gt;
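    &lt;p&gt;
    A rough sketch of the deduplication idea, assuming a route can be written as
    a tuple of hop names such as &lt;code&gt;('ssh:host1', 'docker:web')&lt;/code&gt;.
    &lt;code&gt;ConnectionCache&lt;/code&gt; and the &lt;code&gt;connect&lt;/code&gt; callback are
    hypothetical names, not Mitogen's API; the point is only that a prefix shared
    by many paths, such as the SSH hop, is opened once and then reused.
    &lt;/p&gt;

```python
class ConnectionCache:
    """Reuse one connection per unique route (hypothetical sketch).

    'connect(parent, hop)' stands in for whatever actually opens a hop
    through an already-established parent connection.
    """

    def __init__(self, connect):
        self.connect = connect
        self.by_path = {}

    def get(self, path):
        path = tuple(path)
        if path not in self.by_path:
            # Build (or reuse) every intermediate hop first, so a shared
            # prefix such as the SSH connection is only opened once.
            parent = self.get(path[:-1]) if len(path) > 1 else None
            self.by_path[path] = self.connect(parent, path[-1])
        return self.by_path[path]
```

    &lt;p&gt;
    Three containers behind one host therefore cost one SSH connection plus three
    container connections, rather than three SSH connections.
    &lt;/p&gt;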

    &lt;p&gt;
    While not yet complete, this feature exists today and works well; however, some
    important functionality is still missing. Presently intermediary connection
    setup is single threaded, non-Python (i.e. Ansible) module uploads are
    duplicated, and the code to infer intermediary connection configurations using
    the APIs available in Ansible is.. hairy at best.
    &lt;/p&gt;

    &lt;p&gt;
    Fixing deduplication and single-threaded connection setup entails starting a
    service thread pool within each interpreter that will act as an intermediary.
    This requires some &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/213"&gt;reworking of the nascent
    service framework&lt;/a&gt;, which will also make it easier to use from non-Ansible
    programs and lay the groundwork for Topology-aware File Synchronization.
    &lt;/p&gt;
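    &lt;p&gt;
    To illustrate what a per-interpreter service pool buys, here is a minimal
    sketch using Python's standard &lt;code&gt;concurrent.futures&lt;/code&gt;: connection
    setup runs on a pool of threads, and concurrent requests for the same target
    share a single attempt instead of racing to open duplicates.
    &lt;code&gt;ConnectService&lt;/code&gt; and its &lt;code&gt;connect&lt;/code&gt; callback are
    hypothetical, not Mitogen's service framework.
    &lt;/p&gt;

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConnectService:
    """Hypothetical intermediary service: parallel, deduplicated connection setup."""

    def __init__(self, connect, workers=4):
        self.connect = connect
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.futures = {}             # target -> Future for its connection

    def get(self, target):
        with self.lock:
            fut = self.futures.get(target)
            if fut is None:
                # First request for this target: start setup on the pool.
                fut = self.pool.submit(self.connect, target)
                self.futures[target] = fut
        # Later requests for the same target wait on the same Future.
        return fut.result()
```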


    &lt;p&gt;&lt;strong&gt;Custom &lt;code&gt;module_utils&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    From the department of surprises, this one is a true classic. Ansible supports
    an undocumented (&lt;a target="_blank" href="http://k3.botanicus.net/tmp/modutils.html"&gt;upstream docs patch&lt;/a&gt;) but
    nonetheless commonly used mechanism for bundling third party modules and
    overriding built-in support modules as part of the ZIP file deployed to the
    target. It implements this by virtualizing a core Ansible package namespace:
    &lt;code&gt;ansible.module_utils&lt;/code&gt;, causing what Python finds there to vary on a
    per-task basis, and crucially, to have its implementation diverge entirely from
    the equivalent import in the Ansible controller process.
    &lt;/p&gt;

    &lt;p&gt;
    Suffice it to say I nearly lost my mind on discovering this "feature", not
    due to the functionality it provides, but the manner in which it opts to
    provide it. Rather than loading a core package namespace as a regular Python
    package using Mitogen's built-in mechanism, every Ansible module must undergo
    additional dependency scanning using its unique search path, and any
    dependencies found must correctly override existing loaded modules appearing in
    the target interpreter's namespace at runtime.
    &lt;/p&gt;
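    &lt;p&gt;
    The extra dependency scan can be pictured with Python's &lt;code&gt;ast&lt;/code&gt;
    module: walk a module's imports, resolve every
    &lt;code&gt;ansible.module_utils.*&lt;/code&gt; name against an ordered search path
    (earlier directories overriding later ones), and repeat for whatever those
    files import. This is a hypothetical simplification of the behaviour
    described above, handling plain &lt;code&gt;.py&lt;/code&gt; files only;
    &lt;code&gt;find_module_utils()&lt;/code&gt; is an invented name, not Ansible's
    implementation.
    &lt;/p&gt;

```python
import ast
import os


def find_module_utils(source, search_path, prefix='ansible.module_utils'):
    """Return {module_name: file_path} for every module_utils import a module
    makes, resolved against an ordered search path and followed transitively.

    Hypothetical sketch: packages, relative imports and caching are ignored.
    """
    found = {}
    todo = [source]
    while todo:
        tree = ast.parse(todo.pop())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                # "from ansible.module_utils import foo" may name a submodule.
                names = [node.module] + [node.module + '.' + a.name for a in node.names]
            else:
                continue
            for name in names:
                if not name.startswith(prefix + '.') or name in found:
                    continue
                rel = name[len(prefix) + 1:].replace('.', os.sep) + '.py'
                for directory in search_path:       # first match wins
                    path = os.path.join(directory, rel)
                    if os.path.exists(path):
                        found[name] = path
                        with open(path) as fp:
                            todo.append(fp.read())  # scan its imports too
                        break
    return found
```

    &lt;p&gt;
    Because the result depends on each task's search path, the same import
    statement can resolve to different files from one task to the next, which is
    exactly what clashes with a single reused interpreter.
    &lt;/p&gt;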

    &lt;p&gt;
    Given Mitogen's intended single-reusable-interpreter design, there is no way to
    support this without risking strange behaviour across tasks whose
    &lt;code&gt;ansible.module_utils&lt;/code&gt; search path varies. While it is easy
    to arrange for &lt;code&gt;ansible.module_utils.third_party_module&lt;/code&gt; to be
    installed, it is impossible to uninstall it while ensuring every reference to
    the previous implementation, including instances of every type defined by it,
    is extricated from the reusable interpreter post-execution, which is necessary
    if the next module to use the interpreter imports an entirely distinct
    implementation of &lt;code&gt;ansible.module_utils.third_party_module&lt;/code&gt;.
    &lt;/p&gt;

    &lt;p&gt;
    Today, the interpreter instead forks when an extended or overridden module is
    found, and a custom importer is used to implement the overrides. This
    introduces an unavoidable inefficiency when the feature is in use, but it is
    still far better than always forking, or running the risk of varying
    &lt;code&gt;module_utils&lt;/code&gt; search paths causing unfixable crashes.
    &lt;/p&gt;
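    &lt;p&gt;
    A custom importer of this kind can be built from the standard
    &lt;code&gt;importlib&lt;/code&gt; hooks: a finder placed at the front of
    &lt;code&gt;sys.meta_path&lt;/code&gt; serves chosen module names from override files
    and defers everything else to the normal import system. The sketch below is
    hypothetical (&lt;code&gt;OverrideFinder&lt;/code&gt; and
    &lt;code&gt;install_overrides()&lt;/code&gt; are invented names) and omits the fork,
    which is what keeps the overrides from leaking into later tasks.
    &lt;/p&gt;

```python
import importlib.abc
import importlib.util
import sys


class OverrideFinder(importlib.abc.MetaPathFinder):
    """Serve specific module names from override files, ahead of sys.path.

    Hypothetical sketch: intended to be installed only in a throwaway
    (e.g. freshly forked) interpreter, since it cannot be cleanly undone.
    """

    def __init__(self, overrides):
        self.overrides = overrides        # {'pkg.mod': '/path/to/mod.py'}

    def find_spec(self, fullname, path=None, target=None):
        filename = self.overrides.get(fullname)
        if filename is None:
            return None                   # defer to the normal import system
        return importlib.util.spec_from_file_location(fullname, filename)


def install_overrides(overrides):
    # Front of meta_path, so overrides win over anything already on sys.path.
    sys.meta_path.insert(0, OverrideFinder(overrides))
```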


    &lt;p&gt;&lt;strong&gt;Container Connections&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    To aid a common use-case for Connection Delegation, &lt;a target="_blank" href="http://mitogen.networkgenomics.com/ansible_detailed.html#connection-types"&gt;new connection types&lt;/a&gt; were added to support Linux containers and FreeBSD
    jails. It is now possible to run Ansible within a remote container reached via
    SSH, solving a &lt;a target="_blank" href="https://github.com/ansible/proposals/issues/25"&gt;common upstream feature
    request&lt;/a&gt;.
    &lt;/p&gt;

    &lt;p&gt;
    Presently the container must have Python installed, matching Ansible's
    existing behaviour, but it occurred to me that when the host machine has
    Python installed, &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/223"&gt;there is no reason why Python
    needs to exist within the container&lt;/a&gt;. This would be a powerful feature
    made easy by Mitogen's design, and in a &lt;a target="_blank" href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/sect-using_openscap_with_ansible"&gt;common
    use case&lt;/a&gt; would allow auditing/compliance playbooks to run against app
    containers that were otherwise never customized for use with
    Ansible.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Su Become Method Support&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Low-hanging fruit from the original crowdfunding plan. Now &lt;em&gt;su(1)&lt;/em&gt; may
    be used for privilege escalation as easily as &lt;em&gt;sudo(1)&lt;/em&gt;.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Sudo/Su Connection Types&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    To support testing and somewhat uncommon use cases where a large number of user
    accounts may be targeted for parallel deployment on a small number of
    machines, there now exist explicit &lt;code&gt;mitogen_sudo&lt;/code&gt; and
    &lt;code&gt;mitogen_su&lt;/code&gt; connection types that, in combination with Connection
    Delegation, allow a single SSH connection to exist to a remote machine while
    exposing user accounts as individual (and therefore parallelizable) targets in
    Ansible inventory.
    &lt;/p&gt;

    &lt;p&gt;
    This sits somewhere between "hack" and "gorgeous"; I really have no idea which.
    However, it does make it simple to exploit Ansible's parallelism in certain
    setups, such as traditional web hosting where each customer exists as a UNIX
    account on a small number of machines.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;a target="_blank" href="http://mitogen.networkgenomics.com/api.html#mitogen.core.Router.unidirectional"&gt;Unidirectional
    Routing&lt;/a&gt; exists and is always enabled for Ansible. This prohibits a
    communication style previously available to targets that, although ideally
    benign and potentially very powerful, fundamentally altered Ansible's
    security model and risked solution acceptance. It was possible for targets to
    send each other messages, and although permission checks occur on reception
    and so should render this harmless, it represented the ability for otherwise
    air-gapped networks to be temporarily bridged for the duration of a run.
    &lt;/p&gt;
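    &lt;p&gt;
    The effect of the restriction can be modelled as a simple routing rule over
    the tree of contexts: a message may travel down from an ancestor or up to an
    ancestor, but never sideways between targets. The sketch below is modelled on
    the description above and is hypothetical; &lt;code&gt;may_route()&lt;/code&gt; and the
    &lt;code&gt;parent_of&lt;/code&gt; mapping are invented, and the real behaviour is
    governed by &lt;code&gt;Router.unidirectional&lt;/code&gt; in Mitogen itself.
    &lt;/p&gt;

```python
def may_route(src_id, dst_id, parent_of, unidirectional=True):
    """Decide whether a router should forward a message (hypothetical rule).

    parent_of maps each context ID to its parent's ID (None for the master).
    With unidirectional routing, traffic must run along a single ancestry
    line: master to target, target to master, host to its container, etc.
    """
    if not unidirectional:
        return True

    def ancestors(ctx):
        # The context itself plus every parent up to the master.
        seen = set()
        while ctx is not None:
            seen.add(ctx)
            ctx = parent_of.get(ctx)
        return seen

    return src_id in ancestors(dst_id) or dst_id in ancestors(src_id)
```

    &lt;p&gt;
    With a master &lt;code&gt;0&lt;/code&gt;, targets &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;2&lt;/code&gt;,
    and a container &lt;code&gt;3&lt;/code&gt; reached via target &lt;code&gt;1&lt;/code&gt;, the
    master can still reach the container, but targets &lt;code&gt;1&lt;/code&gt; and
    &lt;code&gt;2&lt;/code&gt; can no longer message each other.
    &lt;/p&gt;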


    &lt;p&gt;&lt;strong&gt;Secrets Masking&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Mitogen supports new &lt;em&gt;Blob()&lt;/em&gt; and &lt;em&gt;Secret()&lt;/em&gt; string wrappers
    whose &lt;code&gt;repr()&lt;/code&gt; contains a substitute for the actual value. These are
    employed in the Ansible extension, ensuring passwords and bulk file transfer
    data are no longer logged when verbose output is enabled. The types are
    preserved on deserialization, ensuring log messages generated by targets
    receive identical treatment.
    &lt;/p&gt;
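    &lt;p&gt;
    The wrapper idea is easy to mirror with plain &lt;code&gt;str&lt;/code&gt; and
    &lt;code&gt;bytes&lt;/code&gt; subclasses. This is a standalone sketch that borrows the
    names from the text; it is not the &lt;code&gt;mitogen.core&lt;/code&gt; implementation,
    and the exact substitute strings shown are assumptions.
    &lt;/p&gt;

```python
class Secret(str):
    """A str whose repr() hides its value, e.g. for passwords (sketch)."""

    def __repr__(self):
        return '[secret]'


class Blob(bytes):
    """A bytes object whose repr() shows only its length, e.g. file data (sketch)."""

    def __repr__(self):
        return '[blob: %d bytes]' % len(self)
```

    &lt;p&gt;
    Since verbose logging of RPC arguments typically goes through
    &lt;code&gt;repr()&lt;/code&gt;, even a password nested in a dict is masked, while code
    that needs the value still sees an ordinary string. Note that
    &lt;code&gt;str()&lt;/code&gt; and &lt;code&gt;%s&lt;/code&gt; formatting still reveal it, so only
    &lt;code&gt;repr()&lt;/code&gt;-based output is protected.
    &lt;/p&gt;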


    &lt;p&gt;&lt;strong&gt;User/misc bug fixes&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/issues/179"&gt;Monster hang due
        to UNIX socket memory pressure on Linux&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/dc4433ac"&gt;Closed many gaps where hangs could occur due to disconnection&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/92a25655078a06facf3e6c34217bcd1de1272738"&gt;Child main thread does not gracefully handle CTRL+C&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/c0ced6d04a7b09a69ccde0e2280cf2f209656f80"&gt;Fixed huge forking FD leak&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/d9b7cad0dce57006ebbdb1c972225c64a6c8e2e2"&gt;Temp file handling is 'too accurate', needs to match Ansiballz&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/f9e1905ec6d81d00f1ab38cb2b7b4e3c0e729664"&gt;Needless disk writes for new-style modules&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/mitogen/commit/05a5f2b6e5841210f4a6e2540071e1205fc67b98"&gt;Rare "TimeoutError" appears&lt;/a&gt; (2 hours for a 4 byte fix!)
    &lt;/ul&gt;


    &lt;p&gt;&lt;strong&gt;Asynchronous Tasks (.. again, and again)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Ongoing work on the asynchronous task implementation has caused it to evolve
    once again, this time to make use of a new &lt;a target="_blank" href="http://mitogen.networkgenomics.com/#detached-subtrees"&gt;subtree
    detachment&lt;/a&gt; feature in the core library. The new approach is about 70% of
    what is needed for the final design, with one major hitch remaining.
    &lt;/p&gt;

    &lt;p&gt;
    Since an asynchronous task must outlive its parent, it must have a copy of
    every dependency needed by the module it will execute prior to disconnecting
    from the parent. This is exorbitantly fiddly work, interacting with many
    other features, not least custom &lt;code&gt;module_utils&lt;/code&gt;, and represents
    the last major obstacle in producing a functionally complete extension release.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Industrial grade multiplexing&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Mitogen now supports swapping
    &lt;em&gt;&lt;a target="_blank" href="http://man7.org/linux/man-pages/man2/select.2.html"&gt;select(2)&lt;/a&gt;&lt;/em&gt; for
    &lt;em&gt;&lt;a target="_blank" href="http://man7.org/linux/man-pages/man7/epoll.7.html"&gt;epoll(7)&lt;/a&gt;&lt;/em&gt; or
    &lt;em&gt;&lt;a target="_blank" href="https://www.freebsd.org/cgi/man.cgi?query=kqueue&amp;amp;sektion=2"&gt;kqueue(2)&lt;/a&gt;&lt;/em&gt;
    depending on the host operating system, blasting through the maximum file
    descriptor limit of &lt;em&gt;select(2)&lt;/em&gt;, and ensuring this is no longer a
    hindrance for many-target runs. Children initially use the &lt;em&gt;select(2)&lt;/em&gt;
    multiplexer (tiny and guaranteed available) until they become parents, when the
    implementation is transparently swapped for the real deal.
    &lt;/p&gt;
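    &lt;p&gt;
    The start-small-then-upgrade behaviour can be sketched with Python's standard
    &lt;code&gt;selectors&lt;/code&gt; module, whose &lt;code&gt;DefaultSelector&lt;/code&gt; is
    epoll-based on Linux and kqueue-based on the BSDs and macOS. The
    &lt;code&gt;Poller&lt;/code&gt; class and its &lt;code&gt;become_parent()&lt;/code&gt; method are
    hypothetical; Mitogen has its own multiplexer classes.
    &lt;/p&gt;

```python
import selectors


class Poller:
    """Start with select(2); switch to the platform's best poller on demand.

    Hypothetical sketch built on the standard selectors module.
    """

    def __init__(self):
        self.sel = selectors.SelectSelector()   # small and always available

    def register(self, fileobj, events, data=None):
        self.sel.register(fileobj, events, data)

    def become_parent(self):
        """Swap implementations, carrying every registration across."""
        new = selectors.DefaultSelector()
        if type(new) is type(self.sel):
            new.close()                  # no better poller on this platform
            return
        for key in list(self.sel.get_map().values()):
            new.register(key.fileobj, key.events, key.data)
        self.sel.close()
        self.sel = new

    def poll(self, timeout=None):
        # Return (data, event mask) pairs for every ready descriptor.
        return [(key.data, mask) for key, mask in self.sel.select(timeout)]
```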

    &lt;p&gt;
    In future some interface tweaks are desirable to make full use of the new
    multiplexers: at least &lt;em&gt;epoll(7)&lt;/em&gt; supports options that significantly
    reduce the system calls necessary to configure it. Although I have not measured
    a performance regression due to these calls, their presence is bothersome.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Many-Target Performance&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/250"&gt;Some
    expected growing pains&lt;/a&gt; appeared when real multiplexing was implemented. For
    testing I adopted a network of VMs running &lt;a href="https://docs.debops.org/"&gt;DebOps&lt;/a&gt; &lt;code&gt;common.yml&lt;/code&gt;, with a
    quota for up to 500 targets, but so far, it is not possible to approach that
    without drowning in the kinks that start to appear. While some of these almost
    certainly lie on the Mitogen side, when profiling with only 40 targets enabled,
    inefficiencies in Mitogen are buried in the report by extreme inefficiencies
    present in Ansible itself.
    &lt;/p&gt;

    &lt;p&gt;
    Among the problems:
    &lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/ansible/commit/05e3f877"&gt;25% runtime
        wasted calling glob()&lt;/a&gt; (task setup stress test, not a real
        playbook)&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/ansible/commit/d6052c41"&gt;10% runtime
        wasted enumerating template filters&lt;/a&gt; (stress test)&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/ansible/commit/7e1f88c1"&gt;50% runtime
        wasted constructing task variables pre-fork&lt;/a&gt; (real run)&lt;/li&gt;
    &lt;li&gt;&lt;a target="_blank" href="https://github.com/dw/ansible/commit/8a16a735"&gt;&gt;50% runtime
        wasted recompiling templates&lt;/a&gt; (stress test)&lt;/li&gt;
    &lt;/ul&gt;
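    &lt;p&gt;
    The last item, recompiling the same templates over and over, is the classic
    case for memoization. As a hedged illustration only: Ansible's templates are
    Jinja2, but this sketch caches Python's built-in &lt;code&gt;compile()&lt;/code&gt; via
    &lt;code&gt;functools.lru_cache&lt;/code&gt; to stay dependency-free.
    &lt;code&gt;compile_expr()&lt;/code&gt; and &lt;code&gt;render()&lt;/code&gt; are hypothetical, and
    evaluating expressions like this is only appropriate for trusted input.
    &lt;/p&gt;

```python
import functools


@functools.lru_cache(maxsize=1024)
def compile_expr(source):
    """Compile a template expression once; repeat calls hit the cache.

    Hypothetical stand-in for a template engine's compile step.
    """
    return compile(source, 'template', 'eval')


def render(source, variables):
    # Only evaluation is repeated per host; compilation happens once per source.
    return eval(compile_expr(source), {'__builtins__': {}}, dict(variables))
```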

    &lt;p&gt;
    And with that we reach a nexus: we have almost exhausted what can be
    accomplished working from the bottom up. Profiling on a micro scale is no
    longer sufficient to meet project goals, while fixing problems identified
    through profiling on a macro scale exceeds the project scope. Therefore,
    (lightning bolts, wild cackles), a new plan emerges..
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Branching for a beta&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    With the exception of async tasks I consider the master branch to be in
    excellent health - for smaller target counts. For larger runs, wider-reaching
    work is necessary, but it does not make sense to disrupt the existing design
    for it. Therefore &lt;code&gt;master&lt;/code&gt; will be branched, with the new branch
    kept open for fixes, not least the final pieces of async, while continuing work
    in parallel on a new increment.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Extension v2&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Vanilla Ansible forks each time it executes a task, with the corresponding
    action plug-in gaining control of the main thread until completion, upon which
    all state aside from the task result is lost. When running under the extension,
    a connection multiplexer process is forked once at startup, and a separate
    broker thread exists in each forked task subprocess that connects back to the
    connection multiplexer process over a UNIX socket - necessary in the current
    design to have a persistent location to manage connections.
    &lt;/p&gt;

    &lt;p&gt;
    The new design comes in the form of a &lt;a target="_blank" href="https://github.com/dw/mitogen/commit/3cd1d0b9"&gt;complete reworking&lt;/a&gt; of
    the &lt;a target="_blank" href="http://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html"&gt;Ansible
    linear strategy&lt;/a&gt;. Today's extension wraps Ansible's strategies while
    preserving their process and execution model. To implement the enhancements
    above sensibly, additional persistence is required and it becomes necessary to
    tackle a strategy implementation head-on.
    &lt;/p&gt;

    &lt;p&gt;
    The old desire for per-CPU connection multiplexers is incorporated, but moves
    those multiplexers back into Ansible, much like the pre-crowdfund extension.
    The top-level controller process gains a Mitogen broker thread with per-CPU
    forked children acting as connection multiplexers, and hosting service threads
    on which action plug-ins can sleep. Unlike vanilla Ansible, these processes
    exist for the duration of the run rather than per-task.
    &lt;/p&gt;

    &lt;p&gt;
    From the vantage point of only &lt;em&gt;$ncpus&lt;/em&gt; processes, it is easy to fix
    template precompilation, plug-in path caching, connection caching,
    target&amp;lt;-&amp;gt;worker affinity, and ensuring task variable generation is
    parallelized. Some sizeable obstacles exist, not least:
    &lt;/p&gt;

    &lt;ul&gt;
    &lt;li&gt;Liberal shared data structure mutation in the task executor that
        must be fixed to handle threading, mostly contained to
        &lt;code&gt;PlayContext&lt;/code&gt;.&lt;/li&gt;
    &lt;li&gt;Preserving the existing callback plug-in model. Callbacks must always fire
        in the top-level process.&lt;/li&gt;
    &lt;li&gt;Synchronization or serialization overhead, pick one. Either the strategy
        logic is duplicated in each child (requiring coordination with the
        top-level process), or it runs once in the parent, and configuration must
        be serialized for every task.&lt;/li&gt;
    &lt;/ul&gt;
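    &lt;p&gt;
    One ingredient of the new design, target-to-worker affinity, is simple to
    sketch: hash each inventory name with a stable function so the same target
    always lands on the same per-CPU worker, letting its connection and cached
    state live there for the whole run. This is a hypothetical illustration;
    &lt;code&gt;worker_for()&lt;/code&gt; and &lt;code&gt;partition()&lt;/code&gt; are invented, and
    CRC32 is used only because, unlike Python's &lt;code&gt;hash()&lt;/code&gt;, it is not
    randomized per process.
    &lt;/p&gt;

```python
import os
import zlib


def worker_for(target, nworkers=None):
    """Pin a target to one of N per-CPU worker processes (hypothetical).

    CRC32 gives the same answer in every process, so the parent and all
    children agree on ownership without any coordination.
    """
    n = nworkers or os.cpu_count() or 1
    return zlib.crc32(target.encode('utf-8')) % n


def partition(targets, nworkers):
    """Group inventory hosts by the worker that owns them."""
    groups = {i: [] for i in range(nworkers)}
    for t in targets:
        groups[worker_for(t, nworkers)].append(t)
    return groups
```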


    &lt;p&gt;&lt;strong&gt;Can't this be done upstream?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    It should, but &lt;a target="_blank" href="https://github.com/ansible/ansible/pulls/?q=author:dw"&gt;I've
    experimented&lt;/a&gt; and there simply isn't time. If &amp;gt;1 week is reasonable to &lt;a href="https://github.com/ansible/ansible/pull/40059"&gt;add missing
    documentation&lt;/a&gt;, there is no hope real patches will land before full-time
    work must conclude. For upstreaming to happen the onus lies with the &lt;a target="_blank" href="https://gist.github.com/dw/28d91a45e888529c3c12a726ad30b6d5"&gt;20+ strong
    permanent team&lt;/a&gt;; it's simply not possible to commit unbounded time to land
    even trivial changes, a far cry from occasional patches to a privately
    controlled repository.
    &lt;/p&gt;

    &lt;p&gt;
    At least 16k words have been spent since conversations started around September
    2017, and while they bore some fruit over time, few actionable outcomes have
    resulted, and the detectable level of team-originated engagement regarding the
    work has been minimal. There is no expectation of fireworks; however, it may be
    helpful to realize after 3 months no evidence exists of any member testing the
    code and experiencing success or failure, let alone a report of such.
    &lt;/p&gt;

    &lt;p&gt;
    It's sufficient to say after so long I find this increasingly troublesome, and
    while I cannot hope to understand internal priorities, as an outside
    contributor funded by end users, soliciting engagement on a well-documented
    enhancement that in some scenarios nets an order of magnitude performance
    improvement to a commercial product, some rather basic questions come to mind.
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;Code Quality&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    There is a final uneasy aspect to upstreaming, and it is that of being left
    with the task of cleaning up, with no guarantee the mess won't simply return.
    Some of this code is in an &lt;a target="_blank" href="https://github.com/ansible/ansible/blob/062f0444/lib/ansible/plugins/strategy/__init__.py#L353-L658"&gt;abject&lt;/a&gt; (253 LOC, 37 locals)
    &lt;a target="_blank" href="https://github.com/ansible/ansible/blob/062f0444/lib/ansible/plugins/connection/ssh.py#L608-L886"&gt;state&lt;/a&gt; (279 LOC, 24 locals) of
    &lt;a target="_blank" href="https://github.com/ansible/ansible/blob/062f0444/lib/ansible/plugins/strategy/linear.py#L200-L452"&gt;sin&lt;/a&gt; (306 LOC, 38 locals), for
    2018 and in a product less than 72 months old, that has been funded almost
    since inception. While I have begun refactoring the strategy plug-in within the
    confines of the Mitogen repository, responsibility for benefitting from that
    work in mainline rests with others.
    &lt;/p&gt;

    &lt;p&gt;
    Until next time!
    &lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Wed, 23 May 2018 00:32:49 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-05-23:/2018-05-23-mitogen-for-ansible-status-23-may.html</guid></item><item><title>Crowfunding Mitogen: day 46</title><link>https://sweetness.hmmz.org/2018-04-20-crowfunding-mitogen-day-46.html</link><description>
    &lt;p&gt;
    This is the second update on the status of developing the &lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen extension for
    Ansible&lt;/a&gt;, only 2 weeks late!&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Too long, didn’t read&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Gearing up to remove the scary warning labels and release a beta! Running a
    little behind, but not terribly. Every major risk is solved except file
    transfer, which should be addressed this week.&lt;/p&gt;

    &lt;p&gt;23 days, 257 commits, 186 files changed, 7292 insertions(+), 1503 deletions(-)&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Just tuning in?&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;2017-09-15: &lt;a href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Mitogen, an infrastructure code baseline that sucks less&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-06: &lt;a href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Quadrupling Ansible performance with Mitogen&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;2018-03-28: &lt;a href="Rhttps://sweetness.hmmz.org/2018-03-28-crowdfunding-mitogen-day-23.html"&gt;Crowdfunding Mitogen: day 23&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Started: Python 3 Support&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;A very rough &lt;a href="https://github.com/dw/mitogen/tree/python3"&gt;branch&lt;/a&gt; exists
    for this, and I’m landing volleys of fixes when I have downtime between bigger
    pieces of work. Ideally this should have been ready for the end of April, but
    it may take a few weeks more.&lt;/p&gt;

    &lt;p&gt;I originally hoped to have a clear board before starting this; instead it is
    being interwoven as busywork when I need a break from whatever else I’m working
    on.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: multiplexer throughput&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;The situation has improved massively.
    &lt;a href="https://github.com/dw/mitogen/issues/148"&gt;Hybrid TTY/socketpair mode&lt;/a&gt; is a
    thing and as promised it significantly helps, but just not quite as much as I
    hoped.&lt;/p&gt;

    &lt;p&gt;Today on a 2011-era Macbook Pro Mitogen can pump an SSH client/daemon at around
    13MB/sec, whereas &lt;code&gt;scp&lt;/code&gt; in the same configuration hits closer to 19MB/sec. In
    the case of SSH, moving beyond this is not possible without a patched SSH
    installation, since SSH hard-wires its buffer sizes around 16KB, with no
    ability to override them at runtime.&lt;/p&gt;

    &lt;p&gt;With multiple SSH connections that 13MB/sec should scale up cleanly, since every
    connection can be served in a single IO loop iteration.&lt;/p&gt;

    &lt;p&gt;A bunch of related performance fixes were landed, including removal of yet
    another special case for handling deferred function calls, only taking locks
    when necessary, and reducing the frequency of the stream implementations
    modifying the status of their descriptors' readability/writeability.&lt;/p&gt;

    &lt;p&gt;As we’re in the ballpark of existing tools, I’m no longer considering this
    as much of a priority as before. There is definitely more low-hanging fruit,
    but out-of-the-box behaviour should no longer raise eyebrows.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: task isolation&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;As before, by default each script is compiled once, but it is now
    re-executed in a spotless namespace prior to each invocation, working around
    any globals/class variable sharing issues that may be present. The cost of this
    is negligible, on the order of 100 usec.&lt;/p&gt;
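    &lt;p&gt;The compile-once, execute-fresh approach can be sketched with Python's
    built-ins: &lt;code&gt;compile()&lt;/code&gt; runs a single time, and each invocation
    executes the resulting code object in a brand-new module namespace, so class
    variables and globals from one task cannot leak into the next. A hypothetical
    sketch; &lt;code&gt;ScriptRunner&lt;/code&gt; is not Mitogen's class.&lt;/p&gt;

```python
import types


class ScriptRunner:
    """Compile a module's source once, then execute it in a fresh namespace
    for every invocation so no globals leak between tasks (hypothetical)."""

    def __init__(self, source, filename='module.py'):
        self.code = compile(source, filename, 'exec')   # cost paid once

    def run(self, **inputs):
        module = types.ModuleType('__main__')            # spotless namespace
        module.__dict__.update(inputs)
        exec(self.code, module.__dict__)
        return module.__dict__
```

    &lt;p&gt;Running a script that appends to a class-level list twice yields a list of
    length one both times, because the class itself is recreated on every run.&lt;/p&gt;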

    &lt;p&gt;When this is insufficient, a &lt;code&gt;mitogen_task_isolation=fork&lt;/code&gt; per-task variable
    exists to allow explicitly forcing a particular module to run in a new process.
    Forking every task causes something on the order of a 33% slowdown, which is
    much better than expected, but still not good enough to make forking the
    default.&lt;/p&gt;

    &lt;p&gt;Aside from building up a blacklist of modules that should always be forked,
    task isolation is pretty much all done, with just a few performance
    regressions remaining to fix in the forking case.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: exotic module support&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Every style of Ansible module is supported aside from the prehistorical
    “module replacer” type. That means today all of these work and are covered by automated tests:&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;Built-in new-style Python scripts&lt;/li&gt;
    &lt;li&gt;User-supplied new-style Python scripts&lt;/li&gt;
    &lt;li&gt;Ancient key=value style input scripts&lt;/li&gt;
    &lt;li&gt;Statically linked Go programs&lt;/li&gt;
    &lt;li&gt;Perl scripts&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;Python module support was updated to remove the monkey-patching in use before.
    Instead, &lt;code&gt;sys.stdin&lt;/code&gt;, &lt;code&gt;sys.stdout&lt;/code&gt; and &lt;code&gt;sys.stderr&lt;/code&gt; are redirected to
    StringIO objects, allowing a much larger variety of custom user scripts to be
    run in-process even when they don’t use the new-style Ansible module APIs.&lt;/p&gt;
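    &lt;p&gt;A minimal sketch of that redirection, using only the standard library.
    &lt;code&gt;run_captured()&lt;/code&gt; is a hypothetical helper, not the extension's code;
    because it swaps the process-wide &lt;code&gt;sys&lt;/code&gt; attributes, it is only safe
    while a single task runs in the interpreter at a time.&lt;/p&gt;

```python
import io
import sys


def run_captured(func, stdin_text=''):
    """Run func() with sys.stdin/stdout/stderr swapped for StringIO objects.

    Returns (result, stdout_text, stderr_text). Hypothetical sketch.
    """
    saved = sys.stdin, sys.stdout, sys.stderr
    sys.stdin = io.StringIO(stdin_text)
    sys.stdout, sys.stderr = io.StringIO(), io.StringIO()
    try:
        result = func()
        # Read the buffers before the finally block restores the originals.
        return result, sys.stdout.getvalue(), sys.stderr.getvalue()
    finally:
        sys.stdin, sys.stdout, sys.stderr = saved
```

    &lt;p&gt;A script that reads JSON from &lt;code&gt;sys.stdin&lt;/code&gt; and
    &lt;code&gt;print()&lt;/code&gt;s its result works unmodified, since
    &lt;code&gt;print()&lt;/code&gt; looks up &lt;code&gt;sys.stdout&lt;/code&gt; at call time.&lt;/p&gt;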

    &lt;p&gt;&lt;strong&gt;Done: free strategy support&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    The "free" strategy can now be used by specifying &lt;code&gt;ANSIBLE_STRATEGY=mitogen_free&lt;/code&gt;. The &lt;code&gt;mitogen&lt;/code&gt; strategy is now an alias of &lt;code&gt;mitogen_linear&lt;/code&gt;.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: temporary file handling&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;This should be identical to Ansible’s handling in all cases.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: interpreter recycling&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;An upper bound exists to prevent a remote machine from being spammed with
    thousands of Python interpreters, which was previously possible when e.g. using
    a &lt;code&gt;with_items&lt;/code&gt; loop that templatized &lt;code&gt;become_user&lt;/code&gt;.&lt;/p&gt;

    &lt;p&gt;Once 20 interpreters exist, the extension shuts down the most recently created
    interpreter before starting a new one. This strategy isn’t perfect, but it
    should suffice to avoid raised eyebrows in most common cases for the time
    being.&lt;/p&gt;
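    &lt;p&gt;The cap can be sketched as a small pool keyed by whatever distinguishes
    interpreters (for example the templated &lt;code&gt;become_user&lt;/code&gt;). This is a
    hypothetical illustration: &lt;code&gt;InterpreterPool&lt;/code&gt; and its
    &lt;code&gt;start&lt;/code&gt;/&lt;code&gt;stop&lt;/code&gt; callbacks are invented, and, following
    the text, the most recently created interpreter is the one shut down once the
    limit is reached.&lt;/p&gt;

```python
class InterpreterPool:
    """Cap the number of interpreters per machine (hypothetical sketch).

    'start(key)' and 'stop(handle)' stand in for whatever actually launches
    or shuts down an interpreter.
    """

    def __init__(self, start, stop, limit=20):
        self.start, self.stop, self.limit = start, stop, limit
        self.live = []        # (key, handle) pairs in creation order

    def get(self, key):
        for k, handle in self.live:
            if k == key:
                return handle             # reuse an existing interpreter
        if len(self.live) >= self.limit:
            _, victim = self.live.pop()   # most recently created entry
            self.stop(victim)
        handle = self.start(key)
        self.live.append((key, handle))
        return handle
```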

    &lt;p&gt;&lt;strong&gt;Done: precise standard IO emulation&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Ansible’s complex semantics for when it does/does not merge &lt;code&gt;stdout&lt;/code&gt; and
    &lt;code&gt;stderr&lt;/code&gt; during module runs are respected in every case, including
    emulation of extraneous &lt;code&gt;\r&lt;/code&gt; characters. This may seem like a tiny and
    pointless nit, but it is almost certainly the difference between a tested
    real-world playbook succeeding under the extension and breaking horribly.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: async tasks&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;We’re on the third iteration of asynchronous tasks, and I really don’t want to
    waste any more time on it. The new implementation works a lot more like
    Ansible’s existing implementation, for as much as that implementation can be
    said to “work” at all.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Done: better error messages&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Connection errors no longer crash with an inscrutable stack trace, but trigger
    Ansible’s internal error handling by raising the right exception types.&lt;/p&gt;

    &lt;p&gt;Mitogen’s logging integration with the Ansible display framework is much
    improved, and errors and warnings correctly show up on the console in red
    without having to specify &lt;code&gt;-vvv&lt;/code&gt;.&lt;/p&gt;

    &lt;p&gt;Still more work to do on this when internal RPCs fail, but that’s less likely
    to be triggered than a connection error.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;New debugging mode&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;An “emergency” debugging mode has been added, in the form of
    &lt;code&gt;MITOGEN_DUMP_THREAD_STACKS=1&lt;/code&gt;. When this is set, every interpreter will
    dump the stack of every thread into the logging framework every 5 seconds,
    allowing hangs to be more easily diagnosed directly from the controller
    machine’s logs.&lt;/p&gt;
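    &lt;p&gt;A minimal Python 3 sketch of the same idea, using &lt;code&gt;sys._current_frames()&lt;/code&gt; and the &lt;code&gt;traceback&lt;/code&gt; module. The function names and the daemon-thread design are assumptions for illustration; how Mitogen actually wires up &lt;code&gt;MITOGEN_DUMP_THREAD_STACKS&lt;/code&gt; may differ.&lt;/p&gt;

```python
import logging
import sys
import threading
import traceback

LOG = logging.getLogger("stack_dump")


def format_all_stacks():
    """Return one text blob containing every live thread's stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        header = "Thread %s (%s):" % (names.get(ident, "?"), ident)
        parts.append(header + "\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(parts)


def start_stack_dumper(interval=5.0):
    """Log every thread's stack every `interval` seconds from a daemon thread."""
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            LOG.warning("%s", format_all_stacks())

    threading.Thread(target=loop, name="stack-dumper", daemon=True).start()
    return stop  # call stop.set() to end the dumping
```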

    &lt;p&gt;While adding this, it struck me that there is a really sweet piece of
    functionality missing here that would be easy to add – an interactive
    debugger. This might turn up in the form of an in-process web server for
    viewing the full context hierarchy and running code snippets against remotely
    executing stacks, much like Werkzeug’s interactive debugger.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Performance regressions&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Performance has simply not been my focus recently, and a lot of the new
    functionality has introduced &lt;code&gt;import&lt;/code&gt; statements that impact code running in
    the target, so performance has likely slipped a little from the originally
    posted benchmarks, most likely during run startup on a high-latency
    network.&lt;/p&gt;

    &lt;p&gt;I will be back to investigate these problems (and fix those for which no
    investigation is required – the module loader!) once all remaining
    functionality is stable.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;File Transfer&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;This seemingly simple function has required more thought than any other
    issue I’ve encountered so far. The initial problem relates to flow
    control, and the absence of any natural mechanism to block a producer (the file
    server) while intermediary pipe buffers (i.e. the SSH connection) are full.&lt;/p&gt;

    &lt;p&gt;Even when flow control exists, an additional problem arises since with Mitogen
    there is no guarantee that one SSH connection = one target machine, especially
    once connection delegation is implemented. Some kind of bandwidth sharing
    mechanism must also exist, without poorly reimplementing the entirety of TCP/IP
    in a Python script.&lt;/p&gt;

    &lt;p&gt;For the initial release I have settled on a basic design that should ensure the
    available bandwidth is fully utilized, with each upload target having its file
    data served on a first-come-first-served basis.&lt;/p&gt;

    &lt;p&gt;When any file transfer is active, one of the service threads in the associated
    connection multiplexer process (the same ones used for setting up connections)
    will be dedicated to a long-running loop that monitors every connected stream’s
    transmit queue size, enqueuing additional file chunks as the queue drains.&lt;/p&gt;

    &lt;p&gt;Files are served one-at-a-time to make it more likely that if a run is
    interrupted, rather than having every partial file transfer thrown away, at
    least a few targets will have received the full file, allowing that copy to be
    skipped when the play is restarted.&lt;/p&gt;
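    &lt;p&gt;The pump loop described above can be sketched in Python as follows. Everything here is hypothetical: &lt;code&gt;Stream&lt;/code&gt; stands in for a connected stream, &lt;code&gt;drain()&lt;/code&gt; simulates the network emptying its transmit queue, and the 128 KiB chunk size and 512 KiB high-water mark are assumed values. Each stream’s requests are served in arrival order, one file at a time.&lt;/p&gt;

```python
from collections import deque

CHUNK_SIZE = 128 * 1024       # assumed chunk size
HIGH_WATER = 4 * CHUNK_SIZE   # assumed per-stream buffering target


class Stream:
    """Stand-in for one connected stream and its transmit queue."""

    def __init__(self, name):
        self.name = name
        self.tx_queue = deque()   # (path, chunk) pairs awaiting transmission
        self.received = {}        # path -> bytes delivered so far (demo only)

    def queued_bytes(self):
        return sum(len(chunk) for _, chunk in self.tx_queue)

    def drain(self, nbytes):
        """Simulate the network writing roughly nbytes from the queue."""
        while self.tx_queue and nbytes > 0:
            path, chunk = self.tx_queue.popleft()
            self.received[path] = self.received.get(path, b"") + chunk
            nbytes -= len(chunk)


class FilePump:
    def __init__(self):
        self.pending = {}   # stream -> deque of [path, data, offset]

    def request(self, stream, path, data):
        # Requests for each stream are served in arrival order.
        self.pending.setdefault(stream, deque()).append([path, data, 0])

    def top_up(self):
        """One pass of the loop: refill every queue that has drained."""
        for stream, jobs in self.pending.items():
            while jobs and HIGH_WATER > stream.queued_bytes():
                job = jobs[0]   # only the oldest file: one file at a time
                path, data, offset = job
                chunk = data[offset:offset + CHUNK_SIZE]
                stream.tx_queue.append((path, chunk))
                job[2] = offset + len(chunk)
                if job[2] >= len(data):
                    jobs.popleft()   # file fully queued; move to the next one


web1 = Stream("web1")
pump = FilePump()
pump.request(web1, "/etc/a.conf", b"x" * (600 * 1024))
pump.request(web1, "/etc/b.conf", b"y" * 1000)
while any(pump.pending.values()) or web1.tx_queue:
    pump.top_up()
    web1.drain(CHUNK_SIZE)
```

    &lt;p&gt;Because the second file is only queued once the first is complete, an interrupted run leaves whole files behind rather than many partial ones.&lt;/p&gt;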

    &lt;p&gt;The initial implementation will almost certainly be replaced eventually, but
    this basic design should be sufficient for what is needed today, and should
    continue to suffice when connection delegation is implemented.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Testing / CI&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;The smattering of unit and integration tests that exist are running &lt;em&gt;and
    passing&lt;/em&gt; under &lt;a href="https://travis-ci.org/dw/mitogen/"&gt;Travis CI&lt;/a&gt;. In preparation
    for a release, &lt;code&gt;master&lt;/code&gt; is considered always-healthy and my development
    has moved to a new &lt;code&gt;dmw&lt;/code&gt; branch.&lt;/p&gt;

    &lt;p&gt;I’m taking a “mostly top down” approach to testing, written in the form of
    Ansible playbooks, as this gives the widest degree of coverage, ensuring that
    high level Ansible behaviour is matched with/without the extension installed.
    For each new test written, the result must pass under regular Ansible in
    addition to Ansible with the extension.&lt;/p&gt;

    &lt;p&gt;“Bottom up” type tests are written as needs arise, usually when Ansible’s user
    interface doesn’t sufficiently expose whatever is being tested.&lt;/p&gt;

    &lt;p&gt;Also visible in Travis is a &lt;code&gt;debops_common&lt;/code&gt; target: this is running all 255
    tasks from &lt;a href="https://docs.debops.org/en/master/"&gt;DebOps&lt;/a&gt; &lt;code&gt;common.yml&lt;/code&gt; against
    a Docker instance. It’s the first in what should be 4-5 similar DebOps jobs,
    deploying real software with the final extension.&lt;/p&gt;

    &lt;p&gt;I have begun exploring integrating the extension with Ansible’s own integration
    tests, but it looks likely this is too large a job for Travis. Work here is
    ongoing.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;A few items have been chipped off &lt;a href="https://github.com/dw/mitogen/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity"&gt;the list&lt;/a&gt;.&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;Message source verification was audited everywhere, and is covered by
    automated tests.&lt;/li&gt;
    &lt;li&gt;All internal message handlers specify a policy indicating what kind of
    participants are allowed to deliver messages to them.&lt;/li&gt;
    &lt;li&gt;As above, but for &lt;code&gt;mitogen.service&lt;/code&gt;. A service cannot be exposed without
    attaching an access policy to it.&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;Notably absent is unidirectional routing mode. I will make time to finish that
    shortly.&lt;/p&gt;
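    &lt;p&gt;A toy illustration of handlers that cannot be registered without an access policy. The &lt;code&gt;Router&lt;/code&gt;, &lt;code&gt;Message&lt;/code&gt; and &lt;code&gt;from_parent&lt;/code&gt; names are hypothetical and are not Mitogen’s actual API; the point is only that every delivery is checked against the policy attached to its handler.&lt;/p&gt;

```python
class Message:
    def __init__(self, src_id, handle, data):
        self.src_id = src_id     # context that sent the message
        self.handle = handle     # which handler it is addressed to
        self.data = data


class Router:
    """Toy router: a handler cannot be registered without a policy."""

    def __init__(self, parent_ids):
        self.parent_ids = set(parent_ids)
        self.handlers = {}
        self.refused = []

    def add_handler(self, handle, fn, policy):
        if policy is None:
            raise ValueError("handler %r needs an access policy" % (handle,))
        self.handlers[handle] = (fn, policy)

    def deliver(self, msg):
        fn, policy = self.handlers[msg.handle]
        if not policy(self, msg):
            self.refused.append(msg)   # drop the message; never run fn
            return None
        return fn(msg)


def from_parent(router, msg):
    """Example policy: only accept messages from upstream contexts."""
    return msg.src_id in router.parent_ids


router = Router(parent_ids=[0])
router.add_handler(100, lambda msg: msg.data.upper(), policy=from_parent)
```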

    &lt;p&gt;&lt;strong&gt;User bug fixes&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;Poor refactoring broke select EINTR handling&lt;/li&gt;
    &lt;li&gt;SSH password was being supplied as the sudo password&lt;/li&gt;
    &lt;li&gt;Acquiring a controlling TTY was fixed on FreeBSD&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Super busy, slightly behind! Until next time..&lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Fri, 20 Apr 2018 18:41:31 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-04-20:/2018-04-20-crowfunding-mitogen-day-46.html</guid></item><item><title>Crowdfunding Mitogen: day 23</title><link>https://sweetness.hmmz.org/2018-03-28-crowdfunding-mitogen-day-23.html</link><description>
    &lt;p&gt;This is the first in what I hope will be at least a bi-weekly series to keep
    backers up to date on the current state of delivering the &lt;a href="https://networkgenomics.com/ansible/"&gt;Mitogen extension for
    Ansible&lt;/a&gt;. I’m trying to use every second I have wisely until every major
    time risk is taken care of, so please forgive the knowledge-dump style of
    this post :)&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Haven't been following?&lt;/strong&gt;&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;&lt;a href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;Introduction to Mitogen&lt;/a&gt; (September 2017)&lt;/li&gt;
    &lt;li&gt;&lt;a href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Introduction to the Mitogen extension for Ansible&lt;/a&gt; (March 2018)
    &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Too long, didn’t read&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Well ahead of time. Some exciting new stuff popped up, none of it intractably
    scary.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Funding Update&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;I have some fabulous news on funding: in addition to what was already public
    on Kickstarter, significant additional funding has become available, enough
    that I should be able to dedicate full time to the project for at least
    another 10 weeks!&lt;/p&gt;

    &lt;p&gt;Naturally this has some fantastic implications, including making it
    significantly more likely that I’ll be able to implement &lt;strong&gt;Topology-aware File
    Synchronization&lt;/strong&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dw/mitogen/issues/16"&gt;Python 3 Support&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;I could not commit to this, worrying that Python 3 would become a huge and
    destabilizing time sink, ruining any chance of delivering more immediately
    useful functionality.&lt;/p&gt;

    &lt;p&gt;The missing piece needed to support everything from Python 2.4 through 3.x
    (compatible exception syntax) has been found. It came via an extraordinarily fruitful IRC chat with the
    Ansible guys, and was originally implemented in Ansible itself by &lt;a href="https://github.com/mgedmin"&gt;Marius
    Gedminas&lt;/a&gt;. With this last piece of the puzzle, the
    only bugs left to worry about are renamed imports and the usual bytes/str
    battles. Both are trivial to address with strong tests, something already due
    for the coming weeks. It now seems almost certain that Python 3 will be
    completed as part of this work, although I am still holding off on a 100%
    commitment until more pressing concerns are addressed.&lt;/p&gt;
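    &lt;p&gt;The post does not spell out the syntax trick, but a widely used idiom for catching an exception in a way that parses on Python 2.4 and 3.x alike is to fetch it from &lt;code&gt;sys.exc_info()&lt;/code&gt;; presumably the fix is along these lines. &lt;code&gt;parse_port()&lt;/code&gt; is a made-up example function.&lt;/p&gt;

```python
import sys


def parse_port(text):
    try:
        return int(text)
    except ValueError:
        # "except ValueError, e" is a syntax error on Python 3, and
        # "except ValueError as e" only parses on 2.6+, so the exception
        # object is fetched from sys.exc_info() instead.
        e = sys.exc_info()[1]
        return "bad port: %s" % (e,)
```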

    &lt;p&gt;&lt;strong&gt;New Risk: multiplexer throughput&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Some truly insane performance bugs have been found and fixed already,
    particularly around the &lt;a href="https://github.com/dw/mitogen/issues/139"&gt;stress caused by delivering huge single
    messages&lt;/a&gt;, however during that work
    a new issue was found: IO multiplexer throughput truly sucks for many small
    messages.&lt;/p&gt;

    &lt;p&gt;This doesn’t impact things much except in one area: file transfer. While I
    haven’t implemented a final solution for file transfer yet, as part of that I
    will need to address what (for now) seems a hard single-thread performance
    limit: Mitogen’s current IO loop cannot push more than ~300MiB/sec in
    128KiB-sized chunks, or to put it another way, best case 3MiB/sec given 100
    targets.&lt;/p&gt;
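    &lt;p&gt;The arithmetic behind those figures, assuming the ~300 MiB/sec loop ceiling is shared evenly between targets:&lt;/p&gt;

```python
MIB = 1024 * 1024
KIB = 1024

loop_ceiling = 300 * MIB   # bytes/sec the IO loop can push, per the text
chunk_size = 128 * KIB     # size of each message
targets = 100

messages_per_sec = loop_ceiling // chunk_size   # 2400 messages/sec
per_target_mib = loop_ceiling / targets / MIB    # 3.0 MiB/sec each
```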

    &lt;p&gt;&lt;strong&gt;Single thread performance&lt;/strong&gt;: the obvious solution is sharding the
    multiplexer across multiple processes, and that was likely already required
    for completing the multithreaded connect work. This is a straightforward
    change that promises to comfortably saturate a Gigabit Ethernet port using a
    2011 era Macbook while leaving plenty of room for components further up
    (Ansible) and down (ssh) the stack.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;TTY layer&lt;/strong&gt;: I’ve already implemented some fixes for this (increase buffer
    sizes, reduce loop iterations), but found some ugly new problems as a result:
    the TTY layer in every major UNIX has, at best, around a 4KiB buffer, forcing
    many syscalls and loop iterations, and this buffer does not appear to be
    tunable on any OS. Fear not, &lt;a href="https://github.com/dw/mitogen/issues/148"&gt;there is already a kick-ass solution for this
    too&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;This problem should disappear entirely by the time real file transfer support
    is implemented - today the extension is still delivering files as a single
    large message. The blocker to fixing that is a missing flow control mechanism
    to prevent saturation of the message queue, which requires a little research.
    This hopefully isn’t going to be a huge amount of work, and I’ve already got a
    bunch of no-brainer yet hacky ways to fix it.&lt;/p&gt;
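    &lt;p&gt;For illustration, the standard bounded-buffer approach to flow control, sketched with Python’s thread-based &lt;code&gt;queue.Queue&lt;/code&gt;: once &lt;code&gt;maxsize&lt;/code&gt; chunks are queued, the producer’s &lt;code&gt;put()&lt;/code&gt; blocks until the consumer catches up. Mitogen’s message queue is not a &lt;code&gt;queue.Queue&lt;/code&gt;, and an event-driven multiplexer would need a non-blocking equivalent; the tiny chunk and queue sizes are arbitrary demo values.&lt;/p&gt;

```python
import queue
import threading

CHUNK = 4  # tiny chunk size for the demo


def producer(q, data):
    """Split data into chunks; put() blocks whenever the queue is full."""
    for i in range(0, len(data), CHUNK):
        q.put(data[i:i + CHUNK])
    q.put(None)  # end-of-file marker


def consumer(q, out):
    while True:
        chunk = q.get()
        if chunk is None:
            break
        out.append(chunk)


q = queue.Queue(maxsize=2)  # at most 2 chunks in flight
out = []
t = threading.Thread(target=consumer, args=(q, out))
t.start()
producer(q, b"a file that is much bigger than the queue")
t.join()
```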

    &lt;p&gt;&lt;strong&gt;New risk: task isolation&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;It was only a matter of time, &lt;a href="https://github.com/dw/mitogen/issues/154"&gt;but the first isolation-related bug was
    found&lt;/a&gt;, due to a class variable in the built-in &lt;code&gt;yum_repository&lt;/code&gt;
    module that persists some state across invocations of the
    module’s &lt;code&gt;main()&lt;/code&gt; function. I’d been expecting something of this sort, so
    already had ideas for solving it when it came up, and it was quite a
    surprise that only one such bug turned up among all the reports from
    initial testers.&lt;/p&gt;
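    &lt;p&gt;A simplified, hypothetical reproduction of this class of bug (not the actual module’s code): a list stored on the class rather than the instance survives between calls to &lt;code&gt;main()&lt;/code&gt; when the module runs repeatedly in one process. Re-executing the compiled bytecode into a fresh namespace, one of the options discussed below, hides the leak.&lt;/p&gt;

```python
MODULE_SOURCE = '''
class Repo(object):
    # Class-level list: shared by every instance and every call to main().
    sections = []

    def add(self, name):
        self.sections.append(name)
        return list(self.sections)


def main(name):
    return Repo().add(name)
'''

code = compile(MODULE_SOURCE, "fake_module.py", "exec")


def run_shared(namespace, name):
    """Reuse one namespace, as repeated in-process runs would."""
    return namespace["main"](name)


def run_fresh(name):
    """Re-execute the compiled bytecode into a new namespace each time."""
    namespace = {"__name__": "fake_module"}
    exec(code, namespace)
    return namespace["main"](name)


shared = {"__name__": "fake_module"}
exec(code, shared)
leaky = [run_shared(shared, n) for n in ("base", "epel")]
clean = [run_fresh(n) for n in ("base", "epel")]
```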

    &lt;p&gt;The obvious solution is forking a child for each task by
    default; however, as always, &lt;a href="http://mitogen.networkgenomics.com/api.html#mitogen.master.Router.fork"&gt;the devil is in the
    details&lt;/a&gt;,
    and in many intractable ways forking actually &lt;em&gt;introduces&lt;/em&gt; state sharing
    problems far deadlier than those it promises to solve, in addition to
    introducing a huge (3ms on Xeon) penalty that is needless in most cases.
    Basically forking is absolute hell to get right - even for a tiny 2 kLOC
    library written almost entirely by one author who wrote his first &lt;code&gt;fork()&lt;/code&gt;
    call somewhere in the region of 20 years ago, and I’m certain this is liable
    to become a support nightmare.&lt;/p&gt;

    &lt;p&gt;The most valuable de facto protection afforded by fork, memory safety, is
    largely redundant in an almost perfectly memory-safe language like Python;
    that safety is much of why the language is so popular at all.&lt;/p&gt;

    &lt;p&gt;Meanwhile forking is needed anyway for robust implementation of asynchronous
    tasks, so while implementing it would never have been wasted work, it is not
    obvious to me that forking could or should ever become the default mode. It
    amounts to a very ripe field for impossible to spot bugs of much harder
    classes than the simple solution of running everything in a single process,
    where we only need to care about version conflicts, crap monkey patches,
    needlessly global variables and memory/resource leaks.&lt;/p&gt;

    &lt;p&gt;I’m still exploring the solution space for this one, current thinking is
    &lt;em&gt;maybe&lt;/em&gt; (&lt;strong&gt;maybe!&lt;/strong&gt; this is totally greenfield) something like:&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;&lt;p&gt;Built-in list of fixups for ridiculously easy to repair bugs, like the
    &lt;code&gt;yum_repository&lt;/code&gt; example above.&lt;/p&gt;&lt;/li&gt;
    &lt;li&gt;&lt;p&gt;Whitelist for in-process execution any module known (and manually audited)
    to be perfectly safe. Common &lt;code&gt;with_items&lt;/code&gt; modules like &lt;code&gt;lineinfile&lt;/code&gt; easily
    fit in this class.&lt;/p&gt;&lt;/li&gt;
    &lt;li&gt;&lt;p&gt;Whitelist for in-process safe but nonetheless leaky modules, such as the
    buggy &lt;code&gt;yum_repository&lt;/code&gt; module above that simply needs its bytecode
    re-executed (100usec) to paper over the bug. Can’t decide whether to keep
    this mode or not - or simply merge it with the above mode.&lt;/p&gt;&lt;/li&gt;
    &lt;li&gt;&lt;p&gt;Default to forking (3ms - max 333 &lt;code&gt;with_items&lt;/code&gt;/sec) for all unknown bespoke
    (user) modules and built-in modules of dubious quality, with a
    &lt;code&gt;mitogen_task_isolation&lt;/code&gt; variable permitting the mode to be overridden by
    the user on a per-task basis. &lt;em&gt;“Oh that one loop is eating 45 minutes? Try
    it with &lt;code&gt;mitogen_task_isolation=none&lt;/code&gt;”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;All the Mitogen-side forking bits are implemented already, and I’m deferring
    the Ansible-side bits until I add support for exotic module
    types, since that whole chunk of code needs a rewrite and there is no point in
    rewriting it twice.&lt;/p&gt;
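    &lt;p&gt;A rough sketch of how the per-task decision above might be dispatched. The module lists, the &lt;code&gt;"reexec"&lt;/code&gt; and &lt;code&gt;"fork"&lt;/code&gt; mode names, and both functions are hypothetical; only &lt;code&gt;mitogen_task_isolation&lt;/code&gt; and its &lt;code&gt;none&lt;/code&gt; value come from the text. &lt;code&gt;run_forked()&lt;/code&gt; uses &lt;code&gt;os.fork()&lt;/code&gt;, so it is POSIX-only.&lt;/p&gt;

```python
import os
import pickle

# Hypothetical audited lists, for illustration only.
IN_PROCESS_SAFE = {"lineinfile"}    # run directly in the long-lived interpreter
NEEDS_REEXEC = {"yum_repository"}   # safe once its bytecode is re-executed


def choose_isolation(module_name, task_vars):
    """Pick an isolation mode; mitogen_task_isolation overrides the default."""
    override = task_vars.get("mitogen_task_isolation")
    if override:
        return override
    if module_name in IN_PROCESS_SAFE:
        return "none"
    if module_name in NEEDS_REEXEC:
        return "reexec"
    return "fork"   # unknown or dubious modules get a throwaway child process


def run_forked(fn, *args):
    """Run fn(*args) in a forked child (POSIX only) and return its result."""
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the pickled result up the pipe, then exit immediately
        # so none of the parent's cleanup code runs twice.
        status = 1
        try:
            os.close(rfd)
            with os.fdopen(wfd, "wb") as pipe_out:
                pipe_out.write(pickle.dumps(fn(*args)))
            status = 0
        finally:
            os._exit(status)
    os.close(wfd)
    with os.fdopen(rfd, "rb") as pipe_in:
        data = pipe_in.read()   # read before waiting so a full pipe cannot deadlock
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("task failed in child process")
    return pickle.loads(data)


calls = []


def task(x):
    calls.append(x)   # this mutation only happens in the child's memory
    return x * 2


result = run_forked(task, 21)
```

    &lt;p&gt;After &lt;code&gt;run_forked(task, 21)&lt;/code&gt; returns 42, &lt;code&gt;calls&lt;/code&gt; is still empty in the parent: whatever state the task mutated died with the child, which is the isolation forking buys, at the cost of one fork per task.&lt;/p&gt;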

    &lt;p&gt;Meanwhile whatever the outcome of this work, be assured you will always have
    your cake and eat it - this project is all about fixing performance, not
    regressing it. I hope this entire topic becomes a tiny implementation detail
    in the coming weeks.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;CI&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;On the testing front I was absolutely overjoyed to discover
    &lt;a href="https://debops.org/"&gt;DebOps&lt;/a&gt; by way of a Mitogen bug report. This deserves a
    whole article on its own, meanwhile it represents what is likely to be a huge
    piece of the testing puzzle.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Multithreaded connect&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;A big chunk is already implemented in order to fix an unrelated bug! The
    default pool has 16 threads in one process, so there will only be a minor
    performance penalty for the first task to run when the number of targets
    exceeds 16. Meanwhile, the pool size is adjustable via an environment
    variable. I’ll tidy this up later.&lt;/p&gt;
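    &lt;p&gt;A minimal sketch of a fixed-size connect pool using &lt;code&gt;concurrent.futures.ThreadPoolExecutor&lt;/code&gt;. &lt;code&gt;EXAMPLE_CONNECT_POOL_SIZE&lt;/code&gt; is a made-up variable name (the post does not say which variable is read), and &lt;code&gt;connect()&lt;/code&gt; merely sleeps in place of real SSH setup.&lt;/p&gt;

```python
import math
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable name; defaults to the 16 threads mentioned above.
POOL_SIZE = int(os.environ.get("EXAMPLE_CONNECT_POOL_SIZE", "16"))


def connect(host):
    """Stand-in for connection setup that mostly waits on the network."""
    time.sleep(0.01)
    return "connected:" + host


def connect_all(hosts, pool_size=POOL_SIZE):
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(connect, hosts))


def connect_waves(n_targets, pool_size=POOL_SIZE):
    """Roughly how many rounds of connections the first task waits through."""
    return math.ceil(n_targets / pool_size)
```

    &lt;p&gt;With 16 threads, 16 targets connect in a single wave, while 100 targets need 7.&lt;/p&gt;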

    &lt;p&gt;Even though it basically already exists, I’m not yet focused on making
    multithreaded connect work - including analysing the various performance
    weirdness that appears when running Mitogen against multiple targets. These
    definitely exist, I just haven’t made time yet to determine whether it’s an
    Ansible-side scaling issue or a Mitogen-side issue. Stay tuned and don’t
    worry! Multi-target runs are already zippy, and I’m certain any issues found
    can be addressed.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;At least a full day will be dedicated to nothing but coming up with new attack
    scenarios, meanwhile I’m feeling pretty good about security already. The
    fabulous &lt;a href="https://github.com/moreati"&gt;Alex Willmer&lt;/a&gt; has been busily inventing
    new cPickle attack scenarios, and some of them are absolutely fantastically
    scary! He’s sitting on at least one exciting new attack that represents a
    no-brainer decider on the viability of keeping cPickle or replacing it.&lt;/p&gt;
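    &lt;p&gt;For context, the standard mitigation for unpickling untrusted data is to whitelist which globals may be loaded, as described in the “Restricting Globals” section of the Python 3 &lt;code&gt;pickle&lt;/code&gt; documentation. A minimal sketch follows; the whitelist contents are arbitrary examples, and Python 2’s &lt;code&gt;cPickle&lt;/code&gt; exposes a different hook.&lt;/p&gt;

```python
import io
import pickle

# Only these (module, name) pairs may be reconstructed; any other global,
# such as os.system or a builtin function, is refused.
ALLOWED = {
    ("collections", "OrderedDict"),
    ("datetime", "datetime"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("forbidden global: %s.%s" % (module, name))


def safe_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```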

    &lt;p&gt;Serialization aside, I’ve been busy comparing Ansible’s existing security
    model to what the extension provides today, and have at least identified
    &lt;a href="https://github.com/dw/mitogen/issues/132"&gt;unidirectional routing mode&lt;/a&gt; as a
    must-have for delivering the extension. With Ansible today, a single playbook
    can safely target 2 otherwise completely partitioned networks.
    With Mitogen, one network could route messages towards workers in the
    other network using the controller as a bridge. While this should be harmless
    (given existing security mitigations), it still introduces a scary capability
    for an attacker that shouldn’t exist.&lt;/p&gt;
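    &lt;p&gt;One hypothetical way to express such a rule: forward a message only when the sender and recipient lie on the same branch of the context tree, so the controller can still reach any context, but nothing crosses sideways between subtrees. &lt;code&gt;Node&lt;/code&gt; and &lt;code&gt;may_route()&lt;/code&gt; are illustrative names, not Mitogen’s API.&lt;/p&gt;

```python
class Node:
    def __init__(self, ctx_id, parent=None):
        self.ctx_id = ctx_id
        self.parent = parent

    def ancestors(self):
        node, out = self.parent, []
        while node is not None:
            out.append(node.ctx_id)
            node = node.parent
        return out


def may_route(src, dst, unidirectional=True):
    """Decide whether a message from src may be forwarded to dst.

    Hypothetical rule: in unidirectional mode, traffic may only flow
    straight up or straight down one branch of the tree.
    """
    if not unidirectional:
        return True
    return dst.ctx_id in src.ancestors() or src.ctx_id in dst.ancestors()


controller = Node(0)
net_a = Node(1, controller)
net_b = Node(2, controller)
worker_a = Node(3, net_a)
```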

    &lt;p&gt;&lt;a href="https://github.com/dw/mitogen/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity"&gt;Some more security bugs I’m fixing here&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Deferring Windows support&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Really screwed up on planning here - turns out Ansible on Windows does not use
    Python whatsoever, and so implementing the support in Mitogen would mean
    increasing the installation requirements for Windows targets. That’s stupid,
    it violates Ansible’s zero-install design and was explicitly a non-goal from
    the get go.&lt;/p&gt;

    &lt;p&gt;Meanwhile WinRM has extremely poor options for bidirectional IO, and viable
    Mitogen support for Windows would likely involve introducing, say, an
    SSL-encrypted reverse connection from the target machine in order to get
    efficient IO.&lt;/p&gt;

    &lt;p&gt;I will shortly be polling everyone who has pledged towards the project, and if
    nobody speaks up to save Windows, it’s being pushed to the back of the queue.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;A big, big thanks, once again!&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;It goes without saying but none of this work has been a lone effort, starting
    from planning, article review, funding, testing, and an endless series of
    suggestions, questions and recommendations coming from so many people. Thanks
    to everyone, whether you contributed a single $1 or a single typo bug report.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Super busy, but also super on target! Until next time..&lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Wed, 28 Mar 2018 08:22:33 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-03-28:/2018-03-28-crowdfunding-mitogen-day-23.html</guid></item><item><title>Kickstarting free software: one week later</title><link>https://sweetness.hmmz.org/2018-03-13-kickstarting-free-software-one-week-later.html</link><description>

    &lt;p&gt;It&amp;rsquo;s been an incredibly intense first week crowdfunding the &lt;strong&gt;&lt;a target="_blank" href="https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html"&gt;Mitogen
    extension for Ansible&lt;/a&gt;&lt;/strong&gt; involving far more effort than anticipated, where I
    have worked almost flat out from waking until the early hours just to ensure
    any queries are answered thoroughly. I cannot complain, because it has been so
    much fun that I&amp;rsquo;d change almost nothing of the experience, and already the
    campaign has reached 46% from the exposure it received.&lt;/p&gt;

    &lt;p&gt;As a recap &lt;strong&gt;Mitogen is a library for writing distributed programs that require
    zero deployment&lt;/strong&gt;, with the prototype extension implementing an architectural
    change that &lt;strong&gt;vastly improves Ansible&amp;rsquo;s performance&lt;/strong&gt; in common scenarios,
    laying a framework to extend this advantage far beyond simple overhead
    reduction.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Initial testers&lt;/strong&gt;&lt;/p&gt;



    &lt;p&gt;A great deal of work has simply been staying on top of bug reports and ensuring
    experiences with the prototype are solid – for each report from one tester, we
    can assume 10 more hit the same bug but did not or could not report it.&lt;/p&gt;

    &lt;p&gt;Of the many reports received, I have addressed almost all of them promptly.
    Some fabulous
    &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/113"&gt;bugs&lt;/a&gt;
    have been
    &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/110"&gt;found&lt;/a&gt;
    and
    &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/114"&gt;fixed&lt;/a&gt;
    along with
    &lt;a target="_blank" href="https://www.reddit.com/r/programming/comments/82exu8/quadrupling_ansible_performance_with_mitogen/dvh6g44/"&gt;one report via Reddit&lt;/a&gt;
    of &lt;strong&gt;a performance improvement so fantastical&lt;/strong&gt; that it exceeds even my most
    contrived overhead-heavy example:&lt;/p&gt;

    &lt;blockquote&gt;
        "With mitogen my playbook runtime went from &lt;strong&gt;45 minutes to just
        under 3 minutes&lt;/strong&gt;. Awesome work!"
    &lt;/blockquote&gt;


    &lt;p&gt;This is a common theme – anywhere &lt;a target="_blank" href="http://docs.ansible.com/ansible/latest/playbooks_loops.html"&gt;with_items&lt;/a&gt;
    appears, Mitogen has the most profound impact. The obvious reason is that
    during loops the same module is executed repeatedly, and after the first iteration it is
    guaranteed to be compiled and ready on the target.&lt;/p&gt;
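    &lt;p&gt;A toy Python 3 sketch of that caching effect, assuming a long-lived interpreter that keys compiled code by module name. &lt;code&gt;get_module_code()&lt;/code&gt;, &lt;code&gt;run_module()&lt;/code&gt; and the one-line module body are hypothetical; the loop executes three items but compiles only once.&lt;/p&gt;

```python
stats = {"compiles": 0}
_code_cache = {}


def get_module_code(name, source):
    """Compile a module's source the first time it is seen, then reuse it."""
    code = _code_cache.get(name)
    if code is None:
        stats["compiles"] += 1
        code = compile(source, name, "exec")
        _code_cache[name] = code
    return code


def run_module(name, source, args):
    """Execute cached bytecode in a fresh namespace for one loop item."""
    namespace = {"__name__": "__main__", "ARGS": args}
    exec(get_module_code(name, source), namespace)
    return namespace.get("RESULT")


# Hypothetical stand-in for a module body; a real module is far larger.
SOURCE = "RESULT = 'line %s ok' % (ARGS,)"
results = [run_module("lineinfile", SOURCE, item) for item in range(3)]
```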

    &lt;p&gt;&lt;strong&gt;So many lessons!&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Developing the campaign from a thought exercise one idle Sunday evening into an
    actually practical project has taken a &lt;em&gt;lot&lt;/em&gt; of work – far more than I
    anticipated, and at almost every step I have learned something novel. This is
    all reusable knowledge for anyone attempting a similar project in future, and
    I will write it up as time permits.&lt;/p&gt;

    &lt;p&gt;Regardless of outcomes the campaign has already proven one very exciting
    result: &lt;strong&gt;real users will stake real money towards something as seemingly
    mundane as free infrastructure&lt;/strong&gt;, and I think that&amp;rsquo;s beyond amazing. In a world
    content to throw millions of dollars at junk ICOs almost weekly, crowdfunding
    free software seems to me a practice that should happen far more often.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Thank you&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;I wish to thank everyone for the support shown thus far, and I&amp;rsquo;d encourage you
    to consider tapping that Ansible user you know on the shoulder to let them know
    about the project. For those working close to infrastructure consulting, please
    consider using the final week to corner your boss regarding associating your
    company logo with a sexy project that promises to receive many eyeballs over
    the coming years.&lt;/p&gt;

    &lt;p&gt;Thanks for reading!&lt;/p&gt;

    &lt;p&gt;&lt;br&gt;
    David.&lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Tue, 13 Mar 2018 14:20:28 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-03-13:/2018-03-13-kickstarting-free-software-one-week-later.html</guid></item><item><title>Quadrupling Ansible performance with Mitogen</title><link>https://sweetness.hmmz.org/2018-03-06-quadrupling-ansible-performance-with-mitogen.html</link><description>
    &lt;p&gt;&lt;em&gt;[tl;dr: the &lt;a target="_blank" href="https://networkgenomics.com/ansible/"&gt;Mitogen extension for
    Ansible&lt;/a&gt; exists today and it&amp;rsquo;s as awesome as I promised, but I
    want to push things much further. If you value free time,  &lt;a href="#support"&gt;this project needs your support&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

    &lt;p&gt;Allegedly on site as a developer, two summers ago I found myself in a situation
    you are no doubt familiar with, where, despite preferences, unrelated problems
    inevitably gravitate towards whoever can deal with them. Following an
    exhausting day spent watching a dog-slow Ansible job fail repeatedly, one
    evening I dusted off a personal aid to help me relax: an ancient, perpetually
    unfinished hobby project whose sole function until then had simply been to
    remind me things can always improve.&lt;/p&gt;

    &lt;img src="/images/mito3/cell_division.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;

    &lt;p&gt;Something of a miracle had struck by the early hours of next morning, as almost
    every outstanding issue had been solved, and to my disbelief the code ran
    reliably. 18 months later and for the first time in living memory, I am excited
    to report delivery of that project, one of sufficient complexity as to have
    warranted extreme persistence - in this case from concept to implementation,
    over more than a decade.&lt;/p&gt;

    &lt;p&gt;The miracle? It comes in the form of
    &lt;a target="_blank" href="http://mitogen.networkgenomics.com/"&gt;Mitogen&lt;/a&gt;
    - a tiny Python library you won&amp;rsquo;t have heard of, but I hope as an Ansible user
    you will soon eternally be glad for, on discovering &lt;a target="_blank" href="https://docs.ansible.com/ansible/2.4/ansible-playbook.html"&gt;ansible-playbook&lt;/a&gt;
    now completes in very reasonable time even in the face of deeply unreasonable
    operating conditions.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Mitogen is a library for writing distributed programs that require zero
    deployment&lt;/strong&gt;, specifically designed to fit the needs of infrastructure software
    like Ansible. Without upfront configuration it supports any UNIX machine
    featuring an installed Python interpreter, which is to say almost all of them.
    While the concept is hard to explain - even to fellow engineers, its value is
    easy to grasp:&lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito3/run_hostname_100_times_io.svg" width="674"&gt;
    &lt;/figure&gt;

    &lt;p&gt;This trace shows two Ansible runs of a &lt;a target="_blank" href="https://github.com/dw/mitogen/blob/master/examples/playbook/run_hostname_100_times.yml"&gt;basic
    100-step playbook&lt;/a&gt; over a 1 ms latency network against a single target host.
    The first run employs &lt;a target="_blank" href="http://docs.ansible.com/ansible/latest/dev_guide/developing_program_flow_modules.html#pipelining"&gt;SSH
    pipelining&lt;/a&gt;, Ansible&amp;rsquo;s currently most optimal configuration, where it consumes
    almost 4.5 Mbytes of network bandwidth in a running time of 59 secs.&lt;/p&gt;

    &lt;p&gt;The second uses the prototype &lt;a target="_blank" href="https://networkgenomics.com/ansible/"&gt;Mitogen extension for
    Ansible&lt;/a&gt;, with a far more reasonable 90 Kbytes consumed in 8.1 secs.
    &lt;strong&gt;An unmodified playbook executes over 7 times faster while consuming
    50x less bandwidth&lt;/strong&gt;.&lt;/p&gt;
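    &lt;p&gt;A quick check of those ratios, taking 1 Mbyte as 10&lt;sup&gt;6&lt;/sup&gt; bytes and 1 Kbyte as 10&lt;sup&gt;3&lt;/sup&gt;, and spreading the saved time evenly over the 100 steps:&lt;/p&gt;

```python
steps = 100
pipelining = {"seconds": 59.0, "bytes": 4.5e6}   # SSH pipelining run
mitogen = {"seconds": 8.1, "bytes": 90e3}        # Mitogen extension run

speedup = pipelining["seconds"] / mitogen["seconds"]
bandwidth_ratio = pipelining["bytes"] / mitogen["bytes"]
saved_ms_per_step = (pipelining["seconds"] - mitogen["seconds"]) / steps * 1000

print("%.1fx faster, %.0fx less traffic, ~%.0f ms saved per step"
      % (speedup, bandwidth_ratio, saved_ms_per_step))
# prints: 7.3x faster, 50x less traffic, ~509 ms saved per step
```

    &lt;p&gt;The roughly half-second saved per step is consistent with the 300-800 ms per-step figure quoted further down.&lt;/p&gt;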

    &lt;p&gt;Less than &lt;strong&gt;half the CPU time&lt;/strong&gt; was consumed on the host machine, meaning that
    by one metric it should handle at least &lt;strong&gt;twice as many targets&lt;/strong&gt;. Crucially
    &lt;strong&gt;no changes were required to the target machine&lt;/strong&gt;, including new software or
    nasty on-disk caches to contend with.&lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito3/run_hostname_100_times_cpu.svg" width="674"&gt;
    &lt;/figure&gt;

    &lt;p&gt;While only pure overhead is measured above, the benefits very much extend
    to real-world scenarios. See the &lt;a target="_blank" href="https://networkgenomics.com/ansible/"&gt;documentation&lt;/a&gt; and &lt;a target="_blank" href="https://github.com/dw/mitogen/issues/85#issuecomment-366499788"&gt;issue
    #85&lt;/a&gt; (4.2x time, 3.1x CPU) for examples.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;How is this possible?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Mitogen is perhaps most easily described as a kind of &lt;strong&gt;network-capable &lt;a target="_blank" href="http://man7.org/linux/man-pages/man2/fork.2.html"&gt;fork()&lt;/a&gt; on
    steroids&lt;/strong&gt;. It allows programs to establish lazily-loaded duplicates on remote
    hosts, without requiring any upfront remote disk writes, and to communicate
    with those copies once they exist. The copies can in turn recursively split to
    produce further children - with bidirectional message routing between every
    copy handled automatically.&lt;/p&gt;
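    &lt;p&gt;A small model of how a message might travel through such a tree: up from the sender to the nearest shared ancestor, then down to the recipient. The &lt;code&gt;Context&lt;/code&gt; class and &lt;code&gt;route()&lt;/code&gt; function are illustrative only and not Mitogen’s API; context names like &lt;code&gt;sudo:root&lt;/code&gt; are made up.&lt;/p&gt;

```python
class Context:
    """A node in a tree of processes; the root is the controller."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def path_to_root(self):
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path


def route(src, dst):
    """Return the hops a message takes from src to dst.

    It climbs from src to the nearest shared ancestor, then descends to dst.
    Both contexts must belong to the same tree.
    """
    up = src.path_to_root()
    down = dst.path_to_root()
    common = next(node for node in up if node in down)
    climb = up[: up.index(common) + 1]
    descend = list(reversed(down[: down.index(common)]))
    return [node.name for node in climb + descend]


master = Context("master")
web1 = Context("ssh:web1", master)
web1_root = Context("sudo:root", web1)
db1 = Context("ssh:db1", master)
```

    &lt;p&gt;Here &lt;code&gt;route(web1_root, db1)&lt;/code&gt; yields &lt;code&gt;['sudo:root', 'ssh:web1', 'master', 'ssh:db1']&lt;/code&gt;.&lt;/p&gt;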

    &lt;p&gt;In the context of Ansible, SSH pipelining requires up to one SSH
    invocation, one sudo invocation and one script compilation for every
    playbook step, with all scripts re-uploaded each time. With Mitogen,
    &lt;em&gt;only one of each exists per target for the duration of the playbook run&lt;/em&gt;, with
    all code cached in RAM between steps. &lt;strong&gt;Absolutely everything is reused&lt;/strong&gt;,
    saving 300-800 ms on every step.&lt;/p&gt;

    &lt;p&gt;The extension represents around a week&amp;rsquo;s work, replaces hundreds of lines of
    horrid shell-related code in Ansible, and is already at the point where on one
    real-world playbook, &lt;strong&gt;Ansible is only 2% slower than equivalent SSH
    commands&lt;/strong&gt;. Presently, connection establishment is single-threaded, so the
    prototype is only practical for a few hosts, but rest assured this limitation&amp;rsquo;s days
    are numbered.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Not just a speed up, a paradigm shift you&amp;rsquo;ll adore&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/morpheus.jpg" width="224" height="135" title="Matrix memes never die"&gt;

    &lt;p&gt;If this seems impressive, as though it couldn&amp;rsquo;t be improved upon, prepare for some deep
    shocks. You can think of the extension not just as a performance
    improvement, but something of a surreptitious &lt;em&gt;beachhead&lt;/em&gt; from which I intend
    to thoroughly assault your sense of reality.&lt;/p&gt;

    &lt;p&gt;This performance is a side effect of a far more interesting property: Ansible
    is no longer running on just the host machine, but &lt;em&gt;temporarily distributed
    throughout the target network for the duration of the run&lt;/em&gt;, with bidirectional
    communication between all pieces, and you won&amp;rsquo;t believe the crazy functionality
    this enables.&lt;/p&gt;

    &lt;p&gt;What if I told you it were possible not only to eliminate that final 2%, but
    turn it sharply negative, while simultaneously reducing resource consumption?
    &lt;em&gt;&amp;ldquo;Surely Ansible can&amp;rsquo;t execute faster than equivalent raw SSH commands?&amp;rdquo;&lt;/em&gt; You
    bet it can! And if you care about such things, &lt;strong&gt;this could be yours by
    Autumn&lt;/strong&gt;. Read on...&lt;/p&gt;

    &lt;div clear="all"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;Pushing brains into the ether, no evil agents required&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/smith2.jpg" width="131" height="204" title="Never send a human to do a machine's job"&gt;

    &lt;p&gt;&lt;a target="_blank" href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;As I teased last
    year&lt;/a&gt;, Ansible takes its name from a faster-than-light communication
    device from science fiction, yet despite these improvements it is still
    fundamentally bound by the speed with which information physically propagates.
    Pull and agent-based tooling is strongly advantageous here: control flow occurs
    at the same point as the measurements necessary to inform that flow, and no
    penalty is incurred for traversing the network.&lt;/p&gt;

    &lt;p&gt;Today, reducing latency in Ansible means running it within the target network,
    or in &lt;a target="_blank" href="http://docs.ansible.com/ansible/2.4/ansible-pull.html"&gt;pull mode&lt;/a&gt;,
    where the playbook is stored on the target alongside, for example, secrets for
    decrypting any vaults, together with the hairy mechanics required to keep it all in sync
    and executing when appropriate. This is a far cry from the simplicity of
    tapping &lt;code&gt;ansible-playbook live.yml&lt;/code&gt; on your laptop, and so it is an option of
    last resort.&lt;/p&gt;

    &lt;p&gt;What would be &lt;em&gt;amazing&lt;/em&gt; is some hybrid where we could have the performance and
    scalability benefits of pull, combined with the stateless simplicity of push,
    without introducing dedicated hosts, or permanent caches and agents running on
    the target machines, which amount to &lt;em&gt;persistent intermediate state&lt;/em&gt; and
    introduce huge headaches of their own, all without sacrificing the fabulous
    ability to shut everything down with a simple &lt;em&gt;CTRL+C&lt;/em&gt;.&lt;/p&gt;

    &lt;div style="clear: both;"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;The opening volley: connection delegation&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/jumpbox.png" width="268" height="426"&gt;

    &lt;p&gt;As a first step to exploiting previously impossible functionality, I will
    enhance the extension to support delegating connection establishment to a
    machine on the target network, avoiding the cost of establishing hundreds of
    SSH connections over a low throughput, high latency network link.&lt;/p&gt;

    &lt;p&gt;Unlike with SSH proxying, this has the huge benefit of &lt;strong&gt;caching and serving
    Ansible code from RAM on the intermediary&lt;/strong&gt;, avoiding uploading approximately
    50KiB of code for every playbook step, and ensuring those cached responses are
    delivered over the low latency LAN fabric on the target network. For 100 target
    machines, this replaces the transmission of &lt;strong&gt;5 Mbytes of data for every
    playbook step&lt;/strong&gt; with on the order of kilobytes worth of tiny remote procedure
    calls.&lt;/p&gt;
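
    &lt;p&gt;The saving comes from a simple cache on the intermediary. The sketch below models
    it under simplifying assumptions: every step needs the same hypothetical 50 KiB module
    called &lt;code&gt;setup&lt;/code&gt;, the direct case re-uploads it to every target on every
    step, and the &lt;code&gt;CachingIntermediary&lt;/code&gt; class is an illustration, not part
    of Mitogen or Ansible.&lt;/p&gt;

    &lt;pre&gt;
MODULE_SIZE = 50 * 1024  # assumed ~50 KiB of module code per step


class CachingIntermediary:
    """Hypothetical relay on the target LAN (not Mitogen API): module code
    crosses the slow link once, then is served from RAM."""

    def __init__(self, fetch_over_wan):
        self.fetch_over_wan = fetch_over_wan
        self.cache = {}
        self.wan_bytes = 0

    def get_module(self, name):
        if name not in self.cache:  # first request: pay the WAN cost
            source = self.fetch_over_wan(name)
            self.wan_bytes += len(source)
            self.cache[name] = source
        return self.cache[name]  # later requests: served over the LAN


def wan_bytes(targets, steps, via_intermediary):
    """Bytes sent over the slow link for a run of `steps` playbook steps."""
    if not via_intermediary:
        # Direct: every target receives the code again on every step.
        return targets * steps * MODULE_SIZE
    relay = CachingIntermediary(lambda name: bytes(MODULE_SIZE))
    for _ in range(steps):
        for _ in range(targets):
            relay.get_module("setup")
    return relay.wan_bytes


print(wan_bytes(100, 1, False))  # 5120000, about 5 Mbytes per step
print(wan_bytes(100, 10, True))  # 51200, one copy for the whole run
    &lt;/pre&gt;

    &lt;p&gt;Real playbooks use different modules on different steps, so in practice the cached
    figure grows with the number of distinct modules rather than with targets multiplied
    by steps.&lt;/p&gt;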

    &lt;p&gt;All the Mitogen-side infrastructure for this exists today, and is already used
    to implement &lt;a target="_new" href="http://docs.ansible.com/ansible/latest/become.html#become"&gt;become&lt;/a&gt; support.
    It could be flipped on with a few lines of code in the Ansible extension, but
    there are a few more &lt;a target="_blank" href="https://sweetness.hmmz.org/2018-02-13-much-ado-about-latency-mitogen-and-the-bfg9000-of.html"&gt;importer
    bugs to fix&lt;/a&gt; before it&amp;rsquo;ll work perfectly.&lt;/p&gt;

    &lt;p&gt;Finally, as a reminder: since Mitogen operates recursively, &lt;strong&gt;delegation also
    operates recursively&lt;/strong&gt;, with code caching and connection establishment
    happening at each hop. Not only is this useful for navigating slow links and
    complicated firewall setups but, as we&amp;rsquo;ll see, it also enables some exciting new
    scenarios.&lt;/p&gt;

    &lt;div style="clear: both;"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;Asynchronous Connect&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Ansible is intended to manage many machines simultaneously, and while the
    extension&amp;rsquo;s improvements presently work well for single-machine playbooks, for
    many users that is little more than a niche application.&lt;/p&gt;

    &lt;p&gt;With the newfound ability to delegate connection establishment to an
    intermediary on the target network, far away from our laptop&amp;rsquo;s high-latency 3G
    connection, and to further sub-delegate from that intermediary, we can
    implement a &lt;em&gt;divide and conquer&lt;/em&gt; strategy: forming a large tree that
    comprises the final network of target machines for the playbook run, with
    responsibility for caching and connection multiplexing evenly divided across
    the tree, neatly avoiding single-resource bottlenecks.&lt;/p&gt;

    &lt;figure style="text-align: center;"&gt;
        &lt;img src="/images/mito3/async.png" width="449"&gt;
    &lt;/figure&gt;


    &lt;p&gt;I will rewrite Mitogen&amp;rsquo;s connection establishment to be asynchronous: creation
    of &lt;em&gt;many&lt;/em&gt; downstream connections can be scheduled in parallel, with the ability
    to enqueue commands prior to completion, including recursive commands that
    would cause those connections to in turn be used as intermediaries.&lt;/p&gt;

    &lt;p&gt;The cost of establishing connections should become only the cost of code upload
    (~50KiB) and the latency of a single SSH connection per tree layer, as
    connections at each layer occur in parallel. For an imaginary 1,700-node
    cluster split into 4 quarters of 17 racks each, with 25 nodes per rack, connection via a
    300 ms 3G network should complete in well under 15 seconds.&lt;/p&gt;
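
    &lt;p&gt;The scheduling idea can be sketched with Python&amp;rsquo;s &lt;code&gt;asyncio&lt;/code&gt;. This is
    not Mitogen code: &lt;code&gt;LazyContext&lt;/code&gt; is a hypothetical stand-in whose
    &lt;code&gt;asyncio.sleep()&lt;/code&gt; replaces a real SSH handshake and code upload, and the
    latencies used are made-up values. Each context starts connecting as soon as it is
    created, children wait only for their own parent, and commands may be queued before
    any connection completes.&lt;/p&gt;

    &lt;pre&gt;
import asyncio
import time


class LazyContext:
    """Hypothetical remote context (not Mitogen API). Connecting begins at
    construction; call() may be used immediately and waits if needed."""

    def __init__(self, name, latency, via=None):
        self.name = name
        self.via = via
        loop = asyncio.get_running_loop()
        self.ready = loop.create_task(self._connect(latency))

    async def _connect(self, latency):
        if self.via is not None:
            # A child is reached through its parent, so it can only start
            # once that parent is up; siblings still connect in parallel.
            await self.via.ready
        await asyncio.sleep(latency)  # simulated SSH handshake + code upload

    async def call(self, fn, *args):
        await self.ready  # commands queued early simply wait here
        return fn(*args)


async def connect_cluster(wan_latency, lan_latency):
    # 4 quarters x 17 racks x 25 nodes = 1,700 targets.
    quarters = [LazyContext("q%d" % q, wan_latency) for q in range(4)]
    racks = [LazyContext("%s-r%d" % (q.name, r), lan_latency, via=q)
             for q in quarters for r in range(17)]
    nodes = [LazyContext("%s-n%d" % (r.name, n), lan_latency, via=r)
             for r in racks for n in range(25)]
    started = time.monotonic()
    # One command per node, enqueued before any connection has finished.
    names = await asyncio.gather(*(node.call(str, node.name) for node in nodes))
    return names, time.monotonic() - started


names, elapsed = asyncio.run(connect_cluster(0.03, 0.001))
print(len(names), round(elapsed, 2))
    &lt;/pre&gt;

    &lt;p&gt;With these assumed latencies the critical path is only one latency per layer
    (0.03 + 0.001 + 0.001 seconds, plus scheduling overhead), whereas connecting the same
    1,772 contexts one at a time would cost the sum of every latency, about 1.9 seconds.&lt;/p&gt;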

    &lt;div style="clear: both;"&gt;&lt;/div&gt;

    &lt;p&gt;&lt;strong&gt;Topology-aware file synchronization&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;So you have a playbook on your laptop deploying a Django application via the
    &lt;a target="_blank" href="http://docs.ansible.com/ansible/latest/synchronize_module.html"&gt;synchronize&lt;/a&gt;
    module, to 100 Ubuntu machines running in a datacentre 300 ms away. Each run of
    the playbook entails a groan followed by a long walk, as a 3.8-second &lt;a target="_blank" href="http://man7.org/linux/man-pages/man1/rsync.1.html"&gt;rsync&lt;/a&gt; run is
    invoked 100 times via your 3G connection, just to synchronize a 3 Mbyte
    asset the design team won&amp;rsquo;t stop tweaking. Not only are there 6
    minutes of roundtrips buried in those invocations, but that puny 3G
    connection is forced to send a total of 300 Mbytes toward the target
    network.&lt;/p&gt;

    &lt;p&gt;What is the point of continually re-sending that file to the same set of
    machines in some far-off network? What if it could be uploaded exactly once,
    then automatically cached and redistributed within the target network,
    producing exactly one upload per layer in the hierarchy:&lt;/p&gt;

    &lt;figure style="text-align: center;"&gt;
        &lt;img src="/images/mito3/distribute.png" width="560"&gt;
    &lt;/figure&gt;
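
    &lt;p&gt;A sketch of that distribution scheme follows. Apart from the 3 Mbyte file, the
    details are assumptions: a hypothetical topology of one gateway in the datacentre,
    four relays and 25 machines behind each relay. Each node receives the file once from
    its parent, keeps it in RAM and forwards it to its own children. The
    &lt;code&gt;distribute()&lt;/code&gt; helper is illustrative only.&lt;/p&gt;

    &lt;pre&gt;
def distribute(node, payload, parent=None, log=None):
    """Push payload down a tree of {"name": str, "children": [...]} dicts.
    Returns one (sender, receiver, bytes) entry per link used."""
    if log is None:
        log = []
    if parent is not None:
        log.append((parent, node["name"], len(payload)))
    for child in node.get("children", []):
        distribute(child, payload, node["name"], log)
    return log


asset = bytes(3 * 1000 * 1000)  # a 3 Mbyte asset
tree = {"name": "laptop", "children": [
    {"name": "gateway", "children": [
        {"name": "relay%d" % r,
         "children": [{"name": "web%d-%d" % (r, m)} for m in range(25)]}
        for r in range(4)]}]}

log = distribute(tree, asset)
uplink = sum(size for sender, _, size in log if sender == "laptop")
print(uplink)    # 3000000: the 3G link carries the asset exactly once
print(len(log))  # 105 transfers, 104 of them inside the datacentre
    &lt;/pre&gt;

    &lt;p&gt;Sending directly from the laptop would instead push 100 copies, 300 Mbytes, over
    the 3G link.&lt;/p&gt;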

    &lt;p&gt;Why stop at delegating connection establishment and module caching? Now that we have
    a partial copy of Ansible within the network, nothing prevents implementing all
    kinds of smarts. Here is another feature that is a cinch to build once
    bidirectional communication exists between topology-aware code, which the
    prototype extension already provides today.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Generalized forwarding&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;After a brutal 4-hour meeting involving 10 executives, our hero Bob, Senior
    Disaster Architect III, emerges bloodstained yet victorious against the
    tyrannical security team: his backends may now talk with impunity to the entire
    Internet, just so &lt;code&gt;apt-get&lt;/code&gt; can reach &lt;code&gt;packages.debian.org&lt;/code&gt; for the 15
    seconds Bob&amp;rsquo;s daily Ansible CI job requires.&lt;/p&gt;

    &lt;p&gt;That evening, having regaled his giddy betrothed (HR Coordinator II) with
    heroic stories of war, Bob catches a brief yet chilling glimmer of doubt about all
    that had transpired. &lt;em&gt;&amp;ldquo;Was there another way?&amp;rdquo;&lt;/em&gt; he sleepily ponders, before
    succumbing to a cosier battle waged by those fatigued and heavy eyelids.
    Suddenly aware again, Bob emerges bathed in a mysterious utopian dreamscape
    where CI jobs executed infinitely quickly, war and poverty did not exist, and
    the impossible had always been possible.&lt;/p&gt;

    &lt;figure style="text-align: center;"&gt;
        &lt;img src="/images/mito3/pipe.png" width="576"&gt;
    &lt;/figure&gt;

    &lt;p&gt;Building on &lt;a target="_blank" href="http://mitogen.networkgenomics.com/howitworks.html#message-routing"&gt;Mitogen&amp;rsquo;s
    message routing&lt;/a&gt;, forwarding all kinds of pipes and network sockets becomes
    trivial, including schemes that would allow exposing a transient, locked down
    HTTP proxy to Bob&amp;rsquo;s &lt;code&gt;apt-get&lt;/code&gt; invocation only for as long as necessary, all
    with a few lines of YAML in a playbook.&lt;/p&gt;
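
    &lt;p&gt;Mitogen would carry the bytes over its own routed messages. As a stand-in, the
    sketch below shows the core of any such forward using only the standard library&amp;rsquo;s
    &lt;code&gt;asyncio&lt;/code&gt; streams: accept a connection, open one to the real destination,
    and copy bytes in both directions. &lt;code&gt;start_forwarder()&lt;/code&gt; and the local
    service standing in for Bob&amp;rsquo;s HTTP proxy are hypothetical, and half-closed
    connections are not handled.&lt;/p&gt;

    &lt;pre&gt;
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_forwarder(dest_host, dest_port):
    """Listen on an ephemeral local port and relay each connection to
    dest_host:dest_port (a hypothetical helper, not Mitogen API)."""

    async def handle(client_reader, client_writer):
        dest_reader, dest_writer = await asyncio.open_connection(
            dest_host, dest_port)
        await asyncio.gather(pipe(client_reader, dest_writer),
                             pipe(dest_reader, client_writer))

    return await asyncio.start_server(handle, "127.0.0.1", 0)


async def demo():
    async def shout(reader, writer):  # stand-in for the real service
        writer.write((await reader.readline()).upper())
        await writer.drain()
        writer.close()

    service = await asyncio.start_server(shout, "127.0.0.1", 0)
    forwarder = await start_forwarder(
        "127.0.0.1", service.sockets[0].getsockname()[1])
    reader, writer = await asyncio.open_connection(
        "127.0.0.1", forwarder.sockets[0].getsockname()[1])
    writer.write(b"hello via the forward\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    forwarder.close()
    service.close()
    return reply


print(asyncio.run(demo()))  # b'HELLO VIA THE FORWARD\n'
    &lt;/pre&gt;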

    &lt;p&gt;While this is already possible with SSH forwarding, the hand-configuration
    involved is messy, and becomes extremely hairy when the target of the forward
    is not the host machine. My initial goal is to support forwarding of UNIX and
    TCP sockets, as they cover all use cases I have in mind. Speaking of which...&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Topology-aware Git pull&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Another common security failure seen in Ansible playbooks is to call Git directly
    from target machines, including granting those machines access to a Git
    server. This is a horrid violation: even read-only access implies the machine
    needs permanent firewall rules &lt;strong&gt;that shouldn&amp;rsquo;t exist&lt;/strong&gt;, just for the scant
    moments a pull is in progress. Once backends can reach a site as complex as
    GitHub.com, you may as well abandon all outbound firewalling, as this is enough
    for even the puniest script kiddie to exfiltrate a production database.&lt;/p&gt;

    &lt;p&gt;What if Git could run with the permissions of the local Ansible user, on the
    user&amp;rsquo;s own machine, and be served efficiently to the target machines only for
    the duration of the push, faster than 100 machines talking to GitHub.com, and
    &lt;em&gt;only to the single read-only repository intended&lt;/em&gt;?&lt;/p&gt;

    &lt;figure style="text-align: center;"&gt;
        &lt;img src="/images/mito3/topogit.png" width="606"&gt;
    &lt;/figure&gt;

    &lt;p&gt;Building on generalized forwarding, topology-aware Git repeats all the caching
    and single-upload tricks of file synchronization, but this time implementing
    the Git protocol at every hop.&lt;/p&gt;

    &lt;p&gt;In the scheme I will implement, a single round-trip is necessary for &lt;a target="_blank" href="https://git-scm.com/docs/git-fetch-pack.html"&gt;git-fetch-pack&lt;/a&gt; to pull
    just the changed objects from the laptop over the high latency 3G link, before
    propagating at LAN speeds throughout the target network, with &lt;a target="_blank" href="https://git-scm.com/docs/git-ls-remote.html"&gt;git-ls-remote&lt;/a&gt; output
    delivered as part of the message that initiates the pull. Not only is the
    result more efficient than a normal &lt;a target="_blank" href="https://git-scm.com/docs/git-pull.html"&gt;git-pull&lt;/a&gt;, but backends no
    longer require network access to Git.&lt;/p&gt;
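
    &lt;p&gt;Because the &lt;code&gt;git-ls-remote&lt;/code&gt; listing arrives inside the message that
    starts the pull, each node can decide locally which ref tips it lacks, without another
    round-trip. A minimal sketch of that decision follows, assuming the standard ls-remote
    output of one object ID and ref name per line, separated by a tab.
    &lt;code&gt;parse_ls_remote()&lt;/code&gt; and &lt;code&gt;missing_tips()&lt;/code&gt; are hypothetical
    helpers; the object transfer itself would still be done by
    &lt;code&gt;git-fetch-pack&lt;/code&gt;.&lt;/p&gt;

    &lt;pre&gt;
def parse_ls_remote(output):
    """Map ref name to object ID from git ls-remote style output."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            oid, ref = line.split("\t", 1)
            refs[ref] = oid
    return refs


def missing_tips(advertised, local):
    """Object IDs this node must fetch: advertised tips it does not
    already have under the same ref name."""
    return sorted({oid for ref, oid in advertised.items()
                   if local.get(ref) != oid})


advertised = parse_ls_remote(
    "a" * 40 + "\tHEAD\n"
    + "a" * 40 + "\trefs/heads/master\n"
    + "b" * 40 + "\trefs/tags/v1.0\n")
local = {"refs/tags/v1.0": "b" * 40}  # this node already has the tag
# HEAD and master share one tip, so a single ID ("a" * 40) is needed.
print(missing_tips(advertised, local))
    &lt;/pre&gt;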

    &lt;p&gt;&lt;strong&gt;The final word: Inversion of control&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/ioc.jpg" width="157" height="316"&gt;

    &lt;p&gt;Remember we talked about making Ansible run faster than equivalent SSH
    commands? Well, today Ansible requires one network round-trip per playbook
    step, so just like SSH, it must pay the penalty for every round-trip unless
    something gives, and that something is the partial delegation of control to the
    target machine itself.&lt;/p&gt;

    &lt;p&gt;With inversion of control, the role of &lt;code&gt;ansible-playbook&lt;/code&gt; simply becomes that
    of shipping code and selective chunks of data to target machines, where those
    machines can execute and make control decisions without necessitating a
    conversation with the master after each step, just to figure out what to
    execute next.&lt;/p&gt;

    &lt;p&gt;Ansible already has all the framework needed to implement this today, by
    significantly extending the prototype extension&amp;rsquo;s existing strategy plug-in,
    and teaching it how to automatically send and wait on batches of tasks, rather
    than on single tasks at a time.&lt;/p&gt;

    &lt;p&gt;Aside from improved performance, the semantics of the existing &lt;code&gt;linear&lt;/code&gt;
    strategy will be preserved, and playbooks need not be changed to cope: on the
    target machine, tasks will not suddenly begin running concurrently, or in an
    order different from before.&lt;/p&gt;
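
    &lt;p&gt;The round-trip arithmetic behind this can be modelled in a few lines. The sketch is
    not the extension&amp;rsquo;s strategy plug-in: &lt;code&gt;Link&lt;/code&gt;,
    &lt;code&gt;run_on_target()&lt;/code&gt;, &lt;code&gt;per_step()&lt;/code&gt; and &lt;code&gt;batched()&lt;/code&gt;
    are hypothetical. It assumes the target runs a batch strictly in order and stops at
    the first failure, roughly as the &lt;code&gt;linear&lt;/code&gt; strategy stops work on a
    failed host. Both modes then produce identical results, but the batch needs a single
    round-trip.&lt;/p&gt;

    &lt;pre&gt;
def run_on_target(tasks):
    """Executed on the target: run tasks in order, stop at the first failure."""
    results = []
    for name, action in tasks:
        ok = action()
        results.append((name, ok))
        if not ok:
            break
    return results


class Link:
    """Hypothetical master-to-target link that counts round-trips."""

    def __init__(self):
        self.round_trips = 0

    def send(self, tasks):
        self.round_trips += 1  # one request, one reply
        return run_on_target(tasks)


def per_step(link, tasks):
    """Today: the master sends one task, waits, then decides what is next."""
    results = []
    for task in tasks:
        results += link.send([task])
        if not results[-1][1]:
            break
    return results


def batched(link, tasks):
    """Inverted control: ship every task at once, let the target decide."""
    return link.send(tasks)


tasks = [("install", lambda: True), ("configure", lambda: False),
         ("restart", lambda: True)]
slow, fast = Link(), Link()
assert per_step(slow, tasks) == batched(fast, tasks)
print(slow.round_trips, fast.round_trips)  # 2 1
    &lt;/pre&gt;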

    &lt;div style="clear: both;"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;App-level connection persistence&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/persistent.png" width="136" height="366"&gt;

    &lt;p&gt;As a final battle against latency during playbook development and debugging, I
    will support detaching the connection tree from &lt;code&gt;ansible-playbook&lt;/code&gt; on exit,
    and teach the extension to reuse it at startup. This will reduce the overhead
    of repeat runs, especially against many targets, to the order of hundreds of
    milliseconds, as no new SSH connections, module compilations or code uploads
    are required.&lt;/p&gt;

    &lt;p&gt;Connection persistence opens the floodgates for adding sweet new tooling,
    although I&amp;rsquo;m not sure how desirable it is to expose an implementation detail
    like this forever, while also extending the interface provided by Ansible
    itself. As a simple example, we could provide an &lt;code&gt;ansible-ssh&lt;/code&gt; tool that
    reuses the connection tree along with Ansible&amp;rsquo;s tunnelling, delegation, dynamic
    inventory and authentication configuration to forward a pipe to a remote shell.&lt;/p&gt;

    &lt;p&gt;&lt;a name="support" id="support"&gt;&lt;/a&gt;
    &lt;strong&gt;The cost of slow tooling&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Ansible has &lt;a target="_blank" href="https://github.com/ansible/ansible"&gt;over
    28,500 stars on GitHub&lt;/a&gt;, representing just those users who have a GitHub
    account and ever thought to star it, and appears to grow by 150 stars per week.
    Around London the going rate to hire one user is $100/hour, and conservatively,
    we could expect that user is trotting out a 15 minute run of &lt;code&gt;ansible-playbook
    live.yml&lt;/code&gt; at least once per week.&lt;/p&gt;

    &lt;p&gt;If Ansible is running merely twice as slowly as necessary, 7.5 minutes of
    that run is lost productivity, and across those 28,500 users the economic cost
    is in the region of &lt;strong&gt;$356,250 per invocation&lt;/strong&gt; or, over 48 working weeks,
    &lt;strong&gt;$17,100,000 per year&lt;/strong&gt;. In reality the average user is running Ansible far
    more often, including thousands of times per minute under various CI systems
    worldwide, and those runs often last far longer than 15 minutes, but I&amp;rsquo;d
    recommend that mental guesstimation is left as an exercise to readers who are
    already blind drunk.&lt;/p&gt;
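
    &lt;p&gt;The estimate can be reproduced directly. The 48 working weeks are inferred from the
    figures themselves ($17,100,000 / $356,250 = 48, which also matches $4,800 per year at
    $100 per hour); every other input comes from the paragraphs above.&lt;/p&gt;

    &lt;pre&gt;
STARS = 28500        # GitHub stars, used as a proxy for users
HOURLY_RATE = 100    # USD per hour, around London
RUN_MINUTES = 15     # one run of ansible-playbook per user per week
WORKING_WEEKS = 48   # implied by 17,100,000 / 356,250


def cost_of_slowness(slowdown=2):
    """Return (USD lost per weekly invocation, USD lost per year)."""
    wasted_minutes = RUN_MINUTES - RUN_MINUTES / slowdown  # 7.5 at 2x
    per_run = STARS * wasted_minutes / 60 * HOURLY_RATE
    return per_run, per_run * WORKING_WEEKS


print(cost_of_slowness())  # (356250.0, 17100000.0)
    &lt;/pre&gt;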

    &lt;div style="clear: both;"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;The future is beautiful if you want it to be&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="padding: 4px; float: right;" src="/images/mito3/dw.jpg" width="150" height="150"&gt;

    &lt;p&gt;My name is David, and nothing jinxes my day quite like slow tooling. Over a
    decade, and all on my own time, I have easily poured 500 hours into this project in
    one form or another. The project has now reached an inflection point where the fun part is
    over, &lt;em&gt;the science is done and the effect is real&lt;/em&gt;, and only a small, highly
    predictable set of milestones remain to deliver what I hope you agree is a much
    brighter future.&lt;/p&gt;

    &lt;p&gt;Before reading this, I doubt you would have believed it possible to provide the
    features described without complex infrastructure running in the target
    network. Now I hope you&amp;rsquo;ll join me in disproving one final impossibility.&lt;/p&gt;

    &lt;p&gt;While everything here will exist in time, &lt;strong&gt;it cannot exist in 2018 without
    your support&lt;/strong&gt;, and that&amp;rsquo;s why I&amp;rsquo;d like to try something crazy that would
    allow me to devote myself to delivering a vastly improved daily routine for
    thousands of people just like you and me.&lt;/p&gt;

    &lt;p&gt;You may have guessed already: &lt;strong&gt;I want you to crowdfund awesome tooling&lt;/strong&gt;.&lt;/p&gt;



    &lt;p&gt;What value would you place on an extra productive hour every working week? In
    the UK that&amp;rsquo;s an easy question: it&amp;rsquo;s around $4,800 per year. And what risk is
    there in contributing $100 to an already proven component? I hope you&amp;rsquo;ll agree
    this too is a no-brainer, both for you and your employer.&lt;/p&gt;

    &lt;p&gt;To encourage success, I&amp;rsquo;m offering a unique permanent placement of your brand on
    the GitHub repository and documentation. Funds will be returned if the minimum
    goal cannot be reached; however, just 3 weeks are sufficient to deliver a well-tested
    extension, with my full attention given to every bug, ready to save many
    hours just in time to enjoy the early sunlight of Spring.&lt;/p&gt;

    &lt;p&gt;Totalling much less than the economic damage caused by a single run of today&amp;rsquo;s
    Ansible, the grand plan is divided into incrementally related stretch goals. I
    cannot imagine this will achieve full funding, but if it does, as a finale
    &lt;strong&gt;I&amp;rsquo;ll deliver a feature built on Ansible that you never dreamed possible&lt;/strong&gt;.&lt;/p&gt;

    &lt;p&gt;What will that be? &lt;a target="_blank" href="https://www.kickstarter.com/projects/548438714/mitogen-extension-for-ansible/"&gt;Pledge
    today if you&amp;rsquo;d like to find out&lt;/a&gt;.&lt;/p&gt;

    &lt;div style="clear: both;"&gt;&lt;/div&gt;


    &lt;p&gt;&lt;strong&gt;Combating obsolescence in our beloved tools&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;As a modern field, deployment tooling is exposed to the ebb and flow of the
    software industry far more than most, and unexpected disruption happens
    continuously. Without ongoing evolution, exposure to buggy and unfamiliar new
    tooling is all but guaranteed, with benefits barely justifying the cost of
    their integration. As we know all too well, rational ideas like &lt;em&gt;cost/benefit&lt;/em&gt;
    rarely win the hearts of buzzword-hungry and youthful infrastructure teams, so
    counterarguments must be presented another way.&lt;/p&gt;

    &lt;p&gt;As a recent example, there is growing love for &lt;a href="https://github.com/purpleidea/mgmt"&gt;mgmt&lt;/a&gt;, which is designed from the
    outset as an agent-based reactive distributed system, the direction in which Mitogen nudges
    Ansible. However, unlike mgmt, Ansible preserves its zero-install and
    agentless nature, while laying a sound framework for significantly more
    exciting features. If that alone does not win loyalty, we&amp;rsquo;re at least
    guaranteed that every migration-triggering new feature implemented in such
    systems can be headed off with minimal effort, long into the foreseeable
    future.&lt;br&gt;&lt;br&gt;&lt;br&gt;David.&lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Tue, 06 Mar 2018 12:58:09 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-03-06:/2018-03-06-quadrupling-ansible-performance-with-mitogen.html</guid></item><item><title>Much ado about latency: Mitogen and the BFG9000 of import hooks</title><link>https://sweetness.hmmz.org/2018-02-13-much-ado-about-latency-mitogen-and-the-bfg9000-of.html</link><description>
    &lt;p&gt;After a long winter break from recreational programming, over the past few days I
    finally built up steam and broke a chunk of new ground on Mitogen, this time
    growing its puny module forwarder into a bona fide beast, ready to handle
    almost any network condition and user code thrown at it.&lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito2/bfg9000.jpg"&gt;
        &lt;figcaption&gt;No adversary is a match for the BFG&lt;/figcaption&gt;
    &lt;/figure&gt;

    &lt;p&gt;&lt;strong&gt;Recap&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;&lt;a href="http://mitogen.networkgenomics.com/"&gt;Mitogen is a library for executing parts of a Python program in a remote
    context&lt;/a&gt;, primarily over &lt;code&gt;sudo&lt;/code&gt; and
    SSH connections, and establishing bidirectional communication with those
    parts. Targeting infrastructure applications, it requires no upfront
    configuration of target machines, aside from an SSH daemon and a Python 2.x
    interpreter, which is the default for almost every Linux machine found on any
    conceivable network.&lt;/p&gt;

    &lt;p&gt;The target need not possess a writeable filesystem: code is loaded
    dynamically on demand, and execution occurs entirely from RAM.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;How Import Works&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;To implement dynamic loading, child Python processes (&amp;ldquo;contexts&amp;rdquo;) have a
    &lt;a href="https://www.python.org/dev/peps/pep-0302/"&gt;PEP-302 import hook&lt;/a&gt; installed
    that causes attempts to import modules unavailable locally to automatically be
    served over the network connection to the parent process. For example, in a
    script like:&lt;/p&gt;

    &lt;pre&gt;
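        import mitogen
        import requests

        def get_url(url):
            # Runs on k3; "requests" is imported there on demand.
            return requests.get(url).text

        @mitogen.main()
        def main(router):
            # Start a Python child on k3 over SSH; nothing is installed first.
            host = router.ssh(hostname='k3')
            print(host.call(get_url, 'https://www.google.com/'))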
    &lt;/pre&gt;



    &lt;p&gt;If the &lt;code&gt;requests&lt;/code&gt; package is missing on the host &lt;code&gt;k3&lt;/code&gt;, it will automatically
    be copied and imported in RAM, without requiring upfront configuration, or
    causing or requiring writes to the remote filesystem.&lt;/p&gt;
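
    &lt;p&gt;The hook mechanism itself is small. A PEP-302 hook implements
    &lt;code&gt;find_module()&lt;/code&gt; and &lt;code&gt;load_module()&lt;/code&gt;, which modern Python 3
    deprecated and, from 3.12, no longer calls, so the sketch below uses the newer
    &lt;code&gt;find_spec()&lt;/code&gt;/&lt;code&gt;exec_module()&lt;/code&gt; methods from
    &lt;code&gt;importlib&lt;/code&gt; instead. It is an illustration rather than Mitogen&amp;rsquo;s code:
    the &lt;code&gt;PARENT_MODULES&lt;/code&gt; dictionary is a hypothetical stand-in for the parent
    process, and each lookup represents one network round-trip.&lt;/p&gt;

    &lt;pre&gt;
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the parent process: module name to source text.
PARENT_MODULES = {
    "remote_greeting": "def hello(name):\n    return 'hello ' + name\n",
}


class ParentFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules that are missing locally by asking the 'parent'."""

    def __init__(self, fetch):
        self.fetch = fetch   # in Mitogen this would be a network request
        self.served = []     # names served, i.e. round-trips paid
        self._pending = {}

    def find_spec(self, fullname, path, target=None):
        source = self.fetch(fullname)
        if source is None:
            return None      # parent lacks it too: normal ImportError
        self.served.append(fullname)
        self._pending[fullname] = source
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None          # use the default module object

    def exec_module(self, module):
        source = self._pending.pop(module.__name__)
        code = compile(source, "parent:" + module.__name__, "exec")
        exec(code, module.__dict__)  # runs from RAM, nothing written to disk


# Appended, so local finders run first and only misses reach the parent.
finder = ParentFinder(PARENT_MODULES.get)
sys.meta_path.append(finder)

import remote_greeting
print(remote_greeting.hello("k3"))  # hello k3
    &lt;/pre&gt;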

    &lt;figure style="float: right;"&gt;
        &lt;img src="/images/mito2/kathmandu-paris.jpg"&gt;
        &lt;figcaption&gt;Kathmandu to Paris via 3G: serious business&lt;/figcaption&gt;
    &lt;/figure&gt;

    &lt;p&gt;&lt;strong&gt;So far, so good. Just one hitch&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;While the loader has served well over the library&amp;rsquo;s prototypical life (which,
    in real time, is approaching 12 years!), it has always placed severe limits on
    the structure of the loaded code, as each additional source file introduced
    one network round-trip to serve it.&lt;/p&gt;

    &lt;p&gt;Given a relatively small dependency such as Kenneth Reitz' popular
    &lt;a href="http://requests.readthedocs.io/en/master/"&gt;Requests&lt;/a&gt; package, comprising 17
    submodules, this means 17 additional network round-trips. While that may not
    mean much over a typical local area network segment where roundtrips are
    measured in microseconds, it quickly multiplies over even modest wide-area
    networks, where infrastructure tooling is commonly deployed.&lt;/p&gt;

    &lt;p&gt;For a library like Requests, 17 round-trips amounts to 340ms latency over a
    reasonably local 20ms link, which is comfortably within the realms of
    acceptable, however over common radio and international links of 200ms or
    more, already this adds at least 3.4 seconds to the startup cost of any
    Mitogen program, time wasted doing nothing but waiting on the network.&lt;/p&gt;

    &lt;p&gt;Sadly, Requests is hardly even the biggest dependency Mitogen can expect to
    encounter. For testing I chose
    &lt;a href="https://docs.djangoproject.com/en/2.0/topics/db/models/"&gt;django.db.models&lt;/a&gt; as
    a representative baseline: heavily integrated with all of Django, it
    transitively imports over 160 modules across numerous subpackages. That means
    on an international link, over 30 seconds of startup latency spent on one
    dependency.&lt;/p&gt;
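
    &lt;p&gt;The arithmetic is simply one round-trip per module multiplied by the link&amp;rsquo;s
    round-trip time. A tiny helper (hypothetical, working in whole milliseconds so the
    numbers stay exact) reproduces the figures above.&lt;/p&gt;

    &lt;pre&gt;
def import_latency_ms(modules, rtt_ms):
    """Startup latency when every module costs one network round-trip."""
    return modules * rtt_ms


print(import_latency_ms(17, 20))    # Requests over a 20 ms link: 340
print(import_latency_ms(17, 200))   # Requests over a 200 ms link: 3400
print(import_latency_ms(160, 200))  # django.db.models, 200 ms: 32000
    &lt;/pre&gt;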

    &lt;p&gt;It is worth noting that Django is not something I&amp;rsquo;d expect to see in a typical
    Mitogen program; it&amp;rsquo;s simply an extreme worst case that is worth
    hitting. If Mitogen can handle &lt;code&gt;django.db.models&lt;/code&gt;, it should cope with pretty
    much anything.&lt;/p&gt;

    &lt;p&gt;Combining evils, over an admittedly better-than-average Nepali mobile data
    network, &lt;em&gt;and&lt;/em&gt; an international link to my &lt;del&gt;IRC box&lt;/del&gt; mail server in Paris,
    &lt;code&gt;django.db.models&lt;/code&gt; takes almost 60 seconds to load with the old design.&lt;/p&gt;

    &lt;p&gt;In the real world, this one-file-per-roundtrip characteristic means the
    current approach &lt;a href="https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html"&gt;sucks almost as much as
    Ansible
    does&lt;/a&gt;,
    which calls into doubt my goal of implementing an Ansible-trumping Ansible
    connection plug-in. Clearly something must give!&lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito2/django-3g-kathmandu-paris-rtt-old.svg" width="100%"&gt;
        &lt;figcaption&gt;
            50.35 seconds and hundreds of roundtrips spent transferring
            &lt;code&gt;django.db.models&lt;/code&gt; from Kathmandu to Paris via 3G.
            Despite a fast link, throughput averages 13KiB/sec and never
            exceeds 45KiB/sec. Well over half of the 989 frames sent are wasted
            on signalling (Y=0)
        &lt;/figcaption&gt;
    &lt;/figure&gt;

    &lt;p&gt;&lt;strong&gt;Trying harder&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Over the years I discarded many approaches for handling this latency
    nightmare:&lt;/p&gt;

    &lt;ol&gt;&lt;li&gt;Having the user explicitly configure a module list to deliver upfront
    to new contexts, which sucks and is plainly unmaintainable.&lt;/li&gt;
    &lt;li&gt;Installing a PEP-302 hook in the master in order to observe the import
    graph, which would be technically exciting, but likely to suck horribly due
    to fragility and inevitable interference with real PEP-302 hooks, such as
    &lt;a href="http://www.py2exe.org/"&gt;py2exe&lt;/a&gt;.&lt;/li&gt;
    &lt;li&gt;Observing the import graph caused by a function call in a single context,
    then using it to preload modules in additional contexts. This seems
    workable, except the benefit would only be felt by multiple-child Mitogen
    programs. Single child programs would continue to pay the latency tax.&lt;/li&gt;
    &lt;li&gt;Variants of 2 and 3, except caching the result as intermediate state in the
    master&amp;rsquo;s filesystem. Ignoring the fact &lt;strong&gt;persistent intermediate state is
    always evil&lt;/strong&gt; (a topic for later!), that would require weird and imperfect
    invalidation rules, which means performance would suck during development
    and prototyping, and bugs are possible where state gets silently wedged
    and previously working programs inexplicably slow down.&lt;/li&gt;
    &lt;/ol&gt;&lt;p&gt;Finally, last year I settled on using static analysis and on restricting
    preloading at package boundaries. When a dependency is detected in a package
    external to the one being requested, it is not preloaded until the child has
    demonstrated, by requesting the top-level package module from its parent, that
    the child lacks all of the submodules contained by it.&lt;/p&gt;

    &lt;p&gt;This seems like a good rule: preloading can occur aggressively within a
    package, but must otherwise wait for a child to signal a package as missing
    before preemptively wasting time and bandwidth delivering code the child never
    needed.&lt;/p&gt;

    &lt;p&gt;As a final safeguard, preloading is restricted to modules the master
    itself loaded. It is not sufficient for an &lt;code&gt;import&lt;/code&gt; statement to exist:
    surrounding conditional logic must have caused the module to be loaded by the
    master. In this manner the semantics of platform, version-specific and lazy
    imports are roughly preserved.&lt;/p&gt;
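
    &lt;p&gt;A minimal sketch of these rules follows. It assumes the static scan has already
    produced a map from each module to the modules it imports, treats the top-level
    package as the boundary, and uses hypothetical names throughout; it is not
    Mitogen&amp;rsquo;s implementation.&lt;/p&gt;

    &lt;pre&gt;
def modules_to_preload(requested, scanned_deps, master_loaded, child_missing):
    """Pick modules to send along with `requested`.

    scanned_deps:  {module: set of imported module names}, from static analysis
    master_loaded: names the master really imported (compare sys.modules)
    child_missing: top-level packages the child has already asked for,
                   proving it lacks them entirely
    """
    home = requested.split(".")[0]
    preload, pending = set(), [requested]
    while pending:
        for dep in scanned_deps.get(pending.pop(), ()):
            if dep in preload or dep == requested:
                continue
            if dep not in master_loaded:
                continue  # conditional or lazy import the master never ran
            top = dep.split(".")[0]
            if top == home or top in child_missing:
                preload.add(dep)
                pending.append(dep)  # follow its imports too
    return sorted(preload)


deps = {
    "django.db.models": {"django.db.models.fields", "pytz", "MySQLdb"},
    "django.db.models.fields": {"django.core.exceptions", "decimal"},
}
loaded = {"django.db.models", "django.db.models.fields",
          "django.core.exceptions", "pytz", "decimal"}
print(modules_to_preload("django.db.models", deps, loaded, set()))
# ['django.core.exceptions', 'django.db.models.fields']
print(modules_to_preload("django.db.models", deps, loaded, {"pytz"}))
# ['django.core.exceptions', 'django.db.models.fields', 'pytz']
    &lt;/pre&gt;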

    &lt;p&gt;&lt;strong&gt;Syntax tree hell&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Quite predictably, after attempting to approach the problem with regexes, I
    threw my hands up on realizing a single regex may not handle every possible
    import statement:&lt;/p&gt;

    &lt;ul&gt;&lt;li&gt;&lt;code&gt;import a&lt;/code&gt;&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;import a as b&lt;/code&gt;&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;from a import b&lt;/code&gt;&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;from a import b as c&lt;/code&gt;&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;from a import (b, c, d)&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;&lt;p&gt;I gleefully thought I&amp;rsquo;d finally found a use for the
    &lt;a href="https://docs.python.org/2/library/compiler.html"&gt;compiler&lt;/a&gt; and
    &lt;a href="https://docs.python.org/2/library/ast.html"&gt;ast&lt;/a&gt; modules, which were the
    obvious way to avoid the rat&amp;rsquo;s nest of multiple regexes. Not quite.
    You see, across Python releases the grammar has changed, and in lock-step so
    have the representations exported by the &lt;code&gt;compiler&lt;/code&gt; and &lt;code&gt;ast&lt;/code&gt; modules.&lt;/p&gt;
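
    &lt;p&gt;For comparison, this is roughly what the syntax tree approach looks like with the &lt;code&gt;ast&lt;/code&gt; module of a modern Python 3. It is a sketch, not the scanner described here, and the function name &lt;code&gt;scan_imports_ast&lt;/code&gt; is hypothetical. Note that it depends on the Python 3 node types &lt;code&gt;ast.Import&lt;/code&gt; and &lt;code&gt;ast.ImportFrom&lt;/code&gt;, so it would not run unchanged on every release.&lt;/p&gt;

```python
import ast


def scan_imports_ast(source):
    """Return (module, names) pairs for every import statement in source.

    Handles `import a`, `import a as b`, `from a import b`,
    `from a import b as c` and parenthesised `from a import (b, c, d)`.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # One statement may import several modules: import a, b as c
            for alias in node.names:
                found.append((alias.name, []))
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for `from . import x`; level counts dots.
            module = "." * node.level + (node.module or "")
            found.append((module, [alias.name for alias in node.names]))
    return found
```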

    &lt;p&gt;Adding insult to injury: neither module is supported through every interesting
    Python version. I have seen Python 2.4 deployed commercially as recently as
    summer 2016, and therefore consider it mandatory for the kind of library I
    want on my toolbelt. To support antique and chic Python alike, it was
    necessary to implement both approaches and select one at runtime. Many might
    see this as an opportunity to drop 2.4, but &lt;em&gt;&amp;ldquo;just upgrade lol&amp;rdquo;&lt;/em&gt; is never a
    good answer while maintaining long shelf-life systems, and should never be a
    barrier to applying a trusted Swiss Army Knife.&lt;/p&gt;

    &lt;p&gt;After some busy days last September, I had a working scanner built around
    syntax trees, except for a tiny problem: &lt;strong&gt;it was ridiculously slow&lt;/strong&gt;. Parsing
    the 8KiB &lt;code&gt;mitogen.core&lt;/code&gt; module took 12ms on my laptop, which, multiplied across
    every module of a package like Django, is over a second of CPU burnt scanning dependencies. If
    memory serves, reality was closer to 3 seconds: far exceeding the latency
    saved while talking to a machine on a LAN.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Sometimes hacking bytecode makes perfect sense&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;I couldn&amp;rsquo;t stop groaning the day I abandoned ASTs. As is often true when
    following software industry &lt;em&gt;best practice&lt;/em&gt;, we are left holding a decomposing
    trout that, while technically fulfilling its role, stinks horribly, costs all
    involved a fortune to support and causes pains worse than those it was
    intended to relieve. Still hoping to avoid regexes, I went digging for
    precedent elsewhere in tools dealing with the same problem.&lt;/p&gt;

    &lt;p&gt;That&amp;rsquo;s when I discovered the strange and unloved
    &lt;a href="https://docs.python.org/2/library/modulefinder.html"&gt;modulefinder&lt;/a&gt; buried in
    the standard library, a forgotten relic from a bygone era, seductively
    deposited there as a belated Christmas gift to all, on a gloomy &lt;a href="https://github.com/python/cpython/commit/41c554fbec1be5412aea2b388f0952657a2f07e7"&gt;New Year&amp;rsquo;s
    Eve 2002 by Guido&amp;rsquo;s own
    brother&lt;/a&gt;.
    Diving in, I was shocked and mesmerized to find dependencies synthesized by
    recompiling each module and extracting
    &lt;a href="https://docs.python.org/2/library/dis.html#opcode-IMPORT_FROM"&gt;IMPORT_FROM&lt;/a&gt;
    opcodes from the compiled bytecode. Reimplementing a variant, I was overjoyed
    to discover &lt;code&gt;django.db.models&lt;/code&gt; transitive dependencies enumerated in under
    350ms on my laptop. A workable solution!&lt;/p&gt;
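
    &lt;p&gt;The bytecode trick can be sketched in a handful of lines with &lt;code&gt;dis.get_instructions()&lt;/code&gt;, available from Python 3.4. This is an illustration, not Mitogen&amp;rsquo;s implementation or modulefinder&amp;rsquo;s own opcode walker, and the helper names are hypothetical. It keys on the &lt;code&gt;IMPORT_NAME&lt;/code&gt; opcode, whose argument is the dotted module name, and recurses into nested code objects so imports inside functions and classes are found too. Compiling does not execute the module, so scanning has no side effects.&lt;/p&gt;

```python
import dis
import types


def scan_code_imports(code):
    """Yield module names imported anywhere in a compiled code object."""
    for ins in dis.get_instructions(code):
        # IMPORT_NAME carries the dotted module name for both `import a.b`
        # and `from a.b import c` statements.
        if ins.opname == "IMPORT_NAME":
            yield ins.argval
    # Functions, classes and comprehensions are compiled to nested code
    # objects stored in co_consts; scan those too.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from scan_code_imports(const)


def scan_source_imports(source, filename="snippet"):
    """Compile source without executing it, then scan its bytecode."""
    return set(scan_code_imports(compile(source, filename, "exec")))
```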

    &lt;p&gt;The solution has some further crazy results: &lt;code&gt;IMPORT_FROM&lt;/code&gt; has barely changed
    since the Python 2.4 days, right through to Python 3.x. The same approach
    works everywhere, including PyPy, which uses the same bytecode format. That makes this
    &lt;strong&gt;more portable&lt;/strong&gt; than the &lt;code&gt;ast&lt;/code&gt; and &lt;code&gt;compiler&lt;/code&gt; modules!&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Coping with concurrency&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Now that a mechanism exists to enumerate dependencies, we need a mode of delivery.
    The approach used is simplistic and, as discussed later, will likely require
    future improvement.&lt;/p&gt;

    &lt;p&gt;On receiving a
    &lt;a href="http://mitogen.networkgenomics.com/howitworks.html#mitogen.core.GET_MODULE"&gt;GET_MODULE&lt;/a&gt;
    message from a child, a parent (don&amp;rsquo;t forget, Mitogen operates recursively!)
    first tries to satisfy the request from its own cache, before forwarding it
    upwards towards the master. The master sends
    &lt;a href="http://mitogen.networkgenomics.com/howitworks.html#mitogen.core.LOAD_MODULE"&gt;LOAD_MODULE&lt;/a&gt;
    messages for all dependencies known to be missing from the child before
    sending a final message containing the module that was actually requested.
    Since contexts always cache unsolicited &lt;code&gt;LOAD_MODULE&lt;/code&gt; messages from upstream,
    by the time the message arrives for the requested module, many dependencies
    should be in RAM and no further network roundtrips requesting them are required.&lt;/p&gt;

    &lt;p&gt;Meanwhile, for each stream a parent has to its children, the set of module
    names ever delivered on that stream is recorded. Each parent is allowed to ignore any
    &lt;code&gt;GET_MODULE&lt;/code&gt; for which a corresponding &lt;code&gt;LOAD_MODULE&lt;/code&gt; has already been sent,
    preventing races between in-flight requests from causing the same module to
    be sent twice.&lt;/p&gt;
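
    &lt;p&gt;The master&amp;rsquo;s side of this exchange can be modelled in a few lines. In the toy sketch below, messages are plain tuples rather than anything written to a real stream, and &lt;code&gt;ToyMaster&lt;/code&gt; and its dependency map are hypothetical. It only illustrates the two rules just described: dependencies go out before the requested module, and nothing is sent twice on the same stream.&lt;/p&gt;

```python
from collections import defaultdict


class ToyMaster:
    """Hypothetical model of the master's reply to GET_MODULE.

    deps maps a module name to the modules that should be preloaded with it.
    Messages are returned as ("LOAD_MODULE", name) tuples instead of being
    written to a real stream.
    """

    def __init__(self, deps):
        self.deps = deps
        # stream id -> set of module names ever sent on that stream
        self.sent = defaultdict(set)

    def _load(self, stream, name):
        # Each distinct module is delivered on a given stream at most once.
        if name in self.sent[stream]:
            return []
        self.sent[stream].add(name)
        return [("LOAD_MODULE", name)]

    def on_get_module(self, stream, name):
        if name in self.sent[stream]:
            # Already delivered (possibly still in flight): ignore the request.
            return []
        msgs = []
        for dep in self.deps.get(name, ()):
            msgs += self._load(stream, dep)
        # The requested module is always sent last.
        msgs += self._load(stream, name)
        return msgs
```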

    &lt;p&gt;This places the onus on downstream contexts to ensure the single &lt;code&gt;LOAD_MODULE&lt;/code&gt;
    message received for each distinct module always reaches every interested
    party. In short, &lt;code&gt;GET_MODULE&lt;/code&gt; messages must be deduplicated and synchronized
    not only for any arriving from a context&amp;rsquo;s children, but also from its own
    threads.&lt;/p&gt;
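
    &lt;p&gt;One way a context could deduplicate requests from its own threads is sketched below, assuming a callable that forwards &lt;code&gt;GET_MODULE&lt;/code&gt; upstream. The first interested thread sends the request, later ones wait on the same event, and any &lt;code&gt;LOAD_MODULE&lt;/code&gt;, solicited or not, fills the cache and wakes them. The class is hypothetical; the linked documentation describes what Mitogen actually does.&lt;/p&gt;

```python
import threading


class ModuleRequests:
    """Hypothetical per-context deduplication of GET_MODULE requests."""

    def __init__(self, send_get_module):
        self._send = send_get_module  # forwards GET_MODULE upstream
        self._lock = threading.Lock()
        self._cache = {}    # module name -> source received
        self._pending = {}  # module name -> Event set when it arrives

    def get(self, name, timeout=None):
        with self._lock:
            if name in self._cache:
                return self._cache[name]
            event = self._pending.get(name)
            first = event is None
            if first:
                event = self._pending[name] = threading.Event()
        if first:
            # Only the first interested thread sends the request upstream.
            self._send(name)
        event.wait(timeout)
        with self._lock:
            return self._cache.get(name)

    def on_load_module(self, name, source):
        """Called when a LOAD_MODULE arrives, solicited or not."""
        with self._lock:
            self._cache[name] = source
            event = self._pending.pop(name, None)
        if event is not None:
            event.set()
```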

    &lt;p&gt;&lt;a href="http://mitogen.networkgenomics.com/howitworks.html#concurrency"&gt;Some further gory details are in the docs&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Pretty pictures&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;And finally the result. For my test script, the total number of roundtrips
    dropped from 166 to 13, one of which is for the script itself, and 3 negative
    requests for extension modules that cannot be transferred. That leaves, bugs
    aside, &lt;strong&gt;9 roundtrips&lt;/strong&gt; to transfer the most obscene dependency I could think
    of.&lt;/p&gt;

    &lt;p&gt;One more look at the library&amp;rsquo;s network profile. Over the same connection as
    previously, the situation has improved immensely:&lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito2/django-3g-kathmandu-paris-rtt.svg" width="100%"&gt;
        &lt;figcaption&gt;
            570 packets sending Django 7,231km over 3G in 16.47 seconds.
            Throughput averages 38.6KiB/sec and peaks at 2.7MiB/sec
        &lt;/figcaption&gt;
    &lt;/figure&gt;

    &lt;p&gt;Not only is performance up, but the number of frames transmitted has dropped
    by 42%. That&amp;rsquo;s 42% fewer chances of a connection hang due to crappy WiFi!
    &lt;/p&gt;&lt;p&gt;One final detail is visible: around the 10 second mark, a tall column
    of frames is sent with progressively increasing size, almost in the same
    instant. This is not a bug: it is &lt;a href="https://en.wikipedia.org/wiki/Path_MTU_Discovery"&gt;Path MTU
    Discovery&lt;/a&gt; (PMTUD) in
    action. PMTUD is a mechanism by which hosts communicating over IP can learn the maximum
    frame size tolerated by the path between them, which in turn
    maximizes link efficiency by minimizing bandwidth wasted on headers. The size
    is ramped up until either loss occurs or an intermediary signals error via
    &lt;a href="https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol#Destination_unreachable"&gt;ICMP&lt;/a&gt;.&lt;/p&gt;
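
    &lt;p&gt;The ramping behaviour can be illustrated with a toy model. This is not how any particular kernel implements it: classic PMTUD (RFC 1191) relies on the Don&amp;rsquo;t Fragment bit and ICMP errors, while probing with ordinary packets and treating loss as the signal is described in RFC 4821. Here a hypothetical &lt;code&gt;delivered(size)&lt;/code&gt; callback stands in for the path, returning false on loss or an ICMP error, and the start, step and limit values are arbitrary.&lt;/p&gt;

```python
def probe_path_mtu(delivered, start=576, step=64, limit=9000):
    """Toy model: raise the probe size until one fails, then binary-search.

    delivered(size) -> bool simulates whether a frame of that size survives
    the path (not dropped, no ICMP error). Returns the largest size that
    was delivered, never more than limit.
    """
    good = start
    size = start + step
    # Ramp up in fixed steps until a probe fails or the limit is passed.
    while limit >= size and delivered(size):
        good = size
        size += step
    # First size known (or assumed, past the limit) to be too big.
    bad = min(size, limit + 1)
    # Narrow the gap between the last good size and the first failure.
    while bad - good > 1:
        mid = (good + bad) // 2
        if delivered(mid):
            good = mid
        else:
            bad = mid
    return good
```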

    &lt;p&gt;Just like the network path, PMTUD is dynamic and must restart on any signal
    indicating network conditions have changed. Comparing this graph with the
    previous, we see one final improvement as a result of providing the network
    layer enough data to do its job: PMTUD appears to restart much less frequently,
    and the stream is pegged at the true path MTU for much longer.&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Futures&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Aside from simple fixes to reduce wasted roundtrips for extension modules that
    can&amp;rsquo;t be imported, and optional imports of top-level packages that don&amp;rsquo;t exist
    on the master, there are two major niggles remaining in how import works
    today.&lt;/p&gt;

    &lt;p&gt;The first is an irritating source of latency present in deep trees: currently
    it is impossible for intermediary nodes satisfying
    &lt;a href="http://mitogen.networkgenomics.com/howitworks.html#mitogen.core.GET_MODULE"&gt;GET_MODULE&lt;/a&gt;
    requests for children to stream preloaded modules towards a child
    until the final
    &lt;a href="http://mitogen.networkgenomics.com/howitworks.html#mitogen.core.LOAD_MODULE"&gt;LOAD_MODULE&lt;/a&gt;
    arrives at the intermediary for the module actually requested by the child.
    That means preloading is artificially serialized at each layer in the tree,
    when a better design would allow it to progress concurrently with the
    &lt;code&gt;LOAD_MODULE&lt;/code&gt; messages still in-flight from the master.&lt;/p&gt;

    &lt;p&gt;This will present itself when doing multi-machine hops where links between the
    machines are slow or suffer high latency. It will also be important to fix
    before handling hundreds to thousands of children, such as should become
    practical once asynchronous connect() is implemented.&lt;/p&gt;

    &lt;p&gt;There are various approaches to tweaking the design so that concurrency is
    restored, but I would like to let the paint dry a little on the new
    implementation before destabilizing it again.&lt;/p&gt;

    &lt;p&gt;The second major issue is almost certainly a bug waiting to be discovered, but
    I&amp;rsquo;m out of energy to attack it right now. It relates to complex situations
    where many children have different functions invoked in them, from a complex
    set of overlapping packages. In such cases, it is possible that a
    &lt;code&gt;LOAD_MODULE&lt;/code&gt; for an unrelated &lt;code&gt;GET_MODULE&lt;/code&gt; prematurely delivers the final
    module from another import, before it has had all requisite modules preloaded
    into the child.&lt;/p&gt;

    &lt;p&gt;To fix that, the library must ensure the tree of dependencies for every module
    request is sent downstream depth-first, i.e. it must never be possible for any
    module to appear in a &lt;code&gt;LOAD_MODULE&lt;/code&gt; before all of its dependencies have appeared first.&lt;/p&gt;
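
    &lt;p&gt;That ordering constraint is a post-order depth-first traversal of the dependency graph. A minimal sketch, assuming a plain dictionary mapping each module to the modules it imports (the function &lt;code&gt;load_order&lt;/code&gt; is hypothetical): each module is emitted only after everything it depends on, modules already delivered downstream are skipped, and import cycles are cut rather than followed forever.&lt;/p&gt;

```python
def load_order(name, deps, sent=None):
    """Return modules to send so that each follows all of its dependencies.

    deps maps a module name to the names it imports (hypothetical input).
    sent holds names already delivered downstream; it is updated in place.
    """
    if sent is None:
        sent = set()
    order = []
    visiting = set()

    def visit(mod):
        if mod in sent or mod in visiting:
            # Already delivered, or part of an import cycle being walked.
            return
        visiting.add(mod)
        for dep in deps.get(mod, ()):
            visit(dep)
        # Post-order: a module is emitted only after its dependencies.
        sent.add(mod)
        order.append(mod)

    visit(name)
    return order
```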

    &lt;p&gt;Finally there are latency sources buried elsewhere in the library, including
    at least 2 needless roundtrips during connection setup. Fighting latency is an
    endless war, but with module loading working efficiently, the most important
    battle is over.&lt;/p&gt;

    &lt;style&gt;
     .caption img {
        margin: 0 !important;
     }
    figure {
      padding: 4px;
      padding-bottom: 12px;
    }
    &lt;/style&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Tue, 13 Feb 2018 21:35:32 +0000</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2018-02-13:/2018-02-13-much-ado-about-latency-mitogen-and-the-bfg9000-of.html</guid></item><item><title>Mitogen, an infrastructure code baseline that sucks less</title><link>https://sweetness.hmmz.org/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html</link><description>
    &lt;img src="/images/mito1/mitogen.svg" class="mitogen-right-180 mitogen-logo-wrap"&gt;

    &lt;p&gt;
    After many years of occasional commitment, I'm finally getting close to a solid
    implementation of a module I've been wishing existed for over a decade: given a
    remote machine and an SSH connection, just magically make Python code run on
    that machine, with no hacks involving error-prone shell snippets, temporary
    files, or hugely restrictive single-use request-response shell pipelines, and
    suchlike.
    &lt;/p&gt;

    &lt;p&gt;
    I'm borrowing some biology terminology and calling it &lt;a href="https://mitogen.networkgenomics.com/"&gt;Mitogen&lt;/a&gt;, as that's pretty much
    what the library does. Apply some to your program, and it magically becomes
    able to recursively split into self-replicating parts, with bidirectional
    communication and message routing between all the pieces, without any external
    assistance beyond an SSH client and/or sudo installation.
    &lt;/p&gt;

    &lt;p&gt;
    Mitogen's goal is straightforward: &lt;strong&gt;make it childsplay to run Python
    code on remote machines&lt;/strong&gt;, eventually regardless of connection method,
    without being forced to leave the rich and error-resistant joy that is a
    pure-Python environment. My target users would be applications like
    &lt;strong&gt;Ansible&lt;/strong&gt;, &lt;strong&gt;Salt&lt;/strong&gt;, &lt;strong&gt;Fabric&lt;/strong&gt; and
    similar tools, which (through no fault of their own) are universally forced to resort to
    obscene hacks in their implementations to effect a similar result. Mitogen may
    also be of interest to would-be authors of pure Python Internet worms, although
    support for autonomous child contexts is currently (and intentionally) absent.
    &lt;/p&gt;

    &lt;p&gt;
    Because I want this tool to be useful to infrastructure folk, &lt;strong&gt;Mitogen
    does not require free disk space on the remote machines, or even a writeable
    filesystem -- everything is done entirely in RAM&lt;/strong&gt;, making it possible
    to run your infrastructure code against a damaged machine, for example to
    implement a repair process. Newly spawned Python interpreters have import hooks
    and logging handlers configured so that everything is fetched or forwarded over
    the network, and the only disk accesses necessary are those required to start a
    remote interpreter.

    &lt;/p&gt;
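
    &lt;p&gt;The import hook idea can be sketched with Python 3's &lt;code&gt;importlib&lt;/code&gt;. This is not Mitogen's importer, which has to work on much older Pythons; it only shows the shape of the technique. The hypothetical &lt;code&gt;fetch&lt;/code&gt; callable stands in for a request to the parent over the network, and module source is compiled and executed straight from memory without touching the filesystem.&lt;/p&gt;

```python
import importlib.abc
import importlib.util


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from source strings instead of the filesystem.

    fetch(fullname) is a hypothetical callable returning
    (source, is_package) or None; in a real remote context it would ask
    the parent over the network.
    """

    def __init__(self, fetch):
        self.fetch = fetch
        self.cache = {}

    def find_spec(self, fullname, path=None, target=None):
        result = self.fetch(fullname)
        if result is None:
            return None  # let the normal import machinery try
        self.cache[fullname] = result
        source, is_package = result
        return importlib.util.spec_from_loader(
            fullname, self, is_package=is_package)

    def exec_module(self, module):
        # Compile and run the fetched source directly in the module namespace.
        source, _ = self.cache[module.__name__]
        code = compile(source, "remote:" + module.__name__, "exec")
        exec(code, module.__dict__)
```

    &lt;p&gt;Installing it is a matter of inserting an instance at the front of &lt;code&gt;sys.meta_path&lt;/code&gt;; any name for which &lt;code&gt;fetch&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt; falls through to the normal import machinery.&lt;/p&gt;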

    &lt;p&gt;&lt;strong&gt;Recursion&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Mitogen can be used recursively: newly started child contexts can in turn be
    used to run portions of itself to start children-of-children, with message
    routing between all contexts handled automatically. Recursion is used to allow
    first SSHing to a machine before sudoing to a new account, all with the user's
    Python code retaining full control of each new context, and executing code in
    them transparently, as easily as if no SSH or sudo connection were involved at
    all. The master context is able to control and manipulate children created in
    this way as easily as if they were directly connected; the API remains the
    same.
    &lt;/p&gt;
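
    &lt;p&gt;A toy model of the routing described above, assuming a simple tree: each context remembers which direct child leads to each descendant, and anything it has no route for is passed up to its parent. &lt;code&gt;ToyContext&lt;/code&gt; is hypothetical and carries no real messages; it only returns the path a message would take.&lt;/p&gt;

```python
class ToyContext:
    """Hypothetical model of recursive message routing between contexts."""

    def __init__(self, context_id, parent=None):
        self.context_id = context_id
        self.parent = parent
        self.routes = {}  # descendant id -> the direct child leading to it
        if parent is not None:
            parent._add_route(context_id, self)

    def _add_route(self, target_id, via):
        self.routes[target_id] = via
        # Every ancestor learns that target_id is reachable through us.
        if self.parent is not None:
            self.parent._add_route(target_id, self)

    def route(self, target_id, path=()):
        """Return the context ids a message visits on its way to target_id."""
        path = path + (self.context_id,)
        if target_id == self.context_id:
            return list(path)
        # Go down if a descendant matches, otherwise go up to the parent.
        via = self.routes.get(target_id, self.parent)
        if via is None:
            raise LookupError("no route to context %r" % (target_id,))
        return via.route(target_id, path)
```

    &lt;p&gt;For example, with a master (0) that has an SSH child (1) with its own sudo child (2), plus a second child (3), a message from context 2 to context 3 would travel 2, 1, 0, 3.&lt;/p&gt;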

    &lt;figure&gt;
        &lt;img src="/images/mito1/route.png" width="683"&gt;
    &lt;/figure&gt;

    &lt;p&gt;
    Currently just two connection methods exist: &lt;strong&gt;ssh&lt;/strong&gt; and
    &lt;strong&gt;sudo&lt;/strong&gt;, with the sudo support able to cope with typing passwords
    interactively, and crap configurations that have &lt;code&gt;requiretty&lt;/code&gt;
    enabled.
    &lt;/p&gt;

    &lt;p&gt;
    I am explicitly planning to support Windows, either via WMI, psexec, or
    PowerShell Remoting. As for other more exotic connection methods, I might
    eventually implement bootstrap over an IPMI serial console connection, if for
    nothing else than as a demonstrator of how far this approach can be taken, but
    the ability to use the same code to manage a machine with or without a
    functional networking configuration would be in itself a very powerful feature.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;This looks a bit like X. Isn't this just X?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Mitogen is far from the first Python library to support remote bootstrapping,
    but it may be the first to specifically target infrastructure code, minimal
    networking footprint, read-only filesystems, stdio and logging redirection,
    cross-child communication, and recursive operation. Notable similar packages
    include &lt;a href="https://pythonhosted.org/Pyro4/"&gt;Pyro&lt;/a&gt; and &lt;a href="http://codespeak.net/execnet/"&gt;py.execnet&lt;/a&gt;.

    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;This looks a bit like Fabric. Isn't this just Fabric?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    Fabric's API feels kinda similar to what Mitogen offers, but it fundamentally
    operates in terms of chunks of shell snippets to implement all its
    functionality. You can't easily (at least, as far as I know) trick Fabric into
    running your Python code remotely, or for that matter recursively across
    subsequent sudo and SSH connections, and arrange for that code to communicate
    bidirectionally with code running in the local process and autonomously between
    any spawned children.
    &lt;/p&gt;

    &lt;p&gt;
    Mitogen internally reuses this support for bidirectional communication to
    implement some pretty exciting functionality:
    &lt;/p&gt;


    &lt;p&gt;&lt;strong&gt;SSH Client Emulation&lt;/strong&gt;&lt;/p&gt;

    &lt;img style="float: right; margin-left: 16px; width: 298px;" src="/images/mito1/fakessh.png"&gt;

    &lt;p&gt;So your program has an elaborate series of tunnels set up, and it's running code
    all over the place. You hit a problem, and suddenly feel the temptation to drop
    back to raw shell and SSH again: "&lt;em&gt;I just need to sync some files!&lt;/em&gt;",
    you tell yourself, before loudly groaning on realizing the spaghetti of
    duplicated tunnel configurations that would be required to get
    &lt;code&gt;rsync&lt;/code&gt; running the same way as your program. What's more, you
    realize that you can't even use &lt;code&gt;rsync&lt;/code&gt;, because you're relying on
    Mitogen's ability to run code over &lt;code&gt;sudo&lt;/code&gt; with
    &lt;code&gt;requiretty&lt;/code&gt; enabled, and you can't even directly log into that
    target account.
    &lt;/p&gt;

    &lt;p&gt;
    Not a problem: Mitogen supports running local commands with a modified
    environment that causes their attempt to use SSH to run remote command lines to
    be redirected into Mitogen, and tunnelled over your program's existing tunnels.
    No duplicate configuration, no wasted SSH connections, no 3-way handshake
    latency.
    &lt;/p&gt;

    &lt;p&gt;
    The primary goal of the SSH emulator is to simplify porting existing
    infrastructure scripts away from shell, including those already written in
    Python. As a first concrete target for Mitogen, I aim to retrofit it to Ansible
    as a connection plug-in, where this functionality becomes necessary to support
    e.g. Ansible's &lt;code&gt;synchronize&lt;/code&gt; module.


    &lt;br clear="all"&gt;&lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;Compared To Ansible&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    To understand the value of Mitogen, a short comparison against Ansible may be
    useful. I created an Ansible playbook talking to a VMware Fusion Ubuntu
    machine, with SSH pipelining enabled (the current best performance mode in
    Ansible). The playbook simply executes &lt;code&gt;/bin/true&lt;/code&gt; with
    &lt;code&gt;become: true&lt;/code&gt; 100 times, discarding the result.
    &lt;/p&gt;

    &lt;p&gt;
    I then created an &lt;a href="https://gist.github.com/dw/3439f9f3c9c8f275639770e93d3c1a89#file-mito-py"&gt;equivalent script written against Mitogen&lt;/a&gt;, using its SSH and
    sudo functionality, and finally a &lt;a href="https://gist.github.com/dw/3439f9f3c9c8f275639770e93d3c1a89#file-mito2-py"&gt;trivial change to the Mitogen variant that
    executes the control loop on the target machine&lt;/a&gt;. In terms of architecture,
    the first Mitogen script is the fairer comparison with Ansible's control
    flow, but the latter is a good example of the kind of intelligence Mitogen
    enables that would be messy, if not close to impossible with Ansible's existing
    architecture.
    &lt;/p&gt;

    &lt;p&gt;
    &lt;em&gt;[Side note: this is comparing performance characteristics only, in
    particular I am not advocating writing code against Mitogen directly! It's
    possible, but you get none of the ease of use that a tool like Ansible
    provides. That said, a Mitogen-enabled tool composed of tens of
    modules would have similar performance to the numbers below, just a slightly
    increased base cost due to initial module upload]&lt;/em&gt;
    &lt;/p&gt;

    &lt;p&gt;
    &lt;/p&gt;&lt;table class="tbl"&gt;&lt;tr&gt;&lt;th&gt;Method
        &lt;/th&gt;&lt;th&gt;
            &lt;abbr title="Wire bytes transferred from host computer to target computer"&gt;Bytes A&amp;rarr;B&lt;/abbr&gt;
        &lt;/th&gt;&lt;th&gt;
            &lt;abbr title="Wire bytes transferred to host computer from target computer"&gt;Bytes B&amp;rarr;A&lt;/abbr&gt;

        &lt;/th&gt;&lt;th&gt;
            &lt;abbr title="Wire packets transferred from host computer to target computer"&gt;Packets A&amp;rarr;B&lt;/abbr&gt;

        &lt;/th&gt;&lt;th&gt;
            &lt;abbr title="Wire packets transferred to host computer from target computer"&gt;Packets B&amp;rarr;A&lt;/abbr&gt;

        &lt;/th&gt;&lt;th&gt;
            &lt;abbr title="The duration of the TCP connection as measured on the wire"&gt;Duration (ms)&lt;/abbr&gt;

        &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Ansible default&lt;/strong&gt;
        &lt;/td&gt;&lt;td align="right"&gt;5,001,352
        &lt;/td&gt;&lt;td align="right"&gt;486,500

        &lt;/td&gt;&lt;td align="right"&gt;8,864
        &lt;/td&gt;&lt;td align="right"&gt;4,460

        &lt;/td&gt;&lt;td align="right" style="color: red"&gt;55,065


        &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Ansible pipelining&lt;/strong&gt;
        &lt;/td&gt;&lt;td align="right"&gt;4,562,905
        &lt;/td&gt;&lt;td align="right"&gt;178,622

        &lt;/td&gt;&lt;td align="right"&gt;4,282
        &lt;/td&gt;&lt;td align="right"&gt;2,033

        &lt;/td&gt;&lt;td align="right"&gt;25,643

        &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Mitogen local loop&lt;/strong&gt;

        &lt;/td&gt;&lt;td align="right"&gt;45,847
        &lt;/td&gt;&lt;td align="right"&gt;17,982

        &lt;/td&gt;&lt;td align="right"&gt;247
        &lt;/td&gt;&lt;td align="right"&gt;135
        &lt;/td&gt;&lt;td align="right"&gt;1,245

        &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Mitogen remote loop&lt;/strong&gt;

        &lt;/td&gt;&lt;td align="right"&gt;22,511
        &lt;/td&gt;&lt;td align="right"&gt;5,766

        &lt;/td&gt;&lt;td align="right"&gt;51
        &lt;/td&gt;&lt;td align="right"&gt;39
        &lt;/td&gt;&lt;td align="right" style="color: #007f00"&gt;784
    &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;
    The first and most obvious property of Ansible is that it uses a
    &lt;strong&gt;metric crap-ton of bandwidth&lt;/strong&gt;, averaging &lt;strong&gt;45KB of data
    for each run of /bin/true&lt;/strong&gt;. In comparison, the raw command line
    "&lt;code&gt;ssh host /bin/true&lt;/code&gt;" generates only 4.7KB of traffic and completes in 311ms,
    including SSH connection setup and teardown.
    &lt;/p&gt;
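
    &lt;p&gt;The per-run averages follow from dividing the byte counts and durations in the table above by the 100 executions of &lt;code&gt;/bin/true&lt;/code&gt;. A few lines of Python make the arithmetic explicit, with the figures copied verbatim from the table.&lt;/p&gt;

```python
RUNS = 100  # each playbook or script executes /bin/true 100 times

# (bytes A->B, bytes B->A, duration in ms), copied from the table above
results = {
    "Ansible default": (5001352, 486500, 55065),
    "Ansible pipelining": (4562905, 178622, 25643),
    "Mitogen local loop": (45847, 17982, 1245),
    "Mitogen remote loop": (22511, 5766, 784),
}

for method, (sent, received, ms) in results.items():
    print(f"{method:20} {sent / RUNS:8.0f} B out, "
          f"{received / RUNS:7.0f} B in, {ms / RUNS:6.1f} ms per run")

# Speed-up of each Mitogen variant relative to Ansible pipelining.
baseline = results["Ansible pipelining"][2]
for method in ("Mitogen local loop", "Mitogen remote loop"):
    print(f"{method}: {baseline / results[method][2]:.1f}x faster")
```

    &lt;p&gt;Pipelining works out to roughly 45,600 bytes sent per run, while the Mitogen local and remote loops finish about 20.6 and 32.7 times faster than Ansible pipelining respectively.&lt;/p&gt;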

    &lt;p&gt;
    Bandwidth aside, CPU alone cannot account
    for the runtime: clearly significant roundtrips are involved,
    &lt;strong&gt;generating sufficient latency to become visible on an in-memory
    connection&lt;/strong&gt; to a local VM. Why is that? Things are about to get real
    ugly, and I'm already starting to feel myself getting depressed. Remember those
    obscene hacks I mentioned earlier? Well, buckle your seatbelt Dorothy, because
    Kansas is going bye-bye.

    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;The Ugly&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;em&gt;[Side note: the name Ansible is borrowed from Ender's Game, where it refers
    to a faster-than-light communication technology. Giggles]&lt;/em&gt;
    &lt;/p&gt;

    &lt;img style="float: right; margin-left: 16px;" alt="Ignorance is bliss" title="Ignorance is bliss" src="/images/mito1/cypher-bliss.jpg"&gt;

    &lt;p&gt;When you write some code in Ansible, like &lt;code&gt;shell: /bin/true&lt;/code&gt;, you
    are telling Ansible (in most cases) that you want to execute a module named
    &lt;code&gt;shell.py&lt;/code&gt; on the target machine, passing &lt;code&gt;/bin/true&lt;/code&gt; as
    its argument.
    &lt;/p&gt;

    &lt;p&gt;
    So far, so logical. But how is Ansible actually running &lt;code&gt;shell.py&lt;/code&gt;?
    "Simple": by default (without pipelining) it looks like this:
    &lt;/p&gt;

    &lt;ol&gt;&lt;li&gt;First it scans &lt;code&gt;shell.py&lt;/code&gt; for every module dependency,
    &lt;/li&gt;&lt;li&gt;then it adds the module and all its dependencies into an in-memory ZIP file,
        alongside a file containing the module's serialized arguments,
    &lt;/li&gt;&lt;li&gt;then it base64-encodes this ZIP file and mixes it into a templatized self-extracting Python script (&lt;code&gt;module_common.py&lt;/code&gt;),
    &lt;/li&gt;&lt;li&gt;then it writes the templatized script to the local filesystem, where it can be accessed by &lt;code&gt;sftp&lt;/code&gt;,
    &lt;/li&gt;&lt;li&gt;then it uploads the script to the target machine:
        &lt;ol&gt;&lt;li&gt;first it runs a &lt;a href="https://gist.github.com/dw/b3eaf0b664f1d0094816b8e4397fe589#file-step1-sh"&gt;fairly simple bash snippet over SSH to find the user's home directory&lt;/a&gt;,
        &lt;/li&gt;&lt;li&gt;then it runs a &lt;a href="https://gist.github.com/dw/b3eaf0b664f1d0094816b8e4397fe589#file-step2-sh"&gt;bigger bash snippet to create a temporary directory&lt;/a&gt; in the user's home directory in which to write the templatized script,
        &lt;/li&gt;&lt;li&gt;then it &lt;a href="https://gist.github.com/dw/b3eaf0b664f1d0094816b8e4397fe589#file-step3-sh"&gt;starts an sftp session&lt;/a&gt; and uses it to write the templatized script to the new temporary directory,
        &lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;then it &lt;a href="https://gist.github.com/dw/b3eaf0b664f1d0094816b8e4397fe589#file-step4-sh"&gt;runs another snippet over SSH to mark the script executable&lt;/a&gt;,
    &lt;/li&gt;&lt;li&gt;then it wraps a &lt;a href="https://gist.github.com/dw/b3eaf0b664f1d0094816b8e4397fe589#file-step5-sh"&gt;snippet to execute the templatized script using an obscene layer of quoting&lt;/a&gt; (16 quotes!!!) and passes it to sudo,
    &lt;/li&gt;&lt;li&gt;finally the templatized script runs:
        &lt;ol&gt;&lt;li&gt;first it creates yet another temporary directory on the target machine, this time using the &lt;code&gt;tempfile&lt;/code&gt; module,
        &lt;/li&gt;&lt;li&gt;then it writes a base64-decoded copy of the embedded ZIP file as &lt;code&gt;ansible_modlib.zip&lt;/code&gt; into that directory,
        &lt;/li&gt;&lt;li&gt;then it opens the newly written ZIP file using the &lt;code&gt;zipfile&lt;/code&gt; module and extracts the module to be executed into the same temporary directory, named like &lt;code&gt;ansible_mod_&amp;lt;modname&amp;gt;.py&lt;/code&gt;,
        &lt;/li&gt;&lt;li&gt;then it &lt;strong&gt;opens the newly written ZIP file in append mode&lt;/strong&gt; and writes a custom &lt;code&gt;sitecustomize.py&lt;/code&gt; module into it, causing the ZIP file to be written to disk for a second time on this machine, and a third time in total,
        &lt;/li&gt;&lt;li&gt;then it uses the &lt;code&gt;subprocess&lt;/code&gt; module to execute the extracted script, with &lt;code&gt;PYTHONPATH&lt;/code&gt; set to cause Python's ZIP importer to search for additional dependent modules inside the extracted-and-modified ZIP file,
        &lt;/li&gt;&lt;li&gt;then it uses the &lt;code&gt;shutil&lt;/code&gt; module to delete the second temporary directory,
        &lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;then the shell snippet that executed the templatized script is used to run &lt;code&gt;rm -rf&lt;/code&gt; over the first temporary directory.
    &lt;/li&gt;&lt;/ol&gt;&lt;p&gt;
    When pipelining is disabled, which is the default, and required for cases where
    &lt;code&gt;sudo&lt;/code&gt; has &lt;code&gt;requiretty&lt;/code&gt; enabled, these steps (and
    their associated network roundtrips) &lt;strong&gt;recur for every single playbook
    step&lt;/strong&gt;. And now you know why Ansible makes execution over a local 1Gbit
    LAN feel like it's communicating with a host on Mars.
    &lt;/p&gt;
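
    &lt;p&gt;To make the shape of the steps above concrete, here is a heavily simplified, hypothetical reconstruction of the self-extracting wrapper idea: modules are zipped in memory and base64-encoded into a script template, and the script writes the ZIP to a temporary directory, imports from it and deletes it afterwards. The template, file names and &lt;code&gt;build_wrapper&lt;/code&gt; are invented for illustration and are not Ansible's &lt;code&gt;module_common.py&lt;/code&gt;, which runs the extracted module in a subprocess where this sketch uses &lt;code&gt;runpy&lt;/code&gt; in-process.&lt;/p&gt;

```python
import base64
import io
import zipfile

# Hypothetical wrapper template mimicking the shape of the steps above:
# decode the payload, write the ZIP to disk, run from it, clean up.
WRAPPER = """\
import base64, os, runpy, shutil, sys, tempfile
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "modlib.zip")
try:
    with open(path, "wb") as fp:
        fp.write(base64.b64decode({payload!r}))
    # zipimport finds the module and its dependencies inside the ZIP
    sys.path.insert(0, path)
    runpy.run_module({module!r}, run_name="__main__")
finally:
    if path in sys.path:
        sys.path.remove(path)
    shutil.rmtree(tmpdir)
"""


def build_wrapper(module, files):
    """Zip `files` (name -> source) in memory and embed them in a script."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, source in files.items():
            zf.writestr(name, source)
    payload = base64.b64encode(buf.getvalue()).decode("ascii")
    return WRAPPER.format(payload=payload, module=module)
```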

    &lt;p&gt;
    Need a breath? Don't worry, things are about to get better. Here are some
    pretty graphs to look at while you're recovering.
    &lt;/p&gt;

    &lt;p&gt;&lt;strong&gt;The Ugly (from your network's perspective)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    This shows Ansible's pipelining mode, constantly reuploading the same huge
    data part and awaiting a response for each run. Be sure to note the sequence
    numbers (transmit byte count) and the scale of the time axis:
    &lt;/p&gt;

    &lt;p&gt;
    &lt;figure&gt;
        &lt;img src="/images/mito1/rev-ansible-stevens-1.svg" width="100%"&gt;
    &lt;/figure&gt;

    &lt;p&gt;
    Now for Mitogen, demonstrating vastly more conservative use of the network:
    &lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito1/rev-mito1-stevens-1.svg" width="100%"&gt;
    &lt;/figure&gt;

    &lt;p&gt;
    The SSH connection setup is clearly visible in this graph, accounting for
    about the first 300ms on the time axis. Additional excessive roundtrips are
    visible as Mitogen waits for its command-line to signal successful first stage
    bootstrap before uploading the main implementation, followed by 2 more roundtrips
    to fetch first the &lt;code&gt;mitogen.sudo&lt;/code&gt; module and then the
    &lt;code&gt;mitogen.master&lt;/code&gt; module. Eliminating module import roundtrips like
    these will probably be an ongoing battle, but there is a clean 80% solution
    that would apply in this specific case, which I just haven't gotten around to
    implementing yet.
    &lt;/p&gt;

    &lt;p&gt;
    The fine curve representing repeated executions of &lt;code&gt;/bin/true&lt;/code&gt; is
    also visible: each bump in the curve is equivalent to Ansible's huge data
    uploads from earlier, but since Mitogen caches code in RAM remotely, unlike
    Ansible it doesn't need to reupload everything for each call, or start a new
    Python process, or rewrite a ZIP file on disk, and so on.
    &lt;/p&gt;

    &lt;p&gt;
    Finally one last graph, showing Mitogen with the execution loop moved to the
    remote machine. All the latency induced by repeatedly invoking
    &lt;code&gt;/bin/true&lt;/code&gt; from the local machine has disappeared.
    &lt;/p&gt;

    &lt;figure&gt;
        &lt;img src="/images/mito1/rev-mito2-stevens-1.svg" width="100%"&gt;
    &lt;/figure&gt;

    &lt;p&gt;&lt;strong&gt;The Less Ugly&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    &lt;strong&gt;Ansible's pipelining mode is much better&lt;/strong&gt;, and somewhat
    resembles Mitogen's own bootstrap process. Here the templatized initial script
    is fed directly into the target Python interpreter, but the two immediately
    deviate, since Ansible starts by extracting the embedded ZIP file per step 8
    above, and &lt;strong&gt;discarding all the code it uploaded once the playbook step
    completes&lt;/strong&gt;, with no effort made to preserve either the Python processes
    spawned, or the significant amount of uploaded module code for each step.
    &lt;/p&gt;

    &lt;p&gt;
    Pipelining mode is a huge improvement, but it still uses the SSH stdio pipeline
    only once (despite it being expensive to set up, even with multiplexing
    enabled), uses the destination Python interpreter only once (usually 100ms or
    more per invocation), and, as mentioned repeatedly, performs no caching of code
    on the target, not even on disk.
    &lt;/p&gt;

    &lt;p&gt;
    When Mitogen is executing your Python function:
    &lt;/p&gt;&lt;ol&gt;&lt;li&gt;it executes SSH with a single Python command-line,
    &lt;/li&gt;&lt;li&gt;then it waits for that command-line to report &lt;code&gt;"EC0"&lt;/code&gt; on stdout,
    &lt;/li&gt;&lt;li&gt;then it writes a copy of itself over the SSH pipe,
        &lt;ol&gt;&lt;li&gt;meanwhile the remote Python interpreter forks into two processes,
        &lt;/li&gt;&lt;li&gt;the first re-execs itself to clear the huge Python command-line passed
        over SSH, and resets &lt;code&gt;argv[0]&lt;/code&gt; to something descriptive,
        &lt;/li&gt;&lt;li&gt;the second signals &lt;code&gt;"EC0"&lt;/code&gt; and waits for the parent context
        to send 7KiB worth of Mitogen source, which it decompresses and feeds to
        the first before exiting,
        &lt;/li&gt;&lt;li&gt;the Mitogen source reconfigures the Python module importer, stdio, and
        logging framework to point back into itself, then starts a private
        multiplexer thread,
        &lt;/li&gt;&lt;li&gt;the main thread writes &lt;code&gt;"EC1"&lt;/code&gt; then sleeps waiting for &lt;code&gt;CALL_FUNCTION&lt;/code&gt; messages,
        &lt;/li&gt;&lt;li&gt;meanwhile the multiplexer routes messages between this context's main
        thread, the parent, and any child contexts, and waits for something to
        trigger shutdown.
        &lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;then it waits for the remote process to report &lt;code&gt;"EC1"&lt;/code&gt;,
    &lt;/li&gt;&lt;li&gt;then it writes a &lt;code&gt;CALL_FUNCTION&lt;/code&gt; message which includes the
        target module, class, and function name and parameters,
        &lt;ol&gt;&lt;li&gt;the slave receives the &lt;code&gt;CALL_FUNCTION&lt;/code&gt; message and begins
        execution, satisfying in-RAM module imports using the connection to the
        parent context as necessary.
        &lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;/ol&gt;

    &lt;p&gt;
    On subsequent invocations of your Python function, or other functions from the
    same module, only steps 3.6, 5, and 5.1 are necessary.
    &lt;/p&gt;
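    &lt;p&gt;
    The bootstrap sequence above can be sketched in miniature with Python's standard
    library. This is a simplified, hypothetical illustration rather than Mitogen's
    code: a local subprocess stands in for SSH, the first stage neither forks nor
    re-execs, there is no multiplexer thread, and calls are plain text lines rather
    than &lt;code&gt;CALL_FUNCTION&lt;/code&gt; messages. &lt;code&gt;STAGE1&lt;/code&gt;,
    &lt;code&gt;CORE&lt;/code&gt;, &lt;code&gt;bootstrap()&lt;/code&gt; and &lt;code&gt;call()&lt;/code&gt;
    are invented names; only the &lt;code&gt;"EC0"&lt;/code&gt;/&lt;code&gt;"EC1"&lt;/code&gt;
    handshake and the compressed upload mirror the steps described.
    &lt;/p&gt;

```python
import subprocess
import sys
import zlib

# Hypothetical first stage, passed as a single command line (step 1).
# Unlike Mitogen, it neither forks nor re-execs; it just reads and runs the core.
STAGE1 = (
    "import sys, zlib\n"
    "sys.stdout.write('EC0\\n')\n"
    "sys.stdout.flush()\n"
    "size = int(sys.stdin.buffer.readline().decode())\n"
    "source = zlib.decompress(sys.stdin.buffer.read(size)).decode()\n"
    "exec(compile(source, 'core', 'exec'))\n"
)

# Hypothetical main implementation: report EC1, then serve one call per input line.
CORE = '''
import sys
sys.stdout.write("EC1\\n")
sys.stdout.flush()
for line in sys.stdin:
    func, arg = line.split()
    result = arg.upper() if func == "upper" else "unknown:" + func
    sys.stdout.write(result + "\\n")
    sys.stdout.flush()
'''


def bootstrap():
    """Start a child interpreter and upload CORE using an EC0/EC1 handshake."""
    proc = subprocess.Popen([sys.executable, "-c", STAGE1],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Step 2: wait for the first stage to report EC0.
    if proc.stdout.readline() != b"EC0\n":
        raise RuntimeError("first stage failed")
    # Step 3: send the compressed core, prefixed by its length.
    blob = zlib.compress(CORE.encode())
    proc.stdin.write(b"%d\n" % len(blob) + blob)
    proc.stdin.flush()
    # Step 4: wait for the core to report EC1.
    if proc.stdout.readline() != b"EC1\n":
        raise RuntimeError("core failed to start")
    return proc


def call(proc, func, arg):
    """Stand-in for step 5: one request line out, one reply line back."""
    proc.stdin.write(("%s %s\n" % (func, arg)).encode())
    proc.stdin.flush()
    return proc.stdout.readline().decode().strip()


if __name__ == "__main__":
    child = bootstrap()
    # Repeated calls reuse the same process; nothing is uploaded again.
    print(call(child, "upper", "hello"))
    print(call(child, "upper", "again"))
    child.stdin.close()  # EOF ends the child's loop, so it exits cleanly
    child.wait()
```

    &lt;p&gt;
    Once &lt;code&gt;bootstrap()&lt;/code&gt; returns, each &lt;code&gt;call()&lt;/code&gt; is a single
    write and read on the existing pipe, which is the property that lets subsequent
    invocations skip the bootstrap steps entirely.
    &lt;/p&gt;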

    &lt;p&gt;&lt;strong&gt;This all sounds fine and dandy, but how can I use it?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    I'm working on it! For now my goal is to implement enough functionality so that
    &lt;strong&gt;Mitogen can be made to work with Ansible's process model&lt;/strong&gt;. The
    first problem is that Ansible runs playbooks using multiple local processes,
    and has no subprocess&amp;lt;-&amp;gt;host affinity, so it is not immediately possible
    to cache Mitogen's state for a host. I have a solid plan for solving that, but
    it's not yet implemented.
    &lt;/p&gt;

    &lt;p&gt;
    There is a huge variety of things I haven't started on yet that will eventually
    be needed for more complex setups:
    &lt;/p&gt;&lt;ul&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Getting Started Documentation&lt;/strong&gt;:
        &lt;a href="https://mitogen.networkgenomics.com/getting_started.html"&gt;it's missing&lt;/a&gt;.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Asynchronous connect()&lt;/strong&gt;: so large numbers of contexts can
        be spawned in reasonable time. For example, 3 tiers could connect to a
        1,500-node network in 30 seconds or so: a per-rack tier connecting to 38-42
        end nodes, a per-quadrant tier connecting to 10 or so racks, and a single box
        in the datacentre tier for access to a management LAN, reducing latency and
        caching uploaded modules within the datacentre's network, all beneath the
        top-level tier, which is the master program itself.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Better Bootstrap, Module Caching And Prefetching&lt;/strong&gt;:
        currently Mitogen is wasting network roundtrips in various places. This
        makes me lose sleep.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;General Robustness&lt;/strong&gt;: no doubt with real-world use, many
        edge cases, crashes, hangs, races and suchlike will be discovered. Of
        those, I'm most concerned with ensuring the master process never hangs
        on CTRL+C or &lt;code&gt;SIGTERM&lt;/code&gt;, and that if the master disconnects,
        orphaned contexts completely shut down 100% of the time, even if their
        main thread has hung.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Better Connection Types&lt;/strong&gt;: it should at least support SSH
        connection setup over a transparently forwarded TCP connection (e.g. via a
        bastion host), so that key material never leaves the master machine.
        Additionally, I haven't even started on Windows support yet.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Security Audit&lt;/strong&gt;: currently the package is using cPickle
        with a highly restrictive class whitelist. I still think it should be
        possible to use this safely, but I'm not yet satisfied this is true. I'd
        also like it to optionally use JSON if the target Python version is modern
        enough. Additionally some design tweaks are needed to ensure a compromised
        slave cannot use Mitogen to cross-infect neighbouring nodes.
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Richer Primitives&lt;/strong&gt;: I've spent so much effort keeping the
        core of Mitogen compact that overall design has suffered, and while almost
        anything is possible using the base code, often it involves scrobbling
        around in the internal plumbing to get things working. Specifically I'd
        like to make it possible to pass &lt;code&gt;Context&lt;/code&gt; handles as RPC
        parameters, and generalise the &lt;code&gt;fakessh&lt;/code&gt; code so that it can
        handle other kinds of forwarding (e.g. TCP connections, additional UNIX
        pipe scenarios).
    &lt;/p&gt;&lt;/li&gt;&lt;li&gt;
        &lt;p&gt;
        &lt;strong&gt;Tests&lt;/strong&gt;. The big one: I've only started to think about tests
        recently as the design has settled, but so much system-level trickery is
        employed, always spread out across at least 2 processes, that an effective
        test strategy is so far elusive. Logical tests don't capture any of the
        complex OS/IO ordering behaviour, and while typical integration tests would
        capture that, they are too coarse to rely on for catching new bugs quickly
        and with strong specificity.
    &lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Why are you writing about this now?&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;
    If you read this far, there's a good chance you either work in infrastructure
    tooling, or were so badly burned by your experience there that you moved into
    management. Either way, you might be the person who could help me spend more
    time on this project. Perhaps you are on a 10-person team with a budget, where
    30% of the man-hours are being wasted on Ansible's connection latency? If so,
    you should definitely &lt;a href="https://sweetness.hmmz.org/pages/contact/"&gt;drop me an e-mail&lt;/a&gt;.
    &lt;/p&gt;

    &lt;p&gt;
    The problem with projects like this is that they are almost impossible to
    justify commercially: they are much closer to research than product, and nobody
    ever wants to pay for that. However, that phase is over: the base implementation
    looks clean and feels increasingly solid, my development tasks are becoming
    increasingly target-driven, and I'd love the privilege of polishing what I
    have, to make contemporary devops tooling a significantly less depressing
    experience for everyone involved.
    &lt;/p&gt;

    &lt;p&gt;
    If you merely made it to the bottom of the article because you're interested or
    have related ideas, please drop me an e-mail. It's not quite ready for prime
    time, but things work well enough that early experimentation is probably
    welcome at this point.
    &lt;/p&gt;

    &lt;p&gt;
    Meanwhile I will continue aiming to make it suitable for use with Ansible, or
    perhaps a gentle fork of Ansible, since its internal layering isn't the
    greatest.
     &lt;/p&gt;
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">dw</dc:creator><pubDate>Fri, 15 Sep 2017 15:36:52 +0100</pubDate><guid isPermaLink="false">tag:sweetness.hmmz.org,2017-09-15:/2017-09-15-mitogen-an-infrastructure-code-baseline-that.html</guid></item></channel></rss>