
The Book of Gehn

I found 8 posts.


Diff Between Data Frames for Testing

Tags: python, dataframe, pandas

October 1, 2022

Let’s say that we want to compare two pandas dataframes for unit testing.

One is the expected dataframe, crafted by us and it will be the source of truth for the test.

The other is the obtained dataframe that is the result of the experiment that we want to check.

Doing a naive comparison will not work: first, we may want to tolerate some minor differences due to computation imprecision; and second, and most important, we don’t want to know just whether the dataframes are different or not.

We want to know where the differences are.

Knowing exactly what is different makes debugging much easier: trying to figure out by hand which column in which row is different is not fun.
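To make the idea concrete, here is a minimal sketch of a cell-by-cell diff with a numeric tolerance. The helper name `diff_frames` and its parameters are made up for illustration and are not necessarily what the post ends up with. It assumes both dataframes share the same index and columns. Note that pandas also ships `pandas.testing.assert_frame_equal` and `DataFrame.compare`, which may be enough for simpler cases.

```python
import numpy as np
import pandas as pd


def diff_frames(expected, obtained, rtol=1e-5, atol=1e-8):
    """Return a list of (row, column, expected, obtained) for differing cells.

    Hypothetical helper: assumes both frames have identical labels.
    """
    if not (expected.index.equals(obtained.index)
            and expected.columns.equals(obtained.columns)):
        raise ValueError("frames must have the same index and columns")

    diffs = []
    for col in expected.columns:
        exp, obt = expected[col], obtained[col]
        if (pd.api.types.is_numeric_dtype(exp)
                and pd.api.types.is_numeric_dtype(obt)):
            # Numeric columns: tolerate small floating point noise
            same = np.isclose(exp.to_numpy(dtype=float),
                              obt.to_numpy(dtype=float),
                              rtol=rtol, atol=atol, equal_nan=True)
        else:
            # Anything else: exact comparison
            same = np.asarray(exp == obt, dtype=bool)

        for row in expected.index[~same]:
            diffs.append((row, col, exp.loc[row], obt.loc[row]))
    return diffs


expected = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]})
obtained = pd.DataFrame({"a": [1.0, 2.0000001, 3.5], "b": ["x", "q", "z"]})

diffs = diff_frames(expected, obtained)
for row, col, exp, obt in diffs:
    print(f"row={row} column={col}: expected {exp!r}, obtained {obt!r}")
```

The tiny difference in the second row of column `a` is tolerated, while the other two mismatches are reported with their exact row and column.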


Lessons Learnt Optimizing Pyte

Tags: python, pyte, terminal, optimization, performance

July 17, 2022

A few thoughts about Python code optimization and benchmarking for pyte, summarized here.


Sparse Aware Optimizations for Terminal Emulator Pyte

Tags: python, pyte, byexample, optimization, performance

July 15, 2022

byexample is a tool that reads snippets of code from your documentation, executes them and compares the obtained results with the expected ones, from your docs too.

If a mismatch happens, we say that the example in your documentation failed, which could mean one of two things:

  • your code (the snippet) does not do what you expect, so it has a bug
  • or the code does exactly what it is supposed to do but you forgot to update your doc.

Very useful for testing and keeping your docs in sync!

But byexample does not really execute anything by itself. Having to code an interpreter for Ruby, Java, C++ and others would be insane.

Instead, byexample sends the snippets of code to a standard interpreter like IRB for Ruby or cling for C++.

Interpreting the output from them is not always trivial.

When an interpreter prints to the terminal, it may write special escape/control sequences, invisible to human eyes, but interpreted by the terminal.

That’s how IRB can tell your terminal to output something in red and blue colors.

That’s how byexample’s +term=ansi is implemented.

byexample has no idea of what the hell those control sequences are and relies on a terminal emulator: pyte.

byexample sends the snippets to the correct interpreter and its output feeds pyte.Screen. It is the plain text from the emulated terminal that byexample compares with the expected output from the example.

But pyte may take seconds to process a single output so byexample never enabled it by default.

This post describes the why and how of the optimizations contributed to pyte to go from seconds to microseconds.
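To see why the raw output cannot be compared directly, here is a small illustration using only the standard library. It is a deliberately crude stand-in, not how pyte or byexample work: the regex removes only color (SGR) sequences, while a real terminal emulator also has to handle cursor movement, erasing, scrolling and much more. The `raw` string is a made-up example of colored output.

```python
import re

# Matches only SGR ("Select Graphic Rendition") sequences such as ESC[31m.
# This is an assumption-laden shortcut: many other control sequences exist.
SGR = re.compile(r"\x1b\[[0-9;]*m")

# What a colorizing interpreter might write: "hello" in red, "world" in blue
raw = "\x1b[31mhello\x1b[0m \x1b[34mworld\x1b[0m"
expected = "hello world"

# The escape sequences are invisible on a terminal but they are in the string
naive_match = (raw == expected)

# After removing the color sequences only the visible text is left
plain = SGR.sub("", raw)

print(repr(raw))
print(naive_match, plain == expected)
```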


TL;DR Screen Optimizations Results for Terminal Emulator Pyte

Tags: python, pyte, byexample, optimization, performance, tldr, tl;dr

July 14, 2022

This post describes in some detail all the performance boosts and speedups due to the optimizations contributed to pyte and summarized here.

For large geometries (240x800, 2400x8000), Screen.display runs orders of magnitude faster and consumes between 1.10 and 50.0 times less memory.

For smaller geometries the minimum improvement was a 2 times speedup.

Stream.feed is now between 1.10 and 7.30 times faster and if Screen is tuned, the speedup is between 1.14 and 12.0.

For memory usage, Stream.feed is between 1.10 and 17.0 times lighter and up to 44.0 times lighter if Screen is tuned.

Screen.reset is between 1.10 and 1.50 times slower, but several cases (not all) improve if the Screen is tuned.

However there are a few regressions, most of them small but some up to 4 times.

At the moment of writing this post, the PR is still pending review.


Multiprocessing Spawn of Dynamically Imported Code

Tags: python

March 6, 2022

The following snippet loads any Python module in the ./plugins/ folder.

This is the Python 3.x way to load code dynamically.

>>> import importlib.util
>>> import pkgutil
>>> import sys

>>> def load_modules():
...     dirnames = ["plugins/"]
...     loaded = []
...
...     # For each plugin folder, see which Python files are there
...     # and load them
...     for importer, name, is_pkg in pkgutil.iter_modules(dirnames):
...
...         # Find and load the Python module
...         spec = importer.find_spec(name)
...         module = importlib.util.module_from_spec(spec)
...         spec.loader.exec_module(module)
...         loaded.append(module)
...
...         # Add the loaded module to sys.modules so it can be
...         # found by pickle
...         sys.modules[name] = module
...
...     return loaded

The loaded modules work as any other Python module. In a plugin system you typically look up a specific function or class that serves as the entry point or hook for the plugin.

For example, in byexample the plugins must define classes that inherit from ExampleFinder, ExampleParser, ExampleRunner or Concern. These extend byexample functionality to find, parse and run examples in different languages and hook –via Concern– most of the execution logic.

Imagine now that one of the plugins implements a function exec_bg that needs to be executed in the background, in a separate Python process.

We could do something like:

>>> loaded = load_modules() # loading the plugins
>>> mod = loaded[0] # pick the first, this is just an example

>>> target = getattr(mod, 'exec_bg')  # lookup plugin's exec_bg function

>>> import multiprocessing
>>> proc = multiprocessing.Process(target=target)
>>> proc.start()    # run exec_bg in a separate process
>>> proc.join()

This is a plain, simple use of multiprocessing… and it will not work.

Well, it will work on Linux but not on macOS or Windows.

In this post I will show why it will not work for dynamically loaded code (like from a plugin) and how to fix it.
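As a hint of where the trouble comes from, the sketch below shows how pickle treats a function: it is stored by reference (module name plus function name), so whoever pickles or unpickles it must be able to find the module by name. With the spawn start method, the Process object and its target are pickled and sent to a fresh interpreter. The module name `fake_plugin` is made up and is built in memory to stand in for a dynamically loaded plugin. This is only an illustration under those assumptions, not the full explanation from the post.

```python
import pickle
import sys
import types

# Build a module on the fly, standing in for a dynamically loaded plugin.
# "fake_plugin" is a hypothetical name; it does not exist on disk.
mod = types.ModuleType("fake_plugin")
exec("def exec_bg():\n    return 'done'\n", mod.__dict__)

# The module is not in sys.modules and cannot be imported by name,
# so pickle cannot resolve "fake_plugin.exec_bg".
try:
    pickle.dumps(mod.exec_bg)
    pickled_before = True
except pickle.PicklingError:
    pickled_before = False

# Once registered, pickling works: the payload only carries the names.
sys.modules["fake_plugin"] = mod
payload = pickle.dumps(mod.exec_bg)

# Unpickling resolves the names again and gives back the same function.
restored = pickle.loads(payload)

print(pickled_before, b"fake_plugin" in payload, restored())
```

In this process the lookup succeeds because `sys.modules` was updated here. A newly spawned process starts from scratch, so it would have to resolve the same module name on its own.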


Fresh Python Defaults

Tags: Python

August 14, 2021

When defining a Python function you can define the default value of its parameters.

The defaults are evaluated once and bound to the function’s signature.

That means that mutable defaults are a bad idea: if you modify them in a call, the modification will persist across calls because for Python it is the same object.

>>> def foo(a, b='x', c=[]):
...     b += '!'
...     c += [2]
...     print(f"a={a} b={b} c={c}")

>>> foo(1)  # uses the default list
a=1 b=x! c=[2]

>>> foo(1)  # uses the *same* default list
a=1 b=x! c=[2, 2]

>>> foo(1, 'z', [3]) # uses another list
a=1 b=z! c=[3, 2]

>>> foo(1)  # uses the *same* default list, again
a=1 b=x! c=[2, 2, 2]

A mutable default can be used as the function’s private state, as an alternative to traditional functional closures and object-oriented classes.

But in general a mutable default is most likely to be a bug.
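For reference, the usual workaround is to use None as a sentinel and create the mutable object inside the function body. The sketch below contrasts both versions and peeks at `__defaults__`, where Python keeps the default values evaluated at definition time. The function names are made up for this illustration.

```python
def append_shared(item, bucket=[]):
    # The same list object is reused on every call that omits `bucket`
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is immutable; a new list is created on each call that needs one
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


append_shared(1)
append_shared(2)

# The default lives in the function object and it was modified by the calls
shared_default = append_shared.__defaults__[0]

# Each call gets its own list
fresh_results = [append_fresh(1), append_fresh(2)]

print(shared_default, fresh_results)
```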

Could Python have a way to prevent such a thing? Or better, could Python have a way to restart or refresh the mutable defaults in each call?

This question came up on the python-list. Let’s see how far we get.


Home Made Python F-String

Tags: Python

July 11, 2021

Python 3.6 introduced the so-called f-strings: string literals that support formatting from the variables in the local context.

Before 3.6 you would have to do something like this:

>>> x = 11
>>> y = 22
>>> "x={x} y={y}".format(x=x, y=y)
'x=11 y=22'

But with the f-strings we can remove the bureaucratic call to format:

>>> f"x={x} y={y}"
'x=11 y=22'

A few days ago Yurichev posted: could we achieve a similar feature but without using the f-strings?

Challenge accepted.
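One possible starting point, not necessarily the approach taken in the post, is to peek at the caller’s variables and hand them to str.format. The helper name `fmt` is made up, and the sketch relies on `sys._getframe`, which is a CPython implementation detail. Unlike real f-strings, it only resolves names (plus the attribute and index lookups that str.format allows), not arbitrary expressions.

```python
import sys


def fmt(template):
    """Format `template` with the variables visible to the caller.

    Hypothetical helper: a rough imitation of f-strings, CPython only.
    """
    frame = sys._getframe(1)  # the caller's frame

    # Locals shadow globals, as they would in a real f-string
    names = {**frame.f_globals, **frame.f_locals}
    return template.format(**names)


def demo():
    x = 11
    y = 22
    return fmt("x={x} y={y}")


result = demo()
print(result)
```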


Cipherchat (Crypto writeup - EKO 2019)

Tags: challenge, eko, hacking, python, bytecode

October 1, 2019

We start with a communication between two machines, encrypted with an unknown algorithm and the challenge is to break it.

As a hint we have the code that the client used to talk with the server.

- Martin Di Paola