The Book of Gehn

Deadly Typos 2020

April 24, 2021

A quick summary of the top 3 typos that I found (or I wrote) in code that they were small but they had a deep impact on the functionality.

In fact, the three bugs could summarized as follows:

Do you want to see what is this about?

Deep sleep

void may_sleep(uint64_t elapsed_inactivity) {
    if (elapsed_inactivity < MIN_INACTIVITY_FOR_SHORT_SLEEP)
        return;
    else if (elapsed_inactivity < MIN_INACTIVITY_FOR_LONG_SLEEP)
        usleep(SLEEP_DUE_INACTIVITY_FOR_SHORT_TIME_USECS);
    else
        sleep(SLEEP_DUE_INACTIVITY_FOR_LONG_TIME_USECS);
}

Issue? A missing u.

Impact? The code calls sleep instead of usleep and sleeps for, let says, 10000 seconds instead of 10000 microseconds.

The may_sleep was intended to put to sleep the CPU for a small period if there was not activity for some time.

If a task is received while the CPU is sleeping, the worst case would be a delay of 10000 microseconds (10 milliseconds) before the task begins to be processed.

With 10000 seconds, well, that’s almost 3 hours.

Only one

struct row_t *rows = matrix->rows;
for (int i = 0; i < matrix->row_count; ++i) {
    struct row_t *row = rows[i];
    for (int j = 0; j < row->column_count; ++j) {
        struct column_t *column = rows->columns[j];
        process(column);
    }
}

Issue? An extra s.

Impact? The nested loops process every item of a 2d matrix, scanning in the for each row, for each column classic order.

For simplicity the i-th row is loaded in the outer loop with a local row pointer.

Unfortunately the inner loop uses rows instead of row so the inner loop always processes the same columns of the first row. The local variable row is used only for knowing the column count hence the compiler does not warn use about an unused variable.

This lead to resources without proper initialization and memory corruptions all because the was an extra s.

Less is worse

void one_round(int *ports, uint32_t *start, struct msg_t **msg) {
    *msg = NULL; // not found

    uint32_t ix = *start;
    for (uint32_t i = 0; i < PORT_CNT; ++i) {
       read(ports[(ix + i) % PORT_CNT], msg);
       if (msg)
          break;
    }

    *start = ix+1;
}

void scan(int *ports) {
    *msg = NULL; // not found

    uint32_t start = 0;

    while (!should_exit()) {
        one_round(ports, &start, msg);
        if (*msg) {
            process(msg);
        } else {
            sleep(1); // no msg, sleep
        }
    }
}

Issue? A missing *

The one_round function read from N ports doing a round robin starting from *start. Each round may be interrupted if a message is found and a call to process is made.

If a full round is made without reading a message, scan takes a little nap assuming that it is not under a heavy workload.

But one_round is wrong. It checks for if (msg) instead of if (*msg). The former is always true returning earlier with a message probably empty.

scan checks for if (*msg) and if it is NULL it will believe that the subfunction did a full round and it didn’t find a message.

Under a heavy workload on all the ports, this typo was unnoticed because *msg was always not-NULL.

But when a single port was with less data, even if the rest of the ports were super-busy, scan did a sleep(1) impacting negatively in the performance. Less work made the things worse!

Related tags: pointer, memory

Deadly Typos 2020 - April 24, 2021 - Martin Di Paola