Experience report: Linux pwm-ir-tx accuracy/reliability

Published 12 Mar 2023.

At Stb-tester we send infrared remote-control signals to test TVs and set-top boxes, and we analyse the video output from the device-under-test to verify that it is behaving correctly. For this automated testing to be viable at scale, it’s crucial that we can rely on our remote-control emulation to be 100% accurate.

For many years we have used LIRC’s ftdix user-space driver, which offloads the entire signal to an external USB device. Recently we started experimenting with the in-kernel pwm-ir-tx driver. With this driver the PWM hardware generates the carrier signal by switching the infrared output on & off at the desired frequency (say 38kHz), but the CPU needs to be involved to start & stop this carrier signal at the beginning & end of each “pulse”. These pulses, and the spaces between them, can be as short as 200µs. If the duration of these pulses and spaces isn’t accurate enough, the receiver might not recognize the signal.
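To get a feel for how tight these timings are, here is a small arithmetic sketch. The helper names are hypothetical, and the 50µs error is only an illustrative figure, not a measurement from our test-bench.

```python
# Hypothetical helpers (not part of LIRC or the kernel) that relate a
# pulse duration to the carrier frequency and to a given timing error.

def carrier_cycles(duration_us: float, carrier_hz: float = 38_000) -> float:
    """Number of carrier periods that fit in a pulse of `duration_us`."""
    return duration_us * carrier_hz / 1_000_000


def relative_error(error_us: float, duration_us: float) -> float:
    """Timing error expressed as a fraction of the pulse duration."""
    return error_us / duration_us


if __name__ == "__main__":
    # A 200 µs pulse at 38 kHz contains only a handful of carrier cycles.
    print(f"{carrier_cycles(200):.1f} carrier cycles in a 200 µs pulse")
    # An illustrative 50 µs error is a quarter of such a pulse.
    print(f"{relative_error(50, 200):.0%} of a 200 µs pulse")
```

With so few carrier cycles per pulse, an error of a few tens of microseconds is a significant fraction of the shortest pulses and spaces.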

We have a pretty decent test-bench for testing the accuracy/reliability of sent IR signals, where we can send say 500,000 IR signals and we can verify that each & every one of them had the desired effect on the receiver (e.g. a Roku set-top-box). For more details of the test-bench see https://stb-tester.com/blog/2016/05/26/ir-post-mortem (you could set this up cheaply with stb-tester and a v4l2 HDMI capture device).

On its own, pwm-ir-tx works well enough. But when the system is under load, the accuracy drops drastically. Our system needs to run video-capture processes, a WebRTC client, etc., which I guess cause loads of interrupts and general competition for CPU time. Under normal usage our load average is around 4.5 (it’s a quad-core CPU) and we see roughly 16,000 to 19,000 context switches per second.

We tried many things to make the IR signals reliable with pwm-ir-tx under these conditions. None of the following experiments worked: each left at least 2% of IR signals failing (often much worse). “Failing” means that the device receiving these signals didn’t recognize them:

Using the “performance” CPU governor to disable CPU throttling.
Setting the lircd process to SCHED_FIFO.
Pinning lircd to a specific core and ensuring that no other SCHED_FIFO processes run on that CPU.

In the end, the only thing that worked was all of these changes together:

Change the pwm-ir-tx kernel module to use udelay (busy-wait) instead of relinquishing the CPU.
Set the lircd process to SCHED_FIFO, and pin it to a specific CPU core where we don’t have any higher-priority SCHED_FIFO processes (so it won’t be preempted).
Use the “schedutil” CPU governor (I suppose “performance” would work too, but “interactive” did not). schedutil disables CPU frequency-scaling while a SCHED_FIFO process is running.
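The scheduling and pinning steps above can be sketched with Python's standard `os` module. This is a minimal illustration and not the way we actually configure lircd: the function names, the priority of 50 and the choice of CPU are assumptions made for the example. The sysfs path is the standard cpufreq location, but not every system exposes it.

```python
import os
from pathlib import Path
from typing import Optional


def make_realtime(pid: int, cpu: int, priority: int = 50) -> None:
    """Pin `pid` to a single CPU and switch it to SCHED_FIFO.

    Needs root (or CAP_SYS_NICE). The priority value is an assumption.
    """
    os.sched_setaffinity(pid, {cpu})
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


def governor_path(cpu: int) -> Path:
    """sysfs file that holds the cpufreq governor for one CPU."""
    return Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor")


def current_governor(cpu: int) -> Optional[str]:
    """Return the active governor name, or None if cpufreq is not exposed."""
    path = governor_path(cpu)
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    print("governor on cpu0:", current_governor(0))
    # Example only; uncomment and run as root with the real lircd PID:
    # make_realtime(pid=1234, cpu=3)
```

From a shell, `chrt -f -p 50 <pid>` and `taskset -cp 3 <pid>` should have the same effect as `make_realtime`.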

Under that configuration, we successfully sent 500,000 IR signals without any failures (aka “missed keypresses” on the Roku) but it sucks to dedicate an entire CPU to such a trivial task.

I found this Stack Exchange answer helpful (bold emphasis mine):

The smallest time you can wait for with nanosleep() is driven by several factors:

timer slack

context-switch time

timer accuracy and precision

With a default timer slack value of 50 µs, even when calling nanosleep with 0 ns or 1 ns duration you effectively wait for 50 µs or so, by default, if your process runs under normal (OTHER) scheduling.

You can disable the timer slack mechanism via a prctl() call or by running your process under a realtime scheduling class.

However, a nanosleep call then still yields a voluntary context switch, which takes 3 µs or so, depending on your hardware.

We use SCHED_FIFO so timer slack should be disabled; that leaves context-switch time and timer accuracy.
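A rough way to see these effects from user space is to request a short sleep many times and record how late each wake-up is. The sketch below is only an approximation of what happens inside the kernel driver: it includes Python interpreter overhead, and the function name and sample count are assumptions made for illustration.

```python
import statistics
import time
from typing import List


def sleep_overshoot_us(requested_us: float, samples: int = 200) -> List[float]:
    """Sleep `requested_us` repeatedly; return (actual - requested) in µs."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(requested_us / 1_000_000)
        elapsed_us = (time.perf_counter_ns() - start) / 1_000
        overshoots.append(elapsed_us - requested_us)
    return overshoots


if __name__ == "__main__":
    # 200 µs is the shortest pulse/space duration mentioned above.
    results = sleep_overshoot_us(200)
    print(f"median overshoot: {statistics.median(results):.1f} µs")
    print(f"max overshoot:    {max(results):.1f} µs")
```

Running it once under normal scheduling and once under SCHED_FIFO, with and without system load, gives a quick impression of how much each factor contributes on a given machine.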

In a private email exchange, Sean Young (Linux kernel infrared maintainer) said:

pwm-ir-tx is not great, I agree. gpio-ir-tx should be very accurate. However, that holds a cpu core for the duration of the send, which is not good. It does not play nicely with other parts of the kernel (e.g. the realtime patches).

With pwm-ir-tx the hope was that it would replace gpio-ir-tx without holding a cpu core. In reality it does not work well enough, scheduling is just not predictable enough for this. We could just hold the cpu while sending with pwm-ir-tx, but then it’s just not worth it to use pwm, might as well use gpio-ir-tx, the cpu is busy spinning anyway.

Update 12 May 2023:

I had written:

Possible future work:

Modify pwm_ir_tx to printk the timings it observed after waking — it’s already calling ktime_get anyway.

Once we have measured the largest jitter we get on our hardware, we can go back to using usleep but waking a bit earlier so that we can then busy-wait until exactly the right time.
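The "wake early, then busy-wait" idea can be sketched in user space as follows. This is not the kernel code, only an illustration of the technique. The function name and the margin value are assumptions, and the approach only helps when the margin is larger than the worst-case wake-up latency.

```python
import time


def sleep_then_spin(deadline_ns: int, margin_ns: int) -> int:
    """Sleep until `margin_ns` before the deadline, then busy-wait.

    `deadline_ns` is a time.perf_counter_ns() timestamp.
    Returns how late (in ns) we were when the spin loop exited.
    """
    remaining = deadline_ns - margin_ns - time.perf_counter_ns()
    if remaining > 0:
        # Coarse wait: gives up the CPU, so the wake-up time is imprecise.
        time.sleep(remaining / 1e9)
    # Fine wait: burns CPU, but exits as soon as the deadline passes.
    while time.perf_counter_ns() < deadline_ns:
        pass
    return time.perf_counter_ns() - deadline_ns


if __name__ == "__main__":
    deadline = time.perf_counter_ns() + 2_000_000  # 2 ms from now
    late = sleep_then_spin(deadline, margin_ns=500_000)
    print(f"exited {late} ns after the deadline")
```

The smaller the margin, the less CPU time is spent spinning, but the more likely it is that a late wake-up misses the deadline altogether.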

I added this debug logging (see my commit 30ddd7002ff3). Unfortunately the jitter is as large as a typical pulse/space duration, so this approach wouldn’t help.

total signals:   1007
perfect signals: 0 (0.00%)
bad signals:     1007 (100.00%)
bad edges:       35245 (97.22%)
  max:    1477 µs
  min:    1 µs
  mean:   27.28 µs
  median: 23.00 µs
  stddev: 26.22 µs

total pwm_enable/disable operations: 36252
max pwm_enable/disable:    145314 ns
min pwm_enable/disable:    833 ns
mean pwm_enable/disable:   2051 ns
median pwm_enable/disable: 1927 ns
stddev pwm_enable/disable: 1415 ns

With my patches to use udelay and disable interrupts, the debug output shows a much better picture (but then we might as well use gpio-ir-tx, as pointed out earlier):

total signals:   1005
perfect signals: 1001 (99.60%)
bad signals:     4 (0.40%)
bad edges:       4 (0.01%)
  max:    2 µs
  min:    1 µs
  mean:   1.25 µs
  median: 1.00 µs
  stddev: 0.43 µs

total pwm_enable/disable operations: 36180
max pwm_enable/disable:    14948 ns
min pwm_enable/disable:    677 ns
mean pwm_enable/disable:   992 ns
median pwm_enable/disable: 885 ns
stddev pwm_enable/disable: 404 ns

There might still be some jitter caused by how long pwm_enable / pwm_disable takes — as much as 15µs. But to measure the effect on the actual signal we’ll have to analyse the output with an oscilloscope.
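As a sanity check on the two debug summaries above, the percentages can be reproduced from the raw counts. The "bad edges" percentage appears to be relative to the total number of pwm_enable/disable operations. That is an inference from the numbers, not something the logs state. The run labels below are mine.

```python
def percent(part: int, whole: int) -> float:
    """`part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Counts copied from the two debug summaries; the dictionary keys are
# labels chosen for this example.
runs = {
    "sleep-based": {"signals": 1007, "bad_edges": 35245, "pwm_ops": 36252},
    "udelay": {"signals": 1005, "bad_edges": 4, "pwm_ops": 36180},
}

if __name__ == "__main__":
    for name, run in runs.items():
        edges_per_signal = run["pwm_ops"] / run["signals"]
        bad = percent(run["bad_edges"], run["pwm_ops"])
        print(f"{name}: {edges_per_signal:.0f} edges per signal, "
              f"{bad:.2f}% bad edges")
```

Both runs work out to 36 edges per signal, so the two summaries are directly comparable.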