Getting Kodi to run nicely on Raspbian

This blog post is about getting Kodi up and running with Netflix on Raspbian. This is not a tutorial, more a collection of notes for myself in order to reproduce the setup.

Basic install

Get a recent version of Raspbian from the Raspberry Pi website (or my ftp mirror). Extract it to a fresh micro-SD card and get the system ready. Follow this guide if you need help. Boot into the system, run an update and install some handy tools

Next, we are gonna harden the system. For that do the following

  • Set root password
  • Rename user pi to something else
  • Remove the NOPASSWD for user pi
  • Permit root-ssh login only via public key
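The last point of the list can be sketched as a small shell helper. This is only a sketch under assumptions: the function name root_login_key_only is made up for this note, and it assumes an OpenSSH version that knows the value prohibit-password (7.0 or newer). It rewrites the config file you pass in, so try it on a copy first.

```shell
# Rewrite an sshd_config so that root may only log in with a public key.
# Usage: root_login_key_only /path/to/sshd_config
# (root_login_key_only is a hypothetical helper name, not an existing tool)
root_login_key_only() {
    cfg="$1"
    tmp="$(mktemp)"
    {
        # Put the directive first: sshd uses the first value it finds for a keyword.
        echo 'PermitRootLogin prohibit-password'
        # Keep everything else, minus old (or commented-out) PermitRootLogin lines.
        grep -v -E '^[[:space:]]*#?[[:space:]]*PermitRootLogin' "$cfg" || true
    } > "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

On the Pi this would be run as root against /etc/ssh/sshd_config, followed by a restart of the ssh service. Make sure root has a working authorized_keys file before you do that.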

Install Kodi

First add the following repository

Now install kodi

Autostart Kodi

Ok, Kodi is installed, now we need to start it. For that we create a systemd service by putting the following into /etc/systemd/system/kodi.service
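A minimal sketch of such a unit, written to a local file first. The user name kodi, the restart policy and the use of /usr/bin/kodi-standalone are assumptions for illustration, so adjust them to your setup.

```shell
# Write the unit to a local file; nothing on the system is changed yet.
cat > kodi.service <<'EOF'
[Unit]
Description=Kodi Media Center
After=network-online.target
Wants=network-online.target

[Service]
# Assumption: the unprivileged user that runs Kodi is called "kodi".
User=kodi
Group=kodi
ExecStart=/usr/bin/kodi-standalone
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Installing and enabling needs root, so it is left commented out here:
# sudo install -m 644 kodi.service /etc/systemd/system/kodi.service
# sudo systemctl daemon-reload
# sudo systemctl enable --now kodi.service
```

The user that runs Kodi probably needs to be a member of the video, audio and input groups to get access to the display and the remote.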

Install the Netflix Plugin

The Netflix Plugin is hosted on GitHub. Download the plugin and install the zip-file. The best option seems to be to put it on a USB-Stick and install it from there. This plugin needs parted or fdisk: install them on your Kodi system and then let the plugin install the widevine library, which is necessary for the DRM. The DRM is, by the way, also the thing that was in the way of getting Netflix to work in the first place. And it's still a bit of a mess, since we are using an extraction of libwidevine from a Chrome OS image here.

Well, as long as it works. It's not nice, though, and will probably cause a headache or two until it finally works.

Allow Kodi to reboot and shutdown

This is more of a quick fix, based on the suggestion from here. The original clue was posted a long time ago on the Kodi forums, but those posts were only of limited help. So, create the following file

Finetuning

I encountered the problem that if Kodi runs for too long (multiple days), the Netflix plugin stops working. No errors are given, it just won't play a movie again. A quick fix is to schedule a reboot every night using a cronjob
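A sketch of such a cronjob in /etc/cron.d format (with a user column). The time 04:00 and the file name kodi-nightly-reboot are arbitrary choices for this example.

```shell
# Write the cron entry to a local file; nothing on the system is changed yet.
cat > kodi-nightly-reboot <<'EOF'
# m h dom mon dow user command
0 4 * * * root /sbin/shutdown -r now
EOF

# Installing needs root, so it is left commented out here.
# Note: file names in /etc/cron.d must not contain dots.
# sudo install -m 644 kodi-nightly-reboot /etc/cron.d/kodi-nightly-reboot
```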

Further work

This is of course just the basic installation. You will need to configure Kodi to your needs (Skins, Addons, Timezone, connect your NAS, ...)

Also, I might create an ansible-playbook to automate this procedure. This looks like a fun project to do on a rainy Sunday.


For now, I'm off, watching some Netflix 🙂

btrfs being notably slow on HDD RAID6

Disclaimer: I don't want to blame anyone. I love btrfs and would very much like to get it running in a good manner. This is a documentation of a use case where I have some issues.

I am setting up a new server for our Telescope systems. The server acts as supervisor for a handful of virtual machines, that control the telescope, instruments and provide some small services such as a file server. More or less just very basic services, nothing fancy, nothing exotic.

The server has 8 HDDs in total, configured as RAID6 and connected via a Symbios Logic MegaRAID SAS-3. I would like to set up btrfs on the disks, which turns out to be horribly slow. The decision for btrfs came from resilience considerations, especially the possibility of creating snapshots and conveniently transferring them to another storage system. I am using OpenSuSE 15.0 LEAP, because I wanted a not too old Kernel and because OpenSuSE and btrfs should work nicely together (a small dig at RedHat, who should rethink their strategy of abandoning btrfs in the first place).

The whole project now runs into problems, because in comparison to xfs or ext4, btrfs performs horribly in this environment. And by horribly I mean a factor of 10 and more!

The problem

The results as text are listed below. Scroll to the end of the section

I use the tool dbench to measure IO throughput on the RAID. When comparing btrfs to ext4 or xfs I notice, the overall throughput is about a factor of 10 (!!) lower, as can be seen in the following figure:

Throughput measurement using dbench - xfs's throughput is 11.60 times as high as btrfs !!!

sda1 and sda2 are obviously on the same disk and were created with default parameters, as the following figure demonstrates

My first suspicion was that cow (copy-on-write) could be the reason for the performance issue. So I tried to disable cow by setting chattr +C and by remounting the filesystem with nodatacow. Both without any noticeable difference, as shown in the next two figures

Same dbench run as before, with chattr +C. No notice-worthy difference
Same dbench run as in the beginning, but with nodatacow mount option. Still, negligible difference

Hmmmm, looks like something's fishy with the filesystem. For completeness I also wanted to swap /dev/sda1 and /dev/sda2 (in case something is wrong with the partition alignment) and also include ext4 for comparison. So I reformatted /dev/sda1 with btrfs (previously the xfs filesystem) and /dev/sda2 with ext4 (previously the btrfs partition). The results stayed the same, although there are some minor differences between ext4 and xfs (not discussed here). The order of magnitude difference between btrfs and ext4/xfs remained.

Swapping /dev/sda1 and /dev/sda2 and testing dbench on ext4 yields the same results: btrfs performs badly on this configuration

So, here are the results from the figures.

Partition - filesystem           dbench result
/dev/sda1 - xfs                  Throughput 7235.15 MB/sec 10 procs
/dev/sda2 - btrfs                Throughput 623.654 MB/sec 10 procs
/dev/sda2 - btrfs (chattr +C)    Throughput 609.842 MB/sec 10 procs
/dev/sda2 - btrfs (nodatacow)    Throughput 606.169 MB/sec 10 procs
/dev/sda1 - btrfs                Throughput 636.441 MB/sec 10 procs
/dev/sda2 - ext4                 Throughput 7130.93 MB/sec 10 procs
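
The factors can be re-derived from the dbench numbers with a few lines of shell. The helper name ratio is made up for this note; the throughput values are the ones from the list.

```shell
# Print how many times faster the first throughput is than the second.
# (ratio is a hypothetical helper name)
ratio() {
    LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

# First run: xfs on sda1 against btrfs on sda2
echo "xfs vs btrfs:  $(ratio 7235.15 623.654)x"
# Swapped run: ext4 on sda2 against btrfs on sda1
echo "ext4 vs btrfs: $(ratio 7130.93 636.441)x"
```

This gives 11.60 for the first pair and 11.20 for the swapped run.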

And here is also the text file of the detailed initial test series. In the beginning I said something like a factor of 10 and more. This relates to the results in the text file. When setting dbench to use synchronous IO operations, btrfs gets even worse.

Other benchmarks

Phoronix did a comparison between btrfs, ext4, f2fs and xfs on Linux 4.12, Linux 4.13 and Linux 4.14 that also revealed differences between the filesystems of varying magnitude.

As far as I can interpret their results, they remain rather inconclusive. Sequential reads are fastest on btrfs, while sequential writes are apparently horrible (a factor of 4.3 difference).

Linux 5.0

Yeah, I am running a recent self-made build of Linux 5.0 🙂

Despite the major version number change, there's nothing more special about this version than about other releases. Still, I find this pretty cool!

Now, back to work ...

ssh config for IPv6

This is just a short note to remind me, how to configure a link-local IPv6 address in the ssh-config
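
A sketch of such an entry, written to a separate example file. The host alias pi-linklocal, the address fe80::1, the interface eth0 and the user pi are placeholders. The zone (interface) is appended after the percent sign, and because ssh expands %-tokens in HostName, a literal percent sign has to be written as %%.

```shell
# Write an example config; the quoted heredoc keeps the %% exactly as typed.
cat > ssh_config.example <<'EOF'
# Link-local address with zone: <address>%%<interface>
Host pi-linklocal
    HostName fe80::1%%eth0
    User pi
EOF

# Try it without touching ~/.ssh/config:
# ssh -F ssh_config.example pi-linklocal
```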

Remember to put two percent signs, otherwise you might get errors similar to the following

IPv6 for the win!

Gridengine and CentOS 7

... there's life in the old dog yet!

We are still using the Gridengine on some of our high performance clusters and getting that thing running isn't really a piece of cake. Since Oracle bought Sun, things have changed a little bit: First of all, the good old (TM) Sun Grid engine doesn't exist anymore. There are some clones of it, the most promising candidate probably being the Son of Grid Engine project. This is also what I will refer to as gridengine henceforth. Noteworthy, but not covered, are the OpenGrid Scheduler and the commercial Univa Grid Engine (I'm not linking there), which is just the old Sun Grid engine, but sold to Univa and commercially distributed.

In the Debian world, there is a gridengine deb package, which just works as nicely as it should. There was an el6 port for CentOS 6, but there is nothing official for CentOS 7 (yet?). I've built the packages myself and everyone is free to use them. They are provided as they are, without support or warranty of any kind. Still, they should work just fine.

Building the Son of Grid Engine

The process was difficult enough to make me fork the repository and set up my own GitHub project. My fork contains two bugfixes for problems that prevented the original source from building.
The project contains also build instructions in the README.md for OpenSuSE 15 and CentOS 7 and pre-compiled rpms in the releases section.

Short notes about building

The Gridengine comes with its own build tool, called aimk. One can say a lot about it, but if treated correctly it works okayish. The list of requirements is long and is given in the README.md for CentOS 7 and OpenSuSE 15. It hopefully also works for other versions.

SGE uses a lot of different libraries. Mixing architectures for a single cluster environment is in general a bad idea and SGE might work, but you really don't want to hassle with the inevitable white hairs that come with all of the unpredictable and sometimes not-easy-to-understand voodoo errors that occur. Just ... Don't do that!

I never used the Hadoop build, so all binaries and everything is tested with -no-herd.

For the impatient (not commented)

Notes about installing the Gridengine

I've tried to automate the install with ansible, but the install_execd -auto script proves to be quite unreliable. After several failed attempts, I decided to install the Gridengine manually from a shared NFS directory.

This is in general a good idea, as the spool directory anyways needs to be in a NFS share. To prevent trouble I have separated the binaries (read-only NFS) from the spool directory (read-write access to all nodes).

I've tried to mix CentOS and OpenSuSE. The Gridengine installations work with each other, but you will run into other problems as the execution environment is different. Don't do that!

Running the SGE over NFS is the way I recommend. Be aware of the hassle, when the master node becomes unresponsive. In that case, don't do magic tricks, just reboot the nodes. Everything else is fishy.

Known problems with Son of Grid engine

This section is dedicated to documenting two bugs and making them appear on google, so that other unfortunate beings who encounter the same problems can find a solution. I've encountered two errors when trying to build the original 8.1.9 version

This problem was the reason for me to fork it. Comment out line 51 in sge-8.1.9/source/3rdparty/qtcsh/sh.proc.c


I encountered this error when building as root. Try building as an unprivileged user (which you should do anyway!)

Mirrors

I am mirroring the current version of Son of Grid engine on my ftp-server. My own fork is in the GitHub repository gridengine.

iphex

I wrote a small bash script that transforms IP addresses into HEX format. The tool consists of 10 lines of bash script.
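
A minimal sketch of how such a conversion can look as a shell function. It splits the address at the dots and prints every octet as two uppercase hex digits. It only checks for four fields and does not validate that each one is a number between 0 and 255.

```shell
# Convert a dotted IPv4 address into its 8-digit uppercase HEX form.
iphex() {
    old_ifs="$IFS"
    IFS=.
    set -- $1            # split "a.b.c.d" into the four positional parameters
    IFS="$old_ifs"
    if [ "$#" -ne 4 ]; then
        echo "usage: iphex a.b.c.d" >&2
        return 1
    fi
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

iphex 192.0.2.91     # prints C000025B
iphex 192.168.2.91   # prints C0A8025B
```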

I needed the tool to match IP-addresses to HEX files for PXE boot. Normally PXE boot first looks for a file named after the MAC-address and then iteratively for the HEX representation of the IP address, reducing the number of matching characters each time. Oracle documents the behavior very nicely for the IP address "192.0.2.91", which matches "C000025B", and the imaginary MAC-address "88:99:AA:BB:CC:DD". The PXE client then probes for the following files (in the given order)

Now, with iphex I can easily convert the more common numerical representation of IP-addresses like 192.168.2.91 into the IMHO not directly readable HEX representation.
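
The probe order can be sketched with a small shell function. The function name pxe_probe_order is made up for this note, and the sketch assumes PXELINUX-style naming: the MAC file gets the prefix 01- (the hardware type for Ethernet) and is written in lowercase with dashes, and the last fallback is a file called default.

```shell
# Print the config file names a PXELINUX client tries, in order.
# Usage: pxe_probe_order MAC HEXIP   (pxe_probe_order is a hypothetical name)
pxe_probe_order() {
    mac="$1"
    hexip="$2"
    # 1. MAC address: lowercase, dashes instead of colons, prefixed with 01-
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F' 'a-f' | tr ':' '-')"
    # 2. HEX IP address, dropping one character from the end on every try
    while [ -n "$hexip" ]; do
        printf '%s\n' "$hexip"
        hexip="${hexip%?}"
    done
    # 3. Last resort
    printf 'default\n'
}

pxe_probe_order 88:99:AA:BB:CC:DD C000025B
```

For the example values this prints 01-88-99-aa-bb-cc-dd first, then C000025B, C000025, C00002 and so on down to C, and finally default.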

zfs - CKSUM errors (and a bad SATA cable)

Just some days ago, I ran a zpool scrub on the ZFS RAID-Z array of my home NAS. This is actually a piece of beauty: powered by a low-energy Celeron and running FreeBSD with ZFS, it is our main storage system for photos, pictures, data, disk images, etc. It just powers the whole home infrastructure. So during a normal routine zpool scrub, I noticed CKSUM errors popping up.

zfs CKSUM errors

Now, this is not good, as CKSUM indicates the number of uncorrectable checksum errors. So, rapid action is required!
As the affected disk is rather old, my first guess was that this is an indicator of a disk going bad. So I wanted to run smartctl to check the SMART status. For that, I first checked glabel to find the underlying physical device

use glabel status to map the output of zpool status to the underlying hardware
Output of smartctl -a /dev/ada4

The SMART status looked good, but I still decided to replace the disk, as the CKSUM errors made me nervous. I put in the new disk and replaced the old one with the default zfs tools via

The zpool replace command immediately exploded in my face. In dmesg I could trace back some weird messages with something like "No such pool or dataset".
Now, that's weird. So apparently the issue was not the disk itself, but what else? Probably just a faulty connection. It could be a failing SATA controller or just a bad SATA cable. The cable had been working just fine for years and I hadn't changed anything, so I was wondering. Normally I suspect moving or mechanical parts to fail first, so when CKSUM errors appear, a plain cable is way below a failing HDD on my priority list.
Yeah, I was wrong. Just replacing the SATA cable (because I had one and it was orders of magnitude easier than replacing a SATA controller) healed the whole system. Now zpool scrub has run through happily for the second time (the first time it still had to recover some errors, probably from faulty writes in the first place) and the NAS is running as smoothly as it always did.

No errors, everything is happy again 🙂

So, in a nutshell

CKSUM errors in ZFS without any READ or WRITE errors are sometimes also just triggered by a faulty connection (a bad SATA cable) and do not necessarily indicate a failing hard disk.

Lucky me, now I have a spare disk in stock in case one day one disk really goes bad 🙂

Zsh and Home/End/Delete buttons

I've noticed that in Zsh under Mate, the HOME/END/DELETE buttons are for some reason not working as I expected. I use vim keybindings, but am still accustomed to sometimes hitting the End button to reach the end of the line. So far it has never been a problem, but zsh just reacts weirdly here. Before triggering a rage quit, I found a solution. Put the following lines in your .zshrc and you're good
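
A sketch of such bindings, written to a snippet file that can be appended to .zshrc. The escape sequences are an assumption: they are what xterm-like terminals commonly send for Home, End and Delete. If your terminal differs, press the key after running cat -v to see what it actually sends.

```shell
# Write the key bindings to a snippet file; ~/.zshrc is not touched yet.
cat > zshrc-keys.zsh <<'EOF'
# Home, End and Delete (assumed xterm-style escape sequences)
bindkey '\e[H'  beginning-of-line
bindkey '\e[F'  end-of-line
bindkey '\e[3~' delete-char
EOF

# Append it once you are happy with it:
# cat zshrc-keys.zsh >> ~/.zshrc
```

If you enable vi keybindings with bindkey -v, put these lines after that call, so they end up in the keymap that is active while typing.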

I found this solution here and mirror it on my blog, in case the original solution gets lost or something.

I also link oh-my-zsh here, in case someone just hopped on zsh as well and wants to make it as fancy as possible 🙂

Kernel build bug - KVM_AMD and CRYPTO_DEV_CCP

About a week ago, I failed to build a Kernel for my new Ryzen 2700X work machine. After some time of configuring my kernel I ran into some weird problems

The problem

I wanted to have a Kernel with KVM_AMD support enabled. The build was going on fine, until some weird linker errors appeared.

(Full output [Pastebin])

Since I'm a Kernel rookie, it took me some time to realize what was going on. A google search didn't reveal a solution, other than something similar on Unix Stackexchange that was not directly applicable to my case.

The problem persisted and is reproducible in linux-4.17.1 and linux-4.16.15, using this config file. Building linux-4.14.49 worked fine. For any options that were not defined by the config file I chose the default suggestion.


Workaround

The problem arises if CONFIG_CRYPTO_DEV_CCP_DD is compiled as a module [=m], even if SEV is not used. Compiling CONFIG_CRYPTO_DEV_CCP_DD into the kernel [=y] is a workaround for the issue.
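
If you prefer editing the config file directly over menuconfig, the switch from =m to =y can be sketched like this. The function name ccp_builtin is made up for this note, and it only touches the option if it is currently set to =m.

```shell
# Flip CONFIG_CRYPTO_DEV_CCP_DD from module (=m) to built-in (=y).
# Usage: ccp_builtin /path/to/.config   (ccp_builtin is a hypothetical name)
ccp_builtin() {
    cfg="$1"
    tmp="$(mktemp)"
    sed 's/^CONFIG_CRYPTO_DEV_CCP_DD=m$/CONFIG_CRYPTO_DEV_CCP_DD=y/' "$cfg" > "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# In the kernel source tree (left commented out here):
# ccp_builtin .config && make olddefconfig
```

Inside a kernel tree, scripts/config --enable CRYPTO_DEV_CCP_DD should have the same effect. Running make olddefconfig afterwards lets Kconfig sort out the dependent options.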

This commit already revealed the issue.

I had to include the "Secure Processor device driver", which is found in Cryptographic API > Hardware crypto devices.

Weirdly, the suggested solution from Unix Stackexchange did not solve the problem for me, nor did it cause problems. I could build the Kernel (4.17.1) with "Kernel-based Virtual Machine Support" set as module. But those are just my two cents, it might have been an issue some versions ago ...

Unfortunately I cannot contribute to Unix Stackexchange yet (not enough reputation *sigh*), so I cannot improve the answer there.

Thanks to Richard!

Many thanks to Richard, who provided me with support, regarding nailing it down to a bug in the Kernel build system.

Resizing a btrfs partition

This is a simple note to myself, in case I need to re-do this again. How to resize a btrfs partition to maximum size (or full capacity)

  1. Resize the partition using parted
  2. Resize the btrfs filesystem using

    Would it make sense to create an alias, so that also btrfs filesystem resize 100% or other percentages would work?
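
One possible answer, as a sketch: a small wrapper function that translates 100% into the keyword max, which is what btrfs filesystem resize understands for "use the whole device". The names btrfs_resize and BTRFS_CMD are made up for this note. Other percentages are rejected here, because they would need the device size to be computed first.

```shell
# Step 1 is still done with parted, e.g. (device and partition number are placeholders):
#   sudo parted /dev/sdX resizepart N 100%

# Step 2: resize the filesystem, accepting "100%" as an alias for "max".
# Usage: btrfs_resize SIZE MOUNTPOINT
# BTRFS_CMD can be set to "echo" for a dry run (hypothetical variable).
btrfs_resize() {
    size="$1"
    mountpoint="$2"
    case "$size" in
        100%) size="max" ;;
        *%)   echo "btrfs_resize: only 100% is translated in this sketch" >&2
              return 1 ;;
    esac
    "${BTRFS_CMD:-btrfs}" filesystem resize "$size" "$mountpoint"
}

# Dry run, prints the arguments that would be passed to btrfs:
BTRFS_CMD=echo btrfs_resize 100% /mnt
```

Sizes without a percent sign (max, +2g, ...) are passed through unchanged.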

In a nutshell example

Detailed example

First resize the partition (I use parted for that purpose)

Done.