Faster than rsync?

Since the dawn of time, if you needed a reliable way to synchronize data within or between systems, one of the best answers was rsync. It has been in continuous development since 1996 and by now has a code quality and feature set that is basically unrivaled. But one thing it never got was any sort of parallel transfer option. You could pipe it through xargs and hack your own, but even that could cause problems, as destination folders may not exist yet when another sync thread comes along. So if you had the time to just wait for it to move a single file at a time, you were fine. It would, no doubt about it, sync your data even over an unreliable WAN connection. But like all things TCP, latency and packet loss cause massive backoffs (true for CUBIC, Reno, etc.) and your transfer rate is going to crawl.

So how do we move large datasets with reliability AND speed? Let’s do some testing and find out.

Dataset: My personal repo of Ubuntu ISO files going back to 12.04. It clocks in at 193GB.

Environment: 10Gbit LAN, NVMe to NVMe

Contender #1: rsync

As a baseline, rsync finishes in 10 minutes 34 seconds.

Contender #2: rsync + HPN-SSH

HPN-SSH is a set of patches for SSH and SFTP that tune the software stack for TCP performance. It also allows you to forgo payload encryption, which is the limiting factor on big-pipe local networks. Over longer runs (WAN), the TCP bandwidth-delay product is likely to be your bottleneck, and HPN is specifically made to address that. Result: 9 minutes 8 seconds. It’s faster, but it won’t blow you away. About as expected, since HPN is tuned for solving TCP performance problems that aren’t going to be an issue on a LAN.
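For anyone wanting to try it, the None cipher is enabled per connection. The options below are HPN-specific and stock OpenSSH will reject them, so treat this as a sketch for an HPN-patched client; host and paths are illustrative:

```shell
# HPN-SSH only: NoneEnabled/NoneSwitch drop bulk encryption after
# authentication completes. Use on trusted links only.
export RSYNC_RSH='ssh -oNoneEnabled=yes -oNoneSwitch=yes'
# With that set, a normal invocation rides the unencrypted channel:
#   rsync -avh /rsync/ user@10.10.100.55:/rsync/
```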

Contender #3: rsync (again!) daemon

Did you know that rsync has another mode that bypasses SSH entirely? Many people who have used it for years don’t know about it. You configure each storage location you want to make available in /etc/rsyncd.conf. An example:

uid = nobody
gid = nobody
address = 10.10.100.55
use chroot = yes
max connections = 10
syslog facility = daemon
pid file = /var/run/rsyncd.pid
log file = /var/log/rsyncd.log
lock file = /var/run/rsyncd.lock
[file]
  comment = rsync
  path = /rsync/
  read only = no
  dont compress = *.gz *.bz2 *.zip
  auth users = linuxuser1
  # "auth users" requires a matching secrets file (user:password pairs, mode 600)
  secrets file = /etc/rsyncd.secrets

Once that is done, you can hit it from a remote machine. Daemon mode has no encryption, so it’s only to be used on trusted networks: a LAN, or over a VPN. Local firewalling would be a good idea. The default port is TCP/873. Results? 4 minutes 1 second.

Contender #4: rclone

A newer piece of software, rclone is designed to provide access to a wide array of remote storage types. Check out the whole list. It can perform operations against Google Drive, WebDAV, Dropbox, or even object store systems like S3. But today we are interested in a fancy trick it can do, something rsync never learned: parallel operation. The thread count is configurable with the --transfers switch, but the default of 4 was fine for testing. The same dataset over rclone was done in 5 minutes 43 seconds. While I didn’t get the 4x speedup over rsync that Jeff Geerling saw, it put up a good result.
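The local filesystem is just another rclone backend, so the parallelism is easy to demo without any cloud remote. Everything here is illustrative (and the sketch falls back to cp on a box without rclone installed):

```shell
# Parallel copy with rclone; --transfers controls worker count (default 4).
mkdir -p /tmp/rc-src /tmp/rc-dst
for i in 1 2 3 4; do echo "iso-$i" > "/tmp/rc-src/ubuntu-$i.iso"; done
if command -v rclone >/dev/null 2>&1; then
  rclone copy /tmp/rc-src /tmp/rc-dst --transfers=8
else
  # Fallback so the sketch still runs where rclone is absent.
  cp -r /tmp/rc-src/. /tmp/rc-dst/
fi
```

Swap either path for a configured remote like `gdrive:isos` and the same flags apply.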

Contender #5: fdt

Let’s get a bit more exotic. FDT is a Java JAR that you can use to move files over any network that can handle TCP. It autoscales threads, buffers, and streams to find the limit of the pipe and peg it out. And performance was impressive: the only tool today that maxed out the link speed and sat flat at line rate for the whole test. Done in 2 minutes 33 seconds, it was by far the fastest. But there are drawbacks. FDT does not handle directory structure. It can take individual files, or a precompiled text list of files to operate on. Since it won’t look into folders, I had to use truncate to make a sparse file of the same size as my dataset to test against.

truncate -s 193G testfile.img
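truncate allocates no real blocks, so the test file is instant to create no matter the size. The FDT invocation itself is shown as comments since it needs the jar on both ends; the flags follow FDT's usage text, so treat them as an assumption:

```shell
# Sparse file: apparent size is huge, allocated blocks near zero.
truncate -s 1G /tmp/testfile.img    # 1G here for illustration; the test used 193G
stat -c 'apparent=%s bytes, allocated=%b blocks' /tmp/testfile.img
# Hypothetical FDT run against the same file:
#   receiver:  java -jar fdt.jar                      # listens on TCP/54321
#   sender:    java -jar fdt.jar -c 10.10.100.55 -d /rsync /tmp/testfile.img
```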

So what did we learn? There are tradeoffs like all things in life. Here they are speed, security, or convenience. It’s all a toolbox, so reach in and use your brain to get building.

Linux Bluetooth Audio

Like any proper aficionado of moderately forgotten John Cusack movies, I appreciate some High Fidelity. Living in modern times, convenience is also a factor. I’d love to roll around with a set of Sennheiser open cans, but the cables and the noise leakage are disruptive. So after watching MKBHD express ambivalence over the newest Cupertino creation I also noticed how nice the Sony WH-1000XM4 started to look.

Active noise cancellation, decent audio quality, comfort, touch controls, and the glorious freedom from wires. It’s a Christmas present to myself. But the default audio quality under Linux just wasn’t great. Turns out I was listening to the fallback audio option for Bluetooth, SBC. SBC is the baseline codec that every A2DP device must support, used when better options aren’t available. There are much better options, but they all come with caveats: AAC, LDAC, and aptX are each patent encumbered, and out of the box most desktop operating systems won’t touch them, instead relying on a hardware device that natively does the conversion. No doubt this is due to legal and licensing issues that would rustle the legal jimmies up in Redmond. Linux, however, ever respects your freedom, and with a little patching will happily run any of them.

So let’s get out the old command line and cook up some audiophile goodness.

The Linux audio system, PulseAudio, is pluggable, extensible, and generally quite good. Distributions ship with a version of the Bluetooth modules that won’t step on any patent toes, but a bit of GitHub code can extend the functionality.

On Debian I had to install some dependencies first.

sudo apt install libavcodec58 libavcodec-extra libavcodec-extra58 libavcodec-dev ffmpeg pkg-config cmake checkinstall git libtool libpulse-dev libdbus-1-dev libsbc-dev libbluetooth-dev fdkaac libfdk-aac2 libfdk-aac-dev libldacbt-abr-dev libldacbt-enc-dev libldacbt-enc2 libldacbt-abr2 bluez-hcidump

Pull down our module code and build it.

git clone --recurse-submodules https://github.com/EHfive/ldacBT.git

cd ldacBT/

mkdir build

cd build

cmake -DCMAKE_INSTALL_PREFIX=/usr -DINSTALL_LIBDIR=/usr/lib -DLDAC_SOFT_FLOAT=OFF ../

sudo checkinstall -D --install=yes --pkgname=libldac

cd ..

git clone --recurse-submodules https://github.com/EHfive/pulseaudio-modules-bt.git

cd pulseaudio-modules-bt

mkdir build

cd build

cmake ..

make

sudo make install

pulseaudio -k

sudo systemctl restart bluetooth.service

Now you can choose any of the codecs for use. I’m sticking with AAC, as it’s supported by my Sony headset and sounds delicious.
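A quick sanity check that the rebuilt modules actually loaded: PulseAudio's module list should now include the Bluetooth discovery/policy modules. Guarded here, since this only means anything on a box with PulseAudio running:

```shell
# List loaded PulseAudio modules and pick out the Bluetooth ones.
if command -v pactl >/dev/null 2>&1 && pactl info >/dev/null 2>&1; then
  pactl list short modules | grep -i bluetooth
else
  echo "PulseAudio not running; skipping check"
fi
```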

DNS Deathmatch

For the longest time, /etc/resolv.conf has been a very simple little file telling Linux where to go for DNS resolution. These days it’s quite the popular file. System daemons old and new are lining up to have their way with what should be a very simple little config, and the mess has become a meme in its own right.

While some will symlink it to their own file, others want to set up a local caching resolver and point to that. The problem with most of these is that roaming between networks will break your config if you have a mobile device. And local name resolution won’t work if you are using a local stub resolver, which is exactly what systemd-resolved wants to do. It’s not exactly difficult to get into a situation where more than one of these is installed, in which case they fight to wrest control of the single config file. So how do we unfuck this mess? Let’s look at Ubuntu, which now ships with systemd-resolved.

By default, systemd-resolved has a symlink in place at /etc/resolv.conf. It has set up a local stub resolver that won’t respect the upstream DNS servers your DHCP server assigns.

So…let’s kill that.

Now that Lennart Poettering has been sufficiently triggered, we can tell NetworkManager to do its job.

The old symlink is still in place, so that needs to be removed. A restart of the NetworkManager service then writes out a new resolv.conf with the DNS nameservers actually assigned by DHCP.
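Spelled out, the whole fix was roughly this (stock paths on Ubuntu; any other keys in your NetworkManager.conf stay as they are):

```ini
# 1. sudo systemctl disable --now systemd-resolved
# 2. sudo rm /etc/resolv.conf               # drop the stale stub symlink
# 3. In /etc/NetworkManager/NetworkManager.conf:
[main]
dns=default
# 4. sudo systemctl restart NetworkManager  # writes a fresh resolv.conf
```

With dns=default, NetworkManager manages resolv.conf itself instead of handing the job to systemd-resolved.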

It only took five minutes to fix something that shouldn’t have been broken in the first place. 🤦


Ubuntu on the Decline

Ubuntu has been in the distro news lately with its nascent release of 20.04. Ubuntu has long been touted as the beginner-friendly distro. And while there have been some historical gaffes and strange decisions, I’ve never been one to actively discourage the use of Ubuntu for those that were interested in it. I never even dinged them for basically doing nothing that Debian didn’t do already; every one of their releases is basically a pull of Debian testing to fit a time window. Unfortunately, I can no longer defend Ubuntu.

I tried out the latest release on a spare ThinkPad, one I use to check out distros occasionally as they come out. The T450s has a Broadwell-generation i7 and 12GB of RAM. It shouldn’t be slow, especially as the installs go to an Intel DC series SSD. But this time around I found Ubuntu to feel laggy. Window animations and the opening of some programs would halt and hitch and generally run poorly. A little digging reveals that part of this is due to Ubuntu’s use of SNAP packages by default. If you install chromium from the official repos, for instance, you aren’t getting a native binary for the browser. You are getting a SNAP application. While SNAPs are a good idea, I think they are a poor implementation; Flatpaks tend to be much faster. Other applications have been replaced by SNAPs seemingly with no rhyme or reason:

  • gnome-calculator
  • gnome-characters
  • gnome-logs
  • gnome-system-monitor

Why do this? Why these applications? Why is my calculator taking so long to open? You might want to just remove them, but they have a nasty habit of coming back. As YouTuber DistroTube discovered: he uninstalled snapd only to have it reinstall itself as a dependency moments later.
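If you do want snapd gone for good, purging it isn't enough, since other packages can pull it back in as a dependency. Pinning it at a negative priority (the same trick Linux Mint ships as nosnap.pref) blocks apt from ever reinstalling it. After `sudo apt purge snapd`, drop this into /etc/apt/preferences.d/nosnap.pref:

```
Package: snapd
Pin: release a=*
Pin-Priority: -10
```

A negative pin priority tells apt the package must never be installed from any release.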

So I’ve prattled on about SNAPs, why I don’t like them, and why I think they were implemented in a sneaky and illogical way. Is that my only problem with Ubuntu? Goodness no.

Another example is the default DNS configuration. Since about 1991, /etc/resolv.conf has been where you go to declare your DNS nameservers. It worked just fine all that time, so clearly it needed to be broken. Now you get auto-generated nameservers defined in /run/resolvconf/resolv.conf, reached via a symbolic link from /etc/resolv.conf. dpkg-reconfigure can fix a broken symbolic link. It took me 10 minutes of searching to fix something that was broken for no well-explained reason.

So I’m done with Ubuntu. There are just too many other options that don’t do silly things for silly reasons.