figbert.com-gemini

[ACTIVE] the capsule and home of figbert in geminispace

2021-06-17-wrong-way-to-switch-server-os.gmi (9066B)


# The Wrong Way to Switch Operating Systems on Your Server

After moving my server to Hetzner, I built up a large collection of self-hosted services I use on a daily basis: from fun things like an RSS reader and an IRC bouncer to critical services like my email. I ran them all with Docker Compose on a Debian VPS.

For the last couple of months, however, I've been meaning to move away from Debian and towards something more minimal and clean. This past weekend, I decided to move to Alpine Linux.

## The Plan

The transition was supposed to be quick and dirty:

1. Shut down all the services running on my VPS
2. Make a backup of the relevant files with Tarsnap
3. Mount the Alpine Virtual ISO image and set up the OS
4. Restore the files from the Tarsnap backup
5. Bring everything back up

In a previous move between two servers, I simply rsynced the relevant files over to the new VPS. This time, since I was only switching operating systems on a single server, I figured I could make a backup with Tarsnap, restore it, and be done within the day.

However, backups are much more complex than simply transferring files from one server to another. My haphazard strategy resulted in three days of stress and frustration as I scrambled to restore a self-hosting empire that I myself had reduced to ash.

## Day One

I began work on the transition full of optimism, if a bit stressed. I had read through Tarsnap's online documentation a number of times and was ready to make my first attempt. I loaded my Tarsnap account up with US$10 and ran:

```sh
$ sudo tarsnap -c -f backup-name docker-compose.yml ...
```

My terminal sat empty for hours. Nothing changed: the process was running, but there was no feedback. I was nervous.

> What if it failed silently?
>
> How can I check?
>
> What should I do?

I pressed <Ctrl-C>.

To my horror, stats printed to the screen: the backup had been 90% complete, and I had stopped it. Convinced I had ruined the backup completely, I deleted the partial backup from Tarsnap and started again from scratch.

This was my first, but not my last, moment close to tears. I went to sleep and let the backup run overnight.

## Day Two

Day Two began well: I woke up and the backup was finished! I wiped the VPS, installed Alpine, and brought it up to spec. I created a regular user, configured SSH, and decided to use doas instead of sudo for a change. Alpine, so far, feels great to use, with none of the cruft that bothered me on Debian.

### Virgin Tarsnap

With Alpine set up, I started to restore the backup:

```sh
$ doas tarsnap -x -f backup-name
```

Once again, after running all day, it still hadn't finished.

I opened a new tmux window and poked around the filesystem. All my files seemed to be there already...

> What if it failed silently?
>
> How can I check?
>
> What should I do?

I pressed <Ctrl-C>, cutting off the download, and tried to bring everything back online:

```sh
$ doas docker-compose up -d
```

It errored out. All my environment variables were undefined. Then it hit me: I had forgotten to back up the .env file. My eyes welled up.

Still, I was determined. I worked to reconstruct the .env file from secrets I had stored in Bitwarden (my offline copy, since my vault is self-hosted and was therefore down).

I ran it again:

```sh
$ doas docker-compose up -d
```

One of my services was missing the Dockerfile it needed to build. I shouldn't have pressed <Ctrl-C>! I was a total moron.

I put on a sad song. I was close to tears once again.

I gathered what was left of my resolve and trudged onwards, searching tarsnap's manpages for something to speed up my download.

I found a number of flags that could have helped me *make* a better backup next time around, but nothing that would help me restore this one any faster. With nothing in the manpages, I turned to the third-party helper scripts.

### Chad Redsnapper

That's when I found it:
=> https://github.com/directededge/redsnapper redsnapper

A Ruby script that runs multiple tarsnap clients at once to extract archives fast. Fucking precisely. I wiped out the incomplete files I had restored, installed Ruby, and started restoring from the backup once again:

```sh
$ doas redsnapper backup-name
```

I changed the song and watched the files fly by on my screen. I went to sleep, confident I would wake to good news.

## Day Three

The restore had failed while trying to download a large .mkv file.

### Manual Exclusion

I restarted redsnapper, explicitly excluding the .mkv it had failed to download, and let it run until it came across another movie and crashed again (an hour or so later). I excluded the second movie file and set it running again.

This was a long, boring process. It sucked.

### An Afternoon Breakthrough

Then I realized something: redsnapper kept crashing when it hit movies I had stored in Jellyfin.

> I don't need Jellyfin at all. I've never watched a movie more than
> once.
>
> The movies take up massive amounts of storage on disk, and keep
> causing tarsnap to crash. They don't compress well either, so they
> take up a fuckton of space in the archives.
>
> I can always download the movies again if I want to give them
> another go.
>
> Why the fuck am I forcing myself to deal with this shit?

I stopped the download partway through (the day's third attempt, after two earlier runs had ended on movie files) and changed the command slightly before rerunning it. After a number of errors I couldn't explain, I realized my account balance was negative, topped it up with another US$25, and ran:

```sh
$ doas redsnapper backup-name -- --exclude='*/jellyfin/*'
```

I returned to my computer a couple of hours later. redsnapper had stopped, with a whole lot of files extracted and a couple of errors about symlinks at the bottom.

I figured that this time it had probably done everything properly but couldn't create the symlinks (probably a missing flag somewhere). I went through my files manually creating the symlinks, and then brought everything up with docker-compose.

I checked the containers. All up.

I checked the logs: no immediate errors visible.

I opened figbert.com on my laptop. It appeared. Service was restored. Hallelujah.

## Mistakes

I made a lot of them. Here are a few:

1. After shutting down my containers, I backed up my entire setup. This included a number of "live" databases, .git folders, and other data that I either did not need or could reconstruct once the move was complete.
2. I didn't back up the .env file I use to store secrets for docker-compose.yml. Luckily, I was able to reconstruct it from individual secrets stored in my password manager.
3. A thorough read of the manpages before I started (rather than just the online guides) would have revealed several helpful flags: -v to see which files tarsnap is operating on, --aggressive-networking to take advantage of datacenter internet speeds, and --recover to resume interrupted backups, to name a few.
4. We already talked about Jellyfin. Even with very little content in it, the collection took up huge amounts of space on disk and in the backup (especially because video files don't compress well), and sat entirely unused. It is now gone. Good riddance.
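
In hindsight, a create command that avoids most of these mistakes could look something like the sketch below. It's hypothetical, not what I actually ran: the paths (docker-compose.yml, .env, data), the exclude patterns, and the function name backup_create are assumptions about a Compose layout like mine, and the TARSNAP variable is an invented hook so the command can be dry-run by setting TARSNAP=echo. The flags are the -v and --aggressive-networking from the manpages, plus --exclude in the same style I used with redsnapper.

```sh
#!/bin/sh
# Hypothetical create step for a Docker Compose setup.
# Assumptions: run from the Compose project directory; secrets live in
# ./.env; service data lives under ./data. Set TARSNAP=echo to dry-run.

backup_create() {
    # Mistake 2: refuse to run if the secrets file is missing.
    if [ ! -f .env ]; then
        echo "backup_create: .env not found, refusing to back up" >&2
        return 1
    fi

    # -v prints each file as it is archived (no more silent terminal).
    # --aggressive-networking uses multiple connections for the upload.
    # --exclude keeps .git folders and Jellyfin media out of the archive.
    ${TARSNAP:-tarsnap} -c -v --aggressive-networking \
        --exclude='*/.git/*' \
        --exclude='*/jellyfin/*' \
        -f "$1" \
        docker-compose.yml .env data
}

# Example: backup_create backup-name
```

Running it with TARSNAP=echo prints the full tarsnap command without uploading anything. For an estimate of what a real run would store, tarsnap's own --dry-run flag can be combined with --print-stats.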

## Future

What did I learn? Well, I'm still devising a plan to prevent things like this from happening in the future. Here's the current plan:

### Backups

Back up everything every day. I'll keep a buffer of three "rolling" backups: backups accumulate up to a maximum of three, and from then on, each new backup causes the oldest one to be removed.
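
The rotation itself is simple enough to sketch. This is a hypothetical version, assuming every archive is named backup-YYYYMMDD-HHMMSS so that sorting the names also sorts them by age. archives_to_prune and prune_old_backups are made-up names. The only tarsnap calls are --list-archives and -d -f.

```sh
#!/bin/sh
# Hypothetical three-backup rotation. Assumes every archive is named
# backup-YYYYMMDD-HHMMSS, so sorting the names also sorts them by age.
KEEP=3

# Read archive names on stdin and print every backup except the newest $KEEP.
archives_to_prune() {
    grep '^backup-' | sort | awk -v keep="$KEEP" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Delete the surplus archives. Meant to run only after a new backup
# has been created successfully.
prune_old_backups() {
    tarsnap --list-archives | archives_to_prune |
        while IFS= read -r archive; do
            tarsnap -d -f "$archive"
        done
}
```

Pruning only after the new archive exists means there are never fewer than three complete backups. Since Tarsnap deduplicates data across archives, three daily archives of a mostly unchanged setup shouldn't cost three times the storage.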

The backup script will shut down the services, dump the databases (i.e., convert as much content as possible to plain-text, easily compressible formats), and make a timestamped backup (currently only with Tarsnap, but perhaps with a number of other services in the future).
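
A rough sketch of what that script might look like follows. Everything in it is an assumption rather than the finished script. The function names are made up. The databases are assumed to be SQLite files matching data/*/*.db, which sqlite3 can dump while the services are down; Postgres or MariaDB would need their own dump tools and a running server. The final docker-compose up -d is also an assumed last step.

```sh
#!/bin/sh
# Hypothetical daily backup script.
# Assumptions: run from the Compose project directory; databases are
# SQLite files matching data/*/*.db; secrets live in ./.env.

# Print a sortable, timestamped archive name, e.g. backup-20210617-030000.
archive_name() {
    date -u +backup-%Y%m%d-%H%M%S
}

# Write a plain-text SQL dump next to each SQLite database file.
dump_databases() {
    for db in data/*/*.db; do
        [ -f "$db" ] || continue    # glob matched nothing
        sqlite3 "$db" .dump > "$db.sql" || return 1
    done
}

run_backup() {
    docker-compose down || return 1
    if dump_databases; then
        tarsnap -c -f "$(archive_name)" docker-compose.yml .env data
        status=$?
    else
        status=1
    fi
    # Assumed last step: bring the services back up whether or not the
    # backup succeeded.
    docker-compose up -d
    return "$status"
}
```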

### Restoring

Simply having high-quality backups to restore will already be a huge leap forward. I'm also *definitely* going to continue using redsnapper: the speed gains it provides on large backups are crucial.
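
A restore could also end with a check, so the "How can I check?" question doesn't come down to poking around in a tmux window. This is a minimal sketch, assuming the restore ran in the current directory and that "tarsnap -t -f backup-name" lists the same relative paths that extraction wrote. The function name missing_paths is hypothetical. It compares the archive listing with what's actually on disk and prints anything absent. That way, an interrupted extract, or symlinks that were never created, show up as a list of missing paths instead of a hunch.

```sh
#!/bin/sh
# Hypothetical post-restore check. Run it from the directory the archive
# was extracted into, feeding it the archive's file list on stdin.
# Prints every listed path that is missing on disk; returns 1 if any are.
missing_paths() {
    missing=0
    while IFS= read -r path; do
        # -L also counts symlinks whose targets don't exist yet.
        if [ ! -e "$path" ] && [ ! -L "$path" ]; then
            printf '%s\n' "$path"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example: tarsnap -t -f backup-name | missing_paths
```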

### Manpages

I really should read all the documentation before I try something new.

## Bye Bye

I'll write more about my self-hosting setup as it evolves, and publish the backup script once it's finished. I'll also maintain a dedicated page on my site describing the setup as it changes.

Also, I'm sure there are people more knowledgeable about Tarsnap than I am. That's basically the point of this article. If you are one of them, please don't hesitate to email me (figbert@figbert.com) if you've got corrections, advice, or just want to flex that you know how to do backups better than I do.

---
Relevant links:
=> /log/2020-11-01-moving-to-hetzner-from-digitalocean.gmi Moving to Hetzner Cloud from DigitalOcean
=> https://www.tarsnap.com Tarsnap
=> https://github.com/directededge/redsnapper redsnapper
=> https://www.tarsnap.com/helper-scripts.html Tarsnap 3rd-party helper scripts
=> https://www.tarsnap.com/tips.html#back-up-live Backing up databases with Tarsnap
=> https://jellyfin.org Jellyfin