2021-06-17-wrong-way-to-switch-server-os.gmi (9066B)
# The Wrong Way to Switch Operating Systems on Your Server

After moving my server to Hetzner, I built up a large collection of self-hosted services I use on a daily basis: from fun things like an RSS reader and an IRC bouncer, to critical services like my email. I ran them all with Docker Compose from a Debian VPS.

For the last couple months, however, I've been meaning to move away from Debian and towards something more minimal and clean. Over this last weekend, I decided to move to Alpine Linux.

## The Plan

The transition was supposed to be quick and dirty:

1. Shut down all the services running on my VPS
2. Make a backup of relevant files with Tarsnap
3. Mount the Alpine Virtual ISO image and set up the OS
4. Restore files from the Tarsnap backup
5. Bring everything back up

In a previous move between two servers, I simply rsynced the relevant files over to the new VPS. Here, where I'm just switching operating systems on a single server, I figured I could make a backup with Tarsnap, and be done within the day.

However, backups are much more complex than simply transferring files from one server to another. My haphazard strategy resulted in three days of stress and frustration as I scrambled to restore a self-hosting empire that I myself had reduced to ash.

## Day One

I began my work on the transition full of optimism, if a bit stressed. I had read through the Tarsnap online documentation a number of times, and was ready to make my first attempt. I loaded my Tarsnap account up with USD$10 and ran:

```sh
$ sudo tarsnap -c -f backup-name docker-compose.yml ...
```

My terminal sat empty for hours. There were no changes – the process was running, but there was no feedback. I was nervous.

> What if it failed silently?
>
> How can I check?
>
> What should I do?

I pressed <Ctrl-C>.

To my horror, stats printed to the screen: the backup had been 90% complete, and I had stopped it. Convinced I had ruined the backup completely, I deleted the partial backup from Tarsnap and started again from scratch.

This was my first, but not last, moment close to tears. I went to sleep and let the backup run overnight.

## Day Two

Day Two began well: I woke and the backup was finished! I wiped the VPS, installed Alpine, and brought it up to spec. I created a regular user, configured SSH, and decided to use doas instead of sudo for a change. Alpine, so far, feels great to use. None of the cruft that bothered me when using Debian.

### Virgin Tarsnap

With Alpine set up, I started to restore the backup:

```sh
$ doas tarsnap -x -f backup-name
```

Once again, after running all day, it had not finished.

I opened up a new tmux window and poked around the filesystem. All my files seemed like they were already there...

> What if it failed silently?
>
> How can I check?
>
> What should I do?

I pressed <Ctrl-C>, cutting off the download, and tried to bring everything back online:

```sh
$ doas docker-compose up -d
```

It errored out. All my environment variables were undefined. Then it hit me: I forgot to back up the .env file. My eyes welled up.

Still, I was determined. I worked to reconstruct the .env file from secrets I had stored in Bitwarden (my offline copy, because my vault is self-hosted and was thus down).

I ran it again:

```sh
$ doas docker-compose up -d
```

One of my services was missing a Dockerfile to build. I shouldn't have pressed <Ctrl-C>! I was a total moron.

I put on a sad song. I was close to tears once again.

I gathered what was left of my resolve and trudged onwards. I searched tarsnap's manpages looking for something to speed up my download.

I found a number of flags that could have helped me *make* a backup better the next time around, but nothing that would help me restore the backup any faster. With nothing in the manpages, I went to look at the helper scripts.

### Chad Redsnapper

That's when I found it:
=> https://github.com/directededge/redsnapper redsnapper

A Ruby script that runs multiple tarsnap clients at once to extract archives fast. Fucking precisely. I wiped out the incomplete files I had restored, downloaded Ruby and started restoring from the backup once again:

```sh
$ doas redsnapper backup-name
```

I changed the song, and watched the files fly by on my screen. I went to sleep, confident I would wake to good news.

## Day Three

The download had failed trying to download a large .mkv file.

### Manual Exclusion

I restarted redsnapper, explicitly excluding the .mkv it had failed to download, and let it run until it came on another movie and crashed again (an hour or so later). I excluded the second movie file and sent it to run again.

This was a long, boring process. It sucked.

### An Afternoon Breakthrough

Then I realized something. redsnapper kept crashing when it hit movies I had stored in Jellyfin.

> I don't need Jellyfin at all. I've never watched a movie more than
> once.
>
> The movies take up massive storage on disk, and keep causing tarsnap
> to crash. They don't compress well either, so they take up a fuckton
> of space in the archives.
>
> I can always download the movies again if I want to give them
> another go.
>
> Why the fuck am I forcing myself to deal with this shit?

I stopped the download in the middle – the day's third, after two earlier attempts that ended after encountering movie files – and changed the command slightly before rerunning.
After a number of errors I couldn't explain, I realized my account balance had gone negative and topped it up with another USD$25 before running:

```sh
$ doas redsnapper backup-name -- --exclude='*/jellyfin/*'
```

I returned to my computer a couple hours later. redsnapper had stopped, with a whole lot of files extracted and a couple errors at the bottom about symlinks.

I figured, this time, it had probably done everything properly but couldn't create the symlinks (probably a flag missing somewhere). I manually went through my files creating the symlinks, and then brought everything up with docker-compose.

I checked the containers. All up.

I checked the logs – no immediate errors visible.

I opened figbert.com on my laptop. It appeared. Service was restored. Hallelujah.

## Mistakes

I made a lot of them. Here are a few:

1. After shutting down my containers, I backed up my entire setup. This included a number of "live" databases, .git folders, and other data that I either did not need or could reconstruct once the move had been completed.
2. I didn't back up the .env file I use to store secrets for use in docker-compose.yml. I was luckily able to reconstruct it from individual secrets I stored in my password manager.
3. A thorough read of the manpages before I started (rather than just the online guides) would have revealed several helpful flags: -v to see what files tarsnap is operating on, --aggressive-networking to take advantage of the datacenter internet speeds, and --recover to salvage an interrupted backup from its last checkpoint, to name a few.
4. We already talked about Jellyfin. Even with very little content in Jellyfin, the collection took up huge amounts of space on disk and in the backup (especially because video files don't compress well), and sat entirely unused. It is now gone. Good riddance.

## Future

What did I learn?
Well, I'm still devising a plan to prevent things like this from happening in the future. Here's the plan currently:

### Backups

Back up everything every day. I'll keep a rolling buffer of three backups: once three exist, each new backup replaces the oldest.

The backup script will shut down the services, dump the databases (i.e. convert as much content to plain-text, easily-compressible formats as possible) and make a time-stamped backup (currently only with Tarsnap, but perhaps in the future with a number of other services).

### Restoring

Simply having high-quality backups to restore will already be a huge leap forward. I'm also *definitely* going to continue using redsnapper: the speed gains it gives on large backups are crucial.

### Manpages

I really should read all the documentation before I try something new.

## Bye Bye

I'll write further about my self-hosting setup as it evolves, and publish the backup script once it's finished. I'll also maintain a dedicated page on my site describing my self-hosting setup as it changes.

Also, I'm sure there are people more knowledgeable about Tarsnap than I. That's basically the point of this article. If you are one of these people, please don't hesitate to email me (figbert@figbert.com) if you've got corrections, advice, or just want to flex that you know how to do backups better than I do.

---
Relevant links:
=> /log/2020-11-01-moving-to-hetzner-from-digitalocean.gmi Moving to Hetzner Cloud from DigitalOcean
=> https://www.tarsnap.com Tarsnap
=> https://github.com/directededge/redsnapper redsnapper
=> https://www.tarsnap.com/helper-scripts.html Tarsnap 3rd-party helper scripts
=> https://www.tarsnap.com/tips.html#back-up-live Backing up databases with Tarsnap
=> https://jellyfin.org Jellyfin
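
A rough sketch of how the rolling backup plan from the Backups section might look. This is not the finished script, just one possible shape under stated assumptions: the archive naming scheme (backup-YYYY-MM-DD-HHMMSS), the BACKUP_PATHS list, and the "run" argument are all hypothetical, and the database dump step is left as a comment because it depends on the services involved. The only tarsnap options used are -c, -d, -f and --list-archives.

```sh
#!/bin/sh
# Sketch only. Assumed, not taken from a real setup:
#   - archives are named "backup-YYYY-MM-DD-HHMMSS", so a plain
#     sort orders them oldest first
#   - BACKUP_PATHS is a hypothetical list of what to save
KEEP=3
PREFIX="backup-"
BACKUP_PATHS="docker-compose.yml .env data"

# Read archive names on stdin and print every matching name
# except the newest $KEEP (i.e. the ones that should be deleted).
stale_archives() {
    grep "^${PREFIX}" | sort | awk -v keep="$KEEP" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

run_backup() {
    docker-compose down
    # Dump databases to plain text around here (service-specific, omitted).
    # BACKUP_PATHS is left unquoted on purpose so it splits into arguments.
    tarsnap -c -f "${PREFIX}$(date +%Y-%m-%d-%H%M%S)" $BACKUP_PATHS
    status=$?
    # Bring services back up even if the backup failed.
    docker-compose up -d
    # Only prune old archives after a successful backup.
    [ "$status" -eq 0 ] || return "$status"
    tarsnap --list-archives | stale_archives | while read -r name; do
        tarsnap -d -f "$name"
    done
}

# Run only when called with "run", so the functions can be sourced
# and tried out without touching any archives.
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

Keeping the pruning logic in its own function means it can be checked by piping a list of names into stale_archives before anything is allowed to delete a real archive.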