A setback and the importance of backups

One would think that someone who pushes the importance of backups as hard as I do, regardless of HA and DR (a position I maintain to this day), would have backups in place for their own infrastructure, right? Well, I didn’t. It was on my docket, but like many before me in their own environments, I put it off for a myriad of reasons. One is that I cannot for the life of me figure out why every attempt to mount an NFS share hosted on my FreeNAS server from any other box times out, with the only error and log message being that it timed out, and nothing leading up to it. Another is that I was working on stuff that was far more exciting, which is a low bar to begin with. Anything is more exciting than backups. And because of my rush to get to the seemingly greener grass on the other side of the fence, I lost almost everything. My only saving grace is the inherently decentralized nature of Git.

Alright, enough beating myself up for my stupidity. Let’s go over what happened, what I did, and what I am going to do. My OpenStack cluster was built using Packstack and the RDO project, which for me was a great way to stand up OpenStack quickly, learn its ins and outs, and get familiar with it without getting overwhelmed by its complexity. I was looking to deploy Kubernetes within OpenStack in order to start working with containers. However, the Heat service, which was required to deploy Kubernetes quickly, was not installed. So I edited my Packstack answer file to enable Heat and reran Packstack. It failed because Keystone was throwing errors, mainly that it couldn’t find a column in one of its database tables. These errors only started showing up after I reran Packstack. So I started tweaking by hand and came across instructions for v3 of the Keystone API, but not for v2, which is what Packstack had installed. So I attempted to upgrade Keystone to v3. The upgrade went fine, but Keystone was still throwing the same database error.

After a few hours of fighting and coming up with no reason why the database would be missing columns, I gave up. I didn’t know enough about the database to fix the problem, and my database/SQL-fu is lacking. I became depressed shortly after realizing that without Keystone working, absolutely nothing in OpenStack worked. Neutron, Nova, Horizon: everything depends on Keystone, which, judging by its very name, I should have been able to guess even with no OpenStack experience. After pondering for a bit, an idea occurred to me.

What if I were to recreate Keystone to match what the services are trying to reach? They should just authenticate and be on their way, right? Well, yes and no. The services themselves connected just fine, but the tenants they wanted to associate with the resources they already had and knew about were no longer there. At this point I had to make a choice. Do I try to recreate and adjust the database directly in order to restore what I had, OR do I just call it a loss, lesson learned, and recreate the whole thing? If I were in a business and had better SQL-fu on my Batman belt, then I would absolutely attempt to recreate and fix what is out there. But my goal is to learn and set up a cloud. While SQL is something I want on my toolbelt eventually, and I have some courses on my radar to address that, I didn’t want to jump into the deep pool with sharks while I am still learning how to swim and there is no lifeguard on duty. So I opted to rebuild the whole thing, making some notable changes this time around.

First big thing: when a service is brought up, before I even think about doing another service or something else, the backup scripts and jobs need to be up, running, and verified. Second, instead of leaving the OpenStack setup to RDO/Packstack, I’m going to set up OpenStack by hand with the current release (Ocata as of this writing; I was on Newton originally). Manual setup is going to help with a few things. 1) I will have a better idea of how to interact with OpenStack on a command line level and what it in turn is doing between the various components. And 2) RDO/Packstack has messed me up twice now, and both times it caused me to recreate the whole thing. While I would still recommend RDO/Packstack for people getting started with OpenStack and understanding what it is and how to work with it, I do not recommend it for a production-like deployment. I know that Packstack uses Puppet, which should in turn be idempotent. While I have seen Puppet work idempotently, I have not seen Packstack work that way, and it is just not working out for me.
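
To make "up, running, and verified" a bit more concrete, here is a minimal sketch in Python of what a verified backup could look like. This is not my actual backup script: the function names (create_backup, verify_backup) and the paths in the usage note are made up for illustration, it assumes the data is a plain directory of files, and it only uses the Python standard library. The idea is to write a compressed archive, then read it back and compare checksums against the source, so the job fails loudly instead of quietly producing a useless archive.

```python
import hashlib
import tarfile
from pathlib import Path

CHUNK_SIZE = 65536  # read files in 64 KiB pieces to keep memory use flat


def _sha256_of_stream(stream) -> str:
    """Return the SHA-256 hex digest of a binary file-like object."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def source_checksums(source_dir: Path) -> dict:
    """Map each regular file's relative path to its SHA-256 digest."""
    checksums = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                name = path.relative_to(source_dir).as_posix()
                checksums[name] = _sha256_of_stream(handle)
    return checksums


def archive_checksums(archive_path: Path) -> dict:
    """Map each regular file stored in the archive to its SHA-256 digest."""
    checksums = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile():
                checksums[member.name] = _sha256_of_stream(
                    archive.extractfile(member)
                )
    return checksums


def create_backup(source_dir: Path, archive_path: Path) -> None:
    """Write every regular file under source_dir into a .tar.gz archive.

    Assumption: archive_path lives outside source_dir, otherwise the
    archive would try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.add(path, arcname=path.relative_to(source_dir).as_posix())


def verify_backup(source_dir: Path, archive_path: Path) -> bool:
    """Return True only if the archive holds exactly the source's files."""
    return source_checksums(source_dir) == archive_checksums(archive_path)
```

Usage would be something like calling create_backup(Path("/srv/data"), Path("/mnt/backup/data.tar.gz")) from a scheduled job and treating a False from verify_backup as a failure worth an alert (both paths are hypothetical). A check like this only proves the archive matched the source when it was taken. It does not replace an actual test restore, and a database-backed service like Keystone would need a proper database dump rather than a copy of files.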

TLDR: Back up any data and service that you depend on. Even if it is boring, you waste far more time rebuilding after losing data than you would by just restoring it and moving on to the cool things you would rather be doing.