The Beauty of a Software Defined Data Center
Generally speaking, data centers have been custom built and manually managed, and they lack uniformity of physical resources. They consist of proprietary devices that have changed over the years. For many years, network and storage vendors lagged well behind on automation and didn’t innovate much. While you could automate a server build, to monitor and deploy software to any commodity hardware you still had to configure the network and storage by hand, or gamble on kludgy expect scripts and hope nothing went wrong. You had to know where a server was plugged in and hope your information was right, or disaster could strike.
These days your infrastructure is finally catching up to your software development practices on a large and generally available scale. You no longer have to be Google or Facebook with custom devices to get manageability and programmability in an easily repeatable fashion. Although this clearly points to virtualization, any company that cares about ultra high performance and ultra low latency has to manage a large footprint of bare metal servers as well. Let’s look at how we can manage all the infrastructure pieces so that we can spin up pristine autonomous systems.
If you’re an experienced Systems Administrator or Operations Engineer, over the course of your career you’ve probably developed a collection of tools and scripts to provision, manage, and monitor the performance of all the unique systems you’ve encountered in a variety of environments. You also likely have a great neck-beard you’ve been growing nearly as long. You know how to log in to any console or system and tweak just the right knobs to fix applications, network issues, performance issues or anything else you run into.
For the longest time, systems administration involved managing a bunch of individual servers as special snowflakes; servers existed in a specific environment, hosting a specific tool and only the necessary tweaks were applied.
Over the past few years, IT and Operations have been moving away from this model. The same type of server is managed and configured in the same way for all environments. Configuration management tools such as SaltStack, Puppet, and Chef are used to make configuration deterministic and repeatable; servers are to be treated as cattle and not pets.
This is all really great, and it’s really easy to configure a server with Salt once it’s built. But building a server, especially bare metal, can still be a hand-held process even when it is automated. It involves using Foreman (as we do, but you could use Puppet Enterprise, Cobbler, Fully Automated Installer (FAI) or MAAS) to PXE boot, run discovery, automate a preseed/kickstart, and fire off configuration management. A more advanced option might include NFS/SAN booting a server off a known base image, as with the interesting yet underutilized VMware Auto Deploy.
Of course, in the software defined datacenter we also need to use virtualization. It makes resources easier to manage and fully utilize. At Belvedere we're mostly experimenting with KVM and oVirt (open source RHEV) for features like:
- SR-IOV NICs that allow development teams to test kernel bypass and other more advanced features
- VM CPU and IO prioritization
- Ease of Administration
- Access to low latency data feeds and connectivity
- Low cost of entry
If you combine all these tools, you can programmatically build new bare metal or virtual servers. Such tools can take an inventory of available resources and build an entire environment in mere hours. If you get really fancy you could even have robots in the data center racking new servers on demand.
Servers all have different needs for network connections. Is this an LACP or trunked interface? What VLAN(s) does it need to support, and which is native? What port(s) is this server connected to on the switch? What’s the topology of the switch? Which of the many network routing protocols (BGP, eBGP, OSPF, IGRP or IS-IS) are in use for this new network? It used to be that your friendly network engineer or datacenter technician would need to provide all this information, set up new VLANs, enable routing, etc. before you could build a new network or server.
Now, we have standards like LLDP or CDP that are enabled everywhere. You can know how a server is connected and have standard information about how this type of server is set up. We now have fancy APIs like Cisco NX-API, Arista eAPI or the Junos REST API. Using these APIs you can query switches, look at the current state of affairs, programmatically map a network and generally discern everything you need in order to create something new (network, connection, etc.) without duplication, perhaps with a few rules about IP addresses and ranges for different areas.
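As a rough illustration of creating something new "without duplication", here is a minimal sketch in Python. It assumes you have already pulled the list of existing subnets and VLAN IDs from your switches; the function names, the address pool, and the VLAN range are all hypothetical and only stand in for whatever rules your network actually follows.

```python
import ipaddress


def next_free_subnet(pool, in_use, new_prefix):
    """Return the first subnet of the requested size in `pool` that
    overlaps none of the subnets listed in `in_use`."""
    pool_net = ipaddress.ip_network(pool)
    used = [ipaddress.ip_network(n) for n in in_use]
    for candidate in pool_net.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            return candidate
    raise RuntimeError(f"no free /{new_prefix} left in {pool}")


def next_free_vlan(in_use, low=100, high=199):
    """Return the lowest VLAN ID in [low, high] that is not already in use."""
    for vlan_id in range(low, high + 1):
        if vlan_id not in in_use:
            return vlan_id
    raise RuntimeError(f"no free VLAN ID between {low} and {high}")


# Hypothetical state, standing in for what the switch APIs would report.
existing_subnets = ["10.20.0.0/24", "10.20.1.0/24", "10.20.3.0/24"]
existing_vlans = {100, 101, 103}

subnet = next_free_subnet("10.20.0.0/16", existing_subnets, 24)
vlan = next_free_vlan(existing_vlans)
print(subnet, vlan)  # prints: 10.20.2.0/24 102
```

The "rules about IP addresses and ranges for different areas" become the pool and range arguments, so each area of the network can be given its own.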
We also have truly software defined solutions such as VMware NSX, AWS VPCs and virtual switches that we can use and plumb together programmatically.
Until recently, storage vendors have been living in the past, with storage controllers that were hard to manage and supported few open standards. Every new tool you wanted would cost a lot of money and make life only slightly easier. In 2005, supplied tools such as HORCM for replication still felt straight out of the 1970s. You had to have a Fibre Channel network deployed and connected. Upgrades required a forklift and long outages with a lot of risk.
All that’s changing now. Upstart vendors like XtremIO, Pure, Nimble, Kaminario and SolidFire gave the classics a run for their money. Their platforms are easy to use and manage. They support easier to integrate standards like FCoE or iSCSI that require no additional infrastructure. They can snapshot existing LUNs while reducing the amount of required space. They have great dashboards to monitor and track ongoing performance. You can safely and transparently move LUNs and data between controllers. We even have truly software-defined solutions that can run on commodity hardware, such as VMware vSAN, Ceph, Gluster and a handful of ZFS based products. These offer most of the same features, although sometimes with steep learning curves.
Generally, storage has come a really long way in the last 10 years. All this is great because you don’t need a dedicated team of storage administrators for modern storage platforms. Modern storage platforms also all have the most important feature for a software defined datacenter: an API that can be used to manage connections, initiators, targets, snapshots and clones, which makes storage easy to control programmatically.
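To make the snapshot-and-clone idea concrete, here is a small Python sketch. It does not talk to any real array: `FakeArray` is a hypothetical in-memory stand-in, and its method names are assumptions rather than any vendor's API. The point is only the shape of the workflow that such an API makes scriptable.

```python
class FakeArray:
    """Hypothetical in-memory stand-in for a storage array's management API."""

    def __init__(self):
        self.volumes = {}      # volume name -> size in GB
        self.snapshots = {}    # snapshot name -> size in GB of the source volume
        self.mappings = set()  # (volume name, initiator name) pairs

    def create_volume(self, name, size_gb):
        if name in self.volumes:
            raise ValueError(f"volume {name} already exists")
        self.volumes[name] = size_gb

    def snapshot(self, volume, snap_name):
        # Record a point-in-time copy of an existing volume.
        self.snapshots[snap_name] = self.volumes[volume]

    def clone(self, snap_name, new_volume):
        # Create a new writable volume from a snapshot.
        self.create_volume(new_volume, self.snapshots[snap_name])

    def map_volume(self, volume, initiator):
        # Expose a volume to a host, identified by its initiator name.
        if volume not in self.volumes:
            raise KeyError(volume)
        self.mappings.add((volume, initiator))


def refresh_copy(array, source, target, initiator):
    """Snapshot `source`, clone it to `target`, and present it to `initiator`."""
    snap_name = f"{source}-snap-for-{target}"
    array.snapshot(source, snap_name)
    array.clone(snap_name, target)
    array.map_volume(target, initiator)
    return snap_name


array = FakeArray()
array.create_volume("prod-db", 500)
refresh_copy(array, "prod-db", "qa-db", "iqn.2015-01.example:qa-host-1")
print(array.volumes["qa-db"])  # prints: 500
```

With a real array, the three calls inside `refresh_copy` would become requests to the vendor's API, but the sequence would look much the same.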
Putting it All Together
So what good is all of this? Developers and QA often want it all and they want it now. If you’ve ever worked in infrastructure or operations you know how true this is: "I need a copy of this database", "I need a totally isolated environment to develop these 5 features", "I need a full QA environment for this hotfix that needs to go out tomorrow!", "Oh, and in 10 days we’ll stop using it, resources will be wasted, but we won’t tell you..." All the great advances in infrastructure now make it possible to manage all these types of requests in an automated and repeatable fashion.
This is why at Belvedere we’re developing a tool we're calling Hydra. Hydra will help Belvedere dynamically provision and deploy subsets of technical systems in new, existing, and time-boxed temporary environments very quickly. We hope to:
- Improve developer workflow by providing a mechanism for dynamic deployment of isolated applications in new, existing, partial and temporary environments
- Improve automated integration testing by providing stable, consistent environments in which these tests can be run
- Remove shared environment instability for manual QA testing
- Improve production deployment by removing the need for manual configuration of services/clients
In other words, by programmatically using modern compute, network and storage capabilities, we’re implementing tools that make it possible to quickly manage production, multiple QA and development environments. And for fun we're also including the ability for every developer to build complete, sprint-long custom environments with a single command.
So what will Hydra do?
- Need a sanitized copy of the prod database? Ok, but we’ll automatically tear it down at the end of 1 or 2 sprints.
- Need an environment of 10 servers with different purposes and different uses, some requiring bare metal, others living on a VM? No problem. Hydra will build and manage the full life cycle of an environment.
- Tear down unused resources and notify people of expensive underutilized resources.
- Build you a private network with DNS, configuration management and custom application deployment.
- Keep track of and manage what resources you have and what branches you’re working off, and allow continuous integration without impacting the rest of your development team.
- Keep you from connecting to the wrong environment and developing in prod by mistake.
Most importantly, we'll have a quality inventory and know what's available and what you've allocated. We can charge back your team for just what you need and use, and we'll never overallocate resources. We'll also be able to manage necessary resources and look for patterns in growth without just throwing more and more hardware at the problem.
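To give a feel for what that inventory might look like, here is a minimal Python sketch. This is not Hydra's actual design: the `Inventory` and `Lease` names, the host counts, and the 14-day default are all assumptions made for illustration. It shows the three behaviors described above: refusing to overallocate, expiring time-boxed environments, and reporting usage per team for chargeback.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Lease:
    team: str
    hosts: int
    expires: date


class Inventory:
    """Hypothetical inventory that tracks leased hosts against a fixed total."""

    def __init__(self, total_hosts):
        self.total_hosts = total_hosts
        self.leases = []

    def available(self):
        return self.total_hosts - sum(lease.hosts for lease in self.leases)

    def allocate(self, team, hosts, today, days=14):
        # Refuse any request that would exceed what is actually free.
        if hosts > self.available():
            raise ValueError(f"only {self.available()} hosts free, {hosts} requested")
        lease = Lease(team, hosts, today + timedelta(days=days))
        self.leases.append(lease)
        return lease

    def reap(self, today):
        # Release every lease whose expiry date has been reached.
        expired = [lease for lease in self.leases if lease.expires <= today]
        self.leases = [lease for lease in self.leases if lease.expires > today]
        return expired

    def usage_by_team(self):
        # Hosts currently held per team, as input for chargeback.
        usage = {}
        for lease in self.leases:
            usage[lease.team] = usage.get(lease.team, 0) + lease.hosts
        return usage


inventory = Inventory(total_hosts=10)
inventory.allocate("qa", 6, today=date(2024, 1, 1))
inventory.allocate("dev", 3, today=date(2024, 1, 8))
print(inventory.usage_by_team())              # prints: {'qa': 6, 'dev': 3}
print(len(inventory.reap(date(2024, 1, 15))))  # prints: 1
print(inventory.available())                  # prints: 7
```

A real version would also have to trigger the actual teardown of servers, networks and storage when a lease is reaped; here only the bookkeeping is shown.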
We’re looking forward to a beautiful new development paradigm.