Work History

I’m passionate about building network automation and operations platforms. I’ve built and run global service provider and CDN networks. I’ve built automation solutions with Perl, Python, Ansible, and Salt, using all kinds of “sources of truth” and various levels of automation and orchestration. Making life easier for network engineers and making the network more useful to other operations teams is my goal.

Early Nerditry

I’m from a small town in central Georgia, Zebulon. I started my nerditry pretty early. Around first grade I received a TRS-80 Color Computer 2 for Christmas. It was mostly for playing games and improving hand-eye coordination, but I quickly fell in love with writing BASIC programs and making the screen blink. I would spend hours meticulously copying code out of books to build small games, then spend hours more tweaking the code to make things look and operate the way I wanted.

Eventually I upgraded to an IBM XT and got started learning MS-DOS 2.x. Once I hooked up a modem, dialed into GA Tech’s systems (free at the time), and learned about BBSes, I was completely hooked. After years of playing with this box and learning all about how it operated, I decided it was time for an upgrade. After hours of browsing catalogs and magazines I found a local PC company and spec’d out a 486 DX2 with 8MB of RAM. Man, I was SO excited. I was having it built for my Christmas present that year. During school Christmas vacation, while my parents were at work, I would take it out of the box and work on getting it set up just how I liked it, then carefully place it back in the box before they arrived home. It was amazing how well it worked and how completely it was set up when Christmas finally came and I could open it up and immediately start running.

This is where I learned a lot more about BASIC programming and spent a great deal more time learning MS-DOS. I recall carrying an MS-DOS Pocket Reference Manual around with me at school. It was fun learning to build batch files and string tools together. Using the ASCII drawing tool TheDraw I built a whole menu system to run all my favorite tools and games. I even wrote a set of batch files to automatically take files downloaded from BBSes using QModem, move them into a specific set of directories, categorized appropriately, unzip them, and clean up any unneeded bits.

Of course during all this time my friends and I spent many, many hours playing all the popular Sierra Games. All of the Space Quest, King’s Quest, Police Quest, and of course Leisure Suit Larry games!

Early Career

A&B Computers (1993-1997)

I think it was my sophomore year of high school when I went to a week-long “PC Troubleshooting and Maintenance” class. One of the things they stressed was to find the local “Mom ’n Pop” computer shop and become friends with them, so as to keep a good pipeline of parts and know-how handy. Once I was home I went on the hunt and ended up scoring my first real job; I hounded them mercilessly, offering to work for almost free just for the experience. At the time I got paid $7 for every new PC I built and $11 for every used one. The owner would spend weekends going to swap meets and shows picking up van loads of miscellaneous parts. I would then go through everything, picking out the best working parts and building systems.

The owner of the PC shop and I once had a contest to come up with the best CONFIG.SYS setup to make the most memory available to the OS. He ultimately won, but only because he used an EMM386 trick to steal memory from the video card. It was a fun time.

I eventually got more involved with business PC systems and software. In my junior year of high school I studied for and received a Novell NetWare 3.12 CNA certification. This allowed me to be billed out to businesses for real work. I spent a lot of time at Carter’s building and deploying systems to run their admin and plant networks.

Trusted.net (1999-2002)

After graduating and running off to college, I had various jobs doing the same kind of work: building and troubleshooting PCs and small business networks. Being the nerdy type, I had an ISDN line installed in my apartment so I could have better than plain dial-up Internet. The trouble was finding someone who supported that type of connection for a decent price. After some searching I identified an ISP in Marietta, GA that could help me out, and after talking with them a bit I ended up going to work for them, doing basic support for everything from dial-up managed on Livingston PortMasters, to Frame Relay, V.35, T1, T3, and DSL services, and managing email, DNS, and DHCP services on Slackware and *BSD. I learned a lot, and it was fun!

First Real Automation Tooling

Most of the support issues I dealt with were due to misconfigurations of some sort. This ranged from incorrect email configuration settings to error-prone DSL configuration items. I hated spending all that time chasing down these issues, and customers were none too happy either. Using the ever-popular LAMP stack, I built a tool where we could input settings into a form and the tool would generate the needed configs, everything from email to DLCI mappings for DSL. Initially the tool would spit out a text file that we could copy and paste into a config or onto a device to set up a user. Eventually I had it automatically logging into the device or host and making the changes for us. This saved a ton of time and made sure users were set up correctly the first time. As with most early-2000s service provider companies, we were absorbed, pieced off, and eventually laid off.
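The idea behind that tool can be sketched in a few lines of Python. The field names and the Frame Relay template below are invented for the example; the real tool was a LAMP web app with far more knobs.

```python
# Hypothetical sketch: take validated form input and render it into a
# device-ready config snippet. Template and fields are illustrative only.
from string import Template

DSL_TEMPLATE = Template(
    "interface Serial0/0.$subif point-to-point\n"
    " description $customer\n"
    " ip address $ip 255.255.255.252\n"
    " frame-relay interface-dlci $dlci\n"
)

def render_dsl_config(form: dict) -> str:
    """Render a DSL customer config from the submitted form values."""
    return DSL_TEMPLATE.substitute(form)

print(render_dsl_config({
    "subif": 101, "customer": "acme-corp", "ip": "10.1.1.1", "dlci": 101,
}))
```

The same pattern (structured input, one template per service) covered everything from mail settings to DLCI mappings.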

Mid Career

Internap (2002-2014. 2015-2017)

After being laid off and coasting on severance, I continued to sharpen my Linux and programming skills. I eventually found work as Tier-1 support for Internap. It was here that I really started to learn large-scale networking. Internap was a pretty cool place with a very strong engineering culture. Everyone was encouraged to come up with tools and processes to improve operations, and semi-weekly meetings were held for engineering staff (and anyone else interested) to showcase new tools, processes, and ideas. I eventually moved up the ranks from Tier-1 NOC support to Tier-2, and then into the IP Operations team. While on the IPOps team I designed and deployed our first DWDM networks in Atlanta and Miami, built on the ADVA FSP3000 platform. No ROADM fun here; all manual attenuation and patching on the front panels. I was also responsible for the initial configuration, deployment, and daily care-and-feeding of multiple PoPs across the world. After several years I moved to the Backbone Engineering team and then into Architecture as Senior Network Architect, responsible for the design and engineering of new products and services while working closely with vendors to choose appropriate hardware for those solutions. Additionally, I led the effort to implement IPv6 for the network and services.

Fleet Refresh

One of the larger projects I led was replacing our aging fleet of Cisco Cat6ks. This project would define our topology, services, and platform for the next decade or more, replacing gear in 50+ PoPs, approximately 200+ routers and switches in all. We built our requirements and handed out RFQs to many vendors, then ran several concurrent Proof-of-Concept labs with Brocade, Cisco, and Juniper. It was a monumental effort that required working with all parts of the company and the vendors. Once the vendors and platforms were chosen, the real work began: integration into our monitoring and automation systems. Pages and pages of documentation on deploying, operating, and general technical information were written and published. This required months of lab work, taking into account all the power, space, and wiring needed. The project took almost two years from inception to first deployment, but it was a success.

go-ssh

Being a hater of tedious and repetitive tasks, it wasn’t long before I started putting my skills to work. One of the first tools I built was a wrapper around ssh. It had two primary functions. First, it would log all output of a session to a ~/daily_log/YYYY/MM/DD/hostname-YYYY-MM-DD-mm-ss.#.log file. This was HUGELY helpful for looking up some command I had run previously or checking an old troubleshooting session to see how things had looked before. It also helped to show, generally, what I was doing every day. The second function was to automate the targeting of jump hosts to reach devices. The network was set up so that from our desktop host we would ssh into a central jump point, then to a management host in the remote PoP, and then to the ultimate device. That meant executing three ssh commands to log into a router. The wrapper would chain all these commands together so that one need only input the end device; it would figure out which hosts to jump through, build the ssh command, execute it, and drop you into an ssh session on the device, all the while logging the session’s input and output to plain text files in the user’s home directory.
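The wrapper’s two jobs can be sketched in a few lines of Python. The hostnames and the jump-selection rule here are invented for the example, and I use OpenSSH’s -J option for the chaining; the log path is simplified (no per-session counter).

```python
# Illustrative sketch: pick the jump path for a device and build a dated
# log-file path. All names below are hypothetical.
from datetime import datetime
from pathlib import Path

JUMP_HOST = "jump.example.net"  # hypothetical central jump point

def mgmt_host(device: str) -> str:
    # Assume the PoP code is the first dash-separated label of the name.
    pop = device.split("-")[0]
    return f"mgmt.{pop}.example.net"

def build_ssh_command(device: str) -> list:
    """Chain through the jump hosts using OpenSSH's -J option."""
    jumps = ",".join([JUMP_HOST, mgmt_host(device)])
    return ["ssh", "-J", jumps, device]

def log_path(device: str, now: datetime) -> Path:
    """Dated per-session log file under the user's home directory."""
    day_dir = now.strftime("%Y/%m/%d")
    stamp = now.strftime("%Y-%m-%d-%H-%M")
    return Path.home() / "daily_log" / day_dir / f"{device}-{stamp}.log"
```

The real wrapper also captured every byte of the session into that log file, which the sketch leaves out.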

gencf

Many years and tools later I had become the go-to guy for development on our automation tool, gencf. This tool would take in router config in a DSL (Domain Specific Language) and could then express it for all the various platforms we used: Cisco Cat6ks, Cisco ASR9000s, Cisco 7206VXRs, Juniper MXs, Brocade MLXs, Juniper EXs, and Brocade SLBs. An additional function of this system was to provide a diff between the generated config and the device’s running config. If the engineer was happy with the diff, the tool would perform some basic operating checks, apply the config, collect more operating checks, and then present the user with a full status report of the deploy. While I did not write the original tooling, I was its maintainer for the last several years I was at Internap.
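A toy version of the gencf idea, with an invented mini-model and two invented renderers (the real DSL and platform support were far richer): one vendor-neutral description of an interface, rendered per platform, then diffed against the running config for review.

```python
# Sketch only: a dict stands in for the parsed DSL, and each renderer
# expresses it in one platform's syntax.
import difflib

def render_ios(intent: dict) -> str:
    return (f"interface {intent['name']}\n"
            f" description {intent['desc']}\n"
            f" ip address {intent['ip']} {intent['mask']}\n")

def render_junos(intent: dict) -> str:
    return (f"set interfaces {intent['name']} description \"{intent['desc']}\"\n"
            f"set interfaces {intent['name']} unit 0 family inet "
            f"address {intent['ip']}/{intent['plen']}\n")

RENDERERS = {"ios": render_ios, "junos": render_junos}

def config_diff(generated: str, running: str) -> str:
    """Unified diff the engineer reviews before anything is applied."""
    return "".join(difflib.unified_diff(
        running.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="running", tofile="generated"))
```

The one-model, many-renderers split is what let a single change be expressed correctly across seven platforms.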

Internap MIRO - Patent US9525638

One of the primary business cases for Internap was selling its blended transit model, managed by MIRO (Managed Internet Route Optimizer). This tool would direct egress traffic from the PoP based on performance and cost. It would look at flows incoming to our customers in the PoP and then probe those sources across each of the PoP’s upstream providers. Given that we now had performance numbers (loss and latency) for “important” prefixes, knew how much traffic each prefix represented, and knew the bandwidth we had from each upstream provider and the cost of each, we could calculate a “solution” that optimized for both performance and cost. The tool would perform these calculations about every hour and move traffic as needed. It was really pretty neat. While I was in IPOps the 2nd generation of this tool was deployed and worked pretty well. Several years later it was decided that we could take advantage of technology advances and rebuild MIRO to be more effective. By then I had moved into the Architecture department and was lead on this project for the technical implementation and all network requirements. The 3rd generation made changes approximately every 90 seconds, was much more reactive to problems on the internet, had better reporting, supported IPv6, and added a plethora of other features. We were able to secure a patent on the work: Routing System for Internet Traffic - Patent US9525638.
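A drastically simplified sketch of the per-prefix decision follows. Real MIRO solved performance and cost jointly; this greedy version is only to show the inputs involved (per-upstream probes, per-prefix traffic, upstream capacity and cost), and every number and name is invented.

```python
# Toy egress optimizer: for each measured prefix, pick the cheapest
# upstream that meets a latency/loss bar and still has headroom.

def pick_upstream(prefixes, upstreams, max_latency_ms=100.0, max_loss=0.01):
    """prefixes: list of {'prefix', 'mbps', 'probes': {upstream: {...}}}.
    upstreams: name -> {'capacity_mbps', 'cost_per_mbps'}.
    Returns a prefix -> upstream plan (greedy, illustrative only)."""
    used = {name: 0.0 for name in upstreams}
    plan = {}
    # Place the heaviest prefixes first so they get capacity.
    for p in sorted(prefixes, key=lambda p: -p["mbps"]):
        candidates = []
        for name, probe in p["probes"].items():
            up = upstreams[name]
            if (probe["latency_ms"] <= max_latency_ms
                    and probe["loss"] <= max_loss
                    and used[name] + p["mbps"] <= up["capacity_mbps"]):
                candidates.append((up["cost_per_mbps"], probe["latency_ms"], name))
        if candidates:
            _, _, best = min(candidates)
            used[best] += p["mbps"]
            plan[p["prefix"]] = best
    return plan
```

Re-running a computation like this every cycle, and announcing the winners into the PoP’s routing, is the shape of the system; the 3rd generation just did it every ~90 seconds instead of hourly.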

Synchronoss (2014-2015)

At Synchronoss I spent most of my time on general network admin and engineering tasks. However, given my experience with IPv6, I was responsible for the design and deployment of IPv6 across the company. This required a general design template detailing how the address space was allocated, including internal LAN and admin networks along with data center networks. I also implemented tooling to track and manage both IPv4 and IPv6 allocations.
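The allocation-tracking idea can be sketched with the stdlib ipaddress module. The prefix sizes, documentation supernet, and site names below are illustrative, not the actual plan.

```python
# Sketch: carve per-site /48s out of a /32 and remember who has what.
import ipaddress

class AllocationTable:
    def __init__(self, supernet: str, new_prefix: int):
        # subnets() yields the child networks in address order.
        self._free = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
        self.assigned = {}

    def allocate(self, site: str):
        """Hand the next free child network to a site and record it."""
        net = next(self._free)
        self.assigned[site] = net
        return net

table = AllocationTable("2001:db8::/32", new_prefix=48)
print(table.allocate("atl-dc1"))  # first /48 in the block
print(table.allocate("mia-dc1"))  # the next one
```

The same class works unchanged for IPv4, which is why one tool could track both families.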

GreenSky (2017-2018)

At GreenSky I spent most of my time working in Amazon’s AWS environments. The network used on-prem gear along with virtual routers in AWS to connect and manage traffic between physical sites and virtual sites across the many AWS regions. I implemented multiple monitoring systems to alert on faults anywhere in the routing across the network and to report on resource utilization. I also implemented systems to track configurations and ensure network devices’ compliance with standards.

Verizon Media / Oath / EdgeCast / Edgio (2018-2021, 2023-)

At Verizon Media I was originally hired to perform network architecture and engineering tasks but found myself spending most of my time on network automation. The system was built on top of Ansible, a huge corpus of YAML, and tools to pull in data from other “sources of truth”. The system was mostly in place when I started, but I added lots of features and capabilities and became the general SME and primary maintainer. I was also the primary person managing the systems the automation ran on. I managed the Salt recipes so the underlying machines could be quickly and easily deployed, migrated these systems from a single host to a primary/backup setup across multiple data centers, upgraded everything from Python 2.x to 3.x, and standardized methods for deployment and testing.

The tool worked pretty well but was slow. It took several minutes to compile all the data and then express it through Jinja to ultimately be pushed to a device. I used Ansible’s built-in profiling tools to find the processes that took the most time and then focused on making those faster. Ultimately we were able to reduce the time taken by 70%.
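For reference, that built-in profiling is switched on through Ansible’s callback plugins; in the Ansible 2.x era a fragment along these lines in ansible.cfg enables per-task timing (the exact option and callback names vary by Ansible version):

```ini
[defaults]
# Print per-task and total timing at the end of each run so the slow
# steps stand out.
callback_whitelist = profile_tasks, timer
```

With the timings in hand, the slowest fact-gathering and templating tasks are the obvious first targets.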

Data Model Verification

A consistent issue with the automation tools was the data being input into the models by hand-editing YAML. Even though the data was syntactically valid YAML, it sometimes did not make sense for the deployment models, the templating, or the grammar of the device. I wrote a tool to check for common mistakes: making sure LLDP was enabled (or disabled) on the correct interface types, that BGP peers had proper ASNs, that netflow was enabled on the needed interfaces, that IP addresses were attached only to layer-3 interfaces, and so on. The tool ran these checks before the user performed a test build or test deploy and, if any problems were found, indicated exactly what each problem was and how to fix it. Several of the tests included a link to documentation detailing the issue and specifically what was expected and why. This greatly cut down on support issues from users.
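A sketch of the model-checking approach: run a list of rule functions over the parsed YAML and collect human-readable findings. The rules and data shape below are invented examples, not the production checks.

```python
# Each rule yields findings for one class of mistake; validate() collects
# them all so the user can fix everything in one pass.

def check_bgp_asns(model):
    for peer in model.get("bgp_peers", []):
        if not 1 <= peer.get("asn", 0) <= 4294967295:
            yield f"bgp peer {peer.get('ip')}: ASN {peer.get('asn')} out of range"

def check_lldp(model):
    for iface in model.get("interfaces", []):
        # Hypothetical policy: LLDP must be off on external-facing links.
        if iface.get("role") == "external" and iface.get("lldp", True):
            yield f"interface {iface['name']}: LLDP should be disabled on external links"

RULES = [check_bgp_asns, check_lldp]

def validate(model):
    """Return every finding from every rule, not just the first failure."""
    return [finding for rule in RULES for finding in rule(model)]
```

Attaching a documentation link to each rule is then just one more field next to the message.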

Deployment Verification

The tooling aided significantly in making sure the data represented in our model was deployed to the devices. However, it did not do much to ensure that what was in the model was exactly what was intended, and further, it did not verify the operational state of the device after a change was made. I developed and wrote tooling to grab operational stats and the configuration from the device just before a deploy and again afterwards. It would also check routes, BGP peers, etc., and present the user with a concise report detailing whether anything unexpected had changed or was out of spec. The tool used pytest as its “testing” engine, which made the output easy to parse, diff, and display, not only for the user but for later validation if needed.
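The pre/post comparison can be sketched as a pytest-style assertion helper. The snapshot fields below are illustrative; the real tooling pulled much more state from the devices.

```python
# Snapshot operational state before and after a deploy, then assert that
# nothing unexpected moved. Failures read like any other pytest failure.

def assert_state_stable(pre: dict, post: dict, route_tolerance: int = 0):
    """Fail loudly if BGP peers were lost or route counts drifted."""
    assert post["bgp_peers_established"] >= pre["bgp_peers_established"], (
        f"lost BGP peers: {pre['bgp_peers_established']} -> "
        f"{post['bgp_peers_established']}")
    drift = abs(post["routes"] - pre["routes"])
    assert drift <= route_tolerance, (
        f"route count moved by {drift} (allowed {route_tolerance})")

pre = {"bgp_peers_established": 12, "routes": 850_000}
post = {"bgp_peers_established": 12, "routes": 850_000}
assert_state_stable(pre, post)  # a clean deploy passes silently
```

Because every check is a plain assertion, pytest’s reporting gives the diff-friendly output described above for free.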

Partial Deployments

The automation tooling primarily made deployments by uploading the entire config every time something changed, which helped ensure the devices were kept in compliance. However, it had the unintended consequence that changes one person made could affect the entire network (things like a prefix-list update for a peer), so a user making an interface update would see, and have to reconcile, those changes from other users. Since this caused unnecessary work and cognitive load, we decided to investigate partial uploads. I ran with this project and developed the workflow to keep the “full” builds consistent while allowing partial uploads. I then developed all the tooling and processes within Ansible and Python to make partial uploads a reality, including major reworks of the core Python code and Ansible playbooks. Once deployed, our velocity for rolling out changes such as upstream provider and peering point adds and changes was greatly accelerated.
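One way to sketch the partial-upload selection: hash each logical config section and push only what changed, while the periodic full builds still enforce compliance. The section names below are illustrative, and the real workflow had much more around it.

```python
# Select only new or modified config sections for a partial deploy.
import hashlib

def section_hashes(config_sections: dict) -> dict:
    """Stable fingerprint per logical section of the rendered config."""
    return {name: hashlib.sha256(body.encode()).hexdigest()
            for name, body in config_sections.items()}

def changed_sections(old: dict, new: dict) -> list:
    """Sections to push: anything whose fingerprint differs or is new."""
    old_h, new_h = section_hashes(old), section_hashes(new)
    return sorted(name for name, h in new_h.items() if old_h.get(name) != h)
```

With this, an interface edit ships only the interface section; someone else’s pending prefix-list change waits for its own deploy or the next full build.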

Imperva (2022)

I had followed my director over to Imperva to help build and implement a network traffic controller. After spending a month or so there, it became clear this wasn’t something they really wanted or needed at the time, and I was placed into the DevOps/SRE group. Most of my responsibilities involved managing several Jenkins pipelines and the Salt infrastructure recipes, work primarily focused on deploying new versions of OS software and their primary internal traffic-filtering applications. I had worked with Prometheus and Grafana quite a bit at Verizon Media, but here I spent a great deal of time on the management of dashboards, alerting, and configuration. I built several tools to generate dashboards as new hosts came online, since some of the specific monitoring requirements couldn’t easily be met with normal Prometheus queries directly.
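The dashboard generation can be sketched as stamping out Grafana dashboard JSON per host. The panel layout and the Prometheus query below are placeholders, not the real queries, and the real dashboards carried many more panels.

```python
# Generate a minimal per-host Grafana dashboard as JSON.
import json

def make_dashboard(host: str) -> str:
    dashboard = {
        "title": f"{host} overview",
        "panels": [{
            "title": "CPU usage",
            "type": "timeseries",
            "targets": [{
                # Hypothetical PromQL scoped to just this host.
                "expr": (f'1 - avg(rate(node_cpu_seconds_total'
                         f'{{mode="idle",instance="{host}"}}[5m]))'),
            }],
        }],
    }
    return json.dumps(dashboard, indent=2)
```

Output like this can then be pushed through Grafana’s HTTP API whenever inventory reports a new host.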

Edgio (EdgeCast + Limelight) (2023-)

Near the end of 2022 the director I had followed to Imperva left to go back to EdgeCast to run their whole operations org. Since I wasn’t doing much network automation and he still wanted to build a network controller, I followed him back. I started in January of 2023 but was placed on the Limelight side of the house. I immediately began working on their network tooling, which involved adding capabilities to the aging RANCID deployment: support for new device types, commands, etc. Additionally, I began working on an orchestration and automation system that could be used for both Limelight and EdgeCast. My manager at the time was instrumental in working through all the requirements and capabilities needed, and we worked out a pretty good system. While working on the new system design I still had projects that needed work immediately. I created Salt recipes to upgrade many of the network group’s systems and manage all the applications, with a lot of work specifically for StackStorm. Around mid-2023 I was moved back over to the EdgeCast side of the house to help speed along many of the network deployment projects. Since I was already familiar with all the tooling, I was able to jump right in. While working on some of the Ansible playbooks, router templating in Jinja2, and backend systems, I spent time updating hosts that hadn’t been managed in a while. During this time I was still working toward the “next-gen” network automation and orchestration system that could manage the fleet on either the EdgeCast or Limelight side, using different sources of truth and network architectures. The idea was to build a system that could be used as a general framework and adapt to the unique requirements and workflow of either network.