Into Cloud Networking


After a long break, I went back to my cloud networking studies and invested some time in pursuing the AWS Advanced Networking Specialty certification. At first, I thought this would not be a huge undertaking given my background in networking, but I was wrong. This is more than networking: it also covers security controls, load balancing, content delivery, name resolution, computing, automation, monitoring, and other related services. I ended up with a 60% score on my first attempt and realized right after that I needed to focus on areas I had not explored enough because, again, this is more than networking. If I had passed right away, I would not have learned as much as I did when trying again.

In the hope that my experience may help others, here is a list of supporting materials that worked for me along the way.

Online Training
I reused the online training I had taken for the associate exam earlier in the year, and gave it another go for this exam as well. However, it does not have hands-on labs (as of this writing). This Udemy training has a few hands-on labs, which were very helpful and insightful. I leveraged a business subscription, but both providers have affordable options, and one can leverage the AWS free tier.

Another must-attend online training is the Exam Readiness course provided by AWS. It is a 9-hour training divided into modules mapped to the exam domains, and it is very comprehensive.

The list is almost endless; it all depends on how much we already know about certain technologies or services. I did not keep track of every single blog or guide I read or revisited, but I bookmarked some of the most important ones I found during the preparation:

Multiple Data Center HA Network Connectivity
Single Data Center HA Network Connectivity
Multiple-VPC VPN Connection Sharing
Multiple Region Multi-VPC Connectivity
Amazon Web Services – A Practical Guide
VPC Peering Scenarios
AWS Direct Connect

This also includes a few deep dive videos from re:Invent on Direct Connect and VPNs (NET402, NET403, NET404), and specific FAQs for VPC, Route53, ELB, DX, and CloudFront.

Practice Tests
I used practice tests from Whizlabs and Udemy, the latter having more descriptive and challenging scenarios. I think one complements the other.

Slack Channel
A few months ago, Daniel Dib created a channel for Cloud and Networking. It is an amazing initiative and while there may be many others out there, this one definitely deserves a follow.

Now I will take some weekends off and catch up on Grey's Anatomy ;=P

Introduction to Automation Anywhere

At first, one may think this is about automating a network or any kind of API-driven device with the intent to eliminate repetitive tasks, and not necessarily a business process or workflow. Well, at least that was my first thought when I heard about Automation Anywhere, a developer of Robotic Process Automation (RPA) software. There are not that many RPA vendors out there, so an introduction to what RPA means and why it is considered an emerging technology sounds appropriate at a time when automation is such a hot topic.

Robotic Process Automation

It is a technology to automate business processes using software robots (or simply bots) or artificial intelligence (AI) workers. RPA may also be referred to as “Intelligent Process Automation” by the industry, and the bots or workers as the Digital Workforce. Does it mean they are replacing humans? I would say they are helping humans by performing routine, repetitive, or sometimes cumbersome tasks more efficiently. As opposed to traditional automation, where a specific scripting language or API is required most of the time, RPA captures the list of actions and interactions a user performs as part of a process in one or more graphical user interfaces (GUIs), and then performs those actions directly on the GUI on behalf of the user, provided proper rules are defined. It essentially learns from human behavior.

It is not only about performing typical human tasks, but intelligently handling data in or between multiple applications, such as looking at an invoice or a spreadsheet, extracting the data, and adding it to a bookkeeping system to record financial transactions as part of a business accounting process. The software bots can complete a business process end to end, and the technology can be applied anywhere, as long as there is a business need and desire for process speed and optimization.

The Market and Business Value

As with most things, what are the market and business value? A recent article from Gartner says the global RPA market grew by 63% last year, reaching around $846 million in revenue, and expects it to reach $1.3 billion this year. This is also validated by the Global Intelligent Process Automation Market report from Data Bridge, with an estimated value of $19.79 billion by 2026. While predictable numbers make RPA appealing, organizations are also looking for the business value. The promise is that by automating more of the business processes built on repetitive tasks, employees are freed to focus on value-added activities.

Automation Anywhere

Automation Anywhere was the first RPA company ever to present at Tech Field Day, in June 2019. At TFD19, the first 10 minutes were a very passionate introduction from Mihir Shukla, CEO. He did not explain the how or the what, but the why, which is to “bring the power of automation to the masses, democratize it so that simple automation can be done by a business user and more complex automation can be done by a developer, with a vision that anything that can be automated should be automated“.

This is evident from the vast amount of collateral and resources published on their website, which also includes a Bot Store. Besides the Enterprise version, Automation Anywhere also released a Community version. I am always pleased when I see companies educating consumers and practitioners about their products and providing a means to actually interact with them.

Additional speakers from the Product Management team followed: Abhijit Kakhandiki presented the overall product vision, and Steve Shah gave a very detailed view of what is needed for an RPA adoption to be successful, along with a product deep dive. He went over the combination and collaboration of IT, business, and developers, and what is expected from each stakeholder. He also gave a short brief on the microservices architecture, composed of (but not limited to) the Bot Control Room, Bot Creator, and Bot Runner, as well as the difference between unattended and attended bot use cases.

In the second part of the presentations, Steve Van Lare gave a view of IQ Bot, which adds cognitive automation capabilities (i.e., document and language understanding) to RPA, essential when locating and organizing unstructured data before a process can begin. This is where one can pick up some fundamentals of AI skills and their different categories (build, train, deploy, and learn). Automation Anywhere leverages its own AI stack or third-party AI stacks. Brendan Foley closed the session with a view of the available resources, from users to developers, and also a glimpse of the Bot Games initiative.

As a first appearance for Automation Anywhere, I think this provided a lot of insight into the technology and also things to think about with regard to the future workplace. Demonstrations with real use cases across industries could be an extension in upcoming sessions. As for me, I am very grateful to be part of TFD19, which allowed me to be on the delegate panel one more time and have the opportunity to learn more about what is out there, especially when I am not actively looking at these emerging technologies.

Where to Start?

If you want to learn more about bots, the “Getting started with Automation: 7 steps to building and managing bots” guide is a good start.

The Future

The future is here. Tomorrow is just a better version of what we have available today.


In Between

I graduated from a 9-month fast-track leadership development program through Dimension Data. The last 3 months were the most intense and time-consuming. In a team of 6 people from different countries and regions, our graduation topic was Global Employee Mobility.

Some call it a mini-MBA, some call it a speedy MBA, but regardless of the definition, it is a lifelong experience. Several interviews, external research, insights from multiple people, a design thinking approach, all the way to a prototype. I had the opportunity to interview mobility practitioners at VMware, Dell, NTT Data, and IBM. I essentially stretched myself to places I had never been before.

I attended Finance, Marketing, Strategy, Leadership, and Innovation classes from IE Business School. I learned about the Six Thinking Hats and the Blue Ocean strategy, among other things. I also learned about worthy rivals on this Infinite Game talk by Simon Sinek.

My first time in Madrid, first time in South Africa, first time on a 15-hour flight, all in the same year; first time presenting to an audience as if I were going to fly right after. This experience also allowed me to learn more about myself with and through others and, most importantly, to acknowledge weaknesses and areas for improvement. The balance between introspection and emotional intelligence.

While all of this was happening, I still managed to listen to 4 Audible books (Personal MBA, QBQ!, Quiet, and Moment of Lift), which is my recent addiction. A podcast about my journey to a career in technology was also released, and all I could notice was that I need to take some serious pauses in live conversations and public speaking.

For now, I plan to catch up on Cisco Live presentations as I missed the event this year for a very good cause.

Datera and Modern Data Center

Datera presented at Tech Field Day 18 and provided additional insights on their software-defined storage solution, as well as perspectives on modern Data Centers. Datera’s team:

– Al Woods, CTO
– Nic Bellinger, Chief Architect and Co-Founder
– Bill Borsari, Head of Field Systems Engineering
– Shailesh Mittal, Sr. Engineering Architect

The presentation started with an introduction on enterprise software-defined storage, characteristics of a software-defined Data Center, an overview of their data services platform pillars, business use cases, and the current server partners whose hardware their software runs on.


Datera’s definition of a software-defined Data Center includes these characteristics:

1 – All hardware is virtualized and delivered as a service
2 – Control of the Data Center is fully automated by software
3 – Supports legacy and cloud-native applications
4 – Radically lower cost versus hard-wired Data Centers

With this in mind, the building blocks of the Datera Data Services Platform architecture were also presented, which is extremely relevant for those interested in what is “under the hood”, how it is built, and where standard functions such as dedup, tiering, compression, and snapshots happen. Datera focused on demonstrating how the architecture is optimized to overcome traditional storage management and data movement challenges. This is where one needs some background in storage operations to fully appreciate the evolution to a platform built for always-on, transparent data movement, working on a distributed, lockless, application-intent-driven architecture running on x86 servers. The overall solution is driven by application service level objectives (SLOs) intended to provide a “self-driving” storage infrastructure.

There were no lab demos during the sessions; however, there were some unique slide animations on what Datera calls continuous availability and perpetual storage service to contextualize how their solution works. The last part of the presentation was about containers and microservices applications, and how Datera provides enough flexibility and safeguards to address such workloads and their portable nature.

Modern Data Center

In a whiteboard explanation, Datera also shared that they have seen more Clos-style (leaf-spine) network architectures in modern Data Centers, and that they see themselves as “servers” running in a rack-scale design alongside independent compute nodes. The network is the backplane and an integral part of the technology, as compute nodes access the server-based storage software over the network through high-performance iSCSI interfaces. It also supports S3 object storage access.

One of the things I learned during the presentation is their ability to peer directly with the top-of-rack (leaf) switches via BGP. The list of supported networking vendors is published here. Essentially, Datera integrates SLO-based L3 network function virtualization (NFV) with SLO-based data virtualization to automatically provide secure and scalable data connectivity as a service for every application. It accomplishes this by running a software (BGP) router on each of its nodes as a managed service, similar to Project Calico. Al Woods wrote about the benefits of L3 networking very eloquently in this article. I find it interesting how BGP is making its way inside Data Centers in some shape or form.
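Datera did not share its actual routing configuration in the session, but a node peering with its top-of-rack leaf in this fashion could be sketched as the following illustrative FRR-style fragment (the ASN, interface name, and service address are all made up for the example):

```
router bgp 65001
 ! BGP unnumbered peering towards the ToR leaf switch
 neighbor eth0 interface remote-as external
 address-family ipv4 unicast
  ! advertise the node's /32 storage service address into the fabric
  network
 exit-address-family
```

The appeal of this model is that the storage service address stays reachable regardless of which rack or node currently hosts it, with the fabric learning it via ordinary BGP.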

In addition to the L3 networking integration for more Data Center awareness, Datera adopts an API-first design by making everything API-driven, a policy-based approach from day 1 meant for easy operations at scale, and targeted data placement to ensure data is distributed correctly across the physical infrastructure for availability and resilience. This is all aligned with the concept of a scale-out modern Data Center.

As a follow-up, Datera will also be presenting at Storage Field Day 18, where there may be more opportunity to delve into their technology and get a glimpse of the user interface and multi-tenancy capabilities.

Tech Field Days


… a continuation of the beginning.

I made my second appearance at Tech Field Day 18 this week, which is a major accomplishment for someone who for a long time was quiet on social media and the likes. Not that I have changed much, but it is definitely a huge step being “outside”. I had the opportunity to meet several very professional, intelligent, and insightful people with great minds and an amazing ability to express themselves (in writing, speaking, improvising, podcasting). It has been an overwhelming learning experience more than anything else. Social (soft) skills do not come overnight, and they are fundamental to building trust and long-term relationships with people who naturally enlighten, inspire, or pave the way for others.

The video presentations have been posted for Datera, NetApp, VMware, and SolarWinds.


ACI Troubleshooting Notes


I attended a 3-day ACI Troubleshooting v3.1 bootcamp this week, and I have to say, even though I do not get involved in actual implementation after the architecture and design phases, it is always valuable to understand how things (can) break and ways to troubleshoot them. Here are some notes I put together:

Fabric Discovery

I learned that show lldp neighbors can save lives when the proposed diagram does not match the physical topology. Mapping serial numbers to node IDs and names is a must before and during fabric discovery. The acidiag fnvread command is also very helpful during the process.

Access Policies

For any connected endpoint, verification can be done top-down, bottom-up, or randomly, but regardless of the methodology, always make sure the policies are all interconnected. I like the top-down approach: starting with switch policies (including vPC explicit protection groups) and switch profiles, then interface policies and interface profiles, followed by interface policy groups. This is where all policies need to be in place (i.e., CDP, LLDP, port-channel mode) and, most importantly, the association to an AEP, which in turn needs to be associated with a domain (physical, VMM, L2, L3) and a VLAN pool with an encap range. If they are all interconnected, the AEP bridges everything together; then comes the logical part of the fabric.
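As a mental model, the top-down chain above can be sketched roughly like this (a simplified view of the object relationships, not an exhaustive list of the policy model):

```text
Switch Profile (switch selectors, vPC protection groups)
 └─ Interface Profile
     └─ Interface Selector (port blocks)
         └─ Interface Policy Group (CDP, LLDP, port-channel mode)
             └─ AEP
                 └─ Domain (physical | VMM | L2 | L3)
                     └─ VLAN Pool
                         └─ Encap Block (VLAN range)
```

If any link in this chain is missing, the encap never makes it to the port, which is usually where the troubleshooting ends up.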

I can only imagine what a missing AEP association can do in a real world deployment.

L2 Out

When extending a bridge domain to an external Layer 2 network, a contract is required on the L2 Out (external EPG); that much is known. Now, assuming this is a no-filter contract, it can be either provided or consumed, as long as the EPG associated with the bridge domain being extended has the matching contract: if the L2 Out consumes the contract, the associated EPG needs to provide it, and if the L2 Out provides it, then the EPG needs to consume it. In short, every time I think I have finally nailed the provider and consumer behavior, I learn otherwise.

L3 Out

Assuming all access policies are in place, for an OSPF connection the same traditional checks are required, from MTU to network type. If the external device is using an SVI, the Broadcast network type is required on the OSPF interface profile for the L3 Out. I had point-to-point configured for a while. This is probably basic, but sometimes one can spend considerable time checking unrelated configuration.
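For context, the matching setting on the external device side (assuming an NX-OS style switch with an SVI facing the L3 Out; the VLAN, process, and addressing below are hypothetical) would look something like:

```
interface Vlan100
  no shutdown
  ip address
  ip router ospf 1 area
  ip ospf network broadcast   ! must match "Broadcast" in the L3 Out OSPF interface profile
```

A network-type mismatch here typically leaves the adjacency stuck, which is exactly the symptom I was chasing.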

Static Port (Binding)

Basically the solution for any connectivity issue from endpoints behind a VMM domain. I have seen it work with and without static binding of VLANs. In the past, I would associate this with the vSwitch policies: as long as the hypervisor saw the leaf in the topology under virtual networking, no static binding was needed. That is not the case anymore. The show vpc extended command is the way to see the active VLANs passing from the leaf to the host.

API Inspector

It is the easiest way to confirm the specifics of API calls. With Postman, it is just a matter of copying and pasting the method, URL, and payload while having the inspector running in the background during a specific configuration via the GUI.
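As a sketch of how a captured call can be replayed outside Postman, the snippet below builds the aaaLogin request that the API Inspector typically shows first; the APIC address and credentials are placeholders, and the actual POST (commented out) would of course require network access to a real APIC:

```python
import json

# Hypothetical APIC address and credentials -- placeholders only.
APIC = "apic.example.local"
USER, PWD = "admin", "secret"

def login_url(apic: str) -> str:
    """URL of the aaaLogin call as captured by the API Inspector."""
    return f"https://{apic}/api/aaaLogin.json"

def login_payload(user: str, pwd: str) -> dict:
    """Body of the aaaLogin call, as shown in the inspector's payload pane."""
    return {"aaaUser": {"attributes": {"name": user, "pwd": pwd}}}

# Replaying the captured call in code rather than Postman would be:
# import requests
# r =, json=login_payload(USER, PWD), verify=False)

print(login_url(APIC))
print(json.dumps(login_payload(APIC and USER, PWD)))
```

Once logged in, any method/URL/payload triple copied from the inspector can be replayed the same way, which is what makes it such a handy learning tool.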


AVE (ACI Virtual Edge)

The process is very similar to deploying a distributed virtual switch, except that a VLAN or VXLAN mode needs to be defined. If running VXLAN encapsulation, a multicast address is required along with a multicast pool, as well as a firewall mode. The rest of the configuration is the same as far as adding vCenter credentials and specifying the Data Center name and IP address. After going through the process a few times without success, with AVE not getting pushed to vCenter, I enabled the infra VLAN on the AEP towards the host, which is a requirement when running VXLAN, and then it worked.


The official ACI troubleshooting e-book has screenshots based on earlier versions, but it is still relevant as the policy model has not changed. For the most up-to-date troubleshooting tools and tips, the BRKACI-2102 ACI Troubleshooting session from Cisco Live is recommended.


Cumulus VX Spine and Leaf

After hearing the word Cumulus twice from different initiatives on the same day, I decided I wanted to know more about Cumulus Networks in general, and playing with Cumulus VX seemed a great start. I already run Vagrant and VirtualBox for other purposes, so adding one more box is easy. Well, the idea was just an additional box, but after some GitHub investigative work, I found out there is already a pre-defined Cumulus Linux Demo Framework, or Reference Topology, available for consumption. I quickly cloned this repository and built my own spine and leaf architecture:


The whole process took no more than 10 minutes. There is a lot that goes on in the background, but still, not bad for a virtual non-prod environment or validation platform that supposedly has the same foundation as the Cumulus Linux and Cumulus RMP releases, including all the control plane elements.

The configuration is done on each of the VMs using the Network Command Line Utility (NCLU) or by editing the /etc/network/interfaces and /etc/frr/frr.conf files. This definitely requires some “essential” Linux skills. Multiple demos are available here using this topology, including NetQ. I tested the config-routing demo and it worked perfectly with two spines, two leafs, and two servers. It uses an Ansible playbook to push the configuration to the spines and leafs, as well as to add new interfaces to the servers for the connectivity test. A nice way to test the OSPF and BGP unnumbered concepts.
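To give a flavor of NCLU, a BGP unnumbered leaf in a topology like this boils down to just a handful of commands; the ASN, loopback address, and uplink interfaces below are illustrative rather than taken from the repository:

```
net add loopback lo ip address
net add bgp autonomous-system 65101
net add bgp router-id
net add bgp neighbor swp51 interface remote-as external
net add bgp neighbor swp52 interface remote-as external
net add bgp network
net commit
```

The "neighbor <interface> interface remote-as external" form is what makes the session unnumbered: no peer IP addressing is needed on the fabric links, which is a large part of the operational appeal.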

The fundamental piece is FRR (Free Range Routing), responsible for the EVPN, BGP, and OSPF functionality. Pete Lumbis did an excellent whiteboard session at Networking Field Day 17, going over the building blocks, followed by a demo on a similar topology running Cumulus VX.

Ansible Tower on Vagrant

I am still in re-install-apps land on macOS, and this is a mini guide on how to install Ansible Tower using Vagrant, for demo/trial usage only.

The first step is to install Vagrant, if not already installed. Vagrant relies on interactions with 3rd-party systems, known as “providers”, to supply the resources to run development environments. I am running VirtualBox.

To verify the installation of both Vagrant and VirtualBox:

vagrant --version

vboxmanage --version

Once the installation of both Vagrant and VirtualBox are completed, Ansible Tower can be initialized by creating a Vagrantfile with default instructions in the current directory as follows:

vagrant init ansible/tower

vagrant up

The process takes a few minutes the first time, and once complete:

vagrant ssh

The vagrant ssh command will print the admin password and the Tower log-in URL. This uses the default (basic) settings in the Vagrantfile, which can be edited further, including setting a more specific name for the Ansible VM.
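As an example of such an edit, a minimal Vagrantfile that names the machine instead of leaving it as "default" could look like this (the machine and host names below are just suggestions):

```ruby
# -*- mode: ruby -*-
Vagrant.configure("2") do |config|
  config.vm.define "tower" do |tower|     # VM shows up as "tower" instead of "default" = "ansible/tower"       # same box pulled by "vagrant init ansible/tower"
    tower.vm.hostname = "ansible-tower"   # hostname inside the guest
  end
end
```

After editing, vagrant up (or vagrant reload) picks up the new definition.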

To verify the Ansible version:

ansible --version

At the moment, there are two trial/demo licenses available: one with enterprise features such as LDAP and Active Directory support, System Tracking, Audit Trails, and Surveys, and one limited to 10 nodes with no expiration date, which, however, does not include the enterprise features just listed. The open source alternative (or non-enterprise version) with no node limitation is the AWX project.

Below is the main (default) dashboard of Ansible Tower:

[Ansible Tower dashboard]

And here is a nice walk-through on the GUI: Ansible Tower demo.

Tip: if the Tower URL cannot be accessed the first time, check the routing table (ip r) and the interfaces (ip a show) to see if the Tower address is listed on the Vagrant VM. If it is not listed, reinstall everything.

Apstra in a Whiteboard

As an occasional and very ordinary writer, I think this topic deserves a “direct from the source” approach. I am definitely using credits from my blank-slate bucket. In other words, nothing I write would be better than hearing from the person who explained on a whiteboard at Networking Field Day 19, so simply and clearly, what Apstra AOS (Apstra Operating System) is all about, along with its building blocks.

Besides delivering a great whiteboard session, @_vCarly also published an outline of that very busy morning in Apstra’s NFD19 Experience. It is a detailed narrative that goes from the reference architecture, made of the AOS server sitting at the orchestration (or management) layer and agents installed on each individual switch (supporting modern leaf-spine designs or extensible to other environments), to the building blocks: logical device, rack type, template, blueprint, interface map, device profile, resources, and managed devices. They are all interconnected, and it makes more sense when delving into the whiteboard. There is no better way to get a clear understanding of Apstra than watching the original video followed by her narrative.


On a side note, this is someone I met in person for the first time, but whom I have known for a while through videos as part of my initial Cisco ACI learning journey. I just wish my whiteboards were that decent and inspirational.

In addition to the whiteboard session, other highlights were the ServiceNow integration delivered jointly with Network to Code, an overview and demo of Day 2 Operations via IBA (Intent-Based Analytics), with a write-up here as well, and a demo of AOS for additional context. The original videos are available at the Networking Field Day 19 portal.

NSX-T Logical Routers

Between a few VMworld 2018 sessions and a recent NSX-T bootcamp, I believe I have collected enough information to describe, at a high level, the new logical routing scheme within NSX-T. The interest is also being driven by an internal project.

The intent is to continue to “route as close as possible to the source” as all routing and switching is being done at the host level in software within the NSX overlay architecture, while the underlay infrastructure provides only transport and external connectivity.

Logical Routers Components

NSX-T has two logical router components, namely the Services Router (SR) and the Distributed Router (DR). As the names imply, the SR is where centralized services such as NAT, DHCP, VPN, perimeter firewall, and load balancing are provisioned, and the DR performs distributed routing across all hosts participating in a given transport zone. This is very similar to the Distributed Logical Router (DLR) in NSX-v, except that there is no need for a DLR Control VM or dynamic routing protocols between the DR and centralized services.

Apart from the logical router components being named SR and DR, the actual logical router naming convention configured within NSX-T Manager is “Tier 0” and “Tier 1” routers, as described further when deploying single-tier or two-tier routing topologies.

Figure 1 is a conceptual diagram illustrating the SR and the DR placement within the NSX domain, for north-south (external networks) and east-west (internal networks) traffic respectively.

Figure 1 – Conceptual Design

Single Tier Topology

In a single-tier topology, both SR and DR are known as the Tier 0 Logical Router. In this architecture, upon creation of a Tier 0 router with downlink interfaces to logical switches, Tier 0 Distributed Routers are automatically pushed to all transport nodes (compute hosts) participating in the transport zone. A Tier 0 DR instance is also automatically added to the Edge Node along with the SR, which is instantiated the moment a service is enabled. The link between the SR and the DR uses an internal transit subnet that is auto-plumbed by NSX-T Manager. A default route is created on the DR with the next hop pointing to the SR, and the connected routes of the DR are programmed on the SR with the next hop pointing to the DR.
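The auto-plumbed routes just described can be sketched as follows (the transit addressing is purely illustrative, since the actual subnet is assigned internally by NSX-T Manager):

```text
Tier 0 DR (on every transport node)
  default        via <SR transit IP>   # static route auto-created by the management plane   directly connected       # downlink to the logical switch

Tier 0 SR (on the Edge Node)   via <DR transit IP>   # DR's connected route, programmed on the SR
```

No routing protocol runs across the SR-DR link; the management plane programs both sides.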

For east-west traffic, a packet going from a virtual machine behind a DR to a virtual machine on another logical switch (on the same or a different compute host) is routed at the local DR; the same goes for the returning traffic, which is also routed at its local DR first. For north-south traffic that traverses the Edge Node, the packet is likewise routed at the local DR first, and the returning traffic is routed at the DR instance residing in the Edge Node before it is encapsulated and sent back to the source.

There is a lot that happens in the background, but from the perspective of the Tier 0 DR, the logical switches are directly connected southbound, and the logical switches see only a single logical router, a single routing construct, upstream. Figure 2 depicts the physical and logical views with color-coded routers to indicate which one is the SR and which one is the DR within a single-tier topology.


Figure 2 – Single Tier Topology

Two-Tier Topology

In a two-tier (or multi-tier) routing topology, the fundamentals of SR and DR remain the same, but the logical routers are named Tier 0 and Tier 1 routers, and both are instantiated on the hypervisors of each transport node in a fully distributed architecture. The “RouterLink” between the Tier 0 and Tier 1 routers is automatically configured with a /31 IP from a reserved internal subnet range when the Tier 1 is connected to the Tier 0 router, the same auto-plumbing process in the backend by NSX-T Manager. There is no routing protocol running between Tier 0 and Tier 1 routers. The NSX management plane knows about the connected routes on Tier 1 and creates static routes on the Tier 0 router with a next hop in that RouterLink subnet range.

As with the single-tier topology, the Edge Node also has local instances of the SR and DR, or Tier 0 and Tier 1. The major difference is that even though the Tier 1 is “distributed” across all transport nodes, it has tenant isolation from the other Tier 1 routers across the NSX domain. A Tier 1 can be removed from the environment without affecting any other tenant, completely independent (or isolated) given the multi-tenancy nature. Specific services such as load balancing or NAT can also be enabled on a Tier 1 router.

If there is a need for inter-tenant connectivity, traffic between tenants traverses the local Tier 1 as well as the Tier 0 routers on the transport node, and packets are routed locally by the Tier 1 before hitting the wire via Geneve encapsulation. Returning traffic is also routed at the local Tier 1 of the remote tenant. For north-south traffic that traverses the Edge Node, the packet is again routed at the local Tier 1 first, and the returning traffic is routed at the Tier 0 and Tier 1 instantiated at the Edge Node.

Figure 3 depicts the physical and logical views with color-coded routers to indicate which one is Tier 0 and which one is Tier 1 within a two-tier topology contained in each tenant.


Figure 3 – Two-Tier Topology

Which One?

The decision of when and which topology to use comes down to business requirements. In a multi-tenancy environment of any kind, two-tier routing has its place by providing tenant isolation and independent control over network and security policies. For some, it may simplify management; for others, it may add complexity. The single-tier topology is as “simple” as it gets. If there is no interest (or requirement) in separating routing domains for tenants/groups, then only a Tier 0 can be deployed.

Recommended Sessions

These are the VMworld sessions which add more in-depth details with packet walk of the logical routing in NSX-T:

  • NET1127BU: NSX-T Data Center Routing Deep Dive
  • NET1561BU: Next Generation Reference Design with NSX-T Data Center (part 1)
  • NET1562BU: Next Generation Reference Design with NSX-T Data Center (part 2)


Thanks to my friend @LuisChanu who provided invaluable inputs and guidance.