Jekyll2020-06-04T14:55:45+00:00https://blog.networktocode.com/feed.xmlThe NTC MagNetwork to Codeinfo@networktocode.comAlerting with Prometheus2020-06-03T00:00:00+00:002020-06-03T00:00:00+00:00https://blog.networktocode.com/post/prometheus_alerting<p>Over the past several posts, I have discussed how to gather metrics about your infrastructure and web applications. The natural next step is alerting with Prometheus. This post builds on the <a href="http://blog.networktocode.com/post/monitoring_websites_with_telegraf_and_prometheus/">previous post</a> on gathering website and DNS responses. I will walk you through how to set up a rule that fires whenever a website returns a response other than <code class="highlighter-rouge">200 OK</code>. To accomplish this, we will use the metric <code class="highlighter-rouge">http_response_http_response_code</code> gathered by Telegraf.</p> <h2 id="prometheus-setup">Prometheus Setup</h2> <p>You define rules in files and reference those file names in the Prometheus configuration. A common practice is to name the file <code class="highlighter-rouge">alert.rules</code> and place it in the <code class="highlighter-rouge">/etc/prometheus/</code> directory.</p> <p>The following outlines what the file will contain. Alert rules are defined in YAML, specifying the alert name (<code class="highlighter-rouge">alert</code>), the PromQL expression to evaluate (<code class="highlighter-rouge">expr</code>), and how long that expression must remain true before the alert fires (<code class="highlighter-rouge">for</code>). 
Optional keys such as labels and annotations are also available, as demonstrated below. The summary annotation uses the <code class="highlighter-rouge">server</code> label, which Telegraf sets to the URL being checked:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">groups</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">websites</span>
  <span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">alert</span><span class="pi">:</span> <span class="s">WebsiteDown</span>
    <span class="na">expr</span><span class="pi">:</span> <span class="s">http_response_http_response_code != </span><span class="m">200</span>
    <span class="na">for</span><span class="pi">:</span> <span class="s">1m</span>
    <span class="na">labels</span><span class="pi">:</span>
      <span class="na">severity</span><span class="pi">:</span> <span class="s">critical</span>
    <span class="na">annotations</span><span class="pi">:</span>
      <span class="na">summary</span><span class="pi">:</span> <span class="s2">"</span><span class="s">{{</span><span class="nv"> </span><span class="s">$labels.server</span><span class="nv"> </span><span class="s">}}</span><span class="nv"> </span><span class="s">is</span><span class="nv"> </span><span class="s">not</span><span class="nv"> </span><span class="s">responding</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">200</span><span class="nv"> </span><span class="s">OK."</span>
</code></pre></div></div> <p>Next is the <code class="highlighter-rouge">prometheus.yml</code> configuration. The rules file created above is added to the list under the <code class="highlighter-rouge">rule_files</code> key. 
Because <code class="highlighter-rouge">rule_files</code> is a list, Prometheus can load multiple rule files.</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">global</span><span class="pi">:</span>
  <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">15s</span>

<span class="na">rule_files</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">alert.rules</span>

<span class="na">alerting</span><span class="pi">:</span>
  <span class="na">alertmanagers</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">static_configs</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">localhost:9093</span>

<span class="na">scrape_configs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">prometheus'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">5s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span> <span class="pi">[</span><span class="s1">'</span><span class="s">localhost:9090'</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">telegraf</span><span class="nv"> </span><span class="s">website'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">10s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s2">"</span><span class="s">localhost:9012"</span>
</code></pre></div></div> <p>Once the rules are loaded, you can verify them by browsing to the Prometheus URL <code 
class="highlighter-rouge">http://&lt;hostname_or_ip&gt;:9090/rules</code>. You will now see which rules are loaded:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/prometheus_rules.png" alt="Prometheus Rules" /></p> <h3 id="prometheus-alertmanager">Prometheus AlertManager</h3> <p>Now you have a configuration for the alerts, but how do you actually manage them? You’ll need to add another application to the environment: <a href="https://prometheus.io/docs/alerting/alertmanager/">Prometheus AlertManager</a>. AlertManager handles the silencing, deduplication, grouping, and routing of alerts to the appropriate outputs. These destinations include, but are not limited to, Slack, email, and webhooks. The <a href="https://prometheus.io/docs/alerting/configuration/">AlertManager configuration page</a> has details on how to configure each of these:</p> <ul> <li>Email</li> <li>HipChat</li> <li>PagerDuty</li> <li>Pushover</li> <li>Slack</li> <li>OpsGenie</li> <li>Webhook</li> <li>VictorOps</li> <li>WeChat</li> </ul> <h4 id="alertmanager-installation">AlertManager Installation</h4> <p>AlertManager can be installed in several ways: prebuilt binaries for many common platforms, Docker containers, or building from source. In this demo, I install the binary from the <a href="https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz">release archive</a>, using wget to download the file.</p> <p>Download the file and then extract it in the current directory:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">wget</span> https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
<span class="nb">tar</span> <span class="nt">-xzf</span> alertmanager-0.20.0.linux-amd64.tar.gz
</code></pre></div></div> <h4 id="alertmanager-configuration">AlertManager Configuration</h4> <p>AlertManager is configured in the <code class="highlighter-rouge">alertmanager.yml</code> file. 
An example might look like the following. The <code class="highlighter-rouge">$GMAIL_ACCOUNT</code> and <code class="highlighter-rouge">$GMAIL_AUTH_TOKEN</code> values are placeholders for your own address and credentials; AlertManager reads the file literally and does not expand environment variables:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">route</span><span class="pi">:</span>
  <span class="na">group_by</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">alertname</span><span class="pi">]</span>
  <span class="c1"># Send all notifications to me.</span>
  <span class="na">receiver</span><span class="pi">:</span> <span class="s">email-me</span>

<span class="na">receivers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">email-me</span>
  <span class="na">email_configs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">to</span><span class="pi">:</span> <span class="s">$GMAIL_ACCOUNT</span>
    <span class="na">from</span><span class="pi">:</span> <span class="s">$GMAIL_ACCOUNT</span>
    <span class="na">smarthost</span><span class="pi">:</span> <span class="s">smtp.gmail.com:587</span>
    <span class="na">auth_username</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$GMAIL_ACCOUNT"</span>
    <span class="na">auth_identity</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$GMAIL_ACCOUNT"</span>
    <span class="na">auth_password</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$GMAIL_AUTH_TOKEN"</span>
</code></pre></div></div> <h4 id="alertmanager-execution">AlertManager Execution</h4> <p>To start this test instance of AlertManager, run <code class="highlighter-rouge">./alertmanager --config.file="alertmanager.yml"</code>:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./alertmanager <span class="nt">--config</span>.file<span class="o">=</span><span class="s2">"alertmanager.yml"</span>
<span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span 
class="o">=</span>2020-05-21T15:14:56.850Z <span class="nb">caller</span><span class="o">=</span>main.go:231 <span class="nv">msg</span><span class="o">=</span><span class="s2">"Starting Alertmanager"</span> <span class="nv">version</span><span class="o">=</span><span class="s2">"(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"</span> <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.850Z <span class="nb">caller</span><span class="o">=</span>main.go:232 <span class="nv">build_context</span><span class="o">=</span><span class="s2">"(go=go1.13.5, user=root@00c3106655f8, date=20191211-14:13:14)"</span> <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.859Z <span class="nb">caller</span><span class="o">=</span>cluster.go:161 <span class="nv">component</span><span class="o">=</span>cluster <span class="nv">msg</span><span class="o">=</span><span class="s2">"setting advertise address explicitly"</span> <span class="nv">addr</span><span class="o">=</span>10.250.0.83 <span class="nv">port</span><span class="o">=</span>9094 <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.868Z <span class="nb">caller</span><span class="o">=</span>cluster.go:623 <span class="nv">component</span><span class="o">=</span>cluster <span class="nv">msg</span><span class="o">=</span><span class="s2">"Waiting for gossip to settle..."</span> <span class="nv">interval</span><span class="o">=</span>2s <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.883Z <span class="nb">caller</span><span class="o">=</span>coordinator.go:119 <span class="nv">component</span><span class="o">=</span>configuration <span class="nv">msg</span><span class="o">=</span><span 
class="s2">"Loading configuration file"</span> <span class="nv">file</span><span class="o">=</span>alertmanager.yml <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.883Z <span class="nb">caller</span><span class="o">=</span>coordinator.go:131 <span class="nv">component</span><span class="o">=</span>configuration <span class="nv">msg</span><span class="o">=</span><span class="s2">"Completed loading of configuration file"</span> <span class="nv">file</span><span class="o">=</span>alertmanager.yml <span class="nv">level</span><span class="o">=</span>info <span class="nv">ts</span><span class="o">=</span>2020-05-21T15:14:56.885Z <span class="nb">caller</span><span class="o">=</span>main.go:497 <span class="nv">msg</span><span class="o">=</span>Listening <span class="nv">address</span><span class="o">=</span>:9093 </code></pre></div></div> <p>The application starts up, and the last log line shows that AlertManager is listening on port 9093, the same <code class="highlighter-rouge">localhost:9093</code> target configured under <code class="highlighter-rouge">alerting</code> in <code class="highlighter-rouge">prometheus.yml</code>.</p> <h2 id="prometheus-alerts-in-action">Prometheus Alerts in Action</h2> <p>Now that each piece is configured, let’s look at how everything works together.</p> <p>To see the status of the alerts in Prometheus, navigate to the <strong>Alerts</strong> menu item or to the URL <code class="highlighter-rouge">http://&lt;hostname_or_ip&gt;:9090/alerts</code>. The following image shows the status of each rule in the loaded rule files.</p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/alert_list.png" alt="Prometheus Alerts" /></p> <p>At this point, no websites are down. To confirm this, search for <code class="highlighter-rouge">ALERTS</code> in the Prometheus <strong>Graph</strong> page. 
If nothing is alerting, you should get the message <code class="highlighter-rouge">No datapoints found.</code> This helps you tell whether Prometheus is raising an alert that is then being suppressed further along, or whether something else is wrong with the configuration.</p> <blockquote> <p>At this point, I am going to have my DNS server deny access to the ServiceNow website to simulate the service being unavailable.</p> </blockquote> <h3 id="prometheus-alerts-pending">Prometheus Alerts Pending</h3> <p>After some time, the website stops responding. The Alerts page first shows the alert in a <em>Pending</em> state: Prometheus has seen the website fail, but the condition has not yet held for the configured duration (1 minute). You can see the 1 pending rule.</p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/prometheus_alert_start.png" alt="Start of Alert" /></p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/prometheus_alert_start2.png" alt="Start of Alert Expanded" /></p> <h3 id="prometheus-alerts-firing">Prometheus Alerts Firing</h3> <p>Once the condition has held for the configured duration, the alert moves from the <em>Pending</em> state to the <em>Firing</em> state. In this state, Prometheus sends the alert to AlertManager for processing.</p> <p>First, let’s look at the Prometheus Alerts page, which shows that the alert has moved to the <em>Firing</em> state. 
It shows the same information you saw in the <em>Pending</em> state, but now in red.</p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/prometheus_firing1.png" alt="Alert Firing" /></p> <p>Next, go to the <strong>Graph</strong> section of Prometheus and search for <code class="highlighter-rouge">ALERTS</code>. You can now see a line for each state the alert has passed through.</p> <p>In the first graph, the mouse cursor is over the section where the alert was in the <em>Pending</em> state; in the second, it is hovering over the <em>Firing</em> state. Each view gives you additional information, such as the alert’s labels, to help debug alerts that are not reaching their destination.</p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/fired_graph_1.png" alt="Graph Pending" /> <img src="../../../static/images/blog_posts/prometheus_alerting/fired_graph_2.png" alt="Graph Firing" /></p> <h3 id="prometheus-alertmanager-firing">Prometheus AlertManager Firing</h3> <p>The last image shows the view from AlertManager: which alerts have been triggered and the labels attached to each one.</p> <p><img src="../../../static/images/blog_posts/prometheus_alerting/alert_mgr_fired.png" alt="AlertManager Pane" /></p> <h2 id="summary">Summary</h2> <p>This wraps up (for now) this series of posts focused on leveraging Telegraf, Prometheus, and Grafana to monitor your environment. 
Take a look at the post list below for the others in the series and jump on into the <a href="https://ntcslack.com">Network to Code Slack</a> Telemetry channel to start a conversation on what you are doing, what you want to do, or just to talk network telemetry!</p> <ul> <li><a href="http://blog.networktocode.com/post/using_python_and_telegraf_for_metrics/">Using Python to monitor your Infrastructure through CLI</a></li> <li><a href="http://blog.networktocode.com/post/network_telemetry_for_snmp_devices/">Network Telemetry for SNMP Devices</a></li> <li><a href="http://blog.networktocode.com/post/monitor_your_network_with_gnmi_snmp_and_grafana/">Monitoring your Network with gNMI, SNMP, and Grafana in one</a></li> <li><a href="http://blog.networktocode.com/post/monitoring_websites_with_telegraf_and_prometheus/">Monitoring Websites with Telegraf and Prometheus</a></li> </ul> <p>Hope this has been helpful!</p> <p>-Josh (<a href="https://twitter.com/vanderaaj">@vanderaaj</a>)</p>Josh VanDeraaMonitoring Websites with Telegraf and Prometheus2020-05-28T00:00:00+00:002020-05-28T00:00:00+00:00https://blog.networktocode.com/post/monitoring_websites_with_telegraf_and_prometheus<p>In network service delivery, the network exists to have applications ride on it. Yes, even voice is considered an application when it is riding over the top of the network. 
We have explored in previous posts how to get telemetry data from your network devices to understand how they are performing from a device perspective. In this post, I will move on to monitoring web applications and DNS using Telegraf, Prometheus, and Grafana. Often, your operations teams will receive reports of websites not working for a user, or you may just want more visibility into your own web services. The following method gives you more insight into the network and into the name resolution those applications depend on.</p> <p>Several other Telegraf inputs are also available, including ping (ICMP) and TCP tests. As of this post (May 2020), there are 181 input plugins to choose from. Take a look at the <a href="https://docs.influxdata.com/telegraf/v1.14/plugins/plugin-list/">Telegraf plugins</a> for more details and to explore what other plugins you may be able to use to monitor your environment.</p> <p>I will not go into the setup of these tools, as it is already covered in previous posts. 
The previous posts in the series are:</p> <ul> <li><a href="http://blog.networktocode.com/post/using_python_and_telegraf_for_metrics/">How to Monitor Your VPN Infrastructure with Netmiko, NTC-Templates, and a Time Series Database</a></li> <li><a href="http://blog.networktocode.com/post/network_telemetry_for_snmp_devices/">Network Telemetry for SNMP Devices</a></li> <li><a href="http://blog.networktocode.com/post/monitor_your_network_with_gnmi_snmp_and_grafana/">Monitor Your Network With gNMI, SNMP, and Grafana</a></li> </ul> <p>These posts can help you get up and running with monitoring your network devices via CLI, SNMP, and gNMI.</p> <blockquote> <p>The Prometheus Blackbox exporter is also a valid choice for this task, and I encourage you to try both Telegraf and the Blackbox exporter in your environment.</p> </blockquote> <h2 id="sequence-diagram">Sequence Diagram</h2> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/sequence.png" alt="Sequence" /></p> <h2 id="telegraf-setup---http-response">Telegraf Setup - HTTP Response</h2> <p>Telegraf’s <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/http_response">HTTP Response plugin</a> does exactly what we need for gathering metrics about an HTTP response. It lets you define the list of websites you wish to monitor and set options such as a proxy, the response timeout, the HTTP method, any data to include in the request body, and the response you expect. Take a look at the plugin documentation for more details. 
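</p> <p>As a rough sketch of how those options fit together, here is a hypothetical variation of the check; it is not the configuration used in this post. The option names follow the HTTP Response plugin’s sample configuration, while the proxy address and request body are placeholders, so verify the options against the plugin README for your Telegraf version.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[inputs.http_response]]
  ## Site(s) to check
  urls = ["https://www.networktocode.com"]
  ## Placeholder proxy address; uncomment and adjust only if you need a proxy
  # http_proxy = "http://proxy.example.com:8080"
  ## Give up on a request that takes longer than this
  response_timeout = "5s"
  ## HTTP method to use
  method = "GET"
  ## Placeholder request body, normally paired with a method such as POST
  # body = '''{"example": "payload"}'''
  ## Follow 3xx redirects to the final page
  follow_redirects = true
</code></pre></div></div> <p>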
Here is the configuration used for this demonstration:</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#####################################################
#
# Check on status of URLs
#
#####################################################
</span><span class="nn">[[inputs.http_response]</span><span class="err">]</span>
  <span class="py">urls</span> <span class="p">=</span> <span class="s">["https://www.networktocode.com", "https://blog.networktocode.com", "https://www.service-now.com"]</span>
  <span class="py">method</span> <span class="p">=</span> <span class="s">"GET"</span>
  <span class="py">follow_redirects</span> <span class="p">=</span> <span class="s">true</span>

<span class="c">#####################################################
#
# Export Information to Prometheus
#
#####################################################
</span><span class="nn">[[outputs.prometheus_client]</span><span class="err">]</span>
  <span class="py">listen</span> <span class="p">=</span> <span class="s">":9012"</span>
  <span class="py">metric_version</span> <span class="p">=</span> <span class="s">2</span>
</code></pre></div></div> <p>With Telegraf running this configuration, here are the relevant Prometheus metrics it exposes:</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HELP http_response_content_length Telegraf collected metric</span> <span class="c"># TYPE http_response_content_length untyped</span> <span class="n">http_response_content_length</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span 
class="na">server</span><span class="o">=</span><span class="s2">"https://blog.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mf">1.791348</span><span class="n">e</span><span class="o">+</span><span class="mi">06</span> <span class="n">http_response_content_length</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">123667</span> <span class="n">http_response_content_length</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.service-now.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">478636</span> <span class="c"># HELP http_response_http_response_code Telegraf collected metric</span> <span class="c"># TYPE http_response_http_response_code untyped</span> <span class="n">http_response_http_response_code</span><span class="p">{</span><span class="na">method</span><span 
class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://blog.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">200</span> <span class="n">http_response_http_response_code</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">200</span> <span class="n">http_response_http_response_code</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.service-now.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">200</span> <span class="c"># HELP http_response_response_time 
Telegraf collected metric</span> <span class="c"># TYPE http_response_response_time untyped</span> <span class="n">http_response_response_time</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://blog.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mf">0.371015121</span> <span class="n">http_response_response_time</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mf">0.186775794</span> <span class="n">http_response_response_time</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span 
class="s2">"https://www.service-now.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mf">0.658694795</span> <span class="c"># HELP http_response_result_code Telegraf collected metric</span> <span class="c"># TYPE http_response_result_code untyped</span> <span class="n">http_response_result_code</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://blog.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">0</span> <span class="n">http_response_result_code</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.networktocode.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">0</span> <span class="n">http_response_result_code</span><span class="p">{</span><span class="na">method</span><span class="o">=</span><span class="s2">"GET"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span 
class="p">,</span><span class="na">result_type</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"https://www.service-now.com"</span><span class="p">,</span><span class="na">status_code</span><span class="o">=</span><span class="s2">"200"</span><span class="p">}</span> <span class="mi">0</span> </code></pre></div></div> <p>Several metrics come back right away, including:</p> <ul> <li>content_length: The length of the response body, in bytes</li> <li>response_code: The HTTP response code</li> <li>response_time: How long the request took, in seconds</li> <li>result_code: A Telegraf-specific code summarizing the outcome of the check; a successful check maps to 0</li> </ul> <h2 id="telegraf---dns-check">Telegraf - DNS Check</h2> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/dns_sequence.png" alt="DNS Sequence" /></p> <p>On top of this, I also want to show how to add a second input: a DNS query that tests the name resolution of the sites, to verify that DNS lookups are working as expected. 
This could also be extended to test and verify DNS from a user perspective within your environment.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#####################################################
#
# Check on status of URLs
#
#####################################################
</span><span class="nn">[[inputs.http_response]</span><span class="err">]</span>
  <span class="py">urls</span> <span class="p">=</span> <span class="s">["https://www.networktocode.com", "https://blog.networktocode.com", "https://www.service-now.com"]</span>
  <span class="py">method</span> <span class="p">=</span> <span class="s">"GET"</span>
  <span class="py">follow_redirects</span> <span class="p">=</span> <span class="s">true</span>

<span class="nn">[[inputs.dns_query]</span><span class="err">]</span>
  <span class="py">servers</span> <span class="p">=</span> <span class="s">["8.8.8.8"]</span>
  <span class="py">domains</span> <span class="p">=</span> <span class="s">["blog.networktocode.com", "www.networktocode.com", "www.servicenow.com"]</span>

<span class="c">#####################################################
#
# Export Information to Prometheus
#
#####################################################
</span><span class="nn">[[outputs.prometheus_client]</span><span class="err">]</span>
  <span class="py">listen</span> <span class="p">=</span> <span class="s">":9012"</span>
  <span class="py">metric_version</span> <span class="p">=</span> <span class="s">2</span>
</code></pre></div></div> <p>The new section is:</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[inputs.dns_query]</span><span class="err">]</span>
  <span class="py">servers</span> <span class="p">=</span> <span class="s">["8.8.8.8"]</span>
  <span class="py">domains</span> <span class="p">=</span> <span class="s">["blog.networktocode.com", "www.networktocode.com", "www.servicenow.com"]</span>
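  ## Hypothetical addition, not part of the demo configuration: when
  ## record_type is not set, the plugin queries NS records, which is why
  ## the output below reports record_type="NS". To test the A record a
  ## browser would actually resolve, uncomment the next line.
  # record_type = "A"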
</code></pre></div></div> <p>With this plugin definition, we use Google’s public DNS resolver (8.8.8.8). The domains we verify are blog.networktocode.com, www.networktocode.com, and www.servicenow.com for the popular ITSM tool ServiceNow.</p> <p>Here is what gets added to the Prometheus Client output:</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HELP dns_query_query_time_ms Telegraf collected metric</span>
<span class="c"># TYPE dns_query_query_time_ms untyped</span>
<span class="n">dns_query_query_time_ms</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"blog.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mf">70.950858</span>
<span class="n">dns_query_query_time_ms</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"www.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mf">48.118903</span>
<span class="n">dns_query_query_time_ms</span><span class="p">{</span><span
class="na">domain</span><span class="o">=</span><span class="s2">"www.servicenow.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mf">48.552328</span>
<span class="c"># HELP dns_query_rcode_value Telegraf collected metric</span>
<span class="c"># TYPE dns_query_rcode_value untyped</span>
<span class="n">dns_query_rcode_value</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"blog.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
<span class="n">dns_query_rcode_value</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"www.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span
class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
<span class="n">dns_query_rcode_value</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"www.servicenow.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
<span class="c"># HELP dns_query_result_code Telegraf collected metric</span>
<span class="c"># TYPE dns_query_result_code untyped</span>
<span class="n">dns_query_result_code</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"blog.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
<span class="n">dns_query_result_code</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"www.networktocode.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span
class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
<span class="n">dns_query_result_code</span><span class="p">{</span><span class="na">domain</span><span class="o">=</span><span class="s2">"www.servicenow.com"</span><span class="p">,</span><span class="na">rcode</span><span class="o">=</span><span class="s2">"NOERROR"</span><span class="p">,</span><span class="na">record_type</span><span class="o">=</span><span class="s2">"NS"</span><span class="p">,</span><span class="na">result</span><span class="o">=</span><span class="s2">"success"</span><span class="p">,</span><span class="na">server</span><span class="o">=</span><span class="s2">"8.8.8.8"</span><span class="p">}</span> <span class="mi">0</span>
</code></pre></div></div> <p>The corresponding values gathered from the <code class="highlighter-rouge">dns_query</code> input are:</p> <ul> <li>dns_query_query_time_ms: Time it took for the query to return, in milliseconds</li> <li>dns_query_rcode_value: Numeric DNS response code (RCODE) returned by the server; 0 corresponds to <code class="highlighter-rouge">NOERROR</code></li> <li>dns_query_result_code: A code defined by Telegraf for the result of the query (0 means success)</li> </ul> <h2 id="prometheus">Prometheus</h2> <p>At this point, the Prometheus configuration needs a single addition: a scrape job that gathers the statistics for each of the websites from the Telegraf client:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">global</span><span class="pi">:</span>
  <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">15s</span>

<span class="na">scrape_configs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s">'prometheus'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">5s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span> <span class="pi">[</span><span class="s">'localhost:9090'</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s">'telegraf website'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">10s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="s">"localhost:9012"</span>
</code></pre></div></div> <p>When you navigate to the Prometheus base page to check how polling is going, you can get a basic graph. Here all three sites appear on the graph with their response times:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/prometheus.png" alt="Prometheus Output" /></p> <h2 id="grafana">Grafana</h2> <p>What does it look like to get this information into a graph on Grafana?</p> <h3 id="grafana---websites">Grafana - Websites</h3> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/grafana_graph.png" alt="Grafana Website Response" /></p> <p>This chart needs only a small amount of configuration. In the <code class="highlighter-rouge">Metrics</code> section, the only query is <code class="highlighter-rouge">http_response_response_time</code>. I set the legend to <code class="highlighter-rouge">{{ server }}</code> so that the website address appears in the legend.</p> <p>In the visualization section, the only change needed is to set the <em>Left Y Axis</em> <code class="highlighter-rouge">Unit</code> to <code class="highlighter-rouge">seconds (s)</code> so that the Y axis displays the correct unit.</p> <h3 id="grafana---dns">Grafana - DNS</h3> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part3/dns_response_time.png" alt="Grafana DNS Response" /></p> <p>This is another small panel configuration, similar to the previous one.
In the <code class="highlighter-rouge">Metrics</code> section, the query for response time is <code class="highlighter-rouge">dns_query_query_time_ms</code>. Then set the legend to <code class="highlighter-rouge">{{ domain }}</code>, which matches the <code class="highlighter-rouge">domain</code> label in the output shown above.</p> <p>In the visualization section, set the <em>Unit</em> to <code class="highlighter-rouge">milliseconds (ms)</code>. If you copied this panel from the Website panel, don’t forget to change it: the unit of measure is different, so the time scale would otherwise be off.</p> <h2 id="summary">Summary</h2> <p>Hopefully this post will help you gain some insight into your environment. We have already been using this process internally at Network to Code to keep an eye on the key services we rely on, which helps us understand whether a problem is isolated to an individual or lies with the service itself. Let us know your thoughts and comments! To continue the conversation, check out the #Telemetry channel inside the Network to Code Slack. Sign up at <a href="https://slack.networktocode.com">slack.networktocode.com</a>.</p> <p>-Josh</p>Josh VanDeraaIn network service delivery, the network exists to have applications ride on it. Yes, even voice is considered an application when it is riding over the top of the network. We have explored in previous posts how to get telemetry data from your network devices to get an understanding of how they are performing from a device perspective. Now, in this post, I will move on to exploring how to monitor web applications and DNS using Telegraf, Prometheus, and Grafana. Often your operations teams will receive reports of websites not working for a user or you are just looking to get some more visibility into your own web services.
The following method could be used to get more insight into the network and the name resolution required for those applications.Upgrade Your Python Project With Poetry2020-05-19T00:00:00+00:002020-05-19T00:00:00+00:00https://blog.networktocode.com/post/upgrade-your-python-project-with-poetry<p>Dependency management and virtual environments are integral to the Python ecosystem, yet the primary tools in use today are far from ideal. Some of the primary methods are:</p> <ul> <li>Dependency management: <code class="highlighter-rouge">pip</code> by way of <code class="highlighter-rouge">requirements.txt</code> is still the de facto solution for most of us. While this approach has worked in the past, it has limitations when it comes to guaranteeing that the same project will be installed consistently.</li> <li>Virtual environments: a common setup is to use <code class="highlighter-rouge">virtualenv</code> to create your virtual environment and manually activate it using <code class="highlighter-rouge">source &lt;path to venv&gt;/bin/activate</code>. While this approach works, it requires the user to know which venv needs to be activated for each project, and the command to execute can be lengthy.</li> <li>Code packaging (only applicable if you need to share your code): it is common to use <code class="highlighter-rouge">setuptools</code> in a <code class="highlighter-rouge">setup.py</code> file, but this solution also has some shortcomings.</li> </ul> <p>If you are using any or all of the methods described above, you should take a look at <a href="https://python-poetry.org/">Poetry</a> to help you manage your Python project(s). Poetry’s goal is to <strong>simplify the management of Python packaging and dependencies</strong>.
Amongst other things, Poetry can help you:</p> <ul> <li>Manage your dependencies, replacing requirements.txt</li> <li>Manage your virtualenv, simplifying the creation and activation of a virtualenv for your project</li> <li>Manage your Python package, replacing setup.py</li> <li>Publish your application to PyPI</li> <li>Turn Python functions into command line programs</li> <li>Ensure package integrity</li> </ul> <p>It sounds like magic and too good to be true, but there is really nothing magical happening here. Poetry is a modern tool that implements best practices from Python and other ecosystems to manage a project properly. Poetry relies on two main files:</p> <ul> <li><code class="highlighter-rouge">pyproject.toml</code>: The main configuration file for your Python project. You can edit it manually, and Poetry also helps manage it. The <code class="highlighter-rouge">pyproject.toml</code> file is not specific to Poetry; it is meant to be the main configuration file for your Python project and all the tools surrounding it (Poetry, Black, etc.). It was introduced to the Python community in 2016 by <a href="https://www.python.org/dev/peps/pep-0518/">PEP 518</a> to let projects declare their build system requirements, but its scope has grown year over year, and it is becoming the default configuration file.</li> <li><code class="highlighter-rouge">poetry.lock</code>: A lock file managed by Poetry; this file should never be edited manually. With the <code class="highlighter-rouge">poetry.lock</code> file, Poetry brings a much-needed feature to Python dependency management: alongside the lists of primary and development dependencies maintained in <code class="highlighter-rouge">pyproject.toml</code>, it records the exact version of every library that should be installed on a system. This feature is common in other languages, but it has rarely been achieved with Python’s <code class="highlighter-rouge">setup.py</code> or <code class="highlighter-rouge">requirements.txt</code> in the past.
If you have ever generated your requirements.txt file with <code class="highlighter-rouge">pip freeze &gt; requirements.txt</code> to ensure that you’ll always install the same version of your dependencies, you should be familiar with the problem the lock file solves. While <code class="highlighter-rouge">pip freeze</code> works most of the time, it’s not a great solution: it captures every package present in the environment, not only your project’s dependencies, and it’s prone to version conflicts between projects, which may require manual intervention.</li> </ul> <blockquote> <p>If you are interested in reading more about the story behind <code class="highlighter-rouge">pyproject.toml</code>, I recommend reading <a href="https://snarky.ca/what-the-heck-is-pyproject-toml/">this blog post from Brett Cannon</a>.</p> </blockquote> <h2 id="install-poetry">Install Poetry</h2> <p>To install Poetry on macOS, Linux, or Windows (Bash), the recommended method is to run the following command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-sSL</span> https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
</code></pre></div></div> <blockquote> <p>For convenience, Poetry is also available via pip, but it’s not the recommended installation method.
I usually reserve that for when I need to install it within a Docker container: <code class="highlighter-rouge">pip install poetry</code>.</p> </blockquote> <h2 id="manage-python-dependencies-and-virtual-environment-with-poetry">Manage Python dependencies and virtual environments with Poetry</h2> <p>Below is a simple <code class="highlighter-rouge">pyproject.toml</code> file to keep track of the dependencies for a project named <code class="highlighter-rouge">mypythonproject</code>.<br /> You can write this file by hand, or Poetry can generate it for you with <code class="highlighter-rouge">poetry init</code>.</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"mypythonproject"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"My awesome Python project"</span>
<span class="py">authors</span> <span class="p">=</span> <span class="p">[</span><span class="s">"NTC &lt;info@networktocode.com&gt;"</span><span class="p">]</span>

<span class="nn">[tool.poetry.dependencies]</span>
<span class="py">python</span> <span class="p">=</span> <span class="s">"^3.6"</span>
<span class="py">click</span> <span class="p">=</span> <span class="s">"^7.1.1"</span>
</code></pre></div></div> <p>Taking a closer look at the file, the first section <code class="highlighter-rouge">[tool.poetry]</code> contains information about the project itself, and the second section <code class="highlighter-rouge">[tool.poetry.dependencies]</code> defines the list of dependencies for the project, including both the Python version and the external dependencies that would usually be in a <code class="highlighter-rouge">requirements.txt</code> file.</p> <p>The <code class="highlighter-rouge">pyproject.toml</code>
file should be at the root of your project (here it’s the only file in my directory). Poetry installs all the dependencies with <code class="highlighter-rouge">poetry install</code> (this replaces <code class="highlighter-rouge">pip install -r requirements.txt</code>, <code class="highlighter-rouge">python setup.py install</code>, <code class="highlighter-rouge">pip install .</code>, or <code class="highlighter-rouge">pip install -e .</code>).</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>➜ mypythonproject# ll
total 8
<span class="nt">-rw-r--r--</span> 1 damien staff 203B May 13 09:17 pyproject.toml
➜ mypythonproject#
➜ mypythonproject# poetry <span class="nb">install
</span>The currently activated Python version 2.7.16 is not supported by the project <span class="o">(</span>^3.6<span class="o">)</span><span class="nb">.</span>
Trying to find and use a compatible version.
Using python3 <span class="o">(</span>3.7.7<span class="o">)</span>
Creating virtualenv mypythonproject-0zMZkBqq-py3.7 <span class="k">in</span> /Users/damien/Library/Caches/pypoetry/virtualenvs
Updating dependencies
Resolving dependencies... <span class="o">(</span>0.2s<span class="o">)</span>
Writing lock file
Package operations: 1 <span class="nb">install</span>, 0 updates, 0 removals
  - Installing click <span class="o">(</span>7.1.2<span class="o">)</span>
➜ mypythonproject#
</code></pre></div></div> <p>During the installation, Poetry automatically generates the <code class="highlighter-rouge">poetry.lock</code> file to track the exact versions of the dependencies that have been installed on my system.
If the <code class="highlighter-rouge">poetry.lock</code> file had already been present, Poetry would have installed the exact version of <code class="highlighter-rouge">click</code> recorded in the lock file, instead of installing the latest compatible version from PyPI.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>➜ mypythonproject# ll
total 16
<span class="nt">-rw-r--r--</span> 1 damien staff 606B May 13 09:43 poetry.lock
<span class="nt">-rw-r--r--</span> 1 damien staff 203B May 13 09:17 pyproject.toml
➜ mypythonproject# <span class="nb">cat </span>poetry.lock
<span class="o">[[</span>package]]
category <span class="o">=</span> <span class="s2">"main"</span>
description <span class="o">=</span> <span class="s2">"Composable command line interface toolkit"</span>
name <span class="o">=</span> <span class="s2">"click"</span>
optional <span class="o">=</span> <span class="nb">false
</span>python-versions <span class="o">=</span> <span class="s2">"&gt;=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"</span>
version <span class="o">=</span> <span class="s2">"7.1.2"</span>

<span class="o">[</span>metadata]
content-hash <span class="o">=</span> <span class="s2">"1876b927e070ae12d1e9090f5ea6bcdd2bb35f09269fc2182bcb9399c5e1be2a"</span>
python-versions <span class="o">=</span> <span class="s2">"^3.6"</span>

<span class="o">[</span>metadata.files]
click <span class="o">=</span> <span class="o">[</span>
    <span class="o">{</span>file <span class="o">=</span> <span class="s2">"click-7.1.2-py2.py3-none-any.whl"</span>, <span class="nb">hash</span> <span class="o">=</span> <span class="s2">"sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc"</span><span class="o">}</span>,
    <span class="o">{</span>file <span class="o">=</span> <span class="s2">"click-7.1.2.tar.gz"</span>, <span class="nb">hash</span> <span class="o">=</span> <span
class="s2">"sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a"</span><span class="o">}</span>,
<span class="o">]</span>
</code></pre></div></div> <blockquote> <p>Both <code class="highlighter-rouge">pyproject.toml</code> and <code class="highlighter-rouge">poetry.lock</code> should be tracked in source control (Git). Notice the hash values in the lock file: they ensure that the package installed locally is exactly the one intended.</p> </blockquote> <p>Also, during <code class="highlighter-rouge">poetry install</code>, Poetry created a new virtual environment for my project because it detected that no virtual environment was already associated with the project. Poetry is able to manage multiple environments per project and provides some commands to easily manage these virtual environments:</p> <ul> <li><code class="highlighter-rouge">poetry env info</code> to display information about the current virtual environment (<code class="highlighter-rouge">poetry env list</code> lists all the environments associated with the project)</li> <li><code class="highlighter-rouge">poetry shell</code> to activate the default virtualenv (replaces <code class="highlighter-rouge">source &lt;path to venv&gt;/bin/activate</code>, or <code class="highlighter-rouge">workon &lt;project&gt;</code> if you use virtualenvwrapper)</li> <li><code class="highlighter-rouge">poetry run</code> to run a command within the default virtual environment without activating it</li> </ul> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>➜ mypythonproject# poetry <span class="nb">env </span>info
Virtualenv
Python: 3.7.7
Implementation: CPython
Path: /Users/damien/Library/Caches/pypoetry/virtualenvs/mypythonproject-0zMZkBqq-py3.7
Valid: True

System
Platform: darwin
OS: posix
Python: /usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7
➜ mypythonproject# poetry shell
The currently activated Python version 2.7.16 is not supported by the project <span class="o">(</span>^3.6<span class="o">)</span><span class="nb">.</span>
Trying to find and use a compatible version.
Using python3 <span class="o">(</span>3.7.7<span class="o">)</span>
Spawning shell within /Users/damien/Library/Caches/pypoetry/virtualenvs/mypythonproject-0zMZkBqq-py3.7
➜ mypythonproject <span class="nb">.</span> /Users/damien/Library/Caches/pypoetry/virtualenvs/mypythonproject-0zMZkBqq-py3.7/bin/activate
<span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject#
</code></pre></div></div> <blockquote> <p>It’s possible to disable the virtual environment management in Poetry with <code class="highlighter-rouge">poetry config virtualenvs.create false</code> if you want to manage your virtual environment on your own or if you don’t want to use a virtual environment at all.</p> </blockquote> <h3 id="add-a-new-dependency-to-your-project">Add a new dependency to your project</h3> <p>Poetry provides a method to easily install and track a new dependency for your project: <code class="highlighter-rouge">poetry add &lt;python package&gt;</code>.</p> <p>In the example below, I’m adding <code class="highlighter-rouge">jinja2</code> as a dependency to my project:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# poetry add jinja2
Using version ^2.11.2 <span class="k">for </span>jinja2
Updating dependencies
Resolving dependencies... <span class="o">(</span>0.2s<span class="o">)</span>
Writing lock file
Package operations: 2 installs, 0 updates, 0 removals
  - Installing markupsafe <span class="o">(</span>1.1.1<span class="o">)</span>
  - Installing jinja2 <span class="o">(</span>2.11.2<span class="o">)</span>
</code></pre></div></div> <p>Poetry automatically updated both the <code class="highlighter-rouge">pyproject.toml</code> and the <code class="highlighter-rouge">poetry.lock</code> files in the process:</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"mypythonproject"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"My awesome Python project"</span>
<span class="py">authors</span> <span class="p">=</span> <span class="p">[</span><span class="s">"NTC &lt;info@networktocode.com&gt;"</span><span class="p">]</span>

<span class="nn">[tool.poetry.dependencies]</span>
<span class="py">python</span> <span class="p">=</span> <span class="s">"^3.6"</span>
<span class="py">click</span> <span class="p">=</span> <span class="s">"^7.1.1"</span>
<span class="py">jinja2</span> <span class="p">=</span> <span class="s">"^2.11.2"</span>
</code></pre></div></div> <p>Poetry can also maintain a list of dependencies specific to your development environment. To add a new dependency to the development dependencies list, add the <code class="highlighter-rouge">-D</code> option: <code class="highlighter-rouge">poetry add -D pytest</code>.
This creates a new <code class="highlighter-rouge">[tool.poetry.dev-dependencies]</code> section in the <code class="highlighter-rouge">pyproject.toml</code> file.</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry.dependencies]</span>
<span class="py">python</span> <span class="p">=</span> <span class="s">"^3.6"</span>
<span class="py">click</span> <span class="p">=</span> <span class="s">"^7.1.1"</span>
<span class="py">jinja2</span> <span class="p">=</span> <span class="s">"^2.11.2"</span>

<span class="nn">[tool.poetry.dev-dependencies]</span>
<span class="py">pytest</span> <span class="p">=</span> <span class="s">"^5.4.2"</span>
</code></pre></div></div> <h2 id="managing-python-package-with-poetry">Managing a Python package with Poetry</h2> <p>As mentioned in the introduction, Poetry can also manage your Python package.<br /> By default, Poetry looks for a directory with the same name as the project and tries to install it.
In my example, since my project is named <code class="highlighter-rouge">mypythonproject</code> in the <code class="highlighter-rouge">pyproject.toml</code>, Poetry will automatically look for a directory with this name and install it.</p> <p>I created a very simple file named <code class="highlighter-rouge">cli.py</code> in the <code class="highlighter-rouge">mypythonproject</code> directory:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># mypythonproject/cli.py
</span><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"hi there"</span><span class="p">)</span>
</code></pre></div></div> <p>Here is how the project looks on my file system:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject#
<span class="nb">.</span>
├── mypythonproject
│   └── cli.py
├── poetry.lock
└── pyproject.toml
</code></pre></div></div> <p>Running <code class="highlighter-rouge">poetry install</code> again installs only the delta between the <code class="highlighter-rouge">pyproject.toml</code> file and my environment; here, the only delta is the <code class="highlighter-rouge">mypythonproject</code> library itself.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# poetry <span class="nb">install
</span>Installing dependencies from lock file
No dependencies to install or update
  - Installing mypythonproject <span class="o">(</span>0.1.0<span class="o">)</span>
</code></pre></div></div> <p>Once installed, I can access my code from anywhere as long as I’m still within the same virtual environment. In the example below, I moved outside of the project directory and imported the function <code class="highlighter-rouge">main()</code> in Python with <code class="highlighter-rouge">from mypythonproject.cli import main</code>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# <span class="nb">cd</span> /
<span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ /
<span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ / python
Python 3.7.7 <span class="o">(</span>default, Mar 10 2020, 15:43:33<span class="o">)</span>
<span class="o">[</span>Clang 11.0.0 <span class="o">(</span>clang-1100.0.33.17<span class="o">)]</span> on darwin
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for </span>more information.
<span class="o">&gt;&gt;&gt;</span> from mypythonproject.cli import main
<span class="o">&gt;&gt;&gt;</span> main<span class="o">()</span>
hi there
</code></pre></div></div> <p>We can also check the list of installed packages within the virtual environment with <code class="highlighter-rouge">pip list</code>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ / pip list | <span class="nb">grep </span>mypythonproject
mypythonproject 0.1.0 /Users/damien/projects/mypythonproject
</code></pre></div></div> <p>If the name of your directory does not match the name of your project, you need to tell Poetry which directory to install by using the <code class="highlighter-rouge">packages</code> key in the main <code class="highlighter-rouge">[tool.poetry]</code> section of the <code class="highlighter-rouge">pyproject.toml</code>:</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"mypythonproject"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">"My awesome Python project"</span>
<span class="py">authors</span> <span class="p">=</span> <span class="p">[</span><span class="s">"NTC &lt;info@networktocode.com&gt;"</span><span class="p">]</span>
<span class="py">packages</span> <span class="p">=</span> <span class="p">[</span>
    <span class="err">{</span> <span class="err">include</span> <span class="err">=</span> <span class="s">"mylibraryname"</span> <span class="err">}</span><span class="p">,</span>
<span class="p">]</span>
</code></pre></div></div> <h2 id="creating-command-line-programs-with-poetry">Creating command line programs with Poetry</h2> <p>Another feature
that is extremely useful in Poetry is the ability to easily turn a Python function into an executable that will be available in your PATH.</p> <p>Building on the previous example, I can make my function main() runnable as a script by adding an <code class="highlighter-rouge">if __name__ == "__main__":</code> block. At this point, I can execute it as a script as long as I know its exact location.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"hi there"</span><span class="p">)</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# python mypythonproject/cli.py
hi there
</code></pre></div></div> <p>By leveraging the <code class="highlighter-rouge">[tool.poetry.scripts]</code> feature, I can automatically turn my function main() into an executable, here called <code class="highlighter-rouge">myawesomecli</code>:</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry.scripts]</span> <span class="py">myawesomecli</span> <span class="p">=</span> <span class="s">"mypythonproject.cli:main"</span> </code></pre></div></div> <p>After reinstalling the library with <code class="highlighter-rouge">poetry install</code>, I now have access to a new executable, <code class="highlighter-rouge">myawesomecli</code>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span
class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# myawesomecli hi there <span class="o">(</span>mypythonproject-0zMZkBqq-py3.7<span class="o">)</span> ➜ mypythonproject# which myawesomecli /Users/damien/Library/Caches/pypoetry/virtualenvs/mypythonproject-0zMZkBqq-py3.7/bin/myawesomecli </code></pre></div></div> <h2 id="conclusion">Conclusion</h2> <p>I hope this introduction to Poetry convinces you to give it a try. I know it’s sometimes hard to change our habits when it comes to tools and development environments; I wish I had tried Poetry a long time ago instead of waiting months before transitioning.<br /> Poetry does even more than what we covered in this article, so I encourage you to check out the <a href="https://python-poetry.org/">official documentation</a>!</p> <p>-Damien (@damgarros)</p>Damien Garros NetDevOps Concepts - An Introduction2020-05-12T00:00:00+00:002020-05-12T00:00:00+00:00https://blog.networktocode.com/post/netdevops-concepts-intro<p>As the networking industry moves to embrace NetDevOps, network practitioners often encounter many unfamiliar terms. This can leave people unsure about what role a new technology could play in their organization, or even afraid to ask for clarity about a concept under discussion. In fact, there are a few questions we see asked time and time again.</p> <p>“What is CI/CD, why would it have a pipeline, and what does that have to do with my network?”</p> <p>“What does IaC or SoT mean?”</p> <p>Or even, “What is NetDevOps anyway and why should I care?”</p> <p>In this series of blog posts, we will delve into NetDevOps so that you come away with a basic understanding of what NetDevOps is, as well as some of its key components. We will also talk briefly about how, and why, you might apply some of these concepts to operating your network.</p> <h2 id="what-is-netdevops">What is NetDevOps?</h2> <p>Many fantastic blog posts and presentations already discuss NetDevOps as a philosophy, and I won’t revisit all of their points here. Suffice it to say that NetDevOps takes key concepts from the DevOps movement, which we’ll discuss in depth in subsequent blog posts, and applies them to building and operating networks.</p> <p>With the proper application of NetDevOps concepts, you no longer have to think about your network as static, rigid, and fragile. Instead, you can start to treat it as something flexible and responsive to your desires or business needs.
Some examples of these concepts include:</p> <ul> <li>Continuous Integration/Continuous Deployment (CI/CD)</li> <li>Infrastructure as Code (IaC)</li> <li>Source of Truth (SoT)</li> <li>Minimum Viable Product (MVP)</li> <li>ChatOps</li> </ul> <p>To see these concepts in action, suppose you need to deploy a new site to your corporate VPN. Without a NetDevOps approach to managing your network, you could spend a week building the proper IKE/ISAKMP settings for the tunnel by hand, documenting the site in your systems, applying the changes, and then testing to ensure everything works.</p> <p>However, if you are using a few key NetDevOps concepts to manage your network, your process could look more like this:</p> <ul> <li>Message <em>new vpn site</em> to <code class="highlighter-rouge">@my_netdevops_bot</code> in your chat application (Slack/MS Teams/Webex Teams)</li> <li>Fill in the site name, equipment type, and other metadata as prompted by the bot.</li> <li><code class="highlighter-rouge">@my_netdevops_bot</code> uses that information to: <ul> <li>Create the site in your Source of Truth (SoT)</li> <li>Merge a Jinja template with the relevant data from your SoT</li> <li>Present you with the rendered configuration in the chat application</li> </ul> </li> <li>Upon approval of the change, <code class="highlighter-rouge">@my_netdevops_bot</code> connects to the devices and deploys the configuration changes</li> <li><code class="highlighter-rouge">@my_netdevops_bot</code> then queries the network (or other systems, such as monitoring systems) to determine whether the change was successful and presents you with confirmation in chat.
<ul> <li>If the change fails for some reason, <code class="highlighter-rouge">@my_netdevops_bot</code> can start troubleshooting some basic things on your behalf and present that information to you as well.</li> </ul> </li> </ul> <p>This is not a far-fetched or cutting-edge example; it is just one of the many ways we see NetDevOps concepts applied daily across all types of organizations.</p> <h2 id="but">“But…”</h2> <p>Usually, about this point in the conversation, there is a “But…” lurking in the wings. Some of the greatest hits include, “But my network is too old” or “But my network is too unique.” My personal favorites are, “But my network is too large/too small!” as if there were a mythical Goldilocks zone where these concepts apply. The list goes on, but almost every objection has an answer, and most can be addressed by two main responses.</p> <p>First, no one is suggesting that you go directly from zero to the scenario described above. Embracing a NetDevOps mentality and approach to your network operations is an iterative process. Applying just a few of these concepts (and not all of them) to a single small element of your network is the best way to begin your NetDevOps journey. Even small, simple steps can pay large dividends. If you are manually generating device configurations, start by building a templating system for them and keeping it as code in a Git repository. This first step of only generating the configurations, while the engineer still validates and applies them, can pave the way for many future steps. On its own, simple configuration generation from a template saves time and delivers operational wins by reducing human error and ensuring there is a single standard configuration.</p> <p>Secondly, no network is too special to take advantage of these concepts. Is your network very large and complicated?
Is literally every single component unique? Pick a single place in your network to start implementing a Source of Truth (such as <a href="https://netbox.readthedocs.io/en/stable/">NetBox</a>). Having the data in an easily accessible API and/or GUI, and in a known standard format, will show value immediately for any other concepts you wish to apply. Does your network lack coherent standards? Start enforcing them on a single configuration item, such as the AAA configuration, via Infrastructure as Code principles. Ensuring that the AAA configuration on all devices is a known, expected value at all times is extremely valuable from an operational and security perspective. Your compliance auditors will thank you!</p> <p>Each of the above strategies relies on the concept of a Minimum Viable Product. When you start to embrace NetDevOps concepts for operating your network, do not expect to solve all of your problems on the first try. Pick one straightforward problem to solve. Build your solution and tooling for that problem, then begin using them in your day-to-day work. If it is successful, look at adapting your tools to solve additional problems. As described above, perhaps you tackle only the AAA configuration first. When that is fully managed in a Source of Truth, and as code instead of raw device configuration, you can expand to managing the SNMP configuration on all your devices as well. Next, you could tackle the management ACLs on your devices, or NTP and DNS settings; really, it’s up to what makes sense in your network. In this way, you will rapidly expand the portion of your network that is managed via NetDevOps.</p> <h2 id="next-steps">Next Steps</h2> <p>This is the first in a series of posts on NetDevOps concepts, and I highly recommend staying tuned for the subsequent posts, where we dive into some of these concepts in greater detail.
We’ll have use cases and examples for each of them, including how to tie them all together into a coherent network automation strategy.</p> <p>If you’re impatient, you can always reach out to <a href="mailto:info@networktocode.com">info@networktocode.com</a> for more information on our services and training to help you take the next steps on your NetDevOps journey.</p> <p>-Brett</p>Brett Lykins Monitor Your Network With gNMI, SNMP, and Grafana2020-05-05T00:00:00+00:002020-05-05T00:00:00+00:00https://blog.networktocode.com/post/monitor_your_network_with_gnmi_snmp_and_grafana<p>This post, the second in a series on using Telegraf, Prometheus, and Grafana for network telemetry, focuses on transforming data and building additional graphs in Grafana.
This post will cover the following topics:</p> <ul> <li><a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a> <ul> <li>Gathering streaming data with gNMI, as an alternative to SNMP</li> <li>Changing data with Enum and Replacement</li> <li>Tagging Data</li> </ul> </li> <li><a href="https://www.prometheus.io">Prometheus</a> <ul> <li>Prometheus Query Language (PromQL)</li> </ul> </li> <li>Advancing Your <a href="https://www.grafana.com">Grafana</a> Capabilities <ul> <li>Variables</li> <li>Tables (BGP Table)</li> <li>Device Dashboards vs Environment Dashboards</li> </ul> </li> </ul> <p>You can find the first post in the series, on how to gather data from SNMP-based devices, <a href="https://blog.networktocode.com/post/network_telemetry_for_snmp_devices/">here</a>.</p> <h2 id="purpose">Purpose</h2> <p>The intent of this post is to demonstrate how to bring multiple telemetry gathering methods together into one stack. In our experience, a successful telemetry &amp; analytics stack should be able to collect data transparently from SNMP, streaming telemetry (gNMI), and CLI/API. We covered <a href="https://blog.networktocode.com/post/network_telemetry_for_snmp_devices/">SNMP</a> and <a href="https://blog.networktocode.com/post/using_python_and_telegraf_for_metrics/">CLI</a> gathering in previous posts. This post will focus on gathering telemetry data with gNMI. Beyond collecting the data, when the same type of data comes from multiple sources, it’s important to ensure that it has the same format in the database.
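</p> <p>As a concrete illustration of the problem, the same inbound octet counter arrives as <code class="highlighter-rouge">ifHCInOctets</code> from an SNMP device but as <code class="highlighter-rouge">state_counters_in_octets</code> from a gNMI device. The minimal Python sketch below shows the idea of normalizing both into one naming scheme. The <code class="highlighter-rouge">FIELD_ALIASES</code> table, the <code class="highlighter-rouge">normalize()</code> helper, and the sample values are hypothetical; in this post, the actual work is done by Telegraf processors rather than custom code.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical alias table mapping SNMP (IF-MIB) field names to the
# field names Telegraf produces for the same counters from gNMI.
FIELD_ALIASES = {
    "ifHCInOctets": "state_counters_in_octets",
    "ifHCOutOctets": "state_counters_out_octets",
}


def normalize(fields):
    """Return a copy of a metric's fields with SNMP names mapped to gNMI names.

    Fields that have no alias are passed through unchanged.
    """
    return {FIELD_ALIASES.get(name, name): value for name, value in fields.items()}


# Illustrative sample values, not real device output.
snmp_sample = {"ifHCInOctets": 1000, "ifHCOutOctets": 2000}
gnmi_sample = {"state_counters_in_octets": 1500, "state_counters_out_octets": 2500}

print(normalize(snmp_sample))
print(normalize(gnmi_sample))
</code></pre></div></div> <p>After normalization, both samples share the same field names, so a single query can cover devices regardless of how their data was collected.</p> <p>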
In this post, we’ll look at how Telegraf can help normalize and decorate the data before sending it to the database.</p> <h2 id="network-topology">Network Topology</h2> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/topology.png" alt="Topology" /></p> <p>The topology contains a mix of devices, as listed in the table below:</p> <table> <thead> <tr> <th>Device Name</th> <th>Device Type</th> <th>Telemetry Source</th> </tr> </thead> <tbody> <tr> <td>houston</td> <td>Cisco IOS-XE</td> <td>SNMP</td> </tr> <tr> <td>amarillo</td> <td>Cisco NX-OS</td> <td>SNMP</td> </tr> <tr> <td>austin</td> <td>Cisco IOS-XR</td> <td>gNMI</td> </tr> <tr> <td>el-paso</td> <td>Cisco IOS-XR</td> <td>gNMI</td> </tr> <tr> <td>san-antonio</td> <td>Cisco IOS-XR</td> <td>gNMI</td> </tr> <tr> <td>dallas</td> <td>Cisco IOS-XR</td> <td>gNMI</td> </tr> </tbody> </table> <p>This blog post is based on a Cisco-only environment, but if you’re interested in a multi-vendor approach, check out <a href="https://twitter.com/damgarros/">@damgarros</a>’s NANOG 77 presentation on <a href="https://youtu.be/lzppzWGRHGo">YouTube</a>. That video shows how to use gNMI alone to collect data from Arista, Juniper, and Cisco devices in a single place. The topology used here is meant to show collection from multiple sources (SNMP + gNMI) in a single stack.</p> <h3 id="application-installs-note">Application Installs Note</h3> <p>Software installation was covered in the previous post in this series. For installation instructions, I recommend either that post or the product pages linked in the introduction.</p> <h2 id="overview">Overview</h2> <p>Here is the sequence of events addressed in this post. I am starting with Telegraf collecting gNMI data from network devices. Telegraf processes this data into Prometheus metrics, which are scraped by a Prometheus server.
Grafana then generates graphs from the gathered and processed data.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/sequence_diagram.png" alt="Sequence Diagram" /></p> <h3 id="gnmi-introduction">gNMI Introduction</h3> <p>gNMI stands for gRPC Network Management Interface, where gRPC is a Remote Procedure Call (RPC) framework. gRPC was developed by Google; it uses HTTP/2 for transport and Protocol Buffers to serialize messages. gNMI is a gRPC-based protocol for retrieving configuration and telemetry from a network device. All gNMI messages are defined as Protocol Buffers, a compact binary encoding designed to keep messages small and efficient. The device serializes the data into this format and sends it off, and the receiver decodes it. You can take a look at the <a href="https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-specification.md#1-introduction">gNMI reference</a> for more detailed information.</p> <blockquote> <p>gNMI handles not only the telemetry data this post is about; it is also intended to transport device configuration.</p> </blockquote> <p>So why use gNMI? gRPC is incredibly fast and efficient at transmitting data, and by extension so is gNMI.</p> <h3 id="gnmi-cisco-configuration">gNMI Cisco Configuration</h3> <p>gNMI is supported by many of today’s leading network vendors. As an example, here are the configuration lines needed to enable gNMI on a Cisco IOS-XR device in this demo environment:</p> <pre><code class="language-cisco">grpc
 port 50000
 no-tls
</code></pre> <p>Pretty straightforward. If you want to define subscriptions on the IOS-XR device itself, more detailed configuration options are available.
Take a look at <a href="https://www.cisco.com/c/en/us/td/docs/iosxr/ncs5000/telemetry/65x/b-telemetry-cg-ncs5000-65x/b-telemetry-cg-ncs5000-65x_chapter_010.html">Cisco’s Guide to Configure Model-driven Telemetry</a> for details.</p> <h2 id="telegraf">Telegraf</h2> <h3 id="gathering-streaming-data-with-gnmi">Gathering Streaming Data With gNMI</h3> <p>The first step I will walk through is setting up Telegraf to subscribe to gNMI data, specifically to collect telemetry from the IOS-XR devices in this lab. With gNMI, as with other streaming telemetry <em>subscriptions</em>, you need to tell the network device that you want to subscribe to the data. The device then sends periodic updates of telemetry data to the receiver, and a periodic “keep-alive” message is sent by the subscriber to keep the subscription active.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/gnmi.png" alt="gNMI Subscription" /></p> <h4 id="gnmi-telegraf-configuration">gNMI Telegraf Configuration</h4> <p>Telegraf has a plugin that takes care of the subscription; the input section looks like the code below.
Note that the subscription port is defined within the <em>addresses</em> section.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[inputs.cisco_telemetry_gnmi]</span><span class="err">]</span> <span class="py">addresses</span> <span class="p">=</span> <span class="s">["dallas.create2020.ntc.cloud.tesuto.com:50000"]</span> <span class="py">username</span> <span class="p">=</span> <span class="s">&lt;redacted&gt;</span> <span class="py">password</span> <span class="p">=</span> <span class="s">&lt;redacted&gt;</span> <span class="c">## redial in case of failures after </span> <span class="py">redial</span> <span class="p">=</span> <span class="s">"10s"</span> <span class="py">tagexclude</span> <span class="p">=</span> <span class="s">["openconfig-network-instance:/network-instances/network-instance/protocols/protocol/name"]</span> <span class="nn">[[inputs.cisco_telemetry_gnmi.subscription]</span><span class="err">]</span> <span class="py">origin</span> <span class="p">=</span> <span class="s">"openconfig-interfaces"</span> <span class="py">path</span> <span class="p">=</span> <span class="s">"/interfaces/interface"</span> <span class="py">subscription_mode</span> <span class="p">=</span> <span class="s">"sample"</span> <span class="py">sample_interval</span> <span class="p">=</span> <span class="s">"10s"</span> <span class="nn">[[inputs.cisco_telemetry_gnmi.subscription]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"bgp_neighbor"</span> <span class="py">origin</span> <span class="p">=</span> <span class="s">"openconfig-network-instance"</span> <span class="py">path</span> <span class="p">=</span> <span class="s">"/network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state"</span> <span class="py">subscription_mode</span> <span class="p">=</span> <span class="s">"sample"</span> <span class="py">sample_interval</span> 
<span class="p">=</span> <span class="s">"10s"</span> <span class="nn">[[outputs.prometheus_client]</span><span class="err">]</span> <span class="py">listen</span> <span class="p">=</span> <span class="s">":9011"</span> </code></pre></div></div> <p>The configuration defines the address, username, and password. It also sets a redial interval in case of a failure, and uses <code class="highlighter-rouge">tagexclude</code> to drop a particular tag (the protocol name path) from the resulting metrics.</p> <p>There are two subscriptions in this instance:</p> <ul> <li>openconfig-interfaces</li> <li>openconfig-network-instance (to collect BGP neighbor state)</li> </ul> <p>In both cases, the sample interval in this demo is 10 seconds, which means the device sends the statistics every 10 seconds and new metrics are available for Prometheus to scrape every 10 seconds. The sample interval and the Prometheus scrape interval should match.</p> <p>To collect the telemetry for this demo, we are once again using Telegraf’s Prometheus client output. Telegraf collects, processes, and formats the data, which is then scraped by a Prometheus server. Let’s look at that output next.</p> <h4 id="gnmi-output---bgp">gNMI Output - BGP</h4> <p>I’m only going to look at a few of the items in the output here.
Showing all of them would take up too much screen real estate to be worthwhile.</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HELP bgp_neighbor_messages_received_UPDATE Telegraf collected metric</span> <span class="c"># TYPE bgp_neighbor_messages_received_UPDATE untyped</span> <span class="n">bgp_neighbor_messages_received_UPDATE</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">identifier</span><span class="o">=</span><span class="s2">"BGP"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"default"</span><span class="p">,</span><span class="na">neighbor_address</span><span class="o">=</span><span class="s2">"10.0.0.1"</span><span class="p">,</span><span class="na">peer_type</span><span class="o">=</span><span class="s2">"EXTERNAL"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">9</span> <span class="n">bgp_neighbor_messages_received_UPDATE</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">identifier</span><span class="o">=</span><span class="s2">"BGP"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"default"</span><span class="p">,</span><span class="na">neighbor_address</span><span class="o">=</span><span class="s2">"10.0.0.17"</span><span class="p">,</span><span class="na">peer_type</span><span class="o">=</span><span class="s2">"EXTERNAL"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">0</span> <span
class="n">bgp_neighbor_messages_received_UPDATE</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">identifier</span><span class="o">=</span><span class="s2">"BGP"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"default"</span><span class="p">,</span><span class="na">neighbor_address</span><span class="o">=</span><span class="s2">"10.0.0.25"</span><span class="p">,</span><span class="na">peer_type</span><span class="o">=</span><span class="s2">"EXTERNAL"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">9</span> <span class="n">bgp_neighbor_messages_received_UPDATE</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">identifier</span><span class="o">=</span><span class="s2">"BGP"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"default"</span><span class="p">,</span><span class="na">neighbor_address</span><span class="o">=</span><span class="s2">"10.0.0.9"</span><span class="p">,</span><span class="na">peer_type</span><span class="o">=</span><span class="s2">"EXTERNAL"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">9</span> </code></pre></div></div> <blockquote> <p>Some items were removed for readability.</p> </blockquote> <p>The output is what you would expect: a list of neighbors, each identified by the <code class="highlighter-rouge">neighbor_address</code> tag.
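</p> <p>Each line of that output is a separate time series: a metric name, a set of labels (the Telegraf tags), and the current value. The short Python sketch below splits two of the sample lines shown above into those parts, which makes it clear that <code class="highlighter-rouge">neighbor_address</code> is what distinguishes one neighbor’s series from another’s. It is a simplified, hypothetical parser that assumes label values contain no escaped quotes and that lines carry no timestamps; the official Prometheus Python client library (<code class="highlighter-rouge">prometheus_client</code>) includes a complete text-format parser if you need one.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Two sample lines copied from the Telegraf output above.
EXPOSITION = """
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.1",peer_type="EXTERNAL",role="leaf"} 9
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.17",peer_type="EXTERNAL",role="leaf"} 0
"""

# metric_name{labels} value  (simplified: no escaped quotes, no timestamps)
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric, labels, value) tuples from Prometheus text-format lines."""
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            metric, raw_labels, value = match.groups()
            yield metric, dict(LABEL_RE.findall(raw_labels)), float(value)


for metric, labels, value in parse_samples(EXPOSITION):
    print(labels["neighbor_address"], value)
</code></pre></div></div> <p>Running it prints each neighbor address next to its UPDATE message count, one series per neighbor.</p> <p>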
With the BGP subscription, you get the following metrics:</p> <ul> <li>bgp_neighbor_established_transitions</li> <li>bgp_neighbor_last_established</li> <li>bgp_neighbor_messages_received_NOTIFICATION</li> <li>bgp_neighbor_messages_received_UPDATE</li> <li>bgp_neighbor_messages_sent_NOTIFICATION</li> <li>bgp_neighbor_messages_sent_UPDATE</li> <li>bgp_neighbor_peer_as</li> <li>bgp_neighbor_queues_output</li> <li>bgp_neighbor_session_state</li> </ul> <h4 id="gnmi-output---interfaces">gNMI Output - Interfaces</h4> <p>The interface subscription returns a lot of statistics. Here we’ll look at just one of them, <code class="highlighter-rouge">interface_state_counters_in_octets</code>, which reports each interface along with its associated counter.</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/0"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">3.2022595</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/1"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">3.077077</span><span class="n">e</span><span class="o">+</span><span class="mi">06</span> <span class="n">interface_state_counters_in_octets</span><span
class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/2"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">1.5683204947</span><span class="n">e</span><span class="o">+</span><span class="mi">10</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/3"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">1.627459</span><span class="n">e</span><span class="o">+</span><span class="mi">06</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/4"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">1.523158</span><span class="n">e</span><span class="o">+</span><span class="mi">06</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/5"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">35606</span> <span 
class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/6"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">35318</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/7"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">35550</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/8"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">35878</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/9"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">36684</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span 
class="na">name</span><span class="o">=</span><span class="s2">"MgmtEth0/RP0/CPU0/0"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mf">2.2033861</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"Null0"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">0</span> <span class="n">interface_state_counters_in_octets</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"SINT0/0/0"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">0</span> </code></pre></div></div> <p>This is great information, similar to what we have seen with SNMP. Now on to the transformations that Telegraf offers.</p> <h3 id="changing-data-with-enum-and-replacement">Changing data with Enum and Replacement</h3> <p>Telegraf has several <strong>processors</strong> available to transform the data and get it into a format that is appropriate and consistent for your environment. Let’s look at two of them and how they are used in this use case.</p> <h4 id="telegraf---enum">Telegraf - Enum</h4> <p>The first processor is applied to the BGP data. When the BGP session state comes back from the subscription, it arrives as a string value.
It is great to be able to read the current state, but a string value is not very helpful for a Time Series Database (TSDB). A TSDB expects the data to be represented as a number of some sort, either an integer or a float, because the whole point is to measure a value at a point in time.</p> <p>The Telegraf process then looks like this:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/telegraf_process.png" alt="Telegraf Process" /></p> <p>To accommodate this, the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/processors/enum">enum processor</a> is put into action. The following is added to the configuration:</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[processors.enum]</span><span class="err">]</span> <span class="nn">[[processors.enum.mapping]</span><span class="err">]</span> <span class="c">## Name of the field to map </span> <span class="py">field</span> <span class="p">=</span> <span class="s">"session_state"</span> <span class="nn">[processors.enum.mapping.value_mappings]</span> <span class="py">IDLE</span> <span class="p">=</span> <span class="s">1</span> <span class="py">CONNECT</span> <span class="p">=</span> <span class="s">2</span> <span class="py">ACTIVE</span> <span class="p">=</span> <span class="s">3</span> <span class="py">OPENSENT</span> <span class="p">=</span> <span class="s">4</span> <span class="py">OPENCONFIRM</span> <span class="p">=</span> <span class="s">5</span> <span class="py">ESTABLISHED</span> <span class="p">=</span> <span class="s">6</span> </code></pre></div></div> <p>Within the <code class="highlighter-rouge">session_state</code> field, any instance of the string <code class="highlighter-rouge">IDLE</code> is replaced with the integer <code class="highlighter-rouge">1</code>, a value that a TSDB can store for the long term.
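</p> <p>To make the mapping concrete, here is a minimal Python sketch of what the <code class="highlighter-rouge">value_mappings</code> table above does to a single data point. It is not Telegraf code: the <code class="highlighter-rouge">apply_enum</code> and <code class="highlighter-rouge">state_name</code> helpers and the dictionary-based metric are hypothetical stand-ins. It assumes a state missing from the table is left untouched, which is what the enum processor does when its <code class="highlighter-rouge">default</code> option is not set.</p>

```python
# Hypothetical sketch of the enum mapping configured above; it mimics the
# behavior for illustration only and is not Telegraf's implementation.
BGP_STATE_MAP = {
    "IDLE": 1,
    "CONNECT": 2,
    "ACTIVE": 3,
    "OPENSENT": 4,
    "OPENCONFIRM": 5,
    "ESTABLISHED": 6,
}

# Reverse lookup, similar in spirit to the Grafana value mappings that turn
# the stored number back into a word on the graph.
BGP_STATE_NAMES = {number: name for name, number in BGP_STATE_MAP.items()}


def apply_enum(fields, field_name="session_state", mapping=BGP_STATE_MAP):
    """Return a copy of fields with field_name translated through mapping.

    A value missing from the mapping is left unchanged, mirroring the enum
    processor's behavior when no default is configured.
    """
    result = dict(fields)
    value = result.get(field_name)
    if value in mapping:
        result[field_name] = mapping[value]
    return result


def state_name(number):
    """Translate a stored integer back into the BGP state name."""
    return BGP_STATE_NAMES.get(number, "UNKNOWN")


if __name__ == "__main__":
    print(apply_enum({"session_state": "ESTABLISHED"}))  # {'session_state': 6}
    print(state_name(6))  # ESTABLISHED
```

<p>Keeping the forward and reverse tables derived from one dictionary is a small design choice that keeps the stored integers and the labels shown on a graph from drifting apart.</p> <p>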
The same applies to all of the other states, with ESTABLISHED stored as the integer <code class="highlighter-rouge">6</code>. Later, Grafana maps this number back to the word for display on a graph.</p> <h4 id="telegraf---rename">Telegraf - Rename</h4> <p>The second <strong>processor</strong> used in this demo is the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/processors/rename">rename processor</a>, which renames measurements, tags, and fields. Below is the configuration used to rename the counters collected from SNMP devices so that they match the names used for gNMI.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[processors.rename]</span><span class="err">]</span> <span class="nn">[[processors.rename.replace]</span><span class="err">]</span> <span class="py">field</span> <span class="p">=</span> <span class="s">"ifHCInOctets"</span> <span class="py">dest</span> <span class="p">=</span> <span class="s">"state_counters_in_octets"</span> <span class="nn">[[processors.rename.replace]</span><span class="err">]</span> <span class="py">field</span> <span class="p">=</span> <span class="s">"ifHCOutOctets"</span> <span class="py">dest</span> <span class="p">=</span> <span class="s">"state_counters_out_octets"</span> </code></pre></div></div> <p>This tells Telegraf to rename the field <code class="highlighter-rouge">ifHCInOctets</code> to <code class="highlighter-rouge">state_counters_in_octets</code>, and likewise the outbound field <code class="highlighter-rouge">ifHCOutOctets</code> to <code class="highlighter-rouge">state_counters_out_octets</code>. Once Telegraf has renamed those fields, you can use the data gathered with SNMP and the data gathered with gNMI in the same queries!</p> <h3 id="tagging-data">Tagging Data</h3> <p>Tagging data is one of the biggest favors that you can do for yourself.
Tagging gives flexibility for future analysis, comparison, and graphing of data points. For instance, if you tag your BGP neighbors with the upstream peer provider, you will be able to easily identify the interfaces that belong to that particular peer. If you have four geographically diverse interfaces, this will allow you to quickly identify the interfaces based on the tag rather than deciding manually later, at the time of graphing or alerting.</p> <p>This brings us to the third Telegraf <strong>processor</strong> in this post, the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/processors/regex">regex processor</a>. This processor takes a regex search pattern and performs the replacement. Something new here is the <code class="highlighter-rouge">result_key</code> option: when it is set, the result is written to a new key instead of replacing the original value. This regex replacement adds a new tag, <code class="highlighter-rouge">intf_role</code>, with the value <code class="highlighter-rouge">server</code> for the matching interface.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[processors.regex]</span><span class="err">]</span> <span class="nn">[[processors.regex.tags]</span><span class="err">]</span> <span class="py">key</span> <span class="p">=</span> <span class="s">"name"</span> <span class="py">pattern</span> <span class="p">=</span> <span class="s">"^GigabitEthernet0</span><span class="se">\\</span><span class="s">/0</span><span class="se">\\</span><span class="s">/0</span><span class="se">\\</span><span class="s">/2$"</span> <span class="py">replacement</span> <span class="p">=</span> <span class="s">"server"</span> <span class="py">result_key</span> <span class="p">=</span> <span class="s">"intf_role"</span> </code></pre></div></div> <p>Looking at just this particular replacement in the output, the metrics now carry an additional tag for graphing, alerting, and general data analysis.</p> <div class="language-prometheus 
highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">interface_state_admin_status</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">intf_role</span><span class="o">=</span><span class="s2">"server"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/2"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">1</span> <span class="n">interface_state_counters_in_broadcast_pkts</span><span class="p">{</span><span class="na">device</span><span class="o">=</span><span class="s2">"dallas"</span><span class="p">,</span><span class="na">intf_role</span><span class="o">=</span><span class="s2">"server"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet0/0/0/2"</span><span class="p">,</span><span class="na">role</span><span class="o">=</span><span class="s2">"leaf"</span><span class="p">}</span> <span class="mi">8</span> </code></pre></div></div> <h2 id="prometheus">Prometheus</h2> <h3 id="prometheus-query-language">Prometheus Query Language</h3> <p>Throughout the upcoming Grafana section you will see a number of PromQL (Prometheus Query Language) queries. Take a look at the Prometheus.io <a href="https://prometheus.io/docs/prometheus/latest/querying/basics/">basics page</a> for full documentation of the available queries. Grafana executes these queries to populate the data in the graphs.</p> <h2 id="grafana">Grafana</h2> <p>The next several sections show how to build per-device dashboards using PromQL, variable substitution, and a few other techniques.
Viewed from a device perspective, the two dashboards below differ in the number of interfaces and neighbors displayed, but both are generated from the same dashboard configuration.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_device_01.png" alt="Grafana Device Overview 1" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_device_02.png" alt="Grafana Device Overview 2" /></p> <h3 id="variables">Variables</h3> <p>First, you’ll need to set up the device variable seen in the upper left-hand corner of the dashboard. Looking back at when I first started building dashboards, this may be one of the most important skills for leveling up your Grafana dashboards: it provides a significant amount of value while eliminating the rework of adding more devices to each panel.</p> <h4 id="variables---adding-to-your-dashboard">Variables - Adding to Your Dashboard</h4> <p>To add a dashboard-wide variable, follow these steps:</p> <ul> <li>Navigate into your dashboard</li> <li>Click on the gear icon in the upper right-hand navigation section</li> <li>Click on <em>Variables</em></li> <li>Click the green <em>New</em> button on the right-hand side</li> </ul> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_variables.png" alt="Grafana Variables Add" /></p> <blockquote> <p>This image already has one variable added: the device variable.</p> </blockquote> <p>Once in the new screen, you will see the following:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_add_variable.png" alt="Grafana Add Variable" /></p> <p>Here you will define a PromQL query to build out your device list. At the bottom of the screen is the heading <strong>Preview of values</strong>.
Here you can observe a sample of the values that the query returns for your variable.</p> <p>The fields that you need to fill in include:</p> <table> <thead> <tr> <th>Field</th> <th>Information Needed</th> </tr> </thead> <tbody> <tr> <td>Name</td> <td>Name of the variable you wish to use</td> </tr> <tr> <td>Type</td> <td>Query</td> </tr> <tr> <td>Data source</td> <td>Prometheus</td> </tr> <tr> <td>Refresh</td> <td>When the variable values should be refreshed; use the dropdown to select the option that fits your org best</td> </tr> <tr> <td>Query</td> <td>PromQL to get the data points</td> </tr> <tr> <td>Regex</td> <td>Regex used to filter the query results</td> </tr> </tbody> </table> <p>You can experiment with the rest of the fields as you see fit to get your variables defined properly.</p> <blockquote> <p>Once you have your search pattern set, make sure to click Save on the left-hand side of Grafana.</p> </blockquote> <p>To reference a variable once it is created, put a dollar sign ($) in front of its name; Grafana then substitutes the selected value into the query. In the Legend field, Jinja-like double curly braces, such as <code class="highlighter-rouge">{{ device }}</code>, reference a label from the returned series.</p> <h4 id="grafana-plugins">Grafana Plugins</h4> <p>Grafana is extensible via plugins. There are quite a few plugins available for Grafana, and I encourage you to browse the <a href="https://grafana.com/grafana/plugins">plugin page</a> to find what you may want to use on your own. There are three types of plugins that extend Grafana’s capabilities: Panel, Data Source, and App.</p> <h3 id="grafana-discrete-plugin-bgp-state-over-time">Grafana Discrete Plugin (BGP State Over Time)</h3> <p>The next panel uses one of these plugins. I’ll walk through how we built the graph, which can help you identify issues in your environment quickly, just by looking at the dashboard.
Take a look at this example, where a BGP neighbor was down. It is quickly identifiable on the dashboard that action is needed.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_bgp_down.png" alt="Grafana BGP Down" /></p> <p>The two panels in the top row use a Grafana plugin called <a href="https://grafana.com/grafana/plugins/natel-discrete-panel">Discrete</a>. This plugin displays data values over time, in the colors defined within the configuration. You can then hover over the panel to see the changes over time. You install the plugin with the <code class="highlighter-rouge">grafana-cli</code> command:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>grafana-cli plugins <span class="nb">install </span>natel-discrete-panel <span class="nb">sudo </span>systemctl restart grafana-server </code></pre></div></div> <p>Once it is installed, you can set up a new panel with the panel type <em>Discrete</em>.</p> <p>The panel will be created with the following parameters:</p> <h5 id="bgp-session-status---discrete-panel-1">BGP Session Status - Discrete Panel 1</h5> <p>Query: Prometheus data source</p> <table> <thead> <tr> <th>Key</th> <th>Value</th> </tr> </thead> <tbody> <tr> <td>Metrics</td> <td>bgp_neighbor_session_state{device="$device"}</td> </tr> <tr> <td>Legend</td> <td>{{ device }} {{ neighbor_address }}</td> </tr> <tr> <td>Min step</td> <td>(leave blank)</td> </tr> <tr> <td>Resolution</td> <td>1/1</td> </tr> <tr> <td>Format</td> <td>Time series</td> </tr> <tr> <td>Instant</td> <td>Unchecked</td> </tr> </tbody> </table> <p>You will note that the Metrics query references the variable as <strong>$device</strong>, marked by the dollar sign in front of the variable name.
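</p> <p>Conceptually, before the query is sent to Prometheus, Grafana replaces <code class="highlighter-rouge">$device</code> with the value currently selected in the dropdown. The Python sketch below only illustrates that idea for a single-value variable. The <code class="highlighter-rouge">interpolate</code> function is hypothetical and is not Grafana’s implementation; Grafana’s own templating also handles the <code class="highlighter-rouge">${device}</code> syntax, multi-value variables, and escaping, none of which is modeled here.</p>

```python
import re

# Hypothetical illustration of single-value dashboard-variable substitution.
# This is not Grafana's implementation.
VARIABLE_PATTERN = re.compile(r"\$(\w+)")


def interpolate(query, variables):
    """Replace each $name in query with variables[name].

    Unknown variables are left as-is so that a typo stays visible.
    """
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return VARIABLE_PATTERN.sub(substitute, query)


if __name__ == "__main__":
    query = 'bgp_neighbor_session_state{device="$device"}'
    print(interpolate(query, {"device": "dallas"}))
    # bgp_neighbor_session_state{device="dallas"}
```

<p>Selecting a different device in the dropdown only changes the value passed in, which is why a single dashboard definition can serve every device.</p> <p>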
The <code class="highlighter-rouge">Legend</code> references two labels, <code class="highlighter-rouge">device</code> and <code class="highlighter-rouge">neighbor_address</code>. Together they form the text displayed in the discrete panel for each line.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_discrete_page1.png" alt="Grafana Discrete Page 1" /></p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_discrete_color_selection.png" alt="Grafana Discrete Panel 1 Color Selection" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_discrete_value_mappings.png" alt="Grafana Discrete Panel 1 Value Mappings" /></p> <h5 id="critical-interface-state---discrete-panel-2">Critical Interface State - Discrete Panel 2</h5> <p>Now, because the interfaces have been assigned a label, a discrete panel can be generated to show the interface state along with its role. For demonstration, we are naming this panel <code class="highlighter-rouge">Critical Interfaces</code>; the interfaces facing servers, or uplinking to other network devices, have been labeled <code class="highlighter-rouge">server</code> or <code class="highlighter-rouge">uplink</code> accordingly. By querying for any role, we can get this information into the panel. The legend has the value <code class="highlighter-rouge">{{device}} {{name}} &gt; {{intf_role}} &gt; {{neighbor}}</code> to display the appropriate mappings. This is the resulting panel:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_critical_intf_state.png" alt="Grafana Intf State" /></p> <p>The discrete panel settings used to build it are shown below.
This panel is a little smaller to build, but it packs a lot of information into a single panel!</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_intf_state_pg1.png" alt="Grafana Intf Page 1" /> <img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_inft_state_txt_color1.png" alt="Grafana Intf Page 2" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part2/grafana_intf_state_mappings.png" alt="Grafana Intf Page 3" /></p> <h3 id="device-dashboards-vs-environment-dashboards">Device Dashboards vs Environment Dashboards</h3> <p>This is not a matter of picking one over the other; both should be present in your Grafana dashboard setup.</p> <p>In this post I have shown a lot of device-specific panels. The value here is that you can get to a device-specific view very quickly, without having to create a separate page for each and every device in your environment, because variables let the same panels display any individual device.</p> <p>You should also consider building environment dashboards that bring together the specific pieces of information you need. Need to know how an application is performing across the network, server, storage, and application layers? You can build these dashboards by hand, but that takes longer. As you leverage tags when gathering telemetry into your TSDB, you will be on your way to building dashboards in an automated fashion to get the big picture very quickly.</p> <h2 id="conclusion">Conclusion</h2> <p>Hopefully this has been helpful. Again, check out the <a href="https://blog.networktocode.com/post/network_telemetry_for_snmp_devices/">first post</a> in the series if you need more information on these tools generally.
In the next post, I will cover how to advance your Prometheus environment by monitoring remote sites, and I’ll discuss a couple of methodologies to enable alerting within the environment.</p> <p>The <a href="https://blog.networktocode.com/post/monitoring_websites_and_alerting_with_telegraf_and_prometheus/">next post</a> will include how to alert using this technology stack.</p> <p>-Josh</p>Josh VanDeraaThis post, the second in a series focused on using Telegraf, Prometheus, and Grafana for Network Telemetry, will focus on transforming data and making additional graphs within Grafana. This post will cover the following topics:The State of Network Operation Through Automation / NetDevOps Survey 20192020-04-28T00:00:00+00:002020-04-28T00:00:00+00:00https://blog.networktocode.com/post/state-network-operations-netdevops-survey-2019<p>Network automation has become prevalent in the network industry over the last few years, yet we have little data on the state of the market today. There is a lot of discussion about Ansible and Python, but beyond that there is no good source for those seeking to understand what tools are being used by different companies, what operations people are automating the most/least, or even how long it takes on average to learn network automation.</p> <p>The <a href="https://dgarros.github.io/netdevops-survey/">NetDevOps Survey project</a> was started in 2016 to address these questions and more. The idea was to run a survey of the network automation industry to help bring clarity to these questions. Network automation is deeply rooted in open source, so it was decided to make the project open and collaborative, following the best practices of open source projects. The intention was for the survey to be both anonymous and vendor-neutral.</p> <p>When the initiative started in 2016, 20 of us came together to define the first set of questions.
At the time, I was working at Juniper and Jason Edelman was in the early days of Network to Code, but we collaborated on the project.</p> <p>After a few years of inactivity, the second edition of the survey was released in October 2019. This is in large part thanks to Francois Caen, who pushed for it to come back and helped organize this new edition.</p> <p>As we worked on updating the survey for the 2019 edition, we tried to reuse the same questions as much as possible to allow us to compare the evolution of the responses over time. We also added a completely new section to understand how organizations and individuals are transitioning into network automation. This section was suggested by the community and was a welcome addition; the insights we are getting from it have been very interesting.</p> <h2 id="participants-in-the-2019-netdevops-survey">Participants in the 2019 NetDevOps Survey</h2> <p>The 2019 edition received 300 responses, about the same number as the first edition in 2016.</p> <p>The first set of questions was designed to give a better understanding of the type of networks and environments the participants come from.</p> <p>Looking at the graphs below, there is a good distribution of participants, both in terms of types of environment and network sizes, with an average of around 1000 devices. It’s interesting to note that while there is a lot of coverage (blogs/podcasts/press) around network automation, most sources are focused on data centers. 60% of the participants in this survey are also managing Campus and/or WAN networks, but the data center is still the environment mentioned by most participants (~75%). This number has declined slightly since 2016, when data centers were mentioned by 80% of participants.
These numbers are in line with the migration to the cloud that we here at Network to Code have observed with our customers.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/env-type_tool_perc.png" alt="What type of environment are you managing" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/netdevops_survey_2019_env-nbr-devices_bar_perc.png" alt="How many network devices are you managing" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/netdevops_survey_env-type_compare.png" alt="Comparison of type of environment over time" /></p> <h2 id="state-of-network-automation-through-automation">State of Network Operations Through Automation</h2> <p>The main section of the survey is meant to understand which day-to-day operations are currently automated and which tools are used for each use case. We put together a list of 13 of the most common operations, spanning topics such as configuration management, troubleshooting, and software upgrades.</p> <p>While the 3 main operations that are automated today are focused on configuration management, it is interesting to see a significant increase around <em>compliance check</em> and <em>pre-post changes</em>. At the bottom of the graph we are also seeing a noticeable increase in responses on <em>troubleshooting</em> and <em>software qualification</em>.</p> <!-- ![xxx](../../../static/images/blog_posts/netdevops-survey-2019/operation-automated_tool_perc.png) --> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/netdevops_survey_operation-automated_compare.png" alt="What operations in your network are currently automated" /></p> <h3 id="configuration-management">Configuration Management</h3> <p>If we look specifically at <em>configuration management</em>, it’s interesting to see that 60% of the participants are using Ansible and roughly the same percentage are also using some scripts at different levels of abstraction.
Nornir and Saltstack are both used by ~10% of the participants, an impressive achievement for these 2 open source projects that have been mainly driven/promoted by the community. Kudos to David Barroso, Mircea Ulinic, Kirk Byers, and Dmitry Figol.</p> <blockquote> <p><em>The graph is a little bit misleading because we split scripts into 2 categories this year, but if we add #2 and #3 together we are close to 60%</em>.</p> </blockquote> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/config-gen-deploy_tool_perc.png" alt="solution configuration management" /></p> <p>Interestingly, on average, participants selected more than 2 responses to this question, which means that a lot of participants are using more than 2 solutions to generate and deploy configuration. This fact got me curious, so I decided to dive deeper into the responses to understand which tools people are mostly using in addition to Ansible.<br /> In the graph below, I narrowed down the responses to only the participants that selected Ansible. It is interesting to note that 12% of them are also using Nornir and more than 60% are using some scripts in addition to Ansible. There is not enough information to truly explain the reasoning behind these results, but I think it would be interesting to investigate further in the next edition.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/config-gen-deploy_tool_sub_answer.png" alt="sub response, configuration management" /></p> <blockquote> <p>As a side note, there are a lot of interesting analytics that haven’t been done yet on the data, such as diving deeper into each response or exploring how certain groups of participants respond to specific questions.
If you are interested in doing some analysis on your own, the database and some tools are available on GitHub.</p> </blockquote> <h3 id="maturity-level--automated-changes">Maturity Level / Automated Changes</h3> <p>At Network to Code, we often refer to network automation as a journey, which takes a couple of years on average. As a part of the survey, I was personally interested in understanding the current level of maturity of our industry. How fast or how slowly is the market evolving? In the graph below, we can see that 37% of the participants have been leveraging automation in a significant way for less than 1 year and another 29% have been doing so for 1 to 2 years. These numbers will be interesting to monitor year over year.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/transition-team-how-long_pie.png" alt="team how long" /></p> <p>Another way to measure the level of maturity is to look at how manual and automated changes coexist, or not, within an organization. Usually, in the most advanced environments, manual changes are completely forbidden. There are two questions in the survey that give some good insight into this topic:</p> <ul> <li>Do you allow configuration to be manually changed in the CLI in addition to automated deployment?</li> <li>Have you automated the decision to deploy a new configuration?</li> </ul> <p>To the first question, <strong>14.5% of the participants indicated that they don’t allow manual changes in addition to automated deployment</strong>. This marks a significant increase from 2016, when only 8.8% of the participants responded “No”.
In response to the second question, 46% of the participants indicated that they have fully or partially automated the decision to deploy a new configuration.</p> <!-- ![xxx](../../../static/images/blog_posts/netdevops-survey-2019/config-automated-changes_pie.png) --> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/config-decide-changes_pie.png" alt="decision automated change" /></p> <h3 id="anomaly-detection--telemetry--analytics">Anomaly Detection / Telemetry &amp; Analytics</h3> <p>There has been an increase in conversations and projects surrounding telemetry and analytics in the last few years. A lot of my friends and colleagues working for webscale companies have reported using or building new telemetry and analytics stacks that are becoming an integral part of automation platforms.</p> <p>Interestingly, the two questions related to anomaly detection/telemetry and analytics paint a different picture. The majority of participants are still leveraging traditional monitoring solutions based on SNMP/Syslog and relying mostly on Up/Down signals to detect issues in the network, with only 40% of the participants leveraging flow data and 10% using end-to-end probes.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/anomaly-detection-sources_tool_perc.png" alt="anomaly detection sources" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/anomaly-detection-signal_tool_perc.png" alt="anomaly detection signal" /></p> <p>My personal take-away is that <strong>today, telemetry and analytics is where network automation was 3-4 years ago</strong>, with a significant disconnect between the most advanced companies and traditional enterprises.<br /> A few years ago, network automation was not even a topic for most enterprise engineers, while a handful of companies were already all-in.
At the pace at which the industry is moving these days, I think telemetry and analytics will make some progress in the enterprise space in the next couple of years.</p> <h2 id="transition-to-network-automation">Transition to Network Automation</h2> <p>As mentioned earlier, based on the input of the community, we added a new section to understand how both organizations and individuals are transitioning to network automation, how long it is taking, what strategies they are adopting, and more.</p> <h3 id="team--org">Team / Org</h3> <p>The results for the question <em>what actions did your team take to transition to network automation</em> show that most enterprises don’t have a concrete strategy and are relying on their existing staff to learn on their own or are <em>just</em> sending them to training. Fewer than 20% of the participants mentioned hiring a dedicated resource for network automation and fewer than 10% mentioned working with a consulting firm to help them in their automation journey.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/transition-team-actions_tool_perc.png" alt="org transition" /></p> <h3 id="individual">Individual</h3> <p>As individuals, most participants (81%) estimated that it took them less than 1000 hours to learn network automation, and 25% estimated less than 200 hours.
The majority of participants had to invest some personal time to learn new skills, while 40% were able to learn on the job, either part-time or full-time.</p> <p>Overall, 34% of the participants mentioned that it took them less than 1 year to make the transition and another 45% estimated the transition at 1 to 2 years.</p> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/transition-self-nbr-hours_pie.png" alt="self transition nbr hours" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/transition-self-how-long_bar_perc.png" alt="self transition how long" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/transition-self-find-time_tool_perc.png" alt="self transition find time" /></p> <h2 id="industry-trends">Industry Trends</h2> <p>The last section of the survey focuses on trends. What topics and tools are, or are not, top of mind right now? For this section, we selected a dozen tools and another dozen topics. For each of them we asked the participants if they are:</p> <ul> <li>Already using them in production (dark green)</li> <li>Currently evaluating them (green)</li> <li>Thinking about it (light green)</li> <li>Not interested (grey)</li> <li>No idea (orange)</li> </ul> <p>There is a lot of information in the graphs below, so it is hard to cover everything, but my personal takeaways are:</p> <ul> <li>35% of the participants are already using a Source of Truth (SoT) in production and another 50% are either evaluating one or thinking about it. In our experience at Network to Code, a SoT (or SoT strategy) is a critical component of a network automation strategy and it often seems like the topic is not getting enough attention. It’s very encouraging to see such a high level of interest in this topic.</li> <li>The level of adoption for ChatOps is still relatively low, with only ~15% of the participants using it in production and almost 30% of the participants expressing no interest.
At NTC, we are seeing a lot of interesting use cases that can be solved with ChatOps, and we expect this technology to be adopted more broadly in the future.</li> <li>DevOps, Infrastructure as Code (IaC), and CI/CD are getting a lot of interest and are increasingly used in production.</li> </ul> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-topics_stack.png" alt="trends topic 2019" /></p> <p>On the tools side, there is even more going on. My personal takeaways are:</p> <ul> <li>Git and Ansible are used in production at a massive scale: both solutions are used in production by ~70% of the participants.</li> <li>Modern monitoring tools like ELK, Grafana, Prometheus &amp; Influx are used in production by more than 30% of the participants. These numbers are encouraging but don’t necessarily align with the previous responses to the anomaly detection questions. <em>This could be explained if new and legacy solutions are coexisting right now, with the new solutions still mostly used for visibility but not yet for alerting.</em></li> <li>Nornir and Network Verification Software (Batfish, Forward Networks, etc.) have a disproportionate ratio of production deployment compared to the level of participants evaluating or considering them. These two technologies will be interesting to monitor in the upcoming months/years.</li> </ul> <p><img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-tools_stack.png" alt="trends tools 2019" /></p> <h3 id="evolution-over-time">Evolution over Time</h3> <p>Another interesting way to look at these results is to examine the evolution of the responses between 2016 and 2019. I selected a few below that I found the most interesting/surprising.</p> <p>Looking at Git and Ansible, it’s interesting to see that for both technologies the level of interest was already very high in 2016, but the deployments in production were significantly lower.
Both have gained significant market share in the last few years. <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-tools_trend-tools-git_compare.png" alt="trend tools git" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-tools_trend-tools-ansible_compare.png" alt="trend tools ansible" /></p> <p>On the other hand, solutions like Chef and Puppet have followed the opposite trajectory with a significant decrease in interest and deployment in production from the participants over the last three years. <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-tools_trend-tools-chef_compare.png" alt="trend tools chef" /></p> <p>The results surrounding event-driven automation (EDA) are surprising because, while the level of interest was already very high in 2016, the number of deployments in production has not significantly increased between 2016 and 2019. One explanation could be that EDA requires a higher level of maturity and expertise to be properly deployed in production. Based on the previous results, with 2/3 of the participants using automation for less than 2 years, it’s likely that the market has not reached this level of maturity yet. <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-topics_trend-topics-event-driven_compare.png" alt="trend topic eda" /></p> <p>Last but not least, it’s interesting to visualize the progression of Infrastructure as Code, CI/CD, and NAPALM over the last few years. Increased interest in these topics confirms what we are witnessing every day with our customers.
<img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-topics_trend-topics-iac_compare.png" alt="trend topic iac " /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-topics_trend-topics-ci-cd_compare.png" alt="trend topic ci cd" /> <img src="../../../static/images/blog_posts/netdevops-survey-2019/trend-tools_trend-tools-napalm_compare.png" alt="trend topic napalm" /></p> <blockquote> <p><a href="https://github.com/dgarros/netdevops-survey/tree/master/graphs/png">More graphs are available on GitHub</a></p> </blockquote> <h2 id="netdevops-survey">NetDevOps Survey</h2> <p>If you’re interested in learning more about the NetDevOps Survey project, you can find the project on <a href="https://github.com/dgarros/netdevops-survey/">GitHub</a> or join the conversation in the #netdevops_survey channel in the <a href="http://slack.networktocode.com/">Network to Code Slack</a>.</p> <p>All the results are available on GitHub in different formats:</p> <ul> <li><a href="https://github.com/dgarros/netdevops-survey/tree/master/results">Raw TSV files</a></li> <li><a href="https://github.com/dgarros/netdevops-survey/tree/master/results">SQLite Database</a> with a Python library to query it</li> <li><a href="https://github.com/dgarros/netdevops-survey/tree/master/graphs/png">150+ graphs similar to the ones used in this blog</a></li> </ul> <p>The plan is to start working on the 2020 Edition around August 2020 to have it ready to accept responses by October 2020.</p> <h3 id="how-to-help">How to help</h3> <p>If you’re interested in helping with the project or providing feedback, the best way to reach us is to open an issue on GitHub or join us in Slack.</p> <p>At this point, one of our biggest concerns is increasing the visibility of the project. The more participants we can get for the next edition, the deeper the insights and the better the project.
Being community-driven, we’ve been lacking marketing support to reach a broader and more diverse audience. Anything you can do to help here would go a long way.</p> <p>Thanks for reading all the way to the end and for your interest in this project. If you are interested in diving deeper, the complete <a href="https://dgarros.github.io/netdevops-survey/reports/2019">results of the 2019 Edition</a> are available online.<br /> I am personally looking forward to reading more analysis and hearing more perspectives on these results. I’m also looking forward to the next edition.</p> <p>-Damien (@damgarros)</p>Damien GarrosNetwork automation has become prevalent in the network industry over the last few years and yet we have little data on the state of the market today. There is a lot of discussion about Ansible and Python but beyond that there is not a good source for those seeking to understand what tools are being used by different companies, what operations people are automating the most/least, or even how long it is taking on average to learn network automation.Network Telemetry for SNMP Devices2020-04-21T00:00:00+00:002020-04-21T00:00:00+00:00https://blog.networktocode.com/post/network_telemetry_for_snmp_devices<p>This is going to be the first in a multipart series in which I will take a look at a method to get telemetry data from network devices into a modern Time Series Database (TSDB).</p> <p>In this particular post, I will be working through adding SNMP-based device data into the <a href="https://prometheus.io/">Prometheus</a> TSDB. I will be using <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a> from <a href="https://www.influxdata.com/">InfluxData</a> to gather the SNMP data from <a href="https://www.cisco.com">Cisco</a> devices on an emulation platform. Prometheus will then scrape the data from Telegraf and store the metrics.
I will then show how to <em>start</em> building out graphs within <a href="https://www.grafana.com">Grafana</a>.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/sequence.png" alt="Sequence Diagram" /></p> <p>Here is an example of a Grafana dashboard that could be made:</p> <p>From <a href="https://twitter.com/SNMPguy/status/1139306547459178497/photo/1">@SNMPguy for Cisco Live</a></p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/ciscoliveusgraph.png" alt="CiscoLive US Grafana" /></p> <h2 id="gathering-data---concepts">Gathering Data - Concepts</h2> <p>At this point, there are many claims that <code class="highlighter-rouge">Streaming Telemetry</code> is a must-have in this day and age for gathering network device metrics. However, many network devices in networks today still do not support Streaming Telemetry. If you have a large deployment of these types of devices, are you out of luck if you want to use a modern TSDB? No, you are not. Gathering data into a TSDB is all about just that: gathering data. Whether you gather the data via Streaming Telemetry or SNMP, you are still gathering the data. Streaming Telemetry is generally considered less resource-intensive on devices and has some other benefits, so if you can gather the data with Streaming Telemetry, then you should. But if you must use SNMP, this article is here to help you out.</p> <h3 id="gathering-data---cli-parsing">Gathering Data - CLI Parsing</h3> <p>Throughout this post, you will see information gathered via SNMP. If you wish to look at using CLI parsing as a method to get metrics, take a look at our previous <a href="https://blog.networktocode.com/post/using_python_and_telegraf_for_metrics/">post</a>.</p> <h2 id="gathering-data-via-snmp">Gathering Data via SNMP</h2> <p>This post will outline what Telegraf has to offer when it comes to gathering data.
Telegraf is an application made available by InfluxData that gathers data from various sources. The plugins that gather information are known as <strong>inputs</strong>. Then you will see how to send the data to, or make it available for, a TSDB (here, Prometheus). The plugins that do this are known as <strong>outputs</strong>. You can take a look at the <a href="https://docs.influxdata.com/telegraf/v1.14/plugins/plugin-list/">plugins list</a> to see the list of plugins for Telegraf 1.14, which as of this writing (2020-04-21) is the latest version.</p> <p>Telegraf can also transform, tag, and modify data as needed. Portions of that will be covered in a follow-up post.</p> <p>Within the configuration files, you can set up a single Telegraf process to poll multiple devices, or you can run multiple Telegraf processes or containers, each one polling a single device. In this post, I will be showing how to configure a single device to be polled by Telegraf. This flexibility means your Telegraf agents can be centralized or distributed as needed.</p> <p>A Prometheus nuance is that Prometheus will assume a device is down if it is unable to scrape the device’s metrics page. When collecting SNMP data, however, Prometheus scrapes the Telegraf process rather than the device itself, so a successful scrape only shows that Telegraf is reachable, not that Telegraf is able to poll the device. Additional configuration will therefore be needed for Prometheus alerting with respect to reading metrics from a Telegraf plugin.</p> <p>The SNMP configuration is made within the Telegraf configuration.
This configuration may look like the following:</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[inputs.snmp]</span><span class="err">]</span> <span class="py">agents</span> <span class="p">=</span> <span class="s">["minneapolis.ntc"]</span> <span class="py">version</span> <span class="p">=</span> <span class="s">2</span> <span class="py">community</span> <span class="p">=</span> <span class="s">"SecuredSNMPString"</span> <span class="py">interval</span> <span class="p">=</span> <span class="s">"60s"</span> <span class="py">timeout</span> <span class="p">=</span> <span class="s">"10s"</span> <span class="py">retries</span> <span class="p">=</span> <span class="s">3</span> <span class="nn">[[inputs.snmp.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"hostname"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">".1.3.6.1.2.1.1.5.0"</span> <span class="py">is_tag</span> <span class="p">=</span> <span class="s">true</span> <span class="nn">[[inputs.snmp.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"uptime"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"1.3.6.1.2.1.1.3.0"</span> <span class="nn">[[inputs.snmp.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"cpmCPUTotal1min"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">".1.3.6.1.4.1.9.9.109.1.1.1.1.4.7"</span> <span class="c">##################################################### </span> <span class="c"># </span> <span class="c"># Gather Interface Statistics via SNMP </span> <span class="c"># </span> <span class="c">##################################################### </span> <span class="c"># IF-MIB::ifTable contains counters on input and output traffic as well as errors and 
discards. </span> <span class="nn">[[inputs.snmp.table]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"interface"</span> <span class="py">inherit_tags</span> <span class="p">=</span> <span class="s">[ "hostname" ]</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"IF-MIB::ifTable"</span> <span class="c"># Interface tag - used to identify interface in metrics database </span> <span class="nn">[[inputs.snmp.table.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"name"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"IF-MIB::ifDescr"</span> <span class="py">is_tag</span> <span class="p">=</span> <span class="s">true</span> <span class="c"># IF-MIB::ifXTable contains newer High Capacity (HC) counters that do not overflow as fast for a few of the ifTable counters </span> <span class="nn">[[inputs.snmp.table]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"interface"</span> <span class="py">inherit_tags</span> <span class="p">=</span> <span class="s">[ "hostname" ]</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"IF-MIB::ifXTable"</span> <span class="c"># Interface tag - used to identify interface in metrics database </span> <span class="nn">[[inputs.snmp.table.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"name"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"IF-MIB::ifDescr"</span> <span class="py">is_tag</span> <span class="p">=</span> <span class="s">true</span> <span class="c"># EtherLike-MIB::dot3StatsTable contains detailed ethernet-level information about what kind of errors have been logged on an interface (such as FCS error, frame too long, etc) </span> <span class="nn">[[inputs.snmp.table]</span><span 
class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"interface"</span> <span class="py">inherit_tags</span> <span class="p">=</span> <span class="s">[ "hostname" ]</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"EtherLike-MIB::dot3StatsTable"</span> <span class="c"># Interface tag - used to identify interface in metrics database </span> <span class="nn">[[inputs.snmp.table.field]</span><span class="err">]</span> <span class="py">name</span> <span class="p">=</span> <span class="s">"name"</span> <span class="py">oid</span> <span class="p">=</span> <span class="s">"IF-MIB::ifDescr"</span> <span class="py">is_tag</span> <span class="p">=</span> <span class="s">true</span> </code></pre></div></div> <blockquote> <p>Note: In testing, I have found that the Cisco CPU query (in particular, the trailing index of the OID) can differ per device. I recommend testing per platform, and perhaps per OS version, to verify that the SNMP polling works properly. I have found the command <code class="highlighter-rouge">snmpwalk -v 2c -c SecuredSNMPString minneapolis.ntc .1.3.6.1.4.1.9.9.109.1.1.1.1.4</code> useful for finding the correct response. You can also find other SNMP OIDs for Cisco on their documentation <a href="https://www.cisco.com/c/en/us/support/docs/ip/simple-network-management-protocol-snmp/15215-collect-cpu-util-snmp.html">page</a>.</p> </blockquote> <h3 id="difference-between-snmptable-and-snmpfield">Difference Between snmp.table and snmp.field</h3> <p>Now, we’ll briefly dig into what each of the lines is doing here. When an SNMP field is defined, it acts like an <code class="highlighter-rouge">snmpget</code> on a device, retrieving a single OID. An SNMP table instead walks every row of the referenced table, much like <code class="highlighter-rouge">snmpwalk</code> or <code class="highlighter-rouge">snmptable</code>, and produces one metric per row, which is why the interface tables return one set of counters per interface. The first field, which we call hostname, is getting the hostname of the device.</p> <h3 id="what-is-a-tag">What is a Tag?</h3> <p>A field marked with <code class="highlighter-rouge">is_tag</code> is used as a tag on the data that is collected with it.
Tags are data points that help to classify other pieces of information. They can be helpful for filtering data points, or for associating data points with a particular query or other data point.</p> <p>Tags will be covered in more detail in a subsequent post, but note that by leveraging tags in the templates that build your Telegraf configuration, you are able to identify key components in the environment, which enhances your monitoring capabilities.</p> <p>Jumping ahead to the Prometheus output, you can see some of these tags and fields in action. <code class="highlighter-rouge">snmp_</code> is added to the front of the name as part of the Prometheus export: it is the measurement name of the SNMP input, which the output joins to the field name with an underscore. You can see the result of the query at the far right, outside of the <code class="highlighter-rouge">{}</code>. Inside of the <code class="highlighter-rouge">{}</code> are the various tags that are being applied.</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HELP snmp_cpmCPUTotal1min Telegraf collected metric</span> <span class="c"># TYPE snmp_cpmCPUTotal1min untyped</span> <span class="n">snmp_cpmCPUTotal1min</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">device</span><span class="o">=</span><span class="s2">"minneapolis"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,}</span> <span class="mi">31</span> <span class="c"># HELP snmp_uptime Telegraf collected metric</span> <span class="c"># TYPE snmp_uptime untyped</span> <span class="n">snmp_uptime</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span
class="p">,</span><span class="na">device</span><span class="o">=</span><span class="s2">"minneapolis"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,}</span> <span class="mf">1.2636057</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> </code></pre></div></div> <h2 id="exporting-the-snmp-data">Exporting the SNMP Data</h2> <p>In my opinion, there are two leaders in the open source TSDB market: InfluxDB and Prometheus. Both have outputs that you can leverage with Telegraf to get the data into the TSDB. I will focus on the Prometheus methodology here. Exporting data with the Prometheus output has a couple of benefits. First, the data can be scraped by the Prometheus server. Second, you get a very readable representation of the data for troubleshooting your connections.</p> <blockquote> <p>If you are using InfluxDB as your DB and need to troubleshoot, I find setting up a Prometheus output a helpful step to be able to see what tags are being defined and what data is being gathered from an SNMP standpoint.</p> </blockquote> <h3 id="output-to-prometheus-configuration">Output to Prometheus Configuration</h3> <p>The configuration for Telegraf to use the Prometheus metrics exporter is relatively short and sweet.
Telegraf handles the heavy lifting once you set the configuration file.</p> <div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">##################################################### # # Export SNMP Information to Prometheus # ##################################################### </span> <span class="nn">[[outputs.prometheus_client]</span><span class="err">]</span> <span class="py">listen</span> <span class="p">=</span> <span class="s">":9012"</span> <span class="py">metric_version</span> <span class="p">=</span> <span class="s">2</span> </code></pre></div></div> <p>Here you see that the section begins with <code class="highlighter-rouge">[[outputs.prometheus_client]]</code>, with no indentation within the configuration file. The <code class="highlighter-rouge">listen</code> setting defines the port that the metrics will be exposed on, here <code class="highlighter-rouge">tcp/9012</code>, and <code class="highlighter-rouge">metric_version</code> is set to <code class="highlighter-rouge">2</code>. The URL is then <code class="highlighter-rouge">http://&lt;server_url/ip&gt;:&lt;listen_port&gt;/metrics</code>. Note that <code class="highlighter-rouge">/metrics</code> is the conventional Prometheus metrics path and the path that Prometheus scrapes by default.</p> <p>Let’s take a look at the output from the metrics page below. Many more metrics are exposed than just what is shown; this excerpt shows only the one for inbound octets on each interface.</p> <h4 id="prometheus-output">Prometheus Output</h4> <p>The metric name begins with <code class="highlighter-rouge">interface_</code>. This prefix comes from the <code class="highlighter-rouge">name = "interface"</code> setting in the <code class="highlighter-rouge">inputs.snmp.table</code> sections and helps classify the metric. You then see the actual metric name as collected by SNMP (for example, <code class="highlighter-rouge">ifHCInOctets</code>), appended to the end of <code class="highlighter-rouge">interface_</code> to form the full metric name.</p> <p>You also see the tags that are assigned to the metric being presented.
Below is a table of each tag and where it came from:</p> <table> <thead> <tr> <th>Tag</th> <th style="text-align: left">Came From</th> </tr> </thead> <tbody> <tr> <td>agent_host</td> <td style="text-align: left">Added by the Telegraf SNMP input; the address of the agent (device) being polled</td> </tr> <tr> <td>host</td> <td style="text-align: left">Host that is collecting the data, here the hostname of the Docker container running Telegraf</td> </tr> <tr> <td>hostname</td> <td style="text-align: left">Tag defined within the input section for gathering the hostname; each table section specifies <code class="highlighter-rouge">inherit_tags</code> to inherit it</td> </tr> <tr> <td>ifName</td> <td style="text-align: left">Within the <code class="highlighter-rouge">inputs.snmp.table.field</code> section of the ifTable, noted by is_tag</td> </tr> <tr> <td>name</td> <td style="text-align: left">The name of the interface (<code class="highlighter-rouge">IF-MIB::ifDescr</code>), defined with <code class="highlighter-rouge">is_tag</code> in each <code class="highlighter-rouge">inputs.snmp.table.field</code> section</td> </tr> </tbody> </table> <p>After the tags, outside of the braces, is the actual measured value.
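</p> <p>To make the layout of each sample line concrete, here is a minimal Python sketch that splits one line of the metrics page into its metric name, tags (labels), and value. It is a simplified parser written for illustration, not part of Telegraf or Prometheus: it assumes there is no optional timestamp after the value, no escaped quotes inside label values, and that comment lines such as <code class="highlighter-rouge"># HELP</code> are skipped before parsing. The <code class="highlighter-rouge">parse_sample</code> name is hypothetical.</p>

```python
import re

# Simplified pattern for one sample line of the Prometheus text format:
#   metric_name{label="value",...} value
# Assumptions: no timestamp column and no escaped quotes in label values.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        # Comment lines (# HELP / # TYPE) and malformed lines end up here.
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    # findall yields (key, value) pairs; a trailing comma is simply ignored.
    labels = dict(LABEL_RE.findall(label_text or ""))
    return name, labels, float(value)


line = ('interface_ifHCInOctets{agent_host="minneapolis.ntc",host="225bb1fc7f4c",'
        'hostname="minneapolis.ntc",ifName="Gi1",name="GigabitEthernet1"} 2.4956199e+07')
name, labels, value = parse_sample(line)
print(name)              # interface_ifHCInOctets
print(labels["ifName"])  # Gi1
print(value)             # 24956199.0
```

<p>Feeding it a <code class="highlighter-rouge"># HELP</code> or <code class="highlighter-rouge"># TYPE</code> line raises a <code class="highlighter-rouge">ValueError</code>, so a caller reading a whole metrics page would filter those lines out first. For anything beyond a quick check, the official <code class="highlighter-rouge">prometheus_client</code> Python package includes a complete parser (<code class="highlighter-rouge">prometheus_client.parser.text_string_to_metric_families</code>).</p> <p>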
The Prometheus engine will “scrape” this information from the HTTP page and then ingest the data appropriately into its DB.</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HELP interface_ifHCInOctets Telegraf collected metric</span> <span class="c"># TYPE interface_ifHCInOctets untyped</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi1"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet1"</span><span class="p">}</span> <span class="mf">2.4956199</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi7"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet7"</span><span class="p">}</span> <span class="mi">0</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span 
class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi8"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet8"</span><span class="p">,}</span> <span class="mi">0</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Nu0"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"Null0"</span><span class="p">,}</span> <span class="mi">0</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Vo0"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"VoIP-Null0"</span><span class="p">,}</span> <span class="mi">0</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span 
class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi3"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet3"</span><span class="p">,}</span> <span class="mf">1.092917</span><span class="n">e</span><span class="o">+</span><span class="mi">08</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi2"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet2"</span><span class="p">,}</span> <span class="mf">1.477766</span><span class="n">e</span><span class="o">+</span><span class="mi">06</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi4"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span 
class="s2">"GigabitEthernet4"</span><span class="p">,}</span> <span class="mf">1.9447063</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi5"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet5"</span><span class="p">,}</span> <span class="mf">1.2468643</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> <span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">agent_host</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">host</span><span class="o">=</span><span class="s2">"225bb1fc7f4c"</span><span class="p">,</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"minneapolis.ntc"</span><span class="p">,</span><span class="na">ifName</span><span class="o">=</span><span class="s2">"Gi6"</span><span class="p">,</span><span class="na">name</span><span class="o">=</span><span class="s2">"GigabitEthernet6"</span><span class="p">,}</span> <span class="mf">1.6549974</span><span class="n">e</span><span class="o">+</span><span class="mi">07</span> </code></pre></div></div> <h2 id="prometheus">Prometheus</h2> <p>After getting the data into a format that Prometheus can read, you need to install Prometheus. 
A link to the documentation for a long-lived installation is provided below, but the best part about Prometheus is that you can get up and running by just executing the binary file.</p> <h3 id="installation---binary-execution">Installation - Binary Execution</h3> <blockquote> <p>Link: <a href="https://prometheus.io/docs/prometheus/latest/installation/">Prometheus installation</a> provides documentation on getting Prometheus up and running on your system.</p> </blockquote> <h4 id="installation---download-decompress-and-copy-binary-to-local-folder">Installation - Download, Decompress, and Copy Binary to Local Folder</h4> <p>This walkthrough installs version 2.16.0, which has a download link of <a href="https://github.com/prometheus/prometheus/releases/download/v2.16.0/prometheus-2.16.0.linux-amd64.tar.gz">https://github.com/prometheus/prometheus/releases/download/v2.16.0/prometheus-2.16.0.linux-amd64.tar.gz</a>.</p> <p>On a Linux host, <code class="highlighter-rouge">wget</code> can download the file into your local working directory.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ wget https://github.com/prometheus/prometheus/releases/download/v2.16.0/prometheus-2.16.0.linux-amd64.tar.gz --2020-03-14 18:20:41-- https://github.com/prometheus/prometheus/releases/download/v2.16.0/prometheus-2.16.0.linux-amd64.tar.gz Resolving github.com (github.com)... 140.82.114.3 Connecting to github.com (github.com)|140.82.114.3|:443... connected. HTTP request sent, awaiting response...
302 Found Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/6838921/13326f00-4ede-11ea-98d2-3ed3a8fdfe99?X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200314%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Date=20200314T182041Z&amp;X-Amz-Expires=300&amp;X-Amz-Signature=9d4b3578b43c357056d75698f94bf8fb3263510787046db5fe04fabd3196023a&amp;X-Amz-SignedHeaders=host&amp;actor_id=0&amp;response-content-disposition=attachment%3B%20filename%3Dprometheus-2.16.0.linux-amd64.tar.gz&amp;response-content-type=application%2Foctet-stream [following] --2020-03-14 18:20:41-- https://github-production-release-asset-2e65be.s3.amazonaws.com/6838921/13326f00-4ede-11ea-98d2-3ed3a8fdfe99?X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200314%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Date=20200314T182041Z&amp;X-Amz-Expires=300&amp;X-Amz-Signature=9d4b3578b43c357056d75698f94bf8fb3263510787046db5fe04fabd3196023a&amp;X-Amz-SignedHeaders=host&amp;actor_id=0&amp;response-content-disposition=attachment%3B%20filename%3Dprometheus-2.16.0.linux-amd64.tar.gz&amp;response-content-type=application%2Foctet-stream Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.238.3 Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.238.3|:443... connected. HTTP request sent, awaiting response... 
200 OK Length: 59608515 (57M) [application/octet-stream] Saving to: ‘prometheus-2.16.0.linux-amd64.tar.gz’ prometheus-2.16.0.linux-amd64.tar.gz 100%[==========================================================================================&gt;] 56.85M 23.8MB/s in 2.4s 2020-03-14 18:20:44 (23.8 MB/s) - ‘prometheus-2.16.0.linux-amd64.tar.gz’ saved [59608515/59608515] </code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ tar -xvzf prometheus-2.16.0.linux-amd64.tar.gz prometheus-2.16.0.linux-amd64/ prometheus-2.16.0.linux-amd64/LICENSE prometheus-2.16.0.linux-amd64/promtool prometheus-2.16.0.linux-amd64/NOTICE prometheus-2.16.0.linux-amd64/consoles/ prometheus-2.16.0.linux-amd64/consoles/node.html prometheus-2.16.0.linux-amd64/consoles/index.html.example prometheus-2.16.0.linux-amd64/consoles/prometheus-overview.html prometheus-2.16.0.linux-amd64/consoles/node-disk.html prometheus-2.16.0.linux-amd64/consoles/node-overview.html prometheus-2.16.0.linux-amd64/consoles/node-cpu.html prometheus-2.16.0.linux-amd64/consoles/prometheus.html prometheus-2.16.0.linux-amd64/console_libraries/ prometheus-2.16.0.linux-amd64/console_libraries/menu.lib prometheus-2.16.0.linux-amd64/console_libraries/prom.lib prometheus-2.16.0.linux-amd64/prometheus prometheus-2.16.0.linux-amd64/prometheus.yml prometheus-2.16.0.linux-amd64/tsdb </code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp prometheus-2.16.0.linux-amd64/prometheus . </code></pre></div></div> <h4 id="create-a-base-configuration-on-host">Create a Base Configuration on Host</h4> <p>You can use this as a start of the configuration, it will be stored in the same local directory that you are working in. It is setting a <em>default</em> scrape interval for other jobs that do not have a <code class="highlighter-rouge">scrape_interval</code> set to 15s. 
The example uses <code class="highlighter-rouge">prometheus_config.yml</code> as the file name.</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">global</span><span class="pi">:</span>
  <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s2">"</span><span class="s">15s"</span>

<span class="na">scrape_configs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">prometheus'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5s"</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span> <span class="pi">[</span><span class="s1">'</span><span class="s">localhost:9090'</span><span class="pi">]</span>
</code></pre></div></div> <h3 id="execution">Execution</h3> <p>Now that the configuration file is ready, you can start the local server. 
At this point it will not scrape anything other than the local Prometheus instance.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ ./prometheus --config.file="prometheus_config.yml"
level=info ts=2020-03-14T18:29:50.782Z caller=main.go:295 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-03-14T18:29:50.783Z caller=main.go:331 msg="Starting Prometheus" version="(version=2.16.0, branch=HEAD, revision=b90be6f32a33c03163d700e1452b54454ddce0ec)"
level=info ts=2020-03-14T18:29:50.783Z caller=main.go:332 build_context="(go=go1.13.8, user=root@7ea0ae865f12, date=20200213-23:50:02)"
level=info ts=2020-03-14T18:29:50.783Z caller=main.go:333 host_details="(Linux 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020 x86_64 prometheus_demo (none))"
level=info ts=2020-03-14T18:29:50.783Z caller=main.go:334 fd_limits="(soft=1024, hard=1048576)"
level=info ts=2020-03-14T18:29:50.783Z caller=main.go:335 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-03-14T18:29:50.784Z caller=web.go:508 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-03-14T18:29:50.784Z caller=main.go:661 msg="Starting TSDB ..."
level=info ts=2020-03-14T18:29:50.788Z caller=head.go:577 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-03-14T18:29:50.788Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=2
level=info ts=2020-03-14T18:29:50.788Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=1 maxSegment=2
level=info ts=2020-03-14T18:29:50.788Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=2 maxSegment=2
level=info ts=2020-03-14T18:29:50.789Z caller=main.go:676 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-03-14T18:29:50.789Z caller=main.go:677 msg="TSDB started"
level=info ts=2020-03-14T18:29:50.790Z caller=main.go:747 msg="Loading configuration file" filename=prometheus_config.yml
level=info ts=2020-03-14T18:29:50.790Z caller=main.go:775 msg="Completed loading of configuration file" filename=prometheus_config.yml
level=info ts=2020-03-14T18:29:50.790Z caller=main.go:630 msg="Server is ready to receive web requests."
</code></pre></div></div> <p>At the end, you should see the message <code class="highlighter-rouge">Server is ready to receive web requests</code>.</p> <h3 id="prometheus-1">Prometheus</h3> <p>In a web browser, open <code class="highlighter-rouge">http://&lt;server_ip&gt;:9090</code>, or <code class="highlighter-rouge">http://localhost:9090</code> for a local installation. This should redirect to <code class="highlighter-rouge">/graph</code> and bring you to a screen like this:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-search1.png" alt="PrometheusBase" /></p> <p>Once you have Prometheus loaded, you can start using PromQL to run a few searches. So far the only metric source is Prometheus itself, so a good first query looks at its own scraping process. In the search box, enter the query <code class="highlighter-rouge">scrape_duration_seconds</code> and click <strong>Execute</strong>. 
The result comes back in text form, with an <em>Element</em> and a <em>Value</em> for each series.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-search2.png" alt="PrometheusScrape1" /></p> <p>Switching to the graph view of the same query starts to show what is possible with this time series database.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-search3.png" alt="PrometheusScrapeTimesGraph" /></p> <h4 id="update-and-add-network-urls-to-the-prometheus-config">Update and Add Network URLs to the Prometheus Config</h4> <p>Next, update the configuration to poll two targets where Telegraf presents SNMP data. Note that the targets omit the <code class="highlighter-rouge">http://</code> scheme and the <code class="highlighter-rouge">/metrics</code> path; when they are not supplied, Prometheus applies these by default. The <code class="highlighter-rouge">prometheus_config.yml</code> file will now look like this:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">global</span><span class="pi">:</span>
  <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">15s</span>

<span class="na">scrape_configs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">prometheus'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">5s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span> <span class="pi">[</span><span class="s1">'</span><span class="s">localhost:9090'</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">job_name</span><span class="pi">:</span> <span class="s1">'</span><span class="s">snmp'</span>
    <span class="na">scrape_interval</span><span class="pi">:</span> <span class="s">60s</span>
    <span class="na">static_configs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">targets</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="s1">'</span><span class="s">jumphost.create2020.ntc.cloud.tesuto.com:9012'</span>
          <span class="pi">-</span> <span class="s1">'</span><span class="s">jumphost.create2020.ntc.cloud.tesuto.com:9001'</span>
</code></pre></div></div> <h4 id="prometheus-promql-snmp-example">Prometheus PromQL SNMP Example</h4> <p>After updating the Prometheus configuration and restarting the Prometheus server, you can start to graph SNMP data. Change the PromQL query to <code class="highlighter-rouge">interface_ifHCInOctets</code> to see the SNMP interface data that Telegraf is presenting to Prometheus.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-octets1.png" alt="PrometheusSNMPInterfaceData" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-octets2.png" alt="PrometheusSNMPInterfaceData" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part1/prom-octets3.png" alt="PrometheusSNMPInterfaceData" /></p> <p>This is all nice, but on its own it is hardly a system for building lots of graphs or presenting them to others. 
This is the role that Grafana will play as a graphing engine.</p> <h2 id="grafana">Grafana</h2> <h3 id="download-and-install-grafana">Download and Install Grafana</h3> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/oss/release/grafana_6.6.2_amd64.deb
sudo dpkg -i grafana_6.6.2_amd64.deb
</code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ wget https://dl.grafana.com/oss/release/grafana_6.6.2_amd64.deb
--2020-03-15 19:21:08-- https://dl.grafana.com/oss/release/grafana_6.6.2_amd64.deb
Resolving dl.grafana.com (dl.grafana.com)... 2a04:4e42:3b::729, 151.101.250.217
Connecting to dl.grafana.com (dl.grafana.com)|2a04:4e42:3b::729|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63232320 (60M) [application/x-debian-package]
Saving to: ‘grafana_6.6.2_amd64.deb’
grafana_6.6.2_amd64.deb 100%[==========================================================================================&gt;] 60.30M 19.8MB/s in 3.1s
2020-03-15 19:21:12 (19.8 MB/s) - ‘grafana_6.6.2_amd64.deb’ saved [63232320/63232320]
josh@prometheus_demo:~$ sudo dpkg -i grafana_6.6.2_amd64.deb
Selecting previously unselected package grafana.
(Reading database ... 67127 files and directories currently installed.)
Preparing to unpack grafana_6.6.2_amd64.deb ...
Unpacking grafana (6.6.2) ...
Setting up grafana (6.6.2) ...
Adding system user `grafana' (UID 111) ...
Adding new user `grafana' (UID 111) with group `grafana' ...
Not creating home directory `/usr/share/grafana'.
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
 sudo /bin/systemctl daemon-reload
 sudo /bin/systemctl enable grafana-server
### You can start grafana-server by executing
 sudo /bin/systemctl start grafana-server
Processing triggers for systemd (237-3ubuntu10.39) ...
Processing triggers for ureadahead (0.100.0-21) ...
</code></pre></div></div> <h3 id="enable-grafana-to-start-on-boot-and-start-grafana-server">Enable Grafana to Start on Boot and Start Grafana Server</h3> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server
sudo /bin/systemctl start grafana-server
</code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ sudo /bin/systemctl daemon-reload
josh@prometheus_demo:~$ sudo systemctl enable grafana-server
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.
josh@prometheus_demo:~$ sudo systemctl start grafana-server
</code></pre></div></div> <h3 id="verify-grafana-is-running">Verify Grafana is Running</h3> <p>I like to verify that Grafana is actually running by checking its listening ports. Run the <code class="highlighter-rouge">ss -lt</code> command and check that the output contains a <code class="highlighter-rouge">*:3000</code> entry. 
TCP/3000 is the default port for Grafana.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ ss -lt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.53%lo:domain 0.0.0.0:*
LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
LISTEN 0 128 [::]:ssh [::]:*
LISTEN 0 128 *:3000 *:*
</code></pre></div></div> <h3 id="verify---navigate-to-the-default-page">Verify - Navigate to the Default Page</h3> <p>The default login is <code class="highlighter-rouge">admin/admin</code>. When you first log in, you will be prompted to set a new admin password.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-login1.png" alt="GrafanaDefaultLogin" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-login2.png" alt="GrafanaNewPassword" /></p> <h2 id="getting-to-the-graphing">Getting to the Graphing</h2> <h3 id="re-start-prometheus">Re-start Prometheus</h3> <p>Before adding Prometheus as a data source in Grafana, restart Prometheus on your Linux host so that it runs with the updated configuration.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>josh@prometheus_demo:~$ ./prometheus --config.file=prometheus_config.yml
</code></pre></div></div> <h3 id="add-data-source-to-grafana">Add Data Source to Grafana</h3> <p>Now that you are logged in, you need to add a data source to Grafana. This demo adds a connection to the Prometheus server on localhost. Back in the web interface, on the main menu you landed on after logging in, click <code class="highlighter-rouge">Add datasource</code>.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-datasource-1.png" alt="GrafanaMainMenu" /></p> <p>In this Grafana 6.6.x instance, Prometheus was at the top of the list. 
Navigate to <em>Prometheus</em> and click <em>select</em>.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-datasource-2.png" alt="GrafanaDataSources" /></p> <p>Selecting the data source brings you to its configuration screen.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana_prometheus_start.png" alt="GrafanaPrometheusStart" /></p> <p>Make the following change here:</p> <h4 id="field-changes-from-default">Field Changes from Default</h4> <table> <thead> <tr> <th>Field</th> <th>Setting</th> </tr> </thead> <tbody> <tr> <td>URL</td> <td>http://localhost:9090</td> </tr> </tbody> </table> <p>Once modified, click <strong>Save and Test</strong> to verify connectivity to the database. If you set up the Prometheus server on a different host, use that host's hostname or IP address in the URL instead of <code class="highlighter-rouge">localhost</code>.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-datasource-4.png" alt="GrafanaDatasourceChanged" /></p> <p>When you see the message <strong>Data source is working</strong>, you have successfully connected.</p> <h3 id="grafana-dashboard-creation">Grafana Dashboard Creation</h3> <p>Now, in the left-hand navigation, select the plus icon and then <em>Dashboard</em> to create a new dashboard.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-menu-create-dash.png" alt="GrafanaNewDashboard1" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-add-query.png" alt="GrafanaNewDashboardBase" /></p> <p>This opens a new panel page; select <em>Add Query</em>.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-graph-create.png" alt="GrafanaNewQuery" /></p> <p>On the new query page, we will build a search for the inbound utilization of an interface. 
Set up the query as follows:</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-graph-fill.png" alt="GrafanaInterfaceQuery" /></p> <p>Note that the queries in this Grafana example are written in <strong>PromQL</strong>, the Prometheus Query Language. In this graphic, {{ifName}} tells Grafana to use the value of the <code class="highlighter-rouge">ifName</code> label as the legend entry for each series.</p> <blockquote> <p>If your Grafana data source is Graphite or InfluxDB, you would instead use the query language of that database system.</p> </blockquote> <p>The following sections explain what each part of this PromQL query does, to help you build your own queries:</p> <div class="language-prometheus highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">rate</span><span class="p">(</span><span class="n">interface_ifHCInOctets</span><span class="p">{</span><span class="na">hostname</span><span class="o">=</span><span class="s2">"houston.tesuto.internal"</span><span class="p">}[</span><span class="mi">2m</span><span class="p">])</span><span class="o">*</span><span class="mi">8</span> </code></pre></div></div> <h4 id="rate">Rate</h4> <p>The <code class="highlighter-rouge">rate()</code> function calculates a rate of change. With SNMP, the value gathered for interface utilization is a counter (an ever-increasing total), not a rate, so Prometheus needs to calculate the rate itself. The <code class="highlighter-rouge">[2m]</code> range tells it to calculate the per-second rate averaged over the past 2 minutes.</p> <h4 id="metric-name">Metric Name</h4> <p>The metric name in the query is <code class="highlighter-rouge">interface_ifHCInOctets</code>, the same metric queried earlier in the post. It selects exactly which measurement to graph.</p> <h4 id="query-tags">Query Tags</h4> <p>The tags inside the curly braces (Prometheus calls them labels) filter which series the query returns, so that the graph shows only the data you want. 
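</p> <p>Conceptually, an equality matcher such as <code class="highlighter-rouge">hostname="houston.tesuto.internal"</code> keeps only the series whose label has exactly that value. The simplified Python sketch below illustrates the idea on a few hypothetical series; the <code class="highlighter-rouge">amarillo</code> host, the interface names, and the <code class="highlighter-rouge">select</code> helper are made up for illustration. Prometheus performs this matching internally, so this is not how you would query it.</p>

```python
# Hypothetical sample series: each dict holds the metric name (__name__)
# plus its labels, similar to how Prometheus identifies a time series.
series = [
    {"__name__": "interface_ifHCInOctets",
     "hostname": "houston.tesuto.internal", "ifName": "GigabitEthernet1"},
    {"__name__": "interface_ifHCInOctets",
     "hostname": "houston.tesuto.internal", "ifName": "GigabitEthernet2"},
    {"__name__": "interface_ifHCInOctets",
     "hostname": "amarillo.tesuto.internal", "ifName": "GigabitEthernet1"},
]


def select(all_series, metric, **matchers):
    """Return the series named `metric` whose labels equal every matcher (hypothetical helper)."""
    return [
        s
        for s in all_series
        if s.get("__name__") == metric
        and all(s.get(label) == value for label, value in matchers.items())
    ]


# Similar in spirit to: interface_ifHCInOctets{hostname="houston.tesuto.internal"}
houston = select(series, "interface_ifHCInOctets", hostname="houston.tesuto.internal")
print([s["ifName"] for s in houston])  # ['GigabitEthernet1', 'GigabitEthernet2']
```

<p>Every matcher must hold for a series to be kept, which is why adding more labels to the curly braces narrows the result further.</p> <p>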
In this instance you will only see interfaces on the device hostname <code class="highlighter-rouge">houston.tesuto.internal</code>.</p> <h4 id="math">Math</h4> <p>The <code class="highlighter-rouge">*8</code> at the end of the query converts the measurement from octets, as defined in the metric, to bits. An octet is 8 bits, hence the multiplication by 8.</p> <h3 id="visualization-changes">Visualization Changes</h3> <p>Now we’re going to make a few more updates to the graph. The changes below are made in the <em>Visualization</em> section (the 2nd of the four items on the left-hand side of the panel configuration), specifically in the <em>Axis</em> and <em>Legend</em> subsections. Feel free to play around with the settings in the upper section to change how the graphs look.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana_visualization_customization.png" alt="GrafanaVisualizationCustomization" /><br /> <img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana_legend.png" alt="GrafanaLegend" /></p> <table> <thead> <tr> <th>Setting</th> <th>Modification</th> </tr> </thead> <tbody> <tr> <td>Left Y: Unit</td> <td>bits/sec (under <strong>Data Rate</strong>)</td> </tr> <tr> <td>Legend Values: Min</td> <td>Checked</td> </tr> <tr> <td>Legend Values: Avg</td> <td>Checked</td> </tr> <tr> <td>As Table</td> <td>Checked</td> </tr> <tr> <td>To Right</td> <td>Checked</td> </tr> </tbody> </table> <h3 id="general-section-changes">General Section Changes</h3> <p>Here is where you can set the title of the panel. Let’s change it to <em>Houston Interface Utilization</em>. 
After making the update, click in the upper left to go back to the dashboard.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana_general.png" alt="GrafanaGeneral" /></p> <p>You can resize the panel by dragging its corners to fit your dashboard.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/grafana-graph-completed.png" alt="GrafanaDashboard1" /></p> <h3 id="update-dashboard-name">Update Dashboard Name</h3> <p>To change the name of the dashboard, select the Save icon in the upper right of the main dashboard page. This gives you a prompt for a <strong>New Name</strong> and a <strong>Folder</strong> to save the dashboard into, which allows you to add hierarchy to your dashboarding system.</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/update_name.png" alt="GrafanaNaming" /></p> <p><strong>Important Note</strong>: if you make changes, you need to save them. As of this version, Grafana does <em>not</em> save changes automatically, so save your work once you are done making changes.</p> <p>After you save, you get a visual confirmation that the changes are saved, and your dashboard now has a title!</p> <p><img src="../../../static/images/blog_posts/prometheus_for_net_part1/dashboard_saved.png" alt="DashboardSaved" /></p> <h2 id="conclusion">Conclusion</h2> <p>Hopefully this will help on your journey! 
In a follow-up post, I will take a look at a few more capabilities within Telegraf, Prometheus, and Grafana:</p> <ul> <li>How to gather streaming data with gNMI</li> <li>Telegraf Tags</li> <li>Transforming data with Telegraf</li> <li>Prometheus queries</li> <li>Grafana Tables</li> <li>Grafana Thresholds &amp; Alerts</li> </ul> <p>To continue the journey, take a look at <a href="https://blog.networktocode.com/post/network_telemetry_advancing_your_dashboards_with_gnmi/">Network Telemetry - Advancing Your Dashboards</a> and <a href="https://blog.networktocode.com/post/monitoring_websites_with_telegraf_and_prometheus/">monitoring websites</a>.</p> <p>-Josh</p>Josh VanDeraaThis is going to be the first in a multipart series where I will be taking a look at a method to get telemetry data from network devices into a modern Time Series Database (TSDB).Intro to Data Structures2020-04-14T00:00:00+00:002020-04-14T00:00:00+00:00https://blog.networktocode.com/post/data-structures<p>From an automation standpoint, it is important to understand what data structures are, how to use them, and, more importantly, how to access the data within them.</p> <p>This post won’t go into how to build data structures, as that is a whole topic in and of itself. Instead, it looks at how to obtain the data we want from a data structure that a system such as Ansible or a web API provides us. We’re going to use Python to dissect data structures, since it can tell us what type the data is, or what type a specific element within the data structure is.</p> <p>We will discuss the two most common data types: dictionaries and lists.</p> <p>Let’s start by looking at what a dictionary is and how we can get data from one using the built-in Python interactive interpreter.</p> <h2 id="dictionaries">Dictionaries</h2> <p>A dictionary, known in other programming languages as a map, hash, or associative array, is made up of key-value pairs. 
Keys must be hashable, which in practice means immutable types such as strings, integers, or tuples, while values may be any type of object. Dictionaries are mainly used when you want to access specific data by its key rather than by its position. Let’s take a look at a dictionary.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="o">...</span>     <span class="s">'test_one'</span><span class="p">:</span> <span class="s">'My first key:value'</span><span class="p">,</span>
<span class="o">...</span>     <span class="s">'test'</span><span class="p">:</span> <span class="s">'My second key:value'</span><span class="p">,</span>
<span class="o">...</span>     <span class="mi">10</span><span class="p">:</span> <span class="s">'Look at my key'</span><span class="p">,</span>
<span class="o">...</span> <span class="p">}</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span>
<span class="p">{</span><span class="s">'test'</span><span class="p">:</span> <span class="s">'My second key:value'</span><span class="p">,</span> <span class="s">'test_one'</span><span class="p">:</span> <span class="s">'My first key:value'</span><span class="p">,</span> <span class="mi">10</span><span class="p">:</span> <span class="s">'Look at my key'</span><span class="p">}</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span><span class="o">.</span><span class="n">keys</span><span class="p">()</span>
<span class="p">[</span><span class="s">'test'</span><span class="p">,</span> <span class="s">'test_one'</span><span class="p">,</span> <span class="mi">10</span><span class="p">]</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span><span class="o">.</span><span class="n">values</span><span class="p">()</span>
<span class="p">[</span><span class="s">'My second key:value'</span><span class="p">,</span> <span class="s">'My first key:value'</span><span class="p">,</span> <span class="s">'Look at my key'</span><span class="p">]</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">my_dict</span><span class="p">)</span>
<span class="o">&lt;</span><span class="nb">type</span> <span class="s">'dict'</span><span class="o">&gt;</span>
</code></pre></div></div> <p>Note how the dictionary is displayed in a different order than we created it in. This session uses Python 2, where dictionaries do not preserve insertion order (Python 3.7 and later do), so it is the keys, not the positions, that let us extract the values stored within the dictionary. In Python 3, <code class="highlighter-rouge">keys()</code> and <code class="highlighter-rouge">values()</code> also return view objects rather than lists, and the type is displayed as <code class="highlighter-rouge">&lt;class 'dict'&gt;</code>.</p> <p>Let’s take a look at how to access data within a dictionary.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span><span class="p">[</span><span class="s">'test_one'</span><span class="p">]</span>
<span class="s">'My first key:value'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span>
<span class="s">'Look at my key'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">my_dict</span><span class="p">[</span><span class="s">'test_two'</span><span class="p">]</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
  <span class="n">File</span> <span class="s">"&lt;stdin&gt;"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o">&lt;</span><span class="n">module</span><span class="o">&gt;</span>
<span class="nb">KeyError</span><span class="p">:</span> <span class="s">'test_two'</span>
</code></pre></div></div> <p>Notice that if we try to access a key that does not exist, we get a <code class="highlighter-rouge">KeyError</code>. 
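</p> <p>If you would rather handle a missing key explicitly, one option (sketched here in Python 3) is to catch the <code class="highlighter-rouge">KeyError</code> with <code class="highlighter-rouge">try</code>/<code class="highlighter-rouge">except</code>, or to test for the key first with the <code class="highlighter-rouge">in</code> operator. The <code class="highlighter-rouge">lookup</code> helper is a hypothetical name used only for illustration; it is not part of Python.</p>

```python
# Python 3 sketch: handle a missing key explicitly instead of letting
# the KeyError stop the program. my_dict mirrors the example above.
my_dict = {
    'test_one': 'My first key:value',
    'test': 'My second key:value',
    10: 'Look at my key',
}


def lookup(data, key, default=None):
    """Return data[key], or default if the key does not exist (hypothetical helper)."""
    try:
        return data[key]
    except KeyError:
        # The key is missing, so fall back to the default value.
        return default


print(lookup(my_dict, 'test_one'))                       # My first key:value
print(lookup(my_dict, 'test_two'))                       # None
print(lookup(my_dict, 'test_two', 'Return this value'))  # Return this value

# The "in" operator checks for a key without raising an exception.
print('test_two' in my_dict)                             # False
```

<p>Which approach reads better usually depends on whether a missing key is an expected case or a genuine error.</p> <p>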
This error can be avoided by using the <code class="highlighter-rouge">.get()</code> method on the dictionary. This will attempt to get the key from the dictionary and, if it doesn’t exist, will return <code class="highlighter-rouge">None</code>. It also accepts an argument that it will return if the key does not exist.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">my_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"test_two"</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">test</span> <span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">test</span><span class="p">)</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">NoneType</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; test = my_dict.get("test_two", "Return this value")</span><span class="err"> </span><span class="s">&gt;&gt;&gt; test</span><span class="err"> </span><span class="s">'</span><span class="n">Return</span> <span class="n">this</span> <span class="n">value</span><span class="s">'</span><span class="err"> </span><span class="s">&gt;&gt;&gt; if my_dict.get('</span><span class="n">test_one</span><span class="s">'):</span><span class="err"> </span><span class="s">... 
print('</span><span class="n">It</span> <span class="n">exists</span><span class="err">!</span><span class="s">')</span><span class="err"> </span><span class="s">...</span><span class="err"> </span><span class="s">It exists!</span><span class="err"> </span></code></pre></div></div> <p>Now that we understand how dictionaries work and how we can obtain data from a dictionary, let’s move onto lists.</p> <h2 id="lists">Lists</h2> <p>A list is referred to as an array in other programming languages and is a collection of different data types that are stored in indices within the list. A list can consist of objects of any type (strings, integers, dictionaries, tuples, etc.). The order of a list is maintained in the same order the list is created and the data can be obtained by accessing the indexes of the list.</p> <p>Let’s take a look at creating a list.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span> <span class="o">...</span> <span class="s">'index one'</span><span class="p">,</span> <span class="o">...</span> <span class="p">{</span><span class="s">'test'</span><span class="p">:</span> <span class="s">'dictionary'</span><span class="p">},</span> <span class="o">...</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span> <span class="p">,</span><span class="mi">3</span><span class="p">],</span> <span class="o">...</span> <span class="p">]</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span> <span class="p">[</span><span class="s">'index one'</span><span class="p">,</span> <span class="p">{</span><span class="s">'test'</span><span class="p">:</span> <span class="s">'dictionary'</span><span class="p">},</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> 
<span class="mi">3</span><span class="p">]]</span> <span class="o">&gt;&gt;&gt;</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">my_list</span><span class="p">:</span> <span class="o">...</span> <span class="nb">type</span><span class="p">(</span><span class="n">item</span><span class="p">)</span> <span class="o">...</span> <span class="o">&lt;</span><span class="nb">type</span> <span class="s">'str'</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="nb">type</span> <span class="s">'dict'</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="nb">type</span> <span class="s">'list'</span><span class="o">&gt;</span> </code></pre></div></div> <p>As you can see, the list can store different data types, and the order is in the same order that we contructed the list in. We also iterated over the list to access each index, but we can access each item by the index they’re stored at as well.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="s">'index one'</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">{</span><span class="s">'test'</span><span class="p">:</span> <span class="s">'dictionary'</span><span class="p">}</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="p">[</span><span class="mi">3</span><span 
class="p">]</span> <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span> <span class="n">File</span> <span class="s">"&lt;stdin&gt;"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o">&lt;</span><span class="n">module</span><span class="o">&gt;</span> <span class="nb">IndexError</span><span class="p">:</span> <span class="nb">list</span> <span class="n">index</span> <span class="n">out</span> <span class="n">of</span> <span class="nb">range</span> <span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span> <span class="o">&lt;</span><span class="k">class</span> <span class="s">'list'</span><span class="o">&gt;</span> </code></pre></div></div> <p>List indexes start at zero and increase by one for each item, so this three-item list uses indexes 0 through 2, and requesting index 3 raises an <code class="highlighter-rouge">IndexError</code>. 
Let’s add a new item to the end of the list with <code class="highlighter-rouge">append()</code> and verify that the order is still intact.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span> <span class="p">[</span><span class="s">'index one'</span><span class="p">,</span> <span class="p">{</span><span class="s">'test'</span><span class="p">:</span> <span class="s">'dictionary'</span><span class="p">},</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mi">5</span><span class="p">]</span> <span class="o">&gt;&gt;&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span> <span class="mi">4</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">my_list</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="mi">5</span> </code></pre></div></div> <p>Now that we understand how lists work and how we can obtain data from a list, let’s put this all together with a data structure you might encounter in the wild!</p> <h2 id="how-to-navigate-data-structures">How to Navigate Data Structures</h2> <p>After the above sections, navigating a data structure and pulling information out of it might seem like a no-brainer, but it can be intimidating when you come across a more complex data structure.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">facts</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"ansible_check_mode"</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span> 
<span class="s">"ansible_diff_mode"</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span> <span class="s">"ansible_facts"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"_facts_gathered"</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span> <span class="s">"discovered_interpreter_python"</span><span class="p">:</span> <span class="s">"/usr/bin/python"</span><span class="p">,</span> <span class="s">"net_all_ipv4_addresses"</span><span class="p">:</span> <span class="p">[</span> <span class="s">"192.168.1.1"</span><span class="p">,</span> <span class="s">"10.111.41.12"</span><span class="p">,</span> <span class="s">"172.16.133.1"</span><span class="p">,</span> <span class="s">"172.16.130.1"</span><span class="p">,</span> <span class="p">],</span> <span class="s">"net_filesystems"</span><span class="p">:</span> <span class="p">[</span> <span class="s">"bootflash:"</span> <span class="p">],</span> <span class="s">"net_filesystems_info"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"bootflash:"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"spacefree_kb"</span><span class="p">:</span> <span class="mf">5869720.0</span><span class="p">,</span> <span class="s">"spacetotal_kb"</span><span class="p">:</span> <span class="mf">7712692.0</span> <span class="p">}</span> <span class="p">},</span> <span class="s">"net_gather_network_resources"</span><span class="p">:</span> <span class="p">[],</span> <span class="s">"net_gather_subset"</span><span class="p">:</span> <span class="p">[</span> <span class="s">"hardware"</span><span class="p">,</span> <span class="s">"default"</span><span class="p">,</span> <span class="s">"interfaces"</span><span class="p">,</span> <span class="s">"config"</span> <span class="p">],</span> <span class="s">"net_hostname"</span><span class="p">:</span> <span class="s">"csr1000v"</span><span class="p">,</span> <span 
class="s">"net_image"</span><span class="p">:</span> <span class="s">"bootflash:packages.conf"</span><span class="p">,</span> <span class="s">"net_interfaces"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"GigabitEthernet1"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"bandwidth"</span><span class="p">:</span> <span class="mi">1000000</span><span class="p">,</span> <span class="s">"description"</span><span class="p">:</span> <span class="s">"MANAGEMENT INTERFACE - DON'T TOUCH ME"</span><span class="p">,</span> <span class="s">"duplex"</span><span class="p">:</span> <span class="s">"Full"</span><span class="p">,</span> <span class="s">"ipv4"</span><span class="p">:</span> <span class="p">[</span> <span class="p">{</span> <span class="s">"address"</span><span class="p">:</span> <span class="s">"10.10.20.48"</span><span class="p">,</span> <span class="s">"subnet"</span><span class="p">:</span> <span class="s">"24"</span> <span class="p">}</span> <span class="p">],</span> <span class="s">"lineprotocol"</span><span class="p">:</span> <span class="s">"up"</span><span class="p">,</span> <span class="s">"macaddress"</span><span class="p">:</span> <span class="s">"0050.56bb.e14e"</span><span class="p">,</span> <span class="s">"mediatype"</span><span class="p">:</span> <span class="s">"Virtual"</span><span class="p">,</span> <span class="s">"mtu"</span><span class="p">:</span> <span class="mi">1500</span><span class="p">,</span> <span class="s">"operstatus"</span><span class="p">:</span> <span class="s">"up"</span><span class="p">,</span> <span class="s">"type"</span><span class="p">:</span> <span class="s">"CSR vNIC"</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>The above data structure is what we get from gather facts in Ansible. 
We’re going to work with the data structure outside of Ansible so we can break down each data type within it. This is a great example because it’s a real-world data structure with nesting that we need to traverse to get the data we want.</p> <p>Let’s start by looking at the data type of the initial structure and then see how we can get the <code class="highlighter-rouge">ansible_check_mode</code> data.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">facts</span><span class="p">)</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; facts.get('</span><span class="n">ansible_check_mode</span><span class="s">')</span><span class="err"> </span><span class="s">False</span><span class="err"> </span></code></pre></div></div> <p>As you can see, the initial data structure is a dictionary, and <code class="highlighter-rouge">ansible_check_mode</code> is one of its top-level keys. We can get the value of <code class="highlighter-rouge">ansible_check_mode</code> by using the <code class="highlighter-rouge">.get()</code> method.</p> <p>What if we want to loop over all the IP addresses within <code class="highlighter-rouge">net_all_ipv4_addresses</code>? 
Let’s see how we can do that.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">facts</span><span class="p">[</span><span class="s">'ansible_facts'</span><span class="p">])</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; facts['</span><span class="n">ansible_facts</span><span class="s">'].keys()</span><span class="err"> </span><span class="s">dict_keys(['</span><span class="n">_facts_gathered</span><span class="s">', '</span><span class="n">discovered_interpreter_python</span><span class="s">', '</span><span class="n">net_all_ipv4_addresses</span><span class="s">', '</span><span class="n">net_filesystems</span><span class="s">', '</span><span class="n">net_filesystems_info</span><span class="s">', '</span><span class="n">net_gather_network_resources</span><span class="s">', '</span><span class="n">net_gather_subset</span><span class="s">', '</span><span class="n">net_hostname</span><span class="s">', '</span><span class="n">net_image</span><span class="s">', '</span><span class="n">net_interfaces</span><span class="s">'])</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_all_ipv4_addresses</span><span class="s">'])</span><span class="err"> </span><span class="s">&lt;class '</span><span class="nb">list</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; for ip in facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_all_ipv4_addresses</span><span class="s">']:</span><span class="err"> </span><span class="s">... 
print(ip)</span><span class="err"> </span><span class="s">...</span><span class="err"> </span><span class="s">192.168.1.1</span><span class="err"> </span><span class="s">10.111.41.12</span><span class="err"> </span><span class="s">172.16.133.1</span><span class="err"> </span><span class="s">172.16.130.1</span><span class="err"> </span></code></pre></div></div> <p>As we can see above, <code class="highlighter-rouge">net_all_ipv4_addresses</code> is a key within the <code class="highlighter-rouge">ansible_facts</code> dictionary. We have to navigate through two nested dictionaries to get to the list of IPv4 addresses we want to print out.</p> <p>Let’s move on and obtain the IP address and subnet mask on GigabitEthernet1.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">facts</span><span class="p">[</span><span class="s">'ansible_facts'</span><span class="p">][</span><span class="s">'net_interfaces'</span><span class="p">])</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">'])</span><span class="err"> </span><span class="s">&lt;class '</span><span class="nb">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">']['</span><span class="n">ipv4</span><span class="s">'])</span><span class="err"> </span><span class="s">&lt;class 
'</span><span class="nb">list</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; len(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">']['</span><span class="n">ipv4</span><span class="s">'])</span><span class="err"> </span><span class="s">1</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">']['</span><span class="n">ipv4</span><span class="s">'][0])</span><span class="err"> </span><span class="s">&lt;class '</span><span class="nb">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; gi1_subnet = facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">']['</span><span class="n">ipv4</span><span class="s">'][0]['</span><span class="n">subnet</span><span class="s">']</span><span class="err"> </span><span class="s">&gt;&gt;&gt; gi1_address = facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_interfaces</span><span class="s">']['</span><span class="n">GigabitEthernet1</span><span class="s">']['</span><span class="n">ipv4</span><span class="s">'][0]['</span><span class="n">address</span><span class="s">']</span><span class="err"> </span><span class="s">&gt;&gt;&gt; f"{gi1_address}/{gi1_subnet}"</span><span class="err"> </span><span class="s">'</span><span class="mf">10.10.20.48</span><span class="o">/</span><span class="mi">24</span><span class="s">'</span><span class="err"> </span></code></pre></div></div> <p>This is definitely a more complex data structure 
to traverse to get the data we need, so let’s walk through it one level at a time.</p> <p>We can see that <code class="highlighter-rouge">net_interfaces</code> is a dictionary, so we use a key to move to the next level. The data we’re interested in is under the <code class="highlighter-rouge">GigabitEthernet1</code> key. That value is also a dictionary, so we use another key, <code class="highlighter-rouge">ipv4</code>, to reach the next level in the hierarchy. The <code class="highlighter-rouge">ipv4</code> data is stored in a list with a length of one, which means its only item is at index zero. The item at index zero is a dictionary, so we now access the <code class="highlighter-rouge">address</code> and <code class="highlighter-rouge">subnet</code> values via their respective keys.</p> <p>We store the address and the subnet in their own variables. The chain of lookups used to reach them tells the story of how we got to the data, naming the data type at each level, starting with <code class="highlighter-rouge">facts</code> itself:</p> <p><code class="highlighter-rouge">dictionary[dictionary][dictionary][dictionary][list][dictionary]</code></p> <p>Now that you understand how each data type works, you can break down even a complex data structure one level at a time.</p> <p>Let’s take a look at another example and keep flexing this muscle memory. 
Let’s determine how much space has been used on <code class="highlighter-rouge">bootflash:</code> by subtracting the <code class="highlighter-rouge">spacefree_kb</code> value from the <code class="highlighter-rouge">spacetotal_kb</code> value.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="nb">type</span><span class="p">(</span><span class="n">facts</span><span class="p">[</span><span class="s">'ansible_facts'</span><span class="p">])</span> <span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_filesystems_info</span><span class="s">'])</span><span class="err"> </span><span class="s">&lt;class '</span><span class="nb">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; type(facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_filesystems_info</span><span class="s">']['</span><span class="n">bootflash</span><span class="p">:</span><span class="s">'])</span><span class="err"> </span><span class="s">&lt;class '</span><span class="nb">dict</span><span class="s">'&gt;</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_free = facts['</span><span class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_filesystems_info</span><span class="s">']['</span><span class="n">bootflash</span><span class="p">:</span><span class="s">']['</span><span class="n">spacefree_kb</span><span class="s">']</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_free</span><span class="err"> </span><span class="s">5869720.0</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_total = facts['</span><span 
class="n">ansible_facts</span><span class="s">']['</span><span class="n">net_filesystems_info</span><span class="s">']['</span><span class="n">bootflash</span><span class="p">:</span><span class="s">']['</span><span class="n">spacetotal_kb</span><span class="s">']</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_total</span><span class="err"> </span><span class="s">7712692.0</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_used = space_total - space_free</span><span class="err"> </span><span class="s">&gt;&gt;&gt; space_used</span><span class="err"> </span><span class="s">1842972.0</span><span class="err"> </span></code></pre></div></div> <p>As you can see, we had to navigate through four dictionaries, including the initial structure, but we didn’t have to navigate through any lists this time to get the data we wanted.</p> <p>Complex data structures can be intimidating, but breaking down each data type within the structure lets us deconstruct them into smaller chunks that we can navigate and process until we reach the data we want.</p> <p>-Mikhail</p>Mikhail YohmanFrom an automation standpoint, it is important to understand what data structures are, how to use them, and more importantly how to access the data within the structure.Office Manager Appreciation Day2020-04-10T00:00:00+00:002020-04-10T00:00:00+00:00https://blog.networktocode.com/post/office-managers-2020<p>Pet appreciation day is April 11! But at a company with a large remote workforce, our pets aren’t just companions – around here we lovingly refer to them as our “office managers.” Whether they are demanding treats or need us to take them for a walk, pets are a key part of many home office setups and a great excuse to take a quick break during a busy day (see our roundup of <a href="https://blog.networktocode.com/post/work-from-home/">work from home tips</a> for more on the importance of taking breaks during the work day). 
In honor of pet appreciation day, we’d like to highlight the key roles and responsibilities some of our four-legged friends have taken on around the home office:</p> <h3 id="dixie">Dixie</h3> <p><img src="../../../static/images/blog_posts/office_managers_2020/Dixie.jpg" alt="Dixie" /></p> <h4 id="department">Department:</h4> <p>New York Pet Detective - Nap Division</p> <h4 id="responsibilities">Responsibilities:</h4> <p>Enforcing nap time regularly, ensuring “fair” distribution of carrots.</p> <h3 id="mars-anya">Mars, Anya</h3> <p><img src="../../../static/images/blog_posts/office_managers_2020/mars_anya.jpg" alt="Mars+Anya" /></p> <h4 id="department-1">Department:</h4> <p>Mars: International Affairs - Russia<br /> Anya: Side-eye Extraordinaire &amp; Doge W/ Opinion</p> <h4 id="responsibilities-1">Responsibilities:</h4> <p>Mars: Zdravstvuyte comrades, after many hours journey from the Russian motherland and long lines for immigration I have joined my fellow Houston Office Manager Anya in keeping two-legged keeper in line. He needs constant reminding that my miska or bowl is not &gt; 75% full and Anya must also be reminded that I am now the ranking member of NTC Houston!!</p> <p>Anya: My “roommate” recently had another hooman come live in MY house and had the nerve to bring a cat! I guess it’s not too bad, the cat is too busy looking for something called “Moose &amp; Squirrel”. Anywho, this is my house and it is my duty to stand back and judge everything hoomans do when I’m not busy making memes. 
#SendSnackooosssss</p> <h3 id="birdie-norman-and-scout">Birdie, Norman, and Scout</h3> <p><img src="../../../static/images/blog_posts/office_managers_2020/Scout.jpg" alt="Scout" /></p> <h4 id="department-2">Department:</h4> <p>Snacks</p> <h4 id="responsibilities-2">Responsibilities:</h4> <p>Birdie: Organizes all office outings (with a strong preference towards long walks).<br /> Norman: Tracks food metrics and provides daily meal reminders.<br /> Scout: Prefers to keep a high-level view and ensures that Norman and Birdie handle all relevant office tasks.</p> <h3 id="riley-mya">Riley, Mya</h3> <p><img src="../../../static/images/blog_posts/office_managers_2020/Mya.JPG" alt="Mya" /></p> <h4 id="department-3">Department:</h4> <p>Riley: Property Management<br /> Mya: Union Representation</p> <h4 id="responsibilities-3">Responsibilities:</h4> <p>Riley: Primary intruder detection. Hiding under benches, behind chairs, between toilets and tubs. Eating random objects. Quieter than a cat until a door is knocked upon or opened, then chief barker.<br /> Mya: Strict enforcement of mealtimes and overall breaks. Bathroom breaks. Lunch cleanup breaks. Break breaks. Resident whiner and prima donna. Steals more food than is necessary.</p> <p>-The NTC Team</p>The NTC TeamExploring Python’s args and kwargs in a Networking Context2020-04-07T00:00:00+00:002020-04-07T00:00:00+00:00https://blog.networktocode.com/post/exploring-args-kwargs-using-networking-data<p>At some point while working with Python, you might have seen code that uses <code class="highlighter-rouge">*args</code> and <code class="highlighter-rouge">**kwargs</code> as parameters in functions. This feature adds a lot of flexibility to Python functions. I’m going to go over some of the basics of packing arguments into a function, showing how to pass a variable number of positional arguments and key-value pairs into a function. Feel free to follow along if you have access to a Python interpreter.</p> <blockquote> <p>Note: The functions in this post use print statements for demonstration only. They don’t create a connection to a device. 
The goal is to focus on <code class="highlighter-rouge">*args</code> and <code class="highlighter-rouge">**kwargs</code>.</p> </blockquote> <h2 id="using-args">Using args</h2> <p>The current function, <code class="highlighter-rouge">device_connection</code>, has four required positional parameters, each of which is printed inside the function.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="n">ip</span><span class="p">,</span> <span class="n">device_type</span><span class="p">,</span> <span class="n">username</span><span class="p">,</span> <span class="n">password</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"ip: "</span><span class="p">,</span> <span class="n">ip</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"device_type: "</span><span class="p">,</span> <span class="n">device_type</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"username :"</span><span class="p">,</span> <span class="n">username</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"password: "</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span>
</code></pre></div></div> <p>The next step calls the <code class="highlighter-rouge">device_connection</code> function, passing exactly as many string arguments as the function has parameters. 
As expected, when the function is called, the data is printed.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span><span class="s2">"10.10.10.2"</span>, <span class="s2">"cisco_ios"</span>, <span class="s2">"cisco"</span>, <span class="s2">"cisco123"</span><span class="o">)</span> ip: 10.10.10.2 device_type: cisco_ios username : cisco password: cisco123 <span class="o">&gt;&gt;&gt;</span> </code></pre></div></div> <p>As mentioned before, these are positional parameters, so if the arguments are passed in the wrong order, the values are assigned to the wrong parameters and printed under the wrong labels. If the function actually built a device connection, that connection would fail. The next example shows what happens when the function is called with the arguments in the wrong order.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">device_connection</span><span class="p">(</span><span class="s">"cisco_ios"</span><span class="p">,</span> <span class="s">"10.10.10.2"</span><span class="p">,</span> <span class="s">"cisco"</span><span class="p">,</span> <span class="s">"cisco123"</span><span class="p">)</span> <span class="n">ip</span><span class="p">:</span> <span class="n">cisco_ios</span> <span class="n">device_type</span><span class="p">:</span> <span class="mf">10.10.10.2</span> <span class="n">username</span> <span class="p">:</span> <span class="n">cisco</span> <span class="n">password</span><span class="p">:</span> <span class="n">cisco123</span> <span class="o">&gt;&gt;&gt;</span> </code></pre></div></div> <p>Notice that in the output above, <code class="highlighter-rouge">ip</code> is now populated with the device type string, while <code class="highlighter-rouge">device_type</code> is populated with 10.10.10.2.</p> 
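<p>To make the positional binding visible, here is a minimal sketch that uses a hypothetical variant of <code class="highlighter-rouge">device_connection</code> which returns its parameters in a dictionary instead of printing them (this variant is an illustration, not the function from this post). It also shows that passing the same values as keyword arguments, which bind by parameter name, avoids the ordering problem.</p>

```python
# Hypothetical variant of device_connection: it returns its parameters
# instead of printing them, so the binding can be inspected directly.
def device_connection(ip, device_type, username, password):
    return {
        "ip": ip,
        "device_type": device_type,
        "username": username,
        "password": password,
    }


# Positional arguments bind strictly by position, so swapping the first two
# values silently swaps which parameter receives each one.
swapped = device_connection("cisco_ios", "10.10.10.2", "cisco", "cisco123")
print(swapped["ip"])           # cisco_ios
print(swapped["device_type"])  # 10.10.10.2

# Keyword arguments bind by name, so their order in the call does not matter.
named = device_connection(
    device_type="cisco_ios",
    ip="10.10.10.2",
    username="cisco",
    password="cisco123",
)
print(named["ip"])             # 10.10.10.2
print(named["device_type"])    # cisco_ios
```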
<p>Another problem appears when the connection needs an additional argument, for example because the device listens for SSH on a non-default port. Passing that extra argument fails because the function was built to take exactly four positional arguments. In the example below, the port number is added as a fifth argument in the function call.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">device_connection</span><span class="p">(</span><span class="s">"cisco_ios"</span><span class="p">,</span> <span class="s">"10.10.10.2"</span><span class="p">,</span> <span class="s">"cisco"</span><span class="p">,</span> <span class="s">"cisco123"</span><span class="p">,</span> <span class="s">"8022"</span><span class="p">)</span> <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span> <span class="n">File</span> <span class="s">"&lt;stdin&gt;"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o">&lt;</span><span class="n">module</span><span class="o">&gt;</span> <span class="nb">TypeError</span><span class="p">:</span> <span class="n">device_connection</span><span class="p">()</span> <span class="n">takes</span> <span class="mi">4</span> <span class="n">positional</span> <span class="n">arguments</span> <span class="n">but</span> <span class="mi">5</span> <span class="n">were</span> <span class="n">given</span> <span class="o">&gt;&gt;&gt;</span> </code></pre></div></div> <p>The next section shows how to solve some of these issues.</p> <h2 id="using-args-1">Using *args</h2> <p>Packing with the <code class="highlighter-rouge">*</code> operator collects multiple positional arguments into a single parameter, which Python stores as a tuple. In the function below, the first line prints the result of <code class="highlighter-rouge">type(args)</code> to show the data type of the <code class="highlighter-rouge">*args</code> parameter. The next lines print the whole tuple and then access each of its elements by index number, just as you would with a list.</p> <blockquote> <p>Note: The asterisk (*) is what packs the arguments into a tuple. Without it, <code class="highlighter-rouge">args</code> would be an ordinary parameter that accepts exactly one argument, so calling the function with four arguments would raise a <code class="highlighter-rouge">TypeError</code>.</p> </blockquote> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">args</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"tuple: "</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"ip: "</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"device_type: "</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"username: "</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"password: "</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
</code></pre></div></div> <p>The example below calls the <code class="highlighter-rouge">device_connection</code> function again, but this time the function is defined with <code class="highlighter-rouge">*args</code> as its parameter.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span><span class="s2">"10.10.10.2"</span>, <span class="s2">"cisco_ios"</span>, <span class="s2">"cisco"</span>, <span class="s2">"cisco123"</span><span class="o">)</span> &lt;class <span class="s1">'tuple'</span><span class="o">&gt;</span> tuple: <span class="o">(</span><span class="s1">'10.10.10.2'</span>, <span class="s1">'cisco_ios'</span>, <span class="s1">'cisco'</span>, <span class="s1">'cisco123'</span><span class="o">)</span> ip: 10.10.10.2 device_type: cisco_ios username: cisco password: cisco123 <span class="o">&gt;&gt;&gt;</span> </code></pre></div></div> <p>Notice that every argument passed in the function call was packed (“compressed”) into the <code class="highlighter-rouge">args</code> tuple, and the function printed each one.</p> <p>Another way of accessing data from a tuple is to use a for loop, as in the example below.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">args</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"tuple: "</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
</code></pre></div></div> <p>Data can also be assigned to variables and used as input when calling the function.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ip</span> <span class="o">=</span> <span class="s">"10.10.10.2"</span>
<span class="n">device_type</span> <span class="o">=</span> <span class="s">"cisco_ios"</span>
<span class="n">username</span> <span class="o">=</span> <span class="s">"cisco"</span>
<span class="n">password</span> <span class="o">=</span> <span class="s">"cisco123"</span>
</code></pre></div></div> <p>Check out the output when the function is called.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span>ip, device_type, username, password<span class="o">)</span> &lt;class <span class="s1">'tuple'</span><span class="o">&gt;</span> tuple: <span class="o">(</span><span class="s1">'10.10.10.2'</span>, <span class="s1">'cisco_ios'</span>, <span class="s1">'cisco'</span>, <span class="s1">'cisco123'</span><span class="o">)</span> 10.10.10.2 cisco_ios cisco cisco123 <span class="o">&gt;&gt;&gt;</span> </code></pre></div></div> <p>Notice that the for loop inside the function accessed each value without needing to provide an index.</p> <p>Another great thing about defining a function with <code class="highlighter-rouge">*args</code> is that it accepts any number of positional arguments without each one being declared in the function definition. 
If another value needs to be processed, it can simply be passed as an additional argument when calling the function.</p> <blockquote> <p>Note: The <code class="highlighter-rouge">*</code> asterisk does all the “magic” of packing the arguments; the name <code class="highlighter-rouge">args</code> is only a convention, and any valid variable name works.</p> </blockquote> <p>In this step, a new variable called <code class="highlighter-rouge">port</code> is created with a value of 22.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">port</span> <span class="o">=</span> <span class="mi">22</span>
</code></pre></div></div> <p>The example below shows the output of calling <code class="highlighter-rouge">device_connection</code> with the same variables as before plus the new <code class="highlighter-rouge">port</code> variable.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span>ip, device_type, username, password, port<span class="o">)</span>
&lt;class <span class="s1">'tuple'</span><span class="o">&gt;</span>
tuple: <span class="o">(</span><span class="s1">'10.10.10.2'</span>, <span class="s1">'cisco_ios'</span>, <span class="s1">'cisco'</span>, <span class="s1">'cisco123'</span>, 22<span class="o">)</span>
10.10.10.2
cisco_ios
cisco
cisco123
22
<span class="o">&gt;&gt;&gt;</span>
</code></pre></div></div> <h2 id="using-kwargs">Using **kwargs</h2> <p>When a function is defined with ordinary parameters (no <code class="highlighter-rouge">*</code> operator), each parameter receives exactly one argument, and if the call does not pass a matching number of arguments, Python raises a <code class="highlighter-rouge">TypeError</code>.
With the single <code class="highlighter-rouge">*</code> operator, any number of positional arguments can be passed, and they are packed into a tuple.</p> <p>Another option is the <code class="highlighter-rouge">**</code> double-asterisk operator, which packs keyword arguments into a dictionary of <code class="highlighter-rouge">key:value</code> pairs under whatever parameter name you choose.</p> <p>The function below uses a <code class="highlighter-rouge">**kwargs</code> parameter. Inside the function, it prints the data type of <code class="highlighter-rouge">kwargs</code> and the data passed in when calling the function.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">kwargs</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div></div> <p>The example below shows the output when calling the function with two keyword arguments.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span><span class="nv">device_type</span><span class="o">=</span><span class="s2">"ios"</span>, <span class="nv">ip</span><span class="o">=</span><span class="s2">"10.10.10.2"</span><span class="o">)</span>
&lt;class <span class="s1">'dict'</span><span class="o">&gt;</span>
<span class="o">{</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'ip'</span>: <span class="s1">'10.10.10.2'</span><span class="o">}</span>
<span class="o">&gt;&gt;&gt;</span>
</code></pre></div></div> <p>As mentioned before, the <code class="highlighter-rouge">**</code> operator is what packs the data into a variable. The parameters don’t have to be named <code class="highlighter-rouge">args</code> and <code class="highlighter-rouge">kwargs</code>; any valid name works, such as <code class="highlighter-rouge">**connection_data</code> instead of <code class="highlighter-rouge">**kwargs</code>, as shown in the example below.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="o">**</span><span class="n">connection_data</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">connection_data</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="n">connection_data</span><span class="p">)</span>
</code></pre></div></div> <p>The example below shows the output when calling the version of the function that uses <code class="highlighter-rouge">**connection_data</code>.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span><span class="nv">device_type</span><span class="o">=</span><span class="s2">"ios"</span>, <span class="nv">ip</span><span class="o">=</span><span class="s2">"10.10.10.2"</span><span class="o">)</span>
&lt;class <span class="s1">'dict'</span><span class="o">&gt;</span>
<span class="o">{</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'ip'</span>: <span class="s1">'10.10.10.2'</span><span class="o">}</span>
<span class="o">&gt;&gt;&gt;</span>
</code></pre></div></div> <p>This next example calls the function again, but with more keyword arguments.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span><span class="nv">device_type</span><span class="o">=</span><span class="s2">"ios"</span>, <span class="nv">ip</span><span class="o">=</span><span class="s2">"10.10.10.2"</span>, <span class="nv">username</span><span class="o">=</span><span class="s2">"cisco"</span>, <span class="nv">password</span><span class="o">=</span><span class="s2">"cisc123"</span><span class="o">)</span>
&lt;class <span class="s1">'dict'</span><span class="o">&gt;</span>
<span class="o">{</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'ip'</span>: <span class="s1">'10.10.10.2'</span>, <span class="s1">'username'</span>: <span class="s1">'cisco'</span>, <span class="s1">'password'</span>: <span class="s1">'cisc123'</span><span class="o">}</span>
<span class="o">&gt;&gt;&gt;</span>
</code></pre></div></div> <p>Notice that the data type is a dictionary and the keyword arguments became <code class="highlighter-rouge">key:value</code> pairs.</p> <p>It is also possible to use <code class="highlighter-rouge">*args</code> and <code class="highlighter-rouge">**kwargs</code> together in a single function.
Using both gives the function more flexibility in how it accepts data: positional arguments are packed into a tuple and keyword arguments into a dictionary.</p> <p>The example below defines a variable containing a dictionary of device data.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">csr1</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">'device_type'</span><span class="p">:</span> <span class="s">'ios'</span><span class="p">,</span>
    <span class="s">'ip'</span><span class="p">:</span> <span class="s">'10.10.10.2'</span><span class="p">,</span>
    <span class="s">'username'</span><span class="p">:</span> <span class="s">'cisco'</span><span class="p">,</span>
    <span class="s">'password'</span><span class="p">:</span> <span class="s">'cisco123'</span><span class="p">,</span>
    <span class="s">'port'</span><span class="p">:</span> <span class="mi">8022</span><span class="p">,</span>
    <span class="s">'secret'</span><span class="p">:</span> <span class="s">'secret'</span><span class="p">}</span>
</code></pre></div></div> <p>The function below is built to accept both types of parameters.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">device_connection</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"*args: "</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">args</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"**kwargs: "</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">kwargs</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"args[0]: "</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"args: "</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"kwargs: "</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">)</span>
</code></pre></div></div> <p>When calling the function, notice that the first argument is the <code class="highlighter-rouge">csr1</code> variable, passed positionally, while the second and third arguments are keyword arguments. Because <code class="highlighter-rouge">csr1</code> is passed without <code class="highlighter-rouge">**</code>, the whole dictionary becomes a single element of <code class="highlighter-rouge">args</code> rather than being unpacked into <code class="highlighter-rouge">kwargs</code>.</p> <p>The first line of output shows that <code class="highlighter-rouge">*args</code> is a <code class="highlighter-rouge">tuple</code>, and the second that <code class="highlighter-rouge">**kwargs</code> is a <code class="highlighter-rouge">dict</code>. The third line retrieves the <code class="highlighter-rouge">csr1</code> dictionary from index 0 of the tuple, and the last two lines print the data “packed” into <code class="highlighter-rouge">args</code> and <code class="highlighter-rouge">kwargs</code>.</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span>
<span class="o">&gt;&gt;&gt;</span> device_connection<span class="o">(</span>csr1, <span class="nv">device_type</span><span class="o">=</span><span class="s2">"ios"</span>, <span class="nv">host</span><span class="o">=</span><span class="s2">"csr1"</span><span class="o">)</span>
<span class="k">*</span>args: &lt;class <span class="s1">'tuple'</span><span class="o">&gt;</span>
<span class="k">**</span>kwargs: &lt;class <span class="s1">'dict'</span><span class="o">&gt;</span>
args[0]: <span class="o">{</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'ip'</span>: <span class="s1">'10.10.10.2'</span>, <span class="s1">'username'</span>: <span class="s1">'cisco'</span>, <span class="s1">'password'</span>: <span class="s1">'cisco123'</span>, <span class="s1">'port'</span>: 8022, <span class="s1">'secret'</span>: <span class="s1">'secret'</span><span class="o">}</span>
args: <span class="o">({</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'ip'</span>: <span class="s1">'10.10.10.2'</span>, <span class="s1">'username'</span>: <span class="s1">'cisco'</span>, <span class="s1">'password'</span>: <span class="s1">'cisco123'</span>, <span class="s1">'port'</span>: 8022, <span class="s1">'secret'</span>: <span class="s1">'secret'</span><span class="o">}</span>,<span class="o">)</span>
kwargs: <span class="o">{</span><span class="s1">'device_type'</span>: <span class="s1">'ios'</span>, <span class="s1">'host'</span>: <span class="s1">'csr1'</span><span class="o">}</span>
<span class="o">&gt;&gt;&gt;</span>
</code></pre></div></div> <h2 id="summary">Summary</h2> <p>The examples in this post use networking data to explore Python function parameters that use <code class="highlighter-rouge">*</code> (packing into a tuple) and <code class="highlighter-rouge">**</code> (packing into a dictionary). Not every function needs the <code class="highlighter-rouge">*</code> and <code class="highlighter-rouge">**</code> operators; many functions are simple and only need specific data. But when a function must accept a variable or unknown amount of data, the <code class="highlighter-rouge">*</code> and <code class="highlighter-rouge">**</code> operators make it easy to build more flexible functions.</p> <p>-Hector</p>Hector Isaza
At some point while working with Python, you may have seen code that uses *args and **kwargs as parameters in functions.
This feature brings a lot of flexibility when developing Python code. I’m going to go over some of the basics of packing arguments into a function, showing how to pass a variable number of positional arguments and key-value pairs into a function. Feel free to follow along if you have access to a Python interpreter.