
Configure SNAP* Telemetry Framework to Monitor Your Data Center


Figure 1. Snap* logo.

Introduction

Would you believe that you can get useful, insightful information on your data center's operations AND provide a cool interface to it that your boss will love—all in the space of an afternoon, and entirely with open source tooling? It's true. Free up a little time to read this, and then maybe free up your afternoon to reap the benefits!

This article shows you how to use Snap* to rapidly select and begin collecting useful measurements, from basic system information to metrics on sophisticated cloud orchestration.

We'll also show you how to publish that information in ways that are useful to you, as someone who needs true insight into their data center's operations (and, possibly, ways to trigger automation on the basis of it). Finally, we'll show how to publish that information in ways that your management will like, making a useful dashboard with Grafana*.

After that, you'll discover a way to do all of that even faster. Let's get started!

What Is Snap*?

Snap is an open-source telemetry framework. Telemetry is simply information about your data center systems. It covers anything and everything you can collect, from basic descriptive information, to performance counters and statistics, to log file entries.

In the past, this huge stew of information has been difficult to synthesize and analyze together. There were collectors for log files that were separate from collectors for performance data, and so on. Snap unifies the collection of common telemetry as a set of community-provided plugins. It does quite a bit more than that too, but for now let's drill down on collectors and introduce our demonstration environment.

A Sample Data Center

To keep things simple we'll be working with a small Kubernetes* cluster, consisting of only two hosts. Why Kubernetes? We're aiming at a typical cloud data center. We could have chosen Mesos* or OpenStack* just as easily; Snap has plugins for all of them. Kubernetes just gives us an example to work with. It's important to realize that even if you're running another framework or even something proprietary in your data center, you can still benefit from the system-level plugins. The principles are the same, regardless.

The test environment has two nodes. One is a control node from which we will control and launch Kubernetes Pods; the other is the sole Kubernetes worker node.

We'll be collecting telemetry from both hosts, and the way that Snap is structured makes it easy to extrapolate how to do much larger installations from this smallish example. The nodes are running CentOS* 7.2. This decision was arbitrary; Snap supports most Linux* distributions by distributing binary releases in both RPM* and Debian* packaging formats.

Installation and Initial Setup

We'll need to install Snap, which is quite simple, using the packaging. Complete instructions are available at the home of the Snap repository. We won't repeat all those instructions here, but for simplicity, these are the steps that we took on both of our CentOS 7.2 hosts:

curl -s https://packagecloud.io/install/repositories/intelsdi-x/snap/script.rpm.sh | sudo bash

The point of this step is to set up the appropriate package repository from which to install Snap.

Note: It is not a good idea to run code straight off the Internet as root. This is done here for ease of use in an isolated lab environment. You can and should download the script separately, and examine it to be satisfied with exactly what it will do. Alternatively, you can build Snap from source and install it using your own cluster management tools.
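For example, a more cautious version of that first step downloads the script, lets you review it, and only then runs it (the local file name is just a placeholder):

curl -s https://packagecloud.io/install/repositories/intelsdi-x/snap/script.rpm.sh -o snap-repo.rpm.sh
less snap-repo.rpm.sh          # review exactly what the script will do
sudo bash snap-repo.rpm.sh     # run it only once you're satisfied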

After that, we can install the package:

     sudo yum install -y snap-telemetry

This step installs the Snap binaries and startup scripts. Next, we'll make sure the Snap daemon runs on system startup, and start it now:

     sudo systemctl enable snap-telemetry
     sudo systemctl start snap-telemetry

Now we have the snap daemon (snapteld) up and running. Below is some sample output from systemctl status snap-telemetry. You can also validate that you are able to communicate with the daemon with a command like ‘snaptel plugin list’. For now, you'll just get a message that says ‘No plugins found. Have you loaded a plugin?’ This is fine, and it means you're all set.
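As a quick sanity check on each node, the two commands look like this (the expected message is the one quoted above):

systemctl status snap-telemetry    # should show snapteld as active (running)
snaptel plugin list                # "No plugins found. Have you loaded a plugin?"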

Figure 2. Screen capture showing the snap daemon ‘snapteld’ running.

Now we're ready to get some plugins!

The Plugin Catalog

The first glimpse of the power of Snap is in taking a look at the Snap plugin catalog. For now, we'll concentrate on the first set of plugins, labeled 'COLLECTORS'. Have a quick browse through that list and you'll see that Snap's capabilities are quite extensive. There are low-level system information collectors like PSUtil* and Meminfo*. There are Intel® processor-specific collectors such as CPU states and Intel® Performance Counter Monitor (Intel® PCM). There are service and application-level collectors such as Apache* and Docker*. There are cluster-level collectors such as the OpenStack services (Keystone*, and so on) and Ceph*.

The first major challenge a Snap installer faces is what NOT to collect! There's so much available that it can quickly add up to a mountain of data. We'll leave those decisions to you, but for our examples, we're going to select three:

  • PSUtil—basic system information, common to any Linux data center.
  • Docker—information about Docker containers and the Docker system.
  • Kubestate*—information about Kubernetes clusters.

This selection will give us a good spread of different types of collectors to examine.

Installing Plugins

Installing a plugin is a relatively straightforward process. We'll start with PSUtil. First, look at the plugin catalog and click the release link on the far right of the PSUtil entry, shown here:

Figure 3. The line for PSUtil in the plugin catalog.

On the next page, the most recent release is at the top of the page. We'll copy the link to the binary for the current release of PSUtil for Linux x86_64.

Figure 4. Copying the binary link from the plugin release page.

Now we’ll download that plugin and load it. Paste in the URL that you copied above, and you get a pair of commands that look like the ones below.
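The exact URL depends on the release link you copied; assuming release 9 of the PSUtil plugin (the version that shows up in our plugin list later in this article), the pair looks like this:

curl -sfL https://github.com/intelsdi-x/snap-plugin-collector-psutil/releases/download/9/snap-plugin-collector-psutil_linux_x86_64 -o snap-plugin-collector-psutil

snaptel plugin load snap-plugin-collector-psutil

You should receive some output indicating that the plugin loaded. Check that with ‘snaptel plugin list’.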

Following the exact same procedure with the Docker plugin will work great:

curl -sfL https://github.com/intelsdi-x/snap-plugin-collector-docker/releases/download/7/snap-plugin-collector-docker_linux_x86_64 -o snap-plugin-collector-docker

snaptel plugin load snap-plugin-collector-docker

The final plugin we are interested in is Kubestate*. Note that the maintainer of Kubestate is not Intel, but Grafana. That means the releases are not maintained in the same GitHub* repository, so the procedure has to change a bit. Fortunately, by examining the documentation of the Kubestate repository, you can easily find the Kubestate release repository.

From there the procedure is exactly the same:

curl -sfL https://github.com/grafana/snap-plugin-collector-kubestate/releases/download/1/snap-plugin-collector-kubestate_linux_x86_64 -o snap-plugin-collector-kubestate

snaptel plugin load snap-plugin-collector-kubestate

If you want to track current code updates, you are more than welcome to build your own binaries and load them instead of the precompiled releases. Most of the plugin repository home pages provide instructions for doing it this way.

Publishers

We aren't quite ready to collect information just yet with Snap. Let’s take a look at the overall flow of Snap:

Figure 5. Snap workflow.

You can find the collectors we’ve been dealing with easily on the left-hand side.

In the middle are processors, another set of plugins available in the plugin catalog. We won't be installing any of these as part of this demonstration, but they can make your telemetry data genuinely useful to you. Statistical transformations and filtering can help you get a handle on your operational environment, or even trigger automatic responses to loading or other events. Tagging allows you to usher data through to appropriate endpoints; for example, separating data out by customer, in the case of a cloud service provider data center.

Finally, on the right you can find publishers. These plugins allow you to take the collected, processed telemetry and output it to something useful. Note that Snap itself doesn't USE the telemetry. It is all about collecting, processing, and publishing the telemetry data as simply and flexibly as possible, but what is done with it from that point is up to you.

In the simplest case, Snap can publish to a file on the local file system. It can publish to many different databases such as MySQL* or Cassandra*. It can publish to message queues like RabbitMQ*, to feed into automation systems.

For our examples, we're going to use the Graphite* publisher, for three reasons. First, Graphite itself is a flexible, well-known package for dealing with lots of metrics; a data center operation can use the information straight out of Graphite to gain all kinds of insight into its operations. Second, Graphite feeds naturally into Grafana, which will give us a pretty, manager-friendly dashboard. Finally, most of the example tasks (which we'll discuss shortly) provided in the plugin repositories are based on a simple file publisher; using Graphite involves a bit more complexity and is a more likely real-world use of publisher plugins.

Loading the publisher plugin works exactly the same as the previous plugins: Find the latest binary release, download it, and load it:

curl -sfL https://github.com/intelsdi-x/snap-plugin-publisher-graphite/releases/download/6/snap-plugin-publisher-graphite_linux_x86_64 -o snap-plugin-publisher-graphite

snaptel plugin load snap-plugin-publisher-graphite

Now we have all the plugins we need for our installation:

[plse@cspnuc03 ~]$ snaptel plugin list
NAME             VERSION         TYPE            SIGNED          STATUS          LOADED TIME
psutil           9               collector       false           loaded          Mon, 27 Mar 2017 20:36:05 EDT
docker           7               collector       false           loaded          Mon, 27 Mar 2017 20:39:45 EDT
kubestate        1               collector       false           loaded          Mon, 27 Mar 2017 20:51:17 EDT
graphite         6               publisher       false           loaded          Mon, 27 Mar 2017 21:03:31 EDT
[plse@cspnuc03 ~]$

We're almost ready to tie all this together. First, we need to pick out the metrics we're interested in collecting.

Metrics

Now that you've got all these plugins installed, it's time to select some metrics that you're interested in collecting. Most plugins offer a large listing of metrics; you may not want to take all of them.

To see what's available, you can view the master list of metrics from the plugins you have installed with a single command:

snaptel metric list

The output from this is quite long—234 lines for the three collectors we have loaded, at the time of this writing. Rather than paste all of the output here, we'll just look at a few from each namespace, generated by our three collector plugins.
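If you just want to see what one plugin offers, filtering the list is a quick way to narrow things down; for example:

snaptel metric list | grep /intel/psutil
snaptel metric list | grep -c /intel/docker    # just count the Docker metrics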

PSUtil

There are many metrics available from this package that should be readily identifiable to anyone who runs Linux. Here's the selection we'll go with for our examples:

     /intel/psutil/load/load1
     /intel/psutil/load/load15
     /intel/psutil/load/load5
     /intel/psutil/vm/available
     /intel/psutil/vm/free
     /intel/psutil/vm/used

These look like filesystem paths, but they are not. They are Snap namespace paths. The first element, 'intel', indicates the maintainer of the plugin. The second is the name of the plugin, after which come the namespaces of the various metrics provided.

The metrics themselves are familiar: the typical three-value load averages (1-minute, 5-minute, and 15-minute) and some simple memory usage values.

Docker

For the Docker plugin, there are 150 metrics available at the time of this writing. They run from simple container information to details about network and filesystem usage per container. For this, we'll take a few of the broader values, as recommended in the task examples provided in the plugin repository.

     /intel/docker/*/spec/*
     /intel/docker/*/stats/cgroups/cpu_stats/*
     /intel/docker/*/stats/cgroups/memory_stats/*

Kubestate

The Kubestate plugin (at /grafanalabs/kubestate in the namespace) provides 34 metrics for tracking Kubernetes information. Since Kubernetes is the top-level application platform, it's worth ensuring that we collect all of them. The full list will show up in the task definition file, below.

All of the metrics lists and documentation (available from the plugin repos) are worth a closer look to get the insight you need for your workloads.

Now that we've selected some metrics to track, let's actually get some data flowing!

Our First Task

Putting collector, processor, and publisher plugins together into an end-to-end workflow is done by specifying task manifests. These are either JSON or YAML definition files that are loaded into the running Snap daemon to tell it what data to collect, what to do with it, and where to send it. A single Snap daemon can be in charge of many tasks at once, meaning you can run multiple collectors and publish to multiple endpoints. This makes it very easy and flexible to direct telemetry where you need it, when you need it.

All of the plugin repositories generally include some sample manifests to help you get started. We're going to take some of those and extend them just a bit to tie in the Graphite publisher.

For Graphite itself, we've run a simple graphite container image from Docker Hub* on the 'control' node in our setup. Of course, you can use any Graphite service you wish, or try another publisher. (For example, you may be using Elasticsearch*, and there's a publisher plugin for that!)

sudo docker run -d --name graphite --restart=always -p 80:80 -p 2003-2004:2003-2004 -p 2023-2024:2023-2024 -p 8125:8125/udp -p 8126:8126 hopsoft/graphite-statsd

The first task we define will be to collect data from the PSUtil plugin and publish it to our Graphite instance. We'll start with this YAML file. It's saved in our lab to psutil-graphite.yaml.

---
version: 1
schedule:
  type: simple
  interval: 1s
max-failures: 10
workflow:
  collect:
    metrics:
      "/intel/psutil/load/load1": {}
      "/intel/psutil/load/load15": {}
      "/intel/psutil/load/load5": {}
      "/intel/psutil/vm/available": {}
      "/intel/psutil/vm/free": {}
      "/intel/psutil/vm/used": {}
    publish:
    - plugin_name: graphite
      config:
        server: cspnuc02
        port: 2003

The best part about this is that it is straightforward and quite readable. To begin with, we've defined a collection interval of one second in the schedule section. The simple schedule type just means to run at every interval. There are two other types: windowed, which means you can define exactly when the task will start and stop, and cron, which allows you to set a crontab-like entry for when the task will run. There's lots more information on the flexibility of task scheduling in the task documentation.
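For a rough sense of what the cron type looks like in a manifest, here is a hedged sketch; the field values below are assumptions, so check the task documentation for the exact schema before using it:

schedule:
  type: cron
  # Assumed: a six-field cron entry (seconds first); this one fires at the top of every minute.
  interval: "0 * * * * *"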

The ‘max-failures’ value indicates just what you would expect: After 10 consecutive failures of the task, the Snap daemon will disable the task and stop trying to run it.

Finally, the ‘workflow’ section defines our collector and our publisher: which metrics to collect, and which Graphite service to connect to (cspnuc02 on port 2003). This is the control node where the Graphite container we're using is running. Starting this task on both machines will ensure that both report into the same Graphite server.

To create the task, we'll run the task creation command:

     snaptel task create -t psutil-graphite.yaml

Assuming all is well, you should see output similar to this:

     [plse@cspnuc03 ~]$ snaptel task create -t psutil-graphite.yaml
     Using task manifest to create task
     Task created
     ID: d067a659-0576-44eb-95fe-0f01f7e33fbf
     Name: Task-d067a659-0576-44eb-95fe-0f01f7e33fbf
     State: Running

The task is running! For now, anyway. There are a couple of ways to keep tabs on your running tasks. One is just to list the installed tasks:

snaptel task list

You'll get a listing like the following that shows how many times the task has completed successfully (HIT), come up empty with no values (MISS), or failed altogether (FAIL). It will also indicate if the task is Running, Stopped, or Disabled.

ID                                       NAME                                            STATE     HIT   MISS   FAIL   CREATED            LAST FAILURE
d067a659-0576-44eb-95fe-0f01f7e33fbf     Task-d067a659-0576-44eb-95fe-0f01f7e33fbf       Running   7K    15     3      3:55PM 3-28-2017   rpc error: code = 2 desc = Error Connecting to graphite at cspnuc02:2003. Error: dial tcp: i/o timeout

Even though you can see a failure here (from the last time it failed), you can also see that the task is still running: it has 7,000 hits, 15 misses, and only 3 failures. Without 10 consecutive failures, the task keeps running.

Another way to see your tasks working is to watch them. Take the task ID field from the output above and paste it into this command:

snaptel task watch d067a659-0576-44eb-95fe-0f01f7e33fbf

You'll get a continuously updating text-mode output of the values as they stream by:

Figure 6. Output from snaptel task watch.

Now that we've created our first simple task, let's get the other collectors collecting!

Remaining Tasks

For Docker, our YAML looks like this, saved as docker-graphite.yaml:

---
max-failures: 10
schedule:
  interval: 5s
  type: simple
version: 1
workflow:
  collect:
    config:
      /intel/docker:
        endpoint: unix:///var/run/docker.sock
    metrics:
      /intel/docker/*/spec/*: {}
      /intel/docker/*/stats/cgroups/cpu_stats/*: {}
      /intel/docker/*/stats/cgroups/memory_stats/*: {}
    publish:
      -
        config:
          server: cspnuc02
          port: 2003
        plugin_name: graphite

By now you can probably see the general outline of how this works. Note the additional configuration of the collector to be able to communicate with the local Docker daemon. In this case we're using a socket; other examples work with a network connection to Docker on a specific port.
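If your Docker daemon listens on a TCP port instead of the local socket, the same config block points at that address instead; a sketch, where the address and port are assumptions for illustration:

    config:
      /intel/docker:
        endpoint: tcp://127.0.0.1:2375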

We'll enable this one the same way on both nodes:

snaptel task create -t docker-graphite.yaml

Finally, we'll enable Kubestate. This one is a bit different than the other two. We don't want to enable it on both nodes, since we're interested in the overall state of the Kubernetes cluster, rather than the values on a specific node. Let's enable it on the control node only.

The example task manifest for Kubestate is a JSON file, so the modifications made to it will be too. This one is saved as kubestate-graphite.json:

{"version": 1,"schedule": {"type": "simple","interval": "10s"
  },"workflow": {"collect": {"metrics": {"/grafanalabs/kubestate/container/*/*/*/*/limits/cpu/cores": {},"/grafanalabs/kubestate/container/*/*/*/*/limits/memory/bytes": {},"/grafanalabs/kubestate/container/*/*/*/*/requested/cpu/cores": {},"/grafanalabs/kubestate/container/*/*/*/*/requested/memory/bytes": {},"/grafanalabs/kubestate/container/*/*/*/*/status/ready": {},"/grafanalabs/kubestate/container/*/*/*/*/status/restarts": {},"/grafanalabs/kubestate/container/*/*/*/*/status/running": {},"/grafanalabs/kubestate/container/*/*/*/*/status/terminated": {},"/grafanalabs/kubestate/container/*/*/*/*/status/waiting": {},"/grafanalabs/kubestate/deployment/*/*/metadata/generation": {},"/grafanalabs/kubestate/deployment/*/*/spec/desiredreplicas": {},"/grafanalabs/kubestate/deployment/*/*/spec/paused": {},"/grafanalabs/kubestate/deployment/*/*/status/availablereplicas": {},"/grafanalabs/kubestate/deployment/*/*/status/deploynotfinished": {},"/grafanalabs/kubestate/deployment/*/*/status/observedgeneration": {},"/grafanalabs/kubestate/deployment/*/*/status/targetedreplicas": {},"/grafanalabs/kubestate/deployment/*/*/status/unavailablereplicas": {},"/grafanalabs/kubestate/deployment/*/*/status/updatedreplicas": {},"/grafanalabs/kubestate/node/*/spec/unschedulable": {},"/grafanalabs/kubestate/node/*/status/allocatable/cpu/cores": {},"/grafanalabs/kubestate/node/*/status/allocatable/memory/bytes": {},"/grafanalabs/kubestate/node/*/status/allocatable/pods": {},"/grafanalabs/kubestate/node/*/status/capacity/cpu/cores": {},"/grafanalabs/kubestate/node/*/status/capacity/memory/bytes": {},"/grafanalabs/kubestate/node/*/status/capacity/pods": {},"/grafanalabs/kubestate/node/*/status/outofdisk": {},"/grafanalabs/kubestate/pod/*/*/*/status/condition/ready": {},"/grafanalabs/kubestate/pod/*/*/*/status/condition/scheduled": {},"/grafanalabs/kubestate/pod/*/*/*/status/phase/Failed": {},"/grafanalabs/kubestate/pod/*/*/*/status/phase/Pending": {},"/grafanalabs/kubestate/pod/*/*/*/status/phase/Running": {},"/grafanalabs/kubestate/pod/*/*/*/status/phase/Succeeded": {},"/grafanalabs/kubestate/pod/*/*/*/status/phase/Unknown": {}
      },"config": {"/grafanalabs/kubestate": {"incluster": false,"kubeconfigpath": "/home/plse/.kube/config"
        }
      },"process": null,"publish": [
        {"plugin_name": "graphite","config": {"server": "localhost","port": 2003
          }
        }
      ]
    }
  }
}

Again, most of this is quite straightforward. As noted above, we're collecting all metrics available in the namespace, straight from what we would have gotten from snaptel metric list. To configure the collector itself, we tell it that we're not running from within the cluster ("incluster": false), and where to look for information on how to connect to the cluster management ("kubeconfigpath": "/home/plse/.kube/config").

The config file for Kubernetes that's referenced there contains the server name and port to connect to and the cluster and context to use when conducting queries. So, clearly, multiple tasks could be set up to query different clusters and contexts, and route them as desired. We could even add in a tagging processor plugin to tag the data by cluster and deliver it with the tags; that would allow us to split cluster data out by customer, for example.
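The kubeconfig file itself is the standard kubectl configuration format; here is a trimmed sketch of the parts the plugin relies on, where the server address, names, and credential paths are placeholders:

apiVersion: v1
kind: Config
clusters:
- name: lab
  cluster:
    server: https://cspnuc02:6443      # API server the plugin will query
contexts:
- name: lab-admin
  context:
    cluster: lab
    user: admin
current-context: lab-admin             # the context used when conducting queries
users:
- name: admin
  user:
    client-certificate: /etc/kubernetes/pki/admin.crt
    client-key: /etc/kubernetes/pki/admin.key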

Also note that here the server for Graphite is ‘localhost’ since only this one node needs to access the service. We could have used the hostname here as well; it works either way.

Enabling the service is the same as the others:

snaptel task create -t kubestate-graphite.json

Once we're satisfied that our tasks are up and running, we can go take a look at them with Graphite's native tools.

Real-World Deployments

We'll take a moment here to pause and have a quick look at how to make these kinds of settings permanent, as well as how to integrate Snap's tooling into a real-world environment.

The Snap daemon has several methods of configuration. We've been doing all of our configuration via the command-line interface, but none of it is persistent. If we rebooted our nodes right now, the Snap daemon would start up, but it wouldn't load the plugins and tasks we've defined for it.

To make that happen, you would want to use the guidelines at Snap Daemon Configuration.

We won't get into the specifics here, but suffice it to say that /etc/snap/snapteld.conf can be set up as either a YAML or JSON file that contains more or less the same information that our task manifests did. This file will suffice to install plugins and run tasks at boot time. As well, it defines many defaults about the way that Snap runs, so that you can tune the daemon to collect properly without imposing too much of its own overhead on your servers.

Likewise, it's likely you've been wondering about loading plugins and how secure that process is. The default installation method that we've used here sets the value plugin_trust_level to 0 in the /etc/snap/snapteld.conf configuration file. This means that the plugins that we've been downloading and installing haven't been checked for integrity by the daemon.
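To give a feel for that file, here is a minimal sketch of /etc/snap/snapteld.conf; plugin_trust_level is the setting named above, while the other keys and paths are assumptions for illustration, so verify them against the daemon configuration documentation:

# /etc/snap/snapteld.conf (sketch)
log_level: 3
control:
  # 0 = accept unsigned plugins; raise this once plugin signing (below) is in place
  plugin_trust_level: 0
  # Assumed: directories scanned at startup for plugins and task manifests to auto-load
  auto_discover_path: /opt/snap/plugins:/opt/snap/tasks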

Snap uses GPG* keys and signing methods to allow you to sign and trust binaries in your deployment. The instructions for doing this are at Snap Plugin Signing. Again, it is beyond the scope of this article to examine this system deeply, but we strongly advise that any production deployments integrate with the plugin signing system, and that signatures are distributed independently of the plugins. This should not be an unusual model for deploying software in most data centers (although the work will probably run past our single afternoon).

Examining the Data

The Graphite container that we ran earlier exposes a useful web interface from the server. If we connect to it, we'll find that the available metrics are listed on the left-hand side of the resulting page.

The headings by hostname are the ones that we're interested in here (the others are defaults provided by the Graphite and StatsD* container system we've started). In the screenshot below I've expanded some of the items so you can get a feel for where the metrics come out in Graphite.

Figure 7. Metrics in the Graphite homepage.

From here it would be quite simple to construct Graphite-based graphs that offer interesting and useful data about your servers. For example, Figure 8 shows a graph that we constructed that looks at the Kubernetes node, and combines information about system load, Docker containers, and Kubernetes pods over a 30-minute period.

You can see from here when a new pod was launched and then removed. The purple line that shoots to 1.0 and drops again is for one of the containers in the pod; it didn't exist before spawning, and ceased to exist afterwards.

The green line is one-minute load average on the worker node, and the red line is Docker memory utilization on the same node.

Figure 8. A simple Graphite chart.

This is a simple example, just to give an idea of what can be generated; the point here is that it took very little time to construct a graph with viable information. A little experimentation with your specific environment and workloads will almost certainly reveal more useful data to collect, and more uses for it, in a very short period of time!

Making a Nice Dashboard

From here, it's a relatively simple matter to make some nice boss-friendly charts out of our pile of Graphite data. First, we'll load up another couple of quickie containers on the control host, to store our Grafana info, and to run the tool itself:

sudo docker run -d -v /var/lib/grafana --name grafana-storage busybox:latest
sudo docker run -d -p 3000:3000 --name=grafana --volumes-from grafana-storage grafana/grafana

Now you've got a Grafana installation running on port 3000 of the node these commands were run on. We'll bring it up in a browser and log in with the default credentials of admin for both Username and Password. You will arrive at the Home dashboard, and the next task on the list is to add a data source. Naturally, we'll add our Graphite installation:

Figure 9. Adding a data source in Grafana.
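If you'd rather script this step, Grafana also exposes an HTTP API for data sources. A sketch, assuming the containers and default admin credentials used above, and that the Graphite web interface from earlier is listening on port 80 of cspnuc02:

curl -s -u admin:admin -H "Content-Type: application/json" \
  -X POST http://cspnuc02:3000/api/datasources \
  -d '{"name":"graphite","type":"graphite","url":"http://cspnuc02:80","access":"proxy","isDefault":true}'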

Once that's set up, we can return to the Home dashboard using the pull-down menu from the Grafana logo in the upper-left corner. Select Dashboards and Home. From there, click New Dashboard. You'll get a default graph that's empty on that page. Click the Panel Title, then Edit, and you can add metrics, change the title, and so on.

Adding metrics works the same here as it did on the Graphite screen, more or less. A little time spent exploring can give you something reasonably nice, as shown below. In an afternoon you could generate a very cool dashboard for your data sources!

Here's a quick dashboard we did with our one-minute load average chart and a chart of memory utilization of all Kubernetes containers in the cluster. The containers come into being as the Pod is deployed, which is why there is no data for them on the first part of the graph. We can also see that the initial deployment of the Pod was quickly halted and re-spawned between 17:35 and 17:40.

Figure 10. A simple dashboard in Grafana.

This may or may not be useful information for you; the point is that generating useful information for yourself is very simple and quick.

Once Again, But Faster!

So by now we've explored a lot of Snap's potential, but one area we haven't covered much is its extensibility. The plugin framework and open source tooling allow it to be extended quite easily by anyone interested.

For the example we used here, a Kubernetes setup, it turns out that there is a nice extension designed to plug directly into Snap with a full set of metrics and dashboards already available. It's called the Grafana Kubernetes app, and it runs all the components directly in your Kubernetes cluster instead of outside of it, the way we've done in this article.

You can find it on the Grafana Kubernetes app page.

Besides prepackaged plugin bundles like this one, other areas of extension are possible for Snap as well. For example, more advanced schedulers than the basic three (simple, windowed, cron) can be slotted in with relative ease. And of course, new collector and publisher plugins are always welcome!

Summary

In this article we've introduced you to Snap, an extensible and easy-to-use framework for collecting data center telemetry. We've talked about the kinds of system, application, and cluster-level data that Snap can collect (including Intel® architecture-specific counters like CPU states and Intel PCM). We've demonstrated some common ways to put plugins together to produce usable data for data center operators and, furthermore, create good-looking graphs with Grafana. We've shown you how to install, configure, add plugins, schedule and create tasks, and check the health of the running tasks.

We've also had a short discussion on how to take it to the next level and deploy Snap for real with signed plugins and persistent state. Finally, we've shown that Snap is easily extended to deliver the telemetry you need, to where you need it.

We hope you take an afternoon to try it out and see what it can do for you!

About the Author

Jim Chamings is a Sr. Software Engineer at Intel Corporation, who focuses on enabling cloud technology for Intel’s Developer Relations Division. He’d be happy to hear from you about this article at: jim.chamings@intel.com.

