What is it and how can it be used to improve NFV performance?
Overview
Enhanced Platform Awareness (EPA) is a concept that relies on a set of OpenStack Nova* features called Host Aggregates and Availability Zones. The Nova scheduler uses these objects to determine which host a guest should be launched on, based on the capabilities of the host and the requested features of the virtual machine (VM). Most of these features appeared in the Grizzly release and have evolved in Kilo into a set of filters and weights used by the Nova scheduler to determine where a VM should be deployed.
The goal of this article is to examine the Enhanced Platform Awareness whitepaper, where the term was coined, and demonstrate how EPA can be used by the operator in OpenStack deployments and by developers to add more filters. Along the way we will try to unlock the mysteries behind Flavors, Host Aggregates, and Availability Zones and help to enable EPA in your lab.
OpenStack Architecture
Within the OpenStack architecture, most of the focus is on the Nova component; we also briefly look at how Glance might play a role in simplifying future configurations.
Flavors, Filter Attributes, and Extra Specification
It is useful to list the standard flavor attributes, the extra specifications the scheduler acts on, and any other value-add filter attributes available as criteria for the scheduler.
IMAGE_ID="c32ac737-1788-4420-b200-2a107d5ad335"
nova boot --flavor 2 --image $IMAGE_ID testinstance
In this example, flavor 2 represents a flavor template with the name m1.bigger, which contains the following:
- 1 vCPU
- 2 GB RAM
- 10 GB root disk
- 20 GB ephemeral disk
Managing Flavors
A flavor is a guest instance type or virtual hardware template. A flavor specifies a set of VM resources, such as the number of virtual CPUs, the amount of memory, and the disk space assigned to a VM instance. An example is a flavor that defines 16 virtual CPUs and 16384 MB of RAM.
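For example, a flavor matching the m1.bigger template used in the boot example above could be created with the nova flavor-create command; the ID of 2 and the sizes are illustrative:
nova flavor-create --ephemeral 20 m1.bigger 2 2048 10 1
The positional arguments are the flavor name, ID, RAM in MB, root disk in GB, and number of vCPUs; --ephemeral adds the ephemeral disk in GB.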
For more information:
IBM presented their extra specifications attributes for some of their technology.
http://www-01.ibm.com/support/knowledgecenter/SST55W_4.3.0/liaca/liacaflavorextraspecs.html
Extra Specifications
A flavor might include properties that are in addition to the base flavor properties. These extra specifications are key-value pairs that can be used to provide advanced configuration in addition to the configuration provided by the base flavor properties. This configuration is specific to the hypervisor.
Advanced configuration provided through flavor extra specifications might include the following examples. Key-value pairs appended to the extra specification field are consumed either by the scheduler filters or, as with the quota keys below, by the hypervisor driver.
Config IO limit for the specified instance type
nova-manage flavor set_key --name m1.small --key quota:disk_read_bytes_sec --value 10240000
nova-manage flavor set_key --name m1.small --key quota:disk_write_bytes_sec --value 10240000
Config CPU limit for the specified instance type
nova-manage flavor set_key --name m1.small --key quota:cpu_quota --value 5000
nova-manage flavor set_key --name m1.small --key quota:cpu_period --value 2500
Config Bandwidth limit for instance network traffic
nova-manage flavor set_key --name m1.small --key quota:vif_inbound_average --value 10240
nova-manage flavor set_key --name m1.small --key quota:vif_outbound_average --value 10240
For more information:
OpenStack Introduction to Image Flavors:
http://docs.openstack.org/openstack-ops/content/flavors.html
http://docs.oracle.com/cd/E36784_01/html/E54155/flavoredit.html
Extra Specification and Namespaces
extra_specs is an overloaded parameter that contains key-value pairs. If an extra_specs key contains a colon (:), anything before the colon is treated as a namespace, and anything after the colon is treated as the key to be matched.
Here is an example of a scoped key:
nova flavor-key 1 set capabilities:vcpus='>= 6'
nova flavor-key 1 set capabilities:vcpus_used='== 0'
nova flavor-show 1
+----------------------------+------------------------------------------------------------------------+
| Property                   | Value                                                                  |
+----------------------------+------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                  |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                      |
| disk                       | 0                                                                      |
| extra_specs                | {u'capabilities:vcpus': u'>= 6', u'capabilities:vcpus_used': u'== 0'}  |
| id                         | 1                                                                      |
| name                       | m1.tiny                                                                |
| os-flavor-access:is_public | True                                                                   |
| ram                        | 512                                                                    |
| rxtx_factor                | 1.0                                                                    |
| swap                       |                                                                        |
| vcpus                      | 1                                                                      |
+----------------------------+------------------------------------------------------------------------+
Scoping extra_specs keys in this way is useful when multiple filters are enabled and key conflicts must be avoided.
Filtering Strategies
The Filter Scheduler uses filters to select the host that is ultimately chosen to launch the target guest. There are a lot of filtering options, configuration flexibility, and even the ability to roll your own. In addition to the documentation, the best way to learn about the filters and the way they work is to study the source code. There are a few specific locations to dig into: the scheduler, the filters, and the unit tests.
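In a Kilo-era Nova source tree these live under the following paths (they may move between releases):
nova/scheduler/
nova/scheduler/filters/
nova/tests/unit/scheduler/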
From the many filters available, there are a few that are important for understanding how they work and the method used to configure them.
RamFilter
The OpenStack developer documentation highlights the simplicity of the filters with the ExactRamFilter, which requires the host to have exactly the amount of memory requested to launch the guest image.
class ExactRamFilter(filters.BaseHostFilter):
    """Exact RAM Filter."""

    def host_passes(self, host_state, filter_properties):
        """Return True if host has the exact amount of RAM available."""
        instance_type = filter_properties.get('instance_type')
        requested_ram = instance_type['memory_mb']
        if requested_ram != host_state.free_ram_mb:
            LOG.debug("%(host_state)s does not have exactly "
                      "%(requested_ram)s MB usable RAM, it has "
                      "%(usable_ram)s.",
                      {'host_state': host_state,
                       'requested_ram': requested_ram,
                       'usable_ram': host_state.free_ram_mb})
            return False
        return True
The class method host_passes returns True if the memory requested by the guest is exactly the amount available on the host being evaluated for selection. A slightly more complex and more useful version of this filter, the standard RamFilter, uses ram_allocation_ratio to allow virtual RAM to be overcommitted relative to physical RAM; the ratio defaults to 1.5.
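As a minimal sketch, this overcommit ratio is set in /etc/nova/nova.conf on the compute nodes; the value shown is the default:
ram_allocation_ratio=1.5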
JsonFilter
This filter allows operators to write rules matching host capabilities based on simple JSON-like syntax. The operators for comparing host state properties are "=", "<", ">", "in", "<=", and ">=", and they can be combined with "not", "or", and "and". Make sure JsonFilter is added to the scheduler_default_filters parameter in /etc/nova/nova.conf to enable this functionality.
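For example, an illustrative nova.conf line (keep whatever other filters your deployment already uses):
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,JsonFilter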
The example below, found in the unit tests, passes only hosts with at least 1024 MB of free RAM and at least 200 GB of free disk space.
['and',
['>=', '$free_ram_mb', 1024],
['>=', '$free_disk_mb', 200 * 1024]
]
Many filters use the scheduler_hints parameter of the nova boot command when launching the guest instance.
nova boot --image cirros-0.3.1-x86_64-uec --flavor 1 \
--hint query=['>=','$free_ram_mb',1024] test-instance
ComputeCapabilitiesFilter
The ComputeCapabilitiesFilter only passes hosts whose capabilities satisfy the requested specifications. All hosts are passed if no extra_specs are specified.
Recalling the earlier discussion of extra specifications and namespaces: to avoid conflicts when the AggregateInstanceExtraSpecsFilter is also enabled, use the capabilities namespace when adding extra specifications.
ImagePropertiesFilter
The ImagePropertiesFilter filters hosts based on the properties defined on the instance's image and passes only hosts that can support those properties. Properties include the architecture, hypervisor type, hypervisor version (for the Xen* hypervisor type only), and virtual machine mode.
For example, an instance might require a host that runs an ARM*-based processor and QEMU* as the hypervisor. You can decorate an image with these properties by using:
glance image-update img-uuid --property architecture=arm \
--property hypervisor_type=qemu
The image properties that the filter checks for are:
- Architecture. Describes the machine architecture required by the image. Examples are i686, x86_64, arm, and ppc64.
- hypervisor_type. Describes the hypervisor required by the image. Examples are xen, qemu, and xenapi.
- hypervisor_version_requires. Describes the hypervisor version required by the image. The property is supported for the Xen hypervisor type only. It can be used to enable support for multiple hypervisor versions and to prevent instances with newer Xen tools from being provisioned on an older version of a hypervisor. If available, the property value is compared to the hypervisor version of the compute host.
To filter the available hosts by the hypervisor version, add the hypervisor_version_requires
property on the image as metadata and pass an operator and a required hypervisor version as its value:
glance image-update img-uuid --property hypervisor_type=xen \
--property hypervisor_version_requires=">=4.3"
- vm_mode. Describes the hypervisor application binary interface (ABI) required by the image. Examples are xen for the Xen 3.0 paravirtual ABI, hvm for the native ABI, uml for the User Mode Linux paravirtual ABI, and exe for the container virt executable ABI.
A host can support multiple hypervisor configurations. For example, if a host advertises [u'i686', u'qemu', u'hvm'] and [u'x86_64', u'qemu', u'hvm'], and the image properties for a guest instance are [u'x86_64', u'qemu', u'hvm'], then the guest can be deployed on that host.
Image properties for a guest instance can also be a subset of what the host supports. For example, a guest whose image properties are [u'x86_64', u'hvm'] can be deployed on a host that supports [u'x86_64', u'qemu', u'hvm'].
Filter Weights
The Filter Scheduler uses weights during the evaluation and selection process to give more or less preferential treatment to a host.
The Filter Scheduler weighs hosts based on the configuration option scheduler_weight_classes, which defaults to nova.scheduler.weights.all_weighers; this selects the only weigher available, the RamWeigher. Hosts are then weighted and sorted, with the largest weight winning.
The Filter Scheduler builds a local list of acceptable hosts by repeated filtering and weighing. Each time the scheduler places an instance on a host, that host's resources are consumed and adjusted accordingly for the next selection. This becomes more important when a large number of instances is requested, because the weights are recomputed for each request. In the end, the Filter Scheduler sorts the selected hosts by their weight and provisions instances on them.
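As an illustrative sketch, the weighing behavior is tuned in /etc/nova/nova.conf; the values below are the defaults, and raising ram_weight_multiplier favors hosts with more free RAM more strongly:
scheduler_weight_classes=nova.scheduler.weights.all_weighers
ram_weight_multiplier=1.0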
Availability Zones and Host Aggregates
Host Aggregation
Host aggregates are a way to group hosts that share a particular feature or capability. The grouping criteria can range from something as simple as installed memory or CPU type to something as complex as NUMA topology. The grouping used for OpenStack host aggregates can be completely arbitrary.
To create a host aggregate, we use the nova aggregate-create command:
$ nova aggregate-create rack-aggregate1
+----+-----------------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 1 | rack-aggregate1 | None | | |
+----+-----------------+-------------------+-------+----------+
To create a host aggregate that is also exposed to the operator as an availability zone, pass an availability zone name as the second argument:
$ nova aggregate-create rack-aggregate2 tokyo-az
+----+-----------------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 2  | rack-aggregate2 | tokyo-az          |       |          |
+----+-----------------+-------------------+-------+----------+
This command creates an aggregate that defines the tokyo-az availability zone rather than placing it in the default availability zone. Aggregates can be used to further subdivide the cloud, and the availability zone can then be passed as an optional parameter when launching a guest with the nova boot command.
Next, add a host to the host aggregate rack-aggregate2. Because this host aggregate defines the availability zone tokyo-az, adding a host to this aggregate makes it a part of the tokyo-az availability zone.
$ nova aggregate-add-host 2 stack-compute1
Aggregate 2 has been successfully updated.
+----+-----------------+-------------------+---------------------+-------------------------------------+
| Id | Name            | Availability Zone | Hosts               | Metadata                            |
+----+-----------------+-------------------+---------------------+-------------------------------------+
| 2  | rack-aggregate2 | tokyo-az          | [u'stack-compute1'] | {u'availability_zone': u'tokyo-az'} |
+----+-----------------+-------------------+---------------------+-------------------------------------+
Both availability zones and host aggregates segregate groups of hosts, but an administrator would use host aggregates to group hosts that have unique hardware or special performance characteristics.
Host aggregates are not explicitly exposed to operators. Instead, administrators map flavors to host aggregates by setting metadata on a host aggregate and matching flavor extra specifications. The scheduler then matches guest launch requests for an instance of the given flavor to a host aggregate with the same key-value pair in its metadata. Compute nodes or hosts can be in more than one host aggregate.
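As a minimal sketch of this mapping (the ssd key and the m1.ssd flavor are illustrative), the administrator tags the aggregate and a flavor with the same key-value pair; the aggregate_instance_extra_specs scope keeps the key from colliding with the ComputeCapabilitiesFilter:
nova aggregate-set-metadata 1 ssd=true
nova flavor-key m1.ssd set aggregate_instance_extra_specs:ssd=true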
Command-Line Interface
The nova
command-line tool supports the following aggregate-related commands.
nova aggregate-list
Print a list of all aggregates.
nova aggregate-create <name> [availability-zone]
Create a new aggregate named <name>, optionally in the availability zone [availability-zone] if specified. The command returns the ID of the newly created aggregate. Hosts can be made available to multiple host aggregates, so be careful when adding a host to an additional host aggregate if the host is also in an availability zone. Pay attention when using the aggregate-set-metadata and aggregate-update commands to avoid confusing users when they boot instances in different availability zones. An error occurs if you try to add a host to an availability zone for which it is not intended.
nova aggregate-delete <id>
Delete an aggregate with id <id>
nova aggregate-details <id>
Show details of the aggregate with id <id>
nova aggregate-add-host <id> <host>
Add a host with name <host> to the aggregate with id <id>
nova aggregate-remove-host <id> <host>
Remove the host with name <host> from the aggregate with id <id>
nova aggregate-set-metadata <id> <key=value> [<key=value> ...]
Add or update metadata (key-value pairs) associated with the aggregate with id <id>
nova aggregate-update <id> <name> [<availability_zone>]
Update the name and availability zone (optional) for the aggregate.
nova host-list
List all hosts by service.
nova host-update --maintenance [enable | disable] <hostname>
Put the specified host into, or take it out of, maintenance mode.
Availability Zones
An availability zone is a way to specify a particular location in which a guest should boot.
The most common usage for availability zones is to group together available hosts that are connected to the network. As the number of hosts grows, availability zones may be defined by geographic location.
To specify the availability zone in which your guest will be launched, add the availability-zone parameter to the nova boot command:
nova boot --flavor 2 --image 1fe4b52c-bda5-11e2-a40b-f23c91aec05e \
--availability-zone tokyo-az testinstance
nova show testinstance
+-------------------------------------+----------------------------------------------------------------+
| Property                            | Value                                                          |
+-------------------------------------+----------------------------------------------------------------+
| status                              | BUILD                                                          |
| updated                             | 2013-05-21T19:46:06Z                                           |
| OS-EXT-STS:task_state               | spawning                                                       |
| OS-EXT-SRV-ATTR:host                | styx                                                           |
| key_name                            | None                                                           |
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) |
| private network                     | 10.0.0.2                                                       |
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       |
| OS-EXT-STS:vm_state                 | building                                                       |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              |
| OS-EXT-SRV-ATTR:hypervisor_hostname | styx                                                           |
| flavor                              | m1.bigger (2)                                                  |
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           |
| security_groups                     | [{u'name': u'default'}]                                        |
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               |
| name                                | testinstance                                                   |
| created                             | 2013-05-21T19:45:48Z                                           |
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               |
| OS-DCF:diskConfig                   | MANUAL                                                         |
| metadata                            | {}                                                             |
| accessIPv4                          |                                                                |
| accessIPv6                          |                                                                |
| progress                            | 0                                                              |
| OS-EXT-STS:power_state              | 0                                                              |
| OS-EXT-AZ:availability_zone         | tokyo-az                                                       |
| config_drive                        |                                                                |
+-------------------------------------+----------------------------------------------------------------+
This example specifies that an instance of the m1.bigger flavor will be launched in the tokyo-az availability zone, the Tokyo data center. The availability zone for a host is set in the nova.conf file using node_availability_zone. The following options can also be configured for availability zones in the /etc/nova/nova.conf file:
- default_availability_zone = nova: Default compute node availability zone.
- default_schedule_zone = None: Availability zone to use when the user does not specify one.
- internal_service_availability_zone = internal: The availability zone to associate with internal services.
Administrators are able to optionally expose a host aggregate as an availability zone. Availability zones differ from host aggregates in that they are explicitly exposed to the operator, and hosts can only be in a single availability zone. Administrators can use default_schedule_zone to configure a default availability zone where instances will be scheduled when the user fails to specify one.
The Scheduler and Filters
Overview
Defining Workflow Activities for Deploying a Guest
Host aggregates are a way for the scheduler to know where to place a guest based on host characteristics. In this example, we want to deploy a guest in a specific rack in the Tokyo data center.
Here is the workflow for using host aggregates:
1. Check whether the scheduler enables host aggregates.
$ cat /etc/nova/nova.conf | grep scheduler_default_filters
scheduler_default_filters=AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter
For this particular host configuration the scheduler will examine the following filters:
- Do hosts belong to a host aggregate whose metadata matches the flavor's extra specs? (AggregateInstanceExtraSpecsFilter)
- Are hosts in the requested availability zone? (AvailabilityZoneFilter)
- Do hosts have sufficient RAM available? (RamFilter)
- Are hosts operational, enabled, and able to service the request? (ComputeFilter)
- Do hosts satisfy the extra specs associated with the instance type? (ComputeCapabilitiesFilter)
- Do hosts satisfy any architecture, hypervisor type, or VM mode properties specified on the instance's image? (ImagePropertiesFilter)
Additional filters can be found in the Scheduler section of the OpenStack Configuration Reference.
2. Create the host aggregate:
nova aggregate-create rack-aggregate1 tokyo-az
This command creates a new aggregate in the tokyo-az availability zone and assigns it an ID:
+----+----------------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+----------------+-------------------+-------+----------+
| 1 | rack-aggregate1| tokyo-az | | |
+----+----------------+-------------------+-------+----------+
3. Add the host aggregate characteristics as metadata, using the rack-aggregate1 ID:
nova aggregate-set-metadata 1 fastnic=true
4. Add hosts to the rack-aggregate1 aggregate so the scheduler can launch guests on them.
nova aggregate-add-host 1 styx
nova aggregate-add-host 1 kerberos
+----+-----------------+-------------------+------------------------+-----------------------+
| Id | Name            | Availability Zone | Hosts                  | Metadata              |
+----+-----------------+-------------------+------------------------+-----------------------+
| 1  | rack-aggregate1 | tokyo-az          | [u'styx', u'kerberos'] | {u'fastnic': u'true'} |
+----+-----------------+-------------------+------------------------+-----------------------+
5. Create a flavor m1.bigger and apply the fastnic property that matches the aggregate metadata:
nova flavor-create m1.bigger 6 16384 80 4
nova-manage instance_type set_key --name=m1.bigger --key=fastnic --value=true
This creates the new flavor and specifies the extra_specs property, as you can see with the flavor-show command:
nova flavor-show m1.bigger
+----------------------------+----------------------------+
| Property | Value |
+----------------------------+----------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 80 |
| extra_specs | {u'fastnic': u'true'} |
| id                         | 6                          |
| name | m1.bigger |
| os-flavor-access:is_public | True |
| ram | 16384 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 4 |
+----------------------------+----------------------------+
6. Operators can then use the flavor to ensure their guests are launched on hosts in rack-aggregate1:
$ nova boot --image f69a1e3e-bdb1-11e2-a40b-f23c91aec05e --flavor m1.bigger testinstance
Now that the tokyo-az availability zone has been defined and contains hosts, a user can boot an instance and request this availability zone.
$ nova boot --flavor m1.bigger --image 64d985ba-2cfa-434d-b789-06eac141c260 \
  --availability-zone tokyo-az testinstance
$ nova show testinstance
+-------------------------------------+----------------------------------------------------------------+
| Property                            | Value                                                          |
+-------------------------------------+----------------------------------------------------------------+
| status                              | BUILD                                                          |
| updated                             | 2015-05-21T11:36:02Z                                           |
| OS-EXT-STS:task_state               | spawning                                                       |
| OS-EXT-SRV-ATTR:host                | styx                                                           |
| key_name                            | None                                                           |
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) |
| private network                     | 10.0.0.2                                                       |
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       |
| OS-EXT-STS:vm_state                 | building                                                       |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              |
| OS-EXT-SRV-ATTR:hypervisor_hostname | styx                                                           |
| flavor                              | m1.bigger (6)                                                  |
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           |
| security_groups                     | [{u'name': u'default'}]                                        |
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               |
| name                                | testinstance                                                   |
| created                             | 2015-05-21T11:36:02Z                                           |
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               |
| OS-DCF:diskConfig                   | MANUAL                                                         |
| metadata                            | {}                                                             |
| accessIPv4                          |                                                                |
| accessIPv6                          |                                                                |
| progress                            | 0                                                              |
| OS-EXT-STS:power_state              | 0                                                              |
| OS-EXT-AZ:availability_zone         | tokyo-az                                                       |
| config_drive                        |                                                                |
+-------------------------------------+----------------------------------------------------------------+
The above examples show how host aggregates provide an API-driven mechanism for cloud administrators to define availability zones. The other use case host aggregates serve is tagging a group of hosts with a type of capability. When creating custom flavors, you can set a requirement for a capability. When a request is made to boot an instance of that type, the scheduler will only consider hosts in host aggregates tagged with this capability in their metadata.
We can add some metadata to the original host aggregate we created that was not also an availability zone, rack-aggregate1.
$ nova aggregate-set-metadata 1 fastnic=true
Aggregate 1 has been successfully updated.
+----+-----------------+-------------------+-------+-----------------------+
| Id | Name            | Availability Zone | Hosts | Metadata              |
+----+-----------------+-------------------+-------+-----------------------+
| 1  | rack-aggregate1 | None              | []    | {u'fastnic': u'true'} |
+----+-----------------+-------------------+-------+-----------------------+
The scheduler in this case knows the following:
- flavor m1.bigger requires fastnic to be true
- all hosts in rack-aggregate1 have fastnic=true
- kerberos and styx are hosts in rack-aggregate1
The scheduler starts the new guest on whichever of the two hosts has more resources available.
There are some other considerations for when to use host aggregates versus availability zones. An operator of OpenStack can only use availability zones, while host aggregates can only be set up by an administrator, most likely ahead of time. Here are some guidelines for when to use each construct.
- If there is a physical separation between hosts, use availability zones.
- If there is a hardware capabilities separation between hosts, use host aggregates.
- If hosts within a particular grouping are spread across multiple locations, use host aggregates to group together hosts from multiple availability zones by creating a host aggregate with the desired metadata in each zone.
- If operators want to group guests, use availability zones because they can be specified without administrative assistance.
Availability zones enable operators to choose from a group of hosts. Host aggregates enable an administrator to specify the way host hardware is utilized.
The Nova scheduler is responsible for determining on which host or compute node to launch a guest instance, based on a series of configurable filters and weights. In the next release, Liberty, work is underway to decouple the scheduler from Nova and to create an object for image metadata. The current scheduler framework plays a significant role in resource utilization.
Also new is the caching scheduler, which uses the existing facilities for applying scheduler filters and weights but caches the list of available hosts. When a user request is passed to the caching scheduler it attempts to perform scheduling based on the list of cached hosts, with a view to improving scheduler performance.
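As a sketch, the caching scheduler is selected with a single option in nova.conf on the scheduler node:
scheduler_driver=nova.scheduler.caching_scheduler.CachingScheduler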
A new scheduler filter, AggregateImagePropertiesIsolation, has been introduced. The new filter schedules instances to hosts by matching namespace-scoped image properties with host aggregate properties. Hosts that do not belong to any host aggregate remain valid scheduling targets for instances based on all images. The Nova service configuration keys aggregate_image_properties_isolation_namespace and aggregate_image_properties_isolation_separator determine which image properties are examined by the filter.
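As an illustrative sketch (the epa namespace and the fastnic key are example names, not requirements), the filter is enabled alongside the existing filters and scoped so that only aggregate metadata keys under the chosen namespace are compared against image properties of the same name:
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,AggregateImagePropertiesIsolation
aggregate_image_properties_isolation_namespace=epa
aggregate_image_properties_isolation_separator=.
nova aggregate-set-metadata 1 epa.fastnic=true
glance image-update img-uuid --property epa.fastnic=true
With this configuration, a host belonging to an aggregate whose epa.fastnic value differs from the image property is filtered out.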
Setting Up Filtering Use Cases Specific for the Intel® Architecture Platform
Trusted Computing Group
The OpenStack Folsom release introduced trusted compute pools, in which an attestation server is used when launching a guest instance to determine whether the guest is authorized to run on a target host.
http://wiki.openstack.org/TrustedComputingPools
https://github.com/openstack/nova/blob/master/nova/scheduler/filters/trusted_filter.py
1. Set the following values in nova.conf
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,ComputeFilter,TrustedFilter
2. Add the trusted computing section to nova.conf
[trusted_computing]
server=10.10.10.10
port=8181
server_ca_file=/etc/nova/ssl.10.1.71.206.crt
api_url=/AttestationService/resources/PoolofHosts
auth_blob=i-am-openstack
3. Add the "trusted" requirement to an existing flavor by running
nova-manage instance_type set_key --name m1.tiny --key trust:trusted_host --value trusted
4. Restart the nova-compute and nova-scheduler services.
PCI Passthrough and SR-IOV
Please make sure your compute node has PCI passthrough support enabled, per http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
Configure Nova
- Compute node:
pci_passthrough_whitelist: Whitelist of PCI devices available to VMs.
For example:
pci_passthrough_whitelist=[{ "vendor_id":"8086","product_id":"1520"}]
This specifies that all PCI devices in the platform with vendor_id 0x8086 and product_id 0x1520 will be assignable to instances.
- Controller node:
pci_alias: An alias for a PCI passthrough device requirement.
For example:
pci_alias={"vendor_id":"8086", "product_id":"1520", "name":"a1"}
This defines the PCI alias 'a1' to represent a request for PCI devices with vendor_id 0x8086 and product_id 0x1520.
- Scheduler node:
Enable the PCI devices filter. For example:
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter
Create a flavor
Note: This step is not needed for SR-IOV NIC support; see https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking for passing a PCI device request through port creation.
Configure a flavor that requests PCI devices. For example:
nova flavor-key m1.large set "pci_passthrough:alias"="a1:2"
This updates the flavor to require two PCI devices, each with vendor_id 0x8086 and product_id 0x1520.
Create a VM
Note: See https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking for passing a PCI device request as a port-id.
nova boot --image new1 --key_name test --flavor m1.large 123
This creates a VM named 123 with the PCI requirements. The image should contain the drivers for the assigned devices, and "test" is the key pair.
Check the assignment to the instance
nova show 123
Check the VM status until it becomes active, then log in to the guest:
nova ssh --private 123 -i test.pem
Inside the guest, 'lspci' will show all of the assigned devices.
How to check PCI status with PCI API patches
The PCI API patches extend the servers and os-hypervisors API extensions to show PCI information for instances and compute nodes, and also provide a resource endpoint to show PCI information.
- Get the patches from https://github.com/yjiang5/pci_api.git, then apply the patch or copy the extension plug-in files. Update the policy file with the two new policies and restart the nova-api service.
"compute_extension:instance_pci": "", "compute_extension:pci": "rule:admin_api",
- Try the PCI API.
nova pci-list node_id
Shows all PCI devices on the compute node with the given node_id. (Use nova hypervisor-list to get all compute nodes in the system.)
nova list
Shows the PCI assignment for each instance; the 'os-pci:pci' attribute contains the IDs of the assigned PCI devices.
nova pci-show id
Shows the details of a PCI device.
PCI passthrough use notes
- alias "device_type"
Defining device_type in an alias is optional; currently there is no way to discover the type of a PCI device from the hypervisor, so do not define device_type in an alias for now, that is, avoid an alias such as:
pci_alias={"vendor_id":"8086", "product_id":"1520", "name":"a1", "device_type":"NIC"}
If an alias with device_type is defined in nova.conf, device_type becomes part of the PCI request specification, and the scheduler will fail to find a compute node that meets the request. This behavior might be improved with an enhanced scheduler that can be configured to ignore the device type.
For more information:
https://wiki.openstack.org/wiki/Pci_passthrough
https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking
Sample Code for Launching a Guest
This sample code demonstrates how to launch a guest using the Python nova client bindings; the flavors, availability zones, and scheduler hints discussed above can be passed to the same servers.create() call, while host aggregates are configured by the administrator rather than at boot time.
credentials.py
#!/usr/bin/env python
import os
def get_keystone_creds():
d = {}
d['username'] = os.environ['OS_USERNAME']
d['password'] = os.environ['OS_PASSWORD']
d['auth_url'] = os.environ['OS_AUTH_URL']
d['tenant_name'] = os.environ['OS_TENANT_NAME']
return d
def get_nova_creds():
d = {}
d['username'] = os.environ['OS_USERNAME']
d['api_key'] = os.environ['OS_PASSWORD']
d['auth_url'] = os.environ['OS_AUTH_URL']
d['project_id'] = os.environ['OS_TENANT_NAME']
return d
launch_guest.py
import os
import time
import novaclient.v1_1.client as nvclient
from credentials import get_nova_creds
creds = get_nova_creds()
nova = nvclient.Client(**creds)
if not nova.keypairs.findall(name="mykey"):
with open(os.path.expanduser('~/.ssh/id_rsa.pub')) as fpubkey:
nova.keypairs.create(name="mykey", public_key=fpubkey.read())
image = nova.images.find(name="cirros")
flavor = nova.flavors.find(name="m1.tiny")
instance = nova.servers.create(name="test", image=image, flavor=flavor, key_name="mykey")
# Poll at 5 second intervals, until the status is no longer 'BUILD'
status = instance.status
while status == 'BUILD':
time.sleep(5)
# Retrieve the instance again so the status field updates
instance = nova.servers.get(instance.id)
status = instance.status
print "status: %s" % status
Future Enhancements
Encoding OVF Images with Desired Platform Features
Kilo introduced the ability for Glance to import guest images that use the OVF format, although the metadata associated with the image is not yet used. The promise is that the metadata imported by Glance could be used by the ImagePropertiesFilter to narrow the available hosts to the subset of platforms on which the guest image can be launched.
The OVF-based approach has the potential to provide the greatest portability, and the whitepapers indicate that this is the vision, but the feature is only in the planning stage for Liberty.
Here is a bit of background on the guest image format and some useful tools for manipulating the image; it is still difficult to determine how an operator would edit the image metadata.
Open Virtualization Format (OVF)
The OVF Specification offers the ability to describe the properties of a virtual system. The XML-based format makes generous allowances for extensibility, with trade-offs that in certain cases affect portability. Most commonly, an OVF file is used to describe a single VM or virtual appliance. The OVF file can contain information about the format of a virtual disk image file as well as a description of the virtual hardware that should be emulated to run the OS or application contained on such a disk image.
Open Virtual Appliance (OVA)
An OVA is an OVF file packaged together with all of its supporting files, similar to a .zip or tar archive. The OVF specification also describes the OVA package, and people commonly use OVA and OVF interchangeably without realizing that an OVA file contains an OVF file and all of its assets.
An OVA file consists of a descriptor (.ovf) file, a storage (.vmdk) file, and a manifest (.mf) file.
- ovf file. The descriptor, an XML file with the .ovf extension. It contains all the metadata about the package, encoding the product details, virtual hardware requirements, and licensing.
- vmdk file. File format that encodes a single virtual disk from a VM.
- mf file. Optional file that stores the SHA key generated during packaging.
Conversion Tools
The following is a list of tools to help convert image formats to OVF and manage common DCIM attributes:
- qemu-img. Part of the QEMU* project, qemu-img is the most versatile image format conversion tool.
- Ovftool. Distributed with some VMware products (in VMware Fusion*, it's buried inside the .app bundle), ovftool can convert between some VMware VM formats and OVF. Here is a link to a Linux binary: http://www.gibni.com/install-ovftool-debian-wheezy
- VBoxManage. VirtualBox’s command-line interface, VBoxManage has some options for exporting and converting images. Also the VirtualBox UI in general allows the operator to modify the most common DCIM attributes.
The Open Virtual Appliance (OVA) package simplifies the process of deploying a guest image by providing a complete definition of the parameters and resource allocation requirements for the guest.
Nova Scheduler
The blueprints for Liberty indicate that the scheduler may be broken out of Nova.
For more information:
CIM Schema: Version 2.44.0
http://dmtf.org/standards/cim/cim_schema_v2440
OVF Encoding of Metadata
Enhanced Platform Awareness – OVF Meta-Data Import in Glance:
https://blueprints.launchpad.net/glance/+spec/epa-ovf-meta-data-import
Ability to export/import metadata in OVF:
https://blueprints.launchpad.net/glance/+spec/artifact-repository-api
Define an API to manage Artifacts:
https://blueprints.launchpad.net/glance/+spec/artifact-repository-api
Introspection of images in Kilo:
https://blueprints.launchpad.net/glance/+spec/introspection-of-images
Base Glance Metadata Definitions Admin UI:
https://blueprints.launchpad.net/horizon/+spec/glance-metadata-definitions-base-admin-ui
Troubleshooting Tips
Log Analysis
The log level of the OpenStack scheduler can be raised to troubleshoot Nova's VM-to-host selections. In devstack, the Nova scheduler log is located at /opt/stack/logs/screen/n-sch.log
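As a sketch, the scheduler's filtering decisions become visible after raising the log level in /etc/nova/nova.conf and restarting the scheduler service; the log can then be followed while a guest is booted:
debug=True
tail -f /opt/stack/logs/screen/n-sch.log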
Developer Best Practices and Tips
VM Development Suggestions
When developing guest workloads, plan for detection of, and fallback from, platform features. For example, Intel® Virtualization Technology for Directed I/O might not be enabled or available, or the Intel® QuickAssist Technology library might fail to initialize because the supporting chipset is not present. Similar considerations apply to other platform-specific features.
For more information:
Whitepapers and Related Documentation
https://01.org/openstack/home/openstack-enhanced-platform-awareness-white-paper
Video Lectures
Divide and Conquer: Resource Segregation in the OpenStack Cloud:
https://www.youtube.com/watch?v=H6I3fauKDb0
OpenStack Enhancements to Support NFV Use Cases
https://www.youtube.com/watch?v=5hZmE8ZCLLo
Deliver Cloud-ready Applications to End-users Using the Glance Artifact Repository
https://www.youtube.com/watch?v=mbRrWFMBlLM
Active OpenStack Blueprints That Might Affect EPA
Release Target Removed – SR-IOV Scheduling with NIC Capabilities
https://blueprints.launchpad.net/nova/+spec/sriov-sched-with-nic-capabilities
Proposed Filters for Liberty - Aggregate Flavor extra_spec Affinity Filter
https://blueprints.launchpad.net/nova/+spec/aggregate-extra-specs-filter
List the Flavors and the Extra Specs Applied
nova flavor-list --extra-specs
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public | extra_specs |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| 1  | m1.tiny   | 512       | 1    | 0         |      | 1     | 1.0         | True      | {}          |
| 2  | m1.small  | 2048      | 20   | 0         |      | 1     | 1.0         | True      | {}          |
| 3  | m1.medium | 4096      | 40   | 0         |      | 2     | 1.0         | True      | {}          |
| 4  | m1.large  | 8192      | 80   | 0         |      | 4     | 1.0         | True      | {}          |
| 5  | m1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      | {}          |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
For more information:
https://github.com/openstack/nova-specs/blob/master/specs/kilo/implemented/io-ops-weight.rst
https://github.com/openstack/nova-specs/blob/master/specs/juno/implemented/pci-passthrough-sriov.rst
Filter attributes for NUMA topology, CPU pinning, and PCI passthrough/SR-IOV exist in Kilo. However, CPU pinning was only partially implemented by Red Hat in the Kilo cycle, so we are working to complete the implementation for the Liberty release.