Creating custom OS images with Packer
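Many cloud providers in Dask Cloudprovider create VMs and install dependencies on them at boot time.
This slows down cluster creation and scaling, so this page covers building custom images with Packer to speed up cluster creation.
Packer is a utility that boots a VM on your chosen cloud, runs any installation steps, and then snapshots the VM for use as a template when creating new VMs later. This lets us run the installation steps once and reuse the result whenever we start Dask components.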
Packer Overview
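To create an image with Packer we need to write a JSON config file.
A Packer config file is broken into a couple of sections: builders and provisioners.
A builder configures the type of image you are building (AWS AMI, GCP VMI, etc.). It describes the base image you are building on top of and the connection information Packer needs to reach the build instance.
When you run packer build /path/to/config.json, a VM (or several, if you configure more than one builder) is created automatically from the builders section of your config.
Once the build VM is up and running, the provisioners are run. These are the steps that configure and provision your machine. The examples below mostly use the shell provisioner, which runs commands on the VM to set things up.
Once your provisioning scripts have completed, the VM stops automatically, a snapshot is taken, and you are given an ID that you can use as a template in future runs of dask-cloudprovider.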
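As a rough sketch of that two-part layout, the following Python builds a skeleton config with one builder and one shell provisioner and writes it out as JSON. The helper name make_packer_config and the placeholder values are assumptions made for illustration; the builder shown is deliberately incomplete and would need the extra options used in the AWS examples later on this page before packer build could use it.

```python
import json
import tempfile
from pathlib import Path


def make_packer_config(builder: dict, commands: list) -> dict:
    """Return a Packer JSON config with one builder and one shell provisioner."""
    return {
        "builders": [builder],
        "provisioners": [{"type": "shell", "inline": commands}],
    }


# Illustrative values only: a real amazon-ebs builder also needs a source AMI,
# an instance type, an SSH username and an AMI name.
config = make_packer_config(
    builder={"type": "amazon-ebs", "region": "eu-west-2"},
    commands=["echo 'provisioning steps go here'"],
)

# Write the config to a temporary directory so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(config, indent=2))
print(config_path.read_text())
```

Generating the file from Python is optional; writing the JSON by hand, as the examples below do, works just as well.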
Image Requirements
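Each cluster manager that uses VMs has specific requirements for the VM image.
For example, the AWS ECSCluster requires ECS-optimised AMIs.
The VM cluster managers, such as EC2Cluster and DropletCluster, only require Docker to be installed (or NVIDIA Docker for GPU VM types).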
Examples
EC2Cluster with cloud-init
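When any of the VMCluster-based cluster managers, such as EC2Cluster, launches a new VM with the default settings, it uses an Ubuntu base image and installs all dependencies with cloud-init.
Instead of doing this every time, we can use Packer to do it once and then reuse that image.
Each VMCluster cluster manager has a class method called get_cloud_init, which takes the same keyword arguments as the constructor but returns the cloud-init file that would be generated.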
from dask_cloudprovider.aws import EC2Cluster

cloud_init_config = EC2Cluster.get_cloud_init(
    # Pass any kwargs here you would normally pass to ``EC2Cluster``
)
print(cloud_init_config)
We should see output similar to this.
#cloud-config

packages:
  - apt-transport-https
  - ca-certificates
  - curl
  - gnupg-agent
  - software-properties-common

# Enable ipv4 forwarding, required on CIS hardened machines
write_files:
  - path: /etc/sysctl.d/enabled_ipv4_forwarding.conf
    content: |
      net.ipv4.conf.all.forwarding=1

# create the docker group
groups:
  - docker

# Add default auto created user to docker group
system_info:
  default_user:
    groups: [docker]

runcmd:
  # Install Docker
  - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
  - add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  - apt-get update -y
  - apt-get install -y docker-ce docker-ce-cli containerd.io
  - systemctl start docker
  - systemctl enable docker

  # Run container
  - docker run --net=host daskdev/dask:latest dask-scheduler --version
We should save this output somewhere for later reference. Let's call it /path/to/cloud-init-config.yaml.
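One way to save the output is sketched below. The helper save_cloud_init is hypothetical (it is not part of dask-cloudprovider), and the example text is a stand-in for the string that EC2Cluster.get_cloud_init() returns, so that the snippet runs without AWS credentials.

```python
import tempfile
from pathlib import Path


def save_cloud_init(cloud_init_config: str, path: Path) -> Path:
    """Write cloud-init text to ``path`` after a basic sanity check."""
    # cloud-config user data is expected to begin with the "#cloud-config" header.
    if not cloud_init_config.startswith("#cloud-config"):
        raise ValueError("expected text starting with '#cloud-config'")
    path.write_text(cloud_init_config)
    return path


# Stand-in text; in practice pass the string returned by get_cloud_init().
example_config = "#cloud-config\npackages:\n  - curl\n"

# A temporary directory is used here; substitute the path you want to keep.
target = Path(tempfile.mkdtemp()) / "cloud-init-config.yaml"
saved = save_cloud_init(example_config, target)
print(saved.read_text())
```

The header check is only a guard against saving the wrong string by accident; it does not validate the YAML itself.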
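Next we need a Packer config file to build our image; let's call it /path/to/config.json. We will use the official Ubuntu 20.04 image and pass our cloud-init config file via the user_data_file option.
Packer will not necessarily wait for the cloud-init config to finish executing before taking the snapshot, so we add a provisioner that blocks until cloud-init completes.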
{
    "builders": [
        {
            "type": "amazon-ebs",
            "region": "eu-west-2",
            "source_ami_filter": {
                "filters": {
                    "virtualization-type": "hvm",
                    "name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*",
                    "root-device-type": "ebs"
                },
                "owners": [
                    "099720109477"
                ],
                "most_recent": true
            },
            "instance_type": "t2.micro",
            "ssh_username": "ubuntu",
            "ami_name": "dask-cloudprovider {{timestamp}}",
            "user_data_file": "/path/to/cloud-init-config.yaml"
        }
    ],
    "provisioners": [
        {
            "type": "shell",
            "inline": [
                "echo 'Waiting for cloud-init'; while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 1; done; echo 'Done'"
            ]
        }
    ]
}
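The shell loop in the provisioner above waits indefinitely if cloud-init never writes its boot-finished marker. A bounded variant is sketched below as Python that prints the provisioner JSON. It assumes the GNU timeout command is available on the image (it is part of coreutils on Ubuntu) and that the shell provisioner stops at the first failing inline command, which is its default behaviour as far as we are aware. The function name and the 600 second default are illustrative choices.

```python
import json

# Marker file that cloud-init creates once it has finished, as used above.
MARKER = "/var/lib/cloud/instance/boot-finished"


def wait_for_cloud_init_provisioner(timeout_seconds: int = 600) -> dict:
    """Return a shell provisioner that waits for cloud-init, up to a time limit."""
    loop = f"while [ ! -f {MARKER} ]; do sleep 1; done"
    return {
        "type": "shell",
        "inline": [
            "echo 'Waiting for cloud-init'",
            # `timeout` exits non-zero if the loop is still running at the limit,
            # so a stuck cloud-init fails the build instead of hanging it.
            f"timeout {timeout_seconds} sh -c '{loop}'",
            "echo 'Done'",
        ],
    }


print(json.dumps(wait_for_cloud_init_provisioner(), indent=2))
```

The printed object can be pasted into the provisioners list in place of the unbounded loop.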
Then we can build our image with packer build /path/to/config.json.
$ packer build /path/to/config.json
amazon-ebs: output will be in this color.
==> amazon-ebs: Prevalidating any provided VPC information
==> amazon-ebs: Prevalidating AMI Name: dask-cloudprovider 1600875672
amazon-ebs: Found Image ID: ami-062c2b6de9e9c54d3
==> amazon-ebs: Creating temporary keypair: packer_5f6b6c99-46b5-6002-3126-8dcb1696f969
==> amazon-ebs: Creating temporary security group for this instance: packer_5f6b6c9a-bd7d-8bb3-58a8-d983f0e95a96
==> amazon-ebs: Authorizing access to port 22 from [0.0.0.0/0] in the temporary security groups...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Instance ID: i-0531483be973d60d8
==> amazon-ebs: Waiting for instance (i-0531483be973d60d8) to become ready...
==> amazon-ebs: Using ssh communicator to connect: 18.133.244.42
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with shell script: /var/folders/0l/fmwbqvqn1tq96xf20rlz6xmm0000gp/T/packer-shell512450076
amazon-ebs: Waiting for cloud-init
amazon-ebs: Done
==> amazon-ebs: Stopping the source instance...
amazon-ebs: Stopping instance
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI dask-cloudprovider 1600875672 from instance i-0531483be973d60d8
amazon-ebs: AMI: ami-064f8db7634d19647
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished after 4 minutes 5 seconds.
==> Wait completed after 4 minutes 5 seconds
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-2: ami-064f8db7634d19647
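The AMI ID we need is on the last line of that output. If you capture the output in a script, a small parser along these lines can pull it out. This is a sketch that assumes the human-readable format shown above; the function name parse_amis is our own, and the sample text is copied from the final lines of the build log.

```python
import re

# Matches artifact lines such as "eu-west-2: ami-064f8db7634d19647".
AMI_LINE = re.compile(r"^\s*([a-z0-9-]+):\s+(ami-[0-9a-f]+)\s*$")


def parse_amis(packer_output: str) -> dict:
    """Map region name to AMI ID using the artifact lines of Packer's output."""
    amis = {}
    collecting = False
    for line in packer_output.splitlines():
        # Only look at lines that follow the "AMIs were created:" marker.
        if "AMIs were created:" in line:
            collecting = True
            continue
        if collecting:
            match = AMI_LINE.match(line)
            if match:
                amis[match.group(1)] = match.group(2)
    return amis


sample_output = """\
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-2: ami-064f8db7634d19647
"""
print(parse_amis(sample_output))
```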
To use our new image, we create an EC2Cluster, specifying the AMI and disabling the automatic bootstrapping.
from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

cluster = EC2Cluster(
    ami="ami-064f8db7634d19647",  # AMI ID provided by Packer
    bootstrap=False
)
cluster.scale(2)

client = Client(cluster)
# Your cluster is ready to use
EC2Cluster with RAPIDS
To launch RAPIDS on AWS EC2 we can select a GPU instance type, choose the official Deep Learning AMI that Amazon provides, and run the official RAPIDS Docker image.
from dask_cloudprovider.aws import EC2Cluster

cluster = EC2Cluster(
    ami="ami-0c7c7d78f752f8f17",  # Deep Learning AMI (this ID varies by region so find yours in the AWS Console)
    docker_image="rapidsai/rapidsai:cuda10.1-runtime-ubuntu18.04-py3.9",
    instance_type="p3.2xlarge",
    bootstrap=False,  # Docker is already installed on the Deep Learning AMI
    filesystem_size=120,
)
cluster.scale(2)
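However, every time EC2Cluster creates a VM it has to pull the RAPIDS Docker image from Docker Hub. As a result the snippet above can take around 20 minutes to run, so let's create our own AMI that already has the RAPIDS image pulled.
In the builders section we specify that we want to build on top of the latest Deep Learning AMI by setting the name filter to "Deep Learning AMI (Ubuntu 18.04) Version *", which matches all versions, and "most_recent": true, which selects the newest one.
We also restrict the owners to 898082745236, which is the ID of the official image channel.
The official image already has the NVIDIA drivers and NVIDIA Docker runtime installed, so the only thing we need to do is pull the RAPIDS Docker image. That way the image is already on the machine when a scheduler or worker VM is created.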
{
    "builders": [
        {
            "type": "amazon-ebs",
            "region": "eu-west-2",
            "source_ami_filter": {
                "filters": {
                    "virtualization-type": "hvm",
                    "name": "Deep Learning AMI (Ubuntu 18.04) Version *",
                    "root-device-type": "ebs"
                },
                "owners": [
                    "898082745236"
                ],
                "most_recent": true
            },
            "instance_type": "p3.2xlarge",
            "ssh_username": "ubuntu",
            "ami_name": "dask-cloudprovider-rapids {{timestamp}}"
        }
    ],
    "provisioners": [
        {
            "type": "shell",
            "inline": [
                "docker pull rapidsai/rapidsai:cuda10.1-runtime-ubuntu18.04-py3.9"
            ]
        }
    ]
}
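To make the effect of the name wildcard and most_recent in source_ami_filter concrete, here is a small local imitation in Python. It does not call AWS: the image records, their IDs and their dates are invented for illustration, and pick_source_image is a hypothetical helper that approximates the selection rather than reproducing Packer's implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical image records standing in for what the AWS API would return.
images = [
    {"id": "ami-example-old", "name": "Deep Learning AMI (Ubuntu 18.04) Version 1.0",
     "created": "2020-01-01T00:00:00Z"},
    {"id": "ami-example-new", "name": "Deep Learning AMI (Ubuntu 18.04) Version 2.0",
     "created": "2020-06-01T00:00:00Z"},
    {"id": "ami-example-other", "name": "Deep Learning AMI (Amazon Linux 2) Version 2.0",
     "created": "2020-07-01T00:00:00Z"},
]


def pick_source_image(images, name_pattern, most_recent=True):
    """Pick one image whose name matches the pattern, preferring the newest."""
    matches = [img for img in images if fnmatchcase(img["name"], name_pattern)]
    if not matches:
        raise LookupError(f"no image matches {name_pattern!r}")
    if most_recent:
        # ISO 8601 timestamps in the same timezone sort correctly as strings.
        return max(matches, key=lambda img: img["created"])
    if len(matches) > 1:
        raise LookupError("several images match; narrow the filter or use most_recent")
    return matches[0]


chosen = pick_source_image(images, "Deep Learning AMI (Ubuntu 18.04) Version *")
print(chosen["id"])
```

Because the wildcard matches every version, a rebuild at a later date may start from a newer base image than the previous build did.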
Then we can build our image with packer build /path/to/config.json.
$ packer build /path/to/config.json
==> amazon-ebs: Prevalidating any provided VPC information
==> amazon-ebs: Prevalidating AMI Name: dask-cloudprovider-gpu 1600868638
amazon-ebs: Found Image ID: ami-0c7c7d78f752f8f17
==> amazon-ebs: Creating temporary keypair: packer_5f6b511e-d3a3-c607-559f-d466560cd23b
==> amazon-ebs: Creating temporary security group for this instance: packer_5f6b511f-8f62-cf98-ca54-5771f1423d2d
==> amazon-ebs: Authorizing access to port 22 from [0.0.0.0/0] in the temporary security groups...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Instance ID: i-077f54ed4ae6bcc66
==> amazon-ebs: Waiting for instance (i-077f54ed4ae6bcc66) to become ready...
==> amazon-ebs: Using ssh communicator to connect: 52.56.96.165
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with shell script: /var/folders/0l/fmwbqvqn1tq96xf20rlz6xmm0000gp/T/packer-shell376445833
amazon-ebs: Waiting for cloud-init
amazon-ebs: Bootstrap complete
==> amazon-ebs: Stopping the source instance...
amazon-ebs: Stopping instance
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI dask-cloudprovider-gpu 1600868638 from instance i-077f54ed4ae6bcc66
amazon-ebs: AMI: ami-04e5539cb82859e69
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished after 20 minutes 35 seconds.
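Building this image took over 20 minutes, but now that we have done it once we can reuse the image in our RAPIDS-powered Dask clusters.
We can then run our code snippet again, and this time it takes less than 5 minutes to get a running cluster.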
from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

cluster = EC2Cluster(
    ami="ami-04e5539cb82859e69",  # AMI ID provided by Packer
    docker_image="rapidsai/rapidsai:cuda10.1-runtime-ubuntu18.04-py3.9",
    instance_type="p3.2xlarge",
    bootstrap=False,
    filesystem_size=120,
)
cluster.scale(2)

client = Client(cluster)
# Your cluster is ready to use