ainoya.dev

Optimizing Model Load Times on EC2 with Pre-loaded AMIs


Recently, I encountered a challenge while working with machine learning models on Amazon EC2. To speed up the initial loading of a pre-trained model, I decided to create an Amazon Machine Image (AMI) with the model already pre-loaded. However, I discovered that the first inference run after instance startup took over ten times longer than usual. Subsequent inferences ran at normal speed.

Inference Program Overview

Here’s a simplified version of the inference program I used:

model = Pipeline.from_pretrained(...)  # some options omitted
model = model.to(device)  # this line takes an unusually long time on the first run

Upon investigating the program logs, I found that the model.to(device) line, which transfers the model to the device, was the main bottleneck. Using dstat, I monitored system resource usage and noticed that disk I/O performance was significantly slower during the initial run.
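The cold-read effect is easy to observe with a small timing harness: time the first read of a file and compare it with a repeated read. This is an illustrative sketch, not the original diagnostic (the path is a placeholder), and on a freshly restored EBS volume the first call will be dramatically slower:

```python
import time

def timed_read(path: str) -> tuple[int, float]:
    """Read a file end to end, returning (bytes read, elapsed seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        n = len(f.read())
    return n, time.perf_counter() - start

if __name__ == "__main__":
    # "/opt/model/weights.bin" is a hypothetical path to a model file
    for attempt in range(2):
        try:
            n, dt = timed_read("/opt/model/weights.bin")
            print(f"read {n} bytes in {dt:.2f}s")  # first run is the cold read
        except FileNotFoundError:
            break
```

On a warmed volume both attempts should take roughly the same time; on a snapshot-backed volume the gap between the two runs is the S3 fetch cost.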

Identifying the Issue

I discovered that when an EBS volume is created from a snapshot (as happens when launching an instance from an AMI), its storage blocks are fetched from S3 lazily, the first time each block is accessed. This lazy loading makes the initial I/O operations slow, as described in the AWS documentation:

Empty EBS volumes receive their maximum performance the moment that they are created and do not require initialization (formerly known as pre-warming). For volumes, of any volume type, that were created from snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume before you can access them. This preliminary action takes time and can cause a significant increase in the latency of I/O operations the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.

Implementing the Solution

To address this, I wrote a script that reads the model files with the fio command at instance startup. fio is a flexible I/O tester with far more options than dd; in particular, it supports asynchronous I/O, which makes it much faster at reading large files sequentially.

Here’s the fio command I used:

fio --name="read_test" --filename="$file" --rw=read --bs=1M --ioengine=libaio --iodepth=32 --direct=1
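If fio is not available, the same warm-up can be approximated in plain Python by sequentially reading every model file in large chunks, which forces each EBS block to be pulled down from S3. This is a minimal sketch under the assumption that the model lives under a single directory (the `/opt/model` path is a placeholder), not the exact script I used:

```python
import os

CHUNK = 1024 * 1024  # read in 1 MiB blocks, mirroring fio's --bs=1M

def warm_file(path: str) -> int:
    """Read a file end to end so its EBS blocks are fetched from S3."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

def warm_directory(root: str) -> int:
    """Warm every file under root; returns total bytes read."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            total += warm_file(os.path.join(dirpath, name))
    return total

if __name__ == "__main__":
    # "/opt/model" is a placeholder for wherever the pre-loaded model lives
    print(f"warmed {warm_directory('/opt/model')} bytes")
```

Note that this single-threaded version lacks fio's asynchronous, deep-queue reads (`--ioengine=libaio --iodepth=32`), so it will warm the volume more slowly; it is mainly useful on minimal images where installing fio is undesirable.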

Alternative Solutions

Another potential solution is to use EBS fast snapshot restore, which preloads snapshots and eliminates the initial load time overhead. However, this option incurs additional costs, so I opted not to use it in this case.
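For reference, fast snapshot restore is enabled per snapshot and per Availability Zone with the AWS CLI; the snapshot ID and zone below are placeholders, and the feature is billed per hour for each zone it is enabled in:

```sh
# Placeholder snapshot ID and AZ -- substitute your own values
aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a \
  --source-snapshot-ids snap-0123456789abcdef0
```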

Conclusion

  • Instances launched from AMIs with pre-loaded models experience slow initial model loads because EBS blocks restored from snapshots are fetched from S3 on first access. The effect grows with model file size.
  • Using fio to read model files at startup mitigates this problem by preloading data, significantly reducing initial load times.
  • Alternative solutions include using fast snapshot restore or other high-performance distributed storage solutions like Parallelstore or DAOS.
