Optimizing Model Load Times on EC2 with Pre-loaded AMIs
Recently, I ran into a problem while working with machine learning models on Amazon EC2. To speed up the initial loading of a pre-trained model, I created an Amazon Machine Image (AMI) with the model files already on disk. However, the first inference run after instance startup took more than ten times longer than usual, while subsequent inferences ran at normal speed.
Inference Program Overview
Here’s a simplified version of the inference program I used:
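import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # assumed device choice
model = Pipeline.from_pretrained(...)  # Pipeline's import and options omitted here
model = model.to(device)  # This line takes an unusually long time during the first run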
Upon investigating the program logs, I found that the model.to(device) line, which transfers the model to the device, was the main bottleneck. Using dstat, I monitored system resource usage and noticed that disk I/O was significantly slower during the first run than during later runs.
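You can confirm this kind of bottleneck in a rough way without dstat. Time the suspect call, then time a plain sequential read of one model file. The sketch below is hypothetical: `timed` and `read_throughput_mib_s` are not part of any library, and the file path in the usage note is a placeholder. A second read of the same file may be served from the Linux page cache, so a faster second read does not by itself prove that EBS initialization was the cause.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result


def read_throughput_mib_s(path: str, chunk_size: int = 1 << 20) -> float:
    """Read `path` sequentially in chunk_size pieces and return MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small or cached files.
    return total / (1 << 20) / max(elapsed, 1e-9)
```

For example, `model = timed("model.to", model.to, device)` wraps the slow line from the program above. Calling `read_throughput_mib_s("/path/to/model/file")` twice on a freshly launched instance compares a cold read with a warm one.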
Identifying the Issue
I then learned that EBS volumes created from snapshots, which include the root volume of an instance launched from an AMI, are initialized lazily: each storage block is fetched from Amazon S3 the first time it is accessed. This makes the initial I/O slow, as the AWS documentation explains:
Empty EBS volumes receive their maximum performance the moment that they are created and do not require initialization (formerly known as pre-warming). For volumes, of any volume type, that were created from snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume before you can access them. This preliminary action takes time and can cause a significant increase in the latency of I/O operations the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.
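The quoted behavior matches the pattern I saw. The first pass over the model files pays a fetch cost for every block, and later passes do not. The toy model below only illustrates that bookkeeping; it is not how EBS is implemented, and `LazyVolume` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class LazyVolume:
    """Toy model of a snapshot-restored volume: a block is fetched from
    backing storage the first time it is read, then served locally."""
    num_blocks: int
    initialized: set = field(default_factory=set)
    fetches: int = 0

    def read_block(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError(index)
        if index not in self.initialized:
            # First access: stands in for the slow pull from S3.
            self.fetches += 1
            self.initialized.add(index)

    def read_all(self) -> int:
        """Read every block once; return how many slow fetches this pass caused."""
        before = self.fetches
        for i in range(self.num_blocks):
            self.read_block(i)
        return self.fetches - before


vol = LazyVolume(num_blocks=1000)
print(vol.read_all())  # → 1000 (first pass fetches every block)
print(vol.read_all())  # → 0 (nothing left to fetch)
```

Reading every block once ahead of time, as the startup script below does, is equivalent to calling `read_all()` before the model is loaded.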
Implementing the Solution
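To address this, I wrote a script that reads the model files with the fio command at instance startup, so that their blocks are pulled from S3 before the model is loaded. fio is a flexible I/O tester with more options than dd, and it supports asynchronous I/O. This lets it keep many read requests in flight at once, so it reads files faster.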
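The speedup comes from queuing many reads at once (fio's `--iodepth`) instead of waiting for each read to finish before issuing the next. The sketch below is not fio. It is a hypothetical pure-Python stand-in that approximates the same idea with a thread pool issuing `os.pread` calls at different offsets. `warm_file`, the 1 MiB chunk size, and the 32 workers are assumptions chosen to mirror the fio flags. Unlike the fio command, it goes through the page cache rather than using direct I/O, and it is Unix-only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB, mirroring fio's --bs=1M


def warm_file(path: str, workers: int = 32) -> int:
    """Read every byte of `path` with up to `workers` concurrent preads.

    Returns the number of bytes read.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_chunk(offset: int) -> int:
            # os.pread releases the GIL, so these reads can overlap.
            return len(os.pread(fd, CHUNK, offset))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(read_chunk, range(0, size, CHUNK)))
    finally:
        os.close(fd)
```

With `workers=1` this degenerates into a plain sequential read, which is roughly what a simple `dd` pass does.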
Here’s the fio command I used:
fio --name="read_test" --filename="$file" --rw=read --bs=1M --ioengine=libaio --iodepth=32 --direct=1
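The command above reads the single file named in `$file`, so it has to run once per model file. A minimal startup script along these lines might look like the following. It is a sketch under stated assumptions: the model directory and the `fio_command`/`prewarm` helper names are hypothetical, fio must be installed, and the filesystem must support direct I/O. Because `--direct=1` bypasses the page cache, the point is only to pull every block from S3, not to cache the files in memory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

MODEL_DIR = Path("/opt/models")  # hypothetical location of the model files


def fio_command(file: Path) -> List[str]:
    """Build the fio invocation shown above for a single file."""
    # fio treats ':' in --filename as a separator between files, so escape it.
    escaped = str(file).replace(":", "\\:")
    return [
        "fio",
        "--name=read_test",
        f"--filename={escaped}",
        "--rw=read",
        "--bs=1M",
        "--ioengine=libaio",
        "--iodepth=32",
        "--direct=1",
    ]


def prewarm(model_dir: Path = MODEL_DIR) -> None:
    """Run fio sequentially over every regular file under model_dir."""
    if shutil.which("fio") is None:
        raise RuntimeError("fio is not installed or not on PATH")
    for file in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        # check=True stops at the first file fio fails to read.
        subprocess.run(fio_command(file), check=True, stdout=subprocess.DEVNULL)
```

Calling `prewarm()` from a boot-time service, such as a systemd unit, would run it at instance startup; that wiring is omitted here.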
Alternative Solutions
Another option is EBS fast snapshot restore (FSR), which creates volumes from a snapshot that are fully initialized at creation, eliminating the first-access latency. However, FSR incurs additional charges for each snapshot and Availability Zone where it is enabled, so I opted not to use it in this case.
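For reference, FSR is enabled per snapshot and per Availability Zone. For an AMI, that means the snapshot backing its root volume. Below is a hedged sketch using boto3's EC2 client, assuming you already know that snapshot ID. `fsr_request` and `enable_fsr` are hypothetical helpers, the IDs in the usage note are placeholders, and boto3 plus AWS credentials are needed only when `enable_fsr` is actually called.

```python
def fsr_request(snapshot_ids, availability_zones, dry_run=False):
    """Build keyword arguments for EC2 EnableFastSnapshotRestores."""
    if not snapshot_ids or not availability_zones:
        raise ValueError("need at least one snapshot ID and one Availability Zone")
    return {
        "SourceSnapshotIds": list(snapshot_ids),
        "AvailabilityZones": list(availability_zones),
        "DryRun": dry_run,
    }


def enable_fsr(snapshot_ids, availability_zones, dry_run=False):
    """Enable fast snapshot restore; it is billed for as long as it stays enabled."""
    import boto3  # imported lazily so fsr_request works without boto3 installed

    ec2 = boto3.client("ec2")
    return ec2.enable_fast_snapshot_restores(
        **fsr_request(snapshot_ids, availability_zones, dry_run)
    )
```

For example, `enable_fsr(["snap-0123456789abcdef0"], ["us-east-1a"])`. The matching `disable_fast_snapshot_restores` call turns it off again when it is no longer needed.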
Conclusion
- Instances launched from an AMI with a pre-loaded model experience slow initial model loads. Their EBS volumes are restored from snapshots, and each block is fetched from S3 on first access. The larger the model files, the more pronounced the delay.
- Reading the model files with fio at startup mitigates this problem by initializing those blocks in advance, significantly reducing the initial load time.
- Alternative solutions include fast snapshot restore or high-performance distributed storage such as Parallelstore or DAOS.