Build and Push
Before you create the Deployment, you first need to build the inference server and push it to an OCI-compliant registry (e.g. Docker Hub or GitHub Container Registry).
Alternatively, you can use our pre-built templates directly, which simplify and expedite the deployment process.
Build
Currently we support four options for the inference server framework:
- Mosec: a high-performance and flexible model serving framework for building ML model-enabled backends and microservices.
- Streamlit: a framework for building ML model-enabled web apps.
- Gradio: a simple and flexible framework for building ML model-enabled web apps.
- Other: you can also deploy your models with a framework of your own choice.
Here we take Mosec as an example and show how to build a Docker image for Stable Diffusion.
Building an inference server on top of our inference framework Mosec is straightforward. You will need to provide three key components:
- A `main.py` file: this file contains the code for making predictions.
- A `requirements.txt` file: this file lists all the dependencies required for the server code to run.
- A `Dockerfile`, or a simpler `build.envd`: this file contains instructions for building a Docker image that encapsulates the server code and its dependencies.
Here is a template: modelz-template-stable-diffusion.
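Such a project needs little more than the three files described above at the top level. A sketch of the layout (actual templates may include extra files such as a README):

```text
.
├── main.py            # inference worker code
├── requirements.txt   # Python dependencies
└── build.envd         # or a Dockerfile: image build instructions
```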
main.py
In the `main.py` file, you need to define a class that inherits from `mosec.Worker` and implements the `forward` method. The `forward` method takes a list of inputs and returns a list of outputs with dynamic batching. You can find more details on the Mosec page.
```python
from io import BytesIO
from typing import List

import torch  # type: ignore
from diffusers import StableDiffusionPipeline  # type: ignore
from mosec import Server, Worker, get_logger
from mosec.mixin import MsgpackMixin

logger = get_logger()


class StableDiffusion(MsgpackMixin, Worker):
    def __init__(self):
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        )
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipe = self.pipe.to(device)
        self.example = ["useless example prompt"] * 4  # warmup (bs=4)

    def forward(self, data: List[str]) -> List[memoryview]:
        logger.debug("generate images for %s", data)
        res = self.pipe(data)
        logger.debug("NSFW: %s", res[1])
        images = []
        for img in res[0]:
            dummy_file = BytesIO()
            img.save(dummy_file, format="JPEG")
            images.append(dummy_file.getbuffer())
        return images


if __name__ == "__main__":
    server = Server()
    server.append_worker(StableDiffusion, num=1, max_batch_size=4)
    server.run()
```
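To sanity-check the worker before building the image, you can start the server with `python main.py` and send it a msgpack-encoded prompt. The snippet below is a minimal sketch: it assumes the server is listening on Mosec's default port 8000 and the default `/inference` route, and it uses the `requests` package, which is not part of the server's requirements.

```python
# client.py: a local smoke test for the server above (hypothetical helper script).
import msgpack   # the same codec the MsgpackMixin uses on the server side
import requests  # extra client-side dependency, install it separately

prompt = "a photo of an astronaut riding a horse on mars"
resp = requests.post(
    "http://localhost:8000/inference",  # assumes Mosec's default port and route
    data=msgpack.packb(prompt),
)
resp.raise_for_status()

# The worker returns JPEG bytes, serialized with msgpack by the MsgpackMixin.
with open("output.jpg", "wb") as f:
    f.write(msgpack.unpackb(resp.content))
```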
requirements.txt
In the `requirements.txt` file, you need to list all the dependencies required for the server code to run.
```text
msgpack
mosec
torch --extra-index-url https://download.pytorch.org/whl/cu116
diffusers[torch]
transformers
accelerate
```
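Before baking these dependencies into an image, you may want to confirm that they resolve and that the worker can load its model. A quick local check (a sketch, assuming a fresh Python environment; `--dry-run` is the same flag used in the Dockerfile below):

```bash
pip install -r requirements.txt
python main.py --dry-run   # initialize the worker and exit without serving
```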
Dockerfile
In the `Dockerfile`, you need to define the instructions for building a Docker image that encapsulates the server code and its dependencies.
In most cases, you can use the following template:
```dockerfile
ARG base=nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
FROM ${base}

ENV DEBIAN_FRONTEND=noninteractive LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
ENV PATH /opt/conda/bin:$PATH

ARG CONDA_VERSION=py310_22.11.1-1

RUN set -x && \
    UNAME_M="$(uname -m)" && \
    if [ "${UNAME_M}" = "x86_64" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh"; \
        SHA256SUM="00938c3534750a0e4069499baf8f4e6dc1c2e471c86a59caa0dd03f4a9269db6"; \
    elif [ "${UNAME_M}" = "s390x" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-s390x.sh"; \
        SHA256SUM="a150511e7fd19d07b770f278fb5dd2df4bc24a8f55f06d6274774f209a36c766"; \
    elif [ "${UNAME_M}" = "aarch64" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-aarch64.sh"; \
        SHA256SUM="48a96df9ff56f7421b6dd7f9f71d548023847ba918c3826059918c08326c2017"; \
    elif [ "${UNAME_M}" = "ppc64le" ]; then \
        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-ppc64le.sh"; \
        SHA256SUM="4c86c3383bb27b44f7059336c3a46c34922df42824577b93eadecefbf7423836"; \
    fi && \
    wget "${MINICONDA_URL}" -O miniconda.sh -q && \
    echo "${SHA256SUM} miniconda.sh" > shasum && \
    if [ "${CONDA_VERSION}" != "latest" ]; then sha256sum --check --status shasum; fi && \
    mkdir -p /opt && \
    bash miniconda.sh -b -p /opt/conda && \
    rm miniconda.sh shasum && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc && \
    find /opt/conda/ -follow -type f -name '*.a' -delete && \
    find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
    /opt/conda/bin/conda clean -afy

RUN conda create -n envd python=3.9

ENV ENVD_PREFIX=/opt/conda/envs/envd/bin

RUN update-alternatives --install /usr/bin/python python ${ENVD_PREFIX}/python 1 && \
    update-alternatives --install /usr/bin/python3 python3 ${ENVD_PREFIX}/python3 1 && \
    update-alternatives --install /usr/bin/pip pip ${ENVD_PREFIX}/pip 1 && \
    update-alternatives --install /usr/bin/pip3 pip3 ${ENVD_PREFIX}/pip3 1

COPY requirements.txt /
RUN pip install -r requirements.txt

RUN mkdir -p /workspace
COPY main.py /workspace/
WORKDIR /workspace

RUN python main.py --dry-run

ENTRYPOINT [ "python", "main.py" ]
```
build.envd
On the other hand, a `build.envd` is a simplified alternative to a Dockerfile. It provides Python-based interfaces that contain the configuration settings for building an image.
It is easier to use than a Dockerfile, as it only involves specifying the dependencies of your machine learning model, not the instructions for installing CUDA, Python, and other system-level dependencies.
```python
# syntax=v1


def basic():
    install.cuda(version="11.6.2")
    install.python()
    install.python_packages(requirements="requirements.txt")


def build():
    basic()
    io.copy("main.py", "/")
    run(["python main.py --dry-run"])
    config.entrypoint(["python", "main.py"])
```
Pushing the image to the registry
After building the image, you can push it to the registry.
```bash
docker push <your-image-name>:<your-image-tag>
# or
envd build --output type=image,name=<your-image-name>:<your-image-tag>,push=true
```
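If the registry requires authentication, log in before pushing. For example:

```bash
docker login             # Docker Hub
docker login ghcr.io     # GitHub Container Registry
```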
Then, you can deploy the inference server to ModelZ.