The following text has been translated from Korean to English using AssistAce.
Sungjin (James) Kim, Ph.D. | LinkedIn
The technology of Large Language Models (LLMs) is advancing rapidly. There are various ways to utilize LLMs, including prompting, embedding, and fine-tuning. In this article, we will focus on fine-tuning, which requires a significant amount of GPU computing resources.
While it is possible to secure high-performance computing resources on local devices, using a cloud environment reduces operational complexity. Accordingly, many specialized cloud services have emerged to make AI training and inference more convenient. Vessl is one such service, designed with generative AI as well as LLMs in mind[1]. In this article, we will explore a case study of fine-tuning the M2M100 translation model using the Vessl hub[2,3].
M2M100 is a transformer-based translation model developed by Meta. It consists of an encoder and a decoder and supports 100 languages, enabling direct many-to-many translation between any pair of them without pivoting through English.
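Before fine-tuning, it helps to see how M2M100 is typically driven from Python. The sketch below assumes the Hugging Face transformers library; the checkpoint-selection helper is illustrative, not part of the article's code.

```python
def checkpoint_for(size: str) -> str:
    """Map a model size to its Hugging Face hub id (illustrative helper)."""
    hub_ids = {"418M": "facebook/m2m100_418M", "1.2B": "facebook/m2m100_1.2B"}
    return hub_ids[size]


def translate(text: str, src: str, tgt: str, size: str = "418M") -> str:
    """Translate text with M2M100; downloads the checkpoint on first use."""
    # Deferred import so the helper above stays usable without transformers.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    name = checkpoint_for(size)
    tokenizer = M2M100Tokenizer.from_pretrained(name)
    model = M2M100ForConditionalGeneration.from_pretrained(name)

    # M2M100 takes the source language on the tokenizer ...
    tokenizer.src_lang = src
    encoded = tokenizer(text, return_tensors="pt")
    # ... and the target language as a forced BOS token at generation time.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


# Example usage (requires downloading the model):
# translate("안녕하세요", src="ko", tgt="en")
```

Fine-tuning starts from the same model and tokenizer objects, replacing generation with a training loop over parallel sentence pairs.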
Instructions for using Vessl are explained on its website[1]. You can access Vessl through the Vessl Hub and operate it with the Vessl command-line tool. In this guide, we will focus on how to fine-tune translation models using Vessl, rather than on its basic usage.
Vessl Hub provides support for Jupyter as the default coding environment. Jupyter is a popular coding environment among AI developers. To create an AI environment with Jupyter, you can use the following commands:
$ poetry shell
$ cd vessl_use
$ vessl run create -f jupyter-notebook.yaml
Since Vessl is installed as a Python package, we used Poetry to set up an environment in which the vessl command is available. Because Poetry keeps user code in a subfolder, we changed into the vessl_use folder. The Vessl CLI is invoked with the run command; its create subcommand provisions an environment on Vessl's cloud service according to the given yaml file.
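For reference, the Poetry setup above assumes a pyproject.toml that declares vessl as a dependency. A minimal sketch follows; the version specifiers are illustrative, not taken from the article:

[tool.poetry]
name = "vessl_use"
version = "0.1.0"
description = "Fine-tuning experiments on Vessl"
authors = ["Sungjin Kim"]

[tool.poetry.dependencies]
python = "^3.10"
vessl = "*"

With this file in place, poetry install followed by poetry shell makes the vessl command available inside the environment.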
The following is a sample yaml file for Vessl that sets up an interactive Jupyter environment. It is provided as an example on the Vessl website.
name: gpu-interactive-run
description: Run an interactive GPU-backed Jupyter and SSH server.
tags:
  - interactive
  - jupyter
  - ssh
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small-spot
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
interactive:
  max_runtime: 8h
  jupyter:
    idle_timeout: 120m
In the resources section, we use a cluster in vessl-gcp-oregon, and the preset gpu-l4-small-spot specifies the machine type (a spot instance with an Nvidia L4 GPU). The image is one provided by Vessl for production use, based on Nvidia CUDA and PyTorch. For the interactive environment, the maximum runtime is set to 8 hours, and Jupyter's idle timeout is set to 120 minutes.
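Once the interactive environment is verified, a similar yaml can describe the fine-tuning job itself as a non-interactive run. The sketch below reuses the resource and image fields from the example above; the run section and the script name are assumptions rather than the article's actual configuration, so consult the Vessl documentation for the exact schema:

name: m2m100-finetune
description: Fine-tune M2M100 for translation.
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small-spot
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
run:
  - command: python finetune_m2m100.py  # hypothetical script name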