Integrate with Hugging Face Transformers¶
Hugging Face Transformers provides general-purpose Machine Learning models for Natural Language Processing (NLP). Transformers gives you easy access to pre-trained model weights and interoperability between PyTorch and TensorFlow.
Instrument your Transformers code with Comet to manage experiments, create dataset versions, and track hyperparameters for faster and easier reproducibility and collaboration.
| Comet SDK | Minimum SDK version | Minimum transformers version |
|---|---|---|
| Python-SDK | 3.31.5 | 4.43.0 |
Start logging¶
Connect Comet to your existing Transformers Trainer code by configuring it through environment variables.
Add the following lines of code to your script or notebook:
import os
import comet_ml
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
# 1. Enable logging of model checkpoints
os.environ["COMET_LOG_ASSETS"] = "True"
# 2. Define your model
model = AutoModelForSequenceClassification.from_pretrained(
...
)
# 3. Train your model
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=test_dataset,
compute_metrics=compute_metrics,
)
trainer.train()
Log automatically¶
By integrating with the Transformers Trainer object, Comet automatically logs the following items, with no additional configuration:
- Metrics (such as loss and accuracy)
- Hyperparameters
- Assets (such as checkpoints and log files)
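If you need to record something that the automatic logging does not cover, you can retrieve the experiment created by the Trainer's Comet callback and log to it directly. The snippet below is a minimal sketch; the parameter name, value, and metadata are hypothetical placeholders.
import comet_ml
# Minimal sketch: fetch the experiment created by the Trainer's Comet callback
# and log extra items on top of the automatic logging.
experiment = comet_ml.get_global_experiment()
if experiment is not None:
    experiment.log_parameter("dataset_version", "v1")  # hypothetical extra hyperparameter
    experiment.log_other("notes", "baseline run")      # hypothetical free-form metadata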
End-to-end example¶
Get started with a basic example of using Comet with the Transformers Trainer.
You can check out the results of this example Transformers experiment for a preview of what's to come.
Install dependencies¶
python -m pip install "comet_ml>=3.44.0" "transformers>=4.43.0" datasets torch scikit-learn accelerate
Run the example¶
import os
import comet_ml
# Enable logging of model checkpoints
os.environ["COMET_LOG_ASSETS"] = "True"
comet_ml.login(project_name="comet-example-transformers-trainer")
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
AutoModelForSequenceClassification,
AutoTokenizer,
DataCollatorWithPadding,
Trainer,
TrainingArguments,
)
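# Load the IMDB dataset and a pre-trained DistilRoBERTa model and tokenizer
# for binary sentiment classification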
PRE_TRAINED_MODEL_NAME = "distilbert/distilroberta-base"
raw_datasets = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
PRE_TRAINED_MODEL_NAME, num_labels=2
)
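# Tokenization helper, plus a helper that maps a dataset index back to the raw
# review text (used to attach example text to the confusion matrix)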
def tokenize_function(examples):
return tokenizer(examples["text"], padding="max_length", truncation=True)
def get_example(index):
return eval_dataset[index]["text"]
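# Compute accuracy, precision, recall, and F1, and log a confusion matrix
# to the Comet experiment created by the Trainer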
def compute_metrics(pred):
experiment = comet_ml.get_global_experiment()
labels = pred.label_ids
preds = pred.predictions.argmax(-1)
precision, recall, f1, _ = precision_recall_fscore_support(
labels, preds, average="macro"
)
acc = accuracy_score(labels, preds)
if experiment:
epoch = int(experiment.curr_epoch) if experiment.curr_epoch is not None else 0
experiment.set_epoch(epoch)
experiment.log_confusion_matrix(
y_true=labels,
y_predicted=preds,
file_name=f"confusion-matrix-epoch-{epoch}.json",
labels=["negative", "postive"],
index_to_example_function=get_example,
)
return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(200))
eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(200))
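# report_to=["comet_ml"] enables the Trainer's Comet callback; with COMET_LOG_ASSETS
# set above, saved checkpoints are also uploaded to Comet as assets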
training_args = TrainingArguments(
seed=42,
output_dir="./results",
overwrite_output_dir=True,
num_train_epochs=1,
do_train=True,
do_eval=True,
evaluation_strategy="steps",
eval_steps=25,
save_strategy="steps",
save_total_limit=10,
save_steps=25,
per_device_train_batch_size=8,
report_to=["comet_ml"],
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
compute_metrics=compute_metrics,
data_collator=data_collator,
)
trainer.train()
Try it out!¶
Here's an example of using Comet with Transformers.
Configure Comet for Transformers¶
You can control which Transformers items are logged automatically by setting the following environment variables:
export COMET_MODE=GET_OR_CREATE # Set to GET to always continue logging to an existing experiment or CREATE to always create a new experiment
export COMET_START_ONLINE=1 # Set to 0 to run an Offline Experiment
export COMET_LOG_ASSETS=True # Set to False to disable logging model checkpoints
export COMET_PROJECT_NAME=<your project name> # Configure your project name
export COMET_OFFLINE_DIRECTORY=<path to offline directory> # Folder to use for saving offline experiments when `COMET_START_ONLINE` is 0
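If you are working in a notebook rather than a shell, the same configuration can be applied from Python before the Trainer is created. The following is a minimal sketch; the project name and directory are placeholders.
import os
# Minimal sketch: set the same configuration from Python before creating the Trainer.
os.environ["COMET_START_ONLINE"] = "1"                     # "0" runs an Offline Experiment
os.environ["COMET_LOG_ASSETS"] = "True"                    # "False" disables checkpoint logging
os.environ["COMET_PROJECT_NAME"] = "my-project"            # placeholder project name
os.environ["COMET_OFFLINE_DIRECTORY"] = "./comet-offline"  # placeholder offline directory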
Note
Previous versions of the Transformers integration used COMET_MODE to control whether an online or offline Experiment was created. Use COMET_START_ONLINE for this instead.
For more information about using environment parameters in Comet, see Configure Comet.