Monday, March 18, 2024

Building a Conformal Chatbot in Julia | by Patrick Altmeyer | Jul, 2023

Conformal Prediction, LLMs and HuggingFace, Part 1

Towards Data Science

Large Language Models (LLMs) are all the buzz right now. They are used for a variety of tasks, including text classification, question answering, and text generation. In this tutorial, we will show how to conformalize a transformer language model for text classification using ConformalPrediction.jl.

In particular, we are interested in the task of intent classification as illustrated in the sketch below. First, we feed a customer query into an LLM to generate embeddings. Next, we train a classifier to match these embeddings to possible intents. Of course, for this supervised learning problem we need training data consisting of inputs (queries) and outputs (labels indicating the true intent). Finally, we apply Conformal Prediction to quantify the predictive uncertainty of our classifier.

Conformal Prediction (CP) is a rapidly emerging methodology for Predictive Uncertainty Quantification. If you are unfamiliar with CP, you may want to first check out my 3-part introductory series on the topic, starting with this post.

High-level overview of a conformalized intent classifier. Image by author.
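
To make the final step of this pipeline concrete, here is a toy example of split conformal classification in plain Julia. It does not use ConformalPrediction.jl or the actual model: the probabilities, the labels, and the helper names score, conformal_quantile and prediction_set are all made up for illustration, and the score used (one minus the probability of the true label) is just one simple choice among several.

```julia
# Toy split conformal classifier on made-up softmax outputs (not the Banking77 model).

# Nonconformity score: one minus the probability assigned to the given label.
score(probs::Vector{Float64}, label::Int) = 1.0 - probs[label]

# Conformal quantile: the ceil((n + 1)(1 - α))-th smallest calibration score.
function conformal_quantile(scores::Vector{Float64}, α::Float64)
    n = length(scores)
    k = ceil(Int, (n + 1) * (1 - α))
    k > n && return Inf   # too few calibration points: every label is included
    return sort(scores)[k]
end

# Prediction set: all labels whose score does not exceed the threshold.
prediction_set(probs::Vector{Float64}, qhat::Float64) =
    [y for y in eachindex(probs) if score(probs, y) <= qhat]

# Hypothetical calibration data: predicted probabilities over 3 intents and true labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
]
cal_labels = [1, 2, 3, 1, 3, 2, 3, 2, 1]

cal_scores = [score(p, y) for (p, y) in zip(cal_probs, cal_labels)]
qhat = conformal_quantile(cal_scores, 0.2)   # target coverage of 80%

confident = prediction_set([0.85, 0.10, 0.05], qhat)   # one dominant intent
ambiguous = prediction_set([0.45, 0.42, 0.13], qhat)   # two intents close together
```

With these made-up numbers, the confident query yields a prediction set containing a single intent, while the ambiguous query yields a set with two intents. The size of the set is what signals the classifier's uncertainty.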

We will use the Banking77 dataset (Casanueva et al., 2020), which consists of 13,083 queries from 77 intents related to banking. On the model side, we will use the DistilRoBERTa model, which is a distilled version of RoBERTa (Liu et al., 2019) fine-tuned on the Banking77 dataset.

The model can be loaded from HF straight into our running Julia session using the Transformers.jl package.

This package makes working with HF models remarkably easy in Julia. Kudos to the devs! 🙏

Below we load the tokenizer tkr and the model mod. The tokenizer converts the text into a sequence of integers, which is then fed into the model. The model outputs a hidden state, which is passed to a classifier to get the logits for each class. Finally, the logits are passed through a softmax function to get the corresponding predicted probabilities. We then run a few queries through the model to see how it performs.

# Load model from HF 🤗:
tkr =…
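
The last step described above, turning logits into predicted probabilities, can be illustrated in plain Julia without loading any model. The softmax helper and the logits below are hypothetical stand-ins for illustration only; they are not the output of the DistilRoBERTa model or a function from Transformers.jl.

```julia
# Numerically stable softmax: subtracting the maximum logit avoids overflow in exp.
function softmax(logits::AbstractVector{<:Real})
    shifted = exp.(logits .- maximum(logits))
    return shifted ./ sum(shifted)
end

# Hypothetical logits for three intents (not produced by the actual model).
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
predicted = argmax(probs)   # index of the most probable intent
```

The resulting probabilities are positive and sum to one, and the largest logit maps to the largest probability, so argmax over probs picks the same intent as argmax over the logits.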
