
From MoCo v1 to v3: Towards Building a Dynamic Dictionary for Self-Supervised Learning — Part 1 | by Mengliu Zhao | Jul, 2024

A gentle recap of the momentum contrast learning framework


Have we reached the era of self-supervised learning?

Data is flowing in every day. People are working 24/7. Jobs are distributed to every corner of the world. But still, a lot of data is left unannotated, waiting for possible use by a new model, a new training run, or a new upgrade.

Or, it will never happen. It will never happen as long as the world operates in a supervised fashion.

The rise of self-supervised learning in recent years has unveiled a new path. Instead of creating annotations for all tasks, self-supervised learning breaks tasks into pretext/pre-training tasks (see my previous post on pre-training here) and downstream tasks. The pretext tasks focus on extracting representative features from the whole dataset without the guidance of any ground-truth annotations. However, this task still requires labels generated automatically from the dataset, usually by extensive data augmentation. Hence, we use the terms unsupervised learning (the dataset is unannotated) and self-supervised learning (the tasks are supervised by self-generated labels) interchangeably in this article.
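To make this concrete, here is a minimal sketch of how such self-generated labels can come from data augmentation: two random views of the same image form a positive pair, and the image's own identity is the only supervision. The torchvision-style transform recipe below is an illustrative assumption, not the exact augmentation used in any of the MoCo papers.

```python
import torchvision.transforms as T
from PIL import Image

# Illustrative pretext-task supervision: two random augmentations of the same
# image form a positive pair. The specific transform recipe is an assumption.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def make_positive_pair(image: Image.Image):
    """Return two independently augmented views of the same image."""
    return augment(image), augment(image)
```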

Contrastive learning is a major category of self-supervised learning. It uses unlabelled datasets and contrastive, information-encoding losses (e.g., the contrastive loss, the InfoNCE loss, the triplet loss, etc.) to train the deep learning network. Leading contrastive learning methods include SimCLR, SimSiam, and the MoCo series.

MoCo — the name is an abbreviation of “momentum contrast.” The core idea was laid out in the first MoCo paper, which frames the computer vision self-supervised learning problem as follows:

“[quote from original paper] Computer vision, in contrast, further concerns dictionary building, as the raw signal is in a continuous, high-dimensional space and is not structured for human communication… Though driven by various motivations, these (note: recent visual representation learning) methods can be thought of as building dynamic dictionaries… Unsupervised learning trains encoders to perform dictionary look-up: an encoded ‘query’ should be similar to its matching key and dissimilar to others. Learning is formulated as minimizing a contrastive loss.”
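To make the dictionary look-up idea concrete, below is a minimal sketch of the InfoNCE loss under that formulation: each encoded query should score high against its matching key and low against every other key in the dictionary. The tensor names and the temperature value are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, k_negs, temperature=0.07):
    """
    InfoNCE as dictionary look-up (sketch).
    q:      (N, D) encoded queries
    k_pos:  (N, D) matching (positive) keys
    k_negs: (K, D) other (negative) keys from the dictionary
    """
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    k_negs = F.normalize(k_negs, dim=1)

    # Positive logits: similarity of each query to its own key -> (N, 1)
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(-1)
    # Negative logits: similarity of each query to every other key -> (N, K)
    l_neg = torch.einsum("nd,kd->nk", q, k_negs)

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The matching key sits at index 0, so the target "class" is 0 for every query.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

With a softmax over these similarity logits, minimizing the cross-entropy is the (K+1)-way classification view of contrastive learning described in the MoCo paper.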

In this article, we'll do a gentle review of MoCo v1 to v3:

  • v1 — the paper “Momentum contrast for unsupervised visual representation learning” was published in CVPR 2020. The paper proposes a momentum update to the key ResNet encoder, using sample queues with the InfoNCE loss (see the sketch after this list).
  • v2 — the paper “Improved baselines with momentum contrastive learning” came out shortly after, implementing two architecture improvements from SimCLR: a) replacing the FC layer with a 2-layer MLP head and b) extending the original data augmentation with blur.
  • v3 — the paper “An empirical study of training self-supervised vision transformers” was published in ICCV 2021. The framework extends the single key-query pair to two key-query pairs, which are used to form a SimSiam-style symmetric contrastive loss. The backbone was also extended from ResNet-only to both ResNet and ViT.
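As a preview of the v1 mechanics named above, here is a minimal sketch of its two ingredients, assuming a query encoder and a key encoder with identical architectures: the key encoder is updated as an exponential moving average of the query encoder, and freshly encoded keys are pushed into a FIFO queue that serves as the dictionary. The momentum value and queue handling are illustrative assumptions.

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # Key encoder trails the query encoder as an exponential moving average
    # (m = 0.999 is an illustrative choice).
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

@torch.no_grad()
def update_queue(queue, new_keys):
    # FIFO dictionary of encoded samples: enqueue the newest keys and drop the
    # oldest so the queue size stays fixed.
    # queue: (K, D) past keys; new_keys: (N, D) keys from the current batch.
    return torch.cat([new_keys, queue], dim=0)[: queue.size(0)]
```

Gradients flow only through the query encoder; the key encoder and the queue are maintained outside back-propagation, which is what keeps a very large dictionary affordable.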
Image source: https://pxhere.com/en/picture/760197


