Wednesday, March 6, 2024

Google’s Latest Approaches to Multimodal Foundational Models | by Eileen Pangu | Aug, 2023


Multimodal foundational models are even more exciting than large language models. Let’s review Google Research’s recent progress to get a glimpse of the bleeding edge.

Towards Data Science


While the hype around large language models (LLMs) is still red hot in the industry, the leading research organizations have turned their eyes to multimodal foundational models: models that have the same scale and versatility as LLMs but can handle data beyond just text, such as images, audio, sensor signals, and so on. Many believe multimodal foundational models are the key to unlocking the next phase of Artificial Intelligence (AI) advancement.

In this blog post, we take a closer look at how Google approaches multimodal foundational models. The content covered here is drawn from the key methods and insights of Google’s recent papers, for which we provide references at the end of this article.

Why Should You Care

Multimodal foundational models are exciting, but why should you care? You may be:

  • an AI/ML practitioner who wants to catch up with the latest research developments in the field, but doesn’t have the patience to go through dozens of new papers and hundreds of pages of surveys.
  • a current or emerging industry leader who’s wondering what’s next after large language models, and is thinking about how to align your business with the new trends in the tech world.
  • a curious reader who may end up being a consumer of current or future multimodal AI products, and wants to get a visual and intuitive understanding of how things work behind the scenes.

For all of the above audiences, this article will provide a good overview to jump-start your understanding of multimodal foundational models, which are a cornerstone of more accessible and helpful AI in the future.

One more thing to note before we dive in: when people talk about multimodal foundational models, they often mean the input is multimodal, consisting of text, images, videos, signals, etc. The output, however, is always just text. The…
