Wednesday, April 3, 2024

AI-Generated Synthetic Data. Explained the best way: with… | by Cassie Kozyrkov | Jul, 2023

Explained the best way: with cats!

Towards Data Science

Why is AI-generated synthetic data all the rage these days? In this article, I’ll explain my favorite way: with cats!

Let’s say I want to train a cat-not-cat classifier from scratch, but I only have one image to work with:

The author’s cat, Huxley.

(Everything that follows is an analogy for what people do with tabular data and text data, so it applies beyond image data.)

Ideally, I’m going to need a dataset consisting of thousands of cat and not-cat photos. If I have a camera and plentiful access to cats, I can take a bunch of photos like the one I already have, ensuring that I get exactly the dataset I designed:

A photo I took in a park in Istanbul.

But what if I don’t have a camera and I live catless on the moon? I could get the images I need from a vendor, though I should be careful, since inherited data is more dangerous than primary data.

Thanks, Pixabay, for being an excellent (free) vendor of cat photos.

But what if there’s no vendor who’ll sell me some cat photos? (Yes, running out of cat photos on the internet is a situation that’s more sci-fi than living on the moon, but bear with me.)

Well, if I can’t collect them and I can’t buy them, then I’ll have to make them myself. Behold, my creation:

Your author is a veritable Michelangelo.

No good? Yeah, drawing was never my strong suit. Another way to make fake data is to copy existing datapoints, except this isn’t going to be much use for providing instructional variety.

This approach fools no one. I’ve still only effectively got one datapoint.

It’ll be like teaching a human student by giving them the same example over and over, so all they learn is that one thing. If my dataset is 30,000 copies of this Huxley image…
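To put the copy-paste problem in numbers, here is a minimal sketch in Python with NumPy. The tiny random 8x8 array is only a hypothetical stand-in for the Huxley photo (an assumption for illustration, not the real image), and the variable names are made up for this example. It builds a "dataset" of 30,000 copies and then counts how many distinct examples it actually holds.

```python
import numpy as np

# Hypothetical stand-in for the single Huxley photo: a tiny 8x8 grayscale
# "image" of random pixel values (an assumption; any one image would do).
rng = np.random.default_rng(seed=0)
huxley = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

# Build a "dataset" of 30,000 copies of that one image, one flattened row each.
n_copies = 30_000
dataset = np.tile(huxley.reshape(1, -1), (n_copies, 1))

# Count how many distinct examples the dataset really contains.
unique_rows = np.unique(dataset, axis=0)

# Check whether every row is identical to the first one.
all_identical = bool((dataset == dataset[0]).all())

print("rows in dataset:", dataset.shape[0])
print("distinct datapoints:", len(unique_rows))
print("every row identical:", all_identical)
```

However many rows get stacked up, the count of distinct datapoints stays at one, so a model trained on this has still only ever seen a single example.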
