How I Leveraged Open Source LLMs to Achieve Massive Savings on a Large Compute Project | by Ryan Shrott | Aug, 2023

Unlocking Cost-Efficiency in Large Compute Projects with Open Source LLMs and GPU Rentals.

Towards Data Science
Photo by Alexander Gray on Unsplash


In the world of large language models (LLMs), the cost of computation can be a significant barrier, especially for extensive projects. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That's nearly 5 billion tokens! The traditional approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I discovered that by leveraging open source LLMs, I could shift the pricing model to pay per hour of compute time, leading to substantial savings. This article will detail the approaches I took and compare and contrast each of them. Please note that while I share my experience with pricing, prices are subject to change and may vary depending on your region and specific circumstances. The key takeaway here is the potential cost savings of leveraging open source LLMs and renting a GPU per hour, rather than the specific prices quoted. If you plan on using my recommended solutions in your own project, I've left a couple of affiliate links at the end of this article.
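The token count above is simple arithmetic; a quick sketch makes the scale concrete:

```python
# Back-of-the-envelope token count for the project described above:
# 4M prompts, averaging 1,000 input tokens and 200 output tokens each.
num_prompts = 4_000_000
avg_input_tokens = 1_000
avg_output_tokens = 200

total_tokens = num_prompts * (avg_input_tokens + avg_output_tokens)
print(f"{total_tokens:,} total tokens")  # 4,800,000,000 -- nearly 5 billion
```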


I conducted an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models demonstrated commendable performance, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:

Total cost of running 4mm prompts with an input length of 1,000 tokens and a 200-token output length

While GPT-4 did offer some performance benefits, the cost was disproportionately high compared to the incremental performance it added to my outputs. Conversely, GPT-3.5 Turbo, although more affordable, fell short in terms of performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…
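As a rough check on that $7,600 figure, here is a sketch of the per-token cost arithmetic. The rates are assumptions based on GPT-3.5 Turbo's published pricing around the time this was written (roughly $0.0015 per 1K input tokens and $0.002 per 1K output tokens); current prices will differ:

```python
# Estimated per-token API cost for the full 4M-prompt run,
# assuming GPT-3.5 Turbo pricing of $0.0015/1K input tokens
# and $0.002/1K output tokens (rates subject to change).
num_prompts = 4_000_000
input_tokens = num_prompts * 1_000   # 4B input tokens
output_tokens = num_prompts * 200    # 800M output tokens

input_cost = input_tokens / 1_000 * 0.0015   # $6,000
output_cost = output_tokens / 1_000 * 0.002  # $1,600
print(f"${input_cost + output_cost:,.0f}")   # $7,600
```

Under these assumed rates, the input side dominates: $6,000 of the bill comes from the 1,000-token prompts and only $1,600 from the outputs, which is why long input contexts are the first place to look for savings.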
