To help organizations scale their AI usage without over-extending their budgets, we've added two new ways to reduce costs on consistent and asynchronous workloads:
- Discounted usage on committed throughput: Customers with a sustained level of tokens per minute (TPM) usage on GPT-4 or GPT-4 Turbo can request access to provisioned throughput and receive discounts ranging from 10–50% based on the size of the commitment.
- Reduced costs on asynchronous workloads: Customers can use our new Batch API to run non-urgent workloads asynchronously. Batch API requests are priced at 50% off shared prices, offer much higher rate limits, and return results within 24 hours. This is ideal for use cases like model evaluation, offline classification, summarization, and synthetic data generation.
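A Batch API job starts from a JSONL input file containing one request object per line, which is uploaded and then submitted as a batch. Here is a minimal sketch of building that input file in Python; the field names follow the OpenAI Batch API reference at the time of writing, and `batch_input.jsonl` is an assumed filename, so verify the schema against the current documentation:

```python
import json

# Each line of a Batch API input file is one self-contained JSON request.
def build_batch_lines(prompts, model="gpt-4-turbo"):
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",    # caller-chosen ID, echoed back in the results
            "method": "POST",
            "url": "/v1/chat/completions",  # the endpoint each batched request targets
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

# Write the .jsonl file. It would then be uploaded with purpose="batch"
# and submitted via client.batches.create(..., completion_window="24h").
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(build_batch_lines(
        ["Summarize this article.", "Classify this text."]
    )))
```

Because each request carries its own `custom_id`, results can be matched back to inputs even though the batch completes asynchronously.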
We plan to keep adding new features focused on enterprise-grade security, administrative controls, and cost management. For more information on these launches, visit our API documentation or get in touch with our team to discuss custom solutions for your business.