How GPUse Billing Works

You pay only for actual GPU usage, not for provisioning or model loading, and billing stops once an idle instance scales to zero.

No Warmup Charges

Model loading is FREE (~2-5 min)

Per-Second Billing

$0.0002028/second (~$0.73/hr)

Auto Scale-to-Zero

Stops billing when idle (after 5-15 min with no requests)

When Billing Starts and Stops

1. PROVISIONING - FREE (~2-5 min)

  • Container starts and downloads model weights
  • Model loads into GPU memory
  • Health checks verify readiness

Billing: $0.00

2. FIRST REQUEST - BILLING STARTS

  • First HTTP request hits endpoint
  • Billing activates immediately
  • Actual work begins

Billing: $0.0002028/sec

3. ACTIVE USAGE - BILLING CONTINUES

  • Workload is processing
  • Instance stays warm for fast responses
  • Charges continue during pauses

Billing: $0.0002028/sec

4. SCALING DOWN - BILLING CONTINUES

  • No requests for 5-15 minutes
  • Instance scales to zero
  • Billing stops when scaled to zero

Billing: $0.0002028/sec
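
The four stages above can be read as a simple rule: time spent provisioning or scaled to zero is free, and everything in between is billed at the per-second rate. The following is a minimal sketch of that rule, not GPUse's actual billing code. The `Stage` enum, the `timeline_cost` function, and the sample durations are illustrative names and values chosen for this example.

```python
from enum import Enum

RATE_PER_SECOND = 0.0002028  # USD, the rate quoted on this page


class Stage(Enum):
    PROVISIONING = "provisioning"      # stage 1: free
    ACTIVE = "active"                  # stages 2-3: first request and active usage
    SCALING_DOWN = "scaling_down"      # stage 4: still billed
    SCALED_TO_ZERO = "scaled_to_zero"  # after stage 4: free


# Assumption for this sketch: only these stages accrue charges.
BILLED_STAGES = {Stage.ACTIVE, Stage.SCALING_DOWN}


def timeline_cost(timeline):
    """Sum the cost of a list of (stage, seconds) pairs."""
    billable_seconds = sum(sec for stage, sec in timeline if stage in BILLED_STAGES)
    return billable_seconds * RATE_PER_SECOND


# Hypothetical session: 4 min provisioning, 10 min active, 5 min scale-down.
timeline = [
    (Stage.PROVISIONING, 240),
    (Stage.ACTIVE, 600),
    (Stage.SCALING_DOWN, 300),
    (Stage.SCALED_TO_ZERO, 3600),
]
print(f"${timeline_cost(timeline):.2f}")  # $0.18
```

Only the 900 seconds between the first request and the end of scale-down contribute to the charge in this example.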

What You're Actually Paying For

Serverless GPU Instance

Component          Per Second    Per Hour    Per Day (8 hrs)
GPUse Total Rate   $0.0002028    ~$0.73      ~$5.84
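
The hourly and daily figures follow directly from the per-second rate. A short sketch of the conversion, with function names chosen for illustration only:

```python
RATE_PER_SECOND = 0.0002028  # USD, the rate quoted on this page


def rate_per_hour(rate_per_second=RATE_PER_SECOND):
    """Convert a per-second rate to a per-hour rate (3600 seconds)."""
    return rate_per_second * 3600


def cost_for_hours(hours, rate_per_second=RATE_PER_SECOND):
    """Cost of running continuously for the given number of hours."""
    return rate_per_hour(rate_per_second) * hours


print(f"Per hour: ${rate_per_hour():.2f}")           # Per hour: $0.73
print(f"Per day (8 hrs): ${cost_for_hours(8):.2f}")  # Per day (8 hrs): $5.84
```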

Billing Granularity

  • Minimum: 100 milliseconds (0.1 seconds)
  • Increment: Per-second billing
  • Instance Minimum: 1 minute from startup to termination
  • Grace Period: 5 minutes FREE per project (before any billing)

What Does This Actually Cost?

Quick Inference Job

Transcribe a 30-minute podcast with Whisper

  • Provisioning: 3 minutes (FREE)
  • Processing: 5 minutes ($0.06)
  • $0.0002028 × 300 sec = $0.06
  • Scaling down: 5-15 min (up to $0.0002028 × 900 sec = $0.18)

Total: ~$0.24 (with a full 15-minute scale-down)

Development Session

Test and iterate on a model deployment

  • Provisioning: 4 minutes (FREE)
  • Active development: 2 hours
  • $0.73/hr × 2 = $1.46
  • Scaling down: 5-15 min (up to $0.0002028 × 900 sec = $0.18)

Total: ~$1.64 (with a full 15-minute scale-down)

Production Batch Job

Process 1000 images with SDXL

  • Provisioning: 5 minutes (FREE)
  • Processing: 8 hours continuous
  • $0.73/hr × 8 = $5.84
  • Scaling down: 5-15 min (up to $0.0002028 × 900 sec = $0.18)

Total: ~$6.02 (with a full 15-minute scale-down)
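
The three totals above correspond to the longest scale-down window of 15 minutes. Because that window can be as short as 5 minutes, the real cost falls in a range. The sketch below recomputes each example for both ends of the window. The function name and the dictionary of examples are illustrative, and the durations are the ones listed above.

```python
RATE_PER_SECOND = 0.0002028  # USD, the rate quoted on this page


def job_cost_range(active_seconds, min_idle=5 * 60, max_idle=15 * 60):
    """Return (low, high) cost for a job, given the 5-15 min scale-down window."""
    low = (active_seconds + min_idle) * RATE_PER_SECOND
    high = (active_seconds + max_idle) * RATE_PER_SECOND
    return low, high


# Active (billed) time for each example; provisioning is free and excluded.
examples = {
    "Quick Inference Job": 5 * 60,
    "Development Session": 2 * 3600,
    "Production Batch Job": 8 * 3600,
}

for name, active_seconds in examples.items():
    low, high = job_cost_range(active_seconds)
    print(f"{name}: ${low:.2f} to ${high:.2f}")
# Quick Inference Job: $0.12 to $0.24
# Development Session: $1.52 to $1.64
# Production Batch Job: $5.90 to $6.02
```

Calling stop_compute as soon as the work finishes avoids the scale-down portion entirely.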

GPUse vs Traditional GPU Providers

Feature                     Traditional Providers           GPUse
Charged for provisioning    [✓] Yes (you pay for setup)     [✗] No (FREE warmup period)
Charged for model loading   [✓] Yes (~5-10 min)             [✗] No (FREE until first request)
Billing starts              At deployment                   At first request
Billing granularity         Per hour (rounded up)           Per second
Scale to zero               Manual or not available         Automatic (5-15 min idle)
Minimum charge              1 hour                          1 minute
Cold start time             Charged                         FREE
Idle time cost              Full rate                       Auto scales to zero

Bottom Line: Traditional providers charge ~$0.84-1.20/hr from deployment.
GPUse charges $0.73/hr ONLY when you're using the GPU.
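
To make the comparison concrete, the sketch below prices the Whisper transcription example under both models. It assumes the traditional provider behaves as the table describes (billing from deployment, rounded up to whole hours, one-hour minimum) and that the instance is stopped by hand as soon as the job ends. The function names are illustrative and do not correspond to any provider's API.

```python
import math

GPUSE_RATE_PER_SECOND = 0.0002028        # USD, from this page
TRADITIONAL_HOURLY_RATES = (0.84, 1.20)  # USD/hr range quoted above


def gpuse_cost(active_seconds, scale_down_seconds):
    """Per-second billing from first request until scale-to-zero completes."""
    return (active_seconds + scale_down_seconds) * GPUSE_RATE_PER_SECOND


def hourly_provider_cost(provision_seconds, active_seconds, hourly_rate):
    """Billing from deployment, rounded up to whole hours (assumed model)."""
    billed_hours = max(1, math.ceil((provision_seconds + active_seconds) / 3600))
    return billed_hours * hourly_rate


# Whisper example: 3 min provisioning, 5 min processing, 15 min scale-down.
gpuse = gpuse_cost(active_seconds=300, scale_down_seconds=900)
low, high = (hourly_provider_cost(180, 300, rate) for rate in TRADITIONAL_HOURLY_RATES)

print(f"GPUse: ${gpuse:.2f}")                         # GPUse: $0.24
print(f"Hourly provider: ${low:.2f} to ${high:.2f}")  # Hourly provider: $0.84 to $1.20
```

The gap narrows for long continuous jobs, where hourly rounding matters less and the difference comes mainly from the hourly rate itself.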

You Stop Paying When...

Billing Stops When

  • Instance completes scaling to zero
  • You manually stop the instance via stop_compute
  • Grace period expires without checkout

Billing Continues During

  • Active processing and requests
  • Scale-to-zero period (5-15 min after last request)

Frequently Asked Questions

How does billing differ from traditional providers?

Traditional providers charge from the moment you click "deploy". GPUse charges from your first request - meaning provisioning and model loading are FREE.

Why do you charge during the scale-to-zero period?

Serverless GPUs stay warm for a period of time waiting for follow-up requests, which prevents a 2-5 minute delay on every request. The instance scales to zero within 5-15 minutes so that you do not keep paying for compute time you are not using.

What if I forget to stop an instance?

The instance will automatically scale to zero after 5-15 minutes of no requests. You continue to be billed during this scale-down period, and billing stops once the instance becomes fully idle. You can also manually call stop_compute to terminate immediately and avoid the scale-down charges.

How accurate is the scale-to-zero timeline?

It varies between 5 and 15 minutes depending on traffic patterns. GPUse does not control this timeline - it's managed by the underlying infrastructure. Scale-to-zero consistently occurs within this range, often toward the shorter end.

Do I pay for failed requests or errors?

Yes - billing is based on instance runtime, not successful requests. If your workload fails, you still pay for the GPU time used. This is why the grace period is valuable for testing.

What happens if I run out of credits?

Your instances will continue running until they scale to zero or are manually stopped. You'll receive low-balance warnings via the MCP tools and dashboard.

Is the grace period per account or per project?

Per account. Use unique project_id values to get 5 minutes FREE for each account.

Why is provisioning FREE if it uses cloud resources?

We believe you should only pay for actual work. Model loading and container startup are infrastructure overhead, not your compute work. This policy sets GPUse apart from traditional providers.

How do I view my billing costs?

We provide detailed billing per GPU instance. Log in to your dashboard to view usage history showing each instance with timestamps and costs. You can filter by active instances or view terminated instances in the history tab. Each entry shows the exact runtime and cost for that GPU instance.

Why do I see irregular billing intervals in my usage history?

Our system checks your GPU usage every 30 seconds, but each check takes a few seconds to complete. As a result, billing intervals appear slightly irregular in your usage history: you might see entries at 33, 35, or 38-second intervals instead of exactly 30 seconds.

You are always billed for the exact time your GPU was running. Note that billing continues during the scale-to-zero period (up to 15 minutes after your last request) until the instance fully terminates.
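
A small sketch shows why irregular intervals do not change the total. It assumes each usage-history entry is charged as its duration multiplied by the per-second rate. The interval values are made up for illustration and are not real usage data.

```python
import math

RATE_PER_SECOND = 0.0002028  # USD, the rate quoted on this page

# Hypothetical usage-history entries, in seconds between billing checks.
intervals = [33, 35, 38, 30, 34]


def interval_charges(intervals):
    """Charge for each entry, assuming duration x rate per entry."""
    return [seconds * RATE_PER_SECOND for seconds in intervals]


charges = interval_charges(intervals)
total_from_entries = sum(charges)
total_from_runtime = sum(intervals) * RATE_PER_SECOND

# The per-entry charges add up to the charge for the total runtime.
assert math.isclose(total_from_entries, total_from_runtime)
print(f"{sum(intervals)} seconds billed, ${total_from_entries:.4f} total")
```

However the runtime is sliced into entries, the sum matches the charge for the full runtime.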

Start with 5 Minutes FREE

Test GPUse risk-free with our grace period. No credit card required.

Your agent provisions a GPU, completes the work, then surfaces the checkout link only if you want to continue.

Questions about billing? Email support@gpuse.com or log in to view detailed cost breakdowns for each GPU instance via your dashboard.